The best approach for moving files and retaining history

Paul Sander

2009-07-01 06:12:34 UTC

Post by Rez P
Ultimately, I would like to know if my approach is good or bad.

The two techniques are neither good nor bad, just different, each with
its own advantages and disadvantages.
Alternatively, if you are using CVSNT (on unix/linux/win/mac), you can use
the rename command.

Curiosity has struck me...
Plagiarizing some from Rez P:
Method A: copy the files to a newly created folder, add, and commit, then
cvs remove the originals and commit.
Method B: go to the server side and copy all the 'v' files to the new folder,
then cvs remove the originals and commit.
Method A makes it look like the files, in the new directory, have no history.
If comments are done properly, cvs2cl shows the changeover.
Method B has all the history, but can be a bit confusing if you
checkout an old tag.
Likely there will not be a comment in the new files to show that
they moved, so cvs2cl will be less than useful at detecting it.
What is the behavior of the CVSNT rename command?
What do its repository markings look like in cvs2cl? (cvs2cl.pl or cvs2cl.py)
If I were trying to mimic it by hand what would it most look like?

There's a bit more to the problem, because you don't necessarily want
a rename to take effect on every branch of the file. And then there's
the problem of replicating the rename on some (but possibly not all)
branches of the file. And then there's the problem where you might want
the result of a rename to originate from different locations in the
repository depending on the branch. These are very, very sticky
problems.

If you go back to early discussions on this list, Brian Berliner
recommended method B, then deleting version tags. But that predates
branches. You should consider deleting the uninteresting branch tags,
and possibly even the data for the versions on the unwanted branches.
But then subsequent renames on different branches of the same file may
add some of them back.

But this still is an incomplete solution because there may be a need
to merge from branches that are used exclusively in the pre-move
organization to the branch where the rename occurs and its newer
children.

I believe that renaming or copying RCS files ultimately is not the way
to implement renaming a file. I believe that a versioned mapping of
files in the workspace to RCS files in the repository is necessary.
This mapping is updated whenever an added, removed, or renamed file is
committed, and the current version of the mapping is somehow
represented in the user's workspace whenever it's updated. This
method has its drawbacks, particularly the evil twin condition in
which two essentially different files occupy the same place in the
sandbox filesystem at different times. Another big one is that its
implementation requires a redesign of CVS at a fundamental level. But
it also enables possible solutions for other problems in a context
where renaming is done, problems that relate to merging between
branches where the file on each branch contains a different type of
data.
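
To make the versioned-mapping idea concrete, here is a minimal sketch in
Python. Nothing like this exists in CVS; the class name PathMap, its
methods, and the "rcs/0001,v" style file names are all made up for
illustration. It only shows that a rename changes the workspace name while
the RCS file stays where it is, and how the evil twin condition can arise.

```python
from dataclasses import dataclass, field


@dataclass
class PathMap:
    """Hypothetical versioned mapping: workspace path -> RCS file."""

    # Each list entry is one committed version of the whole mapping.
    versions: list = field(default_factory=lambda: [{}])

    def commit(self, added=None, removed=(), renamed=None):
        """Record a new mapping version and return its number."""
        mapping = dict(self.versions[-1])
        for path in removed:
            del mapping[path]
        for old_path, new_path in (renamed or {}).items():
            # The RCS file is untouched; only the workspace name changes.
            mapping[new_path] = mapping.pop(old_path)
        mapping.update(added or {})
        self.versions.append(mapping)
        return len(self.versions) - 1

    def resolve(self, path, version=-1):
        """Return the RCS file backing `path` at a given mapping version."""
        return self.versions[version].get(path)


m = PathMap()
v1 = m.commit(added={"src/foo.c": "rcs/0001,v"})
v2 = m.commit(renamed={"src/foo.c": "lib/foo.c"})
# Evil twin: an unrelated file now occupies the old workspace path.
v3 = m.commit(added={"src/foo.c": "rcs/0002,v"})
```

In this sketch, "src/foo.c" resolves to a different RCS file at v1 than at
v3, which is exactly the evil twin condition described above.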

Others in this forum have recommended creating new RCS files and using
threaded data structures within RCS files, using specialized comments
or perhaps other metadata stored in RCS newphrase phrases, to connect
the fragments of histories to simulate renames. Features such as "cvs
log" would be revised to understand the threaded structures and
present the history appropriately, and some other operations such as
merging would have to traverse the links to locate the proper tags.
This appears to be simple to implement at the outset, but in practice
the number of special cases is daunting, and even then there are
peculiar side-effects or limitations.
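
As a rough sketch of what such a threaded "cvs log" would have to do, the
Python below walks a chain of history fragments. The "renamed_from" link,
the dictionary layout, and the function name threaded_log are assumptions
made for illustration; no such metadata exists in stock RCS files. Even
this toy version needs a guard for one special case (a rename chain that
loops back on itself).

```python
def threaded_log(fragments, rcs_file):
    """Follow hypothetical renamed_from links; return log entries, newest first."""
    entries = []
    seen = set()
    while rcs_file is not None:
        if rcs_file in seen:
            # Special case: a rename chain that leads back to itself.
            raise ValueError(f"rename loop at {rcs_file}")
        seen.add(rcs_file)
        fragment = fragments[rcs_file]
        entries.extend(fragment["log"])
        rcs_file = fragment.get("renamed_from")
    return entries


# Made-up fragments: new/bar.c continues the history of old/foo.c.
fragments = {
    "new/bar.c,v": {
        "log": ["1.2 fix typo", "1.1 renamed from old/foo.c"],
        "renamed_from": "old/foo.c,v",
    },
    "old/foo.c,v": {
        "log": ["1.2 last change before move", "1.1 initial revision"],
    },
}

combined = threaded_log(fragments, "new/bar.c,v")
```

This covers only the log presentation. Locating the proper tags for a
merge would need a similar traversal per branch, which is where the
special cases multiply.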

Bottom line: Don't expect a complete solution until CVS is redesigned
from the bottom up. The various manuals suggest several methods
besides the two mentioned in this thread, all of which solve some
subset of problems. You might find one that you can live with.

All that said, here is my recommendation using what's available today,
for renaming a single file:

1. Identify all branches that will survive the rename to the new
location, and get their owners to agree on a cutover time.
2. Audit all sandboxes for uncommitted changes to branches that will
survive the rename, and commit them.
3. Apply a pre-rename version tag to the top of all branches that
survive the rename.
4. Copy (or hard-link) the RCS file to the new location, which might
be the Attic.
5. Move the original RCS file to the Attic, if it's not already there.
6. For all branches surviving the rename, do the following in the
original RCS file: Mark them "dead", remove applicable floating tags.
7. For all other branches (old maintenance or frozen branches), do
the following in the new RCS file: Mark them "dead", remove
applicable floating tags.
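
The filesystem part of steps 4 and 5 can be sketched in Python as follows.
This is an illustration under assumptions, not a tool: the function name
copy_and_retire is made up, it assumes nobody else is touching the
repository while it runs, it always copies into a live directory (pass an
Attic directory as new_dir if that is where the copy belongs), and it does
nothing for steps 6 and 7, which still have to be done on the RCS files
afterwards.

```python
import shutil
import tempfile
from pathlib import Path


def copy_and_retire(repo_dir: Path, new_dir: Path, name: str) -> Path:
    """Steps 4 and 5 only: copy the RCS file, then move the original to the Attic."""
    rcs_name = name + ",v"
    attic = repo_dir / "Attic"
    src = repo_dir / rcs_name
    if not src.exists():
        # The original may already live in the Attic.
        src = attic / rcs_name
    new_dir.mkdir(parents=True, exist_ok=True)
    dst = new_dir / rcs_name
    if dst.exists():
        raise FileExistsError(f"{dst} already exists")
    shutil.copy2(src, dst)  # step 4: copy, keeping timestamps
    if src.parent != attic:
        attic.mkdir(exist_ok=True)
        shutil.move(str(src), str(attic / rcs_name))  # step 5
    return dst


# Try it on a throwaway directory tree, never on a live repository.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "old").mkdir()
    (root / "old" / "foo.c,v").write_text("head 1.1;\n")
    dst = copy_and_retire(root / "old", root / "new", "foo.c")
    copied = dst.exists()
    retired = (root / "old" / "Attic" / "foo.c,v").exists()
    original_gone = not (root / "old" / "foo.c,v").exists()
```

Working on a backup copy of the repository first is a sensible precaution,
since undoing the procedure by hand is painful.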

This method has the following problems:

- Branch owners must agree on a cutover time.
- The sandbox audit is a big pain.
- Dead branches can be revived in the wrong locations.
- Undoing it is a pain.
- Reversing it is a royal pain, and it leaves side-effects.
- It doesn't support renaming file foo to bar on branch A, and file
baz to bar on branch B, at least not without much care and not without
introducing annoying side-effects.
- You lose the completeness of version history in both locations if
development continues on the old maintenance branches.
- Merging from maintenance branches to new development requires
computing and applying patches.
- Renaming the file repeatedly increases cruft and fragmentation.
- It's very time-consuming when done to every file in a large tree
(i.e. when renaming a directory).
- It's not represented well in "cvs history".
- Operations involving tags and timestamps are best avoided while the
rename procedure is running.
- Non-floating version tags remain on both branches after the rename.
You want them for diff, merge, and log but not for checkout and update.

Using this method, it's possible to migrate branches individually, but
extra care is required to verify that two existing RCS files are
really related in this way. Also, individual versions must be copied,
with all of their metadata, from one RCS file to the other on
applicable branches. This is a major pain.

Good luck!