Help understanding cvs logs using cvs2cl.pl

Discussion:

Shivani Rao

2012-02-27 19:09:41 UTC

My intention is to track ASPECTJ's software development as documented
by the cvs log. I wish to be able to check out the earliest revision
(1.1) and then follow changes along with the "tags" and "bugs" as
specified in the comments.

I found a great tool, cvs2cl.pl, that helped me get the change logs from
the cvs log command. However, I found the following inconsistencies and
I was wondering if anybody could help me out. By default, cvs2cl and
cvs log give information in reverse chronological order. So when I run
the following command

cvs2cl --chrono -T

I expected changes in chronological order (2002 to 2011), with tag
information wherever the versions were tagged. But what I got was
strange: all the tags were listed under 2002. When I use

cvs2cl --T

I get tag information in reverse order (2011 to 2002) and the tags
appear in a more sensible order.

For example,
< 2011-11-22 14:31 tag preJava7Merge
< 2011-10-03 19:54 tag V1_6_12
< 2011-10-03 19:54 tag V1_6_12RC1
< 2011-08-18 14:36 tag V1_6_12M2
< 2011-06-07 15:20 tag V1_6_12M1
< 2011-03-15 11:46 tag V1_6_11
...

a) How do tags really work? I have read tons of tutorials, but it is
still not clear to me. Does a symbolic name for a file stay after it is
tagged?
b) If I want to track software evolution from one "tag" till "another"
can I use the order found out as stated above? Is the latest tag given
to a file, the truest way to find the latest revision (tag) it belongs
to?
c) Why is it that I see the tags at a different date when cvs2cl uses
the chrono option?

Thanks for your help and inputs

--
Research Scholar,
School of Electrical and Computer Engineering
Purdue University
West Lafayette IN
web.ics.purdue.edu/~sgrao

Michael Haggerty

2012-02-28 08:13:20 UTC

Post by Shivani Rao
[...]
a) How do tags really work? I have read tons of tutorials, but still
not clear. Does a symbolic name for a file stay after it is tagged?

A file tag is just a symbol that refers to a specific revision of the
file. If a whole repository was tagged at once (this is not necessarily
the case), then the collection of all files at their respective versions
can be treated as a single repository-wide tag.
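
As a rough illustration of "a symbol that refers to a specific revision",
the sketch below reads the "symbols" section of an RCS-style ",v" file
header, where each tag is stored as a name:revision pair. The tag names
are taken from the listing earlier in the thread; the header text, the
revision numbers and the parse_symbols helper are made up for this
example and are not part of CVS.

```python
import re
from typing import Dict


def parse_symbols(rcs_header: str) -> Dict[str, str]:
    """Return {tag_name: revision} from the 'symbols' section of an RCS header."""
    match = re.search(r"\bsymbols\b(.*?);", rcs_header, re.DOTALL)
    if match is None:
        return {}
    symbols = {}
    for entry in match.group(1).split():
        # Each entry looks like NAME:REVISION.
        name, _, revision = entry.partition(":")
        symbols[name] = revision
    return symbols


# Hypothetical header in the style of a ",v" file; revisions are illustrative.
SAMPLE_HEADER = """head\t1.7;
access;
symbols
\tV1_6_12:1.6
\tV1_6_12RC1:1.6
\tV1_6_11:1.4;
locks; strict;
"""

tags = parse_symbols(SAMPLE_HEADER)
for name, revision in tags.items():
    print(name, "->", revision)
```

In this made-up header two tags point at the same revision, and the file
has since moved on to 1.7. The symbolic name stays attached to the
revision it was given; it does not follow later commits.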

Post by Shivani Rao
b) If I want to track software evolution from one "tag" till "another"
can I use the order found out as stated above? Is the latest tag given
to a file, the truest way to find the latest revision (tag) it belongs
to?
c) Why is it that I see the tags at a different date when cvs2cl uses
the chrono option?

There is no metadata stored with a CVS tag. CVS does not record when
the tag was created or by whom. Therefore the only way for a tool to
associate dates with tags is by using heuristics based on the dates of
the revisions that were tagged and of other nearby revisions. Obviously
cvs2cl's heuristics do not give consistent results when run with
different options.
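
To illustrate why such heuristics can disagree, here is a minimal sketch.
The file names, dates and both functions are hypothetical, and this is
not a description of how cvs2cl or cvs2svn actually date tags. It only
shows that two plausible rules give very different answers for the same
tag.

```python
from datetime import datetime

# (file, revision) -> commit time. Made-up data for illustration only.
REVISION_DATES = {
    ("A.java", "1.3"): datetime(2011, 9, 30, 10, 0),
    ("B.java", "1.8"): datetime(2011, 10, 3, 19, 54),
    ("C.java", "1.1"): datetime(2002, 12, 16, 17, 0),
}

# tag -> the (file, revision) pairs it points at. Also made up.
TAGGED = {
    "V1_6_12": [("A.java", "1.3"), ("B.java", "1.8"), ("C.java", "1.1")],
}


def newest_tagged_revision(tag: str) -> datetime:
    """The tag cannot be older than the newest revision it points at."""
    return max(REVISION_DATES[key] for key in TAGGED[tag])


def oldest_tagged_revision(tag: str) -> datetime:
    """A rule that looks at the oldest tagged revision instead."""
    return min(REVISION_DATES[key] for key in TAGGED[tag])


print(newest_tagged_revision("V1_6_12"))
print(oldest_tagged_revision("V1_6_12"))
```

If a tag covers a file that has not changed since 2002, the two rules
differ by years, which may be the kind of discrepancy described in the
original question.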

For that matter, even inferring repository-wide commits from the
file-by-file data stored by CVS is nontrivial and tools vary widely in
their ability to do this well.
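
A common simplified approach, sketched below, groups per-file revisions
that share an author and log message and fall within a short time window
of each other. The FileRevision type, the five-minute window and the
sample data are assumptions for this example; real tools such as cvs2svn
handle many more corner cases than this.

```python
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple, Tuple


class FileRevision(NamedTuple):
    path: str
    revision: str
    author: str
    when: datetime
    message: str


def infer_commits(revisions: List[FileRevision],
                  window: timedelta = timedelta(minutes=5)) -> List[List[FileRevision]]:
    """Group file revisions into probable repository-wide commits."""
    commits: List[List[FileRevision]] = []
    latest: Dict[Tuple[str, str], List[FileRevision]] = {}
    for rev in sorted(revisions, key=lambda r: r.when):
        key = (rev.author, rev.message)
        current = latest.get(key)
        if (current is not None
                and rev.when - current[-1].when <= window
                and all(r.path != rev.path for r in current)):
            # Same author and message, close in time, file not yet seen.
            current.append(rev)
        else:
            current = [rev]
            commits.append(current)
            latest[key] = current
    return commits


# Hypothetical history: authors, files and messages are invented.
SAMPLE = [
    FileRevision("A.java", "1.2", "alice", datetime(2011, 10, 3, 10, 0, 0), "fix weaver"),
    FileRevision("B.java", "1.5", "alice", datetime(2011, 10, 3, 10, 0, 20), "fix weaver"),
    FileRevision("C.java", "1.9", "bob", datetime(2011, 10, 3, 10, 2, 0), "update docs"),
    FileRevision("A.java", "1.3", "alice", datetime(2011, 10, 4, 9, 0, 0), "fix weaver"),
]

commits = infer_commits(SAMPLE)
for commit in commits:
    print([f"{r.path} {r.revision}" for r in commit])
```

Here the first two revisions are merged into one commit, while the
revision made the next day with the same message starts a new one.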

cvs2svn [1], a tool for converting CVS repositories to
Subversion/git/bzr/hg, does a very careful job of inferring commits and
tag/branch dates from the CVS history. If you have access to the CVS
repository for aspectj, you might consider using cvs2svn to convert the
repository to git, then using git's excellent history-viewing tools to
analyze the project history. Alternatively you could use Subversion as
your vehicle, but IMHO Subversion's graphical tools are not as good as
git's. By this I do not mean that the project should change version
control systems [2]; I just mean that you can use a temporary copy in
another VCS for inspection.

[Disclaimer: I am the cvs2svn maintainer.]

Michael

[1] http://cvs2svn.tigris.org
[2] ...though they probably should :-)

--
Michael Haggerty
***@alum.mit.edu
http://softwareswirl.blogspot.com/

Arthur Barrett

2012-03-02 12:20:43 UTC

Michael & Shivani,

Post by Michael Haggerty
There is no metadata stored with a CVS tag. CVS does not record when
the tag was created or by whom.

Incorrect: this metadata is stored in CVSROOT/history. The history file
has problems, I'll admit, but they are fixed in CVSNT 2.x and partly
addressed in the latest CVS 1.x.

Post by Michael Haggerty
repository for aspectj, you might consider using cvs2svn to
convert the repository to git,

Or IBM's tools to convert your repository to ClearCase - it's widely
considered to be the best SCCM tool available (Gartner, Forrester, etc).

Or you could pick up a copy of one of the many CVS books and learn how
to track and manage project change using CVS. Yes tags are not the best
way to track change - they were not designed to be the be all and end
all of change management.

I think you are using bug IDs - which is great - but CVS 1.x (like SVN
and Git) has no native support for user defined changesets (you guessed
it, ClearCase and CVSNT 2.x both do). But if you are recording that
information, it's in the log (or changeset), and you now just need to
extract it into some meaningful reports so you can see which bugs were
fixed in which releases. You can write that yourself, or use tools that
someone else has written (eg: CVSNT 2.x).
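
If you write it yourself, the extraction step might look something like
the sketch below. It assumes that log messages mention bugs as
"bug NNNN" and that each log entry has already been assigned to a
release; both are assumptions, as are the bug numbers and messages in
the sample data. The release names are the tags quoted earlier in the
thread.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed convention: "bug 1234" or "Bug #1234" somewhere in the log message.
BUG_ID = re.compile(r"\bbug\s*#?\s*(\d+)", re.IGNORECASE)


def bugs_by_release(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[int]]:
    """Map each release to the sorted bug IDs mentioned in its log messages."""
    report = defaultdict(set)
    for release, message in entries:
        for bug in BUG_ID.findall(message):
            report[release].add(int(bug))
    return {release: sorted(ids) for release, ids in report.items()}


# Hypothetical (release, log message) pairs; bug numbers are invented.
SAMPLE_LOG = [
    ("V1_6_11", "bug 1001: avoid NPE in weaver"),
    ("V1_6_12", "Bug #1002 and bug 1003 - tests"),
    ("V1_6_12", "refactoring, no bug id"),
]

report = bugs_by_release(SAMPLE_LOG)
for release, ids in report.items():
    print(release, ids)
```

Log entries without a recognizable bug ID, such as feature work, simply
do not appear in the report and would need to be tracked separately.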

Disclaimer: I'm involved in the CVSNT project, CVSNT is a fork of CVS
1.x - it's not a rewrite, but it adds features that the developers of
CVS 1.x considered were not suitable for the CVS 1.x code.

Regards,

Arthur Barrett

Michael Haggerty

2012-03-02 13:17:30 UTC

If I understand correctly, Shivani wanted help investigating the history
of a project, not managing the project in the future. Since he
mentioned cvs2cl, I assume that as part of his analysis he wanted to
deduce (cross-file) changesets from the CVS history. I suggested that
he use cvs2git to collate the changes into changesets to facilitate his
analysis. I explicitly stated that I was *not* trying to convince him
to switch to using git for future development.

Post by Arthur Barrett

Post by Michael Haggerty
There is no metadata stored with a CVS tag. CVS does not record when
the tag was created or by whom.

Incorrect, this meta data is stored in the CVSROOT/history. History has
problems I'll admit, but fixed in CVSNT 2.x and partly addressed in the
latest CVS 1.x

Post by Michael Haggerty
repository for aspectj, you might consider using cvs2svn to
convert the repository to git,

Or IBM's tools to convert your repository to ClearCase - it's widely
considered to be the best SCCM tool available (Gartner, Forrester, etc).

I don't understand this comment. How would converting to ClearCase help
him analyze his repository's history?

Post by Arthur Barrett
Or you could pick up a copy of one of the many CVS books and learn how
to track and manage project change using CVS. Yes tags are not the best
way to track change - they were not designed to be the be all and end
all of change management.

This sounds like a suggestion that the project change its workflow in
the future. Again, I don't see how this will help analyze the
repository's existing history.

Post by Arthur Barrett
I think you are using bug id's - which is great - but CVS 1.x (nor SVN
nor Git) has native support for user defined changesets (you guessed it,
ClearCase and CVSNT 2.x both do).

I'm curious what "user defined changesets" are. Can you point me to
some docs?

Michael

--
Michael Haggerty
***@alum.mit.edu
http://softwareswirl.blogspot.com/

Arthur Barrett

2012-03-03 12:47:07 UTC

Michael,

Post by Michael Haggerty
I'm curious what "user defined changesets" are. Can you point me to
some docs?

http://march-hare.com/cvsnt/features/changesets/

And that sums up the different approaches you and I took. You assume
Shivani is interested in commits; I assume that that information is
useless for analysis of history or for future project management, and
that what's actually useful are the user defined change sets. Shivani
specifically mentioned these (bug numbers).

CVSNT has had user defined change sets since 2004, and SVN has been
talking about adding them for about the same length of time. User
defined changesets are most useful when combined with 'reserved
versioning' (not locking, reserving) which SVN also doesn't support -
but user defined change sets are useful anyway.

Our own implementations of change management for customers are largely
based on the research of Susan Dart whilst with the Configuration
Management Institute - in summary: change management is only effective
if it can ensure the integrity of all managed items at each development
stage and make the interrelationships clear.

These relationships are the business relationships: ie: this feature
request (document) led to this study (excel sheet) led to this
functional spec (doc) which led to these java changes, these table
changes and these test requirement changes (doc and script), the tables
and java changes were a part of release 1234, 1235, 1236, and 1237, and
release 1237 is the one that we promoted to UAT and was sent out on the
CD number 987654-321.

Now a month later when a developer is looking at some code and wondering
what it's all about, they can 'see' this is part of a larger changeset -
and click and see all the related components. Project managers can
promote (or branch/merge) using the changeset, and auditors can audit
the changeset.

An atomic changeset is what cvs2cl gathers - the things that were all
committed together. CVSNT identifies these with a unique atomic commit
id; SVN identifies them with a unique version number. Neither is
helpful for relating multiple changes over multiple commits by multiple
people over time.

*rant on*

I've noticed a personally disturbing trend for companies to underfund
open source software development and keep these critical features in
'wrapper' systems: collabnet put user defined changesets in their
collabnet tool, not in SVN core. Atlassian actively court CVS customers,
promising to deliver this feature by 'wrapping' CVS around Jira. All
that'd be fine commerce - if it worked better that way - but your
repository is the place this information belongs (integrated with your
defect tracker, sure). That's my opinion anyway.

It makes sense for commercial software vendors to put this in the
proprietary code, not in the open code, because it's what people like
the CMI say is the most important data. So if that data is in your
proprietary tool, then you'll switch versioning engines (CVS, SVN, Git)
but always use their closed tools to wrap them. It's simple vendor lock
in.

The CVSNT project gets a lot of flak from some quarters for 'extending'
the 'extendable' RCS format to include this information. We could have
done a collabnet or an atlassian and stored this in a proprietary
format, but we didn't - we extended RCS/CVS in open source code. We
didn't introduce any proprietary code at all until we were 'forced to'
in 2010 (6 years after introducing user defined change sets in pure open
source code) and it is entirely non-business code, leaving all this
stuff that is useful to your business open source.

*rant off*

Regards,

Arthur

Shivani Rao

2012-03-03 21:12:19 UTC

I want to track both: software evolution and the bugs.
I do not think that all changes are made for a particular bug; sometimes a
change is made for feature addition too.

So I wanted to track both.

--
Research Scholar,
School of Electrical and Computer Engineering
Purdue University
West Lafayette IN
web.ics.purdue.edu/~sgrao

Arthur Barrett

2012-03-04 06:37:11 UTC

Shivani,

Generally a feature is just a bug with the attribute 'enhancement', but
when dealing with historical data you have to go with the hand you are
dealt.

Regards,

Arthur
