Anomaly ~ G. Wade Johnson

April 03, 2005

Conversion to Subversion: Tags Revisited

In Conversion to Subversion: Tags, I explained how I cleaned up the tags portion of the CVS dump file in order to generate a subversion repository in the format I wanted. Lars Mentrup emailed to say that I had not been clear enough on a few points. After re-reading what I wrote, I have to agree.

In the original article, I referred to working on the initial copy from trunk. This was obviously missing a subtle point. All of my changes started with the dumpfile that had been filtered to contain the project and tag entries. Given this filtered file, I was talking about the initial copy of the tags. This is the equivalent of

  cp 'trunk' 'tags/FIRST_RELEASE'

Now, I really did not want the entire tree copied to the tags directory, and I wanted the tags directory structured differently. So, I changed the equivalent of the above command to the equivalent of

  cp 'project1/trunk' 'project1/tags/FIRST_RELEASE'
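In dump-file terms, that change touches two header lines on the node record that creates the tag. Here is a minimal sketch on a toy fragment, assuming the record uses the standard Node-path and Node-copyfrom-path headers. The revision number and file names are made up for illustration.

```shell
#!/bin/sh
# Toy node record in the style of a Subversion dump file; the revision
# number and the file names are hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/tag-copy.dump" <<'EOF'
Node-path: tags/FIRST_RELEASE
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 5
Node-copyfrom-path: trunk

EOF

# Anchor on whole header lines so nothing else in the file can match.
sed -e 's!^Node-path: tags/FIRST_RELEASE$!Node-path: project1/tags/FIRST_RELEASE!' \
    -e 's!^Node-copyfrom-path: trunk$!Node-copyfrom-path: project1/trunk!' \
    "$workdir/tag-copy.dump" > "$workdir/tag-copy.fixed.dump"

cat "$workdir/tag-copy.fixed.dump"
```

Only the source and destination of the copy change; the copyfrom revision stays as it was.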

Now the dumpfile has a large number of extraneous delete commands that are the equivalent of

   rm tags/FIRST_RELEASE/project2
   rm tags/FIRST_RELEASE/project3
   rm tags/FIRST_RELEASE/project4
   ...

Since I changed the initial copy, these are no longer needed. So, I just delete them from the dumpfile.
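Removing those records by hand gets tedious. Here is a rough awk sketch of the same cleanup, assuming each delete record is just a Node-path line followed by Node-action: delete and a blank line (check a real dump before trusting that). The file names and the pat and keep variables are placeholders.

```shell
#!/bin/sh
# Toy fragment: the tag copy followed by two of the extraneous deletes.
workdir=$(mktemp -d)
cat > "$workdir/tags.dump" <<'EOF'
Node-path: tags/FIRST_RELEASE
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 5
Node-copyfrom-path: trunk

Node-path: tags/FIRST_RELEASE/project2
Node-action: delete

Node-path: tags/FIRST_RELEASE/project3
Node-action: delete

EOF

# Drop "delete" records for direct children of the tag other than $keep.
awk -v pat='^Node-path: tags/FIRST_RELEASE/[^/]+$' -v keep='project1' '
  $0 ~ pat {
    if (held != "") print held          # flush an earlier, unresolved header
    n = split($0, parts, "/")
    if (parts[n] != keep) { held = $0; next }
  }
  held != "" {
    # The held Node-path belongs to a delete record: drop both lines.
    if ($0 == "Node-action: delete") { held = ""; skipping = 1; next }
    print held; held = ""               # some other action: keep the record
  }
  skipping && $0 == "" { next }         # swallow the blank line after a dropped record
  { skipping = 0; print }
  END { if (held != "") print held }
' "$workdir/tags.dump" > "$workdir/tags.pruned.dump"

cat "$workdir/tags.pruned.dump"
```

The script works line by line rather than in awk's paragraph mode, so blank lines inside file bodies elsewhere in a dump are passed through untouched.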

The result of all of these changes is a single dumpfile with the 'project1' project information and the 'FIRST_RELEASE' tag information. I can use svnadmin load to load the project and its tags into the repository.

Hopefully, this will clarify what I meant. Thanks again to Lars for taking the time to let me know where I was unclear.

Posted by GWade at 08:30 PM.

February 13, 2005

Conversion to Subversion: Tags

In the first article of this series, Conversion to Subversion, Part I, I described the problem I found in trying to convert a project from my CVS repository to Subversion. In my last article, Conversion to Subversion: The Project's Trunk, I described the solution that I used to convert a basic project with no tags or branches. This time I'll discuss converting the tags on a project from CVS to Subversion.

As stated before, the tag directory structure generated by cvs2svn was not what I wanted. Assuming a CVS module of project1 and a tag of FIRST_RELEASE, the dump file would have a directory structure of tags/FIRST_RELEASE/project1. I wanted project1/tags/FIRST_RELEASE.

At first, I thought I could use the same approach that I had used for the main project. I would


  1. Filter the dump file to keep only the project and tags I care about.

  2. Copy the project and the tags over.

  3. Update the project path as before.

  4. Change the tag directory from tags/FIRST_RELEASE/project1 to project1/tags/FIRST_RELEASE.

Unfortunately, searching through the dump file did not turn up a tags/FIRST_RELEASE/project1. So I began looking at the dump file a little harder. The result was a little confusing. Apparently, cvs2svn treated each tag as if the entire repository had been tagged (everything in trunk was copied to tags/FIRST_RELEASE). Then, everything except the project1 directory was deleted. This generated a large number of extraneous revisions that do not accurately reflect what happened in the repository. The end result would have been correct in the repository with the old directory structure; but it wouldn't work with the new structure.
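One way to see this pattern is to print only the revision and node headers from the dump. This is a sketch on a toy fragment, assuming the usual dump header names. The revision numbers are invented.

```shell
#!/bin/sh
# Toy fragment: one revision that copies trunk to the tag, then prunes it.
workdir=$(mktemp -d)
cat > "$workdir/fragment.dump" <<'EOF'
Revision-number: 12
Prop-content-length: 10
Content-length: 10

PROPS-END

Node-path: tags/FIRST_RELEASE
Node-kind: dir
Node-action: add
Node-copyfrom-rev: 11
Node-copyfrom-path: trunk

Node-path: tags/FIRST_RELEASE/project2
Node-action: delete

EOF

# -a makes grep treat the dump as text even if it holds binary file bodies.
summary=$(grep -a -E '^(Revision-number|Node-path|Node-action|Node-copyfrom-path): ' \
  "$workdir/fragment.dump")
printf '%s\n' "$summary"
```

Read top to bottom, the summary shows the whole-trunk copy followed by the deletes that trim it back down.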

I modified the initial copy from trunk to copy from project1/trunk to project1/tags/FIRST_RELEASE in the dump file. Then, I deleted all of the extraneous delete directory commands in the dump file.

The modified dump file would build the project with the tags I required. Just as importantly, the extraneous manipulation used to clean up the initial strange tagging request has been removed. This also solves the problem that would have been caused by attempting to change directories that had been filtered out of the dump.

I incorporated this change into my script that fixes up the dump file before I do the load. It seems to be working quite well. The new projects I've added with these changes appear to be intact with the appropriate tags in place. If I had any branches I wanted to keep, I could apply an equivalent approach to fix up the branches before loading.

Update

This entry has been updated a bit in the entry Conversion to Subversion: Tags Revisited to answer questions I've received by email.

Posted by GWade at 09:45 PM.

January 29, 2005

Conversion to Subversion: The Project's Trunk

In Conversion to Subversion, Part I, I described the problems I found when I began converting my CVS repository to Subversion. In this article, I describe the work and surprises that came from the first project migration.

My first idea was to build the repository from the dump file and then fix the result using moves inside the repository. Unfortunately, that would have left the previous history in the wrong places in the hierarchy. Although this would not have prevented me from doing further development, looking at previous versions would be messier than I'd like.

So, obviously, I needed a way to make the repository right when projects were added for the first time. The section in Practical Subversion on importing from other systems suggested that the dump file format was relatively easy to modify. Reviewing the relevant sections of Version Control with Subversion confirmed this information. If I could make the required changes to the dump file, then I could create a repository laid out the way I wanted it.

The First Project Migration

The first step in this process was to dump a particular project with its related tags and branches. Examining the projects I wanted to move gave me one that had neither tags nor branches. This would be about as simple a case as I could start with. To hide irrelevant details, let's call this project smallproject.

As I said in the previous article, the path for this project would have the form: trunk/Repository/smallproject. To extract this project from the main dump file named cvs2svn-dump, I used the following command:


svndumpfilter include trunk/Repository/smallproject \
--drop-empty-revs --renumber-revs \
< cvs2svn-dump > smallproject.dump

The --drop-empty-revs option removes revisions that are left empty by the filtering because they have no relation to the project I want. The --renumber-revs option renumbers the remaining revisions. I found it more convenient to have contiguous revision numbers when examining the file.

Since I needed to do a relatively simple fixup to the new dump to change the path, I used a Perl one-liner to make the change:


perl -pe 's!trunk/Repository/smallproject!smallproject/trunk!g;' \
smallproject.dump > smallproject2.dump

This just uses Perl's substitute operator (with '!' as a delimiter) to change the old path into the new path everywhere in the file. I put the output in a different file so I could compare them and make certain that there were no unexpected differences. After I verified that the paths in the file looked correct, I was ready to go.
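A substitution over the whole file also rewrites the old path wherever it turns up in a log message or file body. The dump records the length of each of those, so a changed body would no longer match its header. Here is a more cautious sketch, using sed rather than Perl, that only touches the Node-path and Node-copyfrom-path header lines and then lists any other differences. The toy fragment is abbreviated and its file names are made up.

```shell
#!/bin/sh
# Abbreviated toy fragment: a directory node, then a file whose body happens
# to mention the old path.
workdir=$(mktemp -d)
in="$workdir/smallproject.dump"
out="$workdir/smallproject2.dump"
cat > "$in" <<'EOF'
Node-path: trunk/Repository/smallproject
Node-kind: dir
Node-action: add

Node-path: trunk/Repository/smallproject/README
Node-kind: file
Node-action: add
Text-content-length: 47
Content-length: 47

Checked out from trunk/Repository/smallproject

EOF

# Rewrite the path only on the two header types that carry repository paths.
sed -E 's!^(Node-path|Node-copyfrom-path): trunk/Repository/smallproject(/|$)!\1: smallproject/trunk\2!' \
  "$in" > "$out"

# Any changed line that is not one of those headers is unexpected.
unexpected=$(diff "$in" "$out" | grep '^[<>]' \
  | grep -Ev '^[<>] (Node-path|Node-copyfrom-path): ' || true)
if [ -n "$unexpected" ]; then
  printf 'unexpected differences:\n%s\n' "$unexpected"
fi
```

With this fragment the README body keeps the old path text, so its recorded length is still correct.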

One of the reasons I had picked this project was that I had decided that I wanted it in a different repository than the source code I had from my earlier experimentation. So I created a new repository using the command:


svnadmin create /home/svn/newrepos

where newrepos was actually the real name of this repository. But, we'll stick with this pseudonym for now. Then, I loaded the project with the following command:


svnadmin load /home/svn/newrepos < smallproject2.dump

This promptly failed with a message that smallproject/trunk was not found. Of course it wasn't found; I was trying to create it.

After a bit more experimentation, I realized that the load was failing because the path /smallproject did not exist in the repository yet, so load could not create a subdirectory. So I recreated the repository and prepared to begin again.

With a clean repository, I created the beginning of the project with the following command:


svn mkdir file:///home/svn/newrepos/smallproject \
file:///home/svn/newrepos/smallproject/tags \
file:///home/svn/newrepos/smallproject/branches \
-m "Migrate smallproject project."

I left off the creation of the trunk subdirectory; otherwise the load would still have failed when it attempted to create that directory. Then, I reran the load successfully. I used the svn tools to check out this project from the new repository and verified that everything appeared to be as I expected.

The first actual migration worked. To simplify my work for later steps, I converted several of the command lines listed above into shell scripts to make running them a little less error prone. One other piece of insurance I started was to do a dump of any repository right before adding a new project to it. This gave me an easy way to recreate the previous state if/when something went wrong.
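A wrapper along these lines is one way to make the dump-before-load habit automatic. It is only a sketch: the function name, its arguments, and the backup file naming are my own placeholders, and it relies on nothing beyond svnadmin dump and svnadmin load.

```shell
#!/bin/sh
# Hypothetical helper: dump the repository first, then load the new project.
# Usage: load_with_backup REPOS_PATH PROJECT_DUMP BACKUP_DIR
load_with_backup() {
  repo=$1
  project_dump=$2
  backup_dir=$3
  backup="$backup_dir/$(basename "$repo")-$(date +%Y%m%d%H%M%S).dump"

  # Stop before touching the repository if the safety dump fails.
  svnadmin dump "$repo" > "$backup" || return 1
  echo "backup written to $backup" >&2

  svnadmin load "$repo" < "$project_dump" || return 1
}
```

It would be called as, for example, load_with_backup /home/svn/newrepos smallproject2.dump /home/svn/backups, where the backups directory is an assumed location that already exists.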

Next time, I'll explain how I dealt with a project with tags.

Update:

Thanks to Lars Mentrup for catching my cvsadmin/svnadmin goof. The text has been corrected.

Posted by GWade at 10:13 PM.

January 25, 2005

Conversion to Subversion, Part I

For about a year now, I've been playing with Subversion on small projects. In order to protect my main repository in CVS from my experiments, I just created new projects under Subversion and worked with them there. All of my real projects continued under CVS control. This way if my experiments with Subversion were a disaster, I would only lose revisions from the new work.

Now, I've finally reached the point where I want to move some of my old projects over to Subversion. I could just add all of the projects in their current state, but I do not want to lose the history. Since this turned out not to be quite as easy as I expected, I figured it might be useful to document the process I am going through in case anyone wants to learn from my mistakes.<grin/>

My CVS Repository

To understand the examples, you will need a little background on the CVS repository that I am working from. This repository holds about thirty projects that I have worked on over the last few years. Some of the projects are big, some are small. Some are currently undergoing work, some are effectively dead. Some of these projects date back over ten years, some are relatively new.

The repository lives on a Linux box in the directory /home/cvs. The directory where the actual repository is stored is called Repository. I started keeping my repository under /home when I started keeping my /home on a separate filesystem. This makes backups and upgrades easier. Moreover, some of the items in the repository could be considered private, so putting the repository with the home directories reminds me to treat it with the same care as I treat my home directory.

The Goal

My goal is to move my current projects to Subversion repositories. The move must also meet the following additional goals:

  • All history must be retained.
  • All tags must be retained.
  • Branches may be retained.
  • Directory structure matches recommended practice for Subversion.

Although I consider tags to be important, I have no work currently going on in any branches and all code from any branches has been merged into the trunk. I would prefer not to lose those branches, but it's not a requirement like the others. Additionally, I am experimenting with multiple Subversion repositories, so I may want to separate some projects into different repositories.

cvs2svn

My first idea was to just use the cvs2svn script that comes with Subversion to convert directly. While examining the program, I found that it has an option to just make a dump file without changing the Subversion repository. This would allow me to do some poking around before actually moving the data to the new repository.

From reading Practical Subversion recently, I was aware that the installation should include a program called svndumpfilter that allows extracting parts of a dump file. This could allow me to move individual projects instead of moving everything at once.

I needed to look at the dump file to determine the paths needed for svndumpfilter to extract my projects. This was when I found my first surprise. The structure of the revision tree in the dump file did not match the structure of the repository I wanted to create. As an example, assume that I have a module in the CVS repository named project1. That project has a tag named RELEASE1. Finally, the project has a branch named major_rewrite. The directory structure from the dump file for this configuration would be:

   /trunk/Repository/project1
   /tags/RELEASE1/Repository/project1
   /branches/major_rewrite/Repository/project1

Unfortunately, this does not match the recommendations from any of the articles or books I have read on Subversion. Based on those recommendations, the structure of the Subversion repository should be more like:

   /project1
       /trunk
       /tags/RELEASE1
       /branches/major_rewrite

with the history stored in the /project1/trunk directory. In the time I've been working with Subversion, I have become accustomed to this structure and wanted to continue to use it.
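The mapping between the two layouts is mechanical. A small sketch of it as a shell function follows. The remap_path name is made up, and it assumes every cvs2svn path has the Repository component shown above.

```shell
#!/bin/sh
# Hypothetical helper: map a cvs2svn-style path to the per-project layout.
remap_path() {
  printf '%s\n' "$1" | sed -E \
    -e 's!^/trunk/Repository/([^/]+)(/.*)?$!/\1/trunk\2!' \
    -e 's!^/(tags|branches)/([^/]+)/Repository/([^/]+)(/.*)?$!/\3/\1/\2\4!'
}

remap_path /trunk/Repository/project1
remap_path /tags/RELEASE1/Repository/project1
remap_path /branches/major_rewrite/Repository/project1
```

The three calls print the trunk, tag, and branch paths in the second layout. The real conversion has to apply this mapping inside the dump file rather than to bare paths.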

The second surprise came when I examined the tags and branches. Both branches and tags are made in a strange way in the dump file. The entire repository is copied for each tag (or branch), then any modules that are not supposed to be part of that tag (or branch) are deleted separately. This means that there will be a series of revisions in the repository with tags/branches applied to projects that were never part of those tags/branches. None of this is visible in the final version of the repository, but it seems a bit inelegant.

In summary, this approach would result in all of the history from the CVS repository being copied to a new Subversion repository, but there are a few problems.

  • The new repository structure is not ideal.
  • Extraneous revisions with inaccurate information in tags and branches.
  • All of the projects in one repository.

None of these is a killer problem. I would just like to set up the new repositories in a cleaner way. Come back next time to see how I fix it.

Posted by GWade at 08:29 PM.

August 08, 2004

Subversion

If you haven't tried Subversion yet, you really owe it to yourself to give it a try. I've used CVS for over a decade now and I've been trying Subversion for a little less than a year. I haven't yet moved most of my home projects to Subversion, but it's looking more probable every day.

The ability to rename and reorganize your files and directories without losing history is wonderful. The separation of status from update is great. I'm slowly coming to appreciate the properties system. It's really great to have a mime-type associated with each file and all the potential that goes along with that.

If you want to get started with Subversion, you can download a version at the URL above. You'll also want to read Version Control with Subversion, which is available on-line or in hard copy.

Posted by GWade at 11:04 PM.

July 06, 2004

Review of Compiler Design in C

Compiler Design in C
Allen I. Holub
Prentice Hall, 1990

I decided to take a break from the relatively new books I've been reviewing and hit a real classic.

Over a decade ago, I saw Compiler Design in C when I was interested in little languages. A quick look through the book convinced me that it might be worth the price. I am glad I took the chance. This book describes the whole process of compiling from a programmer's point of view. It is light on theory and heavy on demonstration. The book gave an address where you could order the source code. (This was pre-Web.) All of the source was in the book and could be typed in if you had more time than money.

Holub does a wonderful job of explaining and demonstrating how a compiler works. He also implements alternate versions of the classic tools lex and yacc with different tradeoffs and characteristics. This contrast allows you to really begin to understand how these tools work and how much help they supply.

The coolest part for me was the Visible Parser mode. Compilers built with this mode displayed a multi-pane user interface that allowed you to watch a parse as it happened. This mode serves as an interactive debugger for understanding what your parser is doing. This quickly made me move from vaguely knowing how a parser works to really understanding the process.

Many years later, I took a basic compilers course in computer science and the theory connected quite well with what I learned from this book. Although the Dragon Book covers the theory quite well, I wouldn't consider it as fun to read. More importantly, nothing in the class I took was nearly as effective as the Visible Parser in helping me to understand the rules and conflicts that could arise.

Although this book is quite old, I would recommend it very highly for anyone who wants to understand how parsers work, in general. Even if you've read the Dragon Book cover to cover and can build FAs in your sleep, this book will probably still surprise you with some fundamentally useful information.

The book appears to be out of print, but there are still copies lurking around. If you stumble across one, grab it.

Posted by GWade at 10:29 PM.

January 16, 2004

Unit tests that should fail

I was doing a little research on the Java JUnit test framework and ran across the article The Third State of your Binary JUnit Tests.

The author points out that in many test sets there are ignored tests as well as the passing and failing tests. As the author says, you may want to ignore tests that show bugs that you can't fix at this time. He makes a pretty good case for this concept.

The Perl Test::More framework takes a more flexible approach. In this framework you can also have skipped tests and todo tests in addition to tests that actually need to pass. These two different types of tests have very different meanings.

Skipped tests are tests that should not be run for some reason. Many times tests will be skipped that don't apply to a particular platform, or rely on an optional module for functionality. This allows the tests to be run if the conditions are right, but skipped if they would just generate spurious test failures.

Todo tests have a very different meaning. These tests describe the way functionality should work, even if it doesn't at this time. The test is still executed, but if it fails, it is not treated as a failure. More interestingly, if a todo test passes, it is flagged as an unexpected success, because the test was not expected to pass. This allows bugs and unfinished features to be tracked in the test suite with a reminder to update the tests when they are completed.

Unlike the idea in the referenced article, these two separate mechanisms don't ignore tests that cannot or should not pass. Instead, we can document two different types of non-passing tests and still monitor them for changes.

Posted by GWade at 12:58 PM.