Programmer Musings: December 2005 Archives


December 28, 2005

Diff Debugging

Every now and then, I manage to pull myself away from reading and reviewing computer books (my hobby for the last year) or programming (my hobby for ... never mind), and spend a little time on various weblogs. It's important to see what the names in the field are thinking and talking about.

During one of these late night blog-reading sessions, I ran across MF Bliki: DiffDebugging. This turned out to be a wonderful find, not because the technique was new to me (I've been using it almost as long as I've used version control), but because he gave the technique a useful name.

Fowler explains it much better than I do, of course, but the technique is great for finding regressions in a code base. If you detect a bug in some code you are working on that you know worked before, use your version control software (VCS) to find a version that did not manifest the bug. Then, try to find the last version that did not have the bug and the first version that did. (These should be adjacent versions.) By only looking at what changed between these two versions, it may be possible to find the cause of the bug much more quickly.

I have often used the technique with dates instead of individual commits. This is the easier approach if your VCS does not support atomic commits (CVS, RCS, etc.). If your VCS does support atomic commits (Subversion, etc.), you can test individual commits, which may simplify focusing in on the change.

If you have good unit tests for your software, diff debugging is even faster. You can just run the unit test that shows the bug against each version you check out, which makes success and failure much easier to detect. Formal unit tests are not required, though: I was using the technique long before I became test-infected. Automated tests just make it much easier.
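A minimal sketch of what "run the test and see if it passes" can look like when automated. Everything here is an assumption for illustration: the helper name check_passes is hypothetical, the language (Python) is my choice, and the two commands are stand-ins for a real test runner. Checking out each version is left to whatever VCS you use.

```python
import subprocess
import sys
from typing import Sequence


def check_passes(command: Sequence[str], workdir: str = ".") -> bool:
    """Run a test command in workdir and report whether it exited with status 0."""
    result = subprocess.run(
        command,
        cwd=workdir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Stand-in commands (assumptions): a real run would invoke the project's own
# test runner inside the directory where the version under test was checked out.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]
print(check_passes(passing), check_passes(failing))
```

Reducing each version to a plain yes-or-no answer is the point: once the check is a function, comparing versions no longer depends on someone eyeballing the output.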

One important point that Fowler does not make is that a binary search technique can be pretty useful if the range of versions is large. If you know the code worked a month ago and it doesn't work now, a good approach would be:

Check out the code from a month ago and verify that it actually worked.
Check out the code from two weeks ago and see if it worked.
- If it works, check out code from a week later to see if it works.
- If it doesn't, check out code from a week earlier to see if it works.

Repeat this technique with shorter time scales until you find the offending version. Of course, when you are reduced to a handful of versions, it is easier to just step through them (as in Fowler's example). I have often needed to use the binary search technique in cases where a standard build and smoke-test procedure was not in place.
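The halving procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anything from Fowler's entry: the function name find_regression and the works callback are hypothetical, the versions are assumed to be ordered oldest to newest, and the bug is assumed to stay present in every version after the one that introduced it.

```python
from typing import Callable, Sequence, Tuple, TypeVar

V = TypeVar("V")


def find_regression(versions: Sequence[V], works: Callable[[V], bool]) -> Tuple[V, V]:
    """Return (last_good, first_bad) from versions ordered oldest to newest."""
    if len(versions) < 2:
        raise ValueError("need at least two versions to compare")
    good, bad = 0, len(versions) - 1
    # Verify both ends first: the old version must really work,
    # and the new one must really show the bug.
    if not works(versions[good]):
        raise ValueError("oldest version already fails; start from an earlier one")
    if works(versions[bad]):
        raise ValueError("newest version works; there is no regression to find")
    # Halve the range until the good and bad versions are adjacent.
    while bad - good > 1:
        mid = (good + bad) // 2
        if works(versions[mid]):
            good = mid
        else:
            bad = mid
    return versions[good], versions[bad]


# Hypothetical example: one version per day for 30 days,
# with the bug introduced on day 18.
days = list(range(1, 31))
print(find_regression(days, lambda day: day < 18))  # prints (17, 18)
```

The works callback is where the real effort goes: it has to check out the given version and decide whether the bug is present, whether by running a test or by hand. For 30 versions this sketch makes 7 checks instead of up to 30.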

The most interesting part of Fowler's blog entry was the fact that he created a good name for this technique. If you don't have a name for something, it is really hard to talk about. It is also not as obvious a tool in your toolkit. By giving the technique a name, Fowler has improved my programming skills by turning an ad hoc technique I have applied in certain situations into a known tool in my programming toolkit. The technique has not changed, but the name makes it a more reusable tool.

This is an interesting fact about our field: concepts are our best tools. Naming a concept gives you the power to use it. Many of the most important breakthroughs in the past few decades have not been new techniques or algorithms, but the naming of techniques so that we can discuss them and reuse them.

Posted by GWade at 03:47 PM.

December 20, 2005

Review of Secure Coding in C and C++

Secure Coding in C and C++
Robert C. Seacord
Addison-Wesley, 2006

One very real problem in software today is the rise in security exploits of one kind or another. Gone are the days when we can just assume that no user of our software will try to break it, or use the software to compromise an entire system. The more immediate problem is that most of us have no training in preventing security vulnerabilities in our code.

This book does a fairly good job of covering a number of sources of security problems and explaining how they can be exploited. Using this information and the recommended practices in the book, you can make your code much more secure. The book has a chapter devoted to each of several vulnerabilities. The author examines the reason for the problem, how it is likely to manifest, and the kinds of exploits that can be applied. He then makes suggestions for tools and techniques to use to reduce these problems.

The book covers topics such as strings, integers, dynamic memory, and formatted I/O, as well as others. In each case, the book carefully explains where the potential problems lie. In some cases, the author shows actual examples from code that was in live use. Although the delivery can be a bit dry at times, the material itself is sometimes scary in its implications.

Possibly the most important chapter in the book is the final one, Recommended Practices. This chapter covers more than just techniques for solving a particular kind of pointer bug. This is the chapter that covers overall strategies, such as threat modeling, data sanitization, and defense in depth. If you have any background in computer security, these concepts will probably be familiar. If not, they are the most important things for you to learn from the entire book.

This book should be a requirement for anyone who develops software that will be used by more than just his co-workers. This includes software available over the web.

Posted by GWade at 09:56 PM.

December 08, 2005

The IP Goose

One of the problems with being a software developer these days is company Intellectual Property agreements. I understand that companies want to protect the time, money, and expertise that they have invested. But some of the agreements I have seen go way past reasonable concern. Some IP agreements take the position that any programming a developer does should belong to the company. This includes software that was developed off company premises, without company resources, and in fields that have nothing to do with the company's business. On the face of it, this seems a bit overboard.

Just to be clear, I do not suggest that companies give up all IP agreements. A company needs to be able to protect the work that they have commissioned. They also need some form of protection against an unscrupulous individual taking their proprietary knowledge and going out to compete with them. On the other hand, taking too hard-line a stance can have consequences. I'd just like to suggest that companies consider the tradeoffs involved.

Consequences

So what are some of the consequences of a too-strict IP policy? The simple answer is that these agreements discourage programmers from programming in their free time. These kinds of agreements also make it impossible for a programmer to work on any open source projects. These projects normally require that any code a programmer submits to the project be unencumbered by any IP agreement. If everything a programmer touches belongs to the company, the programmer cannot work on open source projects.

Many management or legal types might take the position that protecting company assets is more important than a programmer's ability to program in his or her free time. Unfortunately, this is a very short-sighted view. This viewpoint ignores some important facts.

Programmers improve through practice.
Open source projects are a great way to improve skills.
Many programmers have (programming) interests outside of work.

Practice

Although many non-programmers don't realize this, a programmer requires specialized knowledge and skills to program well. These skills can only be improved through practice. Some people are born with more inherent programming aptitude than others, but all programmers need practice to enhance and maintain their skills. Really good programmers often find themselves writing code even when they don't have to. Interestingly, it seems to be less important what you are working on, than that you are continuing to use those skills.

Programming on different projects with different groups of people is a great way to extend the programmer's skills and knowledge. Different experiences tend to make the programmer more flexible in his approaches and more aware of best practices. Open source projects are a great way to see how other people develop software and to be exposed to other methodologies and approaches. The more experience a programmer develops, the better his or her skill set.

Projects

One thing that would probably amaze the legal and management types that craft those IP agreements is that many of their programmers will tend to work on completely different kinds of programming outside of work. This makes it less likely that the software the programmer develops will have any relation to the company's business. There are some exceptions to this, but many of us program differently for fun than we do for work.

Although some programmers will work on work-related code in their free time, the work generated could be protected by much narrower agreements than the ones many companies generally use. Moreover, in many cases the programmer in question intends to provide the completed product to his employer anyway, but needs to work on it outside work hours for political, personal, or (in rare cases) technical reasons.

Word of Mouth

One important point that many companies forget is that programmers do discuss work environments with each other. Any company that employs tactics that make our work less fun or interesting will find that the word will get out, making it more difficult to find and hire the best. Conversely, a company that recognizes the tradeoffs involved in the IP agreement may find word of mouth makes it easier for them to hire better programmers.

The Goose

I remember a fairy tale from when I was young about a goose that laid golden eggs. It sometimes seems that many people have never heard this story. The important part of the story is that the farmer lost the steady flow of golden eggs by trying to get everything at once. In a way, this is similar to the strict IP policy. By trying to capture all of the output of each programmer, the policies may prevent a steady stream of innovations that come from better skills and wider knowledge.

Some very big companies have begun to realize the benefit of this approach. In recent years, IBM has been providing support, cash, and personnel to open source projects. This has allowed them to reinvent the company and has helped to make IBM a place where good programmers would want to work. I don't know what their IP agreement looks like, but I would guess that they have put more thought into it than many companies.

Posted by GWade at 07:09 PM.