Programmer Musings: March 2004 Archives

March 30, 2004

Version Control and Test Driven Development

Almost fifteen years ago, I stumbled across the concept of version control in the form of RCS. I wasn't real sure that this tool was going to be useful for me. After all, it required more work that did not contribute directly to producing code. In order to try it out, I began using RCS to keep up with the source on a few projects I was working on at home. In a few days, I was hooked. I immediately began pushing version control at work despite opposition from the other developer (my boss).

The key to version control for me was a change in my work habits that version control allowed. I have recently come to the realization that Test Driven Development has had a similar effect on my work habits.

The realization that drove my use of version control was actually pretty pedestrian. I could experiment with new ideas or designs with almost no risk. If I wanted to see if a new algorithm would work better, I would try it out. If the new idea didn't work out, I could always go back to an earlier version of the code. If I wanted to try out new data structures, I could do the same thing. This was incredibly liberating! I could experiment without worrying about messing up code that I had slaved over.

Before I had used RCS, I could do something similar by backing up the old code before trying out my idea. Then, if things didn't work out, I could throw everything away and go back to the old code. But with RCS, I could do backups on small changes and move forward and back through changes with relatively minor effort. Later, I found CVS and my reliance on version control became even stronger.

For the last couple of years, I've been experimenting with unit-test frameworks and Test Driven Development. A lot of the time, the tests really seem to put the fun back in programming. I seem to be moving almost as fast as I did without the tests, but my designs and code appear to be cleaner and more solid.

Although several people have said it different ways before, this is the first time I realized that TDD is giving me the same benefit I first recognized with version control. I can experiment (or refactor) again with almost no cost. If my unit tests are good enough, I can completely refactor my code or re-implement any part of the algorithm with confidence. If I broke anything, the tests will show me. If the tests don't give me the confidence to refactor, my tests aren't complete enough.
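
As a minimal sketch of that safety net (the function names are hypothetical, not from any real project), here is one small function in two forms, an original and a refactored version, with a single set of checks that either one must pass:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical function under test: the original, loop-based version.
int total_v1(const std::vector<int>& values) {
    int sum = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        sum += values[i];
    }
    return sum;
}

// Refactored version: same contract, different implementation.
int total_v2(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// The same checks run against whichever implementation is passed in.
// If a refactoring changes behavior, this returns false.
template <typename Fn>
bool passes_total_tests(Fn total) {
    return total(std::vector<int>{}) == 0
        && total(std::vector<int>{5}) == 5
        && total(std::vector<int>{1, 2, 3}) == 6
        && total(std::vector<int>{-4, 4}) == 0;
}
```

Calling passes_total_tests(total_v1) before the change and passes_total_tests(total_v2) after it is the whole trick. Four checks are obviously not a complete test suite, but the shape is the same at any size.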

This has given me more incentive to work on the TDD way of doing things. It took a little while for me to reach the point of using version control for every project no matter how small. I guess it will probably take a while for me to get to that point with testing as well.

Posted by GWade at 09:16 PM.

March 24, 2004

Review of Randomness

Randomness
Deborah J. Bennett
Harvard University Press, 1998

This book covers the subjects of randomness, probability, and statistics better than any other book for the layman I've ever seen. As a programmer, most of the books I've read on these topics are either heavily into the math and theory of the subject or focused on programming techniques. This book covers the subject matter in a way that is much more accessible.

The beginning of the book covers randomness and chance throughout our history. The author covers early randomizers like dice and lots. She also talks interestingly about how religions affected the perceptions of randomness and chance. Based on this background, she explains why people in general have extremely bad intuition regarding these subjects. Along the way we meet random sampling and the effects of random errors on the interpretation of tests. Late in the book, we finally come to the subject of random number sequences and computer generation of such sequences.

This book is great for anyone who wants a better grasp on the concepts of randomness, probability, and statistics. It is also a relatively easy read.

Posted by GWade at 10:36 PM.

March 20, 2004

The Paradigm Paradox

I've already written some of my ideas about paradigms, but the most important point still remains. I maintain that the real benefit of a paradigm is not the way of thinking that it defines, but the contrast between this way of thinking and others. Almost every programmer I've ever met who knows only one paradigm seems to be hampered by an inability to look at problems from more than one viewpoint. I believe that much of the benefit that is derived from changing to the newest paradigm comes from the change itself. This would also help to explain why the decision to train people only in this newer paradigm does not automatically create better programmers.

Most of the entry-level programmers that I've worked with have been about the same level of effectiveness no matter what paradigm they were trained in. This has been true of structured programmers, modular programmers, and object oriented programmers. This effect also applies to the languages they were trained in. I haven't seen any improvement in new Java-trained programmers over the older C-trained programmers or the Pascal-trained programmers before them.

Each generation of programmers seems to make a lot of similar kinds of mistakes. Although the syntax of a new language may prevent you from making certain particular mistakes, there are certain classes of mistakes that remain. All of these entry-level programmers seem to start with a misunderstanding of complexity and an inability to look at a problem in an unusual way. They immediately begin to apply the paradigm without first understanding the problem. They also tend to solve problems in older code by adding more code, even when that is usually not the best solution.

No new paradigm that I've seen has solved these fundamental problems. I think that the effort to learn and work with more than one paradigm (or language) forces you to realize some of the differences between a good solution and an artifact of the paradigm (or language). For example, one of the most basic attributes of a good design is simplicity. This has been true in each of the programming paradigms that I have used. It is also true in mathematics, logic, and science.

Given two designs that meet all of the requirements, the simpler one is usually the best. One of the most important aspects of the OO paradigm is its ability to deal with complexity. A side effect of using OO is that many people add more complexity, because they can handle it. They are much more likely to build a framework of supporting classes, even for problems that don't require them. In this way the OO paradigm can lead an inexperienced programmer to build a worse design through the addition of unnecessary complexity.

If you have experience in more than one paradigm, you are more likely to recognize this unnecessary complexity than if you've just done OO programming. Why? Because if you could solve the problem more simply without the object approach, you would see that solution as well. The contrast between these two possible solutions would cause you to look for ways to get the benefits of OO but the simplicity of the other solution. The result is often a simpler design.

On the other hand, some problems actually only have complex solutions. When comparing two or more approaches in this case, the complexity is more apparent because you can see the difficulty with using the non-OO approach. Nevertheless, the second viewpoint can provide insight into other difficulties that might be masked by the OO design. In this case, the second viewpoint does not replace the OO design, it enhances it.

So, to reiterate, the most important feature of a new programming paradigm is the contrast between that paradigm and the ones that preceded it.

Posted by GWade at 10:15 PM.

March 10, 2004

Bad Names and Good Bad Names

And just to prove that some ideas appear in many places all at once, Andy Lester explores naming again in O'Reilly Network: The world's two worst variable names [Mar. 07, 2004]. Apparently, Lester is reviewing the second edition of Code Complete, one of my favorite books on programming, and says there is "an entire chapter devoted to good variable naming practices."

The comments on Lester's weblog are a spirited little discussion on variable naming. But I have my own thoughts, of course.

A large portion of programming is actually thinking. We spend time thinking about a problem and convert our thoughts and understanding into code. As such, names are critical to clear thought and code. One comment on Andy Lester's weblog suggested that the time spent coming up with good names was better spent writing code. This is an attitude I have heard many times in the past and have worked hard to remove when teaching.

Good names help when reading code. More importantly, good names help when thinking about the code, both during maintenance and during development. Humans are only capable of keeping a small number of distinct concepts in mind at once (7 +/- 2 according to George Miller). A good name can help by abstracting away a large amount of unnecessary detail. This allows you to focus on the important part of the code.

I try hard to always use good names for variables, functions, classes, programs, and any other artifact of my programming. I have learned over time that good names make my programming flow more easily. When I run into a method or variable that I can't name, I know that I don't truly understand that part. Many times I will give it a placeholder name until I understand it well enough that a name presents itself. At that time I try to always go back and rename the item to show my new understanding. To me, good naming is part of good design.

In contrast, if you use good names in general, there are places where bad names can be good ones. For example, i, j, and k are lousy names for variables. But, as loop variables, they become useful because we are used to them. Since the early days of FORTRAN these were the canonical loop control variables. Most people would recognize that without thinking about it.

One of my favorite bad variables is dummy. I only use it in one circumstance: when I need a placeholder in an argument list that will receive a value that I will not use. Anybody looking at this code should understand the meaning pretty quickly. For example, in the C++ code below, dummy is used to receive a delimiter that I don't care about.

config >> row >> dummy >> column;
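
For context, here is a fuller sketch of how that one line might sit in a program. The Position struct, the function names, and the "12,7" input format are all assumptions made for illustration; the original only shows the extraction itself.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical position record; the names are illustrative only.
struct Position {
    int row = 0;
    int column = 0;
};

// Reads input shaped like "12,7". The delimiter lands in dummy and is
// never looked at again, which is exactly what the name announces.
bool read_position(std::istream& config, Position& pos) {
    char dummy = '\0';
    config >> pos.row >> dummy >> pos.column;
    return !config.fail();
}

// Convenience wrapper for parsing from a string instead of a stream.
bool parse_position(const std::string& text, Position& pos) {
    std::istringstream config(text);
    return read_position(config, pos);
}
```

Note that this sketch accepts any single non-space character as the delimiter. If the delimiter mattered, it would deserve a real name and a check.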

I also have one really bad program/script name that I use regularly. The name is so bad that it carries with it an important meta-meaning. I often write one-shot, throw-away programs or scripts and name each of them doit. These scripts are for jobs that are easy enough to rewrite that I would spend less time rewriting one than I would spend trying to remember what I named it last time. I often write the doit.pl Perl script for work that is a little too complicated for me to do as a one-liner, but not complicated enough to justify building a real program.

The meta-meaning of the doit program is interesting. If I find a doit script or program in any directory, I delete it. It was designed as a throw-away and I know that it is not really worth keeping. Every now and then one of my doit scripts evolves into something useful while I am working through a problem. At that point, I give it a real name and it becomes part of my toolkit, at least for that project.

The subject of names in programming is more important than many people realize. Some really bright people have written on the subject. Steve McConnell had some good advice in Code Complete. Andrew Hunt and David Thomas explain some of the reasons that bad names cause harm in The Pragmatic Programmer. Andy Lester had another weblog entry on names a short while back, O'Reilly Network: On the importance of names [Feb. 15, 2004]. In the comments to that entry, he pointed to Simon Cozens' article Themes, Dreams and Crazy Schemes: On the naming of things from a couple of years ago.

Posted by GWade at 09:32 PM.

March 07, 2004

Review of Perl for Web Site Management

Perl for Web Site Management
John Callender
O'Reilly, 2002

This book may be one of the best books I've ever seen for getting Perl used in a place where it is sorely needed.

Many people developing web-based applications are not programmers. They learned to use some HTML authoring tool and progressed from there. Many of them never quite get the idea of automating the grunt work of their jobs. This book tries to introduce these people to Perl as a labor-saving device.

If you are doing any form of web site maintenance, this book is worth a read. If you are a programmer working on web sites, you are probably doing most of what this book teaches. But, if your background is not in programming, this book will probably increase your productivity dramatically.

Posted by GWade at 02:52 PM.

Review of Modern C++ Design

Modern C++ Design
Andrei Alexandrescu
Addison-Wesley, 2001

This book just blew me away. I've had access to compile-time programming in other languages and had worked pretty hard to understand templates. I felt I had a better than average grasp of how C++ templates work and are used. The techniques in this book were astounding. I have since found many sites devoted to these techniques, but I remain impressed with the way Alexandrescu explains the basics of these techniques.

Warning: This book is definitely not for everyone. But if you really want to push the limits of what you can do with C++, you need to read this book.

Posted by GWade at 01:17 PM.

March 03, 2004

Review of The Logic of Failure

The Logic of Failure
Dietrich Dörner
Perseus Books, 1996

This is a spectacular book on why people make mistakes. This book gives wonderful insight into design and troubleshooting that you won't get from any traditional book on programming.

A large portion of the book describes how very intelligent people make what seem like blindingly stupid mistakes. It shows how people stop thinking logically when there is a delay in feedback between cause and effect. It also shows how people react when the situation seems to be getting worse despite their efforts. You have probably observed some of these behaviors in people when things really begin to go wrong.

Amusingly enough, a friend of mine pointed out that this book would have been much more successful except for the mistake of focusing the title on failure. With that focus, some people obviously avoided the book because they don't want to contemplate failure.

Although this is not a normal book on programming, I would recommend it to any programmer who needs to do troubleshooting or development under the gun. It may not solve your problems, but it may help you understand when people, including you, start reacting to a stressful situation.

Posted by GWade at 10:23 PM.

The Law of Unintended Consequences

One of the fundamental laws of the universe could be called the Law of Unintended Consequences. This law is as universal as Murphy's Law, but not as well recognized. To me, the gut-level understanding of this law is one of the things that shows the difference between a really good programmer and someone who just writes code.

In its simplest form, the law could be stated as:

Every action, no matter how small, has consequences that were not intended.

This law applies in the physical world as well as the world of programming. Heat generated by friction is an unintended consequence of many physical activities. Many historians believe that organized crime was an unintended consequence of prohibition. I saw a report a few years ago that showed a relation between better auto theft deterrent devices and car-jackings.

Fortunately, most of the unintended consequences we deal with as programmers aren't quite this dramatic...right? Let's explore some of the consequences that result from decisions and actions in our code.

For example, choice of language or paradigm has a large number of effects on your design from that point forward. If you choose a non-object-oriented language, you cut yourself off from many new techniques. (Unless, of course, you want to implement the OO primitives yourself.) On the other hand, choosing a strongly-typed OO language may slow you down if you are solving a quick scripting problem.

Choosing a development methodology also has many consequences. If you choose Extreme Programming, there is some evidence that you will be reducing the amount of documentation that accompanies the project. On the other hand, choosing a more structured methodology like the Rational Unified Process does prevent very rapid development.

Most of us can see these kinds of consequences. In fact, those of us who have been working in the field for a while may even take the consequences into account as part of our decisions when starting a project. But, these aren't the only decisions and actions with unintended consequences.

The Y2K Problem

Many people pointed to the Y2K problem as an example of the short-sightedness of programmers, or of bad old programming practices. However, in many cases, these consequences were actually well understood by the people writing the code. In some cases, they had to weigh a data format that would significantly increase their data storage requirements against the possibility that this data or code would still be in use twenty or thirty years later. Remember, in the sixties and early seventies, you couldn't run to a computer store and pick up a 120GB hard drive. Storage space was precious and expensive. A decision that decreased the storage requirements of a system by 10% could mean the difference between success and failure. On the other hand, the idea that the software and data would remain in use thirty years later was not very believable. But this decision did not take into account how infrequently some kinds of systems are changed and how they change.
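
To make the trade-off concrete, here is a small sketch of the arithmetic involved. The function names are hypothetical, and real systems of the era stored dates in many different ways, so treat this as an illustration of the failure mode rather than of any particular system.

```cpp
// Hypothetical: years are stored as two digits (00-99) to save space.
// Correct as long as both dates fall in the same century.
int years_between_2digit(int start_yy, int end_yy) {
    return end_yy - start_yy;
}

// The same calculation with full four-digit years, at the cost of
// two more digits of storage for every date in every record.
int years_between_4digit(int start_year, int end_year) {
    return end_year - start_year;
}
```

With the two-digit form, an account opened in 1975 (stored as 75) and examined in 2000 (stored as 00) appears to be -75 years old. The four-digit form gives 25. Until the century rolled over, both versions gave the same answers.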

Many systems grow other programs that all communicate through the same data formats. Then, when these programs need to change, you are tied to the original formats because too much code would need to change at one time. This is a large risk for even a small change to the data format.

The Internet

Many of the protocols underlying the Internet are based on straight ASCII text. Although many people try to improve these protocols by making them binary and therefore more efficient, they miss one of the important design decisions. Many of these protocols can be tested and debugged by a human being using telnet. If you have ever spent any time troubleshooting a binary protocol, you can appreciate the elegance of this solution. Many years ago, I was able to help my ISP solve an email problem by logging into the mail server directly and reporting to the tech support person how the server responded.
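
The same readability helps the programs on each end. As a rough sketch, assuming the three-digit-code-plus-text reply shape that SMTP servers use (for example, "250 OK"), this is about all it takes to pick a reply line apart. The Reply struct and parse_reply function are hypothetical names, not part of any mail library.

```cpp
#include <cctype>
#include <string>

// Hypothetical holder for one reply line from a text-based server.
struct Reply {
    int code = 0;             // three-digit status, e.g. 250
    bool more_lines = false;  // '-' after the code: more lines follow
    std::string text;         // the human-readable remainder
};

// Parses a line such as "250 OK" or "250-mail.example.org".
// Returns false if the line does not start with three digits
// followed by end of line, a space, or a hyphen.
bool parse_reply(const std::string& line, Reply& out) {
    if (line.size() < 3) return false;
    int code = 0;
    for (int i = 0; i < 3; ++i) {
        unsigned char c = static_cast<unsigned char>(line[i]);
        if (!std::isdigit(c)) return false;
        code = code * 10 + (c - '0');
    }
    if (line.size() > 3 && line[3] != ' ' && line[3] != '-') return false;
    out.code = code;
    out.more_lines = line.size() > 3 && line[3] == '-';
    out.text = line.size() > 4 ? line.substr(4) : std::string();
    return true;
}
```

Everything this function reads is something a person at a telnet prompt could also read, which is the property a binary encoding gives up.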

Every now and then someone attempts to "fix" one of these protocols by making it into a binary data stream. Invariably, one of the consequences of this action is to make the protocol harder to troubleshoot. This side effect almost always kills the attempt. One useful approach that has solved some of the bandwidth problems caused by using a straight ASCII protocol has been using a standard compression algorithm on top of the more verbose protocol. This often reduces the bandwidth of the protocol almost to the point of a binary implementation.

One place where I have seen this work particularly well is the design of Scalable Vector Graphics (SVG). SVG is an XML vocabulary for describing vector graphics. The main complaint from many people was that it was extremely verbose. Every graphic element is described with text tags and attributes. However, the SVG committee had considered this issue. There are two defined formats that an SVG processor is required to understand. The normal SVG files (with a .svg extension) are XML. The second format is gzip-compressed SVG files (with a .svgz extension). The compressed files use a standard format to reduce size, but use the more verbose format for flexibility and extensibility.

HTML

One of the original decisions in the design of browsers was to make them forgiving in the HTML they would accept. In the days before HTML authoring tools, everyone who wanted to publish on the web had to do their HTML by hand. The thinking was that people would be more likely to continue using the thing if they didn't have to fight the browsers to make their pages display. Unfortunately, this had a consequence that drove the browser wars. Each different browser rendered invalid HTML slightly differently than the others. Before standard ways of describing the presentation of the HTML, people began using these differences as formatting tricks to get the displays they wanted.

Obviously, it wasn't long before someone would claim that because your browser didn't render their invalid HTML the same way as their browser, your browser was broken. In fact, one major browser added a special tag that was often used to break compatibility with another major browser.

We have spent years cleaning up the results of that decision. Arguably, the original decision probably did contribute to the speed at which the early web grew. One unintended consequence was incredibly complicated code in almost every browser to deal with the many ways that users could mess up HTML.

Security

The point of this rambling verbal walk through past code/design issues is to remind you that any code and design decisions we make have unintended consequences as well. They may be as simple as an increase in memory use. More likely there are subtle consequences that may not be noticed for quite some time. In recent years, many unintended consequences have surfaced as security holes.

The buffer overflow problem that has been such a bane recently is definitely a result of unintended consequences. Although many scream sloppy coding when they hear of a buffer overrun bug, I don't always think that is the problem. Quite often, I think there is an unconscious assumption that the other end of a connection is friendly or at least not malicious. As such, code that was verified with reasonable inputs turns out to be flawed when the basic assumptions are violated.
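
A small sketch of how that assumption hides in code. The buffer size and function names are made up for illustration. The first function passes every test written with reasonable input; the second makes no assumption about the sender.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed buffer size for a name field.
constexpr std::size_t kNameSize = 16;

// Trusting version: correct for every name shorter than 16 characters,
// but writes past the end of dest when src is longer than that.
void copy_name_trusting(char (&dest)[kNameSize], const char* src) {
    std::strcpy(dest, src);
}

// Checking version: refuses input that does not fit.
bool copy_name_checked(char (&dest)[kNameSize], const char* src) {
    std::size_t len = std::strlen(src);
    if (len >= kNameSize) return false;  // no room for text plus '\0'
    std::memcpy(dest, src, len + 1);     // copies the terminator too
    return true;
}
```

Both functions behave identically on friendly data. The difference only shows up when the other end sends something nobody tested.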

I can just hear some of you screaming at this point. I'm not defending the decision that left a flaw in the code. I'm explaining how a (possibly unconscious) assumption had unintended consequences.

Conclusion...for now

This has already gone on longer than I intended. But, I believe that it is worth thinking about your assumptions and examining your decisions. When writing code, every one of your assumptions and decisions will have unintended consequences. I believe the more you think about it, the more benign the remaining unintended consequences may be.

Posted by GWade at 09:53 PM.