Programmer Musings: February 2004 Archives


February 25, 2004

Review of Mastering Regular Expressions

Mastering Regular Expressions
Jeffrey E. F. Friedl
O'Reilly, 1997

If you use regular expressions on a regular basis, or if you've used them a little but want to get better, this is a book you need. I had been working with regular expressions for over ten years when I first read this book. I was used to helping other people debug their regular expressions and thought I knew them very well. This book improved my regular expressions tremendously.

The book covers multiple regular expression implementations, including AWK, grep, Tcl, GNU Emacs and Perl 5 regular expressions. Friedl does a good job of covering basic regular expression concepts before going on to more advanced topics. These include the different types of regexp engines, their effects on the expressions you write, and greediness. He also covers some regexp gotchas, including regular expressions that may never terminate.
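To make the greediness point concrete, here is a minimal sketch using C++ `std::regex` with its default ECMAScript grammar (not one of the engines the book covers). The helper name `first_match` and the sample text are my own assumptions for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first match of `pattern` in `text`,
// or an empty string if nothing matches.
std::string first_match(const std::string& text, const std::string& pattern) {
    std::regex re(pattern);  // default grammar is ECMAScript
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

With the text `<b>bold</b> and <i>italic</i>`, the greedy pattern `<.+>` grabs everything from the first `<` to the last `>`, while the non-greedy `<.+?>` stops at the first `>` and matches only `<b>`.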

Throughout the book are a number of useful techniques that will improve your daily use of regular expressions.

Mastering Regular Expressions is highly recommended.

Posted by GWade at 06:48 AM.

February 22, 2004

Paradigms Found

In my weblog entry Programmer Musings: Paradigms Lost from about five days ago, I talked about some of the negative effects of programming paradigms. That entry could give the impression that I don't believe in new paradigms. Nothing could be further from the truth. However, I think the main benefit of new programming paradigms is often overlooked.

Each programming paradigm that comes along gives us new tools to use in our quest to create great solutions to hard problems. As I said before, these new tools don't necessarily invalidate the last paradigm or its tools. So, what does a new paradigm give us?

Conventional wisdom says that each new paradigm has given us better tools and techniques for dealing with complexity. The object oriented paradigm allows us to deal with more complexity than the modular paradigm which came before it. These new tools and techniques come with a cost: a relatively steep learning curve. They also come with some complexity of their own. This is not to say that the new complexity and learning curve aren't worth it, but it is important to acknowledge that cost. This cost is especially high on projects where the old paradigm was enough. A very good example of this cost is the standard Hello World program.

In the modular style of C, this would be


#include <stdio.h>
int main(void)
 {
  printf( "Hello World\n" );
  return 0;
 }

In the object oriented style of Java, this would be


class Hello
{
  public static void main( String [] args )
   {
     System.out.println( "Hello World" );
   }
}

In the object oriented style, we need to know about a class, static methods, public access, the signature of the main method, a String array, and the proper method to call on the out object of the System class for printing to the screen. In the modular style, we need to know the signature of the main function, how to include the I/O library, and the name of the library function to print a string.

Granted, in a larger, more complex program this overhead in understanding would not be quite so overwhelming. In a simpler or less complex program, this extra overhead may not be worthwhile.

You might ask "why not just learn one paradigm?" After all, if the latest paradigm handles complexity best, why not just use that one? Part of the answer is this understanding overhead. I've seen many cases where people did not automate a process they were going to manually execute dozens of times because it was not worth writing a program to do it. When you make this decision, the computer is no longer working for you. We're programmers. The computer is supposed to do the grunt work, not us.

Okay, why not just use the more advanced paradigm even on small quick-and-dirty programs? In a small program, people often find that they must spend more time supporting the paradigm than solving the problem. This is why shell scripting is still used. Why write a program to automate a half dozen commands you type to get your work done when a short shell script or batch file can solve it for you? Agreed, it doesn't support the latest paradigms, but it does get the computer working for you, instead of the other way around.

However, we still haven't touched what I feel is the most important thing about new paradigms. New paradigms give you a different way to think about solving problems. This is not meant to imply a better way, just a different one. If you have two (or more) ways to look at a problem, you have a better chance of coming up with a great solution.

By having two or more ways to view a problem, you increase the number of approaches you can use to tackle the problem. Maybe you don't need a large object framework to handle this problem; maybe a straightforward procedural filter implementation will do. In another problem, you might need more flexibility than a procedural program can comfortably handle; maybe the Strategy Pattern with appropriate objects is a better approach. Then again, possibly a little generic programming with the STL is exactly what you need.
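As a small illustration of how these views can overlap, here is a sketch of a line filter written with STL algorithms. The names `filter_lines` and `not_blank` are hypothetical, chosen only for this example. The predicate passed in plays the role that a Strategy object would play in a class-based design, yet the code itself stays a plain procedural filter.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Keep only the lines accepted by `keep`. Any callable taking a
// std::string and returning bool can be plugged in as the policy.
template <typename Pred>
std::vector<std::string> filter_lines(const std::vector<std::string>& lines,
                                      Pred keep) {
    std::vector<std::string> out;
    std::copy_if(lines.begin(), lines.end(), std::back_inserter(out), keep);
    return out;
}

// One hypothetical policy: drop blank lines.
inline bool not_blank(const std::string& s) { return !s.empty(); }
```

Swapping in a different policy means passing a different function or lambda; no class hierarchy is required unless the problem actually calls for one.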

The unfortunate problem with programming is that the complexity never goes away. Different paradigms just manage the complexity in different ways. Some paradigms handle some kinds of complexity better than others, but they do it by changing the form of the complexity, not by making it disappear.

The most important thing about knowing multiple paradigms is that it allows you to decide how to manage the complexity in any given problem. By giving yourself more options and more ways of looking at the problem, you increase the chances of finding a good match between solution and problem no matter what problem you are trying to solve. That, in my opinion, is the most important advantage of a new paradigm.

Posted by GWade at 10:10 PM.

February 21, 2004

Review of Exceptional C++

Exceptional C++
Herb Sutter
Addison-Wesley, 2000

I had been working with C++ for a number of years before I read this book and I thought I knew the language.

This book provides 47 problems with included solutions. Trying to solve the problems is very important. Each one tests an area of C++ that some people find unclear. In some cases, I didn't realize that I was unclear on the topic until I solved the problem and finished reading the explanation. These problems will stretch your C++ skills and solidify your understanding of the language.

The second section of the book covers exception safety. In some ways, this may be the most important part of the book. Sutter really does a great job of converting gut-level intuition about exceptions into logical, useful knowledge. In this section, he covers different levels of exception safety and what each level guarantees. He then uses these levels and their guarantees to analyze and construct exception-safe code.

In addition to explaining why some things that work actually work, this book did a great job of showing when and where other good ideas will blow up in your face.

Exceptionally highly recommended.

Posted by GWade at 09:55 AM.

February 17, 2004

Paradigms Lost

An earlier weblog entry, Programmer Musings: Paradigms limit possible solutions, spent some time on what paradigms do for programming.

Now, I'd like to consider a slightly different take on programming paradigms. Why do programming paradigms seem to take on the force of religion for so many in the software field?

Although I haven't been a professional programmer as long as some, I have been in the business long enough to weather several paradigm shifts. Now before you run screaming away from that annoyingly over-used phrase, I really mean it in the literal sense: shifting from one dominant paradigm to another.

When I started, structured programming was king. That was later refined with modular programming. I remember a brief stint in literate programming that never really seemed to catch on. The current big thing, object oriented programming, has even spun off sub-paradigms like aspect oriented programming. Let's also not forget generic programming, which hit the mainstream with the C++ Standard Template Library, and Design Patterns, introduced by the Gang of Four book. Along the way, I even dabbled in functional programming.

Each of these, in its time, helped programmers to build better code. Each paradigm has strong points and weaknesses. What I don't understand is why the people adopting each new paradigm find it necessary to throw away good ideas because they were associated with an older paradigm. I am continually amazed that someone would look at a perfectly reasonable program in the structured programming style and dismiss it as not object oriented. So what? Does it do the job? Is it still plugging away after being ported to three different operating systems and who knows how many environments? If the answer to these questions is "yes", who cares how it was designed?

A couple of weeks ago, I saw the weblog entry O'Reilly Network: Overpatterned and Overdesigned [Jan. 28, 2004], where chromatic blasts the overuse of patterns. I was genuinely amused and heartened by his example and the comments that followed. One of the points of his entry was that design patterns can be overused. I am willing to go a little further: any paradigm can be overused.

The biggest problem I see with programming paradigms is a misunderstanding of what they really mean. A paradigm is a way of thinking about problems and solutions. Using a paradigm should not be a life altering event. In my experience, no paradigm is inherently better than all others. However, it seems that as each new paradigm is discovered, a large fraction of the people using it feel the necessity to ignore all of the lessons of previous paradigms, as if they somehow no longer apply.

The really funny thing is that, over time, we see the same lessons come back, often with different names. In the old structured/modular programming days, you could improve a program by reducing the coupling between unrelated code and increasing the cohesiveness of your modules. Of course, no one does structured programming now. Instead, we refactor our classes to increase the independence between them, possibly through the use of interfaces or abstract classes. We also work to provide minimal interfaces for our classes, and we want to make sure that each class does only one thing. Sounds familiar, doesn't it?

Hopefully, one day, we will learn to treat a new paradigm as just another tool in our kit. Remember, carpenters did not throw out their hammers and nails when screws and screwdrivers were invented. Each serves a slightly different purpose and provides useful options. I see programming paradigms in a similar light.

Posted by GWade at 06:44 PM.

February 16, 2004

On Names, again

Isn't it interesting how some ideas will surface in different, unrelated places at close to the same time?

O'Reilly Network: On the importance of names [Feb. 15, 2004] talks about how important the right name is for the success of a project. I think the name matters not only for recognition but also because it gives people a good metaphor for how to relate to the project.

I took a little different tack on this concept in Programmer Musings: True Names, although my comments were much more abstract.

Posted by GWade at 09:58 PM.

February 15, 2004

Review of The Design of Everyday Things

The Design of Everyday Things
Donald A. Norman
Doubleday, 1988

I know this isn't a book on programming, but it is a very good book on design. Norman shows design mistakes in everyday items like door handles and stoves, as well as nuclear power plants and trains. He also introduces design principles that you can use when creating software. Some examples include

Controls that do different things should look different.
Make relevant parts visible.
Give an action obvious and immediate feedback.

He also discusses affordances and uses the concept to explain why we push on doors marked pull.

This is a must-read book for anyone who does any kind of design. If you are writing or designing software, this means you.

Posted by GWade at 09:24 AM.

Review of Software Craftsmanship

Software Craftsmanship
Pete McBreen
Addison-Wesley, 2002

This book is clearly written by someone who gets programming. I'm not sure how well the current business climate would accept his idea of Software Craftsmen, but I'm certain programmers (good and bad) will see an idea they like.

The main premise of the book is that programming is less like an engineering discipline and more like a craft. As such, McBreen suggests that the way to improve applications is to follow a craft model more precisely. New programmers should be apprenticed to more senior programmers. We should encourage pride in one's work. A program should not be something you code and then toss to maintenance. He argues that this would help to develop skilled craftsmen, instead of just people who have passed a certification course.

The ideas in this book push back against the kind of software engineers who seem to consider actually building the software and making it work as being a minor detail that occurs after the real work is finished. This kind of software engineer would probably regard this book as glorifying the worst kind of hacking. But, I think he or she would be doing this book and their own abilities a disservice by dismissing the book and its ideas that easily.

I've seen mentored or apprenticed programmers become very good in a short time. I also agree that pride in your skills and work seems to improve the results. So, some of what McBreen says resonates with my experience. But, I'm not sure I agree with all of his conclusions. One of the best programmers I know suggested that this book was not very methodical or logical in its presentation. I saw it more as exploring an approach rather than presenting a road map for implementing it.

I'm not sure I agree with all of what McBreen says, but this book is definitely recommended. No matter what model of modern software development you subscribe to, this book has some ideas worth considering.

Posted by GWade at 09:11 AM.

February 12, 2004

XML Living Up To Its Promise

XML.com: Opening Open Formats with XSLT [Feb. 04, 2004]

This article by Bob DuCharme is a great example of something we don't see enough of. He takes data from a defined XML application (OpenOffice.org Impress format). He uses standard tools (XSLT) to extract and format data useful to him.

This is not the normal "If I put my data in XML, everyone will be able to use it" message we see everywhere. It is also not an example of business-based conversion of some consortium-sponsored format into some other consortium-sponsored format. It's not even the (often) contrived examples from XML books of how XML makes our lives better.

This is a great example of someone with a very specific need that is solved by standard tools and XML. This specific need is not something that the OpenOffice.org team would want to dedicate the time to satisfy. Since they chose an open XML file format, they don't have to. The user owns his data in a more meaningful way than if the same data were in a proprietary format.

In the 5 years I've been working with XML, I've almost never seen this kind of example. What's more amusing is that this use has been promised all along. I've spent a lot of my time working with special purpose XML applications that I've crafted for a handful of uses. This article is the wakeup call I needed to remind me to look at some other formats again.

Posted by GWade at 06:52 AM.

February 11, 2004

Object Death

What does it mean for an object to die? In C++, there are several distinct and well-defined stages in the death of an object. Other languages do this a little differently, but the general concepts remain the same.

This is the basic chain of events for an item on the stack.

Object becomes inaccessible
Object destructor is called
Memory is freed

Once an object goes out of scope it begins the process of dying. The first step in that process is calling the object's destructor. (To simplify the discussion, we will ignore the destructors of any ancestor classes.) The destructor should undo anything done by the object's constructor. Finally, after all of the destruction of the object is completed, the system gets an opportunity to recover the memory taken by the object.
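The sequence above can be observed directly. In this minimal sketch, `Tracer` and `trace_scope` are hypothetical names invented for the illustration; each `Tracer` records its own construction and destruction in a log so the order is visible.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper that logs its own construction and destruction.
class Tracer {
public:
    Tracer(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("ctor " + name_);
    }
    ~Tracer() { log_.push_back("dtor " + name_); }

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Creates two stack objects in an inner block and returns what happened.
std::vector<std::string> trace_scope() {
    std::vector<std::string> log;
    {
        Tracer a("a", log);
        Tracer b("b", log);
        log.push_back("end of block");
    }  // both objects leave scope here: b is destroyed first, then a
    return log;
}
```

The returned log reads: ctor a, ctor b, end of block, dtor b, dtor a. Destructors run at the closing brace, in reverse order of construction, before the stack memory is reclaimed.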

In some other languages, a garbage collection system handles recovering memory. Some systems guarantee destruction when the object leaves scope, even with automatic garbage collection. However, some of them focus so hard on memory recovery that they provide no guarantees about when, or even if, destruction of the object will occur.

Although many people pay a lot of attention to the memory recovery part of this process, it seems to be the least interesting part of the process to me. The destruction of the object often plays a vital role in the lifetime of the object. This destruction often involves releasing resources acquired by the object. Sometimes, memory is the only thing to be cleaned up, but many times other resources must be released. Some examples include

closing a file
releasing a semaphore or mutex
closing a socket
closing/releasing a database handle
terminating a thread

These are all issues that we would like to take care of as soon as possible. Each also has real consequences if the cleanup step is forgotten or missed, such as a leaked handle or a lock that is never released.

Anytime I have a resource that must be initialized or acquired and then shut down or released, I immediately think of a class that wraps that functionality in the constructor and destructor. This pattern is often known as resource acquisition is initialization (RAII). Following this pattern gives you an easy way to tell when the resource is yours. Your ownership of the resource corresponds to the lifetime of the object. You can't forget to clean up, because it is done automatically by the destruction of the object. Most importantly, the resource is even cleaned up in the face of exceptions.
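Here is a minimal sketch of the pattern. `Resource` is a hypothetical stand-in (just a flag) for a real mutex, file, or socket, and `Guard` and `released_after_throw` are likewise names made up for this example.

```cpp
#include <stdexcept>

// Hypothetical resource: a flag standing in for a mutex or file handle.
struct Resource {
    bool held = false;
};

// Acquires the resource in the constructor, releases it in the destructor.
class Guard {
public:
    explicit Guard(Resource& r) : r_(r) { r_.held = true; }
    ~Guard() { r_.held = false; }

    // Copying would mean two owners of one resource, so forbid it.
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;

private:
    Resource& r_;
};

// Returns true if the resource was held inside the block and
// released again even though the block exited by exception.
bool released_after_throw() {
    Resource r;
    bool held_inside = false;
    try {
        Guard g(r);
        held_inside = r.held;
        throw std::runtime_error("simulated failure");
    } catch (const std::runtime_error&) {
        // Stack unwinding has already run ~Guard() by the time we get here.
    }
    return held_inside && !r.held;
}
```

There is no explicit release call anywhere in the calling code, which is the point. The standard library applies the same idea in classes such as `std::lock_guard` and `std::unique_ptr`.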

In the systems where destruction may be postponed indefinitely, this very useful concept of object death and the related concept of object lifetime are discarded.

Posted by GWade at 05:49 PM.

February 10, 2004

Review of Questioning Extreme Programming

Questioning Extreme Programming
Pete McBreen
Addison-Wesley, 2003

This book starts off with the premise that XP is really not appropriate for many projects. However, far from being an XP-bashing session, it turns into a very hard-nosed look at many methodologies. McBreen compares many of the strengths and weaknesses of the major development methodologies in a way that shows that none qualifies as the silver bullet.

The author does a very good job of skewering some of the hype about XP while simultaneously showing many of its strengths. If you are considering XP, you really need to read this book for his analysis of what projects can succeed with XP. I think the most important insight from this book was the evaluation of what is needed to stop doing XP.

If you have a strong opinion, either way, about XP, this book will make you question your position.

Posted by GWade at 10:09 PM.

February 07, 2004

The Forgotten OO Principle

When talking about Object Oriented Programming, there are several principles that are normally associated with the paradigm: polymorphism, inheritance, encapsulation, etc.

I feel that people tend to forget the first, most important principle of OO: object lifetime. One of the first things that struck me when I was learning OO programming in C++ over a decade ago, was something very simple. Constructors build objects and destructors clean them up. This seems obvious, but like many obvious concepts, it has subtleties that make it worth studying.

In a class with well-done constructors, you can rely on something very important. If the object is constructed, it is valid. This means that you generally don't have to do a lot of grunt work to make sure the object is set up properly before you start using it. If you've only worked with well-done objects, this point may not be obvious. Those of us who programmed before OO got popular remember the redundant validation code that needed to go in a lot of places to make certain that our data structures were set up properly.
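A minimal sketch of that guarantee follows. The `Endpoint` class and its validation rules are hypothetical, picked only to have something concrete to check. The constructor either produces a usable object or throws, so no invalid `Endpoint` can exist.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical class: a network endpoint that is valid from the
// moment it exists, because the constructor rejects bad input.
class Endpoint {
public:
    Endpoint(std::string host, int port)
        : host_(std::move(host)), port_(port) {
        if (host_.empty())
            throw std::invalid_argument("host must not be empty");
        if (port_ < 1 || port_ > 65535)
            throw std::invalid_argument("port out of range");
    }

    const std::string& host() const { return host_; }
    int port() const { return port_; }

private:
    std::string host_;
    int port_;
};

// Returns true if an Endpoint can be constructed from these arguments.
bool can_construct(const std::string& host, int port) {
    try {
        Endpoint e(host, port);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

Code that receives an `Endpoint` never needs to re-check the host or port. The validation lives in one place instead of being repeated at every point of use.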

Since that time, I have seen many systems where the programmers forgot this basic guarantee. Every time this guarantee is violated in the class, all of the client programmers who use this class have a lot more work on their hands.

I'm talking about the kind of class where you must call an initialize method or a series of set methods on the object immediately after construction, otherwise you aren't guaranteed useful or reliable results. Among other things, these kinds of objects are very hard for new programmers to understand. After all, what is actually required to be set up before the object is valid? There's almost no way to tell, short of reading all of the source of the class and many of the places where it is used.

What tends to happen in these cases is the new client programmer copies code from somewhere else that works and tweaks it to do what he/she needs it to do. This form of voodoo programming is one of the things that OO was supposed to protect us from. Where this really begins to hurt is when a change must be made to the class to add some form of initialization. How are you going to fix all of the client code written with it? Granted, modern IDEs can make some of this a little easier, but the point is that I, as the client of the class, will need to change the usage of the object, possibly many times, if the class implementation changes.

That being said, it is still possible to do some forms of lazy initialization that save time at construction time. But, the guarantee must still apply for a good class. After construction, the object must be valid and usable. If it's not, you don't have an object, you have a mass of data and behavior.
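Lazy initialization and the validity guarantee can coexist, as this sketch tries to show. `Samples` is a hypothetical class: the expensive sum is not computed at construction time, but the caller never has to know that, because the object answers correctly from the first call.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical class: cheap to construct, computes its total on demand.
class Samples {
public:
    explicit Samples(std::vector<double> values)
        : values_(std::move(values)) {}

    // Computed on first use and cached. There is no separate
    // "initialize" step for the client to remember.
    double total() const {
        if (!have_total_) {
            total_ = std::accumulate(values_.begin(), values_.end(), 0.0);
            have_total_ = true;
        }
        return total_;
    }

private:
    std::vector<double> values_;
    mutable bool have_total_ = false;  // cache state, invisible to callers
    mutable double total_ = 0.0;
};
```

The deferred work is an implementation detail hidden behind the interface. Note that this simple cache is not safe for concurrent calls from several threads; that would need additional synchronization.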

The other end of the object's lifetime is handled by a destructor. When an object reaches the end of its life, the destructor is called, undoing any work done by the constructor. In the case of objects that hold resources, the destructor returns those resources to the system. Usually, the resource is memory. But, sometimes there are other resources, such as files, database handles, semaphores, mutexes, etc.

If the object is not properly destroyed, then the object may not be accessible, but it doesn't really die. Instead, it becomes kind of an undead object. It haunts the memory and resource space of the process until recovered by the death of the whole process. I know, it's a little corny. But, I kind of like the imagery.

This concept also explains one of the problems I have with some forms of garbage collection. Garbage collection tends to assume that the only thing associated with an object is memory. And, as long as the memory is returned before you need it again, it doesn't really matter when the object dies. This means that we will have many of these undead objects in the system at any time. They are not really alive, but not yet fully dead. In some cases, you are not even guaranteed that the destructor, or finalizer, will be called. As a result, the client programmer has to do all of the end-of-object cleanup explicitly. This once again encourages voodoo programming as we have to copy the shutdown code from usage to usage throughout the system.

So keep in mind the importance of the lifetime of your objects. This is a fundamental feature of object oriented programming that simplifies the use of your classes, and increases their usefulness.

Posted by GWade at 12:16 PM.

February 05, 2004

XML Data Representation

I had an interesting thought in an email conversation with a friend yesterday. One problem many people have when using XML for data is a misunderstanding of what the XML is.

(If you don't believe in the data in XML approach, feel free to ignore me.<grin/>)

It's easy to make the mistake of treating the XML as if it is the data when you are first learning to use XML this way. But it is really important to realize that the XML represents the data; it is not the same as the data.

You would never have problems with the concept that a line chart or pie chart is not the data; it is just a representation of the data. XML is just another representation.

How does that help? In much the same way that you decide to add or remove information from a line chart to make it serve its purpose better, you can do the same with XML. Let's look at some of the representation-only issues you consider when making a line chart. The most obvious information removed is the actual values. On a line chart, the trends and relative levels appear to be more important. On the other hand, in many line charts color is added to differentiate between different kinds of data or different levels. Error bars are sometimes added to enhance your understanding of the fuzziness of the data.

All of these changes do not actually change the data, they just change the representation. In some cases, they might add implied information (error range, data grouping) or remove extra unneeded details (values). But, the data remains.

I have realized that the same is true of XML (when used for data). You may include structure or grouping in data that isn't evident in the raw values. You may add scaling or units that are implied in the original data. You can even add links to explanations of results. This allows for a richer representation of the data. So you really aren't limited to how you represented your data inside your application. I have sometimes marked up a data set with exactly those pieces of implied information that have always given me problems when communicating between programs. Since I was using XML as an interchange format, making the implied assumptions explicit simplified the overall project.

Posted by GWade at 05:45 PM.

February 02, 2004

Review of the Java Cookbook

Java Cookbook
Ian F. Darwin
O'Reilly, 2001

In spite of the fact that it is supposedly written in the style of the Perl Cookbook, this book was a real disappointment. The cookbook format is intended for showing solutions to common problems. However, the author of this book appeared to be trying to force a tutorial and an API reference into a cookbook format. The result is not really a cookbook, or a tutorial, or a reference.

The chapter on threading was a particular disappointment. The author regularly misused the term deadlock and did not cover useful threading classes like Timer. If you know almost nothing about Java or are very rusty with the language (as I was when I read this book) it is possible to learn a few things from this book. However, I have to believe that it would be done better elsewhere.

Overall, I can't think of any reason to recommend this book.

Posted by GWade at 10:25 PM.

February 01, 2004

More Debugging Information

In my weblog entry from a couple of days ago, More Thoughts About Debugging, I forgot to add the information that prompted me to write in the first place.

I was looking around a few weeks ago and found a series of links providing various pieces of debugging and troubleshooting information.

The site The NEW Troubleshooters.Com provides troubleshooting suggestions on a number of different subjects, including computers and cars. I found the layout a little unusual, but the information was good.

The site Softpanorama Debugging Links provides a large number of individual pages aimed at different aspects of debugging. It includes information on some programs, some kinds of bugs, and a few other subjects, as well.

For a completely different approach to looking at debugging, the Algorithmic and Automatic Debugging Home Page focuses on automatic ways of finding bugs, formal methods, and research.

I was reading DDJ and ran into a reference for the Debugging rules! site. This site claims to be able to help you "Find out what's wrong with anything, fast." It does focus on basic debugging steps that could be applied to most problems. It also has a section devoted to debugging war stories that is worth a read.

This ACM Queue article suggests that source code analysis tools might be able to reduce bugs in real code. It does not provide much proof, but it does reference other articles that might. (ACM Queue - Uprooting Software Defects at the Source - So many bugs, so few bug hunters. Fight back!)

This ACM Queue survey article asks people what tools and techniques they use to debug. (ACM Queue - Another Day Another Bug - Which bugs make you want to call it quits?)

Finally, ACM Queue - Debugging in an Asynchronous World - How do you even begin to understand the behavior of asynchronous code? covers the topic of debugging asynchronous programs. This is a completely different kind of problem.

Posted by GWade at 10:15 AM.