Programmer Musings: September 2005 Archives


September 26, 2005

Review of Code Reading

Code Reading
Diomidis Spinellis
Addison-Wesley, 2003

I have had this book on my list of books to read for some time. Although I have been reading code for years, everything I had heard about this book suggested that it would help improve my reading skills. Unfortunately, halfway through the book I was mostly disappointed. This is not to say that the book had not covered loads of useful information; it was just material I had already found on my own.

As I progressed through the book, however, I found more and more useful tips and suggestions. Although a few were things I had never seen before, many were techniques I had not used in a while and had forgotten. By the end of the book, I was firmly of the belief that it is a must-read. Even if you find parts of it to be obvious, there is enough material to extend almost everyone's understanding.

The book starts off with basic techniques: recognizing language constructs and coding idioms. It then moves on to data types and more advanced control flow. Eventually we pass through documentation and architecture, on our way to tools that help your code reading. A few pieces of advice are more useful for junior code readers, such as pointing out that

for(i = 0; i < len; i++)

should be read as "loop len times." You should be able to recognize this without actually parsing the code. The author also spends time unraveling boolean expressions and showing how changing the whitespace in a complicated expression may make its meaning more obvious.

It also covers more advanced advice like reminding you not to expect a single over-arching architecture in a piece of code. You may spend time trying to find something that is just not there. The book also covers the important trade-offs involved in using documentation attached to the code. There are also wonderful maxims like

An hour of code reading can save you a minute of reading the documentation.

and

Documentation may often provide a misleading view of the source code.

In addition to documentation and the code itself, Spinellis supplies advice on using comments in the code and the change logs under version control to help understand the code. He does not suggest that the comments and change logs should be taken as the truth. However, he does point out that they may help to clarify the purpose or intent of the code.

One of the most fundamental points made by the book is that how you read code depends on what you are trying to do with it. If you are trying to repair or modify a small piece of a large body of code, you don't need to understand the whole thing. I know that I am personally guilty of trying to get deeper into code than is necessary. On the other hand, the book shows examples of needing to dig further when your understanding does not match the program.

The book includes a relatively extensive list of references that might be worth reviewing to extend your knowledge of this field further. The book comes with a CD containing a reasonably large body of open source software that you can use to practice your code reading skills. The exercises in the book are geared toward using that code base.

This book is a must-read for any junior or intermediate software developer. If you need to maintain anyone else's code, this book gives lots of strategies to make your work easier. Despite my initial impression, this book provides a good refresher for more senior developers. It does a very good job of reminding you of techniques that you may not have used or may not have used recently.

Posted by GWade at 06:45 AM.

September 25, 2005

Unintuitive Multithreading: Simplicity

Continuing my exploration of misunderstandings about multithreading, this essay is about simplicity. If you are interested in the previous essays in this series, they are listed below:

Unintuitive Multithreading: Speed explained that multithreading does not inherently speed up a process.
Unintuitive Multithreading: Waiting for Performance showed how to achieve better performance through waiting.
Unintuitive Multithreading: Communication Between Threads discussed a common cause of reduced concurrency in multithreaded code.

This time I plan to start with an assertion that almost everyone will agree with and push it more than most people would. The assertion is fairly obvious:

Simple multithreaded code works better than complex multithreaded code.

It seems that most programmers make code more complicated than it needs to be. Sometimes, we get a chance to simplify or refactor complex code; but more often it just continues to get more complicated. This is as true of single-threaded code as multithreaded code. Sometimes that complexity is necessary because the problem that we are solving is complex. Sometimes the complexity is just because we did not have the time to make it simpler.

With single-threaded code, you have a chance of bulling through the complexity and forcing it to work even if a simpler solution is possible. I contend that you do not have this luxury with multithreaded code. It is said that the biggest limitation in developing code is the limited size of developers' brains. We can only understand so much complexity. A multithreaded application is usually much more complex than an equivalent single-threaded application (ignoring for the moment that an equivalent program may not be possible). The problem is in the number of interactions between threads, both intended and unintended.

Given this complexity problem, the safest approach is to make each thread as simple as possible. One good approach is to make each thread perform only one function. This is the same approach to complexity we have been using to improve functions and objects for years, so it should not come as much of a shock. However, I have seen quite a few designs where people load functionality into individual threads until they are small applications of their own. The idea of a worker thread or a processing thread in a pipeline is to separate the functionality into bite-sized chunks. If the functionality of a given thread is simple enough, you can actually understand the interactions between it and the other threads.

Although there may be an application where some threads need to be extremely complex with extensive communications between threads, this is much less likely than people seem to believe. There are a handful of proven threading approaches. In general, you are probably safer using one of them than creating your own unstudied approach. In many cases, using one supervisor or master thread to farm out work to simpler threads is a good solution. The pipeline model also works well for jobs that divide naturally into stages.

An Example

One example that I can remember was a variation of the worker thread approach. A master thread parceled out work to a small set of worker threads that it created. To reduce the overhead of creating threads, the master thread separated the work into a separate queue for each thread. The worker threads would then process the work in the queue and place the output on a separate output queue for each thread. Another thread (one per worker) would copy the data from this output queue to the main output queue. When the worker thread finished its queue, it would go back to the master thread for more work.

Given N worker threads, this approach uses 2N+2 threads:

1 master thread
N worker threads
N output queue copy threads
1 output thread

In addition, this approach used 2N+2 queues.

1 master input queue, only touched by the master thread
N worker input queues
N worker output queues
1 output queue

Some of the unneeded complexity results from having two queues for each worker thread, one for input and one for output. Each of these queues needed to be maintained by the worker thread, making the worker thread more complicated. In addition, the master thread was made more complicated by the need for configuration options that determined how much of the work should be partitioned to each worker at a time. Finally, we had a series of output threads that just took results off the individual output queues and placed them on one master output queue.

The (potential) advantage is that the individual threads' input and output queues need not be synchronized. However, this advantage was nullified by the fact that the main thread became a bottleneck for loading input queues and the main output queue still needed to be synchronized.

A simpler approach would be to reduce the number of queues. We only need one work queue and one result queue. Each of these would need to be synchronized, of course, because multiple threads would manipulate them at the same time. The worker threads are now much simpler. They read from the work queue and write to the output queue. The threads that copied from the individual output queues to the master output queue are now gone. This leaves us with N+2 threads and 2 queues.

This design is less flexible, but it is much easier to understand and maintain. It also makes better use of concurrency. Each worker thread processes one piece of work from the queue at a time. If it finishes its work early, it can get a new chunk from the work queue. The only time a worker thread will not be able to get more work is when all of the work is done or in process. In the original approach, a thread could finish all of its work while the other threads still had items waiting in their queues. This thread is now left idle, even though there is work to be done.
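The simpler design can be sketched roughly as follows. This is only an illustration, assuming Python's standard queue and threading modules; the names process, worker, output, and run are made up for the sketch, and the calling thread stands in for the master thread. In CPython the threads will not speed up CPU-bound work, so the point here is the structure: N+2 threads and 2 queues.

```python
import queue
import threading

NUM_WORKERS = 4   # N in the text; an arbitrary choice for this sketch
STOP = object()   # sentinel meaning "no more items are coming"


def process(item):
    """Hypothetical stand-in for the real per-item work."""
    return item * item


def worker(work_queue, result_queue):
    # Each worker does one thing: take an item, process it, emit the result.
    while True:
        item = work_queue.get()
        if item is STOP:
            break
        result_queue.put(process(item))


def output(result_queue, results):
    # The single output thread drains the one shared result queue.
    while True:
        result = result_queue.get()
        if result is STOP:
            break
        results.append(result)


def run(items):
    work_queue = queue.Queue()     # queue.Queue handles its own locking
    result_queue = queue.Queue()
    results = []

    workers = [
        threading.Thread(target=worker, args=(work_queue, result_queue))
        for _ in range(NUM_WORKERS)
    ]
    out = threading.Thread(target=output, args=(result_queue, results))
    for t in workers:
        t.start()
    out.start()

    # The calling thread plays the master: it only fills the work queue.
    for item in items:
        work_queue.put(item)
    for _ in workers:
        work_queue.put(STOP)       # one sentinel per worker

    for t in workers:
        t.join()
    result_queue.put(STOP)         # all workers are done, so no results follow
    out.join()
    return results
```

Because every worker pulls from the same queue, a worker that finishes early simply takes the next item. Results arrive in completion order, not submission order.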

The real irony of this is that the original design was actually configured to only place one work item at a time in the individual work queues. This means that the extra flexibility was not being used for the case we examined. The extra mechanisms and flexibility made the code more complicated with absolutely no benefit.

Redundant Work

Another mistake I see in multithreaded code is in having multiple threads doing exactly the same work. I don't mean doing the same work on different data, I mean doing the same work. Obviously, if one thread repeats the work done by another thread we've reduced the efficiency of our application.

Asynchronous I/O is one place where I often see this problem. In most cases, I see it as unnecessary overhead because the user code ends up duplicating part of the work that the OS has already done. The OS

determines that a piece of I/O has completed,
finds the thread that has to process the work
and wakes up that thread.

The thread then

looks at the information it is responsible for
determines which of the outstanding I/O operations has now completed
processes the I/O

The first two steps just repeat work that the OS has already done.

While this approach is necessary in the OS kernel or in some embedded systems code, it seems to be redundant when dealing with user-level code. I have seen many arguments for asynchronous I/O in user programs, but I have not yet been convinced.
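One way to picture the duplicated steps is to put the two styles side by side. The sketch below is my own illustration, assuming Python's standard selectors, socket, and threading modules, with local socket pairs standing in for real I/O; the function names are hypothetical. The event loop has to look up which operation completed, while each blocking reader is woken by the OS only for its own read.

```python
import selectors
import socket
import threading


def event_loop(sources, results):
    # Asynchronous style: one thread waits on everything, then must work out
    # which operation completed before it can process the data.
    selector = selectors.DefaultSelector()
    for name, sock in sources.items():
        selector.register(sock, selectors.EVENT_READ, data=name)
    pending = len(sources)
    while pending:
        for key, _events in selector.select():
            results[key.data] = key.fileobj.recv(1024)  # dispatch by lookup
            selector.unregister(key.fileobj)
            pending -= 1
    selector.close()


def blocking_reader(name, sock, results):
    # Thread-per-operation style: the OS wakes exactly the thread that owns
    # this read, so there is nothing left to look up.
    results[name] = sock.recv(1024)


def run(style):
    # Three local socket pairs stand in for three outstanding I/O operations.
    pairs = {name: socket.socketpair() for name in ("a", "b", "c")}
    sources = {name: pair[0] for name, pair in pairs.items()}
    results = {}
    if style == "event_loop":
        threads = [threading.Thread(target=event_loop, args=(sources, results))]
    else:
        threads = [
            threading.Thread(target=blocking_reader, args=(name, sock, results))
            for name, sock in sources.items()
        ]
    for t in threads:
        t.start()
    for name, pair in pairs.items():
        pair[1].sendall(name.encode() + b"-data")   # complete each "operation"
    for t in threads:
        t.join()
    for pair in pairs.values():
        pair[0].close()
        pair[1].close()
    return results
```

Both styles produce the same results; the difference is where the bookkeeping lives.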

Summary

This essay turned out to be longer than I intended. Explaining simplicity is harder than it looks. As I said in the beginning,

Simple multithreaded code works better than complex multithreaded code.

Keeping this in mind and striving for simplicity in each thread will make your multithreaded code easier to get right and to maintain.

Posted by GWade at 04:36 PM.

September 03, 2005

Unintuitive Multithreading: Communication Between Threads

This essay continues my exploration of misunderstandings about multithreading. In the first essay of this series, Unintuitive Multithreading: Speed, I explained that multithreading does not inherently speed up a process. In the second essay Unintuitive Multithreading: Waiting for Performance, I showed how to achieve better performance through waiting. In this essay, I plan to attack a common cause of reduced concurrency that foils most multithreaded projects. This is going to be a long one, so hold on.

In most of the disappointing multithreading projects that I have seen, the most common problem is that the programmer does not understand the concept of concurrency. Many programmers try to multithread code by haphazardly partitioning the code into separate threads without a clear plan. Then, they sprinkle mutexes around until the code appears to work. Programs built this way never work consistently and very rarely run much faster than the single-threaded version would have (or did).

A key concept of concurrent programs is that any communication between threads reduces concurrency. Said another way, multithreaded programs are most efficient when there is no communication between the threads. Now obviously, very few programs would be useful if there is no communication between the threads. If we want to be really precise, there should be as little synchronous communication between threads as possible. The more synchronous communication we have, the worse the performance will be. Whenever we have synchronous communications, we require one or more threads to wait for other threads.

Any time one thread may need to wait for another to finish some activity, the threads are said to synchronize. Obviously, when one or more threads are waiting for another thread to complete some action, they are not running concurrently with that thread. Although this statement is obvious, it is still worth pointing out because the larger the number of synchronization points in the code, the lower the overall potential concurrency in the program. This also explains why the common approach of scattering mutexes through code to make it multithreaded does not work well in practice.

Reduce Explicit Synchronization

To carry this point through a little further, we need to reduce synchronous communications between threads to increase concurrency. The standard Producer/Consumer model is a good place to start. The idea is to have Producer threads generating chunks of work that are placed in a protected queue. Now, one or more Consumer threads can take work items out of the queue and work on them. This asynchronous communication saves the Producers from waiting until a Consumer is ready. The only time Consumers will wait is when there is no work to do. The one synchronization point is the queue itself. If the system is written correctly, putting something into the queue or taking something off the queue should not take much time, so no thread should have to wait for long on that operation. Other than this one point, there should be no further communication between the Producers and the Consumers. In addition, no two Consumers should communicate with each other.

You normally come across two special cases of the Producer/Consumer model. One has a single Producer thread and many Consumer threads. The other has many Producer threads and one Consumer thread. Depending on the amount of work each thread needs to do, both of these variations work quite well.
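As a minimal sketch of the many-Producers, one-Consumer case, assuming Python's standard queue and threading modules (the names producer, consumer, run, and the DONE sentinel are made up for illustration): the queue is the only thing the threads share.

```python
import queue
import threading

DONE = object()   # sentinel a producer sends when it has nothing more to add


def producer(name, count, work_queue):
    # Producers only talk to the queue, never to each other or the consumer.
    for i in range(count):
        work_queue.put((name, i))   # unbounded queue, so put() does not wait
    work_queue.put(DONE)


def consumer(work_queue, num_producers, handled):
    # The consumer blocks only when the queue is empty.
    remaining = num_producers
    while remaining:
        item = work_queue.get()
        if item is DONE:
            remaining -= 1
        else:
            handled.append(item)


def run(num_producers=3, items_each=5):
    work_queue = queue.Queue()      # the one synchronization point
    handled = []
    threads = [
        threading.Thread(target=producer, args=(f"p{n}", items_each, work_queue))
        for n in range(num_producers)
    ]
    threads.append(
        threading.Thread(target=consumer, args=(work_queue, num_producers, handled))
    )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

The order of items in the returned list depends on thread scheduling, so callers should not rely on it.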

A common model uses a master thread to pass work to worker threads through a work queue. This approach works particularly well if the workers can complete their tasks independently of the master or the other workers. (Think web server.)

Watch for Implicit Synchronization

Possibly the most surprising form of communication between threads is implicit synchronization caused by system calls or library calls. The one that bites people the most often is probably memory allocation. Many memory allocation libraries have a single memory arena shared by all threads. Therefore, any time two threads allocate memory at the same time, one will have to wait. The library must synchronize access to the internal memory structures, otherwise the heap could be corrupted by two or more threads changing those structures at the same time.

There are a few ways to avoid this overhead. Some systems provide a special multithreaded memory allocator that keeps separate memory accounting for each separate thread. While this minimizes synchronization, it may waste a lot of memory. A second approach is to write your own thread-specific allocator for critical classes. This is relatively hard to do correctly. The final approach is to try to reduce the amount of dynamic allocation you do in the threads. For example, if most of your objects are built on the stack, you won't have the synchronization problem. If you use the worker thread model above, you can strive to have the master thread do the allocations and have the worker threads focus on working with the allocated memory.

There is no single, ideal solution to the implicit synchronization problem. You will need to measure and examine your costs and benefits, and make the appropriate trade-offs.

If you are working in Java, the memory allocator is not really under your control. You also don't have the option of building objects on the stack. On the bright side, you can hope that the memory allocator has been written with threads in mind.

Don't Synchronize on Read-only Resources

Probably the most useless form of synchronization I am aware of is providing mutex protection for a read-only resource. If the resource does not change, no synchronization is needed. There is no way for the resource to be in an indeterminate state, because it does not change.

Let's say for the sake of argument that someone writes a program that uses a dictionary to convert keywords into a set of other words to use for further processing. If this dictionary is read from disk at program startup and never changes, then the various threads in the program can all access it at the same time without synchronization and there is no problem. As long as no thread ever modifies the dictionary when other threads could be using it, a mutex is a waste of time.
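A small sketch of this case, assuming Python's standard threading and types modules: the dictionary contents and the names load_dictionary, lookup_all, and run are invented for illustration, and the data is hard-coded here where the text has it read from disk. The dictionary is built before any thread starts and is never written afterward, so the readers take no lock.

```python
import threading
from types import MappingProxyType


def load_dictionary():
    """Hypothetical loader; the data is hard-coded for this sketch."""
    data = {"fast": ("quick", "rapid"), "big": ("large", "huge")}
    # A read-only view: accidental writes raise TypeError instead of
    # silently changing the shared data.
    return MappingProxyType(data)


def lookup_all(dictionary, keywords, out, slot):
    # No mutex: the dictionary never changes after startup.
    out[slot] = [dictionary.get(word, ()) for word in keywords]


def run(num_threads=4):
    dictionary = load_dictionary()      # built before any thread starts
    keywords = ["fast", "big", "missing"]
    out = [None] * num_threads          # each thread writes only its own slot
    threads = [
        threading.Thread(target=lookup_all, args=(dictionary, keywords, out, n))
        for n in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```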

On the other hand, if this dictionary is updated by any thread, it will need some protection. If the updates are relatively infrequent compared to the number of reads, a read/write lock imposes much less overhead on multiple readers and only really costs when a thread needs to write.

This shows one way to recognize resources that do not need to be protected from concurrent access. If a resource is never written after the threads are started, it doesn't need protecting. If a resource is only infrequently written, protect it with a read/write lock. If a resource is written regularly by multiple threads, protect it with a mutex.
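For the middle case, here is a rough sketch of a read/write lock guarding the dictionary. Python's standard library does not ship one, so the ReadWriteLock class below is a minimal version built on threading.Condition, and both it and SharedDictionary are hypothetical names for illustration. It makes no attempt at fairness: a steady stream of readers can keep a writer waiting.

```python
import threading


class ReadWriteLock:
    """Minimal readers-writer lock; a sketch, not tuned and not writer-fair."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # number of threads currently reading
        self._writing = False    # True while one thread holds the write side

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # a waiting writer may proceed

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()       # wake waiting readers and writers


class SharedDictionary:
    """Dictionary that is read often and updated rarely."""

    def __init__(self, entries):
        self._entries = dict(entries)
        self._lock = ReadWriteLock()

    def lookup(self, word):
        self._lock.acquire_read()         # many readers may hold this at once
        try:
            return self._entries.get(word, ())
        finally:
            self._lock.release_read()

    def update(self, word, related):
        self._lock.acquire_write()        # excludes readers and other writers
        try:
            self._entries[word] = tuple(related)
        finally:
            self._lock.release_write()
```

If writes become frequent, the extra bookkeeping stops paying for itself and a plain threading.Lock around both operations is the simpler choice.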

Summary

One of the biggest enemies of multithreading performance is synchronous communication between threads. Any time threads must synchronize to do their jobs they are not running concurrently. This reduces performance. This is the fundamental reason why putting mutexes throughout a program is not the way to make a multithreaded program.

Posted by GWade at 11:42 PM.