This site will look much better in a browser that supports web standards, but is accessible to any browser or Internet device.

Anomaly ~ G. Wade Johnson Anomaly Home G. Wade Home

October 23, 2005

Unintuitive Multithreading: Troubleshooting

In the previous four articles in my Unintuitive Multithreading series, I focused mostly on design of a multithreaded application. This time, I plan to cover a few techniques that I have found helpful in troubleshooting multithreaded applications.

To some extent, the normal techniques you have used for troubleshooting will work in a multithreaded program. But, some of the problems in a multihreaded program will defy your normal tactics. A large fraction of these problems will turn out to be race conditions. By their nature, race conditions are hard to duplicate and are not easy to reason about. There are a set of models that I have found that sometimes shed light on these kinds of problems. These models are:

  • Lock Step
  • Leap Frog
  • Simultaneous Execution
  • Instruction Interruption

Each of these approaches helps you to visualize ways in which threads may interact to cause you problems.

Lock Step

The Lock Step model is fairly simple. Pick two threads that might interact and
mentally execute them one statement at a time, thread A, then thread B, then A,
then B... Some of the most classic race conditions are instantly identifiable
with this technique. For example, in trying to avoid the cost of a mutex, some programmers new to multithreading will try something like this.

  static bool locked = false;

while(locked) ; // wait on the lock

locked = true;
do_critical();
locked = false;

The lock step scenario shows the fallacy here immediately.

  1. Thread A executes the while.
  2. Thread B executes the while.
  3. Thread A sets locked to true.
  4. Thread B sets locked to true.
  5. Thread A executes the do_critical().
  6. Thread B executes the do_critical().
  7. Thread A sets locked to false.
  8. Thread B sets locked to false.

Without an understanding of multithreading, a junior programmer might convince himself that the code would work. And it will. Most of the time. Since it has to work all of the time, we have a problem.

Let's take this example a little farther. Say our junior programmer decides to get a bit more clever.


static int locked = 0;

while(locked++) // wait on the lock
--locked;

do_critical();
locked = 0;
non_critical();

This won't fall to the lock step model from before, but it will cause problems if we lockstep with an offset.

  1. Thread A executes the do_critical().
  2. Thread B executes the while(locked++).
  3. Thread A sets locked to 0.
  4. Thread B decrements locked to -1.
  5. Thread A executes the non_critical().
  6. Thread B executes the while(locked++) and continues to be locked out.

Walking through the code in lock step with a couple of threads can point out subtle timing issues in your MT code. The threads don't have to be executing the same code for this approach to work. There can be an offset as in the second example, or the threads could be executing completely separate pieces of code.

It should be obvious that if you don't do the lock step exactly correct, you may miss a race condition. This is why most race conditions succeed most of the time. The key to using this technique is to walk through the suspected code with an awareness for ways that things might go wrong. If something looks suspicious, change the offset between the threads and walk through again. You'll often find that you become more and more suspicious of a piece of code as you get closer to the right timing.

Leap Frog

The Leap Frog model is similar to lock step except that you execute more than one instruction at a time, with first one thread leading then the other leading. This approach is often useful in cases where using lock step had made you suspicious of some code but had not actually triggered a race condition.

   if(ptr)
{
void *tmp = ptr;
ptr = 0;
free( tmp );
}

  1. Thread A tests ptr.
  2. Thread B tests the ptr.
  3. Thread B saves ptr in tmp.
  4. Thread B resets ptr to null.
  5. Thread A saves ptr in tmp.
  6. Thread A resets ptr to null.
  7. Thread A executes the free() on a null pointer.
  8. Thread B executes the free() on the real pointer.

Running this in lockstep would have spotted a potential double free. Looking at this code in leap frog mode shows a different problem. It is possible to free() a null pointer which is undefined behavior in C and C++. Depending on which symptom you are trying to match, different scenarios help you to see different things.

Simultaneous Execution

On a single CPU machine, you are guaranteed that the CPU can only be executing 1 thread at a time. Even though it appears that two or more threads are executing simultaneously, the reality is that we are executing a little of one, then a little of the next, and so on. The machine loops through the list of runnable threads often enough that they all appear to be running at once to us.

On a multiple CPU machine, this is no longer true. You can have, at most, one thread per CPU executing at the same time. In other words, on a dual CPU machine, two threads can execute the following expression at exactly the same time.

unlocked++;

This can lead to a situation even more pathological than the lock step case we explored above. This is why some "multi-threaded" code works fine until the day it is run on a multiple CPU machine for the first time. There may have been race conditions that did not manifest because no threads were actually executing at exactly the same time. Once that situation occurred, these new race conditions manifested themselves.

Instruction Interruption

Many new MT programmers make the implicit assumption that each C expression is atomic. Actually, even on a single processor machine, we can be interrupted between any two assembler instructions. Here's an example to show what I mean.

  static int initialized = 0;
struct singleton onlyOnce;

if(!initialized++)
{
initialize( &onlyOnce );
}

The idea is that we only want to initialize the onlyOnce structure the first time through this code. The problem is that the if is much more complicated than it looks. In a kind of pseudo-assembler, the if would look something like this:

  move reg1, initialized
add reg2, 1, reg1
test reg1, 0
jumpne label

This piece of code could be interrupted before it starts, after it ends, or at any of the three points between instructions. Almost every C or C++ statement transforms into a set of instructions like this. Even on a single CPU machine, that means that you can be interrupted in the middle of almost any expression. Unlike many junior programmers seem to believe, ++i is no more atomic an operation than i = i + 1. (At least, you are not guaranteed that it will be any different.)

Summary

To summarize, many of the hardest problems to troubleshoot in a multithreaded application are race conditions. Race conditions are particularly hard to find, because they involve the interaction between two threads. This article lists four different ways to approach these interactions to help you see potential race conditions.

Posted by GWade at 04:21 PM. Email comments | Comments (0)

October 10, 2005

Review of Beyond the C++ Standard Library

Beyond the C++ Standard Library
Bjorn Karlsson
Addison-Wesley, 2006

If you've been programming in C++ in the past few years you've probably heard of the Boost library. Boost is a large peer-reviewed set of classes and libraries designed to augment the C++ standard library. Beyond the C++ Standard Library serves as an introduction to some of the classes and libraries available through the Boost project.

Although the book does not cover every library developed by the Boost community, it does introduce many of the most generally useful. Several of the libraries covered in this book have already been accepted by the C++ Standard committee as official extensions to the standard library under Technical Report 1 (TR1). So these libraries will soon be coming to a C++ compiler near you.

Almost every chapter began with two important sections:

  • How Does the X Library Improve Your Programs?
  • How Does the X Fit with the Standard Library?

The first section shows the benefits that you and your programs will gain from using this library. This section serves to describe the library's reason for existence. In some cases (like the shared_ptr classes), the purpose of the library is to replace hundreds of slightly incompatible, possibly broken, shared pointer implementations with a set that really work. In other cases (like the Lambda library), the purpose may be to provide you with functionality that you may not have even believed was possible in C++.

The second section shows how the library integrates with or enhances the functionality in the standard library. This may be the biggest difference between the Boost libraries and those written in your average programming shop. Many of the people that work with the Boost project were part of the C++ standards process. This means that they truly understand the library. These sections describe difficulties with the standard libraries that the library is meant to overcome or ways where the standard library can be improved with the benefit of almost a decade of real-world use.

These two sections are probably the most important part of each chapter. The rest of each chapter provides an overview of the library and some examples of usage. The overview lists the important members of each class with descriptions. The examples are generally useful and clear.

This book is definitely more than a rehash of the documentation at the Boost site. Each library is covered with enough information and examples to get you started working with the library.

If you are thinking about using the Boost libraries in any of your projects, I would recommend this book. I would also recommend the book if you are interested in becoming familiar with the TR1 additions that will soon be part of the standard.

Posted by GWade at 08:25 PM. Email comments | Comments (0)

October 06, 2005

To XML or not to XML...

I've been seeing a lot of comments in various forums similar to this comment by Christopher Diggins: XML down a slippery slope. In most of them, there is the implied belief that XML is the solution to all problems and that anything that violates the spirit of XML is to be condemned.

I have personally been working with XML since just after XML 1.0 became a recommendation. I've been through the everything in XML phase and come through it alive. The most important issue that people need to understand is that XML is just a data (or document) format; it is not a religion. It is a very useful data format. It can be a powerful way to represent some kinds of data, but it is not always the best way.

For example, I doubt that anyone would really recommend that we use the following XML anywhere:

<number type="integer" sign="positive">
  <thousand>2</thousand>
  <hundred>0</hundred>
  <ten>0</ten>
  <ones>5</ones>
</number>

Obviously, 2005 is more useful, even though marking up the number as above would be more in the spirit of XML. After all, you might need to do something specific with all numbers that have a 2 in the thousands position and you'll need a micro-parser to deal with this embedded format if you don't mark it up.

Some may consider this to be a bit of a strawman argument, but I would like to propose that it is actually one point along a continuum. Individual numbers and words obviously do not (necessarily) need to be marked up in XML. Just as obviously, a complicated, nested data structure or document greatly benefits from XML markup or something similar.

Diggins references a previous article, XML.com: Painting by Numbers with SVG, which covers a discussion on some of the reasons the SVG recommendation uses the micro format for the path element. According to the SVG Working Group, using an element-based path format could easily result in documents that were twice the size of the chosen approach. The working group decided that this was not acceptable.

In fact, their foresight has paid off. One of the areas where SVG has done very well is in mapping. Maps tend not to have many regular shapes. They are mostly built from paths. On those kinds of documents, the increase in size may be a factor of three or higher. In addition, some of these maps are very large. A factor of three for a multi-megabyte file is much different than a factor of three for a 10K file.

The important thing to remember is that XML is not a religion, it is only a tool we use in solving problems. Every tool involves tradeoffs. Sometimes we decide in the direction of purity of expression, sometimes we bow to practicality. While I personally don't particularly like the path element, I understand the tradeoffs involved. Just as importantly, I'm not sure I would have done differently in their shoes.

It may be useful to consider that the people who worked on these standards were not stupid. They thought and worked on the standard for a long time. Any compromises they made to purity were probably carefully considered. This does not mean that they are always right, but it does mean that second-guessing them without considering all of the use cases they considered might be a bit rash.

Posted by GWade at 07:14 AM. Email comments | Comments (0)