This site will look much better in a browser that supports web standards, but is accessible to any browser or Internet device.

Anomaly ~ G. Wade Johnson Anomaly Home G. Wade Home

October 23, 2005

Unintuitive Multithreading: Troubleshooting

In the previous four articles in my Unintuitive Multithreading series, I focused mostly on design of a multithreaded application. This time, I plan to cover a few techniques that I have found helpful in troubleshooting multithreaded applications.

To some extent, the normal techniques you have used for troubleshooting will work in a multithreaded program. But, some of the problems in a multihreaded program will defy your normal tactics. A large fraction of these problems will turn out to be race conditions. By their nature, race conditions are hard to duplicate and are not easy to reason about. There are a set of models that I have found that sometimes shed light on these kinds of problems. These models are:

  • Lock Step
  • Leap Frog
  • Simultaneous Execution
  • Instruction Interruption

Each of these approaches helps you to visualize ways in which threads may interact to cause you problems.

Lock Step

The Lock Step model is fairly simple. Pick two threads that might interact and
mentally execute them one statement at a time, thread A, then thread B, then A,
then B... Some of the most classic race conditions are instantly identifiable
with this technique. For example, in trying to avoid the cost of a mutex, some programmers new to multithreading will try something like this.

  static bool locked = false;

while(locked) ; // wait on the lock

locked = true;
do_critical();
locked = false;

The lock step scenario shows the fallacy here immediately.

  1. Thread A executes the while.
  2. Thread B executes the while.
  3. Thread A sets locked to true.
  4. Thread B sets locked to true.
  5. Thread A executes the do_critical().
  6. Thread B executes the do_critical().
  7. Thread A sets locked to false.
  8. Thread B sets locked to false.

Without an understanding of multithreading, a junior programmer might convince himself that the code would work. And it will. Most of the time. Since it has to work all of the time, we have a problem.

Let's take this example a little farther. Say our junior programmer decides to get a bit more clever.


static int locked = 0;

while(locked++) // wait on the lock
--locked;

do_critical();
locked = 0;
non_critical();

This won't fall to the lock step model from before, but it will cause problems if we lockstep with an offset.

  1. Thread A executes the do_critical().
  2. Thread B executes the while(locked++).
  3. Thread A sets locked to 0.
  4. Thread B decrements locked to -1.
  5. Thread A executes the non_critical().
  6. Thread B executes the while(locked++) and continues to be locked out.

Walking through the code in lock step with a couple of threads can point out subtle timing issues in your MT code. The threads don't have to be executing the same code for this approach to work. There can be an offset as in the second example, or the threads could be executing completely separate pieces of code.

It should be obvious that if you don't do the lock step exactly correct, you may miss a race condition. This is why most race conditions succeed most of the time. The key to using this technique is to walk through the suspected code with an awareness for ways that things might go wrong. If something looks suspicious, change the offset between the threads and walk through again. You'll often find that you become more and more suspicious of a piece of code as you get closer to the right timing.

Leap Frog

The Leap Frog model is similar to lock step except that you execute more than one instruction at a time, with first one thread leading then the other leading. This approach is often useful in cases where using lock step had made you suspicious of some code but had not actually triggered a race condition.

   if(ptr)
{
void *tmp = ptr;
ptr = 0;
free( tmp );
}

  1. Thread A tests ptr.
  2. Thread B tests the ptr.
  3. Thread B saves ptr in tmp.
  4. Thread B resets ptr to null.
  5. Thread A saves ptr in tmp.
  6. Thread A resets ptr to null.
  7. Thread A executes the free() on a null pointer.
  8. Thread B executes the free() on the real pointer.

Running this in lockstep would have spotted a potential double free. Looking at this code in leap frog mode shows a different problem. It is possible to free() a null pointer which is undefined behavior in C and C++. Depending on which symptom you are trying to match, different scenarios help you to see different things.

Simultaneous Execution

On a single CPU machine, you are guaranteed that the CPU can only be executing 1 thread at a time. Even though it appears that two or more threads are executing simultaneously, the reality is that we are executing a little of one, then a little of the next, and so on. The machine loops through the list of runnable threads often enough that they all appear to be running at once to us.

On a multiple CPU machine, this is no longer true. You can have, at most, one thread per CPU executing at the same time. In other words, on a dual CPU machine, two threads can execute the following expression at exactly the same time.

unlocked++;

This can lead to a situation even more pathological than the lock step case we explored above. This is why some "multi-threaded" code works fine until the day it is run on a multiple CPU machine for the first time. There may have been race conditions that did not manifest because no threads were actually executing at exactly the same time. Once that situation occurred, these new race conditions manifested themselves.

Instruction Interruption

Many new MT programmers make the implicit assumption that each C expression is atomic. Actually, even on a single processor machine, we can be interrupted between any two assembler instructions. Here's an example to show what I mean.

  static int initialized = 0;
struct singleton onlyOnce;

if(!initialized++)
{
initialize( &onlyOnce );
}

The idea is that we only want to initialize the onlyOnce structure the first time through this code. The problem is that the if is much more complicated than it looks. In a kind of pseudo-assembler, the if would look something like this:

  move reg1, initialized
add reg2, 1, reg1
test reg1, 0
jumpne label

This piece of code could be interrupted before it starts, after it ends, or at any of the three points between instructions. Almost every C or C++ statement transforms into a set of instructions like this. Even on a single CPU machine, that means that you can be interrupted in the middle of almost any expression. Unlike many junior programmers seem to believe, ++i is no more atomic an operation than i = i + 1. (At least, you are not guaranteed that it will be any different.)

Summary

To summarize, many of the hardest problems to troubleshoot in a multithreaded application are race conditions. Race conditions are particularly hard to find, because they involve the interaction between two threads. This article lists four different ways to approach these interactions to help you see potential race conditions.

Posted by GWade at 04:21 PM. Email comments | Comments (0)

September 25, 2005

Unintuitive Multithreading: Simpilicity

Continuing my exploration of misunderstandings about multithreading, this essay is about simplicity. If you are interested in the previous essays in this series, they are listed below:

This time I plan to start with an assertion that almost everyone will agree with and push it more than most people would. The assertion is fairly obvious:

Simple multithreaded code works better than complex multithreaded code.

It seems that most programmers make code more complicated than it needs to be. Sometimes, we get a chance to simplify or refactor complex code; but more often it just continues to get more complicated. This is as true of single-threaded code as multithreaded code. Sometimes that complexity is necessary because the problem that we are solving is complex. Sometimes the complexity is just because we did not have the time to make it simpler.

With single-threaded code, you have a chance of bulling through the complexity and forcing it to work even if a simpler solution is possible. I contend that you do not have this luxury with multithreaded code. It is said that the biggest limitation in developing code is the limited size of developers' brains. We can only understand so much complexity. A multithreaded application is usually much more complex than an equivalent single-threaded application (ignoring for the moment that an equivalent program may not be possible). The problem is in the number of interactions between threads, both intended and unintended.

Given this complexity problem, the safest approach is to make each thread as simple as possible. One good approach is to make each thread perform only one function. This is the same approach to complexity we have been using to improve functions and objects for years, so it should not come as much of a shock. However, I have seen quite a few designs where people load functionality into individual threads until they are small applications of their own. The idea of a worker thread or a processing thread in a pipeline is to separate the functionality into bite-sized chunks. If the functionality of a given thread is simple enough, you can actually understand the interactions between it and the other threads.

Although there may be an application where some threads need to be extremely complex with extensive communications between threads, this is much less likely than people seem to believe. There are a handful of proven threading approaches. In general, you are probably safer using one of them than creating your own unstudied approach. In many cases, using one supervisor or master thread to farm out work to simpler threads is a good solution. The pipeline model also works well for jobs that work well in stages.

An Example

One example that I can remember was a variation of the worker thread approach. A master thread parceled out work to a small set of worker threads that it created. To reduce the overhead of creating threads, the master thread separated the work into a separate queue for each thread. The worker threads would then process the work in the queue and place the output on separate output queue for each thread. Another thread (one per worker) would copy the data from this output queue to the main output queue. When the worker thread finished it's queue it would go back to the master thread for more work.

Given N worker threads, this approach uses 2N+2 threads:

  1. 1 master thread
  2. N worker threads
  3. N output queue copy threads
  4. 1 output thread

In addition, this approach used 2N+2 queues.

  1. 1 master input queue, only touched by the master thread
  2. N worker input queues
  3. N worker output queues
  4. 1 output queue

Some of the unneeded complexity results from having two queues for each worker thread, one for input and one for output. Each of these queues needed to be maintained by the worker thread, making the worker thread more complicated. In addition, the master thread was now more complicated by the need for configuration options for determining how much of the work should be partitioned to each worker at a time. Additionally, we had a series of output threads, that just took results off the individual output queues and placed them on one master output queue.

The (potential) advantage is that the individual threads' input and output queues need not be synchronized. However, this advantage was nullified by the fact that the main thread became a bottleneck for loading input queues and the main output queue still needed to be synchronized.

A simpler approach would be to reduce the number of queues. We only need one work queue and one result queue. Each of these would need to be synchronized, of course, because multiple threads would manipulate them at the same time. The worker threads are now much simpler. They read from the work queue and write to the output queue. The threads that copied from the individual output queues to the master output queue are now gone. This leaves us with N+2 threads and 2 queues.

This design is less flexible, but it is much easier to understand and maintain. It also makes better use of concurrency. Each worker thread processes one piece of work from the queue at a time. If it finishes its work early, it can get a new chunk from the work queue. The only time a worker thread will not be able to get more work is when all of the work is done or in process. In the original approach, a thread could finish all of its work while the other threads still had items waiting in their queues. This thread is now left idle, even though there is work to be done.

The real irony of this, is that the original design was actually configured to only place one work item at a time in the individual work queues. This means that the extra flexibility was not being used for the case we examined. The extra mechanisms and flexibility made the code more complicated with absolutely no benefit.

Redundant Work

Another mistake I see in multithreaded code is in having multiple threads doing exactly the same work. I don't mean doing the same work on different data, I mean doing the same work. Obviously, if one thread repeats the work done by another thread we've reduced the efficiency of our application.

Asynchronous I/O is one place where I often see this problem. In most cases, I see it as unnecessary overhead because the user code ends up duplicating part of the work that the OS has already done. The OS

  1. determines that a piece of I/O has completed,
  2. finds the thread that has to process the work
  3. and wakes up that thread.

The thread then

  1. looks at the information it is responsible for
  2. determines which of the outstanding I/O operations has now completed
  3. processes the I/O

The first two steps just repeat work that the OS has already done.

While this approach is necessary in the OS kernel or in some embedded systems code, it seems to be redundant when dealing with user-level code. I have seen many arguments for asynchronous I/O in user programs, but I have not yet been convinced.

Summary

This essay turned out to be longer than I intended. Explaining simplicity is harder than it looks. As I said in the beginning,

Simple multithreaded code works better than complex multithreaded code.

Keeping this in mind and striving for simplicity in each thread will make your multithreaded code easier to get right and to maintain.

Posted by GWade at 04:36 PM. Email comments | Comments (0)

September 03, 2005

Unintuitive Multithreading: Communication Between Threads

This essay continues my exploration of misunderstandings about multithreading. In the first essay of this series, Unintuitive Multithreading: Speed, I explained that multithreading does not inherently speed up a process. In the second essay Unintuitive Multithreading: Waiting for Performance, I showed how to achieve better performance through waiting. In this essay, I plan to attack a common cause of reduced concurrency that foils most multithreaded projects. This is going to be a long one, so hold on.

In most of the disappointing multithreading projects that I have seen, the most common problem is that the programmer does not understand the concept of concurrency. Many programmers try to multithread code by haphazardly partitioning the code into separate threads without a clear plan. Then, they sprinkle mutexes around until the code appears to work. Programs built this way never work consistently and very rarely run much faster than the single-threaded version would have (or did).

A key concept of concurrent programs is that any communication between threads reduces concurrency. Said another way, multithreaded programs are most efficient when there is no communication between the threads. Now obviously, very few programs would be useful if there is no communication between the threads. If we want to be really precise, there should be as little synchronous communication between threads as possible. The more synchronous communication we have, the worse the performance will be. Whenever we have synchronous communications, we require one or more threads to wait for other threads.

Any time one thread may need to wait for another to finish some activity, the threads are said to synchronize. Obviously, when one or more threads are waiting for another thread to complete some action, they are not running concurrently with that thread. Although this statement is obvious, it is still worth pointing out because the larger the number of synchronization points in the code, the lower the overall potential concurrency in the program. This also explains why the common approach of scattering mutexes through code to make it multithreaded does not work well in practice.

Reduce Explicit Synchronization

To carry this point through a little further, we need to reduce synchronous communications between threads to increase concurrency. The standard Producer/Consumer model is a good place to start. The idea is to have Producer threads generating chunks of work that are placed in a protected queue. Now, one or more Consumer threads can take work items out of the queue and work on them. This asynchronous communication saves the Producers from waiting until a Consumers is ready. The only time Consumers will wait is when there is no work to do. The one synchronization point is the queue, itself. If the system is written correctly, putting something into the queue or taking something off the queue should not take much time, so no thread should have to wait for long on that operation. Other than this one point, there should be no further communication between the Producers and the Consumers. In addition, no two Consumers should communicate with each other.

You normally see across two special cases of the Producer/Consumer model. One has a single Producer thread and many Consumer threads. The other has many Producer threads and one Consumer thread. Depending on the amount of work each thread needs to do, both of these variations work quite well.

A common model uses a master thread to pass work to worker threads through a work queue. This approach works particularly well if the workers can complete their tasks independently of the master or the other workers. (Think web server.)

Watch for Implicit Synchronization

Possibly the most surprising form of communication between threads is implicit synchronization caused by system calls or library calls. The one that bites people the most often is probably memory allocation. Many memory allocation libraries have a single memory arena shared by all threads. Therefore, any time two threads allocate memory at the same time, one will have to wait. The library must synchronize access to the internal memory structures, otherwise the heap could be corrupted by two or more threads changing those structures at the same time.

There are a few ways to avoid this overhead. Some systems provide a special multithreaded memory allocator that keeps separate memory accounting for each separate thread. While this minimizes synchronization, it may waste a lot of memory. A second approach is to write your own thread-specific allocator for critical classes. This is relatively hard to do correctly. The final approach is to try to reduce the amount of dynamic allocation you do in the threads. For example, if most of your objects are built on the stack, you won't have a the synchronization problem. If you use the worker thread model above, you can strive to have the master thread do the allocations and have the worker threads focus on working with the allocated memory.

There is no single, ideal solution to the implicit synchronization problem. You will need to measure and examine your costs and benefits, and make the appropriate trade-offs.

If you are working in Java, the memory allocator is not really under your control. You also don't have the option of building objects on the stack. On the bright side, you can hope that the memory allocator has been written with threads in mind.

Don't Synchronize on Read-only Resources

Probably the most useless form of synchronization I am aware of is providing mutex protection for a read-only resource. If the resource does not change, no synchronization is needed. There is no way for the resource to be in an indeterminate state, because it does not change.

Let's say for the sake of argument that someone writes a program that uses a dictionary to convert keywords into a set of other words to use for further processing. If this dictionary is read from disk at program startup and never changes, then the various threads in the program can all access it at the same time without synchronization and there is no problem. As long as no thread ever modifies the dictionary when other threads could be using it, a mutex is a waste of time.

On the other hand, if this dictionary is updated by any thread, it will need some protection. If the updates are relatively infrequent compared to the number of reads, a read/write lock can provide a much smaller overhead for multiple readers and only really cost if a thread needs to write.

This shows one way to recognize resources that do not need to be protected from asynchronous accesses. If a resource is never written after the threads are started, it doesn't need protecting. If a resource is only infrequently written, protect it with a read/write lock. If a resource is written regularly by multiple threads, protect it with a mutex.

Summary

One of the biggest enemies of multithreading performance is synchronous communication between threads. Any time threads must synchronize to do their jobs they are not running concurrently. This reduces performance. This is the fundamental reason why putting mutexes throughout a program is not the way to make a multithreaded program.

Posted by GWade at 11:42 PM. Email comments | Comments (0)

August 23, 2005

Unintuitive Multithreading: Waiting for Performance

This essay continues my exploration of misunderstandings about multithreading. In the first essay of this series, Unintuitive Multithreading: Speed, I explained that multithreading does not inherently speed up a process. In this essay, I plan to show how to not achieve more performance from a multithreaded system.

Many new multithreading (MT) developers make the same mistake when trying to speed up their first MT application. Assume a new MT programmer named George has just rewritten his program to use threads and finds that the performance is not much better than it was before threads. What is he going to do? George digs through the documentation for his threading library and finds that something called 'priority' determines which threads get executed before other threads. Obviously, he just needs to raise the priority of his most important thread and all will be well.

Unfortunately, George finds that this does not work as he expects. Sometimes his code executes a little faster, but sometimes it executes a lot slower. Once or twice it even locks up and has to be terminated externally. Like many first-timers, George immediately begins tinkering with the priorities of other threads. Each change results in similar inconsistent results. What could he possibly do to fix the problem?

Although George is not a real person, the example follows a progression I've seen several times. The original problem was either that the program was not well-suited to multithreading or George made a mistake partitioning the code into threads. Neither of these would be fixed by changing priorities.

After dealing with this problem time and time again, I've come to the conclusion that the first rule of thread priority is:

1. Don't change the priority of your threads.

Two likely results of changing thread priorities are priority inversion and thread starvation. Both occur because a thread has a higher priority than it should in the context of the surrounding program. This leads directly to the second rule of thread priority:

2. If you are absolutely sure that you know what you are doing and still want to raise a thread's priority, you are wrong.

If you think this rule is harsh, you haven't dealt with as many well-meaning, over-confident Georges as I have. In actuality, this rule is similar to the rule about avoiding hand-optimizing code. Modern compilers do a much better job of tweaks to improve the run-time of code than most programmers ever can hope to. Likewise, many modern threading systems perform minor tweaks to a thread's priority as needed. If you mess with a thread's priority, you will probably defeat that optimization.

The problem is that programmers are, in general, no better at recognizing priority issues than they are at recognizing the 10% of their code that consumes most of the running time. To make this point, I'll use an example we used to teach in one of my programming classes to explain this. Let's say I have a program that consists of three threads. All three threads must complete before the program is complete. At a high level, the threads are

  • Thread A does pure computations. It is just dependent on the CPU.
  • Thread B is writing data to the hard disk.
  • Thread C is writing reports directly to an old printer.

This is obviously contrived, but humor me so we can figure out how to set the priorities for these threads. Now George looks at this problem and decides that thread A is doing the most work and requires the CPU to do it, so it should have the highest priority. The printer will take forever to write anything, so he decides that thread C should be the lowest priority. That way no other thread will be waiting on it. That puts thread B as with the middle priority.

Once again, George has chosen his priorities in a way that makes some sense, but does not work well for a multithreading problem. In this particular case, we will generate the worst possible runtime. Thread A will run to completion with the least number of interruptions (and context switches). Then the program will spend a lot of time waiting and doing nothing as each of the other two threads slowly send out their data. Remember that the goal was to get all of the threads to complete as quickly as possible, not to get the calculation over with as fast as possible.

The key to this problem is to make the thread that does the least work in between blocking calls the highest priority. Multithreaded apps that make use of the hurry up and wait principle are often the best performers. So thread C needs to send individual bytes down to the printer and wait a long time for the printer to do it's thing. So we want to start on this task as soon as possible every time we can write. Thread B can write more to disk faster than our printer thread, so it should be lower priority than thread C. That way if both are ready to work, C will get a chance to start first and go back to waiting.

The lowest priority is now thread A. It will now be interrupted more and will therefore have a longer run-time. But, thread A's time will be interleaved with the waiting for the other two threads. So the overall performance is better. This is one of the surprising effects that makes MT worthwhile even though it slows down CPU-bound threads.

The bad news is that most MT problems are not this easy to characterize, so there may not be a clear-cut answer. The threading system can monitor the performance dynamically and optimize better than we could. This shows once again that we are better off leaving priorities to the threading system.

The third rule of thread priority is likely to confuse more people than the previous two combined.

3. If you must change a thread's priority, lower it to speed up the code.

Most modern threading systems will automatically raise the priority of a thread that spends most of its time waiting. They will also lower the priority of a thread that uses all of its time-slice. If your system does not do this, consider lowering the priority of any CPU-bound threads by a small amount to allow blocked threads to get run and go back to waiting. It is often surprising how this can produce better performance.

This will only really work if the MT code is structured correctly. We'll look into that issue a little more next time.

Posted by GWade at 10:46 PM. Email comments | Comments (0)

August 20, 2005

Unintuitive Multithreading: Speed

This begins a short series of essays on what most programmers get wrong about multithreading. Over and over again, I've seen programmers make the same mistakes in multithreaded applications on multiple operating systems and in multiple languages. So I decided to give you the benefits of what little insight I have into the problem.

The first problem most programmers have with multithreading is extremely fundamental: multithreading an application does not speed it up. In fact, the decision to multithread a program is rarely about raw speed. I expect that some of the people reading this are going to decide that I have no idea what I'm talking about at this point. However, I can back up this statement.

Given a single CPU machine and process that is CPU-bound, adding threads must slow down the program. In addition to the work that we were already doing, we now have to spend extra time doing context switches. Therefore, for the CPU-bound program, threading is guaranteed to slow down the program. (Of course, this only applies to preemptive multithreading.)

If raw speed is not the issue, why would we multithread a program? On a single CPU system, there are only two real reasons to use threads.

  • responsiveness
  • resources that block

Responsiveness

The main reason that preemptive multitasking has become popular in recent years is responsiveness. One process or thread should not be able to lock up the whole machine when the user wants access. The average user does not care if some process running in the background takes a few seconds longer to run as long as the computer responds immediately when a key is pressed or the mouse is moved. Although the computer is actually doing things slower, it feels faster because the computer is more responsive.

Many years ago, I was working in medical research. There was a program written by one of the programmers that normally ran over the weekend. Unfortunately, the guy that wrote it did not write any output to the screen or disk until the program was finished running. So, on Monday morning, we were often confronted with a blank screen and we didn't know if the program was still running, locked up, or what. We had no feedback at all. (This was back in the days of DOS, so we couldn't do anything else while the program was running either.) Eventually we changed the program to add a little progress counter to the screen, so we could at least tell if the program was still running. We used to joke about how much faster this made the program. Even though we knew the extra output was slowing down the real work, the feedback made it seem faster.

This concept is what drives most multithreading development. We want instant feedback and responsive computers. Things can actually run slower, as long as we get these two things.

Blocking

The other reason for multithreading is that not all processes are CPU-bound. At some time, most programs must do something that blocks progress. We might need to read data from disk to do further calculations, or retrieve data from a database, or even get user input. While this process or thread is blocked, it would be nice to allow the CPU to do other work. Threading makes this possible. (Actually, we used to do this with a cruder form of multitasking years ago. But, threads do make it easier.)

With multithreaded systems, the CPU can ignore threads that are blocked and continue with threads that are ready for work. This makes better use of the CPU and allows us to appear to be doing more than one thing at a time. In fact, by carefully organizing our code to keep the processor busy, we can appear to be running faster by keeping the CPU busy. In this case, we are not really speeding up the code. We are just no longer wasting CPU cycles.

Multiple CPUs

Of course, the entire discussion above assumes a single CPU. If we have multiple CPUs in a system, the situation is only slightly different. We can get more work done at a time because there is more than one CPU, but the problem remains the same. We can not get more work done than we have CPUs.

In a later essay, I'll delve into a piece of advice I received a long time ago that you should never have more threads than you have CPUs (plus a few extra to to take advantage of blocking calls). This turns out to be like most multithreading advice, part right, but mostly wrong.

Posted by GWade at 10:17 PM. Email comments | Comments (0)

December 01, 2004

Mutexes Protect Resources

This is a very simple concept that seems to escape many people. In fact, most of the multi-threaded code I've seen shows a distinct misunderstanding of this basic fact. Very often people seem to think that a mutex should protect a section of code. I think this misconception may have been caused by the term critical section or critical region.

In my computer science courses, the concept of a critical region was used to describe a section of the code that must not be interrupted. Somewhat later, this concept was weakened to say that we did not want some certain other sections of code to execute at the same time as this one. In some forms of programming like embedded systems, real-time systems, and some kernel development, you really have sections of code that are timing sensitive and cannot be interrupted. Most of the rest of us don't really have those kinds of limitations.

In most cases, what we really need is a way to prevent two threads from performing conflicting operations on some resource. The aim is the same, but the focus is different. This change in focus helps solve many of the mutex-related problems I have seen in multi-threaded code.

Other than just forgetting to unlock a locked mutex, almost all of the mutex-related problems I've seen fall into three major categories:

  1. Not locking everywhere locking is needed.
  2. Trying to do too much inside the lock.
  3. Smearing mutexes on some code to make it thread-safe.

Each of these bugs can be traced directly back to not connecting the mutexes with a resource. The first problem is usually caused by locking code that changes a resource without properly locking the code that reads the resource. I've also seen two (or more) mutexes used to lock different forms of access to the resource. This makes sense if you are preventing a particular piece of code from executing in two threads at once, but it does not properly protect the resource.

The second problem is often shown by either acquiring too many mutexes or attempting to execute a large amount of code without interruption by holding a mutex. Another symptom of this problem is acquiring the mutex in one function/method and then having to remember to call another function to release it. This is often caused by trying to protect code instead of a resource.

The last problem is particularly a problem for people who only vaguely understand what goes on in a multi-threaded program. These programmers start out by adding a mutex to prevent some problem that they are seeing and acquiring and releasing that mutex in various places until the program seems to work. When another problem comes along, they repeat this "successful" strategy to solve the new problem. Eventually, the code is peppered with mutexes, locks, and unlocks, and no one can predict how it will work.

In trying to solve these problems, you normally have to apply some relatively simple rules to your use of mutexes.

  1. Only acquire the lock when necessary.
  2. Acquire the lock everywhere that is necessary.
  3. Don't hold the lock longer than necessary.
  4. Don't acquire more locks than you need.

By focusing on the resource instead of the code that we don't want to interrupt, a simpler set of rules arises.

  1. One one mutex per shared resource.
  2. Acquire the lock everywhere the shared resource is accessed.
  3. Don't hold the lock longer than necessary.

The first important point is that only shared resources need protection. If the resource is never accessed from more than one thread, it doesn't need this kind of protection. The second important point is that access to the shared resource should be controlled through a small number of methods. This allows you to control the acquisition/release of the mutex much more carefully. If you can't control access to the shared resource and must have acquire/release code scattered around, then you need to revisit your design.

In my experience, the simpler the use of shared resources and their associated mutexes, the more likely you are to get them right. In single-threaded code, a simpler design and implementation is more likely to be correct. Simplicity is even more important when multiple threads are involved. Using mutexes only to protect resources seems to generate a simpler design.

Posted by GWade at 08:28 AM. Email comments | Comments (0)

January 15, 2004

Thread death

There is a subtle aspect of threads that may be worth exploring having to do with the concept of joining. Many threading systems support the possibility of waiting until a thread completes in order to retrieve its exit state. This is often started through a function called join(). In order to support this feature, a terminating thread may remain in memory until it is joined. This allows the join() function to work the same way (except without suspending) on a thread that is running or on one that has already finished.

Unfortunately, these dead threads consume system resources. At a minimum, they are taking up a spot in the thread maintenance data structures in the OS. They may also retain memory or other resources. In order to clean up these resources, the thread must be join()ed.

This can be inconvenient. In some cases, the threads are launched without any need to know how they exited. Some threading systems allow setting the state of a thread such that it is not joinable. This means that when it finishes executing, the thread will be discarded automatically.

A related issue has to do with exitting the program and running threads. In a multithreaded program, how does the system decide when to exit the process that contains these threads? There are basically three approaches:

  • The process exits when all threads exit.
  • The process exits when the main thread exits.
  • The process exits when all important threads exit.

The first can be inconvenient, even if it is easy to understand. The second is easy to mess up. For example, you can launch all of your worker threads in the main thread and then exit because the main thread has nothing left to do. Suddenly, the program is gone and no work got done. This can be solved by having the main thread wait for some signal from other threads that it is time to shut down.

The Java threading library takes a different approach. Normally, a program runs until all of its threads are dead. However, a thread can be marked as a daemon. In the Java terminology, the process exits when all non-daemon threads are dead. This allows one to choose any of the other two approaches or something else entirely.

The only problem I have with this approach is the name. A daemon thread or process has certain connotations on a Unix-like system. I don't think the fact the thread will not keep the program alive is the defining characteristic of a daemon.

Posted by GWade at 11:03 AM. Email comments | Comments (0)

Worker thread patterns

Any system that uses the worker pattern may also want a dispatcher thread that wakes up workers and sends them on their way. In this approach, the dispatcher thread handles setting up the data needed to finish the task. The dispatcher may either choose a worker from a pool of suspended threads or create a new thread to perform the work. Many server programs use a similar approach. In that case, the dispatcher thread may be waiting on a socket and passes the request to a worker when the request is received.

Another approach would be to make all of the threads identical (no dispatcher thread). The worker threads could wait on a synchronized work queue object or socket. As each request is received, a thread is awakened by the OS and given control of the request. Although it sounds a little strange not to have control over which thread does what work, this can actually be a very efficient way to work.

This latter approach is called the team model in Tanenbaum's Modern Operating Systems.

Posted by GWade at 10:58 AM. Email comments | Comments (0)

Threading Patterns

Possibility of thinking of multithreading design patterns as a way to organize threading code.

Some possible patterns include:

  • Fire and Forget (background thread)
  • Periodic (Timer)
  • IO thread
  • Producer/Consumer
  • Worker thread (work queue)
  • Copier thread
  • Pipeline (must be careful)
  • Swarm?

The background thread could be joined at a later time or a thread that runs it's course and dies.

We need a good term for the latter. Java calls them daemon threads. I don't think the term conjures the right connotations.

Posted by GWade at 10:55 AM. Email comments | Comments (0)