High Performance Readers-Writer Locks

Evgeniy Makeev

 

 

Everyone experienced in multithreaded programming is (or at least should be) familiar with the concept of Readers/Writer locks.

Readers/Writer lock is one of the fundamental concepts of multithreaded programming, it allows multiple ÔreadersŐ to access a shared data or execute a critical section of the code in parallel while granting an exclusive access to a synchronized data or the critical section of the code to a ÔwriterŐ.

 

 

 

If multiple application threads share a common object or a critical section of the code, three scenarios of the object access contention are possible:

1) all threads read the shared object which is never modified,

2) all threads modify the shared object and

3) some threads read and other threads modify the object.

 

In the first scenario no multithreaded synchronization is needed since the object is effectively constant and can be accessed safely in any concurrent manner.

The second scenario calls for an exclusive lock by a single thread – only one thread can modify the object at any given time; the rest should wait their turn. A simple mutex or Win32 Critical Section would be the most efficient means of synchronization in this case.

The third case often requires a more comprehensive solution – Readers/Writer lock.

 

A Readers/Writer Lock (or RWLock for short) is a mechanism that allows an arbitrary number of reader threads or alternatively, only one writer thread to access a shared object or resource at any given time. It insures the integrity of a shared resource in multithreaded environment while optimizing the applicationŐs access to it.

 

In most Readers/Writer scenarios, shared data is read more often then it is modified (written).

A dynamically configurable multithreaded network server application is an example of such a scenario. When the serverŐs thread serves a request it needs a read access to the server configuration, which is a resource shared between all server threads.

Ones in a while an administrator may want to reconfigure server on fly – without restarting it which in effect would modify the configuration data. One technique used to accomplish such dynamic reconfiguration requires usage of readers/writer locks.

If we were to use a simple mutex lock here we would pretty much loose most of the benefits of multithreading since only one application thread would be able to serve a request which requires accessing the shared configuration data.

 

Obviously, since the shared data is not modified very often, we want to optimize reader access in this type of applications.

There are many different Readers/Write Lock implementations ranging from the classic implementations, which use simpler OS synchronization primitives such as Critical Sections and Semaphores on Windows or UNIX pthread mutexes and conditional variables to using low level Spin Locks and raw interlocked operations.

Readers/Writer locks are the most beneficial when an application requires frequent read access to a data or a critical section and only occasional write access. And, as any synchronization object/concept, readers-writer locks introduce an extra overhead.

Unfortunately, in many classical readers-writer lock implementations this overhead disproportionally penalizes the locksŐ read operations, which are supposed to be lightweight and fast.

 

In this paper I am going to describe one simple readers-writer lock implementation that I have written over ten years ago for a high performance multithreaded Win32 network server.

 

This lock uses a combination of fast atomic operations and standard synchronization mutex (or Win32 critical section) to achieve fast reader locking as well as safe writer locking and to avoid writer starvation.

 

I will also evaluate performance of various readers/writer locks and compare them to the lock introduced in this paper.

 

While both operating systems provide atomic/interlocked functions, I decided to implement them in inline assembly for Intel CPU under Windows and Linux.

The primitives are: atomic increment/decrement, atomic compare and exchange, etc.

I will also use a standard exclusive lock provided by the corresponding operating system such as mutex or CriticalSection. The class SystemLock will wrap this locking functionality.

 

Each lock implementation will provide the following essential locking methods:

 

void enter_read();

void leave_read();

void enter_write();

void leave_write();

 

 

LetŐs look at NonblockingRWLock class below (the full source can be found here: http://makeev.org/rwlock/src/html/rwlock_8h-source.html):

 

class NonblockingRWLock

{

public:

   

    NonblockingRWLock() : readers(0), writers(0) {}

    virtual ~NonblockingRWLock() {}

//

// Must be called at the beginning of any read/data access.

//

    void enter_read()

    {

      atomic_inc(&readers);

      if(writers > 0)

      {

            atomic_dec(&readers);

            l.lock();

            atomic_inc(&readers);

            l.unlock();

      }

    }

 

//

// Must be called at the end of any read/data access.

//

    void leave_read()

    {

      atomic_dec(&readers);

    }

 

    //

    // Must be called at the beginning of a write operation

    //

    void enter_write()

    {

      l.lock();

      atomic_inc(&writers);

      while(readers) thread_yield();

    }

    //

    // Must be called at the end of a write operation

    //

    void leave_write()

    {

      atomic_dec(&writers);

      l.unlock();

    }

 

private:

    volatile int readers;

    volatile int writers;

    SystemLock l;

};

 

Figure 1.

 

Notice that reader lock/unlock do not use SystemLock if there is no contention with a writer, it means that reader locks are very fast and have a minimal overhead in the absence of a writer and they are only slightly slower then a regular mutex or critical section locking when there is the contention. The writer locks on another hand always use SystemLock and have a minimal overhead in addition to it.

 

Below, is the performance comparison of different readers/writer lock implementations.

The full sources of the locks as well as the testing framework and additional performance data can be found here:  http://makeev.org/rwlock/rwlocks.html

 


Figure 2.

Figure 3.

Figure 4.