Array Based Queuing Locks

Array-Based Queuing Lock (ABQL) is an advanced lock algorithm that ensures that threads spin on unique memory locations thus ensuring fairness of lock acquisition coupled with improved scalability.

Overview

Synchronization is a major issue in the designing and programming of shared memory[1] multiprocessors. The common problem with lock implementations is the high network contention due to the processors spinning on a shared synchronization flag or memory location. Thus the scalability of the lock is reduced significantly in terms of the number of contending processors.

Array Based Queuing Lock is an extension to ticket lock algorithm which ensures that on a lock release only one processor attempts to acquire the lock resulting in decrease in the number of cache misses. This can be achieved by having all the processors spin on unique memory locations.[2] One analogy used to explain the lock mechanism is the relay race where the athlete passes on the baton to the next athlete in queue which ensures that only one athlete acquires the baton.

ABQL also guarantees fairness in lock acquisition by using first in, first out (FIFO) queue based mechanism. Additionally, the number of invalidation is reduced significantly as opposed to ticket based lock implementation since only one processor incurs a cache miss on a lock release.

Implementation

The foremost requirement of the implementation of array based queuing lock is to ensure that all the threads spin on unique memory locations. This is achieved with an array of length equal to the number of threads which are in contention for the lock. The elements of the array are all initialized to 0 except the first element which is takes the value 1, thus ensuring successful lock acquisition by the first thread in the queue. On a lock release, the hold is passed to the next thread in queue by setting the next element of the array to 1. The requests are granted to the threads in FIFO ordering.

Pseudo Code example is listed below.[3]

ABQL_init(int *next_ticket, lnt *can_serve)
{
*next_ticket = 0;
for (i=l; i<MAXSIZE; i++)
can_serve[i] = 0;
can_serve [0] = 1; 
}

ABQL_acquire{int *next_ticket, int *can_serve)
{
*my_ticket = fetch_and_inc(next_ticket);
while (can_serve [*my_ticket] != 1) {}; 
}

ABQL_release (int *can_serve )
{
can_serve[*my_ticket + 1] = 1;
can_serve [*my_ticket] = 0; // prepare for next time
}

To implement ABQL in the pseudo code above, 3 variables are introduced namely can_serve, next_ticket and my_ticket. The roles of each are described below:

In the initialization method (ABQL_init), the variable next_ticket is initialized to 0. All the elements of the can_serve array except the first element are initialized to 0. Initialization of the first element in the array can_serve to 1, ensures successful lock acquisition by the first thread in the queue.

The acquire method uses an atomic operation fetch_and_inc to fetch the next available ticket number (after incrementing the ticket number by 1) that the new thread will use to spin on. The threads in the queue spin on their locations until the value of my_ticket is set to 1 by the previous thread. On acquiring the lock the thread enters the critical section of the code.

On release of a lock by a thread, the control is passed to the next thread by setting the next element in the array can_serve to 1. The next thread which was waiting to acquire the lock can now do so successfully.

The working of ABQL can be depicted in the table below by assuming 4 processors contending to enter the critical section with the assumption that a thread enters the critical section only once.

Execution Steps next_ticket can_serve my_ticket(P1) my_ticket(P2) my_ticket(P3) my_ticket(P4) Comments
initially 0 [1, 0, 0, 0] 0 0 0 0 initial value of all variables is 0
P1: fetch_and_inc 1 [1, 0, 0, 0] 0 0 0 0 P1 attempts and successfully acquires the lock
P2: fetch_and_inc 2 [1, 0, 0, 0] 0 1 0 0 P2 attempts to acquire the lock
P3: fetch_and_inc 3 [1, 0, 0, 0] 0 1 2 0 P3 attempts to acquire the lock
P4: fetch_and_inc 4 [1, 0, 0, 0] 0 1 2 3 P4 attempts to acquire the lock
P1: can_serve[1] = 1;

can_serve[0] = 0

4 [0, 1, 0, 0] 0 1 2 3 P1 releases the lock and P2 successfully acquires the lock
P2: can_serve[2] = 1;

can_serve[1] = 0

4 [0, 0, 1, 0] 0 1 2 3 P2 releases the lock and P3 successfully acquires the lock
P3: can_serve[3] = 1;

can_serve[2] = 0

4 [0, 0, 0, 1] 0 1 2 3 P3 releases the lock and P4 successfully acquires the lock
P1: can_serve[3] = 0 4 [0, 0, 0, 0] 0 1 2 3 P4 releases the lock

Performance Metrics

The following performance criteria can be used to analyse the lock implementations:

Advantages

Disadvantages

See also

References

  1. "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors".
  2. https://cs.unc.edu/~anderson/papers/survey.pdf
  3. Solihin, Yan (2009). Fundamentals of parallel computer architecture : multichip and multicore systems. pp. 265–267. ISBN 978-0-9841630-0-7.
This article is issued from Wikipedia - version of the 11/25/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.