Assignment 3
SAS Synchronization
The goal of this programming problem is to implement various synchronization primitives for shared address space programs. In particular, you will implement different lock and barrier synchronization algorithms for the LOCK/UNLOCK and BARRIER calls in the m4 macros API. For this purpose, you may need to use inline assembly code to get access to atomic assembly instructions. You can implement the assignment on any shared memory parallel system. The suggested platform is the shared memory platofrm of assignment 2.
SC hardware shared memory platform
Implement a library with the following lock synchronization algorithms:
A simple spin lock using some atomic assembly instruction in the system you are using for the implementation. You can find the available atomic instructions in the architecture manual of the processor you use. Example code for atomic primitives for x86/linux (opteron CPUs) you can use either as is or modifed is given in the atomic.h file.
Implement ticket-based locks.
Implement MCS locks.
Extra credit: Any additional algorithm.
Implement a library with the following barrier synchronization algorithms:
A centralized barrier that uses an atomic operation and a counter/flag.
Extra credit: A software combining tree barrier.
Modify c.m4.linux to include macros for the locks/barriers you have implemented. You will also need to adjust the types of the lock and barrier variables. The user should be able to choose among lock/barrier algorithms by using a Makefile parameter.
Use the SAS LU, WaterNsquared (from SPLASH-2) and your matrix multiply (MM) programs of the previous assignment to measure:
Program running time and speedup on 1,2 CPUs with spin locks/centralized barriers.
Program running time and speedup on 1,2 CPUs with mcs locks/tree barriers.
(extra credit) RC SC hardware shared memory platform
Repeat 1-4 above assuming the underlying system is RC. In this case you have no base atomic operations to use (why? could this be different?), but you may use basic memory synchronizing primitives (e.g. memory barriers), provided by the platform:
Implement locks: spin locks, mcs
First, assume the platform is SC and write the code for the primitives without use of hardware atomic primitives, but rather using simple load/store.
Then, use memory synchronizing (memory barriers) primitivies to mark memory acceses in each algorithm to make it work on a release consistent system.
Implement barriers: counter-based, [Extra credit: tree barriers]
Follow the same steps as in (1) above.
Use the SAS LU, WaterNsquared (from SPLASH-2) and your matrix multiply (MM) programs of the previous assignment to measure:
Program running time and speedup on 1,2,...,N CPUs with original locks/barriers.
Program running time and speedup on 1,2,...,N CPUs with spin locks/centralized barriers.
Program running time and speedup on 1,2,...,N CPUs with mcs locks/centralized barriers.
Extra credit: Program running time and speedup on 1,2,...,N CPUs with mcs locks/tree barriers.
References
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. J. M. Mellor-Crummey and M. L. Scott. ACM Trans. on Computer Systems, February 1991.
Submission
Turn in (by mail to b i l a s @ c s d . u o c . g r) a tar file that contains your solutions and a README file stating assumptions or special features.