Assignment 2
Parallel Programming
The goal of this programming problem is to implement a matrix multiplication program under both the shared memory and the message passing programming models. You can implement this on any parallel system. The suggested platform is the research cluster at FORTH-ICS. You will have access to 8 nodes of the cluster.
The platform
Each node in the sub-cluster is a dual-processor AMD Opteron system. The cluster nodes you will use are physically interconnected with a Gigabit Ethernet network. Although you may use this system as both a shared memory platform (over a software shared memory abstraction) and a message passing platform, in this assignment you will use it only as a message passing system, through a standard library, the Message Passing Interface (MPI). For the shared memory part of the assignment you will use a single node in the cluster that has two quad-core CPUs, for a total of eight (8) cores. You can access the cluster by logging in via ssh to shark dot ics dot forth dot gr. You can access shark only from UoC systems. From shark, you can then access the 4-node sub-cluster for MPI (piranha63, 71, 72, 73) and the eight-core node for SAS (penguin3) via ssh. User accounts for the cluster will be distributed in class.
The assignment
(a) SAS programming
To write a shared memory parallel program for the eight-core system, you can use the ANL m4 macros (Argonne National Laboratory), which allow you to create processes (threads), allocate global memory, and use synchronization primitives. The file ~cs527/sas/macros/c.linux.m4 contains this set of macros. The ~cs527/sas/macros/c.null.m4 file may be handy for running the sequential versions of the SPLASH-2 programs.
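As a rough illustration of the three ingredients named above (thread creation, shared memory, and synchronization), the following sketch uses POSIX threads directly rather than the ANL macros, whose exact names and signatures are not shown here. All identifiers (shared_t, worker_arg_t, worker, run_workers) are hypothetical and chosen only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical example types; this is NOT the ANL macro API. */
typedef struct {
    long sum;                   /* shared accumulator, heap-allocated */
    pthread_mutex_t lock;       /* protects sum */
    pthread_barrier_t barrier;  /* all workers meet here once */
} shared_t;

typedef struct {
    int id;            /* worker index, 0..nthreads-1 */
    shared_t *shared;  /* pointer to the shared state */
    long seen;         /* value of sum observed after the barrier */
} worker_arg_t;

static void *worker(void *p)
{
    worker_arg_t *arg = p;
    shared_t *s = arg->shared;

    /* Critical section: add this worker's contribution (id + 1). */
    pthread_mutex_lock(&s->lock);
    s->sum += arg->id + 1;
    pthread_mutex_unlock(&s->lock);

    /* After the barrier, every worker's contribution has been added. */
    pthread_barrier_wait(&s->barrier);
    arg->seen = s->sum;
    return NULL;
}

/* Runs nthreads workers; returns 1 + 2 + ... + nthreads, or -1 on error. */
long run_workers(int nthreads)
{
    assert(nthreads > 0);
    shared_t *s = malloc(sizeof *s);
    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    worker_arg_t *args = malloc((size_t)nthreads * sizeof *args);
    if (!s || !tid || !args) {
        free(s); free(tid); free(args);
        return -1;
    }

    s->sum = 0;
    pthread_mutex_init(&s->lock, NULL);
    pthread_barrier_init(&s->barrier, NULL, (unsigned)nthreads);

    for (int i = 0; i < nthreads; i++) {
        args[i] = (worker_arg_t){ i, s, 0 };
        int rc = pthread_create(&tid[i], NULL, worker, &args[i]);
        assert(rc == 0);  /* sketch only: no recovery from a failed create */
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    long result = s->sum;
    for (int i = 0; i < nthreads; i++)
        if (args[i].seen != result)
            result = -1;  /* a worker saw an incomplete sum */

    pthread_barrier_destroy(&s->barrier);
    pthread_mutex_destroy(&s->lock);
    free(args); free(tid); free(s);
    return result;
}
```

With gcc this would typically be compiled with the -pthread flag. Calling run_workers(4) adds 1, 2, 3, and 4 under the lock, so the shared sum is 10 once all workers have passed the barrier.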
Tasks:
- README_FIRST.SAS. Get, compile, and run FFT from the ~cs527/sas/applications directory. This version of the application is similar to the original SPLASH-2 version at the SPLASH-2 web page, with minor modifications (mainly to support data placement and 64-bit addresses).
- Write a shared memory program that reads two NxN matrices from a file and multiplies them on a system with P processors. You don't have to worry too much about corner cases (for instance, you can assume that N is a power of P). For the input file format, use one array element per line, with the elements linearized in row-major order. Write the result to the standard output in the same format. The program should also report to the standard output the time it took to compute the result (not including initialization, reading files, or outputting results).
- Run your SAS program on 1, 2, 3, 4, 5, 6, 7, and 8 cores and create a speedup curve.
- README_FIRST.MPI. Install MPICH locally in your account.
- Copy, compile, and run the int_pi2 program from ~cs527/mpi (runmpi.txt). This application computes the value of pi. Read the instructions in ~cs527/mpi/Readme for compiling an MPI program. Experiment with the number of approximation intervals: try 100, 1000, and 1000000. Why is the error lower with 1000 approximation intervals than with 100? Why does the error increase for large numbers of intervals?
- Write an MPI program that reads two NxN matrices and multiplies them in the same way as in task 2 above.
- Run your MPI program on 1, 2, 3, and 4 processors (one processor per node) and on 2, 4, 6, and 8 processors (two processors per node), and create two speedup curves.
- Put the three speedup curves (one from SAS and two from MPI) on a single graph with appropriate legends, indicating application, programming abstraction, and input size.
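For the matrix multiplication tasks above, the following is a minimal sketch of one possible structure, not a reference solution. It assumes double-precision elements, a matrix size n that is known to the caller, and POSIX threads in place of the ANL macros; none of these choices come from the assignment text. The names read_matrix, task_t, multiply_rows, and parallel_matmul are hypothetical. Rows are split into P contiguous blocks, and only the compute phase is timed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Reads n*n values, one per line, in row-major order.
 * Returns a malloc'ed array, or NULL on failure. */
double *read_matrix(FILE *fp, int n)
{
    size_t count = (size_t)n * (size_t)n;
    double *m = malloc(count * sizeof *m);
    if (!m)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        if (fscanf(fp, "%lf", &m[i]) != 1) {
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Work description for one thread: rows [row_begin, row_end) of C. */
typedef struct {
    const double *a, *b;
    double *c;
    int n, row_begin, row_end;
} task_t;

static void *multiply_rows(void *p)
{
    const task_t *t = p;
    int n = t->n;
    for (int i = t->row_begin; i < t->row_end; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += t->a[(size_t)i * n + k] * t->b[(size_t)k * n + j];
            t->c[(size_t)i * n + j] = sum;
        }
    }
    return NULL;
}

/* Computes C = A * B with p threads.
 * Returns the elapsed compute time in seconds, or -1.0 on failure. */
double parallel_matmul(const double *a, const double *b, double *c,
                       int n, int p)
{
    assert(n > 0 && p > 0);
    pthread_t *tid = malloc((size_t)p * sizeof *tid);
    task_t *task = malloc((size_t)p * sizeof *task);
    if (!tid || !task) {
        free(tid); free(task);
        return -1.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    int started = 0;
    for (int r = 0; r < p; r++) {
        /* Block partition; also works when n is not divisible by p. */
        int begin = (int)((long)r * n / p);
        int end = (int)((long)(r + 1) * n / p);
        task[r] = (task_t){ a, b, c, n, begin, end };
        if (pthread_create(&tid[r], NULL, multiply_rows, &task[r]) != 0)
            break;
        started++;
    }
    for (int r = 0; r < started; r++)
        pthread_join(tid[r], NULL);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(tid); free(task);
    if (started != p)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

The returned time for P threads, T(P), can be turned into a point on the speedup curve as T(1)/T(P). The same row partition would carry over to MPI ranks, with explicit distribution of the matrices taking the place of the shared arrays.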
References
- GNU m4 macro preprocessor. Most Unix systems include the m4 macro preprocessor. Type "man m4" for more information.
- ANL macros
- Message Passing Interface
- Open MPI
Submission
Turn in (by mail to b i l a s @ c s d . u o c . g r) a tar file that contains your solutions and a README file stating assumptions or special features.