| Task |
Chapter |
Title |
Description |
Due Date |
Check |
| 1 |
1 |
Jacobi Iteration |
Page 3. This is ZPL 1.0 code.
You need to make the adjustment so that it works with ZPL 2.0. |
||
| 2 |
There are no programming
deliverables from chapter 2 in the Programmer's Guide. |
||||
| 3 |
There are no programming
deliverables from chapter 3 in the Programmer's Guide. |
||||
| 2 |
4 |
Sample Statistics |
Page 42. |
||
| 3 |
4 |
Correlation Coefficient |
Page 44. |
||
| 4 |
4 |
Histogram 1 |
Page 46 |
||
| 5 |
4 |
Uniform Convolution Shit/Add |
Page 47 |
||
| 6 |
4 |
Uniform Convolution Scan |
Page 49 |
||
| 7 |
4 |
Counting Connected Components |
Page 51 |
||
| 5 |
There are no programming
deliverables from chapter5 in the Programmer's Guide. |
||||
| 8 |
6 |
Cannon's Matrix Multiply |
Page 84 |
||
| 9 |
6 |
Summa Matrix Multiply |
Page 86 |
||
| 10 |
6 |
Tridiagonal Matrix Multiplication |
Page 89 |
||
| 11 |
6 |
Ranking |
Page 90 |
||
| 12 |
6 |
Histogram 2 |
Page 92 |
||
| 13 |
6 |
Lossy Data Compression |
Page 94 |
||
| 14 |
6 |
Odd/Even Transposition Sort |
Page 96 |
||
| 7 |
There are no programming
deliverables from chapter 7 in the Programmer's Guide. |
||||
| 8 |
There are no programming
deliverables from chapter 8 in the Programmer's Guide. |
||||
| 15 |
9 |
Inconsequential Code Fragments |
Page 123. Create a short, but
complete program that highlights the import of page 123. |
||
| 16 |
9 |
Score |
Page 126. Create a full example.
I recall doing this. It took me awhile to get my head around it. |
||
| 17 |
9 |
N-Body Computations 1 |
Page 133. |
||
| 18 |
9 |
N-Body Computations 2 |
Page 134. |
||
| 19 |
9 |
Random Number Generation |
Page 137. |
||
| 10 |
There are no programming
deliverables from chapter 10 in the Programmer's Guide. |
| Task |
Chapter 1 |
Title |
Description |
Due Date |
Check |
| 1 |
There are no programming
deliverables from chapter 1. |
||||
| 2 |
|||||
| 1 |
3 |
Greetings! |
Pages 41/42. |
||
| 2 |
4 |
Serial Trapazoidal Rule |
Pages 55/56 |
||
| 3 |
4 |
Trapazoidal Rule 1 |
Pages 57,58,59. This defines a,
b, and n at compile time. No broadcast required. |
||
| 4 |
4 |
Trapazoidal Rule 2 |
Pages 61, 62. This uses the
first Get_Data. Embed Get_Data into your trapazoidal rule
solution so that a, b, and n are input by the user and then broadcasted
to the other processes via Get_Data's naive sequential message strategy. |
||
| 5 |
5 |
Trapazoidal Rule 3 |
Section 5.1. Re do the
Trapazoidal Rule program using Get_data1. This solution relies upon a
user defined tree broadcast. |
||
| 6 |
5 |
Trapazoidal Rule 4 |
Section 5.2. Adjust your
solution to the Trapazoidal Rule program so that it uses Get_data2.
This solution relies upon MPI_Bcast. |
||
| 7 |
5 |
Serial Dot Product |
Build a C program that uses the
function on page 75 to compute the dot product of two vectors. |
||
| 8 |
5 |
Dot Product 1 |
Build a parallel C/MPI program
that uses the function given on page 76. It relies upon the MPI_Reduce
function call. |
||
| 9 |
5 |
Dot Product 2 |
Build a parallel C/MPI program
that uses the MPI_Allreduce function call as discussed in Section 5.6. |
||
| 10 |
5 |
Serial Matrix Vector |
Code up a C program that
computes a serial matrix vector product as mentioned on page 78. |
||
| 11 |
5 |
Mat Vec 1 |
Build a parallel C/MPI program
that calls just MPI_Gather to compute a matrix vector product. It does
not call MPI_Allgather. This is outlined in section 5.7 and section 5.8. |
||
| 12 |
5 |
Mat Vec 2 |
Build a parallel C/MPI program
that calls MPI_Scatter to compute a matrix vector product. This is
outlined in section 5.7. |
||
| 13 |
5 |
Mat Vec 3 |
Build a parallel C/MPI program
that calls MPI_Allgather as shown on page 83. |
||
| 6 |
At present we will not implement
any of the ideas in chapter 6. |
||||
| 14 |
7 |
Serial Matrix Mult. |
Implement the standard serial
order n-cubed matrix multiply in C. This is given on page 112. |
||
| 15 |
7 |
MM Naive |
Implement the naive and costly
parallel matrix multiply mentioned at the bottom of page 112. |
||
| 16 |
7 |
Fox's MM |
Implement Fox's MM given in
chapter 7 of your textbook. |
||
| 8 |
At present, there are no
programming assignments for chapter 8. However, the code given is
useful as it simplies writing I/O routines for C/MPI programs. |
||||
| 9 |
At present, there are no
programming assignments for chapter 9. This chapter concerns debugging
parallel programs. |
||||
| 10 |
At present, there are no
programming assignments for chapter 10. This may change. |
||||
| 11 |
At present, there are no
programming assignments for chapter 11. |
||||
| 12 |
At present, there are no
programming assignments for chapter 12. |
||||
| 13 |
At present, there are no
programming assignments for chapter 13. However, the hypercube strategy
is used in parallel bitonic sort and can also be used in parallel 1D
FFT. |
||||
| 17 |
14 |
Serial Bitonic Sort |
Implement the serial bitonic
sort given in the text book. Make sure that it works. Make sure that
you understand it. This is located in section 14.3 |
||
| 18 |
14 |
Parallel Bitonic Sort |
Implement the parallel bitonic
sort given in the text book. Make sure that it works. Make srue that
you understand it. This is located in section 14.4. |
||
| 15 |
At present, there are no
programming assignments for chapter 15. |
||||
| 16 |
At present, there are no
programming assignments for chapter 16. |
| Task |
Name |
Description |
Due Date |
Check |
| 1 |
Trapazoidal Rule |
|||
| 2 |
Dot Product |
|||
| 3 |
Square Deal |
Traverse a matrix using a inside
out counter clockwise spiral path. Deposit prime numbers when they
exist. |
||
| 4 |
Transpose |
Write a ZPL program that will
transpose a 2D matrix. |
||
| 5 |
Fox's MM |
Write a ZPL program that will
multiply matrices using Fox's strategy. |
||
| 6 |
Mandlebrot |
Implement a program that will
compute and display a Mandlebrot image as was described in class by Nat
Tuck. You can use any display mechanism that you wish. It does not need
to be the GTK/Glade/Perl/Perl version that Nat presented. |
||
| 7 |
Eigenvalues 1 |
Implement a program that will
compute the eigenvalues of a 2D symmetric matrix using the standard
Jacobi tranformational strategy. |
||
| 8 |
NAS EP |
Implement a ZPL program that
will compute the NAS EP benchmark. |
||
| 9 |
NAS IS |
Implement a ZPL program that
will compute the NAS IP benchmark. |
||
| 10 |
Bitonic Sort |
Implement a ZPL program that
will compute a Bitonic Sort. It might be necessary to relies on ZPL
2.0. The enhancements given to us by Steven Dietz. It would be nice to
see if we could do this without Dietiz's enhancements to get a feel of
the limitations of ZPL 1.0. |
||
| 11 |
FFT w/ ZPL1.0 |
Implement a ZPL program that
will compute a 1D FFT without using the language extensions in Dietz's
thesis. In so doing, we can get a feel for the issue. |
||
| 12 |
FFT w/ ZPL2.0 |
Implement a ZPL program that
will compute a 1D FFT using the language extensions offered to us by
Dietz is his thesis. This is ZPL 2.0 |
||
| 13 |
Convex Hull Divide and Conquer |
|||
| 14 |
Matrix Inverse 1 |
|||
| 15 |
Matrix Inverse 2 |
|||
| 16 |
Matrix Inverse 3 |
| 1 |
Sieve |
Implement the Sieve of
Erathanones. |
Due Date |
Check |
| 2 |
NAS EP |
Implement the NAS EP Benchmark. |
||
| 3 |
Convolution 1 |
Implement a program that the
compute the convolution of two vectors without using a fourier
transform. Just use the definition of convolution. |
||
| 4 |
Fourier
Matrix |
Implement a program that will
input a value n from the command line and then it will create an n by n
complex fourier matrix. |
||
| 5 |
Convolution 2 |
Implement a program that will
compute the convolution of two complex vectors by first taking the
discrete order N-squared fourier transform of each vector. Then the two
transformed vectors are elementwise multiplied to form a vector, say
FV. This is followed by applying the inverse N-squared transform to the
vector FV. You should verify that we get the same result as if we
convolved two original vectors. |
||
| 6 |
Convolution 3 |
Implement a program that will
compute the convolution of two complex vectors by first taking the
discrete order n log n fourier transform of each vector. This
implementation must use the recursive, divide and conquer version
discussed in class. |
||
| 7 |
Convolution 4 |
Implement a program that will
compute the convolution of two complex vectors by first taking the
discrete order n log n fourier tranform of each vector. This
implementation must use the iterative, divide and conquer version
discussed in class. |
||
| 8 |
Mandlebrot |
Implement a program that will
compute and display a Mandlebort image as was described in class. I
gave a hand out on this. |
||
| 9 |
2D Fourier
Transform |
Implement a program that will
input a simple image. Say, for example, a white square on a black
background. Using GTK/GDK or some other display software, compute the
2D fourier transform of the image using the pixel values as elements of
the two dimensional input matrix. Now that the image has been moved up
into the frequency domain, display the picture. For each point, you
should take the real part and imaginary part and compute the magnitude
of the complex. Remap the values of the magnitude into either a black
and white picture or a grey scale picture. |
||
| 10 |
NAS IS |
Read the NAS IS (integer sort)
benchmark specification and get it to work. |
||
| 11 |
NAS FT |
Read the NAS FT benchmark
specification and get it to work. |
||
| 12 |
Convex Hull Divide & Conquer |
| Task |
Name |
Description |
Due Date |
Check |
| 1 |
Sieve |
Using C/MPI write a parallel
version of the Sieve of Erasthones. |
||
| 2 |
NAS EP |
Using C/MPI write a parallel
version of the NAS EP benchmark. This differs just a bit from the
serial version since the starting seed values for each processor needs
to be computed first, So there is a little bit more work to be done
here. |
||
| 3 |
2D FFT |
Using C/MPI write a program that
will input an image and output its 2D FFT. You will need to display the
magnitude of the real and imaginary part remapped to a scale. You
can do this simply have each process compute a local, serial 1D FFT on
its own. |
||
| 4 |
1D FFT |
Using C/MPI write a program that
will convolve two vectors using a distributed 1D FFT algorithm. I have
given a handout for doing this in class. The hypercube strategy is
probably the way to go for us. You should have a way for convincing
yourself and me that your program is doing its job. |
||
| 5 |
NAS IS |
Using C/MPI write a program that
will run the NAS IS (integer sort) benchmark. |
||
| 6 |
NAS FT |
Implement the NAS FT Benchmark. |
||
| 7 |
Convex Hull |
|||
| 8 |
||||
| 9 |
||||
| 10 |
||||
| 11 |
||||
| 12 |
||||
| 13 |
||||
| 14 |
||||
| 15 |
| Task |
Name |
Description |
Due Date |
Check |
| 1 |
ZPL Operations |
Read section 5.1 thoroughly.
Submit a small packet of code that includes your six little ZPL
programs. These programs are given as statements in Figure 8 of the
WYSIWIG paper. Attach to your code nice looking graphs that are similar
to those presented in Figure 8. You are submitting a fourth row of the
figure. For now, use 1x1, 4x4, and 6x6. However, we should have 64
cores shortly and when we do, you can use 8x8 too. I would like
you to attach a written paragraph or two as well. Your prose should
discuss your findings. Do they match those in the paper or not?
Technology has moved forward since the paper was written. You might
need to increase the number of elements per processor 100 x 100 doubles
to something much larger. |
||
| 2 |
MM |
Read section 5.2 thorougly.
Submit a paper packet of code that includes your three versions of MM
in ZPL: Summa, Fox, and Cannon. If you do not include Fox for whatever
reason, at least submit Summa and Cannon. I want you to replicate the
experiment discussed in the paper on our machine. Include Fox's timings
as well. Does your result agree with the spirit of the paper? Any
anomolies? Or, is it as you expected? |
||
| 3 |
Table 3 |
I want you to extend Table 3 by
adding a row. Label the row Fox. Fill out the table with the entries
you think should be there. |
||
| 4 |
Uniform Convolution Table |
I want you to go back and review
both the Shift/Add and the Scan solutions of the Uniform Convolution
problem given in the ZPL Programmer's Guide. Create a table
similar to Table 3 in the Wysiwig paper. Fill in the entries with the
values you believe are correct. Pass this table in to me. Make the
table look reasonable. Be respectful of my of 56 year old eyes. |
||
| 5 |
Uniform Convolution Figure |
Please re-run the experiment
discussed in section 5.2 of the paper. However, this time use the two
competing solutions for the uniform convolution problem. Generate three
bar graphs that are similar to a row of Figure 10. Ideally, we will
have our 64 cores up and running and one of your bar charts is an 8 x 8. |
| Task |
Name |
Description |
Due Date |
Check |
| 1 |
Sorting 1 |
I want you to create an
experiment that will compare the performance of the MPI/C Parallel
Bitonic Sort with the ZPL Odd/Even Transposition Sort. Have some fun
doing it. Initially take timings and create a graph that show how
well each method scales. Try to sort a large number of integers. You
can rig it so that each integer falls within a range, say 0 to 9999.
Perhaps you can output the smallest value, the median value, and the
largest value to gain some confidence. Vary the number of processors.
Vary the number of integers. What are your findings? If you
successfully coded up a ZPL Bitonic sort throw that into your analysis
too. If you have a parallel integer sort (NAS IS) working, throw that
one in as well. |
||
| Sorting 2 |
Sort N integers as fast as you
can on our machine. Make N as large as you can and then go. What is the
fastest sort you can build for our machine? |
| Day |
What we did |
Assignment |
| 1 |
We talked about writing parallel
programs using C with MPI calls and using the ZPL 2.0 language. We will
run these programs on our local cluster. Students are to email Nat Tuck
so that they can get an account on ermine. We also discussed that
we will explore the new programming language called Chapel. Chapel is
the creation of Brad Chamberlain and others. Brad also is the main
implementor of ZPL 1.0. I doodled the sequential bitonic sort
algorithm in class using 16 unique integers. Students could see what is
going on. I then displayed the C code which implements the bitonic
sort. We noted that the C w/ MPI is a bit painful, while ZPL is
easier to code. However, ZPL is limited. We gave hand motions
that simulated flood and reduce. Most of the programs will be to
program traditional algorithms. We hope to learn a bit about weather
models and one of them will form the bedrock of a larger project.
Perhaps this project will occur in a later semester. Everybody went
home happy, including me. |
Students need to implement the
sequential bitonic sort algoirthm in C. Students need to email
Nat Tuck and receive their account. Students should read chapter 1 and
chapter 2 of the MPI book. Students should download the ZPL
Programmer's Guide from the University of Washintgon website. They
should also download some of the other papers listed there. |
| 2 |
We discussed chapter 3 and
chapter 4 of the course textbook. We illustrated the parallel greetings
program in chapter 3. We walked through the three versions of the
trapazoidal rule program in chapter 4. The first one is a serial
version. The second one is a parallel version with hard wired input.
The third one is a naive parallel version with a sequential broadcast
of the inputs. It does illustrate the use of the tag parameter.
Students were asked to paper copy hand in their sequential bitonic
sort. I will accept this next time too. We also showed a ZPL version of
the trapazoidal rule problem. This ZPL version is both a sequential and
a parallel version. It is much more succinct. We reviewed the Jacobi
smoothing program found in chapter 1 of the ZPL programmer's guide. We
noted that this version worked for ZPL 1.0, but it will need to be
altered for the ZPL 2.0 that we have. Memory must be explicitly
allocated in ZPL 2.0. This meant that we needed to create a region
called BigR as well. We started to run timings, but it was getting late. |
Students must obtain an ermine
account from Nat. They should verify that they can log on. They should
enter the greetings.c, serial_trap.c, and trap1.c programs and get them
to run. They should also enter the Jacobi.z and the trap.z programs
too. This is good practice will it will enable you to ramp up quickly.
Do these things. Students make sure to read chapter 3 and chapter 4 of
the textbook closely. The students should read chapter 1 of the ZPL
Programmer's guide. |
| 3 |
We discussed section 5.1 in the
textbook. It is a user-defined tree broadcast. Using a tree broadcast,
this is an enhancement of the earlier trap.c programs. We also
discussed section 5.2. It is an MPI based tree broadcast. We discussed
a serial version of the sieve of erasthothanes. We handed out the
statement of the NAS Benchmark EP. We reviewed it. |
Students should get the user
defined tree broadcast to work. Students should run the MPI based tree broadcast. Students should read Sections 5.1, 5.2, 5.3, and 5.4. They should implement and get working all the versions of the trap rule program. This includes using reduce as described in section 5.4. Students should get the sieve of erasthonanes to work in C. Students should read the description of the NAS EP program with an eye towards getting it to work in C. Students should try to implement the serial version of the NAS EP benchmark in C. I can offer suggestions if needed. |
| 4 |
||
| 5 |