It can hardly be stressed enough how important it is to utilize shared memory when doing GPU computing. Shared memory allows us to rearrange data before writing it to global memory, so that the writes coalesce, and to reuse data so that the total number of global memory accesses goes down.
In this exercise, you will start from a working CUDA program which
applies a linear filter to an image. The
original image is shown to the left, and the filtered image to the
right (using code that is almost the same as in "interactivejulia" from
Your task is to accelerate this operation by preloading image data into shared memory. You will have to split the operation into a number of blocks, and each block should read only the part of the image that is relevant for its computation.
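To make the idea of preloading concrete, here is a minimal sketch of one possible layout. It is not the code in ppmfilter.cu: it assumes a single-channel image and a plain box filter, and the names TILE, R, SM, clampi, boxFilterShared and boxFilterOnGpu are all hypothetical. Each block loads its own TILE x TILE pixels plus an overlap of R pixels on every side into shared memory, synchronizes, and then filters from shared memory only.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define TILE 16            // assumed block size: TILE x TILE threads
#define R 2                // assumed filter radius: (2R+1) x (2R+1) box filter
#define SM (TILE + 2 * R)  // side of the shared tile, including the overlap

// Clamp v to [lo, hi]; reads outside the image repeat the edge pixel.
__device__ int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

__global__ void boxFilterShared(const unsigned char *in, unsigned char *out,
                                int w, int h)
{
    __shared__ unsigned char tile[SM][SM];

    // Image coordinates of the top-left pixel of this block's overlap region.
    int x0 = blockIdx.x * TILE - R;
    int y0 = blockIdx.y * TILE - R;

    // Cooperative load: TILE*TILE threads fill SM*SM entries,
    // so some threads copy more than one pixel.
    int tid = threadIdx.y * TILE + threadIdx.x;
    for (int i = tid; i < SM * SM; i += TILE * TILE) {
        int sx = i % SM;
        int sy = i / SM;
        int gx = clampi(x0 + sx, 0, w - 1);
        int gy = clampi(y0 + sy, 0, h - 1);
        tile[sy][sx] = in[gy * w + gx];
    }
    __syncthreads();  // the whole tile must be loaded before anyone reads it

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x >= w || y >= h) return;  // safe: no barrier follows

    // The thread's own pixel sits at tile[threadIdx.y + R][threadIdx.x + R].
    int sum = 0;
    for (int dy = 0; dy <= 2 * R; dy++)
        for (int dx = 0; dx <= 2 * R; dx++)
            sum += tile[threadIdx.y + dy][threadIdx.x + dx];
    out[y * w + x] = (unsigned char)(sum / ((2 * R + 1) * (2 * R + 1)));
}

// Hypothetical host wrapper. Returns 0 on success, 1 on any CUDA error.
int boxFilterOnGpu(const unsigned char *in, unsigned char *out, int w, int h)
{
    assert(w > 0 && h > 0);
    size_t bytes = (size_t)w * h;
    unsigned char *dIn = NULL, *dOut = NULL;
    if (cudaMalloc((void **)&dIn, bytes) != cudaSuccess) return 1;
    if (cudaMalloc((void **)&dOut, bytes) != cudaSuccess) { cudaFree(dIn); return 1; }
    cudaMemcpy(dIn, in, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((w + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    boxFilterShared<<<grid, block>>>(dIn, dOut, w, h);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(out, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return err == cudaSuccess ? 0 : 1;
}
```

With these assumed values, each block keeps 20 x 20 = 400 bytes in shared memory for 256 threads. An RGB image such as a PPM file would need three bytes per pixel and a correspondingly adjusted indexing scheme.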
It can be noted that there are several ways to do image filtering that we do not take into account here. First of all, we can utilize separable filters. We may also use texture memory in order to take advantage of the texture cache. Neither is required here. The focus here is memory addressing and shared memory.
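You need to use __shared__ for declaring shared memory, e.g. "__shared__ unsigned char buf[64];" for allocating a 64-byte array (the name buf is arbitrary).
After loading data to shared memory, before processing, you should call __syncthreads() so that all threads in the block have finished loading before any of them reads the shared data.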
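The load, synchronize, process pattern can be illustrated with a small sketch that is unrelated to the filter itself: a single block reverses a 64-byte array through shared memory. The names N, reverseBlock and reverseOnGpu are hypothetical and chosen only for this example.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define N 64  // array length in bytes; one thread per element

// Each thread loads one byte into shared memory, waits for the whole
// block, then writes out the byte that its mirror thread loaded.
__global__ void reverseBlock(unsigned char *data)
{
    __shared__ unsigned char buf[N];  // 64 bytes, visible to the whole block

    int i = threadIdx.x;
    buf[i] = data[i];          // load phase
    __syncthreads();           // every load is done before any read below
    data[i] = buf[N - 1 - i];  // reads an element written by another thread
}

// Hypothetical host wrapper: reverses N bytes in place on the GPU.
// Returns 0 on success, 1 on any CUDA error.
int reverseOnGpu(unsigned char *host)
{
    assert(host != NULL);
    unsigned char *dev = NULL;
    if (cudaMalloc((void **)&dev, N) != cudaSuccess) return 1;
    cudaMemcpy(dev, host, N, cudaMemcpyHostToDevice);

    reverseBlock<<<1, N>>>(dev);  // one block of N threads

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(host, dev, N, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}
```

Without the barrier, a thread could read buf[N - 1 - i] before the thread responsible for that element has written it, and the result would depend on scheduling.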
Consider using the compilation switch --device-emulation to run in emulation while debugging. Then you can printf() from the kernel.
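On toolkits where device emulation is not available, printf() can be called directly from a kernel on devices of compute capability 2.0 or higher. The following is a minimal sketch of that approach; debugKernel and runDebugKernel are hypothetical names, and the limit of four printed elements is only there to keep the output short.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Print the first few input values together with the thread that read them.
__global__ void debugKernel(const unsigned char *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && i < 4)
        printf("block %d thread %d: data[%d] = %d\n",
               blockIdx.x, threadIdx.x, i, (int)data[i]);
}

// Hypothetical host wrapper. Returns 0 on success, 1 on any CUDA error.
int runDebugKernel(const unsigned char *host, int n)
{
    assert(n > 0);
    unsigned char *dev = NULL;
    if (cudaMalloc((void **)&dev, n) != cudaSuccess) return 1;
    cudaMemcpy(dev, host, n, cudaMemcpyHostToDevice);

    debugKernel<<<(n + 63) / 64, 64>>>(dev, n);

    // Device-side printf output is buffered; synchronizing flushes it.
    cudaError_t err = cudaDeviceSynchronize();
    cudaFree(dev);
    return err == cudaSuccess ? 0 : 1;
}
```

The order in which lines from different threads appear is not guaranteed, so it is usually best to print from only a few selected threads.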
ppmfilter.cu is a naive CUDA implementation that does not use shared memory. ppmfilter.c is a C implementation, included only for reference. ppmread.c and ppmread.h read and write PPM files (a very simple image format).
QUESTION: How much data did you put in shared memory?
QUESTION: How much data does each thread copy to shared memory?
QUESTION: How did you handle the necessary overlap between the blocks?
QUESTION: If we would like to increase the block size, about how big blocks would be safe to use in this case? Why?
COMPETITION QUESTION: How much speedup did you get over the naive version?
QUESTION: Were there any particular problems in adding this
That is all for lab 5. Write down answers to all questions and then
show your results to us.