On this page, I am uploading demo programs shown in the lectures. The collection is growing quickly, so it may soon no longer fit on one page.
You may use the code included in or linked from this page for any purpose, as long as any derivative cites its source and documents changes. You may upload it to your own web page provided that you cite me as author.
If you do something interesting related to this code (including improvements), please let me know. My E-mail address is ingemar at the domain ragnemalm, top domain se (Sweden).
Hello world!
The real thing!
This is my version of Hello World! for GPUs. How can you make a program that just prints the string “Hello world!” while still processing it in parallel? Answer: I take the string “Hello ” plus an array of offsets, and have each thread add its offset to one character, producing the string “World!”. Done! Short, simple, and it does the job. It comes in several versions.
There is also an even smaller version of Hello World for CUDA:
This version uses managed memory, which removes the explicit memory management code. It requires compute capability 3.0.
I have also produced Hello World! for OpenCL:
and Hello World! for GLSL fragment shaders:
This uses old style OpenGL. Here is a version that uses modern OpenGL. The “common” version is shorter since it depends on my course material for model loading and compiling shaders.
Finally, I have also made Hello World for OpenGL compute shaders:
OpenGL interoperability demos
Here are two demos for CUDA OpenGL interoperability. Both are based on NVIDIA’s demos, with some changes by me to simplify the code.
Matrix multiplication
Matrix multiplication is an excellent demo of using shared memory. This is a brand new version as of 2021, where I have rewritten most of it to make it clearer and more to the point.
Raycasting/constant memory demo
A raycaster, computing with and without constant memory. This is inspired by another demo that I found valuable but a bit dull since it did not produce any animation at all.
Texture access demo
The world’s simplest demo of CUDA texture interpolation?
Interactive Julia
This demo comes in three variations: intjulia2 for CUDA, intjulia2cpu doing the same thing on the CPU, single thread, and intjulia3, which is a slight variation of intjulia2 with nicer coloring.
Interactive Julia for OpenCL
Almost the same thing but for OpenCL.
Process array with fragment shader
This is a super simple example of computing with fragment shaders. This version is simplified even further by not using FBOs.
Compute shaders
Compute shaders have little focus in TDDD56, so these demos are mainly for TSBK03.
Hello-world in three variants, and the equally simple “simple”.
“Image hack” shows how to generate an image with a compute shader and display it with ordinary OpenGL, which is fortunately not very complicated. This needs the full “common” folder.
The Julia fractal is an extension of the “Image hack” in order to make a real-time rendered fractal. This too needs “common”.
“Transpose” tests the performance of compute shaders, comparing the CPU, a naive compute shader, and a compute shader using shared memory, with variable problem sizes.
Upcoming demos
I plan to upload the following ASAP:
Transpose
Device properties
and more.