On this page, I am uploading demo programs shown in the lectures. The collection is growing quickly, so it may soon no longer fit on one page.

You may use the code included in or linked from this page for any purpose, as long as any derivative cites its source and documents changes. You may upload it to your own web page provided that you cite me as author.

If you do something interesting related to this code (including improvements), please let me know. My E-mail address is ingemar at the domain ragnemalm, top domain se (Sweden).

Hello world!

The real thing!

This is my version of Hello World! for GPUs. How can you make a program that just prints the string “Hello World!” while still processing it in parallel? Answer: I take the string “Hello ” plus an array of offsets, and have each thread add its offset to one character, producing the string “World!”. Done! Short, simple, and it does the job. It comes in several versions.
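To make the offset trick concrete, here is a minimal CPU-side sketch in plain C++. It is not the demo code itself: the function names `helloThread` and `helloWorld` are made up for illustration, and the loop merely stands in for the GPU threads, with each iteration doing the work of one thread. The offsets are simply the differences between the character codes of “World!” and “Hello ”.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for the kernel body: "thread" idx adds its offset to one character.
// (Hypothetical name; on a GPU, idx would come from the thread index.)
void helloThread(std::size_t idx, std::string& text, const std::array<int, 6>& offsets)
{
    text[idx] = static_cast<char>(text[idx] + offsets[idx]);
}

// Builds "Hello World!" by transforming a copy of "Hello " into "World!".
std::string helloWorld()
{
    const std::string hello = "Hello ";
    // 'W'-'H' = 15, 'o'-'e' = 10, 'r'-'l' = 6, 'l'-'l' = 0, 'd'-'o' = -11, '!'-' ' = 1
    const std::array<int, 6> offsets = {15, 10, 6, 0, -11, 1};
    assert(hello.size() == offsets.size());

    std::string world = hello;
    // Every iteration is independent of the others, which is what makes
    // the work trivially parallel: one thread per character.
    for (std::size_t idx = 0; idx < world.size(); ++idx)
        helloThread(idx, world, offsets);

    return hello + world;
}
```

Calling `helloWorld()` and printing the result gives the full greeting. In a GPU version, the loop disappears and the same per-character update runs once in each thread.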

There is also an even smaller version of Hello World for CUDA:

This version uses managed memory, which removes the explicit memory management code. It requires compute capability 3.0 or higher.

I have also produced Hello World! for OpenCL:


and Hello World! for GLSL fragment shaders:


This uses old-style OpenGL. Here is a version that uses modern OpenGL. The “common” version is shorter since it depends on my course material for model loading and compiling shaders.



Finally, I have also made Hello World for OpenGL compute shaders:


OpenGL interoperability demos

Here are two demos for CUDA OpenGL interoperability. Both are based on NVIDIA’s demos, with some changes of mine to simplify the code.

Matrix multiplication

Matrix multiplication is an excellent demo of using shared memory. This is a brand new version as of 2021, where I have rewritten most of it to make it clearer and more to the point.
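The idea behind the shared memory variant can be sketched on the CPU in plain C++. This is an illustration under my own assumptions, not the demo code: the name `tiledMatMul` and the tile size `TILE` are hypothetical, and the small local arrays only play the role that shared memory plays on the GPU. Each tile of A and B is copied once and then reused by all the "threads" of a block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t TILE = 4;  // hypothetical tile (block) size

// C = A * B for n x n row-major matrices, computed tile by tile.
std::vector<float> tiledMatMul(const std::vector<float>& a,
                               const std::vector<float>& b,
                               std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<float> c(n * n, 0.0f);

    for (std::size_t ti = 0; ti < n; ti += TILE)          // block row
        for (std::size_t tj = 0; tj < n; tj += TILE)      // block column
            for (std::size_t tk = 0; tk < n; tk += TILE)  // walk along the inner dimension
            {
                // Stand-ins for shared memory; zero padding handles ragged edges.
                float tileA[TILE][TILE] = {};
                float tileB[TILE][TILE] = {};
                for (std::size_t i = 0; i < TILE; ++i)
                    for (std::size_t k = 0; k < TILE; ++k)
                    {
                        if (ti + i < n && tk + k < n)
                            tileA[i][k] = a[(ti + i) * n + (tk + k)];
                        if (tk + i < n && tj + k < n)
                            tileB[i][k] = b[(tk + i) * n + (tj + k)];
                    }

                // Each (i, j) pair corresponds to one thread in the block.
                for (std::size_t i = 0; i < TILE; ++i)
                    for (std::size_t j = 0; j < TILE; ++j)
                    {
                        if (ti + i >= n || tj + j >= n)
                            continue;
                        float sum = 0.0f;
                        for (std::size_t k = 0; k < TILE; ++k)
                            sum += tileA[i][k] * tileB[k][j];
                        c[(ti + i) * n + (tj + j)] += sum;
                    }
            }
    return c;
}
```

The point of the tiling is data reuse: every element loaded into a tile is read `TILE` times from fast local storage instead of being fetched from main memory each time.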

Raycasting/constant memory demo

A raycaster, computing with and without constant memory. This is inspired by another demo that I found valuable but a bit dull since it did not produce any animation at all.

Texture access demo

The world’s simplest demo of CUDA texture interpolation?

Interactive Julia

This demo comes in three variations: intjulia2 for CUDA, intjulia2cpu doing the same thing on the CPU in a single thread, and intjulia3, which is a slight variation of intjulia2 with nicer coloring.
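For readers who have not seen the Julia set computation, here is a minimal single-threaded sketch in C++. It is a generic escape-time iteration, not the code from the demos: the names `juliaIterations` and `juliaImage`, the escape radius of 2, and the mapping of the image to the square from -2 to 2 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Counts how many times z = z*z + c can be applied before |z| exceeds 2,
// up to maxIter. std::norm returns the squared magnitude, hence the 4.
int juliaIterations(std::complex<float> z, std::complex<float> c, int maxIter)
{
    int i = 0;
    while (i < maxIter && std::norm(z) <= 4.0f)
    {
        z = z * z + c;
        ++i;
    }
    return i;
}

// Computes the iteration count for every pixel of a width x height image
// covering the square [-2, 2] x [-2, 2] in the complex plane.
std::vector<int> juliaImage(int width, int height, std::complex<float> c, int maxIter)
{
    assert(width > 0 && height > 0);
    std::vector<int> counts(static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            // Pixel centre mapped to the complex plane.
            const float re = 4.0f * (static_cast<float>(x) + 0.5f) / static_cast<float>(width) - 2.0f;
            const float im = 4.0f * (static_cast<float>(y) + 0.5f) / static_cast<float>(height) - 2.0f;
            counts[static_cast<std::size_t>(y) * static_cast<std::size_t>(width) + static_cast<std::size_t>(x)] =
                juliaIterations(std::complex<float>(re, im), c, maxIter);
        }
    return counts;
}
```

Every pixel is computed independently of all others, so the two loops in `juliaImage` map directly to one GPU thread per pixel. The iteration counts are then turned into colors, which is where variations in coloring come in.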


Interactive Julia for OpenCL

Almost the same thing but for OpenCL.


Process array with fragment shader

This is a super simple example of computing with fragment shaders. This version is simplified even further by not using FBOs.


Upcoming demos

I plan to upload the following ASAP:


Device properties

and more.

This page is maintained by Ingemar Ragnemalm