Whereas the fastest supercomputer of 1998 could perform 1.34 trillion double-precision floating-point operations per second (TFLOPS), today’s consumer-level (sub-$500) graphics cards such as the NVIDIA GeForce GTX 480 can perform 1.35 TFLOPS (single precision). The rise of multi- and many-core processing has added new urgency to the teaching of parallel programming. In this paper, we focus on lab exercises at the undergraduate level. Three undergraduate students and one faculty member spent several weeks on CUDA lab exercises, starting with the recent book by Kirk and Hwu. We describe our experiences and the lessons we learned from working with the book and its accompanying labs. We also discuss extended labs, including the game of life, curvature flow, and ray tracing, all of which may appeal to an even wider audience of today’s learners.