
C++ GPU Programming With CUDA - Install + Hello World Code

10/27/2018

 

Introduction - GPU Programming 

One of the main advantages of using C++ is that you get very finely tuned performance: you can control basically everything about your program, and you don't use unnecessary resources or have the overhead that languages like Java and C# carry. As such, C++ programmers should be very familiar with how CPUs and RAM work.

However, accessing the GPU is very beneficial: GPUs are specialized for performing mathematical calculations in parallel, so being able to do work (or offload work) on a GPU, in addition to the CPU, makes for strong programming. In this blog post, we'll look at GPU programming using CUDA.

Prerequisite: You Need A GPU On Your Computer

Obviously, to program with a GPU, you need to actually have a GPU. Some laptops use CPUs with integrated graphics, which probably aren't CUDA-enabled. For a list of CUDA-enabled GPUs, click here.

And if you're using Windows, you need Visual Studio installed. I recommend having Visual Studio 2017. (You should honestly have it anyway.) Here's a link to download it. 

Installation 

To be able to compile C++ code that runs on the GPU, you'll need the CUDA toolkit. Click here to download it. Choose your operating system, architecture, and version. For the installer type, choose local. (Either is fine, but I like local.) Click the download button and follow the instructions provided.
After the download finishes, launch the installer and follow the on-screen instructions. (I recommend doing a Custom install and making sure everything is checked.)
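Once the installation finishes, you can sanity-check it from a terminal. These are the standard toolkit and driver commands (the exact versions printed depend on what you installed):
nvcc --version
nvidia-smi
nvcc --version confirms the CUDA compiler is on your PATH, and nvidia-smi confirms the driver can see your GPU.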

If you're on Windows, you also need to make sure the PATH environment variable points to your Visual Studio bin directory; otherwise you'll get a "can't find cl.exe" error when trying to compile. See the video to see how to set this.
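For reference, on a typical Visual Studio 2017 install the directory you want on PATH (the one containing cl.exe) looks something like the line below; the <edition> and <version> parts are placeholders that depend on your machine, so don't copy it literally:
C:\Program Files (x86)\Microsoft Visual Studio\2017\<edition>\VC\Tools\MSVC\<version>\bin\Hostx64\x64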

Hello World Code

Now we need some actual C++ code to utilize our GPU. (Note: this code is taken, with modifications, from this tutorial.) Put the following code in a file named main.cu.
// main.cu 
/* Compile and run with: 
nvcc main.cu -o run
./run
*/
#include <iostream>
#include <math.h>

__global__ // This keyword marks a kernel: code that runs on the GPU but is called from the CPU.
void add(int n, float *x, float *y)
    {
    // At each index, add x to y.
    for (int i = 0; i < n; i++)
        {
        y[i] = x[i] + y[i];
        }
    }

int main(void)
    {
    int N = 100;
    float *x, *y;

    // Allocate Unified Memory – accessible from CPU or GPU
    cudaMallocManaged(&x, N*sizeof(float));
    cudaMallocManaged(&y, N*sizeof(float));

    // Initialize our x and y arrays with some floats.
    for (int i = 0; i < N; i++) 
        {
        x[i] = 1.0f;
        y[i] = 2.0f;
        }

    // Run the function on the GPU.
    add<<<1, 1>>>(N, x, y); // The <<<1, 1>>> brackets launch 1 block containing 1 thread.

    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();
    
    // Check for errors (all values should be 3.0f)
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        {
        maxError = fmax(maxError, fabs(y[i]-3.0f));
        }
    std::cout << "Max error: " << maxError << std::endl;
    
    // Free memory
    cudaFree(x);
    cudaFree(y);
  
    return 0;
    }
The code is simple: we have a function that adds two arrays together (into the second array). The code in main allocates memory for the two arrays, calls the add function, checks that the computation worked, and then frees the memory. Basic programming.
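One thing worth pointing out: the <<<1, 1>>> launch uses a single block with a single thread, so the loop above still runs serially, just on the GPU. As a sketch of the next step (this is the standard CUDA grid-stride pattern, not something from the code above), the kernel and its launch line can be changed to spread the work across many threads, with the rest of main staying the same:
// A parallel version of the add kernel (sketch).
__global__
void add(int n, float *x, float *y)
    {
    // Each thread starts at its own global index and jumps ahead by the total number of threads.
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        {
        y[i] = x[i] + y[i];
        }
    }

// In main, launch enough blocks of 256 threads to cover all N elements.
int blockSize = 256;
int numBlocks = (N + blockSize - 1) / blockSize;
add<<<numBlocks, blockSize>>>(N, x, y);
numBlocks is rounded up so every element is covered even when N isn't a multiple of blockSize.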

Compiling and Running The Code

So we have our code file; now we need to compile it. Open a terminal in your current directory (for Windows, see the video if you don't know how that works), and compile the code by running:
nvcc main.cu -o run
This will generate our executable file. Then, run it with:
./run
The output will show "Max error: 0".
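Two optional extras while you're in the terminal. First, nvcc accepts an -arch flag that tells it which GPU architecture to compile for; sm_61 below is just an example value (Pascal-era cards like the GTX 10-series), so substitute your own GPU's compute capability:
nvcc -arch=sm_61 main.cu -o run
Second, the toolkit ships a profiler that reports how long the kernel takes, which is handy once you start optimizing. On toolkits from this era it's nvprof (newer releases replace it with the Nsight tools):
nvprof ./run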

Next Steps 

So we installed the CUDA toolkit and successfully ran some GPU-enabled code on our computer. Now we need to do some more advanced programming to learn how to optimize our code and do even more. I recommend reading through this article and following the links there. It's possible that in the future I'll do more on this topic, so make sure to follow me using the links below.
Like this content and want more? Feel free to look around and find another blog post that interests you. You can also contact me through one of the various social media channels.

Twitter: @srcmake
Discord: srcmake#3644
Youtube: srcmake
Twitch: www.twitch.tv/srcmake
Github: srcmake
