Top 1K Features Creators Events Podcasts Books Extensions Interviews Blog Explorer CSV

CUDA

< >

CUDA, aka Compute Unified Device Architecture, is a programming language created in 2007.

#39on PLDB 17Years Old 18kRepos
Homepage · Download · Blog · Wikipedia · FAQ · Release Notes · Docs

CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia. It allows software developers and software engineers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing – an approach termed GPGPU (General-Purpose computing on Graphics Processing Units). The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements, for the execution of compute kernels. Read more on Wikipedia...


Example from hello-world:
#include <stdio.h> __global__ void hello_world(){ printf("Hello World\n"); } int main() { hello_world<<<1,1>>>(); return 0; }
// Hello world in CUDA #include <stdio.h> const int N = 16; const int blocksize = 16; __global__ void hello(char *a, int *b) { a[threadIdx.x] += b[threadIdx.x]; } int main() { char a[N] = "Hello \0\0\0\0\0\0"; int b[N] = {15, 10, 6, 0, -11, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}; char *ad; int *bd; const int csize = N*sizeof(char); const int isize = N*sizeof(int); printf("%s", a); cudaMalloc( (void**)&ad, csize ); cudaMalloc( (void**)&bd, isize ); cudaMemcpy( ad, a, csize, cudaMemcpyHostToDevice ); cudaMemcpy( bd, b, isize, cudaMemcpyHostToDevice ); dim3 dimBlock( blocksize, 1 ); dim3 dimGrid( 1, 1 ); hello<<<dimGrid, dimBlock>>>(ad, bd); cudaMemcpy( a, ad, csize, cudaMemcpyDeviceToHost ); cudaFree( ad ); cudaFree( bd ); printf("%s\n", a); return EXIT_SUCCESS; }
Example from Linguist:
#include <stdio.h> #include <cuda_runtime.h> /** * CUDA Kernel Device code * * Computes the vector addition of A and B into C. The 3 vectors have the same * number of elements numElements. */ __global__ void vectorAdd(const float *A, const float *B, float *C, int numElements) { int i = blockDim.x * blockIdx.x + threadIdx.x; if (i < numElements) { C[i] = A[i] + B[i]; } } /** * Host main routine */ int main(void) { // Error code to check return values for CUDA calls cudaError_t err = cudaSuccess; // Launch the Vector Add CUDA Kernel int threadsPerBlock = 256; int blocksPerGrid =(numElements + threadsPerBlock - 1) / threadsPerBlock; vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements); err = cudaGetLastError(); if (err != cudaSuccess) { fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!\n", cudaGetErrorString(err)); exit(EXIT_FAILURE); } // Reset the device and exit err = cudaDeviceReset(); return 0; }
Example from Wikipedia:
import numpy from pycublas import CUBLASMatrix A = CUBLASMatrix( numpy.mat([[1,2,3]],[[4,5,6]],numpy.float32) ) B = CUBLASMatrix( numpy.mat([[2,3]],[4,5],[[6,7]],numpy.float32) ) C = A*B print C.np_mat()

Language features

Feature Supported Example Token
Standard Library printf("Hello, World!\n");
Comments /* A comment */
MultiLine Comments /* A comment */ /* */
Print() Debugging printf
Case Insensitive Identifiers X
Semantic Indentation X
View source
- Build the next great programming language · About · Search · Keywords · Livestreams · Labs · Resources · Acknowledgements · Part of the World Wide Scroll