Multithreading

If the parallel parts of your workload need to share access to the same data, you may need to write your code to execute across multiple threads within a single task. In this case, all of the threads have access to the same shared memory pool.

This can be accomplished in many different ways, and the details generally depend on the programming language. As an example, consider the following simple C program, which uses OpenMP to multiply every number in an array by a fixed constant provided as a command line argument:

$ test.c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int f(int x, int c) {
    return x * c;
}

int main(int argc, char *argv[]) {
    int array[6] = {0, 1, 2, 3, 4, 5};
    int results[6];
    int c = atoi(argv[1]);

    // The loop iterations are divided among the available threads
    #pragma omp parallel for
    for (int i = 0; i < 6; i++) {
        results[i] = f(array[i], c);
    }

    for (int i = 0; i < 6; i++) {
        printf("f(%d, %d) = %d\n", array[i], c, results[i]);
    }
    return 0;
}

The code can be executed by the following batch script:

#!/bin/bash
#SBATCH --job-name=openmp_test
#SBATCH --partition=shared
#SBATCH --time=00:10:00
#SBATCH --output=slurm_%x_%j.out
#SBATCH --error=slurm_%x_%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=6
#SBATCH --mem-per-cpu=500mb
module load openmpi/4.1.1-gcc.9.2
module load gcc/9.2.0
cd (path to working directory)
gcc -fopenmp test.c -o test
./test 2

The script compiles the program with OpenMP enabled and then executes it as a single process with 6 threads. Each thread has access to the shared results array and deposits the result of computing the function f into it. Note that the environment variable OMP_NUM_THREADS, which determines how many threads OpenMP will use, is automatically set equal to SLURM_CPUS_PER_TASK.
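
If you want to confirm the thread count your job actually receives, a minimal sketch along the following lines can be compiled with gcc -fopenmp and run from the same batch script (the file name threads_check.c is just an illustration). If OMP_NUM_THREADS is not set automatically on your system, you can set it yourself in the batch script with export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK.

$ threads_check.c
#include <stdio.h>
#include <omp.h>

int main() {
    // Maximum number of threads OpenMP will use for parallel regions;
    // this should match the --cpus-per-task value (6 in the script above)
    printf("OpenMP default thread limit: %d\n", omp_get_max_threads());

    #pragma omp parallel
    {
        // 'single' ensures only one thread prints the count
        #pragma omp single
        printf("Parallel region is using %d threads\n", omp_get_num_threads());
    }
    return 0;
}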

A single node on Andromeda is equipped with a maximum of 64 CPU cores and 250 GB of memory. Since multithreading is generally much easier to accomplish than multiprocessing with inter-process communication (e.g. with OpenMPI; see below), it is very common to request all of these resources, effectively allocating an entire node to a single job.

As an example, we can sum a very large array by splitting it into 64 chunks and summing each chunk in a separate thread, using the following code:

$ bigtest.c
#include <stdio.h>
#include <omp.h>

#define ARRAY_SIZE 1000000

int main() {
    int array[ARRAY_SIZE];

    // Initialize array with the values 1..ARRAY_SIZE
    for (int i = 0; i < ARRAY_SIZE; i++) {
        array[i] = i + 1;
    }

    long long sum = 0;
    #pragma omp parallel
    {
        // Each thread executes this block of code
        int thread_id = omp_get_thread_num();
        int chunk_size = ARRAY_SIZE / omp_get_num_threads();
        int start = thread_id * chunk_size;
        int end = (thread_id == omp_get_num_threads() - 1) ?
                  ARRAY_SIZE : (thread_id + 1) * chunk_size;
        long long local_sum = 0;

        // Each thread sums a portion of the array
        for (int i = start; i < end; i++) {
            local_sum += array[i];
        }

        // A 'critical' section is executed by only one thread at a time,
        // preventing concurrent updates to the shared variable sum
        #pragma omp critical
        {
            sum += local_sum;
            printf("Thread %d contributed %lld to the total sum.\n", thread_id, local_sum);
        }
    }

    // Should output the sum of the integers from 1 to ARRAY_SIZE
    // = ARRAY_SIZE * (ARRAY_SIZE + 1) / 2 = 500000500000
    printf("Sum of elements: %lld\n", sum);
    return 0;
}

And a corresponding batch script:

$ bigtest.sl
#!/bin/bash
#SBATCH --job-name=big_openmp_test
#SBATCH --time=00:10:00
#SBATCH --output=slurm_%x_%j.out
#SBATCH --error=slurm_%x_%j.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64
#SBATCH --mem=250GB

module load openmpi/4.1.1-gcc.9.2
module load gcc/9.2.0
cd (path to working directory)
gcc -fopenmp bigtest.c -o bigtest
./bigtest
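
For a simple accumulation like this, OpenMP's reduction clause offers a more compact alternative to manual chunking and a critical section. The sketch below (the file name sum_reduction.c is just illustrative) computes the same total by giving each thread a private copy of sum and combining the copies when the loop finishes.

$ sum_reduction.c
#include <stdio.h>
#include <omp.h>

#define ARRAY_SIZE 1000000

int main() {
    int array[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; i++) {
        array[i] = i + 1;
    }

    long long sum = 0;
    // reduction(+:sum) gives each thread a private sum, initialized to 0,
    // and adds the private copies into the shared sum after the loop
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < ARRAY_SIZE; i++) {
        sum += array[i];
    }

    printf("Sum of elements: %lld\n", sum);   // 500000500000
    return 0;
}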
