MATLAB Parallel Computing on Andromeda

Initial Setup

1. Load and Open the MATLAB Module

Note: Normal text denotes user input; code blocks denote the software's response.

[johnchris@l001 ~]$ rm -rf .matlab
[johnchris@l001 ~]$ module load matlab/2024a
[johnchris@l001 ~]$ matlab

Note: There is a space between rm and -rf, as well as another one between -rf and .matlab
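If you would rather keep a copy of your existing MATLAB preferences than delete them outright, the removal step can be swapped for a rename. The following is a minimal sketch and not part of the official setup; the function name backup_matlab_prefs and the .bak.<timestamp> suffix are illustrative assumptions.

```shell
# Move an existing MATLAB preferences directory aside instead of deleting it.
# Prints the backup path, or prints nothing if there was no directory to move.
# Hypothetical helper: the name and the backup suffix are assumptions.
backup_matlab_prefs() {
    prefs_dir=${1:-$HOME/.matlab}
    [ -d "$prefs_dir" ] || return 0
    backup_dir="${prefs_dir}.bak.$(date +%Y%m%d%H%M%S)"
    mv -- "$prefs_dir" "$backup_dir" && printf '%s\n' "$backup_dir"
}
```

Calling backup_matlab_prefs with no argument from your home directory would take the place of the rm -rf .matlab line; the module load and matlab commands stay the same.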

2. Now we are in a MATLAB session. Run the MATLAB command:

Note: Type each line individually. Do not copy and paste these commands as a block, or you may see errors.

>> configCluster


>> c = parcluster;
>> c.saveProfile;

3. Exit the MATLAB Session

>> exit
[johnchris@l001 ~]$

Configuring Jobs

1. Create a Slurm file (parallel_matlab.sl) that calls your MATLAB code file, which you will construct next.

#!/bin/bash
#SBATCH --job-name=sample # Job name
#SBATCH --time=24:00:00 # Time limit hrs:min:sec (max: 120 hours)
#SBATCH --ntasks=1 # Number of tasks
#SBATCH --nodes=1 # Number of nodes requested
#SBATCH --partition=shared # Node choice
#SBATCH --cpus-per-task=1 # Number of CPU processors per task
#SBATCH --mem-per-cpu=4G # RAM per CPU core (node max: 180GB/48 cores, 250GB/64 cores)
#SBATCH --mail-type=ALL # Mail notifications (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@bc.edu # BC email address

module load matlab/2024a
cd /mmfs1/data/johnchris/parallel_matlab # Directory of MATLAB code file
matlab -batch parallel_matlab >&parallel_matlab.out

Note: Bold denotes input individual users MUST change. Ensure the time limit is sufficient for MATLAB execution.
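Before submitting, it can help to confirm that the requested time limit stays within the 120-hour maximum noted in the Slurm file. The sketch below is one possible check, under the assumption that the limit is written in the hrs:min:sec form used above (other Slurm time formats are not handled); the function names are hypothetical.

```shell
# Convert an hrs:min:sec time limit to seconds.
# Assumption: only the hrs:min:sec format from the example is supported.
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if the limit is no longer than the 120-hour maximum.
within_max_walltime() {
    max_seconds=$(( 120 * 3600 ))
    [ "$(walltime_to_seconds "$1")" -le "$max_seconds" ]
}

if within_max_walltime "24:00:00"; then
    echo "24:00:00 is within the limit"
fi
```

The same check applies to the WallTime value set inside the MATLAB code file, since both limits are subject to the same maximum.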

2. Add the Parallel Computing Command to MATLAB Code File (parallel_matlab.m)

c = parcluster;
% Corresponding to the time limit, max is 5 days (120:00:00)
c.AdditionalProperties.WallTime = '6:00:00';
% Corresponding to CPU memory, per core
c.AdditionalProperties.MemPerCPU = '6G';
% Number of processors requested per node, same as the number in parpool()
c.AdditionalProperties.ProcsPerNode = 10;
% Can be different from the node choice
c.AdditionalProperties.Partition = 'shared';
saveProfile(c)
% Total CPU cores requested for the program, cannot exceed 512 cores
parpool(10)
% End of the parallel computing command, program codes start here
tic
n = 20000;
A = 500;
a = zeros(1,n);
parfor i = 1:n
    if mod(i, 100) == 1
        i
    end
    a(i) = max(abs(eig(rand(A))));
end
toc
delete(gcp('nocreate'));

Note: Andromeda provides multiple partitions for CPU & GPU nodes. For CPU:

  • Andromeda (A1): Exclusive and shared
  • Andromeda2 (A2): Long, medium, and short

Submitting a job requires editing c.AdditionalProperties to specify computational resources.
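When choosing values for c.AdditionalProperties, a quick sanity check is to multiply the per-core memory by the number of processors per node and compare the result with the node limits quoted in the Slurm file. The following is a rough sketch of that arithmetic, assuming every processor on the node receives the full MemPerCPU amount; the function name is hypothetical.

```shell
# Memory (in GB) that a pool needs on one node:
# processors per node multiplied by memory per core.
node_mem_gb() {
    echo $(( $1 * $2 ))
}

# Values from the parallel_matlab.m example: ProcsPerNode=10, MemPerCPU=6G.
requested=$(node_mem_gb 10 6)
echo "Requested per node: ${requested} GB"

# 180 GB is the maximum quoted in the Slurm file for 48-core nodes.
if [ "$requested" -gt 180 ]; then
    echo "Request exceeds a 48-core node; lower MemPerCPU or ProcsPerNode."
fi
```

With the example values the request comes to 60 GB, well under either node limit.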

Submitting Jobs

  1. Upload the Slurm file (parallel_matlab.sl) and the MATLAB code file (parallel_matlab.m) to the directory specified in your Slurm file (/mmfs1/data/johnchris/parallel_matlab is given in the example, but you must change it to reflect your own file path) on the Andromeda cluster:

Note: It is advised to upload the Slurm file and the MATLAB code file in the same directory.

[johnchris@l001 ~/parallel_matlab]$ ls

  2. Once logged in to the Andromeda cluster, change to the directory you created:

[johnchris@l001 ~]$ cd parallel_matlab
[johnchris@l001 ~/parallel_matlab]$

  3. Submit your job (the Slurm .sl file) to Andromeda:

[johnchris@l001 ~/parallel_matlab]$ sbatch parallel_matlab.sl

  4. Check the status of your jobs with the following command, replacing username with your own. “R” indicates that the job is running.

[johnchris@l001 ~/parallel_matlab]$ squeue | grep username

Note: Job 2146825 (sample) is what the Slurm file calls directly. Job 2146826 (Job1) is the parallel pool that the code in parallel_matlab.m calls within MATLAB.
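The status check can also be scripted. The sketch below assumes squeue is asked for three space-separated columns (job ID, job name, compact state) through its standard -u, -h, and -o options; the helper name running_jobs is hypothetical, and job names containing spaces are not handled.

```shell
# Read "JOBID NAME STATE" lines on stdin and print the names of running jobs.
# Assumption: the state column uses Slurm's compact codes, where R means running.
running_jobs() {
    awk '$3 == "R" { print $2 }'
}

# On the cluster the input would come from squeue, for example:
#   squeue -u "$USER" -h -o "%i %j %t" | running_jobs
```

Filtering with squeue -u rather than grep avoids matching other users whose job names happen to contain your username.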

  5. Once your job is done, a file will appear (parallel_matlab.out), as named in the Slurm file, with the output of your MATLAB session.

[johnchris@l001 ~/parallel_matlab]$ ls
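Once the output file exists, a quick scan can tell you whether the MATLAB run reported an error. This is a minimal sketch that assumes MATLAB error messages begin a line with the word "Error" (as in "Error using" or "Error in"); the function name is hypothetical, and a clean result does not guarantee that the computation itself is correct.

```shell
# Succeed if the MATLAB output file exists and contains no line starting with "Error".
# Assumption: error messages start a line with the word "Error".
matlab_output_ok() {
    [ -f "$1" ] || return 1
    ! grep -q '^Error' "$1"
}

# Example use from the job directory:
#   matlab_output_ok parallel_matlab.out && echo "no errors found"
```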

Running Interactively

If you do not wish to use a Slurm file, you can instead work directly in the terminal by starting an interactive session with the ‘srun’ command, which places you on an available compute node on Andromeda. Do not run MATLAB on the login node (the node you land on when you first log in to the cluster).

[johnchris@l001 ~]$ srun --job-name=sample --nodes=1 --ntasks=1 --time=24:00:00 --mem=20G --pty bash -i
[johnchris@c116 ~]$ module load matlab/2024a
[johnchris@c116 ~]$ matlab

Note: You are assigned a compute node on the cluster (e.g., c116 in this case). You then get the following message, and MATLAB opens interactively in the terminal.

You can now enter the code in MATLAB interactively as below:

Note: Type each line individually. Do not copy and paste these commands as a block, or you may see errors.

>> c = parcluster;
>> c.AdditionalProperties.MemPerCPU = '6G';
>> c.AdditionalProperties.WallTime = '24:00:00';
>> saveProfile(c)
>> parpool(10)

>>(insert your desired MATLAB code)

Screenshot: Example of MATLAB code and interactive output from the run.

>>exit
[johnchris@c116 ~]$
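To guard against starting MATLAB on the login node by accident, the hostname can be checked first. The sketch below assumes, based only on the prompts shown in this guide, that login node names start with "l" followed by a digit (such as l001) while compute nodes do not (such as c116); the actual naming scheme may differ, and both function names are hypothetical.

```shell
# Succeed if the hostname looks like a login node.
# Assumption: login nodes are named "l" followed by digits, e.g. l001.
is_login_node() {
    case "$1" in
        l[0-9]*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Report whether it is appropriate to start MATLAB on the given host.
check_node() {
    if is_login_node "$1"; then
        echo "On login node $1: start an interactive session with srun first."
        return 1
    fi
    echo "On compute node $1: safe to load the MATLAB module."
}

# Example use:
#   check_node "$(hostname -s)" && module load matlab/2024a && matlab
```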

Batch Job

You can also use the ‘batch’ command to submit asynchronous jobs to the cluster. The batch command will return a job object which is used to access the output of the submitted job.

  1. Configure the MATLAB code (parabatch_matlab.m) with desired parallel computing process as a function:

function t = parabatch_matlab()
    t0 = tic;
    n = 20000;
    A = 500;
    a = zeros(1,n);
    parfor i = 1:n
        if mod(i, 100) == 1
            i
        end
        a(i) = max(abs(eig(rand(A))));
    end
    t = toc(t0);
end

2. Open the specified directory on Andromeda:

[johnchris@l001 ~]$ cd parallel_matlab
[johnchris@l001 ~/parallel_matlab]$

  3. Use ‘srun’ to move to a compute node on Andromeda. Then start a MATLAB session and
    open a parcluster:

[johnchris@l001 ~/parallel_matlab]$ srun --job-name=sample --nodes=1 --ntasks=1 --time=24:00:00 --mem=20G --pty bash -i

[johnchris@c001 parallel_matlab]$ module load matlab/2024a
[johnchris@c001 parallel_matlab]$ matlab

Note: You are assigned a compute node on the cluster (e.g., c001 in this case). You then get the following message, and MATLAB opens interactively in the terminal.

Note: Type each line individually. Do not copy and paste these commands as a block, or you may see errors.

>> c = parcluster;
>> c.AdditionalProperties.MemPerCPU = '6G';
>> c.AdditionalProperties.WallTime = '24:00:00';
>> saveProfile(c)
>> parpool(10)

Note: Be sure to specify an appropriate wall time for your job. The MATLAB parallel process may encounter errors if the wall time is not set in the code above.

  4. Now call your function file using the batch command. The number after ‘Pool’ specifies
    the number of workers to be used (change this according to your needs). The batch
    command returns a job object, j.

>> j = c.batch(@parabatch_matlab, 1, {}, 'Pool', 10, 'CurrentFolder', '.', 'AutoAddClientPath', false)

Note: Be aware that in addition to the number of workers that you ask for in your ‘Pool’ argument, you will also receive an additional “Orchestrator” worker that is required for job execution. Therefore, a request of 10 pool workers will result in 11 total tasks from the scheduler.
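The extra orchestrator worker matters when you estimate how many scheduler tasks a batch job will occupy. As a small illustration of the arithmetic in the note above (the function name is hypothetical):

```shell
# Total scheduler tasks for a batch pool job:
# the requested pool workers plus one orchestrator worker.
batch_total_tasks() {
    echo $(( $1 + 1 ))
}

pool_workers=10
echo "Pool of ${pool_workers} workers -> $(batch_total_tasks "$pool_workers") tasks"
```

For the example call with a pool of 10, this reports 11 tasks.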

5. ‘j.State’ tells you whether the job is queued (waiting to start), running, or finished.

>>j.State

  6. Use ‘j.fetchOutputs’ to retrieve the function's output arguments, and ‘j.fetchOutputs{:}’ to display all of their contents. In this case, t = 198.7795 indicates that the program took 198.7795 seconds to run. If you called batch with a script rather than a function, use ‘load’ instead. Data written to files on the cluster must be retrieved directly from the file system, for example via FTP.

>> j.fetchOutputs

Note: Outputs can only be fetched once the job is in the ‘finished’ state.

  7. You can view a list of your past and current jobs, as well as their IDs, using the ‘c.Jobs’ command.

>> c.Jobs

  8. If a serial job produces an error, call the ‘getDebugLog’ method to view the error log file. When submitting independent jobs with multiple tasks, specify the task number. For pool jobs, pass only the job object. You can also retrieve the scheduler job ID with ‘getTaskSchedulerIDs’ and analyze the job’s log file output when debugging.

>> c.getDebugLog(j.Tasks(3))

>> c.getDebugLog(j)
>> j.getTaskSchedulerIDs{:}

LOG FILE OUTPUT:

SPMD Example

Single Program Multiple Data (SPMD) is an advanced construct that enables communication
among workers (processors) throughout the computation, as well as customization of tasks
across workers. Under SPMD, each worker has a unique index, spmdIndex. In the example
below, each of workers 1-10 handles only the loop iterations whose last digit matches the last digit of its spmdIndex.
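To make the partitioning rule concrete, the following sketch lists which loop indices a given worker would process under the same test used in the MATLAB example, mod(i, 10) == mod(index, 10). It only illustrates the index arithmetic in shell and is not how MATLAB schedules the work; the function name is hypothetical and the range is shortened to 25 for readability.

```shell
# Print the loop indices in 1..n that a worker with the given index would process,
# using the rule from the example: i mod 10 equals index mod 10.
iterations_for_worker() {
    worker=$1
    n=$2
    i=1
    out=""
    while [ "$i" -le "$n" ]; do
        if [ $(( i % 10 )) -eq $(( worker % 10 )) ]; then
            out="${out}${out:+ }${i}"
        fi
        i=$(( i + 1 ))
    done
    echo "$out"
}

iterations_for_worker 3 25    # prints: 3 13 23
iterations_for_worker 10 25   # prints: 10 20
```

Worker 10 picks up the indices ending in 0 because mod(10, 10) is 0, so across workers 1-10 every index is handled exactly once.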

  1. Create a Slurm file (spmd_test.sl) that calls your MATLAB code file (spmd_matlab.m).

#!/bin/bash
#SBATCH --job-name=spmd
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --partition=shared
#SBATCH --mem-per-cpu=4G
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@bc.edu

module load matlab/2024a
cd /mmfs1/data/johnchris/parallel_matlab
matlab -batch spmd_matlab >&spmd_matlab.out

Note: Bold denotes input individual users MUST change.

  2. Configure the MATLAB code (spmd_matlab.m) with the desired parallel computing process:

c = parcluster;
c.AdditionalProperties.WallTime = '6:00:00';
c.AdditionalProperties.MemPerCPU = '6G';
c.AdditionalProperties.ProcsPerNode = 10;
c.AdditionalProperties.Partition = 'shared';
saveProfile(c)
parpool(10)
tic
spmd(10)
    n = 20000;
    A = 500;
    a = zeros(1,n);
    for i = 1:n
        % Each worker handles only the indices matching its own spmdIndex
        if mod(i, 10) == mod(spmdIndex, 10)
            a(i) = max(abs(eig(rand(A))));
        end
    end
end
toc
delete(gcp('nocreate'));

3. The other steps are the same as Step 1 to Step 5 in the “Submitting Jobs” section.
