MATLAB Parallel Computing on Andromeda

Initial Setup

1. Load and Open the MATLAB Module

Note: Normal text denotes user input; code blocks denote the response from the software.

[username@a002 ~]$ interactive
[username@c116 ~]$ rm -rf .matlab # First-time user ONLY
[username@c116 ~]$ module load matlab
[username@c116 ~]$ matlab

Note: You get assigned to a node on the cluster (c116 in this example). Then, you get the following message, and the MATLAB window opens interactively on the terminal.

Note: There is a space between rm and -rf, as well as another one between -rf and .matlab

2. Now we are in a MATLAB session. Run the MATLAB command:

Note: Type each line individually; do not copy and paste these commands as a block of code, or you may see errors. configCluster only needs to be called once per cluster; there is no need to run it a second time.

>> configCluster

>> c = parcluster;
>> c.saveProfile;

3. Exit the MATLAB Session

>> exit
[username@c116 ~]$

Configuring Jobs

1. Create a Slurm File (parallel_matlab.sl) that calls your MATLAB code file which you will construct next.

#!/bin/bash
#SBATCH --job-name=sample # Job name
#SBATCH --time=12:00:00 # Time limit hrs:min:sec (max: 120 hours)
#SBATCH --ntasks=1 # Number of tasks
#SBATCH --nodes=1 # Number of nodes requested
#SBATCH --partition=short # Choice of partition
#SBATCH --cpus-per-task=1 # Number of CPU processors per task
#SBATCH --mem-per-cpu=4G # RAM per CPU core (node max: 180GB/44 cores, 250GB/60 cores)
#SBATCH --mail-type=ALL # Mail notifications (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=username@bc.edu # BC email address

module load matlab
cd /projects/<proj_name>/parallel_matlab # Directory of MATLAB code file
matlab -batch parallel_matlab >&parallel_matlab.out

Note: Bold denotes input that individual users MUST change. Ensure the time limit is sufficient for your MATLAB code to finish. The cluster currently provides time-based partitions for CPU and GPU nodes: long (up to 120 hours), medium (up to 48 hours), and short (up to 12 hours). Submitting a job this way requires you to edit c.AdditionalProperties in your MATLAB code to specify computational resources.
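The partition limits in the note above can be turned into a quick check before editing the Slurm file. The sketch below is illustrative only: pick_partition is a hypothetical helper (not part of Slurm or of the Andromeda tooling) that takes a wall time in whole hours and prints the shortest partition from the note that can hold it.

```shell
# Hypothetical helper: print the shortest partition whose time limit covers
# the requested wall time (whole hours). Limits are taken from the note above.
pick_partition() {
    if [ "$1" -le 12 ]; then
        echo short
    elif [ "$1" -le 48 ]; then
        echo medium
    elif [ "$1" -le 120 ]; then
        echo long
    else
        echo "requested ${1}h exceeds the 120-hour maximum" >&2
        return 1
    fi
}

pick_partition 12   # prints: short
pick_partition 30   # prints: medium
```

The printed name is what would go after --partition= in the Slurm file, with --time set to match.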

2. Add the Parallel Computing Command to MATLAB Code File (parallel_matlab.m)

c = parcluster;
% Corresponding to the time limit in the Slurm file
c.AdditionalProperties.WallTime = '12:00:00';
% Corresponding to CPU memory, per core
c.AdditionalProperties.MemPerCPU = '6G';
% Number of processors requested per node, same as the number in parpool()
c.AdditionalProperties.ProcsPerNode = 10;
% If you wish to work across nodes, divide the number above by the number of nodes
% E.g. "c.AdditionalProperties.ProcsPerNode = 5;" will work on 2 nodes in this example
% Can be different from the node choice
c.AdditionalProperties.Partition = 'short';
saveProfile(c)
% Total CPU cores requested for the program, cannot exceed 512 cores
parpool(10)
% End of the parallel computing command, program codes start here
tic
n = 20000;
A = 500;
a = zeros(1,n);
parfor i = 1:n
    if mod(i, 100) == 1
        i
    end
    a(i) = max(abs(eig(rand(A))));
end
toc
delete(gcp('nocreate'));
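The comments in the listing above relate ProcsPerNode to the pool size: the per-node value is the number of pool workers divided by the number of nodes. A minimal sketch of that arithmetic follows. procs_per_node is a hypothetical helper, and the requirement that the workers split evenly across nodes is an assumption made here for simplicity, not a rule stated by the cluster.

```shell
# Hypothetical helper: ProcsPerNode for a pool of $1 workers spread over $2 nodes.
# Assumption: workers must divide evenly across nodes.
procs_per_node() {
    if [ "$2" -le 0 ] || [ $(( $1 % $2 )) -ne 0 ]; then
        echo "workers ($1) must divide evenly across nodes ($2)" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}

procs_per_node 10 1   # prints: 10 (all workers on one node)
procs_per_node 10 2   # prints: 5  (the two-node case from the comments)
```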

Submitting Jobs

  1. Upload the Slurm file (parallel_matlab.sl) and the MATLAB code file (parallel_matlab.m) to the directory specified in your Slurm file (/projects/<proj_name>/parallel_matlab is given in the example, but you must change it to reflect your own file path) on the Andromeda cluster:

Note: It is advised to upload the Slurm file and the MATLAB code file in the same directory.

[username@a002 ~/parallel_matlab]$ ls

  2. Once logged in to the Andromeda cluster, change to the directory you created:

[username@a002 ~]$ cd parallel_matlab
[username@a002 ~/parallel_matlab]$

  3. Submit your job (the Slurm .sl file) to Andromeda:

[username@a002 ~/parallel_matlab]$ sbatch parallel_matlab.sl

  4. You can check the status of your jobs using the following command. “R” indicates that the job is running.

[username@a002 ~/parallel_matlab]$ squeue -u $USER

Note: Job 2146825 (sample) is what the Slurm file calls directly. Job 2146825 (MATLAB_R) is the parallel pool that the code in parallel_matlab.m calls within MATLAB.

  5. Once your job is done, the output file named in the Slurm file (parallel_matlab.out) will appear, containing the output of your MATLAB session.

[username@a002 ~/parallel_matlab]$ ls
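The submit-and-check steps above can also be scripted. The sketch below is a hedged illustration, not an Andromeda-provided tool. It assumes sbatch prints its usual confirmation line ("Submitted batch job <id>") and that squeue is called with the format string shown in the comments. Both helper names are hypothetical, and the sample lines fed to them are made up for demonstration.

```shell
# Hypothetical helper: read sbatch's confirmation line on stdin, print the job ID.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Hypothetical helper: count jobs in a given compact state (R, PD, ...).
# stdin: lines of "<jobid> <name> <state>", as produced by
#   squeue -h -u "$USER" -o "%i %j %t"
count_state() {
    awk -v s="$1" '$3 == s { n++ } END { print n + 0 }'
}

# Made-up sample input, standing in for real scheduler output:
echo "Submitted batch job 1001" | job_id_from_sbatch                # prints: 1001
printf '1001 sample R\n1002 MATLAB_R PD\n' | count_state R          # prints: 1
```

On the cluster, the same helpers would be fed by the real commands, for example `sbatch parallel_matlab.sl | job_id_from_sbatch`.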

Running Interactively

If you do not wish to use a Slurm file, you may instead write code directly in the terminal after entering an interactive session. This will take you to an available compute node on Andromeda.

[username@a002 ~]$ interactive
[username@c116 ~]$ module load matlab
[username@c116 ~]$ matlab

Note: You are assigned to a node on the cluster (e.g., c116 in this case). Then you get the following message, and the MATLAB window opens interactively in the terminal.

You can now enter the code in MATLAB interactively as below:

Note: Type each line individually; do not copy and paste these commands as a block of code, or you may see errors.

>> c = parcluster;
>> c.AdditionalProperties.MemPerCPU = '6G';
>> c.AdditionalProperties.WallTime = '8:00:00';
>> saveProfile(c)
>> parpool(10)

Note: This is not always a quick process; it can take time for resources to be allocated due to normal cluster queues.

You can now insert your desired MATLAB code:

>> tic
>> n = 20000;
>> A = 500;
>> a = zeros(1,n);
>> parfor i = 1:n
>>     if mod(i,100) == 1
>>         i
>>     end
>>     a(i) = max(abs(eig(rand(A))));
>> end
>> toc
>> delete(gcp('nocreate'));

Screenshot: Example of MATLAB code and interactive output from the run.

>> exit
[username@c116 ~]$

Batch Job

You can also use the ‘batch’ command to submit asynchronous jobs to the cluster. The batch command will return a job object which is used to access the output of the submitted job.

  1. Configure the MATLAB code (parabatch_matlab.m) with the desired parallel computing process as a function:

function t = parabatch_matlab()
    t0 = tic;
    n = 20000;
    A = 500;
    a = zeros(1,n);
    parfor i = 1:n
        if mod(i, 100) == 1
            i
        end
        a(i) = max(abs(eig(rand(A))));
    end
    t = toc(t0);
end

2. Open the specified directory on Andromeda:

[johnchris@l001 ~]$ cd parallel_matlab
[johnchris@l001 ~/parallel_matlab]$

  3. Use ‘srun’ to move to a compute node on Andromeda. Then start a MATLAB session and open a parcluster:

[johnchris@l001 ~/parallel_matlab]$ srun --job-name=sample --nodes=1 --ntasks=1 --time=1:00:00 --mem=20G --pty bash -i

[johnchris@c001 parallel_matlab]$ module load matlab
[johnchris@c001 parallel_matlab]$ matlab

Note: You are assigned to a node on the cluster (e.g., c001 in this case). Then you get the following message, and the MATLAB window opens interactively in the terminal.

Note: Type each line individually; do not copy and paste these commands as a block of code, or you may see errors.

>> c = parcluster;
>> c.AdditionalProperties.MemPerCPU = '6G';
>> c.AdditionalProperties.WallTime = '1:00:00';
>> saveProfile(c)
>> parpool(10)

Note: Be sure to specify an appropriate wall time for your job; the MATLAB parallel process may encounter errors if the wall time is not specified in the code above.

  4. Now call your function file using the batch command. The number after ‘Pool’ specifies
    the number of workers to be used (change this according to your needs). The batch
    command returns a job object, j.

>> j = c.batch(@parabatch_matlab, 1, {}, 'Pool', 10, 'CurrentFolder', '.', 'AutoAddClientPath', false)

Note: Be aware that in addition to the number of workers that you ask for in your ‘Pool’ argument, you will also receive an additional “Orchestrator” worker that is required for job execution. Therefore, a request of 10 pool workers will result in 11 total tasks from the scheduler.
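The note above amounts to a one-line calculation: the scheduler sees one task per pool worker plus one for the orchestrator. A small sketch of that count follows; total_scheduler_tasks is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: scheduler tasks for a batch job with a pool of $1 workers.
# One extra task accounts for the orchestrator worker described in the note above.
total_scheduler_tasks() {
    echo $(( $1 + 1 ))
}

total_scheduler_tasks 10   # prints: 11
```

Keeping this extra task in mind helps when sizing a pool against a core limit.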

5. ‘j.State’ tells you whether the job is queued (waiting to start), running, or finished.

>> j.State

  6. Use ‘j.fetchOutputs’ to retrieve function output arguments. Use ‘j.fetchOutputs{:}’ to display all of its contents. In this case, t = 198.7795 indicates that the program took 198.7795 seconds to run. If calling batch with a script, use load instead. Data that has been written to files on the cluster needs to be retrieved directly from the file system, such as via FTP.

>> j.fetchOutputs

Note: Outputs can only be fetched once the job is in the ‘finished’ state.

  7. You can view a list of your past and current jobs, as well as their IDs, using the ‘c.Jobs’ command.

>> c.Jobs

  8. If a serial job produces an error, call the ‘getDebugLog’ method to view the error log file. When submitting independent jobs with multiple tasks, specify the task number. You can also analyze the job’s log file output when debugging.

>> c.getDebugLog(j.Tasks(3))

>> c.getDebugLog(j)
>> j.getTaskSchedulerIDs{:}

LOG FILE OUTPUT:

SPMD Example

Single Program Multiple Data (SPMD) is an advanced construct that enables communication
among workers (processors) throughout the computation and lets you customize tasks
for each worker. Under SPMD, each worker has a unique index, spmdIndex. In the example
below, each of workers 1-10 handles only the loop iterations whose last digit matches its spmdIndex (worker 10 takes the iterations ending in 0).
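To make the assignment rule concrete, the sketch below lists which loop iterations a given worker would take under the rule "i mod 10 equals worker index mod 10". It is a plain shell illustration of the arithmetic only, not MATLAB code, and iterations_for_worker is a hypothetical helper name.

```shell
# Hypothetical helper: print the iterations in 1..$2 that worker $1 handles,
# i.e. those where i mod 10 equals the worker index mod 10.
# The body runs in a subshell so its variables stay local.
iterations_for_worker() (
    worker=$1
    n=$2
    picked=""
    i=1
    while [ "$i" -le "$n" ]; do
        if [ $(( i % 10 )) -eq $(( worker % 10 )) ]; then
            picked="${picked}${picked:+ }${i}"
        fi
        i=$(( i + 1 ))
    done
    echo "$picked"
)

iterations_for_worker 3 25    # prints: 3 13 23
iterations_for_worker 10 25   # prints: 10 20
```

With n = 20000 and 10 workers, each worker ends up with 2000 iterations.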

  1. Create a Slurm file (spmd_test.sl) that calls your MATLAB code file (spmd_matlab.m).

#!/bin/bash
#SBATCH --job-name=spmd
#SBATCH --time=12:00:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --partition=shared
#SBATCH --mem-per-cpu=4G
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=username@bc.edu

module load matlab/2024a
cd /mmfs1/data/johnchris/parallel_matlab
matlab -batch spmd_matlab >&spmd_matlab.out

Note: Bold denotes input individual users MUST change.

  2. Configure the MATLAB code (spmd_matlab.m) with the desired parallel computing process:

c = parcluster;
c.AdditionalProperties.WallTime = '6:00:00';
c.AdditionalProperties.MemPerCPU = '6G';
c.AdditionalProperties.ProcsPerNode = 10;
c.AdditionalProperties.Partition = 'shared';
saveProfile(c)
parpool(10)
tic
spmd(10)
    n = 20000;
    A = 500;
    a = zeros(1,n);
    for i = 1:n
        % spmdIndex (R2022b and later) replaces the older labindex
        if mod(i, 10) == mod(spmdIndex, 10)
            a(i) = max(abs(eig(rand(A))));
        end
    end
end
toc
delete(gcp('nocreate'));

3. The other steps are the same as Step 1 to Step 5 in the “Submitting Jobs” section.
