Installing Julia Packages Locally
Where possible, we recommend installing Julia packages in your home directory. The example below demonstrates how to do this on Andromeda.
1. Start an interactive session and load Julia
[johnchris@andromeda ~]$ interactive
Executing: srun --pty -N1 -n1 -c4 --mem=8g -pinteractive -t0-04:00:00 /bin/bash
Press any key to continue or ctrl+c to abort.
srun: job 710479 queued and waiting for resources
srun: job 710479 has been allocated resources
cpu-bind=MASK - c049, task 0 0 [3394141]: mask 0x888800000000 set
[johnchris@c049 ~]$ module load julia
2. Create a directory for local Julia packages
Here we create a directory called Julia_lib in the home directory, then launch Julia:
[johnchris@c049 ~]$ cd ~
[johnchris@c049 ~]$ mkdir Julia_lib
[johnchris@c049 ~]$ julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.9 (2025-03-10)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |
3. Activate the new package environment
In Julia, use Pkg.activate() to tell Julia to use the new directory for packages:
julia> import Pkg
julia> Pkg.activate("/home/johnchris/Julia_lib")
Activating new project at `~/Julia_lib`
Replace /home/johnchris with your actual home path.
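If you prefer not to hard-code the path, Julia's built-in homedir() can construct it for you (a small equivalent sketch):
julia> import Pkg
julia> Pkg.activate(joinpath(homedir(), "Julia_lib"))  # resolves to /home/<your-user>/Julia_lib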
4. Install and verify packages
Here, we install two packages:
julia> Pkg.add(name="ITensors", version="0.3.52")
julia> Pkg.add("StatsBase")
- Use Pkg.add(name="PackageName", version="x.y.z") to install a specific version.
- Use Pkg.add("PackageName") to install the default compatible version.
- Specifying the version is recommended to avoid unexpected compatibility issues and to ensure reproducibility.
Once the packages are installed, verify the installation with:
julia> Pkg.status()
Status `~/Julia_lib/Project.toml`
⌃ [9136182c] ITensors v0.3.52
[2913bbd2] StatsBase v0.34.6
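You can also confirm that a package loads from the new environment before exiting (a hypothetical session; mean is re-exported by StatsBase):
julia> using StatsBase
julia> mean([1, 2, 3])
2.0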
When the installation is complete, you can exit Julia:
julia> exit()
The installed packages will be stored in your home directory.
In future Julia sessions, remember to activate the corresponding environment before using these packages.
To do this, add the following two lines at the very top of your .jl file, before any other using or import statements:
import Pkg
Pkg.activate("/home/johnchris/Julia_lib")
Only after activation will Julia know to use the packages from that environment.
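For example, a script that uses StatsBase from this environment might begin like this (a minimal sketch; the file name and the statistic computed are illustrative, and the path should be replaced with your own):
# my_analysis.jl (hypothetical example)
import Pkg
Pkg.activate("/home/johnchris/Julia_lib")  # activate the local environment first

using StatsBase  # now resolved from ~/Julia_lib

data = rand(100)
println("mean = ", mean(data))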
Running Multiple Julia Jobs with Parameters (Multiple Submissions)
1. Julia Script (log_ab.jl)
Suppose we want to compute log(a * b) for a range of values: a = 1, 2, 3, 4 and b = 2, 4, 6, 8.
Create a Julia script that reads a and b from command-line arguments:
a = parse(Float64, ARGS[1]) # 1,2,3,4
b = parse(Float64, ARGS[2]) # 2,4,6,8
println(a, ",", b, ",", log(a*b))
2. Single job submission (single_job.sl)
Submit the script via a Slurm wrapper that takes two arguments (a and b):
#!/bin/bash -l
#SBATCH --job-name=log_ab
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G
#SBATCH --time=01:00:00
#SBATCH --partition=short
module load julia
julia ../log_ab.jl $1 $2
Here, $1 and $2 are placeholders for the values of a and b, passed in when sbatch is called.
Example call (single run):
To submit one job (e.g. a=3 and b=4):
[johnchris@c049 test]$ sbatch single_job.sl 3 4
This will compute log(3*4). The result appears in the job's Slurm output file (slurm-<jobid>.out by default).
Note: The script above calls ../log_ab.jl. Place single_job.sl inside a subdirectory and log_ab.jl in its parent, or adjust the path accordingly.
3. Wrapper Script for All Jobs (multiple_submission.sl)
To automate all 16 combinations of a and b:
#!/bin/bash -l
#SBATCH --job-name=multiple_submission
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --time=00:10:00
#SBATCH --partition=short
for a in $(seq 1 1 4)
do
    for b in $(seq 2 2 8)
    do
        mkdir "a${a}b${b}"
        cd "a${a}b${b}"
        sbatch ../single_job.sl $a $b
        cd ..
    done
done
Each job is submitted from its own subdirectory (a1b2, a1b4, ...), which keeps each run's output organized. Here, seq 1 1 4 produces 1 2 3 4 and seq 2 2 8 produces 2 4 6 8, giving the 16 combinations.
4. Running the Batch
Make sure all the required files (multiple_submission.sl, single_job.sl, and log_ab.jl) are in the same directory (for example, the test folder). Then run:
[johnchris@c049 test]$ sbatch multiple_submission.sl
This will submit a batch of jobs corresponding to all combinations of a and b.
5. Monitoring the Jobs
You can check the queue with:
[johnchris@c049 test]$ squeue -u johnchris
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
710623 short log_ab johnchris PD 0:00 1 (Priority)
710622 short log_ab johnchris PD 0:00 1 (Priority)
...
710608 short log_ab johnchris PD 0:00 1 (Priority)
Julia Parallel Computing
Running Multithreading Tasks Using Julia Threads
When you need to run multiple tasks in parallel with shared memory or communication between them, Julia’s multithreading is a simple and efficient solution.
This approach is suitable when your job runs on a single node with one task (--nodes=1, --ntasks=1) but uses multiple CPU cores.
1. Example: Generating Random Numbers in Parallel and Summing Them
Suppose we want to generate 10 random numbers between 0 and 1 in parallel, and then compute their sum.
The following Julia script (sum_of_random_num.jl) defines this task:
println("Number of threads: ", Threads.nthreads())
@time begin
num = 10
test = zeros(num)
Threads.@threads for i=1:num
sleep(1) # simulate a time-consuming task
test[i] = rand()
end
println(sum(test))
end
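A note on thread safety: each iteration above writes to its own element test[i], so no two threads ever touch the same memory location. Accumulating into a shared variable inside the loop, by contrast, would be a data race; a minimal sketch of the safe alternative using an atomic (not part of the original script) is:
total = Threads.Atomic{Float64}(0.0)
Threads.@threads for i in 1:10
    # total += rand() on a plain shared variable would be a data race
    Threads.atomic_add!(total, rand())  # atomic update is safe across threads
end
println(total[])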
2. Slurm Script for Running the Multithreaded Job
To execute the script using 10 threads, configure Slurm as follows (sum_of_random_num.sl):
#!/bin/bash -l
#SBATCH --job-name=sum_of_random_num
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10 # Request 10 CPUs for threading
#SBATCH --mem-per-cpu=4G
#SBATCH --time=01:00:00
#SBATCH --partition=short
module load julia
export JULIA_NUM_THREADS=$SLURM_CPUS_PER_TASK # Set number of threads for Julia
julia sum_of_random_num.jl
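Equivalently, you can pass the thread count on the julia command line instead of setting the environment variable:
julia --threads $SLURM_CPUS_PER_TASK sum_of_random_num.jl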
3. Running the Job
Make sure both sum_of_random_num.sl and sum_of_random_num.jl are in the same directory. Then, submit the job with:
[johnchris@c049 test]$ sbatch sum_of_random_num.sl
4. Example Output
In the output, we get:
Number of threads: 10
4.847715876885147
1.140287 seconds (97.47 k allocations: 7.262 MiB, 33.40% compilation time)
This output confirms that:
- 10 threads were launched
- 10 random numbers were generated in parallel (note the ~1 second runtime despite sleep(1) in each thread)
- Their sum was correctly computed
This is a basic but effective demonstration of multithreading with shared memory in Julia.
Running Distributed and Hybrid Tasks
When your calculations need to go beyond multithreading, you can distribute multiple Julia processes across workers on different nodes, each with its own memory space. In addition, each worker can use multithreading over multiple CPU cores, combining the two in a hybrid approach to maximize efficiency.
1. Example: Julia Scripts and Different Parallelization Methods
The following Julia script (example.jl) demonstrates different parallelization methods for computing the sum of an array of x^2 values.
(a) To start, we load the packages needed for distributed parallelization on the cluster:
#!/usr/bin/env julia
using Distributed
using Base.Threads
using SlurmClusterManager
# Launch workers across nodes via SlurmManager
addprocs(SlurmManager())
@info "Master process: $(myid()), workers: $(workers()), threads per worker: $(Threads.nthreads())"
Here, the Distributed package provides Julia's standard distributed-computing framework, and addprocs(SlurmManager()) adds one Julia worker process per Slurm task, spread across the allocated nodes.
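Note that SlurmClusterManager is a registered package and must be installed into the environment you run with (for example, following the local-installation steps above):
julia> import Pkg
julia> Pkg.add("SlurmClusterManager")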
(b) Then we define expensive_function, which computes x^2, and threaded_sum, a multithreaded function that sums expensive_function over an array and is used later for hybrid parallelization:
@everywhere function expensive_function(x)
    sleep(0.1)  # simulate heavy work
    return x^2
end

@everywhere function threaded_sum(arr)
    s = Threads.Atomic{Int64}(0)
    Threads.@threads for x in arr
        Threads.atomic_add!(s, expensive_function(x))
    end
    return s[]
end
Here, the @everywhere macro is essential: it evaluates the definitions on the master and on every worker, so all processes have their own copy of the functions.
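As a quick illustration (a hypothetical session, separate from example.jl), a function defined without @everywhere exists only on the master, and calling it on a worker fails:
using Distributed
addprocs(2)                       # two local workers for the demo

f(x) = x + 1                      # defined on the master only
# fetch(@spawnat 2 f(1))          # would error: f is not defined on worker 2

@everywhere g(x) = x + 1          # defined on the master and all workers
println(fetch(@spawnat 2 g(1)))   # prints 2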
(c) Finally, we define four ways to compute the same sum: pmap maps the function over the workers, @distributed performs a parallel reduction, Distributed.@spawn creates one remote task per element, and the hybrid method splits the array into one chunk per worker so that each worker sums its chunk with threads:
function sum_with_pmap(n)
    results = pmap(expensive_function, 1:n)
    return sum(results)
end

function sum_with_distributed(n)
    return @distributed (+) for i in 1:n
        expensive_function(i)
    end
end

function sum_with_spawn(n)
    tasks = [Distributed.@spawn expensive_function(i) for i in 1:n]
    results = fetch.(tasks)
    return sum(results)
end

function sum_with_hybrid(n)
    arr = 1:n
    nwrk = max(nworkers(), 1)
    parts = Iterators.partition(1:length(arr), cld(length(arr), nwrk))
    chunks = [arr[first(p):last(p)] for p in parts]
    results = pmap(threaded_sum, chunks)
    return sum(results)
end
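If you want to try these four functions without a Slurm allocation, you can replace addprocs(SlurmManager()) with a plain local addprocs call (a minimal sketch under that assumption):
using Distributed
addprocs(2)  # two local workers instead of SlurmManager()

@everywhere function expensive_function(x)
    sleep(0.01)  # lighter simulated work for a local test
    return x^2
end

println(sum(pmap(expensive_function, 1:10)))  # 385 = 1^2 + 2^2 + ... + 10^2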
2. Slurm Script for Running the Distributed Job
Use the following Slurm script and experiment with different parameters (run_example.sl). With --nodes=2 and --ntasks-per-node=2, SlurmManager launches 2*2 = 4 workers, each allowed --cpus-per-task=4 threads:
#!/bin/bash
#SBATCH --job-name=julia-demo
#SBATCH --output=julia-demo-%j.out
#SBATCH --error=julia-demo-%j.err
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00
module load julia
# Export number of threads to Julia runtime
export JULIA_NUM_THREADS=$SLURM_CPUS_PER_TASK
# Prevent CPU binding conflicts with Julia's process management
export SLURM_CPU_BIND=none
julia example.jl
3. Running the Example
Here is an example driver that compares the running times of the different functions (add it at the bottom of example.jl):
if myid() == 1
    println("Number of workers: ", nworkers())
    println("Number of threads per worker: ", Threads.nthreads())
    n = 200  # size of computation
    @time s1 = sum_with_pmap(n)
    println("Result: ", s1)
    @time s2 = sum_with_distributed(n)
    println("Result: ", s2)
    @time s3 = sum_with_spawn(n)
    println("Result: ", s3)
    @time s4 = sum_with_hybrid(n)
    println("Result: ", s4)
    println("All sums equal: ", s1 == s2 == s3 == s4)
end
As in the previous parts of the tutorial, submit the job:
sbatch run_example.sl
In the output, we get:
Number of workers: 4
Number of threads per worker: 4
6.008759 seconds (2.15 M allocations: 146.670 MiB, 12.86% compilation time)
Result: 2686700
7.427056 seconds (480.65 k allocations: 32.088 MiB, 8.08% compilation time)
Result: 2686700
1.216503 seconds (360.94 k allocations: 24.434 MiB, 74.88% compilation time)
Result: 2686700
1.926931 seconds (787.02 k allocations: 53.600 MiB, 20.51% compilation time)
Result: 2686700
All sums equal: true
This confirms that we are running 2*2 = 4 workers with 4 threads per worker. The timings depend on the specific functions and the cluster setup.