Weka File System

One of the key features added to the cluster in 2024 was the Weka storage area. This new file system has an NVMe-based front end to satisfy even the most demanding workload’s need for throughput and IOPS. Snapshots of key storage locations are taken regularly and stored in a separate building on campus.

Quotas

To help ensure the functionality of this premium storage system, a set of quotas has been applied to the storage areas available to researchers’ workloads running on the cluster.

  • The home directory /home/<login> has a 50G quota
    • Intended for storing personal data only
    • This space cannot be used to share data with a group
    • 30 days of nightly snapshots
  • The project directory /projects/<project_name> has a 1TB initial quota
    • Intended for storing research related data
    • Can be used to share data with a group
    • 30 days of nightly snapshots
  • The no-backup directory /nbu/<project_name> has a 1TB initial quota
    • Intended for non-ephemeral data that should not be backed up
    • Can be used to share data with a group
    • No snapshots are taken of this data
  • The scratch directory /scratch/<login> has a 1TB initial quota
    • Intended for storing ephemeral data that can easily be replaced
    • Data stored here should only be needed for a short time (e.g., while a job is running)
    • Is not intended for sharing data with a group
    • No snapshots are taken of this data

While the 50G quota for home directories cannot be changed, initial quotas for the other areas can be increased by submitting a request through the Research Help page. Those needing more than 10TB of storage capacity in any of the paths above will be asked to provide details about the use case and the duration of the need.

NOTE: Storage consumption is calculated for each Principal Investigator as the sum of the capacity allocated to all of their projects in /projects and /nbu.
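
As a rough way to see how close a directory is to its quota, the sketch below sums the space used with du and expresses it as a percentage. The helper names usage_kib and percent_used are hypothetical, the quota is assumed to be counted in binary units (50G taken as 50 GiB), and the figure du reports may not match exactly what Weka counts against the quota. On large directory trees du can also take a while to finish.

```shell
# usage_kib DIR: print the space used under DIR in KiB, as reported by du.
# (Hypothetical helper; du output may differ from Weka's own accounting.)
usage_kib() {
    du -sk "$1" | cut -f1
}

# percent_used USED_KIB QUOTA_KIB: integer percentage of the quota in use.
percent_used() {
    echo $(( $1 * 100 / $2 ))
}

# Example: home directory against an assumed 50 GiB quota (52428800 KiB).
#   percent_used "$(usage_kib "$HOME")" 52428800
```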

Snapshots

Certain parts of the Weka File System have snapshots taken nightly, which can be accessed via the following folder paths:

  • /home/.snapshots
  • /projects/.snapshots

Inside each of those directories you’ll find a sub-directory like “@GMT-2025.02.01-07.15.01”, which is named after the date and time the snapshot of the parent folder was taken. Inside that “@GMT…” directory you’ll find a read-only copy of the data that was in the parent folder at that date and time.
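
Because the snapshot names encode a timestamp, they can be turned into a more readable form when scanning the list for a particular night. The following is a minimal sketch, assuming every snapshot name follows the “@GMT-YYYY.MM.DD-HH.MM.SS” pattern shown above. snapshot_time is a hypothetical helper, not a command provided by the cluster.

```shell
# snapshot_time NAME: convert "@GMT-2025.02.01-07.15.01"
# into "2025-02-01 07:15:01 GMT". Assumes the naming pattern shown above.
snapshot_time() {
    name=${1#@GMT-}          # strip the "@GMT-" prefix
    date_part=${name%%-*}    # e.g. "2025.02.01"
    time_part=${name#*-}     # e.g. "07.15.01"
    printf '%s %s GMT\n' \
        "$(printf '%s' "$date_part" | tr '.' '-')" \
        "$(printf '%s' "$time_part" | tr '.' ':')"
}

# Example: list the readable timestamps of all home snapshots.
#   for d in /home/.snapshots/@GMT-*; do snapshot_time "$(basename "$d")"; done
```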

Daily snapshots aren’t kept indefinitely, as they consume significant amounts of storage capacity. We aim to keep at least the last 14 to 30 nightly snapshots, depending on how full Weka is at the time. When capacity usage is low, we may also keep a few snapshots that are more than 30 days old.

Let’s say there’s a snapshot of a user’s home directory in /home/.snapshots from March 15th, 2025, that they want to check for a previous version of a file. To get to that snapshot of their home directory they could run the following.

cd /home/.snapshots/@GMT-2025.03.15-07.15.01/$USER

Changes made to /home/$USER after March 15th, 2025 at 7:15am GMT will not appear in that folder, and the contents of the snapshot cannot be changed. Files in the snapshot can be read, executed, or copied as needed.
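
Since the snapshot is read-only, recovering an earlier version of a file means copying it back out. One possible approach is sketched below: it copies a single file from a snapshot into a destination folder under a new name, so the current version is not overwritten. restore_copy and the “.restored” suffix are hypothetical choices for illustration.

```shell
# restore_copy SNAPSHOT_DIR RELATIVE_PATH DEST_DIR
# Copy one file out of a read-only snapshot, keeping its permissions and
# timestamps, and save it as "<name>.restored" next to the live files.
restore_copy() {
    src="$1/$2"
    dest="$3/$(basename "$2").restored"
    [ -f "$src" ] || { echo "not in snapshot: $src" >&2; return 1; }
    cp -p "$src" "$dest" && echo "$dest"
}

# Example (results.csv is a hypothetical file name):
#   restore_copy "/home/.snapshots/@GMT-2025.03.15-07.15.01/$USER" results.csv "$HOME"
```

Comparing the restored copy with the current file (for example with diff) before replacing anything is a cautious next step.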

IMPORTANT: If you run a script from inside a snapshot, be careful of the paths the script references!

Whether the script uses full or relative paths determines which data it works on: a relative path is followed relative to the contents of the snapshot, while a full path points to the current version of the file system.
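
The difference can be seen with a small stand-in that does not touch real snapshots. The sketch below builds a throwaway “live” folder and a “snapshot” copy of it in a temporary directory (all names here are made up for the demonstration), then runs the snapshot’s copy of a script that reads the same file through a relative path and through a full path. The script changes into its own directory first, since relative paths are resolved against the working directory.

```shell
# Build a throwaway "live" tree and a "snapshot" copy of it.
demo=$(mktemp -d)
mkdir -p "$demo/live" "$demo/snapshot"
echo "current" > "$demo/live/data.txt"
echo "as of snapshot" > "$demo/snapshot/data.txt"

# The script reads data.txt once by relative path and once by full path.
cat > "$demo/live/show.sh" <<EOF
#!/bin/sh
cd "\$(dirname "\$0")"
echo "relative: \$(cat ./data.txt)"
echo "absolute: \$(cat "$demo/live/data.txt")"
EOF
cp "$demo/live/show.sh" "$demo/snapshot/show.sh"

# Run the copy that lives inside the "snapshot".
demo_output=$(sh "$demo/snapshot/show.sh")
printf '%s\n' "$demo_output"
# prints:
#   relative: as of snapshot
#   absolute: current
```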
