BC’s production HPC cluster, Andromeda, has been given an upgrade, temporarily named Andromeda 2. This is more than an updated OS or added computational resources; it is a full evolution of the cluster.
Storage
Moving on from the Pixstor GPFS storage area that has served Andromeda since 2020, Andromeda 2 is equipped with a Weka file system designed to scale with the community’s needs. With an initial 1.5PB of storage and an NVMe frontend, the Weka storage area is poised for rapid capacity expansion up to 3PB, which will bring even more performance as the community of researchers continues to grow.
When it comes to critical data storage, capacity and performance are nothing without stability and redundancy. Andromeda 2’s Weka storage area comprises 12 frontend servers, only 10 of which are required for peak performance. This allows for greater uptime, since the storage area can be updated while the file system is in use, and greater stability, since it is resilient against even a double node failure. In addition, full snapshot replication to a separate building on campus ensures that researchers’ data is safe if something catastrophic were to happen in the primary data center.
Service Redundancy
Once migrations to Andromeda 2 are complete, the head and login nodes from the old cluster will be repurposed to provide those same services for the upgraded cluster. This won’t bring greater performance, but it will provide redundancy that makes the evolved cluster more resilient to outages and reduces the need for downtime for security patches.
Ease of Access
The days of SSH and FTP access being sufficient have come and gone. Today’s researchers depend on GUI-based interactive job sessions and full desktop functionality for their HPC workflows. Whereas the previous iteration of the cluster required researchers to install NoMachine on their personal laptops and any other hosts they wished to connect from, Andromeda 2 is equipped with an Open OnDemand gateway that provides browser-based access to cluster resources. Interactive desktop environments and even GUI-based applications can be accessed from any modern, HTML5-enabled web browser. File transfers and editing, job submission and monitoring, and interactive access to cluster resources are all possible through Andromeda 2’s Open OnDemand gateway.
Compute Resources
Existing resources from the old cluster are being repurposed as researchers migrate to using Andromeda 2 full time, and new resources are added on a regular basis to continue expanding the cluster’s capabilities.
Hardware doesn’t last forever, so expansion isn’t linear. However, 2024 happened to be a year where linear expansion was possible, allowing for the purchase of some very special compute resources: big-memory CPU nodes and a significant amount of GPU resources, which are scheduled to be added to Andromeda 2 before the end of the Spring 2025 semester.
High Speed Networking
Hyperscalers like Amazon and Google have to settle for RoCEv2 connectivity over high-speed Ethernet, due to the logistical limitations of cabling data centers bigger than most football stadiums. BC’s HPC cluster, by contrast, is fitted with a premium InfiniBand fabric for lossless, low-latency connectivity across the cluster. With speeds measured in the hundreds of gigabits per second, InfiniBand is the gold standard for HPC workloads.
Custom Dedicated Computing Solutions
Sometimes the research just can’t wait for shared resources to become available, or the standard “shapes” just don’t align well with a workflow, and that’s when BC’s HPC cluster can really save the day. For researchers in need of something more, something that’s always available when they are, Research Services can help tailor a purchase of dedicated hardware, which Research Services will then manage as part of the cluster as the researcher’s private resource. Researchers get all the best parts of using the cluster, with its Weka storage and low-latency InfiniBand fabric, plus a private partition with their dedicated hardware.
Research Support
On top of managing and maintaining Andromeda’s HPC resources, Research Services also supports the use of the cluster and other aspects of the research efforts here at BC, such as Data Acquisition and Statistical Analysis. The Research Services team of statisticians, data scientists, consultants, systems administrators and graduate assistants work together to leverage cutting-edge technology to meet the ever-evolving needs of the research community here at Boston College.