Orchestration for HPC systems

The goal of this guide is to show how to handle co-simulation orchestration on high-performance computing (HPC) systems. We will walk through a specific tool, Merlin, that has been tested with HELICS co-simulations. Merlin is not the only tool with this capability, nor is it a requirement for co-simulation orchestration; one advantage it has is its ability to interface with HPC systems that use SLURM or Flux as their resource managers.
Definition of “Orchestration” in HELICS

Within HELICS, we define “orchestration” as workflow definition and deployment in an HPC environment. This allows users to define a co-simulation workflow they would like to execute and to deploy that co-simulation either on their own machine or in an HPC environment.
Orchestration with Merlin

First you will need to build and install Merlin. This guide walks through a suggested co-simulation spec that uses Merlin to launch a HELICS co-simulation. It is not a comprehensive guide to Merlin, only a guide to using Merlin for HELICS co-simulation orchestration; for full coverage of Merlin, please refer to the Merlin tutorial.

Merlin is a distributed task queuing system designed to allow complex HPC workflows to scale to large numbers of simulations. It makes building, running, and processing large-scale HPC workflows manageable. It is not limited to HPC; it can also be set up on a single machine.

Merlin translates a command-line-focused workflow into discrete tasks that it queues up and launches. This workflow is called a specification, or spec for short. The spec is separated into multiple sections that together describe how to execute the workflow, which Merlin then represents as a directed acyclic graph (DAG).

Once the Merlin spec has been created, the main execution logic is contained in the study step. This step describes how the applications or scripts must be executed on the command line to carry out your workflow. The study step is made up of multiple run steps that are represented as the nodes in the DAG.

For a more in-depth explanation of how Merlin works, take a look at their documentation here.
Why Merlin

The biggest feature Merlin gives HELICS users is the ability to deploy co-simulations in an HPC environment. Merlin can interface with both the Flux and SLURM workload managers installed on HPC machines. It handles the request for resource allocation and takes care of job and task distribution amongst the nodes. Users do not need to know how to use SLURM or Flux, because Merlin makes all resource allocation calls to the workload manager; the user only needs to provide the number of nodes required for their study.
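In a Merlin spec, this scheduler interface is configured through the batch block. A minimal sketch is shown below; the bank and queue values are placeholders for whatever your site uses.

```yaml
# Sketch of a batch block; values are site-specific placeholders.
batch:
  type: slurm        # or flux, depending on the machine's workload manager
  nodes: 4           # number of nodes Merlin will request
  bank: mybank       # placeholder allocation/bank name
  queue: pbatch      # placeholder queue/partition name
```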
Another benefit of using Merlin for HELICS co-simulation is its flexibility in managing complex co-simulations. pyhelics includes functionality to launch HELICS co-simulations using a JSON file that defines the federates of the co-simulation, but as of this writing it does not have the ability to analyze the resulting data and launch subsequent co-simulations. In this type of scenario a user could write a Merlin specification whose study step includes an analysis step. The analysis step would determine whether another co-simulation is needed, generate the input for the next co-simulation, and then launch that co-simulation with the generated input; a sketch of such a step follows.
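For illustration, the command run by such an analysis step might look like the following Python script. This is a minimal sketch, not part of the HELICS or Merlin APIs: the results.json file, the error and next_federates fields, and the convergence threshold are all hypothetical placeholders.

```python
# Hypothetical analysis step: inspect the previous run's output and,
# if it has not converged, generate and launch a follow-up co-simulation.
import json
import subprocess
from pathlib import Path

results = json.loads(Path("results.json").read_text())  # placeholder output file

if results["error"] > 1e-3:  # placeholder convergence criterion
    # Build the runner config for the next co-simulation from the analysis.
    next_input = {
        "name": "followup",
        "federates": results["next_federates"],  # placeholder field
    }
    Path("followup.json").write_text(json.dumps(next_input, indent=4))
    # Launch the next co-simulation with the pyhelics runner.
    subprocess.run(["helics", "run", "--path=followup.json"], check=True)
```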
Merlin Specification

A Merlin specification has multiple parts that control how a co-simulation runs. Below we describe how each part can be used in a HELICS co-simulation workflow. For the sake of simplicity we use the pi-exchange Python example that can be found here. The goal will be to have Merlin launch multiple pi-senders and pi-receivers.
Merlin workflow description and environment¶
Merlin has a description and an environment block. The description
block
provides the name and a short description of the study.
```yaml
description:
  name: Test helics
  description: Juggle helics data
```
The env block describes the environment in which the study will execute. This is the place to set environment variables that control the co-simulation; in this example, N_SAMPLES describes how many pi-senders and pi-receivers (total federates) we want in our co-simulation.
```yaml
env:
  variables:
    OUTPUT_PATH: ./helics_juggle_output
    N_SAMPLES: 8
```
Merlin Step

The Merlin step is the input data generation step. It describes how to create the initial inputs for the co-simulation so that subsequent steps can use them to start the co-simulation. Below is how we might describe the Merlin step for our pi-exchange study.
```yaml
merlin:
  samples:
    generate:
      cmd: |
        python3 $(SPECROOT)/make_samples.py $(N_SAMPLES) $(MERLIN_INFO)
        cp $(SPECROOT)/pireceiver.py $(MERLIN_INFO)
        cp $(SPECROOT)/pisender.py $(MERLIN_INFO)
    file: samples.csv
    column_labels: [FED]
```
NOTE: samples.csv is generated by make_samples.py. Each line in samples.csv is the name of one of the JSON files that is created.
A Python script called make_samples.py, located in the HELICS repository, generates all of the runner JSON configs that helics-cli will use to execute the co-simulations. N_SAMPLES is an environment variable that is set to 8, so in this example 8 pi-receivers and 8 pi-senders will be created and used in the co-simulation. make_samples.py also writes the name of each JSON file to a CSV file called samples.csv, which therefore contains the names of all the generated JSON files. The column_labels tag tells Merlin to label the column in samples.csv as FED, which means we can use FED as a variable in the study step. Below is an example of one of the JSON files that is created.
```json
{
    "federates": [
        {
            "directory": ".",
            "exec": "python3 -u pisender.py 0",
            "host": "localhost",
            "name": "pisender0"
        }
    ],
    "name": "pisender0"
}
```
This JSON file is then used as the input file for helics-cli, which is executed in the Merlin study step that we will go over next.
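If you want a feel for what such a generation script does, the following is a simplified sketch in the spirit of make_samples.py, built around the JSON format shown above; the real script in the HELICS repository may differ in its details.

```python
# Simplified sketch of a sample-generation script; the actual
# make_samples.py in the HELICS repository may differ.
import json
import sys
from pathlib import Path

def main():
    n_samples = int(sys.argv[1])   # $(N_SAMPLES) from the spec
    out_dir = Path(sys.argv[2])    # $(MERLIN_INFO) from the spec

    names = []
    for i in range(n_samples):
        for fed in ("pisender", "pireceiver"):
            name = f"{fed}{i}"
            config = {
                "federates": [
                    {
                        "directory": ".",
                        "exec": f"python3 -u {fed}.py {i}",
                        "host": "localhost",
                        "name": name,
                    }
                ],
                "name": name,
            }
            path = out_dir / f"{name}.json"
            path.write_text(json.dumps(config, indent=4))
            names.append(str(path))

    # samples.csv holds one runner config path per line; column_labels
    # in the spec maps this single column onto the FED variable.
    Path("samples.csv").write_text("\n".join(names) + "\n")

if __name__ == "__main__":
    main()
```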
Study Step

The study step is where Merlin executes all the steps specified in the block. Each step is denoted by a name and has a run segment, which is where you tell Merlin what commands need to be executed.
```yaml
- name: start_federates   # name of the step
  description: say Hello
  run:
    cmd: |
      helics run --path=$(FED)   # execute the HELICS runner once per row of samples.csv
      echo "DONE"
```
In the example snippet we ask Merlin to execute the JSON file that was created in the Merlin step. Since the FED variable takes a value from each row of samples.csv, this command is executed once for each generated JSON file.
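Conceptually, with N_SAMPLES set to 8, Merlin expands this single step into sixteen independent tasks, as if you had run something like the following by hand (illustration only; the actual file paths are whatever make_samples.py wrote into samples.csv):

```shell
# Illustration only: one task per row of samples.csv
helics run --path=pisender0.json
helics run --path=pireceiver0.json
helics run --path=pisender1.json
...
```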
Full Spec

Below is the full Merlin spec that was created to make 8 pi-receivers and 8 pi-senders and execute them as a Merlin workflow.
```yaml
description:
  name: Test helics
  description: Juggle helics data

env:
  variables:
    OUTPUT_PATH: ./helics_juggle_output
    N_SAMPLES: 8

merlin:
  samples:
    generate:
      cmd: |
        python3 $(SPECROOT)/make_samples.py $(N_SAMPLES) $(MERLIN_INFO)
        cp $(SPECROOT)/pireceiver.py $(MERLIN_INFO)
        cp $(SPECROOT)/pisender.py $(MERLIN_INFO)
    file: samples.csv
    column_labels: [FED]

study:
  - name: start_federates
    description: say Hello
    run:
      cmd: |
        spack load helics
        helics run --path=$(FED)
        echo "DONE"

  - name: cleanup
    description: Clean up
    run:
      cmd: rm $(SPECROOT)/samples.csv
      depends: [start_federates_*]
```
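Once the spec is saved (here assumed to be a file named helics_juggle.yaml; the name is a placeholder), the workflow is queued and executed with Merlin's command-line interface:

```shell
# Queue the workflow tasks described by the spec
merlin run helics_juggle.yaml

# Start workers that pull and execute the queued tasks
merlin run-workers helics_juggle.yaml
```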
DAG of the spec

Finally, we can look at the DAG of the spec to visualize the steps in the study.
Orchestration Example

An example of orchestrating multiple simulation runs (e.g., a Monte Carlo co-simulation) is given in the Advanced Examples section.