Quickstart tutorial
To learn to use Dingo, we recommend starting with the examples provided in the examples/
folder. The YAML files in this directory and its subdirectories hold the
configuration settings for the various Dingo tasks (constructing training data, training networks, and performing inference). These files are passed as input to the
command-line scripts, which then run Dingo and save output files. The output files
store the settings from the YAML files as metadata, and they can usually be inspected
by running dingo_ls.
flowchart TB
dataset_settings[dataset_settings.yaml]
dataset_settings-->generate_dataset(["dingo_generate_dataset
#nbsp; #nbsp; --settings_file dataset_settings.yaml
#nbsp; #nbsp; --out_file waveform_dataset.hdf5"])
style generate_dataset text-align:left
asd_settings[asd_dataset_settings.yaml]
asd_settings-->generate_asd(["dingo_generate_asd_dataset
#nbsp; #nbsp; --settings_file asd_dataset_settings.yaml
#nbsp; #nbsp; --data_dir asd_dataset"])
style generate_asd text-align:left
train_init(["dingo_train
#nbsp; #nbsp; --settings_file train_settings_init.yaml
#nbsp; #nbsp; --train_dir model_init"])
style train_init text-align:left
train_settings_init[train_settings_init.yaml]
train_settings_init-->train_init
generate_dataset--->train_init
generate_asd--->train_init
generate_dataset--->train_main(["dingo_train
#nbsp; #nbsp; --settings_file train_settings_main.yaml
#nbsp; #nbsp; --train_dir model_main"])
style train_main text-align:left
train_settings_main[train_settings_main.yaml]
generate_asd--->train_main
train_settings_main-->train_main
train_init-->inference(["dingo_pipe GW150914.ini"])
style inference text-align:left
train_main-->inference
inference-->samples[GW150914_data0_1126259462-4_sampling.hdf5]
After configuring the settings files, the scripts may be used as follows, assuming the
Dingo venv is active.
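Read top to bottom, the flowchart corresponds to the sequence of commands sketched below. This is only an illustration: the file and directory names are the placeholders shown in the flowchart (with asd_dataset_settings.yaml as the ASD settings file), PIPELINE and run_pipeline are hypothetical names, and the default dry-run mode prints the commands instead of executing them.

```python
import shlex
import subprocess

# Commands in the order implied by the flowchart. File and directory names
# are placeholders taken from it; adapt them to your own setup.
PIPELINE = [
    ["dingo_generate_dataset", "--settings_file", "dataset_settings.yaml",
     "--out_file", "waveform_dataset.hdf5"],
    ["dingo_generate_asd_dataset", "--settings_file", "asd_dataset_settings.yaml",
     "--data_dir", "asd_dataset"],
    ["dingo_train", "--settings_file", "train_settings_init.yaml",
     "--train_dir", "model_init"],
    ["dingo_train", "--settings_file", "train_settings_main.yaml",
     "--train_dir", "model_main"],
    ["dingo_pipe", "GW150914.ini"],
]


def run_pipeline(pipeline, dry_run=True):
    """Print each command; execute it as well unless dry_run is True."""
    rendered = []
    for cmd in pipeline:
        line = shlex.join(cmd)
        print(line)
        rendered.append(line)
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)
    return rendered


commands = run_pipeline(PIPELINE)  # dry run: nothing is executed
```

In the flowchart the two training runs depend only on the datasets, not on each other, so they could also be launched side by side rather than in sequence.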
Generate training data
Waveforms
To generate a waveform dataset for training, execute
dingo_generate_dataset --settings_file waveform_dataset_settings.yaml --num_processes N --out_file waveform_dataset.hdf5
where N is the number of processes you would like to use to generate the waveforms in
parallel. This saves the dataset of waveform polarizations in the
file waveform_dataset.hdf5 (typically compressed using SVD, depending on configuration).
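When the command is launched from a script, N can be derived from the machine rather than hard-coded. The sketch below is one way to do that; waveform_dataset_command is a hypothetical helper, and leaving one core free is an assumption of this example, not a Dingo recommendation.

```python
import os


def waveform_dataset_command(settings_file, out_file, num_processes=None):
    """Build the dingo_generate_dataset argument list (hypothetical helper)."""
    if num_processes is None:
        # Assumption: leave one core free. os.cpu_count() may return None.
        num_processes = max(1, (os.cpu_count() or 1) - 1)
    return [
        "dingo_generate_dataset",
        "--settings_file", str(settings_file),
        "--num_processes", str(num_processes),
        "--out_file", str(out_file),
    ]


cmd = waveform_dataset_command(
    "waveform_dataset_settings.yaml", "waveform_dataset.hdf5", num_processes=4
)
print(" ".join(cmd))
```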
One can use dingo_generate_dataset_dag to set up a condor DAG for generating waveforms
on a cluster. This is typically useful for slower waveform models.
Noise ASDs
Training also requires a dataset of noise ASDs, which are sampled randomly for each training sample. To generate this dataset based on noise observed during a run, execute
dingo_generate_asd_dataset --data_dir data_dir --settings_file asd_dataset_settings.yaml
This will download data from GWOSC and create a /tmp directory, in which the
estimated PSDs are stored. Subsequently, these are collected together into a final .hdf5
ASD dataset.
If no settings_file is passed, the script will attempt to use the default
one, data_dir/asd_dataset_settings.yaml.
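The fallback rule for the settings file can be written down in a few lines. This is an illustration of the behaviour described above, not Dingo's own implementation, and resolve_asd_settings is a hypothetical name.

```python
from pathlib import Path


def resolve_asd_settings(data_dir, settings_file=None):
    """Return the settings path, falling back to the default inside data_dir."""
    if settings_file is not None:
        return Path(settings_file)
    return Path(data_dir) / "asd_dataset_settings.yaml"


print(resolve_asd_settings("data_dir").as_posix())
print(resolve_asd_settings("data_dir", "my_settings.yaml").as_posix())
```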
Training
With a waveform dataset and ASD dataset(s), one can train a neural network. Configure
the train_settings.yaml file to point to these datasets, and run
dingo_train --settings_file train_settings.yaml --train_dir train_dir
This will configure the network, train it, and store checkpoints, a record of the history,
and the final network in the directory train_dir. Alternatively, to resume training from
a checkpoint file, run
dingo_train --checkpoint model.pt --train_dir train_dir
If using CUDA on a machine with several GPUs, be sure to first select the desired GPU
number using the CUDA_VISIBLE_DEVICES environment variable. On a cluster, training
can be run using dingo_train_condor.
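The two ways of invoking dingo_train, together with the GPU selection, can be combined in a small launcher. The sketch below is hedged accordingly: the helper names are hypothetical, treating --settings_file and --checkpoint as mutually exclusive is an assumption of this example, and the default dry-run mode only builds the command and environment without starting training.

```python
import os
import subprocess


def dingo_train_command(train_dir, settings_file=None, checkpoint=None):
    """Build a dingo_train argument list for a fresh or a resumed run."""
    # Assumption of this sketch: exactly one of the two sources is given.
    if (settings_file is None) == (checkpoint is None):
        raise ValueError("pass exactly one of settings_file or checkpoint")
    if checkpoint is not None:
        source = ["--checkpoint", str(checkpoint)]
    else:
        source = ["--settings_file", str(settings_file)]
    return ["dingo_train", *source, "--train_dir", str(train_dir)]


def single_gpu_env(gpu_index, base_env=None):
    """Copy the environment and expose only one GPU to the child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def train(train_dir, gpu_index, dry_run=True, **kwargs):
    """Assemble command and environment; launch only if dry_run is False."""
    cmd = dingo_train_command(train_dir, **kwargs)
    env = single_gpu_env(gpu_index)
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd, env


cmd, env = train("train_dir", gpu_index=1, checkpoint="model.pt")
print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", " ".join(cmd))
```

Because CUDA renumbers the visible devices, the training process sees the selected GPU as device 0 regardless of its physical index.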
Example training files can be found under examples/training.
train_settings_toy.yaml and train_settings_production.yaml train a flow to
estimate the full posterior of the event, conditioned on the time of coalescence
in the detectors. The “toy” label indicates that these settings should NOT be used for production, but
rather to get a feel for the Dingo pipeline. The production settings have been tested,
but depending on the waveform model and event they may occasionally need to
be tuned. train_settings_init_toy.yaml and train_settings_init_production.yaml train
flows to estimate the time of coalescence in the individual detectors. Both
networks (the posterior flow and the time-of-coalescence flow) are needed to use GNPE. This is the preferred and
most tested way of using Dingo.
Alternatively, train_settings_no_gnpe_toy.yaml and
train_settings_no_gnpe_production.yaml contain settings to train a network
without the GNPE step. Note the lack of a data/gnpe_time_shifts option. While this is not
recommended for production, it is still pedagogically useful and is good for prototyping
new ideas or for less expensive training runs.
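The distinguishing feature noted above, the presence or absence of a data/gnpe_time_shifts entry, can be checked programmatically once a settings file has been loaded into a nested dictionary (for example with a YAML parser). In the sketch below the dictionaries are inline stand-ins, their empty values are placeholders rather than real Dingo options, and uses_gnpe is a hypothetical helper.

```python
def uses_gnpe(train_settings):
    """Return True if the settings contain a data/gnpe_time_shifts entry."""
    data = train_settings.get("data") or {}
    return "gnpe_time_shifts" in data


# Stand-ins for parsed settings files; the values are placeholders only.
with_gnpe = {"data": {"gnpe_time_shifts": {}}}
without_gnpe = {"data": {}}

print(uses_gnpe(with_gnpe), uses_gnpe(without_gnpe))
```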
Inference
Once a Dingo model is trained, inference for real events can be performed using
dingo_pipe. There are three main inference steps: downloading the data,
running Dingo on this data, and finally running importance sampling. The basic
idea is to create a .ini file which contains the filepaths of the Dingo networks
trained above and the segment of data to analyze. An example .ini file can be
found under examples/pipe/GW150914.ini.
To do inference, cd into the directory with the .ini file and run
dingo_pipe GW150914.ini
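The same step can be scripted without changing the shell's working directory. In this sketch run_dingo_pipe is a hypothetical helper, the path is the example file mentioned above, and the default dry-run mode returns the command and working directory without launching anything.

```python
import subprocess
from pathlib import Path


def run_dingo_pipe(ini_file, dry_run=True):
    """Run dingo_pipe from the directory that contains the .ini file."""
    ini_file = Path(ini_file)
    cmd = ["dingo_pipe", ini_file.name]
    workdir = ini_file.parent
    if not dry_run:
        # cwd plays the role of the `cd` step described above.
        subprocess.run(cmd, cwd=workdir, check=True)
    return cmd, workdir


cmd, workdir = run_dingo_pipe("examples/pipe/GW150914.ini")
print(f"cd {workdir.as_posix()} && {' '.join(cmd)}")
```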