Quickstart tutorial

To learn to use Dingo, we recommend starting with the examples provided in the examples/ folder. The YAML files in this directory (and its subdirectories) contain configuration settings for the various Dingo tasks: constructing training data, training networks, and performing inference. These files are passed as input to the command-line scripts, which run Dingo and save output files. The output files store the YAML settings as metadata and can usually be inspected by running dingo_ls.

    flowchart TB
    dataset_settings[dataset_settings.yaml]
    dataset_settings-->generate_dataset(["dingo_generate_dataset
    #nbsp; #nbsp; --settings_file dataset_settings.yaml
    #nbsp; #nbsp; --out_file waveform_dataset.hdf5"])
    style generate_dataset text-align:left
    asd_settings[asd_dataset_settings.yaml]
    asd_settings-->generate_asd(["dingo_generate_ASD_dataset
    #nbsp; #nbsp; --settings_file asd_dataset_settings.yaml
    #nbsp; #nbsp; --data_dir asd_dataset"])
    style generate_asd text-align:left
    train_init(["dingo_train 
    #nbsp; #nbsp; --settings_file train_settings_init.yaml
    #nbsp; #nbsp; --train_dir model_init"])
    style train_init text-align:left
    train_settings_init[train_settings_init.yaml]
    train_settings_init-->train_init
    generate_dataset--->train_init
    generate_asd--->train_init
    generate_dataset--->train_main(["dingo_train 
    #nbsp; #nbsp; --settings_file train_settings_main.yaml
    #nbsp; #nbsp; --train_dir model_main"])
    style train_main text-align:left
    train_settings_main[train_settings_main.yaml]
    generate_asd--->train_main
    train_settings_main-->train_main
    train_init-->inference(["dingo_pipe GW150914.ini"])
    style inference text-align:left
    train_main-->inference
    inference-->samples[GW150914_data0_1126259462-4_sampling.hdf5]
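The dependencies in the flowchart can also be written down as a small graph. The sketch below uses Python's standard-library graphlib module to compute an order in which the stages can be run; the stage names are illustrative labels mirroring the flowchart nodes, not Dingo identifiers.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages that must finish before it can start.
# Stage names are illustrative labels taken from the flowchart nodes.
PIPELINE = {
    "generate_dataset": set(),
    "generate_asd": set(),
    "train_init": {"generate_dataset", "generate_asd"},
    "train_main": {"generate_dataset", "generate_asd"},
    "inference": {"train_init", "train_main"},
}


def run_order(graph):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(run_order(PIPELINE))
```

The two dataset-generation stages do not depend on each other and could run concurrently, as could the two training runs; only inference has to wait for both networks.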
    

After configuring the settings files, the scripts may be used as follows, assuming the Dingo venv is active.

Generate training data

Waveforms

To generate a waveform dataset for training, execute

dingo_generate_dataset --settings_file waveform_dataset_settings.yaml --num_processes N --out_file waveform_dataset.hdf5

where N is the number of processes you would like to use to generate the waveforms in parallel. This saves the dataset of waveform polarizations in the file waveform_dataset.hdf5 (typically compressed using SVD, depending on configuration).
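If you prefer to script this call, a small Python wrapper can assemble the command. This is a sketch that uses only the flags shown above; the helper names are hypothetical, and defaulting N to os.cpu_count() is an arbitrary choice (on a shared machine you may want fewer processes).

```python
import os
import shlex
import subprocess


def generate_dataset_command(settings_file, out_file, num_processes=None):
    """Build the dingo_generate_dataset call using only the flags shown above.

    If num_processes is None, use the CPU count reported by os.cpu_count()
    (or 1 if it cannot be determined).
    """
    if num_processes is None:
        num_processes = os.cpu_count() or 1
    return [
        "dingo_generate_dataset",
        "--settings_file", settings_file,
        "--num_processes", str(num_processes),
        "--out_file", out_file,
    ]


def run(cmd, dry_run=True):
    """Print the command (dry run) or execute it, raising on a non-zero exit code."""
    if dry_run:
        print(shlex.join(cmd))
        return None
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = generate_dataset_command("waveform_dataset_settings.yaml", "waveform_dataset.hdf5")
    run(cmd)  # pass dry_run=False to actually launch it (requires Dingo)
```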

Alternatively, dingo_generate_dataset_dag sets up an HTCondor DAG for generating the waveforms on a cluster. This is typically useful for slower waveform models.

Noise ASDs

Training also requires a dataset of noise ASDs, which are sampled randomly for each training sample. To generate this dataset based on noise observed during an observing run, execute

dingo_generate_ASD_dataset --data_dir data_dir --settings_file asd_dataset_settings.yaml

This downloads data from GWOSC and creates a tmp directory in which the estimated PSDs are stored. These are then collected into a final .hdf5 ASD dataset. If no settings_file is passed, the script falls back to the default, data_dir/asd_dataset_settings.yaml.
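To illustrate what "sampled randomly for each training sample" means, here is a conceptual sketch in plain Python, not Dingo's implementation. It assumes a frequency-domain waveform and a list of ASDs on the same grid, draws a fresh ASD for each sample, whitens the waveform by dividing by that ASD, and adds complex white noise with unit variance per real and imaginary part. The function name and the noise convention are assumptions, and the normalization factors a real pipeline needs are omitted.

```python
import random


def whitened_training_sample(h, asd_dataset, rng):
    """Build one toy frequency-domain training input (conceptual sketch).

    h: complex waveform values on a frequency grid
    asd_dataset: list of ASDs, each a list of positive floats on the same grid
    rng: a random.Random instance

    A fresh ASD is drawn on every call, mimicking per-sample ASD sampling.
    The waveform is whitened by dividing by the ASD, then complex white noise
    with unit variance in each of the real and imaginary parts is added.
    """
    asd = rng.choice(asd_dataset)
    sample = [
        h_f / s_f + complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        for h_f, s_f in zip(h, asd)
    ]
    return sample, asd
```

Because a different ASD is used each time, the network sees the same waveform under many noise conditions during training.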

Training

With a waveform dataset and ASD dataset(s), one can train a neural network. Configure the train_settings.yaml file to point to these datasets, and run

dingo_train --settings_file train_settings.yaml --train_dir train_dir

This configures the network, trains it, and stores checkpoints, the training history, and the final network in the directory train_dir. Alternatively, to resume training from a checkpoint file, run

dingo_train --checkpoint model.pt --train_dir train_dir
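When training is scripted, one convenient pattern is to resume if a checkpoint already exists and start fresh otherwise. The sketch below uses only the two invocations shown above; the helper name and the checkpoint location in the usage line are hypothetical, and which checkpoint file to resume from is up to you.

```python
from pathlib import Path


def train_command(train_dir, settings_file="train_settings.yaml", checkpoint=None):
    """Return a dingo_train command: resume if the checkpoint exists, else start fresh.

    checkpoint: optional path to a checkpoint file (e.g. "model.pt" as above).
    Only the flags shown above are used.
    """
    if checkpoint is not None and Path(checkpoint).is_file():
        return ["dingo_train", "--checkpoint", str(checkpoint), "--train_dir", str(train_dir)]
    return ["dingo_train", "--settings_file", str(settings_file), "--train_dir", str(train_dir)]


if __name__ == "__main__":
    # Hypothetical checkpoint location; adjust to wherever your checkpoint lives.
    print(train_command("train_dir", checkpoint="train_dir/model.pt"))
```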

If using CUDA on a machine with several GPUs, first select the desired GPU by setting the CUDA_VISIBLE_DEVICES environment variable. On a cluster running HTCondor, Dingo can be trained using dingo_train_condor.
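From a shell, the GPU can be selected by prefixing the command, e.g. CUDA_VISIBLE_DEVICES=1 dingo_train .... When launching from Python, the same effect comes from passing a modified environment to the child process. This is a minimal sketch; the helper names are hypothetical.

```python
import os
import shlex
import subprocess


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only GPU gpu_index.

    Inside the child process the selected GPU then appears as device 0.
    The base environment is copied, not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_on_gpu(cmd, gpu_index, dry_run=True):
    """Run cmd (a list of arguments) with only one GPU visible."""
    env = env_for_gpu(gpu_index)
    if dry_run:
        print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(cmd)}")
        return None
    return subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    launch_on_gpu(["dingo_train", "--settings_file", "train_settings.yaml",
                   "--train_dir", "train_dir"], gpu_index=1)
```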

Example training files can be found under examples/training. train_settings_toy.yaml and train_settings_production.yaml train a flow to estimate the full posterior of the event, conditioned on the times of coalescence in the detectors. The “toy” label indicates that these settings should NOT be used for production; they are meant to give a feel for the Dingo pipeline. The production settings have been tested, although they may occasionally need tuning depending on the waveform model and event. train_settings_init_toy.yaml and train_settings_init_production.yaml train flows to estimate the times of coalescence in the individual detectors. Both networks, the initialization network and the main network, are needed to use GNPE, which is the preferred and most tested way of using Dingo.
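GNPE works with data that have been time-shifted in each detector. In the frequency domain a time shift is just a phase factor: delaying a signal by dt multiplies h(f) by exp(-2πi f dt). The toy sketch below (plain Python, not Dingo's implementation; the grid and shift values are made up) applies such a shift and checks that shifting back recovers the original data.

```python
import cmath
import math


def time_shift(h, freqs, dt):
    """Time-shift frequency-domain data by dt seconds (Fourier shift theorem).

    If h(f) is the transform of h(t), then h(t - dt) has transform
    h(f) * exp(-2*pi*i*f*dt), so a positive dt delays the signal.
    """
    return [h_f * cmath.exp(-2j * math.pi * f * dt) for h_f, f in zip(h, freqs)]


if __name__ == "__main__":
    freqs = [20.0 + 0.125 * k for k in range(8)]  # toy frequency grid in Hz
    h = [complex(1.0, 0.1 * k) for k in range(8)]  # toy strain values
    shifted = time_shift(h, freqs, 0.01)
    restored = time_shift(shifted, freqs, -0.01)
    print(max(abs(a - b) for a, b in zip(h, restored)))  # close to 0: the shift is undone
```

Roughly speaking, GNPE chooses the shift in each detector so as to undo the estimated coalescence time there, so that the main network sees approximately aligned data; the shift changes only phases, not amplitudes.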

Alternatively, train_settings_no_gnpe_toy.yaml and train_settings_no_gnpe_production.yaml contain settings for training a network without GNPE; note that they lack the data/gnpe_time_shifts option. This approach is not recommended for production, but it is pedagogically useful, well suited to prototyping new ideas, and less expensive to train.
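Since the presence of data/gnpe_time_shifts is what distinguishes the two kinds of settings files, a quick check can tell them apart. This sketch assumes the YAML file has already been loaded into a dict (for example with PyYAML's yaml.safe_load); the helper name is hypothetical and the value stored under the key is not inspected.

```python
def uses_gnpe(train_settings):
    """Return True if a parsed training-settings mapping contains data/gnpe_time_shifts."""
    data = train_settings.get("data") or {}  # tolerate a missing or empty "data" section
    return "gnpe_time_shifts" in data
```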

Inference

Once a Dingo model is trained, inference for real events can be performed using dingo_pipe. There are three main inference steps: downloading the data, running Dingo on this data, and finally running importance sampling. The basic idea is to create a .ini file containing the file paths of the Dingo networks trained above and the segment of data to analyze. An example .ini file can be found under examples/pipe/GW150914.ini.
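The importance-sampling step reweights the samples produced by the network. As a sketch of the standard formulas (not Dingo's code): for samples θ_i drawn from the network's proposal q, the weights are w_i ∝ p(d|θ_i) p(θ_i) / q(θ_i), and the effective sample size (Σw)² / Σw², divided by the number of samples, gives the sample efficiency. The function names are hypothetical.

```python
import math


def importance_weights(log_likelihood, log_prior, log_proposal):
    """Normalized importance weights from per-sample log densities.

    w_i is proportional to exp(log L_i + log prior_i - log q_i); the maximum
    log weight is subtracted before exponentiating to avoid overflow.
    """
    log_w = [l + p - q for l, p, q in zip(log_likelihood, log_prior, log_proposal)]
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def sample_efficiency(weights):
    """Effective sample size (sum w)^2 / sum w^2, divided by the number of samples."""
    n_eff = sum(weights) ** 2 / sum(x * x for x in weights)
    return n_eff / len(weights)
```

An efficiency near 1 means the network's proposal already matches the target posterior well; a very low efficiency means a few samples dominate the weights.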

To run inference, cd into the directory containing the .ini file and run

dingo_pipe GW150914.ini
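Before launching, it can help to check what the .ini file contains, for example which network files it points to. This sketch assumes the file consists of top-level key = value lines without section headers (as in bilby_pipe-style configuration files), so it prepends an arbitrary dummy section for Python's configparser. It is only a quick inspection aid; dingo_pipe parses the file itself, and no particular keys are assumed here.

```python
import configparser
from pathlib import Path


def read_pipe_ini(path):
    """Read a header-less key = value .ini file into a dict of strings.

    A dummy "[top]" section is prepended because configparser requires one;
    interpolation is disabled so values containing "%" are kept verbatim.
    Keys are lower-cased by configparser's default behaviour.
    """
    text = Path(path).read_text()
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string("[top]\n" + text)
    return dict(parser["top"])


if __name__ == "__main__":
    for key, value in read_pipe_ini("GW150914.ini").items():
        print(f"{key} = {value}")
```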