# Quickstart tutorial To learn to use Dingo, we recommend starting with the examples provided in the [examples/](https://github.com/dingo-gw/dingo/tree/main/examples) folder. The YAML files contained in this directory (and subdirectories) contain configuration settings for the various Dingo tasks (constructing training data, training networks, and performing inference). These files should be provided as input to the command-line scripts, which then run Dingo and save output files. These output files contain as metadata the settings in the YAML files, and they may usually be inspected by running `dingo_ls`. ```{mermaid} flowchart TB dataset_settings[dataset_settings.yaml] dataset_settings-->generate_dataset(["dingo_generate_dataset #nbsp; #nbsp; --settings_file dataset_settings.yaml #nbsp; #nbsp; --out_file waveform_dataset.hdf5"]) style generate_dataset text-align:left asd_settings[asd_dataset_settings.yaml] asd_settings-->generate_asd(["generate_asd_dataset #nbsp; #nbsp; --settings_file dataset_settings.yaml #nbsp; #nbsp; --data_dir asd_dataset"]) style generate_asd text-align:left train_init(["dingo_train #nbsp; #nbsp; --settings_file train_settings_init.yaml #nbsp; #nbsp; --train_dir model_init"]) style train_init text-align:left train_settings_init[train_settings_init.yaml] train_settings_init-->train_init generate_dataset--->train_init generate_asd--->train_init generate_dataset--->train_main(["dingo_train #nbsp; #nbsp; --settings_file train_settings_main.yaml #nbsp; #nbsp; --train_dir model_main"]) style train_main text-align:left train_settings_main[train_settings_main.yaml] generate_asd--->train_main train_settings_main-->train_main train_init-->inference(["dingo_pipe GW150914.ini"]) style inference text-align:left train_main-->inference inference-->samples[GW150914_data0_1126259462-4_sampling.hdf5] ``` After configuring the settings files, the scripts may be used as follows, assuming the Dingo `venv` is active. ## Generate training data ### Waveforms To generate a waveform dataset for training, execute ``` dingo_generate_dataset --settings_file waveform_dataset_settings.yaml --num_processes N --out_file waveform_dataset.hdf5 ``` where `N` is the number of processes you would like to use to generate the waveforms in parallel. This saves the dataset of waveform polarizations in the file `waveform_dataset.hdf5` (typically compressed using SVD, depending on configuration). One can use `dingo_generate_dataset_dag` to set up a condor DAG for generating waveforms on a cluster. This is typically useful for slower waveform models. ### Noise ASDs Training also requires a dataset of noise ASDs, which are sampled randomly for each training sample. To generate this dataset based on noise observed during a run, execute ``` dingo_generate_ASD_dataset --data_dir data_dir --settings_file asd_dataset_settings.yaml ``` This will download data from [GWOSC](https://www.gw-openscience.org) and create a `/tmp` directory, in which the estimated PSDs are stored. Subsequently, these are collected together into a final `.hdf5` ASD dataset. If no `settings_file` is passed, the script will attempt to use the default one `data_dir/asd_dataset_settings.yaml`. ## Training With a waveform dataset and ASD dataset(s), one can train a neural network. Configure the `train_settings.yaml` file to point to these datasets, and run ``` dingo_train --settings_file train_settings.yaml --train_dir train_dir ``` This will configure the network, train it, and store checkpoints, a record of the history, and the final network in the directory `train_dir`. Alternatively, to resume training from a checkpoint file, run ``` dingo_train --checkpoint model.pt --train_dir train_dir ``` If using CUDA on a machine with several GPUs, be sure to first select the desired GPU number using the `CUDA_VISIBLE_DEVICES` environment variable. If using a cluster, Dingo can be trained using `dingo_train_condor`. Example training files can be found under `examples/training`. `train_settings_toy.yaml` and `train_settings_production.yaml` train a flow to estimate the full posterior of the event conditioned on the time of coalescence in the detectors. The "toy" label is to indicate this should NOT be used for production but rather to get a feel for the Dingo pipeline. The production settings contain tested settings. Note that depending on the waveform model and event, these may need to occasionally be tuned. `train_settings_init_toy.yaml` and `train_settings_init_production.yaml` train flows to estimate the time of coalescence in the individual detectors. These two networks are needed to use [GNPE](gnpe.md). This is the preferred and most tested way of using Dingo. Alternatively, the `train_settings_no_gnpe_toy.yaml` and `train_settings_no_gnpe_production.yaml` contain settings to train a network without the GNPE step. Note the lack of a `data/gnpe_time_shifts` option. While this is not recommended for production, it is still pedagogically useful and is good for prototyping new ideas or doing a less expensive training. ## Inference Once a Dingo model is trained, inference for real events can be performed using [dingo_pipe](dingo_pipe.md). There are 3 main inference steps, downloading the data, running Dingo on this data and finally running importance sampling. The basic idea is to create a .ini file which contains the filepaths of the Dingo networks trained above and the segment of data to analyze. An example .ini file can be found under `examples/pipe/GW150914.ini`. To do inference, cd into the directory with the .ini file and run ``` dingo_pipe GW150914.ini ```