# Overview

Dingo performs gravitational-wave (GW) parameter estimation using [**neural posterior estimation**](sbi.md). The basic idea is to train a neural network (a normalizing flow) to represent the Bayesian posterior distribution $p(\theta|d)$ for GW parameters $\theta$ given observed data $d$. Training can take some time (typically, a week for a production-level model) but once trained, inference is very fast (just a few seconds). 

## Basic workflow

The basic workflow for using Dingo is as follows:

1. **Prepare training data.** This consists of pairs of intrinsic parameters and [waveform polarizations](waveform_dataset.ipynb), as well as [noise PSDs](noise_dataset.ipynb). Training parameters are drawn from the prior distribution, and [waveforms are simulated](generating_waveforms.ipynb) using a waveform model.
2. **Train a model.** [Build a neural network](network_architecture.ipynb) and [simulate data sets](training_transforms.ipynb) (noisy waveforms in detectors). [Train the model](training.md) to infer parameters based on the data.
3. **[Perform inference](dingo_pipe.md) on new data** using the trained model.

In many cases, a user may have downloaded a pre-trained model. If so, there is no need to carry out the first two steps, and one may instead skip to **step 3**.

## Command-line interface

In most cases, we expect Dingo to be called from the command line. Dingo commands begin with the prefix `dingo_`. There can be a large number of configurations options for many tasks, so in such cases, rather than specify all settings as arguments, Dingo commands take a single YAML or INI file containing all settings. As described in the [quickstart tutorial](quickstart.md), it is best to begin with settings files provided in the [examples/](https://github.com/dingo-gw/dingo/tree/main/examples) folder, modifying them as necessary.

### Summary of commands

Here we provide a list of key user commands along with brief descriptions. The commands for carrying out the main tasks above are

```{table}

| Command | Description |
|---|---|
|`dingo_generate_dataset`| Generate a training dataset of waveform polarizations. |
|`dingo_generate_ASD_dataset`| Generate a training dataset of detector noise ASDs. |
|`dingo_train`| Build and train a neural network. |
|`dingo_pipe`| Perform inference on data (real or simulated), starting from an INI file. |
```

Building a training dataset and training a model can be very expensive tasks. We therefore expect these to be frequently run on clusters, and for this reason provided [HTCondor](https://htcondor.readthedocs.io/en/latest/) versions of these commands (note that `dingo_pipe` is already HTCondor-compatible):

```{table}

| Command | Description |
|---|---|
|`dingo_generate_dataset_dag`| HTCondor version of `dingo_generate_dataset`. |
|`dingo_train_condor`| HTCondor version of `dingo_train`. |
```

Finally, there are several utility commands that are useful for working with Dingo-produced files:

```{table}

| Command | Description |
|---|---|
|`dingo_ls`| Inspect a file produced by Dingo and print a summary.|
|`dingo_append_training_stage`| Modify the training plan of a model checkpoint.|
|`dingo_pt_to_hdf5`| Convert a trained Dingo model from a PyTorch pickle .pt file to HDF5.|
```

```{hint}

The `dingo_ls` command is very useful for inspecting Dingo files. It will print all settings that went in to producing the file, as well as some derived quantities.
```

### File types

As noted above, most Dingo commands take a YAML file to specify configuration options (except for `dingo_pipe`, which uses an INI file, as is standard for LVK parameter estimation). When run, these commands generate data, which is usually stored in HDF5 files. One exception is when training a neural network. This saves the network weights using the PyTorch `.pt` format. However, primarily for LVK use, `dingo_pt_to_hdf5` can convert the weights of a trained model to a HDF5 file.

```{important}

In all cases, Dingo will save the YAML file settings within the final output file. This is needed for downstream tasks and for maintaining reproducibility.
```


## GNPE

A slightly more complicated workflow occurs when using [](gnpe.md). GNPE is an algorithm that combines physical symmetries with Gibbs sampling to significantly improve results. When using GNPE, however, it is necessary to train **two networks**---one main (conditional) network that will be repeatedly sampled during Gibbs sampling and one smaller network used to initialize the Gibbs sampler. At inference time, `dingo_pipe` must be pointed to **both** of these networks. See the section on [GNPE usage](gnpe.md#usage) for further details.