# GNPE model (production)

This tutorial uses the full-scale settings and is the one typically used for production.
The main difference from the [NPE](example_npe_model.md) tutorial is that it uses [GNPE](gnpe.md)
(group-equivariant neural posterior estimation). Data generation is exactly the same as in the [previous](example_npe_model.md)
tutorial, but it is repeated here for completeness.

The file structure is similar to the NPE example, except that there are now two
training sub-directories and two training settings files, `train_settings_init.yaml` and `train_settings_main.yaml`.

```
gnpe_model/

    #  config files
    waveform_dataset_settings.yaml
    asd_dataset_settings_fiducial.yaml
    asd_dataset_settings.yaml
    train_settings_main.yaml
    train_settings_init.yaml
    GW150914.ini

    training_data/
        waveform_dataset.hdf5
        asd_dataset.hdf5
        asd_dataset_fiducial.hdf5
        asd_dataset_fiducial/ # Contains the asd_dataset.hdf5 and also temp files for asd generation
        asd_dataset/ # Contains the asd_dataset.hdf5 and also temp files for asd generation

    training/
        main_train_dir/
            model_050.pt
            model_stage_0.pt
            model_latest.pt
            history.txt
            #  etc...
        init_train_dir/
            model_050.pt
            model_stage_0.pt
            model_latest.pt
            history.txt
            #  etc...

    outdir_GW150914/
        #  dingo_pipe output
```

Step 1 Generating a Waveform Dataset
------------------------------------ 


First generate the directory structure:

```
cd gnpe_model
mkdir training_data
mkdir training
mkdir training/main_train_dir
mkdir training/init_train_dir
```

Generate the waveform dataset:

```
dingo_generate_dataset --settings waveform_dataset_settings.yaml --out_file training_data/waveform_dataset.hdf5
```

or, using HTCondor:

```
dingo_generate_dataset_dag --settings_file waveform_dataset_settings.yaml \
    --out_file training_data/waveform_dataset.hdf5 \
    --env_path $DINGO_VENV_PATH --num_jobs 4 \
    --request_cpus 16 --request_memory 1280000 --request_memory_high 256000
```


Step 2 Generating an ASD dataset
--------------------------------

As before, we generate a fiducial ASD dataset containing a single ASD:

```
dingo_generate_asd_dataset --settings_file asd_dataset_settings_fiducial.yaml \
    --data_dir training_data/asd_dataset_fiducial \
    --out_name training_data/asd_dataset_fiducial/asds_O1_fiducial.hdf5
```

and a large ASD dataset:

```
dingo_generate_asd_dataset --settings_file asd_dataset_settings.yaml \
    --data_dir training_data/asd_dataset \
    --out_name training_data/asd_dataset/asds_O1.hdf5
```
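
Before moving on to training, it can be worth confirming that the dataset files written by the commands in Steps 1 and 2 actually exist. The following is a minimal sketch using only the Python standard library; the helper `missing_outputs` is hypothetical (it is not part of Dingo), and the paths are simply the `--out_file` and `--out_name` values used above.

```python
from pathlib import Path

# Output paths taken from the commands in Steps 1 and 2 of this tutorial.
EXPECTED_OUTPUTS = [
    "training_data/waveform_dataset.hdf5",
    "training_data/asd_dataset_fiducial/asds_O1_fiducial.hdf5",
    "training_data/asd_dataset/asds_O1.hdf5",
]


def missing_outputs(root=".", expected=EXPECTED_OUTPUTS):
    """Return the expected files that are absent or empty under `root`."""
    root = Path(root)
    return [
        p for p in expected
        if not (root / p).is_file() or (root / p).stat().st_size == 0
    ]


if __name__ == "__main__":
    # Run from inside gnpe_model/; prints nothing if all files are present.
    for path in missing_outputs():
        print(f"missing or empty: {path}")
```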


Step 3 Training the network
---------------------------

Now we are ready to train with GNPE. This requires two networks: an initialization network that estimates
the times of arrival in the detectors, and a main network that performs the full inference task.
Why train two networks? If the times of arrival in the detectors can be aligned (and thus
standardized), the inference task becomes significantly easier. We therefore first train the
initialization network:

```
dingo_train --settings_file train_settings_init.yaml --train_dir training/init_train_dir
```

In `train_settings_init.yaml`, notice that the only inference parameters are `H1_time` and `L1_time`. Also notice that
the `embedding_net` is significantly smaller and that the number of flow steps, `num_flow_steps`, is reduced.

Next, train the main network:

```
dingo_train --settings_file train_settings_main.yaml --train_dir training/main_train_dir
```

Notice the `data.gnpe_time_shifts` section in the main network's settings. Its `kernel` specifies, in seconds,
how much the GNPE proxies are blurred. For details, see [GNPE](gnpe.md).
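
As a rough illustration of what the kernel does, the sketch below blurs the true detector arrival times to form proxies, then shifts the frequency-domain data by the negative proxy so that the remaining time offset is bounded by the kernel width. This is a NumPy toy, not Dingo's implementation: the function names, the uniform kernel shape, the 1 ms width, and the example arrival times are all assumptions made for illustration.

```python
import numpy as np


def blur_times(true_times, kernel_half_width, rng):
    """Form proxies: true times plus uniform noise in [-w, w] seconds (assumed kernel)."""
    noise = rng.uniform(-kernel_half_width, kernel_half_width, size=np.shape(true_times))
    return np.asarray(true_times) + noise


def time_shift(strain_fd, freqs, dt):
    """Shift a frequency-domain signal by dt seconds, i.e. h(t) -> h(t - dt)."""
    return strain_fd * np.exp(-2j * np.pi * freqs * dt)


rng = np.random.default_rng(0)
freqs = np.linspace(20.0, 1024.0, 8)            # toy frequency grid in Hz
template = np.ones_like(freqs, dtype=complex)   # toy signal arriving at t = 0

true_times = np.array([0.012, -0.004])          # hypothetical H1_time, L1_time (s)
kernel_half_width = 0.001                       # assumed 1 ms kernel
proxies = blur_times(true_times, kernel_half_width, rng)
residuals = true_times - proxies                # offset left after alignment

for t_true, t_proxy, t_res in zip(true_times, proxies, residuals):
    data = time_shift(template, freqs, t_true)      # signal arriving at t_true
    aligned = time_shift(data, freqs, -t_proxy)     # undo the proxy time shift
    # The aligned data equals a signal offset only by the small residual.
    assert np.allclose(aligned, time_shift(template, freqs, t_res))

print("largest residual offset (s):", np.max(np.abs(residuals)))
```

In this toy, the data seen after alignment only ever contains time offsets within the kernel width, which is the sense in which the arrival times are standardized.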


Step 4 Doing Inference
----------------------

Inference requires a few changes relative to the previous NPE setup. Most notably, since we are now using GNPE,
the file paths to both the initialization network and the main network must be specified. Another
difference is the new sampler argument `num-gnpe-iterations`, which sets the
number of GNPE steps to take. If the initialization network is not fully converged, or if
the segment being analyzed is very long, it is recommended to increase this number.

```
dingo_pipe GW150914.ini
```
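
The role of `num-gnpe-iterations` can be illustrated with a toy Gibbs chain. The sketch below alternates between blurring the current time estimate into a proxy and drawing a new time conditioned on that proxy, starting from a deliberately poor initial estimate. It is a standard-library toy under stated assumptions, not Dingo's sampler: the Gaussian stand-in for the posterior, the uniform kernel, the function name `gnpe_toy_chain`, and all numerical values are hypothetical.

```python
import random
from statistics import NormalDist


def gnpe_toy_chain(t_init, num_iterations, t0=0.0, sigma=0.002, w=0.001, seed=0):
    """Run a toy GNPE-style Gibbs chain and return the final time sample.

    The stand-in posterior for the arrival time is N(t0, sigma^2). With a
    uniform kernel of half-width w, the time conditioned on a proxy follows
    that Gaussian truncated to [proxy - w, proxy + w].
    """
    rng = random.Random(seed)
    posterior = NormalDist(mu=t0, sigma=sigma)
    t = t_init
    for _ in range(num_iterations):
        # Step 1: blur the current time estimate to obtain a proxy.
        proxy = t + rng.uniform(-w, w)
        # Step 2: sample the time given the proxy via the inverse CDF of the
        # truncated Gaussian (this plays the role of the main network).
        lo = posterior.cdf(proxy - w)
        hi = posterior.cdf(proxy + w)
        t = posterior.inv_cdf(rng.uniform(lo, hi))
    return t


if __name__ == "__main__":
    poor_start = 0.006  # 6 ms away from the toy posterior mean
    for n in (1, 5, 50):
        finals = [gnpe_toy_chain(poor_start, n, seed=s) for s in range(200)]
        print(n, "iterations, mean final time (s):", sum(finals) / len(finals))
```

Because each iteration can move the sample by at most twice the kernel width, a chain that starts far from the bulk of the posterior needs many iterations to get there, which is consistent with the advice to raise the iteration count when the initialization is poor.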