GNPE model (production)

This tutorial has the highest profile settings and is the one typically used for production use. The main difference from the NPE tutorial is that here we are now using GNPE (group neural posterior estimation). The data generation is exactly the same as the previous tutorial, but we repeat it here, for completeness.

The file structure is similar to the NPE example, except now there are two training sub-directories and two train_settings.yaml files.

gnpe_model/

    #  config files
    waveform_dataset_settings.yaml
    asd_dataset_settings_fiducial.yaml
    asd_dataset_settings.yaml
    train_settings_main.yaml
    train_settings_init.yaml
    GW150914.ini

    training_data/
        waveform_dataset.hdf5
        asd_dataset.hdf5
        asd_dataset_fiducial.hdf5
        asd_dataset_fiducial/ # Contains the asd_dataset.hdf5 and also temp files for asd generation
        asd_dataset/ # Contains the asd_dataset.hdf5 and also temp files for asd generation

    training/
        main_train_dir/
            model_050.pt
            model_stage_0.pt
            model_latest.pt
            history.txt
            #  etc...
        init_train_dir/
            model_050.pt
            model_stage_0.pt
            model_latest.pt
            history.txt
            #  etc...

    outdir_GW150914/
        #  dingo_pipe output

Step 1 Generating a Waveform Dataset

First generate the directory structure:

cd gnpe_model
mkdir training_data
mkdir training
mkdir training/main_train_dir
mkdir training/init_train_dir

Generate the waveform dataset:

dingo_generate_dataset --settings waveform_dataset_settings.yaml --out_file training_data/waveform_dataset.hdf5

or using condor:

dingo_generate_dataset_dag --settings_file
waveform_dataset_settings.yaml --out_file
training_data/waveform_dataset.hdf5 --env_path $DINGO_VENV_PATH --num_jobs 4
--request_cpus 16 --request_memory 1280000 --request_memory_high 256000

Step 2 Generating an ASD dataset

As before we generate a fiducial ASD dataset containing a single ASD:

dingo_generate_asd_dataset --settings_file asd_dataset_settings_fiducial.yaml --data_dir
training_data/asd_dataset_fiducial --out_name training_data/asd_dataset_fiducial/asds_O1_fiducial.hdf5

and a large ASD dataset:

dingo_generate_asd_dataset --settings_file asd_dataset_settings.yaml --data_dir
training_data/asd_dataset --out_name training_data/asd_dataset/asds_O1.hdf5

Step 3 Training the network

Now we are ready for training using GNPE. Here we need to train two networks, one which estimates the time of arrival in the detectors and one which does the full inference task. A natural question is why train two networks. The main idea is if one is able to align (and thus standardize) the times of arrival in the detectors, the inference task will become significantly easier. To do this we first need to train an initialization network which estimates the time of arrival in the detectors:

dingo_train --settings_file train_settings_init.yaml --train_dir training/init_train_dir

Notice that the inference parameters are only the H1_time and L1_time. Also notice that the embedding_net is significantly smaller and the number of flow steps, num_flow_steps is reduced.

dingo_train --settings_file train_settings.yaml --train_dir training/main_train_dir

Notice the data.gnpe_time_shifts section. The kernel describes how much to blur the GNPE proxies and is specified in seconds. To read more about this see GNPE.

Step 4 Doing Inference

Performing inference requires a few changes to the previous NPE setup. Most notably, since we are now using GNPE, we have to specify the file path to both the initialization network and the main network. Another difference is the new attribute under sampler arguments num-gnpe-iterations which indicates the number of GNPE steps to take. If the initialization network is not fully converged or if the length of the segment being analyzed is very long, it is recommended to increase this number.

dingo_pipe GW150914.ini