GNPE model (production)
This tutorial uses the most demanding settings and is the configuration typically used in production. The main difference from the NPE tutorial is that here we use GNPE (group neural posterior estimation). Data generation is exactly the same as in the previous tutorial, but we repeat it here for completeness.
The file structure is similar to the NPE example, except that there are now two
training sub-directories and two training settings files (train_settings_init.yaml and train_settings_main.yaml).
gnpe_model/
    # config files
    waveform_dataset_settings.yaml
    asd_dataset_settings_fiducial.yaml
    asd_dataset_settings.yaml
    train_settings_main.yaml
    train_settings_init.yaml
    GW150914.ini

    training_data/
        waveform_dataset.hdf5
        asd_dataset.hdf5
        asd_dataset_fiducial.hdf5
        asd_dataset_fiducial/  # Contains the asd_dataset.hdf5 and also temp files for asd generation
        asd_dataset/           # Contains the asd_dataset.hdf5 and also temp files for asd generation

    training/
        main_train_dir/
            model_050.pt
            model_stage_0.pt
            model_latest.pt
            history.txt
            # etc...
        init_train_dir/
            model_050.pt
            model_stage_0.pt
            model_latest.pt
            history.txt
            # etc...

    outdir_GW150914/
        # dingo_pipe output
Step 1 Generating a Waveform Dataset
First, generate the directory structure:
cd gnpe_model
mkdir training_data
mkdir training
mkdir training/main_train_dir
mkdir training/init_train_dir
Generate the waveform dataset:
dingo_generate_dataset --settings_file waveform_dataset_settings.yaml --out_file training_data/waveform_dataset.hdf5
or, alternatively, using HTCondor:
dingo_generate_dataset_dag --settings_file waveform_dataset_settings.yaml \
    --out_file training_data/waveform_dataset.hdf5 --env_path $DINGO_VENV_PATH \
    --num_jobs 4 --request_cpus 16 --request_memory 128000 --request_memory_high 256000
Step 2 Generating an ASD dataset
As before, we generate a fiducial ASD dataset containing a single ASD:
dingo_generate_asd_dataset --settings_file asd_dataset_settings_fiducial.yaml \
    --data_dir training_data/asd_dataset_fiducial \
    --out_name training_data/asd_dataset_fiducial/asds_O1_fiducial.hdf5
and a large ASD dataset:
dingo_generate_asd_dataset --settings_file asd_dataset_settings.yaml \
    --data_dir training_data/asd_dataset \
    --out_name training_data/asd_dataset/asds_O1.hdf5
Step 3 Training the network
Now we are ready to train using GNPE. Here we need to train two networks: one that estimates the times of arrival in the detectors, and one that performs the full inference task. Why two networks? The main idea is that if the times of arrival in the detectors can be aligned (and thus standardized), the inference task becomes significantly easier. To do this, we first train an initialization network that estimates the times of arrival in the detectors:
dingo_train --settings_file train_settings_init.yaml --train_dir training/init_train_dir
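To make the alignment idea concrete, here is a minimal NumPy sketch, not Dingo's implementation. It assumes frequency-domain data on the rfft grid and uses the Fourier shift theorem: delaying a signal by dt multiplies its transform by exp(-2πi f dt). The toy pulses, the function names time_shift and pulse, and the "estimated" arrival times are all hypothetical; in practice the estimates would come from the initialization network.

```python
import numpy as np

def time_shift(strain_fd: np.ndarray, freqs: np.ndarray, dt: float) -> np.ndarray:
    """Delay frequency-domain strain by dt seconds (negative dt advances it).

    Fourier shift theorem: x(t - dt) <-> X(f) * exp(-2j * pi * f * dt).
    """
    return strain_fd * np.exp(-2j * np.pi * freqs * dt)

def pulse(t: np.ndarray, t0: float, width: float = 0.005) -> np.ndarray:
    """Toy Gaussian 'signal' centred on arrival time t0 (hypothetical stand-in)."""
    return np.exp(-0.5 * ((t - t0) / width) ** 2)

fs = 1024.0                    # sampling rate in Hz (hypothetical)
t = np.arange(1024) / fs       # 1 s segment
freqs = np.fft.rfftfreq(len(t), d=1 / fs)

# The same pulse reaches H1 and L1 at different times.
h1 = pulse(t, 0.500)
l1 = pulse(t, 0.507)

# Hypothetical arrival-time estimates, e.g. from the initialization network.
t_h1_est, t_l1_est = 0.500, 0.507
t_ref = 0.500                  # common reference time to align both detectors to

h1_aligned = np.fft.irfft(time_shift(np.fft.rfft(h1), freqs, t_ref - t_h1_est), n=len(t))
l1_aligned = np.fft.irfft(time_shift(np.fft.rfft(l1), freqs, t_ref - t_l1_est), n=len(t))

# After alignment, both pulses should peak at (approximately) t_ref.
print(t[np.argmax(h1_aligned)], t[np.argmax(l1_aligned)])
```

Once the data are aligned this way, a downstream network only has to handle small residual offsets between detectors, which is the intuition behind training the main network on standardized data.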
In train_settings_init.yaml, notice that the only inference parameters are H1_time and L1_time. Also notice that the embedding_net
is significantly smaller and that the number of flow steps, num_flow_steps, is reduced.
Next, train the main network, which performs the full inference task:
dingo_train --settings_file train_settings_main.yaml --train_dir training/main_train_dir
In train_settings_main.yaml, notice the data.gnpe_time_shifts section. Its kernel describes how much to blur the GNPE proxies and is specified in
seconds. To read more about this, see GNPE.
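One way to picture the kernel: the GNPE proxy for each detector is its arrival time plus noise drawn from the kernel, and the data are then shifted by the proxy, so the network only ever sees approximately aligned data. The sketch below assumes a uniform kernel with a half-width of 1 ms; the kernel shape, its width, the arrival times and the function name blur_proxies are illustrative assumptions, not values read from train_settings_main.yaml.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def blur_proxies(arrival_times: dict[str, float], half_width: float,
                 rng: np.random.Generator) -> dict[str, float]:
    """Hypothetical helper: each time plus uniform noise in [-half_width, half_width] s."""
    return {det: t + rng.uniform(-half_width, half_width)
            for det, t in arrival_times.items()}

# Hypothetical arrival times (seconds relative to the segment reference time).
times = {"H1_time": 0.0120, "L1_time": 0.0050}
kernel_half_width = 0.001   # 1 ms, assumed kernel width

proxies = blur_proxies(times, kernel_half_width, rng)
for det, proxy in proxies.items():
    # If the data are shifted by -proxy, the offset left for the network to
    # handle is bounded by the kernel width, not by the full arrival time.
    residual = times[det] - proxy
    print(f"{det}: proxy={proxy:.5f} s, residual offset={residual * 1e3:+.3f} ms")
```

A wider kernel blurs the proxies more, so the network must tolerate larger residual offsets; a narrower one makes the aligned data more uniform.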
Step 4 Doing Inference
Performing inference requires a few changes to the previous NPE setup. Most notably, since we are now using GNPE, we
have to specify the file paths of both the initialization network and the main network. Another
difference is a new sampler argument, num-gnpe-iterations, which sets the
number of GNPE iterations to perform. If the initialization network is not fully converged, or if
the segment being analyzed is very long, it is recommended to increase this number.
dingo_pipe GW150914.ini
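The role of num-gnpe-iterations can be understood through the Gibbs-sampling view of GNPE. Each iteration alternates between blurring the current arrival-time samples with the kernel and re-sampling the parameters conditioned on those proxies. The toy below replaces both networks with exact one-dimensional Gaussian conditionals (all values hypothetical, not Dingo code) so that it runs standalone. Starting from a deliberately biased initialization, the samples match the exact posterior only after several iterations.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Toy 1-D model (all values hypothetical):
#   prior      t ~ N(0, tau^2)            arrival time
#   likelihood d | t ~ N(t, sigma^2)      observed data
#   kernel     t_hat | t ~ N(t, kappa^2)  GNPE proxy
tau, sigma, kappa = 1.0, 0.1, 0.1
d = 0.3  # observed data

def sample_t_given_proxy(t_hat: np.ndarray) -> np.ndarray:
    """Stand-in for the main network: exact Gaussian p(t | d, t_hat)."""
    precision = 1 / tau**2 + 1 / sigma**2 + 1 / kappa**2
    mean = (d / sigma**2 + t_hat / kappa**2) / precision
    return mean + rng.standard_normal(t_hat.shape) / np.sqrt(precision)

def gnpe_gibbs(num_gnpe_iterations: int, num_samples: int = 50_000) -> np.ndarray:
    # Stand-in for the initialization network: a deliberately biased, broad guess.
    t = d + 0.5 + 0.2 * rng.standard_normal(num_samples)
    for _ in range(num_gnpe_iterations):
        t_hat = t + kappa * rng.standard_normal(num_samples)  # blur with the kernel
        t = sample_t_given_proxy(t_hat)                        # condition on proxies
    return t

# Exact posterior p(t | d) for comparison.
post_precision = 1 / tau**2 + 1 / sigma**2
post_mean, post_std = (d / sigma**2) / post_precision, 1 / np.sqrt(post_precision)

for n in (1, 3, 30):
    s = gnpe_gibbs(n)
    print(f"iterations={n:2d}  mean={s.mean():.4f}  std={s.std():.4f}  "
          f"(exact: {post_mean:.4f}, {post_std:.4f})")
```

With these toy values each iteration roughly halves the initial bias. A narrower kernel relative to the likelihood width, or a worse initialization, would require more iterations, which mirrors the advice above.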