NPE Model (production)
This tutorial repeats the toy example with higher-profile settings. These are not the full production settings used for runs, since we are not using GNPE, but they should lead to decent results. Go to this tutorial for the full production network. The steps are essentially the same as in the toy example, but with larger settings. It is recommended to run this on a cluster or GPU machine.
We can repeat the same first few steps from the previous tutorial with a couple of
differences. The file structure is mostly the same, but now there is an additional
asd_dataset_fiducial directory, which is explained below.
npe_model/
    # config files
    waveform_dataset_settings.yaml
    asd_dataset_settings.yaml
    asd_dataset_settings_fiducial.yaml
    train_settings.yaml
    GW150914.ini
    training_data/
        waveform_dataset.hdf5
        asd_dataset_fiducial/ # Contains the asd_dataset.hdf5 and also temp files for asd generation
        asd_dataset/ # Contains the asd_dataset.hdf5 and also temp files for asd generation
    training/
        model_050.pt
        model_stage_0.pt
        model_latest.pt
        history.txt
        # etc...
    outdir_GW150914/
        # dingo_pipe output
Step 1 Generating a Waveform Dataset
Again, the first step is to generate the necessary folders:
cd npe_model
mkdir training_data
mkdir training
As before we run dingo_generate_dataset:
dingo_generate_dataset --settings waveform_dataset_settings.yaml --out_file training_data/waveform_dataset.hdf5
The waveform_dataset_settings.yaml settings file now includes a new attribute, compression.
This creates a truncated singular value decomposition (SVD) of the waveform polarizations, which
is stored on disk as a compressed representation of the dataset. The size attribute
refers to the number of basis vectors included in the expansion of the waveform. This can later be
changed during training. When the compression phase is finished, the log will
display the mismatch between the decompressed waveform and the generated waveform. You can
also access these mismatch values by running dingo_ls on a generated waveform_dataset.hdf5
file. It will show multiple mismatches, each corresponding to a different number of basis vectors
used to decompress the waveform. It is up to the user to decide what mismatch is acceptable;
typically a maximum mismatch between \(10^{-3}\) and \(10^{-4}\) is recommended.
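To make the relation between the number of basis vectors and the mismatch concrete, here is a minimal NumPy sketch. It does not use dingo's own code: the "waveforms" are random mixtures of a few fixed modes (a hypothetical stand-in for real polarizations), the inner product assumes flat noise weighting, and the name max_mismatch is made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, n_modes = 256, 20

# Hypothetical stand-in for waveform polarizations: complex mixtures of a few
# fixed modes with decaying weights. Real data would come from the dataset.
modes = rng.normal(size=(n_modes, n_freq)) + 1j * rng.normal(size=(n_modes, n_freq))
weights = 1.0 / (1.0 + np.arange(n_modes)) ** 2


def sample_waveforms(n):
    coeffs = rng.normal(size=(n, n_modes)) + 1j * rng.normal(size=(n, n_modes))
    return (coeffs * weights) @ modes


train = sample_waveforms(500)
test = sample_waveforms(100)

# Rows of Vh are orthonormal basis vectors, ordered by singular value.
_, _, Vh = np.linalg.svd(train, full_matrices=False)


def max_mismatch(waveforms, n_basis):
    """Largest mismatch after compressing to n_basis coefficients and back."""
    basis = Vh[:n_basis]
    compressed = waveforms @ basis.conj().T  # project onto the truncated basis
    restored = compressed @ basis            # expand back to frequency bins
    overlap = np.real(np.sum(waveforms.conj() * restored, axis=1))
    norms = np.linalg.norm(waveforms, axis=1) * np.linalg.norm(restored, axis=1)
    return float(np.max(1.0 - overlap / norms))


mismatches = {n: max_mismatch(test, n) for n in (5, 10, 20)}
for n, mm in mismatches.items():
    print(f"n_basis={n:3d}  max mismatch={mm:.2e}")
```

Because the truncated bases are nested, the maximum mismatch can only shrink as more basis vectors are kept. In this toy setup it drops to numerical noise once the basis size reaches the number of underlying modes.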
We could also generate the waveform dataset using a condor DAG on a cluster. To do this, run
dingo_generate_dataset_dag --settings_file waveform_dataset_settings.yaml --out_file training_data/waveform_dataset.hdf5 --env_path $DINGO_VENV_PATH --num_jobs 4 --request_cpus 64 --request_memory 128000 --request_memory_high 256000
and then submit the generated DAG
condor_submit_dag condor/submit/dingo_generate_dataset_dagman_DATE.submit
where DATE is specified in the filename of the .submit file that was generated.
Step 2 Generating an ASD dataset
To generate an ASD dataset we can run the same command as in the previous tutorial.
dingo_generate_asd_dataset --settings_file asd_dataset_settings_fiducial.yaml --data_dir training_data/asd_dataset_fiducial --out_name training_data/asd_dataset_fiducial/asds_O1_fiducial.hdf5
However, this time we will need two sets of ASDs during training. The first is the fiducial dataset generated above, which contains only a single ASD and is held fixed during the initial training stage. The second ASDDataset contains many ASDs and is used during the fine-tuning stage. Using just one ASD in the first stage lets the network train in an easier inference setting: it only has to learn how to infer parameters in the presence of that one ASD. During inference, however, the ASD will vary. Thus, in the second stage many ASDs are used so that dingo learns the distribution of ASDs from the observing run. We find this split leads to an improvement in overall performance. To generate this second dataset, run
dingo_generate_asd_dataset --settings_file asd_dataset_settings.yaml --data_dir training_data/asd_dataset --out_name training_data/asd_dataset/asds_O1.hdf5
We can see that in asd_dataset_settings.yaml the num_psds_max
attribute is set to 0, indicating that all possible ASDs will be downloaded. If you want to
decrease this, make sure that there are enough ASDs in the training set to represent
any possible data the dingo network will see. Typically this should be at least 1000,
but more is better.
Step 3 Training the network
Now we are ready for training. The command is analogous to the one in the previous tutorial, but the settings are increased to production values. To run the training, do
dingo_train --settings_file train_settings.yaml --train_dir training
Tip
If running on a machine with multiple GPUs, make sure to specify the GPU by running
export CUDA_VISIBLE_DEVICES=GPU_NUM before running dingo_train.
Compared to the toy example, the main differences in the network architecture are the size of the
embedding network, which is set in model.embedding_net_kwargs.hidden_dims, and the
number of neural spline flow transforms, which is set in
model.nsf_kwargs.num_flow_steps. These increase the depth of the network and the
number/size of the layers in the embedding network.
Notice that we are not inferring the phase parameter here, as it is not listed under inference_parameters. However,
we do recover the phase in post-processing. To see why and how this is done, see synthetic phase.
Also notice that there are now two training stages, stage_0 and stage_1. In stage_0, a fixed ASD is used and the reduced basis layer
is frozen. Then in stage_1, all ASDs are used and the reduced basis layer is unfrozen.
The main difference in the local settings is that the device is set to CUDA.
Important
It is recommended to have at least 40 GB of GPU memory on the device. If there is not enough memory on the machine,
first try halving the batch_size. In this case one should also multiply the learning rate, lr, by \(\frac{1}{\sqrt{2}}\). If
there is still not enough memory, consider reducing the number of hidden dimensions.
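The square-root rule above can be written as a small helper. This is only a sketch of the arithmetic: rescale_lr is a hypothetical name, and the batch size and learning rate below are placeholder values, not the ones in train_settings.yaml.

```python
import math


def rescale_lr(lr, old_batch_size, new_batch_size):
    """Scale the learning rate with the square root of the batch-size ratio."""
    return lr * math.sqrt(new_batch_size / old_batch_size)


# Placeholder values for illustration only.
old_batch_size, old_lr = 4096, 1.0e-4
new_batch_size = old_batch_size // 2

new_lr = rescale_lr(old_lr, old_batch_size, new_batch_size)
print(f"batch_size: {old_batch_size} -> {new_batch_size}")
print(f"lr:         {old_lr:.3e} -> {new_lr:.3e}")
```

Halving the batch size a second time applies the same factor again, so two halvings reduce lr by a factor of 2 overall.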
Step 4 Doing Inference
We can run inference with the same command as before:
dingo_pipe GW150914.ini
There is just one difference from the previous example. It is possible to reweight the posterior to a new prior.
Note, though, that the new prior must be a subset of the previous prior. Otherwise, the new prior will
include regions where the network has not been trained, and the proposal distribution generated by dingo
will cover them poorly. This results in a low effective sample size and leads to poor results. As an example,
see the prior-dict attribute in GW150914.ini.
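The effect of the new prior on the effective sample size can be illustrated with a one-dimensional toy problem. The sketch below does not call dingo: the Gaussian samples stand in for posterior samples obtained under the old prior, both priors are assumed uniform, and all function names are hypothetical. The weights are the ratio of new to old prior density, and the effective sample size is computed with the usual \((\sum_i w_i)^2 / \sum_i w_i^2\) estimate.

```python
import random


def effective_sample_size(weights):
    """Effective sample size (sum w)^2 / sum w^2 of importance weights."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq if total_sq > 0 else 0.0


def uniform_density(x, low, high):
    return 1.0 / (high - low) if low <= x <= high else 0.0


random.seed(0)
OLD_PRIOR = (-5.0, 5.0)  # assumed prior used for training

# Hypothetical stand-in for samples drawn under the old prior.
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]
samples = [x for x in samples if OLD_PRIOR[0] <= x <= OLD_PRIOR[1]]


def reweight(samples, new_prior):
    """Importance weights for switching from OLD_PRIOR to new_prior."""
    return [
        uniform_density(x, *new_prior) / uniform_density(x, *OLD_PRIOR)
        for x in samples
    ]


ess_subset = effective_sample_size(reweight(samples, (-1.0, 1.0)))
ess_shifted = effective_sample_size(reweight(samples, (2.0, 8.0)))
print(f"new prior inside old prior:   ESS = {ess_subset:.0f} of {len(samples)}")
print(f"new prior extends beyond old: ESS = {ess_shifted:.0f} of {len(samples)}")
```

In the second case the interval from 5 to 8 lies outside the old prior, so no sample can ever land there and the weights cannot correct for it. In this toy the low effective sample size comes from the new prior sitting in the tail of the samples; it does not model the behaviour of an untrained network.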