dingo.gw.training package

Submodules

dingo.gw.training.train_builders module

dingo.gw.training.train_builders.build_dataset(data_settings: dict, leave_waveforms_on_disk: bool | None = False) -> WaveformDataset

Build a dataset based on a settings dictionary. This should contain the path of a saved waveform dataset.

This function also truncates the dataset as necessary.

Parameters:
  • data_settings (dict)

  • leave_waveforms_on_disk (bool) – If True, the waveform values are not loaded into memory during initialization. Instead, they are loaded from disk whenever the dataset is accessed. This reduces the memory footprint of large datasets, but can slow down data preprocessing.

Return type:

WaveformDataset
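
The trade-off behind leave_waveforms_on_disk can be illustrated with a toy store. This is a minimal sketch, not Dingo's implementation: the LazyWaveformStore class, the write_toy_waveforms helper, the JSON-lines file layout, and the disk_reads counter are all hypothetical and exist only to contrast eager loading with loading on access.

```python
import json
import os
import tempfile


class LazyWaveformStore:
    """Toy stand-in for a dataset; NOT Dingo's WaveformDataset."""

    def __init__(self, path, leave_on_disk=False):
        self.path = path
        self.disk_reads = 0  # counts how often the file is read
        # Eager mode reads everything once; lazy mode defers all reads.
        self._cache = None if leave_on_disk else self._read_all()

    def _read_all(self):
        self.disk_reads += 1
        with open(self.path) as f:
            return [json.loads(line) for line in f]

    def __getitem__(self, idx):
        if self._cache is not None:
            return self._cache[idx]  # served from memory
        return self._read_all()[idx]  # goes back to disk on every access


def write_toy_waveforms(path, waveforms):
    """Write one JSON-encoded waveform per line (hypothetical format)."""
    with open(path, "w") as f:
        for waveform in waveforms:
            f.write(json.dumps(waveform) + "\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_waveforms.jsonl")
    write_toy_waveforms(toy_path, [[0.0, 1.0], [2.0, 3.0]])
    eager = LazyWaveformStore(toy_path)
    lazy = LazyWaveformStore(toy_path, leave_on_disk=True)
    for i in range(2):
        assert eager[i] == lazy[i]
    print("eager disk reads:", eager.disk_reads)
    print("lazy disk reads:", lazy.disk_reads)
```

The eager store pays its memory cost once at construction, while the lazy store holds nothing in memory and pays an I/O cost per access.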

dingo.gw.training.train_builders.build_svd_for_embedding_network(wfd: WaveformDataset, data_settings: dict, asd_dataset_path: str, size: int, num_training_samples: int, num_validation_samples: int, num_workers: int = 0, batch_size: int = 1000, out_dir: str | None = None) -> List

Construct SVD matrices V based on clean waveforms in each interferometer. These will be used to seed the weights of the initial projection part of the embedding network.

It first generates a number of training waveforms, and then produces the SVD.

Parameters:
  • wfd (WaveformDataset)

  • data_settings (dict)

  • asd_dataset_path (str) – Training waveforms will be whitened with respect to these ASDs.

  • size (int) – Number of basis elements to include in the SVD projection.

  • num_training_samples (int)

  • num_validation_samples (int)

  • num_workers (int)

  • batch_size (int)

  • out_dir (str) – SVD performance diagnostics are saved here.

Returns:

The V matrices for each interferometer. They are ordered as in data_settings['detectors'].

Return type:

list of numpy arrays
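
The idea of building a truncated SVD basis from training waveforms and checking it on validation waveforms can be sketched with NumPy. This is an illustration under stated assumptions, not Dingo's code: the helper names build_svd_basis, relative_reconstruction_error, and sample are hypothetical, the data are synthetic complex vectors rather than whitened detector waveforms, and Dingo may use a different SVD routine.

```python
import numpy as np


def build_svd_basis(train, size):
    """Return V (n_features x size) from the SVD of the training matrix.

    Each row of `train` is one waveform.
    """
    _, _, vh = np.linalg.svd(train, full_matrices=False)
    return vh.conj().T[:, :size]


def relative_reconstruction_error(v, waveforms):
    """Project onto the basis, reconstruct, and compare with the input."""
    coefficients = waveforms @ v
    reconstructed = coefficients @ v.conj().T
    return np.linalg.norm(waveforms - reconstructed) / np.linalg.norm(waveforms)


rng = np.random.default_rng(0)
# Synthetic stand-in data: complex vectors spanned by 3 hidden modes.
modes = rng.normal(size=(3, 64)) + 1j * rng.normal(size=(3, 64))


def sample(n):
    """Draw n random combinations of the hidden modes."""
    weights = rng.normal(size=(n, 3)) + 1j * rng.normal(size=(n, 3))
    return weights @ modes


train, validation = sample(200), sample(50)
v = build_svd_basis(train, size=3)
print("V shape:", v.shape)
print("validation error:", relative_reconstruction_error(v, validation))
```

Because the synthetic data have exactly three modes, a basis of size 3 reconstructs the validation set to numerical precision. For real waveforms the error falls gradually as size grows, which is the kind of behavior validation diagnostics are meant to expose.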

dingo.gw.training.train_builders.set_train_transforms(wfd, data_settings, asd_dataset_path, omit_transforms=None)

Set the transform attribute of a waveform dataset based on a settings dictionary. The transform takes waveform polarizations, samples random extrinsic parameters, projects to detectors, adds noise, and formats the data for input to the neural network. It also implements optional GNPE transformations.

Note that the WaveformDataset is modified in-place, so this function returns nothing.

Parameters:
  • wfd (WaveformDataset)

  • data_settings (dict)

  • asd_dataset_path (str) – Path corresponding to the ASD dataset used to generate noise.

  • omit_transforms – List of sub-transforms to omit from the full composition.
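
The effect of omitting sub-transforms from a composition can be sketched as follows. This is a generic illustration, not Dingo's transform classes: the compose helper, the step names, and the dict-based sample are hypothetical, and how Dingo identifies the sub-transforms to omit is not specified here (this sketch matches them by name).

```python
def compose(transforms, omit_transforms=None):
    """Chain (name, function) pairs, skipping names listed in omit_transforms."""
    omitted = set(omit_transforms or [])
    kept = [(name, fn) for name, fn in transforms if name not in omitted]

    def composed(sample):
        for _, fn in kept:
            sample = fn(sample)
        return sample

    return composed


# Placeholder steps acting on a plain dict; names and arithmetic are illustrative.
pipeline = [
    ("sample_extrinsic", lambda s: {**s, "extrinsic": {"ra": 1.0}}),
    ("project_onto_detectors",
     lambda s: {**s, "strain": [2.0 * x for x in s["waveform"]]}),
    ("add_noise", lambda s: {**s, "strain": [x + 0.1 for x in s["strain"]]}),
]

full = compose(pipeline)
noise_free = compose(pipeline, omit_transforms=["add_noise"])

print(full({"waveform": [1.0]})["strain"])
print(noise_free({"waveform": [1.0]})["strain"])
```

Omitting the noise step yields clean detector data, which is useful whenever a downstream task needs noise-free waveforms.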

dingo.gw.training.train_pipeline module

dingo.gw.training.train_pipeline.copy_files_to_local(file_path: str, local_dir: str | None, leave_keys_on_disk: bool, is_condor: bool = False) -> str

Copy a file to the local node, if local_dir is provided, to minimize network traffic during training.

Parameters:
  • file_path (str) – Path to file that should be copied.

  • local_dir (Optional[str]) – Directory where file should be copied. If None, file will not be copied.

  • leave_keys_on_disk (bool) – Whether to leave keys on disk and load them during training. If dataset is not copied and leave_keys_on_disk is True, a warning will be raised.

  • is_condor (bool) – Whether this is a condor job.

Returns:

local_file_path – Modified file path if file was copied to local node, else the original file path.

Return type:

str
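
The copy-or-pass-through behavior described above can be sketched with the standard library. This is a simplified illustration, not the actual function: copy_to_local_sketch is a hypothetical name, and the leave_keys_on_disk warning and any condor-specific handling are deliberately left out.

```python
import os
import shutil
import tempfile
from typing import Optional


def copy_to_local_sketch(file_path: str, local_dir: Optional[str]) -> str:
    """Copy file_path into local_dir and return the new path.

    If local_dir is None, nothing is copied and file_path is returned unchanged.
    """
    if local_dir is None:
        return file_path
    os.makedirs(local_dir, exist_ok=True)
    local_file_path = os.path.join(local_dir, os.path.basename(file_path))
    shutil.copyfile(file_path, local_file_path)
    return local_file_path


with tempfile.TemporaryDirectory() as tmp_dir:
    source = os.path.join(tmp_dir, "dataset.bin")
    with open(source, "wb") as f:
        f.write(b"toy data")
    print(copy_to_local_sketch(source, None) == source)
    local_copy = copy_to_local_sketch(source, os.path.join(tmp_dir, "local"))
    print(os.path.exists(local_copy))
```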

dingo.gw.training.train_pipeline.initialize_stage(pm: BasePosteriorModel, wfd: WaveformDataset, stage: dict, num_workers: int, resume: bool = False)

Initializes training based on PosteriorModel metadata and current stage:
  • Builds transforms (based on noise settings for current stage);

  • Builds DataLoaders;

  • At the beginning of a stage (i.e., if not resuming mid-stage), initializes a new optimizer and scheduler;

  • Freezes / unfreezes SVD layer of embedding network.

Parameters:
  • pm (BasePosteriorModel)

  • wfd (WaveformDataset)

  • stage (dict) – Settings specific to current stage of training

  • num_workers (int)

  • resume (bool) – Whether training is resuming mid-stage. This controls whether the optimizer and scheduler should be re-initialized based on contents of stage dict.

Return type:

(train_loader, test_loader)
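
The role of the resume flag can be sketched with plain dictionaries. This is a minimal illustration, not Dingo's logic: initialize_stage_sketch, the state dict, and the keys "optimizer", "scheduler", and "freeze_svd_layer" are hypothetical placeholders for the real model and stage settings.

```python
import copy


def initialize_stage_sketch(state: dict, stage: dict, resume: bool = False) -> dict:
    """Return a new training state prepared for the given stage."""
    new_state = copy.deepcopy(state)
    if not resume:
        # Start of a stage: replace optimizer and scheduler with stage settings.
        new_state["optimizer"] = copy.deepcopy(stage["optimizer"])
        new_state["scheduler"] = copy.deepcopy(stage["scheduler"])
    # Freezing is applied in both cases (hypothetical key name).
    new_state["svd_layer_frozen"] = stage.get("freeze_svd_layer", False)
    return new_state


checkpoint_state = {
    "optimizer": {"type": "adam", "lr": 1e-4, "steps_taken": 500},
    "scheduler": {"type": "cosine", "last_epoch": 7},
}
stage = {
    "optimizer": {"type": "adam", "lr": 1e-5},
    "scheduler": {"type": "cosine", "last_epoch": 0},
    "freeze_svd_layer": True,
}

print(initialize_stage_sketch(checkpoint_state, stage, resume=True)["optimizer"])
print(initialize_stage_sketch(checkpoint_state, stage, resume=False)["optimizer"])
```

When resuming mid-stage, the optimizer and scheduler restored from the checkpoint are kept, so their accumulated state is not lost.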

dingo.gw.training.train_pipeline.parse_args()

dingo.gw.training.train_pipeline.prepare_training_new(train_settings: dict, train_dir: str, local_settings: dict) -> Tuple[BasePosteriorModel, WaveformDataset]

Based on a settings dictionary, initialize a WaveformDataset and PosteriorModel.

For model type ‘nsf+embedding’ (the only acceptable type at this point) this also initializes the embedding network projection stage with SVD V matrices based on clean detector waveforms.

Parameters:
  • train_settings (dict) – Settings which ultimately come from train_settings.yaml file.

  • train_dir (str) – This is only used to save diagnostics from the SVD.

  • local_settings (dict) – Local settings (e.g., num_workers, device)

Return type:

(BasePosteriorModel, WaveformDataset)

dingo.gw.training.train_pipeline.prepare_training_resume(checkpoint_name: str, local_settings: dict, train_dir: str) -> Tuple[BasePosteriorModel, WaveformDataset]

Loads a PosteriorModel from a checkpoint, as well as the corresponding WaveformDataset, in order to continue training. It initializes the saved optimizer and scheduler from the checkpoint.

Parameters:
  • checkpoint_name (str) – File name containing the checkpoint (.pt format).

  • local_settings (dict) – Local settings (e.g., num_workers, device)

  • train_dir (str) – Path to training directory where the wandb info is saved.

Return type:

(BasePosteriorModel, WaveformDataset)

dingo.gw.training.train_pipeline.train_local()

dingo.gw.training.train_pipeline.train_stages(pm: BasePosteriorModel, wfd: WaveformDataset, train_dir: str, local_settings: dict) -> bool

Train the network, iterating through the sequence of stages. Stages can change certain settings such as the noise characteristics, optimizer, and scheduler settings.

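Parameters:
  • pm (BasePosteriorModel)

  • wfd (WaveformDataset)

  • train_dir (str)

  • local_settings (dict) – Local settings (e.g., num_workers, device)

Returns:

True if all stages are complete, False otherwise.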

Return type:

bool
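
Iterating through a sequence of stages, and reporting whether all of them finished, can be sketched as below. This is an illustration under assumptions, not the actual implementation: locate_stage and train_stages_sketch are hypothetical names, each stage is reduced to an epoch count, and the epoch_budget argument is an invented device standing in for any reason training might stop early.

```python
from typing import List, Optional, Tuple


def locate_stage(stage_epochs: List[int],
                 current_epoch: int) -> Optional[Tuple[int, bool]]:
    """Return (stage index, resuming mid-stage?) or None if all stages are done.

    current_epoch is the number of epochs already completed.
    """
    stage_start = 0
    for index, n_epochs in enumerate(stage_epochs):
        stage_end = stage_start + n_epochs
        if current_epoch < stage_end:
            return index, current_epoch > stage_start
        stage_start = stage_end
    return None


def train_stages_sketch(stage_epochs: List[int], current_epoch: int,
                        epoch_budget: int) -> Tuple[bool, int]:
    """Run at most epoch_budget epochs; return (all complete?, epochs completed)."""
    while epoch_budget > 0:
        if locate_stage(stage_epochs, current_epoch) is None:
            break
        current_epoch += 1  # placeholder for one real training epoch
        epoch_budget -= 1
    return locate_stage(stage_epochs, current_epoch) is None, current_epoch


# Two stages of 2 and 3 epochs; the first run is cut short, the second finishes.
complete, epoch = train_stages_sketch([2, 3], current_epoch=0, epoch_budget=3)
print(complete, epoch)
complete, epoch = train_stages_sketch([2, 3], current_epoch=epoch, epoch_budget=10)
print(complete, epoch)
```

A boolean return value of this kind lets a caller decide whether the job needs to be run again to continue from the last checkpoint.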

dingo.gw.training.train_pipeline_condor module

dingo.gw.training.train_pipeline_condor.copy_logfiles(log_dir, epoch, name='info', suffixes=('.err', '.log', '.out'))

dingo.gw.training.train_pipeline_condor.copyfile(src, dst)

dingo.gw.training.train_pipeline_condor.create_submission_file(train_dir: str, condor_settings: dict, filename: str = 'submission_file.sub')

Creates submission file and writes it to filename.

Parameters:
  • train_dir (str) – Path to training directory

  • condor_settings (dict) – Condor settings

  • filename (str) – Filename of submission file
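
Writing an HTCondor submit description from a settings dictionary might look roughly like this. It is a sketch, not the file Dingo produces: write_submission_file_sketch is a hypothetical name, the condor_settings keys ("executable", "arguments", "num_cpus", "memory_mb") are invented for illustration, and the log file names are assumptions. Only the submit commands themselves (executable, arguments, request_cpus, request_memory, output, error, log, queue) are standard HTCondor syntax.

```python
import os
import tempfile


def write_submission_file_sketch(train_dir: str, condor_settings: dict,
                                 filename: str = "submission_file.sub") -> None:
    """Write a minimal HTCondor submit description to filename."""
    lines = [
        f"executable = {condor_settings['executable']}",
        f"arguments = {condor_settings['arguments']}",
        f"request_cpus = {condor_settings['num_cpus']}",
        f"request_memory = {condor_settings['memory_mb']}",
        # Log file names are assumptions made for this sketch.
        f"output = {os.path.join(train_dir, 'info.out')}",
        f"error = {os.path.join(train_dir, 'info.err')}",
        f"log = {os.path.join(train_dir, 'info.log')}",
        "queue",
    ]
    with open(filename, "w") as f:
        f.write("\n".join(lines) + "\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    settings = {
        "executable": "/usr/bin/python",  # placeholder path
        "arguments": "train.py",          # placeholder arguments
        "num_cpus": 4,
        "memory_mb": 16000,
    }
    sub_path = os.path.join(tmp_dir, "submission_file.sub")
    write_submission_file_sketch(tmp_dir, settings, filename=sub_path)
    with open(sub_path) as f:
        print(f.read())
```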

dingo.gw.training.train_pipeline_condor.train_condor()

dingo.gw.training.utils module

dingo.gw.training.utils.append_stage()

Module contents