dingo.gw.training package
Submodules
dingo.gw.training.train_builders module
- dingo.gw.training.train_builders.build_dataset(data_settings: dict, leave_waveforms_on_disk: bool | None = False) → WaveformDataset
Build a dataset based on a settings dictionary. The dictionary should contain the path of a saved waveform dataset.
This function also truncates the dataset as necessary.
- Parameters:
data_settings (dict)
leave_waveforms_on_disk (bool) – If True, the values associated with the waveforms will not be loaded into memory during initialization. Instead, they will be loaded from disk when the dataset is accessed. This reduces the memory load of large datasets, but can slow down data preprocessing.
- Return type:
WaveformDataset
- dingo.gw.training.train_builders.build_svd_for_embedding_network(wfd: WaveformDataset, data_settings: dict, asd_dataset_path: str, size: int, num_training_samples: int, num_validation_samples: int, num_workers: int = 0, batch_size: int = 1000, out_dir: str | None = None) → List
Construct SVD matrices V based on clean waveforms in each interferometer. These will be used to seed the weights of the initial projection part of the embedding network.
It first generates a number of training waveforms, and then produces the SVD.
- Parameters:
wfd (WaveformDataset)
data_settings (dict)
asd_dataset_path (str) – Training waveforms will be whitened with respect to these ASDs.
size (int) – Number of basis elements to include in the SVD projection.
num_training_samples (int)
num_validation_samples (int)
num_workers (int)
batch_size (int)
out_dir (str) – SVD performance diagnostics are saved here.
- Returns:
The V matrices for each interferometer. They are ordered as in data_settings['detectors'].
- Return type:
list of numpy arrays
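The idea of building a truncated SVD basis from training waveforms and checking it on held-out waveforms can be sketched with plain NumPy. This is only an illustration under stated assumptions: the toy data, `toy_waveforms`, and `hidden_basis` are made up for the example, and the code does not reproduce how dingo generates waveforms, whitens them, or stores the basis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_freq, true_rank = 64, 5

# Toy stand-in for clean detector waveforms (hypothetical data): every
# waveform is a random complex combination of a few hidden basis vectors.
hidden_basis = rng.normal(size=(true_rank, n_freq)) + 1j * rng.normal(size=(true_rank, n_freq))


def toy_waveforms(n: int) -> np.ndarray:
    coeffs = rng.normal(size=(n, true_rank)) + 1j * rng.normal(size=(n, true_rank))
    return coeffs @ hidden_basis  # shape (n, n_freq), one waveform per row


train = toy_waveforms(200)  # plays the role of num_training_samples
val = toy_waveforms(50)     # plays the role of num_validation_samples
size = 5                    # number of basis elements to keep

# SVD of the training matrix; the rows of Vh span the space of waveforms.
_, _, Vh = np.linalg.svd(train, full_matrices=False)
V = Vh.conj().T[:, :size]   # shape (n_freq, size)

# Project validation waveforms onto the basis and reconstruct them.
val_reconstructed = (val @ V) @ V.conj().T
rel_err = np.linalg.norm(val - val_reconstructed) / np.linalg.norm(val)
print(f"V shape: {V.shape}, relative validation error: {rel_err:.2e}")
```

Because the toy waveforms have exactly rank 5, a basis of size 5 reconstructs the validation set to numerical precision. With real waveforms the validation error decreases gradually as size grows, which is the kind of diagnostic one would expect to inspect in out_dir.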
- dingo.gw.training.train_builders.set_train_transforms(wfd, data_settings, asd_dataset_path, omit_transforms=None)
Set the transform attribute of a waveform dataset based on a settings dictionary. The transform takes waveform polarizations, samples random extrinsic parameters, projects to detectors, adds noise, and formats the data for input to the neural network. It also implements optional GNPE transformations.
Note that the WaveformDataset is modified in-place, so this function returns nothing.
- Parameters:
wfd (WaveformDataset)
data_settings (dict)
asd_dataset_path (str) – Path corresponding to the ASD dataset used to generate noise.
omit_transforms – List of sub-transforms to omit from the full composition.
dingo.gw.training.train_pipeline module
- dingo.gw.training.train_pipeline.copy_files_to_local(file_path: str, local_dir: str | None, leave_keys_on_disk: bool, is_condor: bool = False) → str
Copy files to local node if local_dir is provided to minimize network traffic during training.
- Parameters:
file_path (str) – Path to file that should be copied.
local_dir (Optional[str]) – Directory where file should be copied. If None, file will not be copied.
leave_keys_on_disk (bool) – Whether to leave keys on disk and load them during training. If dataset is not copied and leave_keys_on_disk is True, a warning will be raised.
is_condor (bool) – Whether this is a condor job.
- Returns:
local_file_path – Modified file path if file was copied to local node, else the original file path.
- Return type:
str
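The documented behaviour (copy when local_dir is given, otherwise return the original path and warn if the data is also left on disk) can be sketched with the standard library. `copy_to_local_sketch` is a hypothetical stand-in written for this illustration, not dingo's implementation, and it ignores the is_condor argument, whose effect is not described above.

```python
import shutil
import warnings
from pathlib import Path
from typing import Optional


def copy_to_local_sketch(file_path: str, local_dir: Optional[str], leave_keys_on_disk: bool) -> str:
    """Hypothetical sketch: return the path that training should read from."""
    if local_dir is None:
        # Nothing is copied, so the original path is returned unchanged.
        if leave_keys_on_disk:
            warnings.warn(
                "Dataset is not copied locally and is read lazily from disk; "
                "data loading may be slow."
            )
        return file_path

    # Copy the file into local_dir, keeping its file name.
    target = Path(local_dir) / Path(file_path).name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(file_path, target)
    return str(target)
```

A caller would then pass the returned path, rather than the original one, to whatever loads the dataset.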
- dingo.gw.training.train_pipeline.initialize_stage(pm: BasePosteriorModel, wfd: WaveformDataset, stage: dict, num_workers: int, resume: bool = False)
Initializes training based on PosteriorModel metadata and the current stage:
* Builds transforms (based on noise settings for the current stage);
* Builds DataLoaders;
* At the beginning of a stage (i.e., if not resuming mid-stage), initializes a new optimizer and scheduler;
* Freezes / unfreezes the SVD layer of the embedding network.
- Parameters:
pm (BasePosteriorModel)
wfd (WaveformDataset)
stage (dict) – Settings specific to current stage of training
num_workers (int)
resume (bool) – Whether training is resuming mid-stage. This controls whether the optimizer and scheduler should be re-initialized based on contents of stage dict.
- Return type:
(train_loader, test_loader)
- dingo.gw.training.train_pipeline.parse_args()
- dingo.gw.training.train_pipeline.prepare_training_new(train_settings: dict, train_dir: str, local_settings: dict) → Tuple[BasePosteriorModel, WaveformDataset]
Based on a settings dictionary, initialize a WaveformDataset and PosteriorModel.
For model type ‘nsf+embedding’ (the only acceptable type at this point) this also initializes the embedding network projection stage with SVD V matrices based on clean detector waveforms.
- Parameters:
train_settings (dict) – Settings which ultimately come from the train_settings.yaml file.
train_dir (str) – This is only used to save diagnostics from the SVD.
local_settings (dict) – Local settings (e.g., num_workers, device)
- Return type:
Tuple[BasePosteriorModel, WaveformDataset]
- dingo.gw.training.train_pipeline.prepare_training_resume(checkpoint_name: str, local_settings: dict, train_dir: str) → Tuple[BasePosteriorModel, WaveformDataset]
Loads a PosteriorModel from a checkpoint, as well as the corresponding WaveformDataset, in order to continue training. It initializes the saved optimizer and scheduler from the checkpoint.
- Parameters:
checkpoint_name (str) – File name containing the checkpoint (.pt format).
local_settings (dict) – Local settings (e.g., num_workers, device)
train_dir (str) – Path to training directory where the wandb info is saved.
- Return type:
Tuple[BasePosteriorModel, WaveformDataset]
- dingo.gw.training.train_pipeline.train_local()
- dingo.gw.training.train_pipeline.train_stages(pm: BasePosteriorModel, wfd: WaveformDataset, train_dir: str, local_settings: dict) → bool
Train the network, iterating through the sequence of stages. Stages can change certain settings such as the noise characteristics, optimizer, and scheduler settings.
- Parameters:
pm (BasePosteriorModel)
wfd (WaveformDataset)
train_dir (str) – Directory for saving checkpoints and train history.
local_settings (dict)
- Returns:
True if all stages are complete, False otherwise.
- Return type:
bool
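The control flow of iterating through stages, possibly stopping mid-stage and resuming later, can be sketched in plain Python. Everything here is an assumption made for illustration: `run_stages_sketch`, the "epochs" and "name" keys, and the `epoch_budget` argument are hypothetical and do not describe dingo's actual settings or checkpoint handling.

```python
from typing import Callable, List, Tuple


def run_stages_sketch(
    stages: List[dict],
    epochs_done: int,
    train_one_epoch: Callable[[dict], None],
    epoch_budget: int,
) -> Tuple[bool, int]:
    """Hypothetical sketch: return (all_stages_complete, epochs_done)."""
    stage_end = 0
    for stage in stages:
        stage_end += stage["epochs"]  # cumulative epoch at which this stage ends
        while epochs_done < stage_end:
            if epoch_budget == 0:
                # Out of budget mid-run: report that training is incomplete.
                return False, epochs_done
            train_one_epoch(stage)
            epochs_done += 1
            epoch_budget -= 1
    return True, epochs_done


stages = [{"name": "stage_0", "epochs": 3}, {"name": "stage_1", "epochs": 2}]
log: List[str] = []

# First run stops after 4 epochs, i.e. in the middle of stage_1.
done, n = run_stages_sketch(stages, 0, lambda s: log.append(s["name"]), epoch_budget=4)
# Second run resumes at epoch 4 and finishes the remaining epoch.
done2, n2 = run_stages_sketch(stages, n, lambda s: log.append(s["name"]), epoch_budget=4)
print(done, n, done2, n2)  # False 4 True 5
```

A Boolean return value of this kind lets a wrapper decide whether to schedule another run or stop.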
dingo.gw.training.train_pipeline_condor module
- dingo.gw.training.train_pipeline_condor.copy_logfiles(log_dir, epoch, name='info', suffixes=('.err', '.log', '.out'))
- dingo.gw.training.train_pipeline_condor.copyfile(src, dst)
- dingo.gw.training.train_pipeline_condor.create_submission_file(train_dir: str, condor_settings: dict, filename: str = 'submission_file.sub')
Creates submission file and writes it to filename.
- Parameters:
train_dir (str) – Path to training directory
condor_settings (dict) – Condor settings
filename (str) – Filename of submission file
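As a rough illustration of what writing a submission file involves, the sketch below turns a settings dictionary into an HTCondor submit description. It is not dingo's implementation: `write_submission_file_sketch` is hypothetical, the settings keys are assumed to be HTCondor command names, and placing the file and the info.* log files inside train_dir is an assumption.

```python
import os


def write_submission_file_sketch(
    train_dir: str, condor_settings: dict, filename: str = "submission_file.sub"
) -> str:
    """Hypothetical sketch: write a minimal HTCondor submit file, return its path."""
    path = os.path.join(train_dir, filename)  # assumption: file lives in train_dir

    # Assumption: each settings key is already a valid submit command name.
    lines = [f"{key} = {value}" for key, value in condor_settings.items()]
    lines += [
        f"log = {os.path.join(train_dir, 'info.log')}",
        f"output = {os.path.join(train_dir, 'info.out')}",
        f"error = {os.path.join(train_dir, 'info.err')}",
        "queue",  # submit one job with the commands above
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path
```

Called with, for example, `{"executable": "/usr/bin/python", "request_cpus": 4}`, this writes one `key = value` line per setting, followed by the log paths and a final `queue` statement.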
- dingo.gw.training.train_pipeline_condor.train_condor()
dingo.gw.training.utils module
- dingo.gw.training.utils.append_stage()