dingo.gw.dataset package

Submodules

dingo.gw.dataset.evaluate_multibanded_domain module

dingo.gw.dataset.evaluate_multibanded_domain.main() → None
dingo.gw.dataset.evaluate_multibanded_domain.parse_args()

dingo.gw.dataset.generate_dataset module

dingo.gw.dataset.generate_dataset.generate_dataset(settings: Dict, num_processes: int) → WaveformDataset

Generate a waveform dataset.

Parameters:
  • settings (dict) – Dictionary of settings to configure the dataset

  • num_processes (int)

Returns:

A WaveformDataset based on the settings.

dingo.gw.dataset.generate_dataset.generate_parameters_and_polarizations(waveform_generator: WaveformGenerator, prior: BBHPriorDict, num_samples: int, num_processes: int) → Tuple[DataFrame, Dict[str, ndarray]]

Generate a dataset of waveforms based on parameters drawn from the prior.

Parameters:
  • waveform_generator (WaveformGenerator)

  • prior (Prior)

  • num_samples (int)

  • num_processes (int)

Returns:

  • pandas DataFrame of parameters

  • dictionary of numpy arrays corresponding to waveform polarizations

dingo.gw.dataset.generate_dataset.main() → None
dingo.gw.dataset.generate_dataset.parse_args()
dingo.gw.dataset.generate_dataset.train_svd_basis(dataset: WaveformDataset, size: int, n_train: int)

Train (and optionally validate) an SVD basis.

Parameters:
  • dataset (WaveformDataset) – Contains waveforms to be used for building SVD.

  • size (int) – Number of elements to keep for the SVD basis.

  • n_train (int) – Number of training waveforms to use. Remaining are used for validation. Note that the actual number of training waveforms is n_train * len(polarizations), since there is one waveform used for each polarization.

Returns:

Because EOB waveforms can fail to generate, the numbers of waveforms actually used for training and validation are returned along with the basis.

Return type:

SVDBasis, n_train, n_test
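The train/validate split described above can be illustrated with a minimal NumPy sketch. It is not Dingo's implementation: the function name `train_svd_basis_sketch`, the plain-array input, and the relative-error validation metric are assumptions made for illustration. The real function takes a WaveformDataset and returns an SVDBasis object.

```python
import numpy as np


def train_svd_basis_sketch(waveforms, size, n_train):
    """Fit a truncated SVD basis on the first n_train rows, validate on the rest.

    waveforms: complex array of shape (n_waveforms, n_freq). Hypothetical input;
    the documented function takes a WaveformDataset instead.
    """
    train, test = waveforms[:n_train], waveforms[n_train:]
    # Rows of vh are orthonormal vectors spanning the row space of `train`.
    _, _, vh = np.linalg.svd(train, full_matrices=False)
    V = vh[:size].conj().T  # basis matrix, shape (n_freq, size)

    # Validation: compress, decompress, and measure the relative error.
    coefficients = test @ V
    reconstructed = coefficients @ V.conj().T
    errors = np.linalg.norm(test - reconstructed, axis=1) / np.linalg.norm(test, axis=1)
    return V, len(train), len(test), errors


# Synthetic rank-5 data standing in for waveforms (assumption, not real data).
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(5, 64)) + 1j * rng.normal(size=(5, 64))
coeffs = rng.normal(size=(200, 5)) + 1j * rng.normal(size=(200, 5))
waveforms = coeffs @ true_basis

V, n_train, n_test, errors = train_svd_basis_sketch(waveforms, size=5, n_train=150)
```

In this sketch each row is a single waveform. With several polarizations per parameter sample, the rows for all polarizations would be stacked, which is why the effective number of training waveforms is n_train * len(polarizations).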

dingo.gw.dataset.generate_dataset_dag module

dingo.gw.dataset.generate_dataset_dag.configure_runs(settings, num_jobs, temp_dir)

Prepare and save settings .yaml files for generating subsets of the dataset. Generally this will produce two .yaml files, one for generating the main dataset, one for the SVD training.

Parameters:
  • settings (dict) – Settings for full dataset configuration.

  • num_jobs (int) – Number of jobs over which to split the run.

  • temp_dir (str) – Name of (temporary) directory in which to place temporary output files.

dingo.gw.dataset.generate_dataset_dag.create_args_string(args_dict: Dict)

Generate argument string from dictionary of argument names and arguments.
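The exact output format is not specified here. As a rough sketch only, assuming a `--name value` convention (an assumption, not confirmed by this documentation), such a helper could look like the hypothetical `args_dict_to_string` below.

```python
def args_dict_to_string(args_dict):
    """Join a dict of argument names and values into one command-line string.

    Hypothetical helper; the `--name value` format is an assumption.
    """
    return " ".join(f"--{name} {value}" for name, value in args_dict.items())


example = args_dict_to_string({"num_processes": 4, "out_file": "data.hdf5"})
```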

dingo.gw.dataset.generate_dataset_dag.create_dag(args, settings)

Create a Condor DAG from command line arguments to carry out the five steps in the workflow.

dingo.gw.dataset.generate_dataset_dag.main()
dingo.gw.dataset.generate_dataset_dag.modulus_check(a: int, b: int, a_label: str, b_label: str)

Raise error if a % b != 0.
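A minimal sketch of such a check is shown below. The name `modulus_check_sketch`, the choice of `ValueError`, and the message wording are assumptions; the documentation only states that an error is raised when a is not divisible by b.

```python
def modulus_check_sketch(a, b, a_label, b_label):
    """Raise an error unless a is an integer multiple of b (illustrative only)."""
    if a % b != 0:
        # Exception type and message are assumptions for this sketch.
        raise ValueError(
            f"Expected {a_label} to be divisible by {b_label}, "
            f"but {a} % {b} = {a % b}."
        )


# Passes silently: 1000 samples split evenly over 10 jobs.
modulus_check_sketch(1000, 10, "num_samples", "num_jobs")
```

A check of this kind is useful when a dataset is split into equally sized chunks across jobs, since a non-zero remainder would leave some samples unassigned.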

dingo.gw.dataset.generate_dataset_dag.parse_args()

dingo.gw.dataset.utils module

dingo.gw.dataset.utils.build_svd_cli()

Command-line function to build an SVD based on an uncompressed dataset file.

dingo.gw.dataset.utils.merge_datasets(dataset_list: List[WaveformDataset]) → WaveformDataset

Merge a collection of datasets into one.

Parameters:

dataset_list (list[WaveformDataset]) – A list of WaveformDatasets. Each item should be a dictionary containing parameters and polarizations.

Returns:

WaveformDataset containing the merged data.
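Conceptually, merging amounts to concatenating the parameters and each polarization array along the sample axis. The sketch below shows this on plain dictionaries of NumPy arrays. It is a simplification: the helper name `merge_dataset_dicts` is hypothetical, and parameters are represented as a dictionary of arrays rather than a pandas DataFrame.

```python
import numpy as np


def merge_dataset_dicts(dataset_list):
    """Concatenate parameters and polarizations of several dataset chunks.

    Each item is a dict with 'parameters' and 'polarizations' entries, both
    mapping names to arrays whose first axis indexes samples (assumption).
    """
    first = dataset_list[0]
    parameters = {
        name: np.concatenate([d["parameters"][name] for d in dataset_list])
        for name in first["parameters"]
    }
    polarizations = {
        name: np.concatenate([d["polarizations"][name] for d in dataset_list], axis=0)
        for name in first["polarizations"]
    }
    return {"parameters": parameters, "polarizations": polarizations}


# Two chunks, as might be produced by parallel generation jobs.
chunk_a = {
    "parameters": {"mass_1": np.array([30.0, 35.0])},
    "polarizations": {"h_plus": np.zeros((2, 4)), "h_cross": np.ones((2, 4))},
}
chunk_b = {
    "parameters": {"mass_1": np.array([40.0])},
    "polarizations": {"h_plus": np.zeros((1, 4)), "h_cross": np.ones((1, 4))},
}
merged = merge_dataset_dicts([chunk_a, chunk_b])
```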

dingo.gw.dataset.utils.merge_datasets_cli()

Command-line function to combine a collection of datasets into one. Used for parallelized waveform generation.

dingo.gw.dataset.waveform_dataset module

class dingo.gw.dataset.waveform_dataset.WaveformDataset(file_name: str | None = None, dictionary: dict | None = None, transform=None, precision: str | None = None, domain_update: dict | None = None, svd_size_update: int | None = None, leave_waveforms_on_disk: bool | None = False)

Bases: DingoDataset, Dataset

This class stores a dataset of waveforms (polarizations) and corresponding parameters.

It can load the dataset either from an HDF5 file or from a suitable dictionary.

It is possible either to load the entire dataset into memory or to load the waveforms during training (leave_waveforms_on_disk=True) to reduce the memory footprint. At the moment, only the waveforms can be loaded on demand; the parameters are always loaded up front, since the standardization dict for all parameters in the dataset has to be computed at the beginning of training.

The waveform data is consumed through a __getitem__() or __getitems__() call which optionally loads the polarizations and applies a chain of transformations, which are classes that implement a __call__() method.

To construct a dataset, provide either file_name or a dictionary containing data and settings entries, or neither.

Parameters:
  • file_name (str) – HDF5 file containing a dataset

  • dictionary (dict) – Contains settings and data entries. The dictionary keys should be ‘settings’, ‘parameters’, and ‘polarizations’.

  • transform (Transform) – Transform to be applied to dataset samples when accessed through __getitem__

  • precision (str ('single', 'double')) – If provided, changes precision of loaded dataset.

  • domain_update (dict) – If provided, update domain from existing domain using new settings.

  • svd_size_update (int) – If provided, reduces the SVD size when decompressing (for speed).

  • leave_waveforms_on_disk (bool) – If True, the values for the waveforms are not loaded into RAM when initializing the waveform dataset. Instead, they are loaded lazily in __getitem__().
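The access pattern described above (parameters held in memory, waveforms fetched lazily in `__getitem__`, then passed through a chain of callable transforms) can be sketched with a toy class. Everything below is hypothetical and independent of Dingo: `LazyWaveformStore`, `Compose`, and `ScaleWaveform` are illustrative names, and the `load_polarizations` callable stands in for a read from an HDF5 file.

```python
import numpy as np


class Compose:
    """Apply a list of callables in order (a transform chain)."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, sample):
        for transform in self.transforms:
            sample = transform(sample)
        return sample


class ScaleWaveform:
    """Example transform: multiply every polarization by a constant factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, sample):
        scaled = {k: v * self.factor for k, v in sample["waveform"].items()}
        return {**sample, "waveform": scaled}


class LazyWaveformStore:
    """Toy dataset: parameters in memory, waveforms loaded on access."""

    def __init__(self, parameters, load_polarizations, transform=None):
        self.parameters = parameters      # kept in RAM
        self._load = load_polarizations   # idx -> dict of arrays (stand-in for disk I/O)
        self.transform = transform

    def __len__(self):
        return len(self.parameters)

    def __getitem__(self, idx):
        sample = {"parameters": self.parameters[idx], "waveform": self._load(idx)}
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


# In-memory arrays standing in for waveforms stored on disk.
on_disk = {"h_plus": np.arange(6.0).reshape(2, 3), "h_cross": np.ones((2, 3))}
params = [{"mass_1": 30.0}, {"mass_1": 40.0}]
store = LazyWaveformStore(
    params,
    load_polarizations=lambda i: {k: v[i] for k, v in on_disk.items()},
    transform=Compose([ScaleWaveform(2.0)]),
)
sample = store[1]
```

Keeping the parameters in memory while deferring the waveform read mirrors the constraint noted above: parameter statistics are needed before training starts, whereas waveform arrays dominate the memory footprint.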

property complex_type
dataset_type = 'waveform_dataset'
initialize_decompression(svd_size_update: int | None = None)

Sets up decompression transforms. These are applied to the raw dataset before self.transform. E.g., SVD decompression.

Parameters:

svd_size_update (int) – If provided, reduces the SVD size when decompressing (for speed).

load_supplemental(domain_update: dict | None = None, svd_size_update: int | None = None)

Method called immediately after loading a dataset.

Creates (and possibly updates) domain, updates dtypes, and initializes any decompression transform. Also zeros data below f_min, and truncates above f_max.

Parameters:
  • domain_update (dict) – If provided, update domain from existing domain using new settings.

  • svd_size_update (int) – If provided, reduces the SVD size when decompressing (for speed).

parameter_mean_std()
property real_type
update_domain(domain_update: dict | None = None)

Update the domain based on new configuration.

The waveform dataset provides waveform polarizations in a particular domain. In the frequency domain, this is the range [0, domain._f_max], with the data set to 0 below domain._f_min. In practice, one may want to train a network with slightly different domain settings, which corresponds to truncating the likelihood integral.

This method provides functionality for that. It truncates and/or zeroes the dataset to the range specified by the domain, by calling domain.update_data.

Parameters:

domain_update (dict) – Settings dictionary. Must contain a subset of the keys contained in domain_dict.
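The effect of such an update on the data can be sketched with NumPy for a uniformly sampled frequency grid. This is an illustration under stated assumptions, not the behavior of domain.update_data itself: the helper `truncate_and_zero` is hypothetical, the grid is assumed to start at 0 with spacing delta_f, and f_min and f_max are assumed to be multiples of delta_f.

```python
import numpy as np


def truncate_and_zero(data, delta_f, f_min, f_max):
    """Truncate data above f_max and zero it below f_min.

    data: array with frequency on the last axis, sampled at 0, delta_f, 2*delta_f, ...
    """
    n_keep = int(round(f_max / delta_f)) + 1  # bins from 0 up to and including f_max
    if n_keep > data.shape[-1]:
        raise ValueError("New f_max exceeds the range of the existing data.")
    out = data[..., :n_keep].copy()
    n_zero = int(round(f_min / delta_f))      # bins strictly below f_min
    out[..., :n_zero] = 0.0
    return out


# Data on [0, 1024] Hz with delta_f = 0.5 Hz, narrowed to [20, 512] Hz.
delta_f = 0.5
data = np.ones((2, 2049), dtype=complex)
updated = truncate_and_zero(data, delta_f, f_min=20.0, f_max=512.0)
```

A narrower range can be obtained from an existing dataset this way, but a wider one cannot, since data outside the original range was never generated.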

Module contents