dingo.core.posterior_models package

Submodules

dingo.core.posterior_models.base_model module

This module contains the abstract base class for representing posterior models, as well as functions for training and testing across an epoch.

class dingo.core.posterior_models.base_model.BasePosteriorModel(model_filename: str | None = None, metadata: dict | None = None, initial_weights: dict | None = None, device: str = 'cuda', load_training_info: bool = True)

Bases: ABC

Abstract base class for PosteriorModels. This is intended to construct and hold a neural network for estimating the posterior density, as well as saving / loading, and training.

Subclasses must implement methods for constructing the specific network, sampling, density evaluation, and computing the loss during training.
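The contract above can be illustrated with Python's abc module. This is a minimal sketch, not dingo's implementation: ToyPosteriorBase and ToyGaussianPosterior are hypothetical names, only two of the abstract methods are shown, and nested lists stand in for tensors so that the example runs without torch.

```python
import math
import random
from abc import ABC, abstractmethod


class ToyPosteriorBase(ABC):
    """Hypothetical stand-in for an abstract posterior model."""

    @abstractmethod
    def log_prob(self, theta, context):
        """Return one log density per batch element, shape (B,)."""

    @abstractmethod
    def sample(self, context, num_samples=1):
        """Return samples with shape (B, num_samples, dim(theta))."""


class ToyGaussianPosterior(ToyPosteriorBase):
    """Toy 1-D posterior N(context, 1), assumed purely for illustration."""

    def log_prob(self, theta, context):
        return [
            -0.5 * (t - c) ** 2 - 0.5 * math.log(2 * math.pi)
            for t, c in zip(theta, context)
        ]

    def sample(self, context, num_samples=1):
        # Outer list: batch; middle list: samples; inner list: parameters.
        return [
            [[random.gauss(c, 1.0)] for _ in range(num_samples)]
            for c in context
        ]


model = ToyGaussianPosterior()
samples = model.sample([0.0, 3.0], num_samples=4)  # B = 2, dim(theta) = 1
```

Instantiating ToyPosteriorBase directly raises a TypeError, which is how an ABC enforces that every abstract method is implemented by the subclass.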

Initialize a model for the posterior distribution.

Parameters:
  • model_filename (str) – If given, loads data from the given file.

  • metadata (dict) – If given, initializes the model from these settings

  • initial_weights (dict) – Initial weights for the model

  • device (str)

  • load_training_info (bool)

abstract initialize_network()

Initialize the network backbone for the posterior model.

initialize_optimizer_and_scheduler()

Initializes the optimizer and scheduler with self.optimizer_kwargs and self.scheduler_kwargs, respectively.

load_model(model_filename: str, load_training_info: bool = True, device: str = 'cuda')

Load a posterior model from the disk.

Parameters:
  • model_filename (str) – path to saved model

  • load_training_info (bool) – specifies whether information required to proceed with training is loaded, e.g. optimizer state dict

  • device (str)

abstract log_prob(theta: Tensor, *context: Tensor)

Evaluate the log posterior density,

log p(theta | context)

Parameters:
  • theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).

  • context (torch.Tensor) – Context information (typically observed data). Must have context.shape[0] = B.

Returns:

log_prob – Shape (B,)

Return type:

torch.Tensor

abstract loss(theta: Tensor, *context: Tensor)

Compute the loss for a batch of data.

Parameters:
  • theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).

  • context (torch.Tensor) – Context information (typically observed data). Must have the same leading (batch) dimension as theta.

Returns:

loss – Mean loss across the batch (a scalar).

Return type:

torch.Tensor

network_to_device(device)

Put model to device, and set self.device accordingly.

abstract sample(*context: Tensor, num_samples: int = 1)

Sample parameters theta from the posterior model,

theta ~ p(theta | context)

Parameters:
  • context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).

  • num_samples (int = 1) – Number of samples to generate.

Returns:

samples – Shape (B, num_samples, dim(theta))

Return type:

torch.Tensor

abstract sample_and_log_prob(*context: Tensor, num_samples: int = 1)

Sample parameters theta from the posterior model,

theta ~ p(theta | context)

and also return the log_prob. For models such as normalizing flows, it is more economical to calculate the log_prob at the same time as sampling, rather than as a separate step.

Parameters:
  • context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).

  • num_samples (int = 1) – Number of samples to generate.

Returns:

samples, log_prob – Shapes (B, num_samples, dim(theta)), (B, num_samples)

Return type:

torch.Tensor, torch.Tensor

save_model(model_filename: str, save_training_info: bool = True)

Save the posterior model to the disk.

Parameters:
  • model_filename (str) – filename for saving the model

  • save_training_info (bool) – specifies whether information required to proceed with training is saved, e.g. optimizer state dict

train(train_loader: DataLoader, test_loader: DataLoader, train_dir: str, runtime_limits: object | None = None, checkpoint_epochs: int | None = None, use_wandb=False, test_only=False, early_stopping: EarlyStopping | None = None)
Parameters:
  • train_loader

  • test_loader

  • train_dir

  • runtime_limits

  • checkpoint_epochs

  • use_wandb

  • test_only (bool = False) – if True, training is skipped

  • early_stopping (EarlyStopping) – Optional EarlyStopping instance.

dingo.core.posterior_models.base_model.test_epoch(pm, dataloader)
dingo.core.posterior_models.base_model.train_epoch(pm, dataloader)

dingo.core.posterior_models.build_model module

dingo.core.posterior_models.build_model.autocomplete_model_kwargs(model_kwargs: dict, data_sample: list)

Autocomplete the model kwargs from train_settings and data_sample from the dataloader:

  • set input dimension of embedding net to shape of data_sample[1]

  • set dimension of parameter space to len(data_sample[0])

  • set added_context flag of embedding net if required for gnpe proxies

  • set context dim of posterior model to output dim of embedding net + gnpe proxy dim

Parameters:
  • model_kwargs (dict) – Model settings, which are modified in-place.

  • data_sample (list) – Sample from dataloader (e.g., wfd[0]) used for autocompletion. Should be of format [parameters, GW data, gnpe_proxies], where the last element is only present if GNPE proxies are required.
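The four steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: every dictionary key (embedding_output_dim, embedding_input_shape, theta_dim, added_context, context_dim) and both function names are hypothetical, and nested lists stand in for tensors.

```python
def shape_of(nested):
    """Shape of a rectangular nested list (stand-in for tensor.shape)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return shape


def autocomplete_sketch(model_kwargs, data_sample):
    """Fill in dimensions in place; all key names here are hypothetical."""
    parameters, data = data_sample[0], data_sample[1]
    # The third element is only present when GNPE proxies are used.
    proxy_dim = len(data_sample[2]) if len(data_sample) == 3 else 0

    model_kwargs["embedding_input_shape"] = shape_of(data)
    model_kwargs["theta_dim"] = len(parameters)
    model_kwargs["added_context"] = proxy_dim > 0
    model_kwargs["context_dim"] = model_kwargs["embedding_output_dim"] + proxy_dim


parameters = [0.0] * 15
gw_data = [[[0.0] * 8 for _ in range(3)] for _ in range(2)]
gnpe_proxies = [0.0, 0.0]
model_kwargs = {"embedding_output_dim": 128}
autocomplete_sketch(model_kwargs, [parameters, gw_data, gnpe_proxies])
```

Because the dictionary is modified in place, the caller sees the completed settings without needing a return value.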

dingo.core.posterior_models.build_model.build_model_from_kwargs(filename: str | None = None, settings: dict | None = None, **kwargs) BasePosteriorModel

Returns a PosteriorModel based on a saved network or settings dict.

The function is careful to choose the appropriate PosteriorModel class (e.g., for a normalizing flow, flow matching, or score matching).

Parameters:
  • filename (str) – Path to a saved network (.pt).

  • settings (dict) – Settings dictionary.

  • kwargs – Arguments forwarded to the model constructor.

Return type:

BasePosteriorModel
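Choosing the model class from the settings can be pictured as a lookup in a registry. The sketch below is hypothetical throughout: the "type" key, the registry contents, and the toy classes are assumptions made for illustration, and the real function may select the class differently.

```python
class ToyNormalizingFlow:
    """Hypothetical model class that simply records its constructor kwargs."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class ToyFlowMatching(ToyNormalizingFlow):
    """Hypothetical second model class."""


# Assumed mapping from a type string to a model class.
REGISTRY = {
    "normalizing_flow": ToyNormalizingFlow,
    "flow_matching": ToyFlowMatching,
}


def build_from_settings(settings, **kwargs):
    """Pick a class from the (hypothetical) 'type' entry and construct it."""
    model_type = settings.get("type", "normalizing_flow")
    try:
        model_class = REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"No posterior model of type {model_type!r}") from None
    # Remaining keyword arguments are forwarded to the constructor.
    return model_class(metadata=settings, **kwargs)


model = build_from_settings({"type": "flow_matching"}, device="cpu")
```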

dingo.core.posterior_models.cflow_base module

class dingo.core.posterior_models.cflow_base.ContinuousFlowPosteriorModel(**kwargs)

Bases: BasePosteriorModel

Class for posterior models based on continuous normalizing flows (CNFs).

CNFs are parameterized by a vector field v(theta_t, t) that transports a simple base distribution (typically a Gaussian N(0,1) with the same dimension as theta) at time t=0 to the target distribution at time t=1. This vector field defines the flow via the ODE

d/dt f(theta, t) = v(f(theta, t), t).

The vector field v is parameterized with a neural network. It is impractical to train this neural network (and thereby the CNF) directly with log-likelihood maximization, as solving the full ODE for each training iteration requires thousands of vector field evaluations.

Several alternative methods have been developed to make training CNFs more efficient. These directly regress on the vector field v (or a scaled version of v, such as the score). It has been shown that this can be done on a per-sample basis by adding noise to the parameters at various scales t. Specifically, a parameter sample theta is transformed as follows.

    t       ~ U[0, 1-eps)                          (noise level)
    theta_0 ~ N(0, 1)                              (sampled noise)
    theta_1 = theta                                (pure sample)
    theta_t = c1(t) * theta_1 + c0(t) * theta_0    (noisy sample)

Within that framework, one can employ different methods to learn the vector field v, such as flow matching or score matching. These have slightly different coefficients c0(t), c1(t) and training objectives.

This class is intended to construct and hold a neural network for estimating the posterior density, as well as saving / loading, and training. It also has functionality for sampling and density evaluation.
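To make the transport picture concrete, the sketch below integrates the ODE d/dt f(theta, t) = v(f(theta, t), t) with the Euler method for a hand-written 1-D vector field. The field, the target N(MU, S^2), and all function names are assumptions chosen so that the result can be checked analytically; in the class itself v is a neural network.

```python
import random

MU, S = 2.0, 0.5  # target N(MU, S^2); values chosen for illustration


def vector_field(theta_t, t):
    """Velocity of the affine path theta_t = (1 + (S - 1) t) theta_0 + MU t."""
    theta_0 = (theta_t - MU * t) / (1.0 + (S - 1.0) * t)
    return (S - 1.0) * theta_0 + MU


def integrate_forward(theta_0, num_steps=100):
    """Euler-integrate d/dt theta = v(theta, t) from t = 0 to t = 1."""
    dt = 1.0 / num_steps
    theta = theta_0
    for k in range(num_steps):
        theta += vector_field(theta, k * dt) * dt
    return theta


random.seed(0)
base_samples = [random.gauss(0.0, 1.0) for _ in range(5)]
target_samples = [integrate_forward(x) for x in base_samples]
```

For this particular field each trajectory is a straight line, so every base sample x ends at MU + S * x and the N(0, 1) samples become N(MU, S^2) samples.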

Initialize a model for the posterior distribution.

Parameters:
  • model_filename (str) – If given, loads data from the given file.

  • metadata (dict) – If given, initializes the model from these settings

  • initial_weights (dict) – Initial weights for the model

  • device (str)

  • load_training_info (bool)

abstract evaluate_vector_field(t, theta_t, *context_data)

Evaluate the vector field v(t, theta_t, context_data) that generates the flow via the ODE

d/dt f(theta_t, t, context) = v(f(theta_t, t, context), t, context).

Parameters:
  • t (float) – time (noise level)

  • theta_t (torch.Tensor) – noisy parameters, perturbed with noise level t

  • *context_data (list[torch.Tensor]) – list with context data (GW data)

initialize_network()

Initialize the network backbone for the posterior model.

property integration_range

Integration range for ODE. We integrate in the range [0, 1-self.eps]. For score matching, self.eps > 0 is required for stability. For flow matching we can have self.eps = 0.

log_prob(theta: Tensor, *context: Tensor, hutchinson=False)

Evaluate the log posterior density,

log p(theta | context)

For this we solve an ODE backwards in time until we reach the initial pure noise distribution.

There are two contributions, the log_prob of theta_0 (which is uniquely determined by theta) under the base distribution, and the integrated divergence of the vector field.

Parameters:
  • theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).

  • context (torch.Tensor) – Context information (typically observed data). Must have context.shape[0] = B.

  • hutchinson

Returns:

log_prob – Shape (B,)

Return type:

torch.Tensor
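The two contributions to the log density can be seen in a 1-D toy problem where everything is known in closed form. The sketch below is an assumption-laden illustration, not the class's solver: the vector field, its divergence, the target N(MU, S^2), and the fixed-step Euler scheme are all chosen for the example.

```python
import math

MU, S = 2.0, 0.5  # target N(MU, S^2); values chosen for illustration


def vector_field(theta_t, t):
    """Velocity of the affine path theta_t = (1 + (S - 1) t) theta_0 + MU t."""
    theta_0 = (theta_t - MU * t) / (1.0 + (S - 1.0) * t)
    return (S - 1.0) * theta_0 + MU


def divergence(theta_t, t):
    """d v / d theta_t for the 1-D field above (independent of theta_t)."""
    return (S - 1.0) / (1.0 + (S - 1.0) * t)


def log_prob(theta_1, num_steps=2000):
    """Integrate backwards from t = 1 to t = 0, accumulating the divergence."""
    dt = 1.0 / num_steps
    theta, div_integral = theta_1, 0.0
    for k in range(num_steps, 0, -1):
        t = k * dt
        div_integral += divergence(theta, t) * dt
        theta -= vector_field(theta, t) * dt
    # theta is now theta_0; evaluate it under the N(0, 1) base distribution.
    log_base = -0.5 * theta**2 - 0.5 * math.log(2.0 * math.pi)
    return log_base - div_integral


def exact_log_prob(theta_1):
    """Closed-form log density of N(MU, S^2), used only as a reference."""
    z = (theta_1 - MU) / S
    return -0.5 * z * z - math.log(S) - 0.5 * math.log(2.0 * math.pi)


estimate = log_prob(3.0)
```

Here the integrated divergence equals log(S), so the result agrees with the Gaussian reference up to the discretization error of the Euler steps.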

rhs_of_joint_ode(t, theta_and_div_t, *context_data, hutchinson=False)

Returns the right hand side of the neural ODE that is used to evaluate the log_prob of theta samples. This is a joint ODE over the vector field and the divergence. By integrating this ODE, one can simultaneously trace the parameter sample theta_t and integrate the divergence contribution to the log_prob, see e.g., https://arxiv.org/abs/1806.07366 or Appendix C in https://arxiv.org/abs/2210.02747.

Parameters:
  • t (float) – time (noise level)

  • theta_and_div_t (torch.Tensor) – concatenated tensor of (theta_t, div). theta_t: noisy parameters, perturbed with noise level t

  • *context_data (list[torch.Tensor]) – list with context data (GW data)

Returns:

vector field that generates the flow and its divergence (required for likelihood evaluation).

Return type:

torch.Tensor
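Assuming the hutchinson flag refers to Hutchinson's stochastic trace estimator, the idea can be sketched on a linear toy field v(x) = A x, whose divergence is the trace of A. The matrix and helper names below are hypothetical, and the example enumerates all sign vectors rather than drawing random probes so that the outcome is deterministic.

```python
import itertools

# Jacobian of the toy vector field v(x) = A x.
A = [
    [1.0, 2.0, 0.0],
    [0.5, -3.0, 1.0],
    [4.0, 0.0, 2.5],
]


def quadratic_form(eps, matrix):
    """Return eps^T (matrix eps) for a probe vector eps."""
    matvec = [sum(row[j] * eps[j] for j in range(len(eps))) for row in matrix]
    return sum(e * m for e, m in zip(eps, matvec))


# Exact divergence: the trace of the Jacobian.
exact_divergence = sum(A[i][i] for i in range(3))

# Average of eps^T A eps over every Rademacher (+1/-1) probe vector.
probes = list(itertools.product([-1.0, 1.0], repeat=3))
hutchinson_mean = sum(quadratic_form(e, A) for e in probes) / len(probes)

# A single probe is generally not equal to the trace.
single_probe = quadratic_form((1.0, 1.0, 1.0), A)
```

In practice only one or a few random probes are drawn per evaluation, which gives an unbiased but noisy estimate at a fraction of the cost of the exact divergence in high dimensions.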

sample(*context: Tensor, num_samples: int | None = None)

Sample parameters theta from the posterior model,

theta ~ p(theta | context)

by solving an ODE forward in time.

Parameters:
  • context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).

  • num_samples (int = 1) – Number of samples to generate.

Returns:

samples – Shape (B, num_samples, dim(theta))

Return type:

torch.Tensor

sample_and_log_prob(*context: Tensor, num_samples: int | None = None)

Sample parameters theta from the posterior model,

theta ~ p(theta | context)

and also return the log_prob. This is more efficient than calling sample and log_prob separately.

If d/dt [phi(t), f(t)] is given by the right-hand side of the joint ODE, with initial conditions [theta_0, log p_0(theta_0)], where theta_0 ~ p_0(theta_0), then [phi(1), f(1)] = [theta_1, log p_0(theta_0) + log p_1(theta_1) - log p_0(theta_0)] = [theta_1, log p_1(theta_1)].

Parameters:
  • context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).

  • num_samples (int = 1) – Number of samples to generate.

Returns:

samples, log_prob – Shapes (B, num_samples, dim(theta)), (B, num_samples)

Return type:

torch.Tensor, torch.Tensor

sample_t(batch_size)
sample_theta_0(batch_size)

Sample theta_0 from the Gaussian prior.

dingo.core.posterior_models.cflow_base.compute_divergence(y, x)
dingo.core.posterior_models.cflow_base.compute_hutchinson_divergence(y, x)
dingo.core.posterior_models.cflow_base.compute_log_prior(theta_0)
dingo.core.posterior_models.cflow_base.norm_without_divergence_component(y)

dingo.core.posterior_models.flow_matching module

class dingo.core.posterior_models.flow_matching.FlowMatchingPosteriorModel(**kwargs)

Bases: ContinuousFlowPosteriorModel

Class for posterior models based on continuous normalizing flows (CNFs).

CNFs are parameterized by a vector field v(theta_t, t) that transports a simple base distribution (typically a Gaussian N(0,1) with the same dimension as theta) at time t=0 to the target distribution at time t=1. This vector field defines the flow via the ODE

d/dt f(theta, t) = v(f(theta, t), t).

The vector field v is parameterized with a neural network. It is impractical to train this neural network (and thereby the CNF) directly with log-likelihood maximization, as solving the full ODE for each training iteration requires thousands of vector field evaluations.

Several alternative methods have been developed to make training CNFs more efficient. These directly regress on the vector field v (or a scaled version of v, such as the score). It has been shown that this can be done on a per-sample basis by adding noise to the parameters at various scales t. Specifically, a parameter sample theta is transformed as follows.

    t       ~ U[0, 1-eps)                          (noise level)
    theta_0 ~ N(0, 1)                              (sampled noise)
    theta_1 = theta                                (pure sample)
    theta_t = c1(t) * theta_1 + c0(t) * theta_0    (noisy sample)

Within that framework, one can employ different methods to learn the vector field v, such as flow matching or score matching. These have slightly different coefficients c0(t), c1(t) and training objectives.

This class is intended to construct and hold a neural network for estimating the posterior density, as well as saving / loading, and training. It also has functionality for sampling and density evaluation.

For flow matching, the vector field represents the velocity vector field for a particle trajectory. Training proceeds as follows:

    t       ~ U[0, 1-eps)                          (noise level)
    theta_0 ~ N(0, 1)                              (sampled noise)
    theta_1 = theta                                (pure sample)
    theta_t = c1(t) * theta_1 + c0(t) * theta_0    (noisy sample)

    eps = 0
    c0  = 1 - (1 - sigma_min) * t
    c1  = t

    v_target = theta_1 - (1 - sigma_min) * theta_0
    loss     = || v_target - network(theta_t, t) ||
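The regression target follows from the coefficients: differentiating theta_t = c1(t) * theta_1 + c0(t) * theta_0 with respect to t gives theta_1 - (1 - sigma_min) * theta_0. The scalar sketch below checks this with a finite difference. The function names and numerical values are illustrative assumptions, not the module's API.

```python
def noisy_sample(theta_0, theta_1, t, sigma_min):
    """theta_t = c1(t) * theta_1 + c0(t) * theta_0 with the coefficients above."""
    c0 = 1.0 - (1.0 - sigma_min) * t
    c1 = t
    return c1 * theta_1 + c0 * theta_0


def target_velocity(theta_0, theta_1, sigma_min):
    """Time derivative of theta_t, which is the regression target v_target."""
    return theta_1 - (1.0 - sigma_min) * theta_0


theta_0, theta_1, sigma_min = -0.7, 1.3, 1e-4  # illustrative values
t, h = 0.4, 1e-6

# Central finite difference of theta_t with respect to t.
finite_difference = (
    noisy_sample(theta_0, theta_1, t + h, sigma_min)
    - noisy_sample(theta_0, theta_1, t - h, sigma_min)
) / (2.0 * h)

v_target = target_velocity(theta_0, theta_1, sigma_min)
```

At t = 0 the noisy sample is pure noise (theta_0), and at t = 1 it is theta_1 + sigma_min * theta_0, i.e. the data sample with a small residual noise of scale sigma_min.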

Initialize a model for the posterior distribution.

Parameters:
  • model_filename (str) – If given, loads data from the given file.

  • metadata (dict) – If given, initializes the model from these settings

  • initial_weights (dict) – Initial weights for the model

  • device (str)

  • load_training_info (bool)

evaluate_vector_field(t, theta_t, *context_data)

Evaluate the vector field v(t, theta_t, context_data) that generates the flow via the ODE

d/dt f(theta_t, t, context) = v(f(theta_t, t, context), t, context).

For flow matching, the vector field is regressed directly during training.

Parameters:
  • t (float) – time (noise level)

  • theta_t (torch.Tensor) – noisy parameters, perturbed with noise level t

  • *context_data (list[torch.Tensor]) – list with context data (GW data)

loss(theta, *context)

Calculates loss as the mean squared error between the predicted vector field and the vector field for transporting the parameter data to samples from the prior.

Parameters:
  • theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).

  • context (torch.Tensor) – Context information (typically observed data). Must have the same leading (batch) dimension as theta.

Returns:

loss – Mean loss across the batch (a scalar).

Return type:

torch.Tensor

dingo.core.posterior_models.flow_matching.ot_conditional_flow(x_0, x_1, t, sigma_min)

dingo.core.posterior_models.normalizing_flow module

class dingo.core.posterior_models.normalizing_flow.NormalizingFlowPosteriorModel(**kwargs)

Bases: BasePosteriorModel

Posterior model based on a (discrete) normalizing flow.

A normalizing flow describes a distribution as a sequence of discrete transformations on a parameter space, ultimately taking samples from the base space (multivariate standard normal) to the desired distribution. The discrete transforms are parametrized functions (e.g., splines), which are designed to be invertible with a simple Jacobian determinant. The probability density is given by the change of variables rule,

q(theta | d) = pi(f_d^{-1}(theta)) | det J_{f_d^{-1}} |

where

    pi  = N(0,1)^D is the base space distribution
    f_d is the normalizing flow on the D-dimensional space

The flow f_d is allowed to depend on context information d, which would be observational data in the case of posterior estimation. By construction, the flow has fast sampling and density evaluation, requiring just forward passes of the network.
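The change of variables rule can be checked by hand for the simplest possible flow, a 1-D affine map f_d(z) = shift + scale * z. This is a sketch under that assumption only: in a real flow the transform is a composition of learned, context-dependent layers, whereas here shift and scale are fixed numbers and the function names are hypothetical.

```python
import math


def standard_normal_log_pdf(z):
    """log pi(z) for the 1-D base distribution N(0, 1)."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def affine_flow_log_prob(theta, shift, scale):
    """log q(theta | d) for the flow f_d(z) = shift + scale * z."""
    z = (theta - shift) / scale            # f_d^{-1}(theta)
    log_abs_det = -math.log(abs(scale))    # log |d f_d^{-1} / d theta|
    return standard_normal_log_pdf(z) + log_abs_det


def affine_flow_sample(z, shift, scale):
    """Push a base sample z through the flow."""
    return shift + scale * z


shift, scale = 1.5, 0.3  # illustrative values standing in for network outputs
theta = affine_flow_sample(0.2, shift, scale)
log_q = affine_flow_log_prob(theta, shift, scale)
```

The result coincides with the log density of N(shift, scale^2), which is what pushing N(0, 1) through an affine map should produce.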

This class uses normalizing flows from the dingo.core.nn.nsf module (which in turn uses glasflow, which is based on nflows). It is intended to construct and hold a neural network for estimating the posterior density, as well as saving / loading, and training. It also calls the sampling and density evaluation routines from the flows.

Initialize a model for the posterior distribution.

Parameters:
  • model_filename (str) – If given, loads data from the given file.

  • metadata (dict) – If given, initializes the model from these settings

  • initial_weights (dict) – Initial weights for the model

  • device (str)

  • load_training_info (bool)

initialize_network()

Initialize the network backbone for the posterior model.

log_prob(theta, *context)

Evaluate the log posterior density,

log p(theta | context)

Parameters:
  • theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).

  • context (torch.Tensor) – Context information (typically observed data). Must have context.shape[0] = B.

Returns:

log_prob – Shape (B,)

Return type:

torch.Tensor

loss(theta, *context)

Compute the loss for a batch of data.

Parameters:
  • theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).

  • context (torch.Tensor) – Context information (typically observed data). Must have the same leading (batch) dimension as theta.

Returns:

loss – Mean loss across the batch (a scalar).

Return type:

torch.Tensor

sample(*context, num_samples: int = 1)

Sample parameters theta from the posterior model,

theta ~ p(theta | context)

Parameters:
  • context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).

  • num_samples (int = 1) – Number of samples to generate.

Returns:

samples – Shape (B, num_samples, dim(theta))

Return type:

torch.Tensor

sample_and_log_prob(*context, num_samples: int = 1)

Sample parameters theta from the posterior model,

theta ~ p(theta | context)

and also return the log_prob. For models such as normalizing flows, it is more economical to calculate the log_prob at the same time as sampling, rather than as a separate step.

Parameters:
  • context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).

  • num_samples (int = 1) – Number of samples to generate.

Returns:

samples, log_prob – Shapes (B, num_samples, dim(theta)), (B, num_samples)

Return type:

torch.Tensor, torch.Tensor

dingo.core.posterior_models.score_matching module

class dingo.core.posterior_models.score_matching.ScoreDiffusionPosteriorModel(**kwargs)

Bases: ContinuousFlowPosteriorModel

Class for posterior models based on continuous normalizing flows (CNFs).

CNFs are parameterized by a vector field v(theta_t, t) that transports a simple base distribution (typically a Gaussian N(0,1) with the same dimension as theta) at time t=0 to the target distribution at time t=1. This vector field defines the flow via the ODE

d/dt f(theta, t) = v(f(theta, t), t).

The vector field v is parameterized with a neural network. It is impractical to train this neural network (and thereby the CNF) directly with log-likelihood maximization, as solving the full ODE for each training iteration requires thousands of vector field evaluations.

Several alternative methods have been developed to make training CNFs more efficient. These directly regress on the vector field v (or a scaled version of v, such as the score). It has been shown that this can be done on a per-sample basis by adding noise to the parameters at various scales t. Specifically, a parameter sample theta is transformed as follows.

    t       ~ U[0, 1-eps)                          (noise level)
    theta_0 ~ N(0, 1)                              (sampled noise)
    theta_1 = theta                                (pure sample)
    theta_t = c1(t) * theta_1 + c0(t) * theta_0    (noisy sample)

Within that framework, one can employ different methods to learn the vector field v, such as flow matching or score matching. These have slightly different coefficients c0(t), c1(t) and training objectives.

This class is intended to construct and hold a neural network for estimating the posterior density, as well as saving / loading, and training. It also has functionality for sampling and density evaluation.

Training with score matching:

    t       ~ U[0, 1-eps)                          (noise level)
    theta_0 ~ N(0, 1)                              (sampled noise)
    theta_1 = theta                                (pure sample)
    theta_t = c1(t) * theta_1 + c0(t) * theta_0    (noisy sample)

    eps > 0
    c0  = sigma(t)
    c1  = alpha(1-t)

    score_target = theta_0 / sigma(t)
    weight       = 1/2 * {score-matching: sigma(t)^2, score-flow: beta(1-t), …}
    loss         = || score_target - network(theta_t, t) ||

To specify the score matching model, “posterior_kwargs” should additionally specify the noise properties used for the diffusion (beta_min, beta_max, epsilon).
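The role of theta_0 / sigma in the target can be motivated from the Gaussian perturbation kernel. With theta_t = c1 * theta_1 + sigma * theta_0, the noisy sample is distributed as N(c1 * theta_1, sigma^2) given theta_1, and the gradient of its log density with respect to theta_t is -theta_0 / sigma. The scalar sketch below verifies this with a finite difference. The values of c1 and sigma are arbitrary rather than taken from the beta_min / beta_max schedule, the function name is hypothetical, and the sign convention the model applies to its target is not stated in this section.

```python
import math


def log_kernel(theta_t, theta_1, c1, sigma):
    """log N(theta_t; c1 * theta_1, sigma^2), the density of the noisy sample."""
    z = (theta_t - c1 * theta_1) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


theta_1, theta_0 = 0.8, -1.2  # clean sample and noise draw (illustrative)
c1, sigma = 0.6, 0.8          # arbitrary coefficients, not dingo's schedule
theta_t = c1 * theta_1 + sigma * theta_0

# Central finite difference of the log density with respect to theta_t.
h = 1e-5
score_fd = (
    log_kernel(theta_t + h, theta_1, c1, sigma)
    - log_kernel(theta_t - h, theta_1, c1, sigma)
) / (2.0 * h)

score_closed_form = -theta_0 / sigma
```

The magnitude matches theta_0 / sigma, which is why the noise draw itself, rescaled by the noise level, can serve as a per-sample regression target.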

Initialize a model for the posterior distribution.

Parameters:
  • model_filename (str) – If given, loads data from the given file.

  • metadata (dict) – If given, initializes the model from these settings

  • initial_weights (dict) – Initial weights for the model

  • device (str)

  • load_training_info (bool)

alpha(t)
beta(t)
evaluate_vector_field(t, theta_t, *context_data)

Evaluate the vector field v(t, theta_t, context_data) that generates the flow via the ODE

d/dt f(theta_t, t, context) = v(f(theta_t, t, context), t, context).

For score matching, the vector field (or drift function) is computed from the predicted score.

Parameters:
  • t (float) – time (noise level)

  • theta_t (torch.Tensor) – noisy parameters, perturbed with noise level t

  • *context_data (list[torch.Tensor]) – list with context data (GW data)

get_likelihood_weighting(weighting)
get_t_theta_t_score(theta_1)
loss(theta, *context_data)

Returns the score matching loss for parameters theta conditioned on context.

Parameters:
  • theta (torch.Tensor) – parameters (e.g., binary black hole parameters)

  • *context_data (list[torch.Tensor]) – context data (e.g., gravitational-wave data)

Returns:

Loss.

Return type:

torch.Tensor

mu(t, x_1)
sigma(t)

Module contents