dingo.core.posterior_models package
Submodules
dingo.core.posterior_models.base_model module
This module contains the abstract base class for representing posterior models, as well as functions for training and testing across an epoch.
- class dingo.core.posterior_models.base_model.BasePosteriorModel(model_filename: str | None = None, metadata: dict | None = None, initial_weights: dict | None = None, device: str = 'cuda', load_training_info: bool = True)
Bases: ABC
Abstract base class for PosteriorModels. This is intended to construct and hold a neural network for estimating the posterior density, as well as saving / loading, and training.
Subclasses must implement methods for constructing the specific network, sampling, density evaluation, and computing the loss during training.
Initialize a model for the posterior distribution.
- Parameters:
model_filename (str) – If given, loads data from the given file.
metadata (dict) – If given, initializes the model from these settings
initial_weights (dict) – Initial weights for the model
device (str)
load_training_info (bool)
- abstract initialize_network()
Initialize the network backbone for the posterior model.
- initialize_optimizer_and_scheduler()
Initializes the optimizer and scheduler with self.optimizer_kwargs and self.scheduler_kwargs, respectively.
- load_model(model_filename: str, load_training_info: bool = True, device: str = 'cuda')
Load a posterior model from the disk.
- Parameters:
model_filename (str) – path to saved model
load_training_info (bool) – specifies whether information required to proceed with training is loaded, e.g. optimizer state dict
device (str)
- abstract log_prob(theta: Tensor, *context: Tensor)
Evaluate the log posterior density,
log p(theta | context)
- Parameters:
theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).
context (torch.Tensor) – Context information (typically observed data). Must have context.shape[0] = B.
- Returns:
log_prob – Shape (B,)
- Return type:
torch.Tensor
- abstract loss(theta: Tensor, *context: Tensor)
Compute the loss for a batch of data.
- Parameters:
theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).
context (torch.Tensor) – Context information (typically observed data). Must have the same leading (batch) dimension as theta.
- Returns:
loss – Mean loss across the batch (a scalar).
- Return type:
torch.Tensor
- network_to_device(device)
Move the model to the given device, and set self.device accordingly.
- abstract sample(*context: Tensor, num_samples: int = 1)
Sample parameters theta from the posterior model,
theta ~ p(theta | context)
- Parameters:
context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).
num_samples (int = 1) – Number of samples to generate.
- Returns:
samples – Shape (B, num_samples, dim(theta))
- Return type:
torch.Tensor
- abstract sample_and_log_prob(*context: Tensor, num_samples: int = 1)
Sample parameters theta from the posterior model,
theta ~ p(theta | context)
and also return the log_prob. For models such as normalizing flows, it is more economical to calculate the log_prob at the same time as sampling, rather than as a separate step.
- Parameters:
context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).
num_samples (int = 1) – Number of samples to generate.
- Returns:
samples, log_prob – Shapes (B, num_samples, dim(theta)), (B, num_samples)
- Return type:
torch.Tensor, torch.Tensor
- save_model(model_filename: str, save_training_info: bool = True)
Save the posterior model to the disk.
- Parameters:
model_filename (str) – filename for saving the model
save_training_info (bool) – specifies whether information required to proceed with training is saved, e.g. optimizer state dict
- train(train_loader: DataLoader, test_loader: DataLoader, train_dir: str, runtime_limits: object | None = None, checkpoint_epochs: int | None = None, use_wandb=False, test_only=False, early_stopping: EarlyStopping | None = None)
- Parameters:
train_loader
test_loader
train_dir
runtime_limits
checkpoint_epochs
use_wandb
test_only (bool = False) – if True, training is skipped
early_stopping (EarlyStopping) – Optional EarlyStopping instance.
- dingo.core.posterior_models.base_model.test_epoch(pm, dataloader)
- dingo.core.posterior_models.base_model.train_epoch(pm, dataloader)
dingo.core.posterior_models.build_model module
- dingo.core.posterior_models.build_model.autocomplete_model_kwargs(model_kwargs: dict, data_sample: list)
Autocomplete the model kwargs from train_settings and data_sample from the dataloader:
set input dimension of embedding net to shape of data_sample[1]
set dimension of parameter space to len(data_sample[0])
set added_context flag of embedding net if required for gnpe proxies
set context dim of posterior model to output dim of embedding net + gnpe proxy dim
- Parameters:
model_kwargs (dict) – Model settings, which are modified in-place.
data_sample (list) – Sample from dataloader (e.g., wfd[0]) used for autocompletion. Should be of format [parameters, GW data, gnpe_proxies], where the last element is only present if GNPE proxies are required.
- dingo.core.posterior_models.build_model.build_model_from_kwargs(filename: str | None = None, settings: dict | None = None, **kwargs) → BasePosteriorModel
Returns a PosteriorModel based on a saved network or settings dict.
The function is careful to choose the appropriate PosteriorModel class (e.g., for a normalizing flow, flow matching, or score matching).
- Parameters:
filename (str) – Path to a saved network (.pt).
settings (dict) – Settings dictionary.
kwargs – Arguments forwarded to the model constructor.
- Return type:
BasePosteriorModel
dingo.core.posterior_models.cflow_base module
- class dingo.core.posterior_models.cflow_base.ContinuousFlowPosteriorModel(**kwargs)
Bases: BasePosteriorModel
Class for posterior models based on continuous normalizing flows (CNFs).
CNFs are parameterized by a vector field v(theta_t, t), that transports a simple base distribution (typically a gaussian N(0,1) with same dimension as theta) at time t=0 to the target distribution at time t=1. This vector field defines the flow via the ODE
d/dt f(theta, t) = v(f(theta, t), t).
The vector field v is parameterized with a neural network. Training this network (and thereby the CNF) directly by maximizing the log-likelihood is impractical, because solving the full ODE in each training iteration requires thousands of vector field evaluations.
Several alternative methods have been developed to make training CNFs more efficient. These directly regress on the vector field v (or a scaled version of v, such as the score). It has been shown that this can be done on a per-sample basis by adding noise to the parameters at various scales t. Specifically, a parameter sample theta is transformed as follows.
t        ~ U[0, 1-eps)                          noise level
theta_0  ~ N(0, 1)                              sampled noise
theta_1  = theta                                pure sample
theta_t  = c1(t) * theta_1 + c0(t) * theta_0    noisy sample
Within that framework, one can employ different methods to learn the vector field v, such as flow matching or score matching. These have slightly different coefficients c1(t), c0(t) and training objectives.
This class is intended to construct and hold a neural network for estimating the posterior density, as well as saving / loading, and training. It also has functionality for sampling and density evaluation.
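The transport ODE described above can be illustrated with a small, dependency-free sketch. It does not use dingo or a neural network: the vector field is a hand-written analytic one for a hypothetical one-dimensional Gaussian target N(m, s^2), and the names make_gaussian_vector_field and integrate_forward are invented for this illustration. The fixed-step Euler solver is a stand-in for whatever ODE solver the class actually uses.

```python
import math


def make_gaussian_vector_field(m, s):
    """Analytic 1D vector field moving N(0, 1) at t=0 to N(m, s^2) at t=1.

    It generates the linear flow theta_t = mean(t) + std(t) * theta_0 with
    mean(t) = t * m and std(t) = sqrt((1 - t)^2 + (t * s)^2).
    """

    def mean(t):
        return t * m

    def std(t):
        return math.sqrt((1.0 - t) ** 2 + (t * s) ** 2)

    def dstd_dt(t):
        return (-(1.0 - t) + t * s * s) / std(t)

    def v(theta, t):
        # d/dt theta_t, written as a function of the current position theta_t.
        return m + dstd_dt(t) / std(t) * (theta - mean(t))

    return v


def integrate_forward(v, theta_0, n_steps=2000):
    """Solve d/dt theta = v(theta, t) from t=0 to t=1 with explicit Euler steps."""
    theta = theta_0
    dt = 1.0 / n_steps
    for k in range(n_steps):
        theta = theta + dt * v(theta, k * dt)
    return theta


v = make_gaussian_vector_field(m=2.0, s=0.5)
theta_0 = 1.3                      # one draw from the base distribution
theta_1 = integrate_forward(v, theta_0)
print(theta_1)                     # close to m + s * theta_0
```

Each Euler step costs one vector field evaluation, which is why solving the ODE inside every training iteration would be expensive when v is a neural network.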
Initialize a model for the posterior distribution.
- Parameters:
model_filename (str) – If given, loads data from the given file.
metadata (dict) – If given, initializes the model from these settings
initial_weights (dict) – Initial weights for the model
device (str)
load_training_info (bool)
- abstract evaluate_vector_field(t, theta_t, *context_data)
Evaluate the vector field v(t, theta_t, context_data) that generates the flow via the ODE
d/dt f(theta_t, t, context) = v(f(theta_t, t, context), t, context).
- Parameters:
t (float) – time (noise level)
theta_t (torch.Tensor) – noisy parameters, perturbed with noise level t
*context_data (list[torch.Tensor]) – list with context data (GW data)
- initialize_network()
Initialize the network backbone for the posterior model.
- property integration_range
Integration range for ODE. We integrate in the range [0, 1-self.eps]. For score matching, self.eps > 0 is required for stability. For flow matching we can have self.eps = 0.
- log_prob(theta: Tensor, *context: Tensor, hutchinson=False)
Evaluate the log posterior density,
log p(theta | context)
For this we solve an ODE backwards in time until we reach the initial pure noise distribution.
There are two contributions, the log_prob of theta_0 (which is uniquely determined by theta) under the base distribution, and the integrated divergence of the vector field.
- Parameters:
theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).
context (torch.Tensor) – Context information (typically observed data). Must have context.shape[0] = B.
hutchinson
- Returns:
log_prob – Shape (B,)
- Return type:
torch.Tensor
- rhs_of_joint_ode(t, theta_and_div_t, *context_data, hutchinson=False)
Returns the right hand side of the neural ODE that is used to evaluate the log_prob of theta samples. This is a joint ODE over the vector field and the divergence. By integrating this ODE, one can simultaneously trace the parameter sample theta_t and integrate the divergence contribution to the log_prob, see e.g., https://arxiv.org/abs/1806.07366 or Appendix C in https://arxiv.org/abs/2210.02747.
- Parameters:
t (float) – time (noise level)
theta_and_div_t (torch.Tensor) – concatenated tensor of (theta_t, div). theta_t: noisy parameters, perturbed with noise level t
*context_data (list[torch.Tensor]) – list with context data (GW data)
- Returns:
vector field that generates the flow and its divergence (required for likelihood evaluation).
- Return type:
torch.Tensor
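A minimal sketch of the joint-ODE idea, assuming a hand-written 1D vector field for a hypothetical Gaussian target N(M, S^2) rather than a trained network. The function names are illustrative and are not dingo's API. The parameter and the accumulated divergence are integrated together, backwards from t=1 to t=0, and the result is combined with the base-distribution log density.

```python
import math

M, S = 2.0, 0.5  # hypothetical target distribution N(M, S^2)


def std(t):
    return math.sqrt((1.0 - t) ** 2 + (t * S) ** 2)


def dstd_dt(t):
    return (-(1.0 - t) + t * S * S) / std(t)


def vector_field(theta, t):
    # Field that transports N(0, 1) at t=0 to N(M, S^2) at t=1.
    return M + dstd_dt(t) / std(t) * (theta - t * M)


def divergence(theta, t):
    # In 1D the divergence is d v / d theta, which here is independent of theta.
    return dstd_dt(t) / std(t)


def base_log_prob(theta_0):
    return -0.5 * theta_0 ** 2 - 0.5 * math.log(2.0 * math.pi)


def log_prob(theta_1, n_steps=4000):
    """log p_1(theta_1) = log p_0(theta_0) - integral_0^1 div v(theta_t, t) dt."""
    theta, div_integral = theta_1, 0.0
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt
        # Advance both components of the joint state with one Euler step.
        div_integral += dt * divergence(theta, t)
        theta -= dt * vector_field(theta, t)
    return base_log_prob(theta) - div_integral


def exact_log_prob(theta_1):
    z = (theta_1 - M) / S
    return -0.5 * z * z - math.log(S) - 0.5 * math.log(2.0 * math.pi)


print(log_prob(2.4), exact_log_prob(2.4))
```

In higher dimensions the divergence is the trace of the Jacobian of v, which is the expensive part of the right-hand side.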
- sample(*context: Tensor, num_samples: int | None = None)
Sample parameters theta from the posterior model,
theta ~ p(theta | context)
by solving an ODE forward in time.
- Parameters:
context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).
num_samples (int | None = None) – Number of samples to generate.
- Returns:
samples – Shape (B, num_samples, dim(theta))
- Return type:
torch.Tensor
- sample_and_log_prob(*context: Tensor, num_samples: int | None = None)
Sample parameters theta from the posterior model,
theta ~ p(theta | context)
and also return the log_prob. This is more efficient than calling sample and log_prob separately.
If d/dt [phi(t), f(t)] = rhs joint with initial conditions [theta_0, log p(theta_0)], where theta_0 ~ p_0(theta_0), then [phi(1), f(1)] = [theta_1, log p(theta_0) + log p_1(theta_1) - log p(theta_0)] = [theta_1, log p_1(theta_1)].
- Parameters:
context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).
num_samples (int | None = None) – Number of samples to generate.
- Returns:
samples, log_prob – Shapes (B, num_samples, dim(theta)), (B, num_samples)
- Return type:
torch.Tensor, torch.Tensor
- sample_t(batch_size)
- sample_theta_0(batch_size)
Sample theta_0 from the gaussian prior.
- dingo.core.posterior_models.cflow_base.compute_divergence(y, x)
- dingo.core.posterior_models.cflow_base.compute_hutchinson_divergence(y, x)
- dingo.core.posterior_models.cflow_base.compute_log_prior(theta_0)
- dingo.core.posterior_models.cflow_base.norm_without_divergence_component(y)
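The divergence helpers above are undocumented. Their names suggest an exact divergence and a stochastic estimate based on Hutchinson's trace estimator, but that reading is an assumption. The sketch below illustrates only the estimator itself, tr(J) = E[eps^T J eps] for probe vectors eps with independent +/-1 entries, using a fixed matrix in place of a network Jacobian. All names in it are hypothetical.

```python
import itertools
import random


def matvec(matrix, vec):
    """Plain-Python matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def hutchinson_trace(jacobian_vector_product, probes):
    """Average eps^T (J eps) over the given probe vectors."""
    estimates = []
    for eps in probes:
        j_eps = jacobian_vector_product(eps)
        estimates.append(sum(e * j for e, j in zip(eps, j_eps)))
    return sum(estimates) / len(estimates)


# Jacobian of the linear vector field v(x) = A x; its divergence is tr(A).
A = [[2.0, 1.0, -3.0],
     [0.5, -1.0, 4.0],
     [7.0, 0.0, 0.5]]
exact_trace = sum(A[i][i] for i in range(3))

# Averaging over every sign pattern cancels all off-diagonal terms exactly.
all_sign_probes = list(itertools.product([-1.0, 1.0], repeat=3))
estimate_all = hutchinson_trace(lambda e: matvec(A, e), all_sign_probes)

# In practice one uses a few random probes and accepts some variance.
rng = random.Random(0)
random_probes = [[rng.choice([-1.0, 1.0]) for _ in range(3)] for _ in range(2000)]
estimate_random = hutchinson_trace(lambda e: matvec(A, e), random_probes)

print(exact_trace, estimate_all, estimate_random)
```

The estimator needs only Jacobian-vector products, so its cost does not grow with the parameter dimension the way an exact trace does, at the price of a noisy result.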
dingo.core.posterior_models.flow_matching module
- class dingo.core.posterior_models.flow_matching.FlowMatchingPosteriorModel(**kwargs)
Bases: ContinuousFlowPosteriorModel
Class for posterior models based on continuous normalizing flows (CNFs).
CNFs are parameterized by a vector field v(theta_t, t), that transports a simple base distribution (typically a gaussian N(0,1) with same dimension as theta) at time t=0 to the target distribution at time t=1. This vector field defines the flow via the ODE
d/dt f(theta, t) = v(f(theta, t), t).
The vector field v is parameterized with a neural network. Training this network (and thereby the CNF) directly by maximizing the log-likelihood is impractical, because solving the full ODE in each training iteration requires thousands of vector field evaluations.
Several alternative methods have been developed to make training CNFs more efficient. These directly regress on the vector field v (or a scaled version of v, such as the score). It has been shown that this can be done on a per-sample basis by adding noise to the parameters at various scales t. Specifically, a parameter sample theta is transformed as follows.
t        ~ U[0, 1-eps)                          noise level
theta_0  ~ N(0, 1)                              sampled noise
theta_1  = theta                                pure sample
theta_t  = c1(t) * theta_1 + c0(t) * theta_0    noisy sample
Within that framework, one can employ different methods to learn the vector field v, such as flow matching or score matching. These have slightly different coefficients c1(t), c0(t) and training objectives.
This class is intended to construct and hold a neural network for estimating the posterior density, as well as saving / loading, and training. It also has functionality for sampling and density evaluation.
For flow matching, the vector field represents the velocity vector field for a particle trajectory. Training proceeds as follows:
t        ~ U[0, 1-eps)                          noise level
theta_0  ~ N(0, 1)                              sampled noise
theta_1  = theta                                pure sample
theta_t  = c1(t) * theta_1 + c0(t) * theta_0    noisy sample
eps      = 0
c0       = (1 - (1 - sigma_min) * t)
c1       = t
v_target = theta_1 - (1 - sigma_min) * theta_0
loss     = || v_target - network(theta_t, t) ||
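The training recipe above can be written out as a short sketch. It assumes plain Python lists instead of tensors, and the helper names noisy_sample, target_velocity and mse are hypothetical, not dingo's functions. The point it checks is that v_target is the time derivative of theta_t for the stated coefficients c0 and c1.

```python
import random


def noisy_sample(theta_0, theta_1, t, sigma_min):
    """theta_t = c1(t) * theta_1 + c0(t) * theta_0 with the coefficients above."""
    c0 = 1.0 - (1.0 - sigma_min) * t
    c1 = t
    return [c1 * x1 + c0 * x0 for x0, x1 in zip(theta_0, theta_1)]


def target_velocity(theta_0, theta_1, sigma_min):
    """v_target = theta_1 - (1 - sigma_min) * theta_0, independent of t."""
    return [x1 - (1.0 - sigma_min) * x0 for x0, x1 in zip(theta_0, theta_1)]


def mse(prediction, target):
    """Mean squared error between a predicted and a target vector."""
    return sum((p - q) ** 2 for p, q in zip(prediction, target)) / len(target)


rng = random.Random(0)
sigma_min = 1e-3
theta_1 = [0.7, -1.2, 3.0]                     # a parameter sample
theta_0 = [rng.gauss(0.0, 1.0) for _ in theta_1]  # sampled noise
t = rng.uniform(0.0, 1.0)                      # noise level, eps = 0

theta_t = noisy_sample(theta_0, theta_1, t, sigma_min)
v_target = target_velocity(theta_0, theta_1, sigma_min)

# Finite-difference check: d/dt theta_t should equal v_target.
h = 1e-3
theta_t_shifted = noisy_sample(theta_0, theta_1, t + h, sigma_min)
v_numeric = [(a - b) / h for a, b in zip(theta_t_shifted, theta_t)]

# A perfect network would predict v_target and reach zero loss.
print(mse(v_numeric, v_target))
```

At t = 0 the noisy sample is pure noise, and at t = 1 it is the parameter sample plus a residual sigma_min * theta_0.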
Initialize a model for the posterior distribution.
- Parameters:
model_filename (str) – If given, loads data from the given file.
metadata (dict) – If given, initializes the model from these settings
initial_weights (dict) – Initial weights for the model
device (str)
load_training_info (bool)
- evaluate_vector_field(t, theta_t, *context_data)
Evaluate the vector field v(t, theta_t, context_data) that generates the flow via the ODE
d/dt f(theta_t, t, context) = v(f(theta_t, t, context), t, context).
For flow matching, the vector field is regressed directly during training.
- Parameters:
t (float) – time (noise level)
theta_t (torch.Tensor) – noisy parameters, perturbed with noise level t
*context_data (list[torch.Tensor]) – list with context data (GW data)
- loss(theta, *context)
Calculates loss as the mean squared error between the predicted vector field and the vector field for transporting the parameter data to samples from the prior.
- Parameters:
theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).
context (torch.Tensor) – Context information (typically observed data). Must have the same leading (batch) dimension as theta.
- Returns:
loss – Mean loss across the batch (a scalar).
- Return type:
torch.Tensor
- dingo.core.posterior_models.flow_matching.ot_conditional_flow(x_0, x_1, t, sigma_min)
dingo.core.posterior_models.normalizing_flow module
- class dingo.core.posterior_models.normalizing_flow.NormalizingFlowPosteriorModel(**kwargs)
Bases: BasePosteriorModel
Posterior model based on a (discrete) normalizing flow.
A normalizing flow describes a distribution as a sequence of discrete transformations on a parameter space, ultimately taking samples from the base space (multivariate standard normal) to the desired distribution. The discrete transforms are parametrized functions (e.g., splines), which are designed to be invertible with simple Jacobian determinant. The probability density is given by the change of variables rule,
q(theta | d) = pi(f_d^{-1}(theta)) | det J_{f_d^{-1}} |
where
pi = N(0,1)^D is the base space distribution
f_d is the normalizing flow on the D-dimensional space
The flow f_d is allowed to depend on context information d, which would be observational data in the case of posterior estimation. By construction, the flow has fast sampling and density evaluation, requiring just forward passes of the network.
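The change of variables rule can be checked on the simplest possible flow. The sketch below assumes a one-dimensional affine transform whose scale and shift depend on the context d through a made-up function flow_params. It stands in for the spline-based flows used by the class and is not part of dingo.

```python
import math


def flow_params(d):
    """Hypothetical context dependence: returns (scale, shift) of the flow f_d."""
    return 0.5 + 0.1 * d * d, 2.0 * d


def base_log_prob(z):
    """Log density of the standard normal base distribution pi."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def flow_forward(z, d):
    """f_d(z): maps a base sample to a sample from q(theta | d)."""
    scale, shift = flow_params(d)
    return scale * z + shift


def flow_inverse(theta, d):
    """f_d^{-1}(theta) and the log of | det J_{f_d^{-1}} |."""
    scale, shift = flow_params(d)
    return (theta - shift) / scale, -math.log(abs(scale))


def log_prob(theta, d):
    """log q(theta | d) = log pi(f_d^{-1}(theta)) + log | det J_{f_d^{-1}} |."""
    z, log_abs_det = flow_inverse(theta, d)
    return base_log_prob(z) + log_abs_det


def gaussian_log_pdf(x, mean, std):
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


d = 0.4
scale, shift = flow_params(d)
print(log_prob(1.0, d), gaussian_log_pdf(1.0, shift, scale))
```

For an affine flow the result is a Gaussian with mean shift and standard deviation scale, which gives an independent value to compare against.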
This class uses normalizing flows from the dingo.core.nn.nsf module (which in turn uses glasflow, which is based on nflows). It is intended to construct and hold a neural network for estimating the posterior density, as well as saving / loading, and training. It also calls the sampling and density evaluation routines from the flows.
Initialize a model for the posterior distribution.
- Parameters:
model_filename (str) – If given, loads data from the given file.
metadata (dict) – If given, initializes the model from these settings
initial_weights (dict) – Initial weights for the model
device (str)
load_training_info (bool)
- initialize_network()
Initialize the network backbone for the posterior model.
- log_prob(theta, *context)
Evaluate the log posterior density,
log p(theta | context)
- Parameters:
theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).
context (torch.Tensor) – Context information (typically observed data). Must have context.shape[0] = B.
- Returns:
log_prob – Shape (B,)
- Return type:
torch.Tensor
- loss(theta, *context)
Compute the loss for a batch of data.
- Parameters:
theta (torch.Tensor) – Parameter values at which to evaluate the density. Should have a batch dimension (even if size B = 1).
context (torch.Tensor) – Context information (typically observed data). Must have the same leading (batch) dimension as theta.
- Returns:
loss – Mean loss across the batch (a scalar).
- Return type:
torch.Tensor
- sample(*context, num_samples: int = 1)
Sample parameters theta from the posterior model,
theta ~ p(theta | context)
- Parameters:
context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).
num_samples (int = 1) – Number of samples to generate.
- Returns:
samples – Shape (B, num_samples, dim(theta))
- Return type:
torch.Tensor
- sample_and_log_prob(*context, num_samples: int = 1)
Sample parameters theta from the posterior model,
theta ~ p(theta | context)
and also return the log_prob. For models such as normalizing flows, it is more economical to calculate the log_prob at the same time as sampling, rather than as a separate step.
- Parameters:
context (torch.Tensor) – Context information (typically observed data). Should have a batch dimension (even if size B = 1).
num_samples (int = 1) – Number of samples to generate.
- Returns:
samples, log_prob – Shapes (B, num_samples, dim(theta)), (B, num_samples)
- Return type:
torch.Tensor, torch.Tensor
dingo.core.posterior_models.score_matching module
- class dingo.core.posterior_models.score_matching.ScoreDiffusionPosteriorModel(**kwargs)
Bases: ContinuousFlowPosteriorModel
Class for posterior models based on continuous normalizing flows (CNFs).
CNFs are parameterized by a vector field v(theta_t, t), that transports a simple base distribution (typically a gaussian N(0,1) with same dimension as theta) at time t=0 to the target distribution at time t=1. This vector field defines the flow via the ODE
d/dt f(theta, t) = v(f(theta, t), t).
The vector field v is parameterized with a neural network. Training this network (and thereby the CNF) directly by maximizing the log-likelihood is impractical, because solving the full ODE in each training iteration requires thousands of vector field evaluations.
Several alternative methods have been developed to make training CNFs more efficient. These directly regress on the vector field v (or a scaled version of v, such as the score). It has been shown that this can be done on a per-sample basis by adding noise to the parameters at various scales t. Specifically, a parameter sample theta is transformed as follows.
t        ~ U[0, 1-eps)                          noise level
theta_0  ~ N(0, 1)                              sampled noise
theta_1  = theta                                pure sample
theta_t  = c1(t) * theta_1 + c0(t) * theta_0    noisy sample
Within that framework, one can employ different methods to learn the vector field v, such as flow matching or score matching. These have slightly different coefficients c1(t), c0(t) and training objectives.
This class is intended to construct and hold a neural network for estimating the posterior density, as well as saving / loading, and training. It also has functionality for sampling and density evaluation.
Training with score matching:
t            ~ U[0, 1-eps)                          noise level
theta_0      ~ N(0, 1)                              sampled noise
theta_1      = theta                                pure sample
theta_t      = c1(t) * theta_1 + c0(t) * theta_0    noisy sample
eps          > 0
c0           = sigma(t)
c1           = alpha(1-t)
score_target = theta_0 / sigma_t
weight       = 1/2 * {score-matching: sigma(t)^2, score-flow: beta(1-t), …}
loss         = || score_target - network(theta_t, t) ||
To specify the score matching model, “posterior_kwargs” should additionally specify the noise properties used for the diffusion (beta_min, beta_max, epsilon).
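The text names beta_min, beta_max and epsilon but does not give the formulas for beta, alpha and sigma. As a hedged illustration, the sketch below uses the standard variance-preserving diffusion schedule with a linear beta. Whether this class uses exactly these expressions, and how its time variable maps onto the diffusion time s used here (s = 0 is clean data), are assumptions. The default values 0.1 and 20.0 are common choices in the diffusion literature, not values taken from dingo.

```python
import math


def beta(s, beta_min=0.1, beta_max=20.0):
    """Linear noise rate at diffusion time s in [0, 1]."""
    return beta_min + s * (beta_max - beta_min)


def alpha(s, beta_min=0.1, beta_max=20.0):
    """Signal scale: exp(-1/2 * integral_0^s beta(u) du)."""
    integral = beta_min * s + 0.5 * (beta_max - beta_min) * s * s
    return math.exp(-0.5 * integral)


def sigma(s, beta_min=0.1, beta_max=20.0):
    """Noise scale of the variance-preserving process: sqrt(1 - alpha(s)^2)."""
    return math.sqrt(max(0.0, 1.0 - alpha(s, beta_min, beta_max) ** 2))


for s in (0.0, 1e-5, 0.5, 1.0):
    print(s, alpha(s), sigma(s))
```

With this schedule sigma goes to zero at the clean-data end, so a target of the form theta_0 / sigma_t grows without bound there. This is consistent with the requirement eps > 0 stated for score matching.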
Initialize a model for the posterior distribution.
- Parameters:
model_filename (str) – If given, loads data from the given file.
metadata (dict) – If given, initializes the model from these settings
initial_weights (dict) – Initial weights for the model
device (str)
load_training_info (bool)
- alpha(t)
- beta(t)
- evaluate_vector_field(t, theta_t, *context_data)
Evaluate the vector field v(t, theta_t, context_data) that generates the flow via the ODE
d/dt f(theta_t, t, context) = v(f(theta_t, t, context), t, context).
For score matching, the vector field (or drift function) is computed from the predicted score.
- Parameters:
t (float) – time (noise level)
theta_t (torch.Tensor) – noisy parameters, perturbed with noise level t
*context_data (list[torch.Tensor]) – list with context data (GW data)
- get_likelihood_weighting(weighting)
- get_t_theta_t_score(theta_1)
- loss(theta, *context_data)
Returns the score matching loss for parameters theta conditioned on context.
- Parameters:
theta (torch.Tensor) – parameters (e.g., binary black hole parameters)
*context_data (list[torch.Tensor]) – context data (e.g., gravitational-wave data)
- Returns:
Loss.
- Return type:
torch.Tensor
- mu(t, x_1)
- sigma(t)