lace.Engine.logp

Engine.logp(values, given=None, *, state_ixs: List[int] | None = None, scaled: bool = False) → None | float | Series

Compute the log likelihood.

This function computes log p(values) or log p(values|given).

Parameters:
  • values (polars or pandas DataFrame or Series) – The values over which to compute the log likelihood. Each row of the DataFrame, or each entry of the Series, is an observation. Column names (or the Series name) should correspond to names of features in the table.

  • given (Dict[index, value], optional) – A dictionary mapping column indices/names to values, which specifies conditions on the observations.

  • state_ixs (List[int], optional) – An optional list specifying which states should be used in the likelihood computation. If None (default), use all states.

  • scaled (bool, optional) – If True, the components of the likelihoods are scaled so that each dimension (feature) contributes a likelihood in [0, 1]. The scaled log likelihood is therefore less prone to being dominated by any single feature.

Returns:

The log likelihood for each observation in values

Return type:

polars.Series or float
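
Because the result is in log space, per-observation values combine with ordinary addition; for example, summing the returned values gives the joint log likelihood of independent observations. This is a general property of log likelihoods, sketched here with plain Python rather than a lace call:

```python
import math

# Hypothetical per-observation log likelihoods, as logp might return.
logps = [math.log(0.5), math.log(0.25)]

# Joint log likelihood of the observations (assuming independence).
joint = math.fsum(logps)

# Back to probability space: 0.5 * 0.25 = 0.125.
print(math.exp(joint))  # ~0.125
```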

Notes

  • For missing-not-at-random (MNAR) columns, asking about the likelihood of a value returns the likelihood of just that value, not the likelihood of that value and of the value being present. Computing logp of None returns the log likelihood of a value being missing.

  • The scaled variant is a heuristic used for model monitoring.
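
The idea behind scaling can be illustrated with a plain-Python sketch (a hypothetical illustration of the heuristic, not lace's internal computation): in an unscaled joint log likelihood, a feature with very small densities dominates the sum, whereas bounding each feature's contribution by its maximum density keeps the contributions comparable.

```python
import math

# Hypothetical per-feature log likelihoods for one observation.
logps = {"a": math.log(0.5), "b": math.log(1e-6)}

# Unscaled joint log likelihood: a straight sum, dominated by "b".
unscaled = math.fsum(logps.values())

# Scale each feature's likelihood by its (assumed known) maximum
# density, so every feature contributes a value in [0, 1].
max_density = {"a": 0.5, "b": 1e-6}
scaled = math.fsum(lp - math.log(max_density[k]) for k, lp in logps.items())

print(unscaled)  # about -14.5, driven almost entirely by "b"
print(scaled)    # 0.0 here, since both features sit at their maxima
```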

Examples

Ask about the likelihood of values in a single column

>>> import polars as pl
>>> from lace.examples import Satellites
>>> engine = Satellites()
>>> class_of_orbit = pl.Series("Class_of_Orbit", ["LEO", "MEO", "GEO"])
>>> engine.logp(class_of_orbit).exp()  
shape: (3,)
Series: 'logp' [f64]
[
    0.523575
    0.06601
    0.380453
]

Conditioning using given

>>> engine.logp(
...     class_of_orbit,
...     given={"Period_minutes": 1436.0},
... ).exp()  
shape: (3,)
Series: 'logp' [f64]
[
    0.000349
    0.000756
    0.998017
]

Ask about the likelihood of values belonging to multiple features

>>> values = pl.DataFrame(
...     {
...         "Class_of_Orbit": ["LEO", "MEO", "GEO"],
...         "Period_minutes": [70.0, 320.0, 1440.0],
...     }
... )
>>> engine.logp(values).exp()  
shape: (3,)
Series: 'logp' [f64]
[
    0.000306
    0.000008
    0.016546
]

An example of the scaled variant:

>>> engine.logp(
...     values,
...     scaled=True,
... ).exp()  
shape: (3,)
Series: 'logp_scaled' [f64]
[
    0.137554
    0.167357
    0.577699
]

For columns in which missing-not-at-random data are explicitly modeled, we can ask about the likelihood of missing values.

>>> from math import exp
>>> no_long_geo = pl.Series("longitude_radians_of_geo", [None])
>>> exp(engine.logp(no_long_geo))
0.631030460838865

The probability of a value being missing (not at random) changes depending on the conditions.

>>> exp(engine.logp(no_long_geo, given={"Class_of_Orbit": "GEO"}))
0.048855132811982976

And we can condition on missingness

>>> engine.logp(
...     class_of_orbit,
...     given={"longitude_radians_of_geo": None},
... ).exp()  
shape: (3,)
Series: 'logp' [f64]
[
    0.827158
    0.099435
    0.029606
]

Plot the marginal distribution of Period_minutes for each state

>>> import numpy as np
>>> import plotly.graph_objects as go
>>> period = pl.Series('Period_minutes', np.linspace(0, 1500, 500))
>>> fig = go.Figure()
>>> for i in range(engine.n_states):
...     p = engine.logp(period, state_ixs=[i]).exp()
...     fig = fig.add_trace(go.Scatter(
...         x=period,
...         y=p,
...         name=f'state {i}',
...         hoverinfo='text+name',
...     ))
>>> fig.update_layout(
...         xaxis_title='Period_minutes',
...         yaxis_title='f(Period)',
...     ) \
...     .show()  
{...}