lace.Engine.logp
- Engine.logp(values, given=None, *, state_ixs: List[int] | None = None, scaled: bool = False) → None | float | Series
Compute the log likelihood.
This function computes log p(values) or log p(values|given).
- Parameters:
values (polars or pandas DataFrame or Series) – The values over which to compute the log likelihood. Each row of the DataFrame, or each entry of the Series, is an observation. Column names (or the Series name) should correspond to names of features in the table.
given (Dict[index, value], optional) – A dictionary mapping column indices/names to values, which specifies conditions on the observations.
state_ixs (List[int], optional) – An optional list specifying which states should be used in the likelihood computation. If None (default), use all states.
scaled (bool, optional) – If True, the components of the likelihoods will be scaled so that each dimension (feature) contributes a likelihood in [0, 1]; the scaled log likelihood is thus less prone to being dominated by any one feature.
- Returns:
The log likelihood for each observation in values
- Return type:
polars.Series or float
Notes
For missing-not-at-random (MNAR) columns, asking about the likelihood of a value returns the likelihood of just that value, not the likelihood of that value and that value being present. Computing logp of None returns the log likelihood of a value being missing.
The scaled variant is a heuristic used for model monitoring.
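The intuition behind the scaled heuristic can be shown with a toy sketch. This is not lace's internal computation; it is a hypothetical two-feature example, assuming per-dimension scaling by each feature's maximum density, illustrating why an unscaled joint log likelihood can be dominated by a single sharply peaked feature.

```python
import math

# Log density of a univariate Gaussian (used as a stand-in for a
# feature's likelihood model; not lace's actual component model).
def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

# Feature A is very peaked (sigma=0.01): small deviations from the mode
# produce huge negative log likelihoods. Feature B is broad (sigma=10.0).
lp_a = normal_logpdf(0.05, 0.0, 0.01)  # a few sigma into A's tail
lp_b = normal_logpdf(0.0, 0.0, 10.0)   # exactly at B's mode

# Unscaled joint log likelihood: feature A's term swamps feature B's.
unscaled = lp_a + lp_b

# Scaled heuristic (illustrative): subtract each feature's log density at
# its mode, so each dimension contributes a likelihood in [0, 1]
# (i.e., a log likelihood <= 0) regardless of how peaked it is.
max_lp_a = normal_logpdf(0.0, 0.0, 0.01)
max_lp_b = normal_logpdf(0.0, 0.0, 10.0)
scaled = (lp_a - max_lp_a) + (lp_b - max_lp_b)
```

After scaling, both features are measured on a comparable scale, so no single dimension can dominate simply because its density is narrow.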
Examples
Ask about the likelihood of values in a single column
>>> import polars as pl
>>> from lace.examples import Satellites
>>> engine = Satellites()
>>> class_of_orbit = pl.Series("Class_of_Orbit", ["LEO", "MEO", "GEO"])
>>> engine.logp(class_of_orbit).exp()
shape: (3,)
Series: 'logp' [f64]
[
    0.515602
    0.06607
    0.38637
]
Conditioning using given
>>> engine.logp(
...     class_of_orbit,
...     given={"Period_minutes": 1436.0},
... ).exp()
shape: (3,)
Series: 'logp' [f64]
[
    0.000975
    0.018733
    0.972718
]
Ask about the likelihood of values belonging to multiple features
>>> values = pl.DataFrame(
...     {
...         "Class_of_Orbit": ["LEO", "MEO", "GEO"],
...         "Period_minutes": [70.0, 320.0, 1440.0],
...     }
... )
>>> engine.logp(values).exp()
shape: (3,)
Series: 'logp' [f64]
[
    0.000353
    0.000006
    0.015253
]
An example of the scaled variant:
>>> engine.logp(
...     values,
...     scaled=True,
... ).exp()
shape: (3,)
Series: 'logp_scaled' [f64]
[
    0.260898
    0.133143
    0.592816
]
For columns in which we explicitly model missing-not-at-random data, we can ask about the likelihood of missing values.
>>> from math import exp
>>> no_long_geo = pl.Series("longitude_radians_of_geo", [None])
>>> exp(engine.logp(no_long_geo))
0.626977387513902
The probability of a value missing (not-at-random) changes depending on the conditions.
>>> exp(engine.logp(no_long_geo, given={"Class_of_Orbit": "GEO"}))
0.07779133514786091
And we can condition on missingness
>>> engine.logp(
...     class_of_orbit,
...     given={"longitude_radians_of_geo": None},
... ).exp()
shape: (3,)
Series: 'logp' [f64]
[
    0.818785
    0.090779
    0.04799
]
Plot the marginal distribution of Period_minutes for each state
>>> import numpy as np
>>> import plotly.graph_objects as go
>>> period = pl.Series('Period_minutes', np.linspace(0, 1500, 500))
>>> fig = go.Figure()
>>> for i in range(engine.n_states):
...     p = engine.logp(period, state_ixs=[i]).exp()
...     fig = fig.add_trace(go.Scatter(
...         x=period,
...         y=p,
...         name=f'state {i}',
...         hoverinfo='text+name',
...     ))
>>> fig.update_layout(
...     xaxis_title='Period_minutes',
...     yaxis_title='f(Period)',
... ).show()