lace.analysis.held_out_neglogp

lace.analysis.held_out_neglogp(engine: Engine, values, given: dict[Union[str, int], Any], quiet: bool = False, greedy: bool = True) → DataFrame

Compute -logp for values while sequentially dropping given conditions.

Parameters:
  • engine (Engine) – The Engine used to compute logp

  • values (polars or pandas DataFrame or Series) – The values over which to compute the log likelihood. Each row of the DataFrame, or each entry of the Series, is an observation. Column names (or the Series name) should correspond to names of features in the table.

  • given (dict[index, value], optional) – A dictionary mapping column indices or names to values, which specifies the conditions on the observations.

  • quiet (bool) – Prevent the display of a progress bar.

  • greedy (bool) – Use a greedy search, which is faster but may not find the optimal result. If False, the subsets of conditions are enumerated instead.

Returns:

A DataFrame with a ‘feature_rmed’ column, a ‘HoldOutFunc.NegLogp’ column holding the -logp values, and a ‘keys_rmed’ column, as shown in the examples below.

Return type:

polars.DataFrame

Examples

>>> import polars as pl
>>> from lace.examples import Satellites
>>> from lace.analysis import held_out_neglogp
>>> satellites = Satellites()
>>> given = (
...     satellites.df.to_pandas()
...     .set_index("ID")
...     .loc["Intelsat 903", :]
...     .dropna()
...     .to_dict()
... )
>>> period = given.pop("Period_minutes")
>>> held_out_neglogp(
...     satellites,
...     pl.Series("Period_minutes", [period]),
...     given,
...     quiet=True,
... )  
shape: (19, 3)
┌─────────────────────────┬─────────────────────┬───────────┐
│ feature_rmed            ┆ HoldOutFunc.NegLogp ┆ keys_rmed │
│ ---                     ┆ ---                 ┆ ---       │
│ list[str]               ┆ f64                 ┆ i64       │
╞═════════════════════════╪═════════════════════╪═══════════╡
│ null                    ┆ 7.808063            ┆ 0         │
│ ["Apogee_km"]           ┆ 5.082683            ┆ 1         │
│ ["Eccentricity"]        ┆ 2.931816            ┆ 2         │
│ ["Launch_Vehicle"]      ┆ 2.931816            ┆ 3         │
│ …                       ┆ …                   ┆ …         │
│ ["Power_watts"]         ┆ 2.932103            ┆ 15        │
│ ["Inclination_radians"] ┆ 2.933732            ┆ 16        │
│ ["Users"]               ┆ 2.940667            ┆ 17        │
│ ["Perigee_km"]          ┆ 3.956759            ┆ 18        │
└─────────────────────────┴─────────────────────┴───────────┘
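The greedy output above has one row per removed condition plus a baseline row. A minimal sketch of one way such a search could work is given below. It assumes that each step drops the condition whose removal yields the lowest score; the selection rule actually used by lace may differ. The names `greedy_hold_out`, `toy_score`, and `penalty` are hypothetical and are not part of lace, and the toy score only stands in for -logp.

```python
from typing import Callable, Optional


def greedy_hold_out(
    given: dict[str, float],
    score: Callable[[dict[str, float]], float],
) -> list[tuple[Optional[str], float]]:
    """Drop one condition per step, picking the drop that minimizes `score`.

    Hypothetical helper for illustration only; not lace's implementation.
    """
    remaining = dict(given)
    # Baseline row: nothing removed yet, all conditions in place.
    path: list[tuple[Optional[str], float]] = [(None, score(remaining))]
    while remaining:
        # Try removing each remaining key and keep the best candidate.
        best_key = min(
            remaining,
            key=lambda k: score({c: v for c, v in remaining.items() if c != k}),
        )
        del remaining[best_key]
        path.append((best_key, score(remaining)))
    return path


# Toy stand-in for -logp: every condition present adds a fixed penalty.
penalty = {"a": 3.0, "b": -1.0, "c": 0.5}


def toy_score(conditions: dict[str, float]) -> float:
    return 2.0 + sum(penalty[k] for k in conditions)


path = greedy_hold_out({"a": 1.0, "b": 1.0, "c": 1.0}, toy_score)
for removed, value in path:
    print(removed, value)
```

With n conditions the returned path has n + 1 entries, which matches the 19 rows reported above for 18 conditions.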

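If we don’t want to use the greedy search, we can enumerate instead, but we need to be mindful that the number of condition subsets to enumerate is 2^n, where n is the number of conditions. Here we drop ten conditions first to keep the enumeration small: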

>>> keys = sorted(list(given.keys()))
>>> _ = [given.pop(c) for c in keys[-10:]]
>>> held_out_neglogp(
...     satellites,
...     pl.Series("Period_minutes", [period]),
...     given,
...     quiet=True,
...     greedy=False,
... )  
shape: (9, 3)
┌───────────────────────────────────┬─────────────────────┬───────────┐
│ feature_rmed                      ┆ HoldOutFunc.NegLogp ┆ keys_rmed │
│ ---                               ┆ ---                 ┆ ---       │
│ list[str]                         ┆ f64                 ┆ i64       │
╞═══════════════════════════════════╪═════════════════════╪═══════════╡
│ null                              ┆ 7.853468            ┆ 0         │
│ ["Apogee_km"]                     ┆ 5.106627            ┆ 1         │
│ ["Apogee_km", "Eccentricity"]     ┆ 2.951662            ┆ 2         │
│ ["Apogee_km", "Country_of_Operat… ┆ 2.951254            ┆ 3         │
│ …                                 ┆ …                   ┆ …         │
│ ["Apogee_km", "Country_of_Contra… ┆ 2.956224            ┆ 5         │
│ ["Apogee_km", "Country_of_Contra… ┆ 2.96479             ┆ 6         │
│ ["Apogee_km", "Country_of_Contra… ┆ 2.992173            ┆ 7         │
│ ["Apogee_km", "Class_of_Orbit", … ┆ 3.956759            ┆ 8         │
└───────────────────────────────────┴─────────────────────┴───────────┘
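To make the 2^n growth concrete, the sketch below counts the condition subsets for the 8 conditions kept in the second example and for the 18 conditions used in the first. The greedy count is an assumption (one baseline evaluation, then one evaluation per remaining candidate at each step) and may not match the number of evaluations lace actually performs. The function names are hypothetical.

```python
from itertools import combinations


def n_subsets(n: int) -> int:
    """Count the subsets of n conditions by enumerating them explicitly."""
    keys = range(n)
    return sum(1 for size in range(n + 1) for _ in combinations(keys, size))


def n_greedy_evals(n: int) -> int:
    """Assumed greedy cost: a baseline, then n - i candidates at step i."""
    return 1 + sum(n - i for i in range(n))


for n in (8, 18):
    print(n, n_subsets(n), n_greedy_evals(n))
# prints:
# 8 256 37
# 18 262144 172
```

Under these assumptions, enumeration over all 18 conditions would need roughly a thousand times more evaluations than over 8, which is why the example trims the conditions before setting greedy=False.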