lace.analysis.held_out_inconsistency

lace.analysis.held_out_inconsistency(engine: Engine, values, given: dict[Union[str, int], Any], quiet: bool = False, greedy: bool = True) → DataFrame

Compute inconsistency for values while sequentially dropping given conditions.

Parameters:
  • engine (Engine) – The Engine used to compute inconsistency

  • values (polars or pandas DataFrame or Series) – The values over which to compute the inconsistency. Each row of the DataFrame, or each entry of the Series, is an observation. Column names (or the Series name) should correspond to names of features in the table.

  • given (dict[index, value], optional) – A dictionary mapping column indices or names to values, specifying the conditions on the observations.

  • quiet (bool) – Prevent the display of a progress bar.

  • greedy (bool) – Use a greedy search, which is faster but may not find the optimal set of conditions to drop. If False, the conditions are enumerated instead.

Returns:

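A DataFrame with a ‘feature_rmed’ column (the features dropped from the conditions), a ‘HoldOutFunc.Inconsistency’ column (the resulting inconsistency), and a ‘keys_rmed’ column (the number of conditions dropped).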

Return type:

polars.DataFrame

Examples

>>> import polars as pl
>>> from lace.examples import Satellites
>>> from lace.analysis import held_out_inconsistency
>>> satellites = Satellites()
>>> given = (
...     satellites.df.to_pandas()
...     .set_index("ID")
...     .loc["Intelsat 903", :]
...     .dropna()
...     .to_dict()
... )
>>> period = given.pop("Period_minutes")
>>> held_out_inconsistency(
...     satellites,
...     pl.Series("Period_minutes", [period]),
...     given,
...     quiet=True,
... )  
shape: (19, 3)
┌─────────────────────────┬───────────────────────────┬───────────┐
│ feature_rmed            ┆ HoldOutFunc.Inconsistency ┆ keys_rmed │
│ ---                     ┆ ---                       ┆ ---       │
│ list[str]               ┆ f64                       ┆ i64       │
╞═════════════════════════╪═══════════════════════════╪═══════════╡
│ null                    ┆ 1.973348                  ┆ 0         │
│ ["Apogee_km"]           ┆ 1.284557                  ┆ 1         │
│ ["Eccentricity"]        ┆ 0.740964                  ┆ 2         │
│ ["Launch_Vehicle"]      ┆ 0.740964                  ┆ 3         │
│ …                       ┆ …                         ┆ …         │
│ ["Power_watts"]         ┆ 0.741036                  ┆ 15        │
│ ["Inclination_radians"] ┆ 0.741448                  ┆ 16        │
│ ["Users"]               ┆ 0.743201                  ┆ 17        │
│ ["Perigee_km"]          ┆ 1.0                       ┆ 18        │
└─────────────────────────┴───────────────────────────┴───────────┘

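If we don’t want to use the greedy search, we can enumerate instead by passing greedy=False. Be mindful that the number of condition sets to enumerate is 2^n, where n is the number of conditions in given, so we first reduce given to a handful of conditions: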

>>> # Drop the last 10 conditions (in alphabetical order) so that only 8 remain
>>> keys = sorted(given)
>>> _ = [given.pop(c) for c in keys[-10:]]
>>> held_out_inconsistency(
...     satellites,
...     pl.Series("Period_minutes", [period]),
...     given,
...     quiet=True,
...     greedy=False,
... )  
shape: (9, 3)
┌───────────────────────────────────┬───────────────────────────┬───────────┐
│ feature_rmed                      ┆ HoldOutFunc.Inconsistency ┆ keys_rmed │
│ ---                               ┆ ---                       ┆ ---       │
│ list[str]                         ┆ f64                       ┆ i64       │
╞═══════════════════════════════════╪═══════════════════════════╪═══════════╡
│ null                              ┆ 1.984823                  ┆ 0         │
│ ["Apogee_km"]                     ┆ 1.290609                  ┆ 1         │
│ ["Apogee_km", "Eccentricity"]     ┆ 0.74598                   ┆ 2         │
│ ["Apogee_km", "Country_of_Operat… ┆ 0.745877                  ┆ 3         │
│ …                                 ┆ …                         ┆ …         │
│ ["Apogee_km", "Country_of_Contra… ┆ 0.747133                  ┆ 5         │
│ ["Apogee_km", "Country_of_Contra… ┆ 0.749297                  ┆ 6         │
│ ["Apogee_km", "Country_of_Contra… ┆ 0.756218                  ┆ 7         │
│ ["Apogee_km", "Class_of_Orbit", … ┆ 1.0                       ┆ 8         │
└───────────────────────────────────┴───────────────────────────┴───────────┘
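
The cost difference between the two search modes can be illustrated without lace. The sketch below is not lace’s implementation: toy_score, greedy_path, and exhaustive_path are hypothetical names, the score is an arbitrary stand-in for inconsistency, and the greedy strategy shown (drop whichever single condition gives the lowest score at each step) is an assumption about how such a search typically works. It only counts how many score evaluations each strategy needs.

```python
from itertools import combinations

# Hypothetical stand-in for an inconsistency score, in arbitrary integer units.
# It is NOT lace's inconsistency; it only gives the searches something to minimize.
WEIGHTS = {"a": 5, "b": -2, "c": 3, "d": -1}


def toy_score(kept):
    """Score of a set of kept conditions (lower is better)."""
    return 10 + sum(WEIGHTS[k] for k in kept)


def greedy_path(keys, score):
    """Drop one condition per step, always the one giving the lowest score."""
    kept = set(keys)
    path = [((), score(frozenset(kept)))]
    evals = 1
    while kept:
        best_key, best_val = None, None
        for k in sorted(kept):
            val = score(frozenset(kept - {k}))
            evals += 1
            if best_val is None or val < best_val:
                best_key, best_val = k, val
        kept.remove(best_key)
        path.append(((best_key,), best_val))
    return path, evals


def exhaustive_path(keys, score):
    """For each number of dropped conditions, try every subset of that size."""
    keys = sorted(keys)
    path = []
    evals = 0
    for r in range(len(keys) + 1):
        best = None
        for removed in combinations(keys, r):
            val = score(frozenset(keys) - frozenset(removed))
            evals += 1
            if best is None or val < best[1]:
                best = (removed, val)
        path.append(best)
    return path, evals


g_path, g_evals = greedy_path("abcd", toy_score)
e_path, e_evals = exhaustive_path("abcd", toy_score)
print("greedy evaluations:    ", g_evals)  # 1 + n(n+1)/2 for n conditions
print("exhaustive evaluations:", e_evals)  # 2^n for n conditions
```

With four conditions the gap is small, but under this cost model the greedy count grows quadratically while the exhaustive count doubles with every added condition, which is why the example above trims given before passing greedy=False.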