lace.Engine.surprisal

Engine.surprisal(col: int | str, *, rows=None, values=None, state_ixs=None)

Compute the surprisal of values in specific cells.

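Surprisal is the negative log likelihood of a specific value in a specific position (cell) in the table. Lower surprisal means the model considers the value more likely; higher surprisal means the value is less expected.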
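As a minimal illustration of the definition (not lace's internal implementation), the sketch below computes the surprisal of a value under a single normal distribution using only the standard library. The helpers `gaussian_logpdf` and `surprisal` are hypothetical and exist only for this example.

```python
import math


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log density of the normal distribution N(mu, sigma**2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def surprisal(x: float, mu: float, sigma: float) -> float:
    """Hypothetical helper: surprisal is the negative log likelihood of x."""
    return -gaussian_logpdf(x, mu, sigma)


# A value at the mean is less surprising than one three standard deviations away.
print(round(surprisal(0.0, mu=0.0, sigma=1.0), 4))  # 0.9189
print(round(surprisal(3.0, mu=0.0, sigma=1.0), 4))  # 5.4189
```

For continuous columns such as Expected_Lifetime the likelihood is a density, so surprisal can be negative wherever the density exceeds 1.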

Parameters:
  • col (column index) – The column location of the target cells.

  • rows (arraylike[row index], optional) – Row indices of the cells. If None (default), all non-missing rows will be used.

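  • values (arraylike[value], optional) – Proposed values for each cell. Must have an entry for each entry in rows, unless rows contains a single row, in which case every value is scored in that one cell. If None (default), the existing values are used.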

  • state_ixs (List[int], optional) – An optional list specifying which states should be used in the surprisal computation. If None (default), use all states.
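The state_ixs parameter matters because each state in the ensemble assigns its own likelihood to a value. The sketch below assumes, without confirmation from this page, that the ensemble likelihood is the mean of the per-state likelihoods, so the surprisal is the negative log-mean-exp of the per-state log likelihoods. `ensemble_surprisal` is a hypothetical helper, not part of lace.

```python
import math
from typing import Sequence


def ensemble_surprisal(state_logps: Sequence[float]) -> float:
    """Hypothetical helper: negative log of the mean per-state likelihood.

    ASSUMPTION: the ensemble likelihood is the average of the per-state
    likelihoods; lace may combine states differently.
    """
    if not state_logps:
        raise ValueError("need at least one state")
    # Stable log-mean-exp: factor out the largest log likelihood before exponentiating.
    m = max(state_logps)
    total = sum(math.exp(lp - m) for lp in state_logps)
    log_mean = m + math.log(total) - math.log(len(state_logps))
    return -log_mean


# Log likelihoods of one value under three hypothetical states.
logps = [math.log(0.1), math.log(0.3), math.log(0.5)]
print(round(ensemble_surprisal(logps), 4))      # 1.204  (= -log 0.3, all states)
print(round(ensemble_surprisal(logps[:2]), 4))  # 1.6094 (= -log 0.2, states 0 and 1)
```

Under this assumption, dropping states changes the average, which is why restricting state_ixs generally yields different surprisal values, as the last example on this page shows.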

Returns:

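A polars.DataFrame containing an index column for the row names, a <col> column for the values, and a surprisal column containing the surprisal values. When a single row is paired with multiple values, a polars.Series named surprisal is returned instead.

Return type:

polars.DataFrame | polars.Series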

Examples

Find satellites with the top five most surprising expected lifetimes

>>> import polars as pl
>>> from lace.examples import Satellites
>>> engine = Satellites()
>>> engine.surprisal("Expected_Lifetime").sort(
...     "surprisal", descending=True
... ).head(5)
shape: (5, 3)
┌───────────────────────────────────┬───────────────────┬───────────┐
│ index                             ┆ Expected_Lifetime ┆ surprisal │
│ ---                               ┆ ---               ┆ ---       │
│ str                               ┆ f64               ┆ f64       │
╞═══════════════════════════════════╪═══════════════════╪═══════════╡
│ International Space Station (ISS… ┆ 30.0              ┆ 11.423102 │
│ Milstar DFS-5 (USA 164, Milstar … ┆ 0.0               ┆ 6.661427  │
│ DSP 21 (USA 159) (Defense Suppor… ┆ 0.5               ┆ 6.366436  │
│ DSP 22 (USA 176) (Defense Suppor… ┆ 0.5               ┆ 6.366436  │
│ Intelsat 701                      ┆ 0.5               ┆ 6.366436  │
└───────────────────────────────────┴───────────────────┴───────────┘

Compute the surprisal for specific cells

>>> engine.surprisal(
...     "Expected_Lifetime", rows=["Landsat 7", "Intelsat 701"]
... )
shape: (2, 3)
┌──────────────┬───────────────────┬───────────┐
│ index        ┆ Expected_Lifetime ┆ surprisal │
│ ---          ┆ ---               ┆ ---       │
│ str          ┆ f64               ┆ f64       │
╞══════════════╪═══════════════════╪═══════════╡
│ Landsat 7    ┆ 15.0              ┆ 4.588265  │
│ Intelsat 701 ┆ 0.5               ┆ 6.366436  │
└──────────────┴───────────────────┴───────────┘

Compute the surprisal of specific values in specific cells

>>> engine.surprisal(
...     "Expected_Lifetime",
...     rows=["Landsat 7", "Intelsat 701"],
...     values=[10.0, 10.0],
... )
shape: (2, 3)
┌──────────────┬───────────────────┬───────────┐
│ index        ┆ Expected_Lifetime ┆ surprisal │
│ ---          ┆ ---               ┆ ---       │
│ str          ┆ f64               ┆ f64       │
╞══════════════╪═══════════════════╪═══════════╡
│ Landsat 7    ┆ 10.0              ┆ 2.984587  │
│ Intelsat 701 ┆ 10.0              ┆ 2.52041   │
└──────────────┴───────────────────┴───────────┘
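Because surprisal is a negative log likelihood, the difference between two surprisals for the same cell is a log likelihood ratio. The sketch below uses only the numbers printed in the two tables above to estimate how many times more likely (in density terms) the proposed value 10.0 is than each satellite's observed value. The dictionary names are illustrative.

```python
import math

# Surprisal values copied from the two tables above (rounded as printed).
observed = {"Landsat 7": 4.588265, "Intelsat 701": 6.366436}  # existing values 15.0 and 0.5
proposed = {"Landsat 7": 2.984587, "Intelsat 701": 2.52041}   # proposed value 10.0

for row in observed:
    # exp(s_observed - s_proposed) = p(proposed) / p(observed)
    ratio = math.exp(observed[row] - proposed[row])
    print(f"{row}: 10.0 is {ratio:.1f}x as likely as the observed value")
```

Here 10.0 comes out roughly 5 times as likely as 15.0 for Landsat 7 and roughly 47 times as likely as 0.5 for Intelsat 701.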

Compute the surprisal of multiple values in a single cell

>>> engine.surprisal(
...     "Expected_Lifetime",
...     rows=["Landsat 7"],
...     values=[0.5, 1.0, 5.0, 10.0],
... )
shape: (4,)
Series: 'surprisal' [f64]
[
        3.225658
        3.036696
        2.273096
        2.984587
]
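When several candidate values are scored in one cell, the least surprising candidate is the one the model considers most likely. The sketch below picks it from the Series printed above; the lists are copied from that output (with polars they could be obtained via `Series.to_list()`).

```python
# Candidate values and their surprisals, copied from the Series printed above.
values = [0.5, 1.0, 5.0, 10.0]
surprisals = [3.225658, 3.036696, 2.273096, 2.984587]

# The lowest surprisal corresponds to the highest likelihood.
best_value, best_surprisal = min(zip(values, surprisals), key=lambda pair: pair[1])
print(best_value, best_surprisal)  # 5.0 2.273096
```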

Surprisal will differ when computed under different states

>>> engine.surprisal(
...     "Expected_Lifetime",
...     rows=["Landsat 7", "Intelsat 701"],
...     values=[10.0, 10.0],
...     state_ixs=[0, 1],
... )
shape: (2, 3)
┌──────────────┬───────────────────┬───────────┐
│ index        ┆ Expected_Lifetime ┆ surprisal │
│ ---          ┆ ---               ┆ ---       │
│ str          ┆ f64               ┆ f64       │
╞══════════════╪═══════════════════╪═══════════╡
│ Landsat 7    ┆ 10.0              ┆ 3.431414  │
│ Intelsat 701 ┆ 10.0              ┆ 2.609992  │
└──────────────┴───────────────────┴───────────┘