lace.engine.Engine.simulate

Engine.simulate(cols, given=None, n: int = 1, include_given: bool = False)

Simulate data from a conditional distribution.

Parameters:
  • cols (List[column index]) – A list of target columns to simulate

  • given (Dict[column index, value], optional) – A dictionary mapping each conditioning column to the value it is fixed at. Defaults to None, meaning no conditions

  • n (int, optional) – The number of rows to draw. Defaults to 1

  • include_given (bool, optional) – If True, the conditioning values in given are included as extra columns in the output. Defaults to False

Returns:

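The simulated data, with one row per draw and one column per entry in cols, followed by the given columns if include_given is True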

Return type:

polars.DataFrame

Examples

Draw from a pair of columns

>>> from lace.examples import Satellites
>>> engine = Satellites()
>>> engine.simulate(["Class_of_Orbit", "Period_minutes"], n=5)
shape: (5, 2)
┌────────────────┬────────────────┐
│ Class_of_Orbit ┆ Period_minutes │
│ ---            ┆ ---            │
│ str            ┆ f64            │
╞════════════════╪════════════════╡
│ MEO            ┆ 2807.568333    │
│ GEO            ┆ 1421.333515    │
│ LEO            ┆ 92.435621      │
│ GEO            ┆ 1435.7067      │
│ LEO            ┆ 84.896787      │
└────────────────┴────────────────┘

Simulate a pair of columns conditioned on another

>>> engine.simulate(
...     ["Class_of_Orbit", "Period_minutes"],
...     given={"Purpose": "Communications"},
...     n=5,
... )
shape: (5, 2)
┌────────────────┬────────────────┐
│ Class_of_Orbit ┆ Period_minutes │
│ ---            ┆ ---            │
│ str            ┆ f64            │
╞════════════════╪════════════════╡
│ GEO            ┆ 1439.041087    │
│ GEO            ┆ 1426.020318    │
│ GEO            ┆ 1430.553113    │
│ GEO            ┆ 1451.192889    │
│ GEO            ┆ 1431.855712    │
└────────────────┴────────────────┘

Simulate missing values for columns that are missing not-at-random. Without conditions a draw may be null; conditioning on Class_of_Orbit being GEO yields values

>>> engine.simulate(["longitude_radians_of_geo"], n=5)
shape: (5, 1)
┌──────────────────────────┐
│ longitude_radians_of_geo │
│ ---                      │
│ f64                      │
╞══════════════════════════╡
│ null                     │
│ null                     │
│ null                     │
│ null                     │
│ null                     │
└──────────────────────────┘
>>> engine.simulate(
...     ["longitude_radians_of_geo"],
...     given={"Class_of_Orbit": "GEO"},
...     n=5,
... )
shape: (5, 1)
┌──────────────────────────┐
│ longitude_radians_of_geo │
│ ---                      │
│ f64                      │
╞══════════════════════════╡
│ 0.396442                 │
│ 0.794023                 │
│ 0.643669                 │
│ -0.005531                │
│ 1.827976                 │
└──────────────────────────┘
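
The contrast between the two calls above can be illustrated with a toy resampler. This is only a sketch in plain Python and not how lace draws samples: the rows, their values, and the helper draw_longitudes are all hypothetical, and the sketch assumes that the longitude is recorded only for GEO satellites.

```python
import random

# Hypothetical rows: longitude is recorded only for GEO satellites.
ROWS = [
    {"Class_of_Orbit": "LEO", "longitude_radians_of_geo": None},
    {"Class_of_Orbit": "LEO", "longitude_radians_of_geo": None},
    {"Class_of_Orbit": "MEO", "longitude_radians_of_geo": None},
    {"Class_of_Orbit": "GEO", "longitude_radians_of_geo": 0.39},
    {"Class_of_Orbit": "GEO", "longitude_radians_of_geo": -1.27},
]


def draw_longitudes(rows, n, orbit=None, seed=None):
    """Resample n longitudes, optionally restricted to one orbit class."""
    rng = random.Random(seed)
    pool = [r for r in rows if orbit is None or r["Class_of_Orbit"] == orbit]
    return [rng.choice(pool)["longitude_radians_of_geo"] for _ in range(n)]


unconditioned = draw_longitudes(ROWS, n=1000, seed=0)
conditioned = draw_longitudes(ROWS, n=1000, orbit="GEO", seed=0)

# Missingness depends on the orbit class, so it disappears once we condition.
null_share = sum(v is None for v in unconditioned) / len(unconditioned)
print(f"unconditioned null share: {null_share:.2f}")
print("conditioned nulls:", sum(v is None for v in conditioned))
```

In the toy data, whether a value is missing is determined by Class_of_Orbit, which is the sense in which the column is missing not-at-random.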

If we simulate using given conditions, we can include the conditions in the output using include_given=True.

>>> engine.simulate(
...     ["Period_minutes"],
...     given={"Purpose": "Communications", "Class_of_Orbit": "GEO"},
...     n=5,
...     include_given=True,
... )
shape: (5, 3)
┌────────────────┬────────────────┬────────────────┐
│ Period_minutes ┆ Purpose        ┆ Class_of_Orbit │
│ ---            ┆ ---            ┆ ---            │
│ f64            ┆ str            ┆ str            │
╞════════════════╪════════════════╪════════════════╡
│ 1436.038447    ┆ Communications ┆ GEO            │
│ 1447.908161    ┆ Communications ┆ GEO            │
│ 1452.635331    ┆ Communications ┆ GEO            │
│ 1443.983013    ┆ Communications ┆ GEO            │
│ 1437.544045    ┆ Communications ┆ GEO            │
└────────────────┴────────────────┴────────────────┘
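
To make the roles of cols, given, n, and include_given concrete, here is a minimal sketch in plain Python. It is not lace's implementation: toy_simulate is a hypothetical helper that resamples rows from a small made-up table, whereas Engine.simulate draws from a fitted model and returns a polars.DataFrame. Only the shape of the result is meant to carry over.

```python
import random

# Made-up table standing in for a learned joint distribution (hypothetical values).
ROWS = [
    {"Class_of_Orbit": "GEO", "Purpose": "Communications", "Period_minutes": 1436.1},
    {"Class_of_Orbit": "GEO", "Purpose": "Communications", "Period_minutes": 1435.9},
    {"Class_of_Orbit": "LEO", "Purpose": "Earth Observation", "Period_minutes": 96.7},
    {"Class_of_Orbit": "LEO", "Purpose": "Communications", "Period_minutes": 100.4},
    {"Class_of_Orbit": "MEO", "Purpose": "Navigation", "Period_minutes": 717.9},
]


def toy_simulate(rows, cols, given=None, n=1, include_given=False, seed=None):
    """Resample rows that satisfy `given` and project them onto `cols`."""
    given = given or {}
    rng = random.Random(seed)
    # Keep only the rows consistent with every column -> value condition.
    matching = [r for r in rows if all(r[k] == v for k, v in given.items())]
    if not matching:
        raise ValueError("no rows satisfy the given conditions")
    # Target columns come first; conditioning columns are appended on request.
    out_cols = list(cols) + (list(given) if include_given else [])
    return [
        {c: row[c] for c in out_cols}
        for row in (rng.choice(matching) for _ in range(n))
    ]


draws = toy_simulate(
    ROWS,
    ["Period_minutes"],
    given={"Purpose": "Communications", "Class_of_Orbit": "GEO"},
    n=5,
    include_given=True,
    seed=0,
)
for row in draws:
    print(row)
```

Each of the five rows holds the target column first, then the conditioning columns with their fixed values, mirroring the column order in the include_given=True output above.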