lace.Engine.from_df

classmethod Engine.from_df(df: pandas.DataFrame | polars.DataFrame, codebook: CodebookBuilder | PathLike | str | Codebook | None = None, n_states: int = 8, id_offset: int = 0, rng_seed: int | None = None, flat_columns: bool = False) → Engine

Create a new Engine from a DataFrame.

Parameters:
  • df (pd.DataFrame or pl.DataFrame) – DataFrame containing the data to model.

  • codebook (CodebookBuilder or PathLike or str or Codebook, optional) – A codebook, a path to a codebook file, or a codebook builder that can load a codebook from file or generate one from the data. See CodebookBuilder.

  • n_states (int, optional) – The number of states (independent Markov chains).

  • id_offset (int, optional) – An offset for renaming states in the metadata. Used when training a single engine on multiple machines. To split an 8-state Engine run across two machines, run a 4-state Engine on the first machine, then a 4-state Engine on the second machine with id_offset=4. The states in the two metadata files can then be merged by copying, without name collisions.

  • rng_seed (int, optional) – Random number generator seed.

  • flat_columns (bool) – Initialize all states with one view. Use when you do not want to do inference over the assignment of columns to views. Note that to keep the states flat you will have to either use the flat transition set or manually create a transition set that does not update the column assignments when updating.

Examples

Create a new Engine from a DataFrame

>>> from lace import Engine
>>> import polars as pl
>>> df = pl.DataFrame({
...    "ID": [1, 2, 3, 4],
...    "list_b": [2.0, 4.0, 6.0, 8.0],
... })
>>> engine = Engine.from_df(df)

Create a new Engine with specific codebook inference rules

>>> from lace import Engine, CodebookBuilder
>>> import polars as pl
>>> df = pl.DataFrame({
...    "ID": [1, 2, 3, 4],
...    "list_b": [2.0, 4.0, 6.0, 8.0],
... })
>>> engine = Engine.from_df(df, codebook=CodebookBuilder.infer(
...    cat_cutoff=2,
... ))

Create an engine with flat column structure (one view)

>>> from lace.examples import Animals
>>> df = Animals().df
>>> n_states = 8
>>> engine = Engine.from_df(df, n_states=n_states, flat_columns=True)
>>> [max(engine.column_assignment(i)) for i in range(n_states)]
[0, 0, 0, 0, 0, 0, 0, 0]
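
Plan n_states and id_offset for a run split across machines

The id_offset description suggests giving each machine its own contiguous block of state IDs. The sketch below shows one way to compute the per-machine arguments. It assumes state IDs run consecutively from id_offset, which is inferred from the 4 + 4 example in the parameter description rather than confirmed. split_states is a hypothetical helper and not part of lace.

```python
def split_states(total_states: int, n_machines: int) -> list[dict]:
    """Return one {"n_states", "id_offset"} dict per machine.

    Hypothetical helper, not part of lace. Assumes state IDs are
    numbered consecutively starting at id_offset.
    """
    if n_machines < 1 or total_states < n_machines:
        raise ValueError("need at least one state per machine")

    # Spread states as evenly as possible; earlier machines take the remainder.
    base, extra = divmod(total_states, n_machines)

    plans = []
    offset = 0
    for machine in range(n_machines):
        n_states = base + (1 if machine < extra else 0)
        plans.append({"n_states": n_states, "id_offset": offset})
        offset += n_states  # next machine starts where this one ends
    return plans


plans = split_states(8, 2)
print(plans)
# [{'n_states': 4, 'id_offset': 0}, {'n_states': 4, 'id_offset': 4}]
```

On machine k, the k-th plan could then be passed through as keyword arguments, for example Engine.from_df(df, **plans[k]). Because the ID ranges do not overlap, the saved states can be merged by copying without name collisions, as the id_offset description states.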