lace.engine.Engine.depprob

Engine.depprob(col_pairs: list)

Compute the dependence probability between pairs of columns.

The dependence probability between columns X and Y is the probability that a dependence path exists between them. If X is predictive of Y (or the reverse), the dependence probability will be closer to 1.

The dependence probability between two columns is defined as the proportion of lace states in which those two columns belong to the same view.
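
As a rough illustration of this definition, the proportion can be computed by hand from per-state column-to-view assignments. This is a minimal sketch, not lace's internal implementation: the `states` data and the `depprob_from_assignments` helper are hypothetical, and each state is assumed to be a simple mapping from column name to view index.

```python
def depprob_from_assignments(assignments, col_a, col_b):
    """Return the fraction of states in which col_a and col_b share a view."""
    same_view = sum(1 for state in assignments if state[col_a] == state[col_b])
    return same_view / len(assignments)


# Hypothetical column-to-view assignments for four states (not real lace output).
states = [
    {"swims": 0, "flippers": 0, "fast": 1},
    {"swims": 0, "flippers": 0, "fast": 0},
    {"swims": 1, "flippers": 1, "fast": 0},
    {"swims": 2, "flippers": 2, "fast": 1},
]

# "swims" and "flippers" share a view in every state.
print(depprob_from_assignments(states, "swims", "flippers"))  # 1.0
# "swims" and "fast" share a view in only one of the four states.
print(depprob_from_assignments(states, "swims", "fast"))  # 0.25
```

View indices only need to be comparable within a single state, so the same label in two different states carries no shared meaning.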

Parameters:

col_pairs (list((column index, column index))) – A list of pairs of columns for which to compute dependence probability

Returns:

Contains an entry for each pair in col_pairs. If col_pairs contains a single entry, a float is returned.

Return type:

float, polars.Series

Notes

Note that a high dependence probability does not always indicate that two variables are mutually predictive. For example, in the model

X ~ Normal(0, 1)
Y ~ Normal(0, 1)
Z ~ X + Y

X and Y are completely independent of each other, but X and Y are predictable through Z. If you know X and Z, you know Y. In this case, X and Y will have a high dependence probability because of their shared relationship with Z.
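
The situation described here can be checked with a small simulation that does not use lace at all. The sketch below assumes Z is exactly X + Y with no added noise; the `pearson` helper and the sample size are illustrative choices, not part of the library.

```python
import math
import random

random.seed(0)
n = 10_000

# Draw X and Y independently, then build Z from both.
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]
zs = [x + y for x, y in zip(xs, ys)]


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# X alone says almost nothing about Y: the correlation is close to 0.
print(pearson(xs, ys))

# Knowing X and Z together recovers Y (up to floating-point rounding).
max_error = max(abs((z - x) - y) for x, y, z in zip(xs, ys, zs))
print(max_error)
```

X and Y are linked only through Z, which is the kind of dependence path that dependence probability reflects.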

If you are only interested in the magnitude of predictive power between two variables, use mutual information via the mi function.

See also

mi

Examples

A single pair as input gets you a float output

>>> from lace.examples import Animals
>>> engine = Animals()
>>> engine.depprob([("swims", "flippers")])
1.0

Multiple pairs as input get you a polars Series

>>> engine.depprob(
...     [
...         ("swims", "flippers"),
...         ("fast", "tail"),
...     ]
... )
shape: (2,)
Series: 'depprob' [f64]
[
    1.0
    0.625
]