lace.engine.Engine.depprob
- Engine.depprob(col_pairs: list)
Compute the dependence probability between pairs of columns.
The dependence probability between columns X and Y is the probability that a dependence path exists between them. If X is predictive of Y (or the reverse), the dependence probability will be closer to 1.
The dependence probability between two columns is defined as the proportion of lace states in which those two columns belong to the same view.
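The definition above can be sketched in a few lines of plain Python. This is only an illustration of the proportion being described, not lace's implementation: the `states` list, the column names `x`, `y`, `z`, and the helper `depprob_from_assignments` are all hypothetical, and the view assignments are invented for the example.

```python
def depprob_from_assignments(states, col_a, col_b):
    """Return the fraction of states in which col_a and col_b share a view.

    `states` is a hypothetical stand-in for a set of lace states: one dict
    per state, mapping each column name to the index of its view.
    """
    same_view = sum(1 for state in states if state[col_a] == state[col_b])
    return same_view / len(states)


# Invented view assignments for four states (not taken from any real engine).
states = [
    {"x": 0, "y": 0, "z": 1},
    {"x": 0, "y": 0, "z": 0},
    {"x": 1, "y": 1, "z": 0},
    {"x": 0, "y": 1, "z": 2},
]

dp_xy = depprob_from_assignments(states, "x", "y")  # same view in 3 of 4 states
dp_xz = depprob_from_assignments(states, "x", "z")  # same view in 1 of 4 states
print(dp_xy, dp_xz)
```

With only a handful of states the estimate is coarse, since it can only take values that are multiples of one over the number of states.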
- Parameters:
col_pairs (list((column index, column index))) – A list of pairs of columns for which to compute dependence probability
- Returns:
Contains an entry for each pair in col_pairs. If col_pairs contains a single entry, a float will be returned.
- Return type:
float, polars.Series
Notes
Note that high dependence probability does not always indicate that two variables are mutually predictive. For example, in the model
X ~ Normal(0, 1)
Y ~ Normal(0, 1)
Z ~ X + Y
X and Y are completely independent of each other, but X and Y are predictable through Z. If you know X and Z, you know Y. In this case X and Y will have a high dependence probability because of their shared relationship with Z.
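A small simulation can make the X, Y, Z example concrete. The sketch below uses only the Python standard library and does not call lace; the `pearson` helper, the seed, and the sample size are choices made for illustration. It checks that X and Y are essentially uncorrelated, that X and Z are correlated, and that Y can be recovered from X and Z.

```python
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Draw from the model in the note: X and Y independent, Z = X + Y.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + yi for xi, yi in zip(x, y)]

corr_xy = pearson(x, y)  # close to 0: X alone says nothing about Y
corr_xz = pearson(x, z)  # clearly positive: X is predictive of Z

# Knowing X and Z determines Y, up to floating-point rounding.
y_recovered = [zi - xi for xi, zi in zip(x, z)]
max_error = max(abs(r - yi) for r, yi in zip(y_recovered, y))

print(f"corr(X, Y) = {corr_xy:.3f}")
print(f"corr(X, Z) = {corr_xz:.3f}")
print(f"max |Y - (Z - X)| = {max_error:.2e}")
```

Pairwise correlation between X and Y is near zero here, which is the sense in which the two are not mutually predictive even though a dependence path runs between them through Z.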
If you are only interested in the magnitude of predictive power between two variables, use mutual information via the mi function.
Examples
A single pair as input gets you a float output
>>> from lace.examples import Animals
>>> engine = Animals()
>>> engine.depprob([("swims", "flippers")])
1.0
Multiple pairs as input get you a polars Series
>>> engine.depprob(
...     [
...         ("swims", "flippers"),
...         ("fast", "tail"),
...     ]
... )
shape: (2,)
Series: 'depprob' [f64]
[
    1.0
    0.625
]