lace.Engine.entropy

Engine.entropy(cols, n_mc_samples: int = 1000)

Estimate the entropy or joint entropy of one or more features.

Parameters:
  • cols (column indices) – The columns for which to compute entropy.

  • n_mc_samples (int) – The number of samples to use for Monte Carlo integration in cases where Monte Carlo integration is required.

Returns:

h – The entropy, H(cols).

Return type:

float

Notes

  • Entropy behaves differently for continuous variables. Continuous, or differential, entropy can be negative. The same holds for joint entropies involving one or more continuous features; see the sketch below.

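As a minimal sketch of a possibly negative differential entropy, assuming the Satellites example dataset ships in lace.examples and has a continuous Period_minutes column (both names are assumptions here, not taken from this page):

>>> from lace.examples import Satellites
>>> sats = Satellites()
>>> h_period = sats.entropy(["Period_minutes"])  # differential entropy; may be < 0
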
Examples

Single feature entropy

>>> from lace.examples import Animals
>>> animals = Animals()
>>> animals.entropy(["slow"])
0.6755931727528786
>>> animals.entropy(["water"])
0.49836129824622094

Joint entropy

>>> animals.entropy(["swims", "fast"])
0.9552642751735604
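
The same call accepts any number of columns. A sketch, reusing column names that appear elsewhere on this page:

>>> h_three = animals.entropy(["swims", "fast", "flippers"])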

We can use entropies to compute mutual information, I(X, Y) = H(X) + H(Y) - H(X, Y).

For example, there is not a lot of shared information between whether an animal swims and whether it is fast. These features are not predictive of each other.

>>> h_swims = animals.entropy(["swims"])
>>> h_fast = animals.entropy(["fast"])
>>> h_swims_and_fast = animals.entropy(["swims", "fast"])
>>> h_swims + h_fast - h_swims_and_fast
3.510013543328583e-05
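
Since mutual information here is just three entropy calls, the identity can be wrapped in a small helper. mutual_info is a hypothetical convenience function written for this page, not part of the lace API:

>>> def mutual_info(engine, x, y, n_mc_samples=1000):
...     """I(X, Y) = H(X) + H(Y) - H(X, Y) via three entropy calls."""
...     h_x = engine.entropy([x], n_mc_samples=n_mc_samples)
...     h_y = engine.entropy([y], n_mc_samples=n_mc_samples)
...     h_xy = engine.entropy([x, y], n_mc_samples=n_mc_samples)
...     return h_x + h_y - h_xy
>>> mi = mutual_info(animals, "swims", "fast")  # same quantity computed above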

But swimming and having flippers are mutually predictive, so we should see more mutual information.

>>> h_flippers = animals.entropy(["flippers"])
>>> h_swims_and_flippers = animals.entropy(["swims", "flippers"])
>>> h_swims + h_flippers - h_swims_and_flippers
0.19361180218629537