lace.Engine.rowsim

Engine.rowsim(row_pairs: list, wrt: list | None = None, col_weighted: bool = False)

Compute the row similarity between pairs of rows.

Row similarity (or relevance) takes continuous values in [0, 1] and measures how similar two rows are with respect to how their values are modeled. Unlike distance-based measures, it operates entirely in model space. This has several advantages: it does not depend on the scale of the data (or even on the data types), and it is unaffected by missing values, because every cell, missing or occupied, is assigned to a category.

The row similarity between two rows, A and B, is defined as the proportion of views in which the two rows are assigned to the same category, averaged over the engine's states.
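To make the default (view-weighted) definition concrete, here is a minimal pure-Python sketch. It does not call lace; it assumes a hypothetical representation in which each state is a list of views and each view is a dict mapping a row name to its category index. The names `State` and `rowsim_views` are illustrative only.

```python
from statistics import mean

# Hypothetical representation (not lace's internal API): a state is a list of
# views, and each view maps a row name to the category it is assigned to.
State = list[dict[str, int]]


def rowsim_views(states: list[State], a: str, b: str) -> float:
    """Mean over states of the proportion of views in which rows a and b
    are assigned to the same category."""
    per_state = []
    for views in states:
        same = sum(1 for asgn in views if asgn[a] == asgn[b])
        per_state.append(same / len(views))
    return mean(per_state)


# State 1: two views, the rows share a category in one of them -> 0.5
# State 2: one view, the rows share a category -> 1.0
states = [
    [{"a": 0, "b": 0}, {"a": 1, "b": 0}],
    [{"a": 2, "b": 2}],
]
print(rowsim_views(states, "a", "b"))  # 0.75
```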

Parameters:
  • row_pairs (List[(row index, row index)]) – A list of row pairs for which to compute row similarity.

  • wrt (List[column index], optional) – An optional list of column indices to provide context. If columns are provided via wrt, only views containing these columns will be considered in the row similarity computation. If None (default), all views are considered.

  • col_weighted (bool, optional) – If True, row similarity is computed as the proportion of relevant columns, instead of views, in which the two rows are in the same category. Defaults to False.
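The effect of `wrt` and `col_weighted` can be sketched in the same way. This is a hypothetical illustration, not lace's implementation: the `View` class and `sketch_rowsim` function are invented for the example, and the sketch assumes that with `col_weighted=True` each retained view is weighted by the number of columns it contains.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class View:
    # Hypothetical stand-in for a view: the columns it models and the
    # category assigned to each row within it.
    columns: set[str]
    asgn: dict[str, int]


def _state_sim(views, a, b, wrt=None, col_weighted=False):
    # With wrt, keep only views that model at least one of the wrt columns.
    # (Assumes every wrt column belongs to some view, so views is non-empty.)
    if wrt is not None:
        views = [v for v in views if v.columns & set(wrt)]
    # Assumed weighting: 1 per view, or the view's column count if col_weighted.
    weights = [len(v.columns) if col_weighted else 1 for v in views]
    same = sum(w for v, w in zip(views, weights) if v.asgn[a] == v.asgn[b])
    return same / sum(weights)


def sketch_rowsim(states, a, b, wrt=None, col_weighted=False):
    # Average the per-state similarity over all states.
    return mean(_state_sim(s, a, b, wrt, col_weighted) for s in states)


state = [
    View({"swims", "flippers", "water"}, {"a": 0, "b": 0}),  # same category
    View({"fierce"}, {"a": 1, "b": 2}),                      # different
]
print(sketch_rowsim([state], "a", "b"))                     # 0.5
print(sketch_rowsim([state], "a", "b", col_weighted=True))  # 0.75
print(sketch_rowsim([state], "a", "b", wrt=["fierce"]))     # 0.0
```

Because the three-column view agrees and the one-column view does not, weighting by columns raises the score from 1/2 to 3/4, while restricting to the view containing `fierce` drops it to 0.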

Returns:

A Series containing one entry for each pair in row_pairs. If row_pairs contains a single pair, a float is returned instead.

Return type:

float, polars.Series

Examples

How similar are a beaver and a polar bear?

>>> from lace.examples import Animals
>>> animals = Animals()
>>> animals.rowsim([("beaver", "polar+bear")])
0.6059523809523808

What if we weight similarity by columns instead of by views (the default)?

>>> animals.rowsim([("beaver", "polar+bear")], col_weighted=True)
0.5698529411764706

Not much change. How similar are they with respect to how we model their swimming?

>>> animals.rowsim([("beaver", "polar+bear")], wrt=["swims"])
0.875

Very similar. But will all animals that swim be highly similar with respect to their swimming?

>>> animals.rowsim([("otter", "polar+bear")], wrt=["swims"])
0.375

Lace predicts an otter’s swimming for different reasons than a polar bear’s.

What is a Chihuahua more similar to, a wolf or a rat?

>>> from lace.examples import Animals
>>> engine = Animals()
>>> engine.rowsim(
...     [
...         ("chihuahua", "wolf"),
...         ("chihuahua", "rat"),
...     ]
... )
shape: (2,)
Series: 'rowsim' [f64]
[
    0.629315
    0.772545
]