lace.engine.Engine.rowsim
- Engine.rowsim(row_pairs: list, wrt: list | None = None, col_weighted: bool = False)
Compute the row similarity between pairs of rows.
Row similarity (or relevance) takes on continuous values in [0, 1] and is a measure of how similar two rows are with respect to how their values are modeled. This is distinct from distance-based measures in that it looks entirely in model space. This has a number of advantages such as scaling independent of the data (or even the data types) and complete disregard for missing values (all cells, missing or occupied, are assigned to a category).
The row similarity between two rows, A and B, is defined as the mean proportion of views in which the two rows are assigned to the same category.
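The definition above can be illustrated with a minimal pure-Python sketch. It does not use Lace. It assumes a hypothetical representation in which the model is a list of states, each state is a list of views, and each view maps a row name to its category index. The name `rowsim_sketch` and the toy assignments are invented for illustration only.

```python
def rowsim_sketch(states, row_a, row_b):
    """Mean, over states, of the fraction of views in which both rows
    share a category. `states` is a hypothetical stand-in for the model."""
    per_state = []
    for views in states:
        # Count the views where the two rows land in the same category.
        same = sum(1 for view in views if view[row_a] == view[row_b])
        per_state.append(same / len(views))
    return sum(per_state) / len(per_state)


# Toy category assignments (made up, not taken from the Animals example).
toy_states = [
    # State 0 has two views.
    [
        {"beaver": 0, "otter": 0, "wolf": 1},
        {"beaver": 1, "otter": 0, "wolf": 1},
    ],
    # State 1 has a single view.
    [
        {"beaver": 0, "otter": 0, "wolf": 0},
    ],
]

sim_beaver_otter = rowsim_sketch(toy_states, "beaver", "otter")
sim_otter_wolf = rowsim_sketch(toy_states, "otter", "wolf")
```

Because only category membership is compared, the values in the cells never enter the computation, which is why the measure ignores scale, data type, and missingness.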
- Parameters:
row_pairs (List[(row index, row index)]) – A list of row pairs for which to compute row similarity
wrt (List[column index], optional) – An optional list of column indices to provide context. If columns are provided via wrt, only views containing these columns will be considered in the row similarity computation. If None (default), all views are considered.
col_weighted (bool) – If True, row similarity will compute the proportion of relevant columns, instead of views, in which the two rows are in the same category.
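To make the effect of `wrt` and `col_weighted` concrete, here is a hedged pure-Python sketch that does not call Lace. It assumes a hypothetical layout where each view records the columns it contains and a category per row. It also assumes that "relevant columns" means the `wrt` columns inside a view, or all of the view's columns when `wrt` is None. The function name and the toy data are illustrative, not part of the library.

```python
def rowsim_wrt_sketch(states, row_a, row_b, wrt=None, col_weighted=False):
    """Sketch of view filtering (wrt) and column weighting (col_weighted)."""
    per_state = []
    for views in states:
        matched = 0
        total = 0
        for view in views:
            cols = view["columns"]
            # Keep only the columns of interest; all columns if wrt is None.
            relevant = cols if wrt is None else [c for c in cols if c in wrt]
            if not relevant:
                continue  # this view contains none of the wrt columns
            # A view counts once, or once per relevant column if weighted.
            weight = len(relevant) if col_weighted else 1
            total += weight
            if view["categories"][row_a] == view["categories"][row_b]:
                matched += weight
        if total == 0:
            raise ValueError("no view contains any of the wrt columns")
        per_state.append(matched / total)
    return sum(per_state) / len(per_state)


# Toy model with two states (made-up assignments).
toy_states = [
    [
        {"columns": ["swims", "water", "fierce"],
         "categories": {"beaver": 0, "polar+bear": 0}},
        {"columns": ["furry"],
         "categories": {"beaver": 0, "polar+bear": 1}},
    ],
    [
        {"columns": ["swims", "water"],
         "categories": {"beaver": 0, "polar+bear": 0}},
        {"columns": ["fierce", "furry"],
         "categories": {"beaver": 0, "polar+bear": 1}},
    ],
]

pair = ("beaver", "polar+bear")
by_view = rowsim_wrt_sketch(toy_states, *pair)
by_column = rowsim_wrt_sketch(toy_states, *pair, col_weighted=True)
swim_only = rowsim_wrt_sketch(toy_states, *pair, wrt=["swims"])
```

In this toy model the pair agrees in one of two views in each state, so the view-based score is 0.5. Weighting by columns raises it because the agreeing view in the first state holds three of the four columns. Restricting to `swims` keeps only the views in which the pair agrees.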
- Returns:
Contains an entry for each pair in row_pairs. If row_pairs contains a single entry, a float will be returned.
- Return type:
float, polars.Series
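Since the return type depends on how many pairs are passed, calling code may want to normalize the result before iterating over it. The helper below is a small sketch of one way to do that. The name `rowsim_values` is hypothetical, and the helper relies only on the result being either a real number or an iterable of numbers (a polars Series can be iterated).

```python
from numbers import Real


def rowsim_values(result):
    """Return row similarities as a list of floats, whether `result`
    is a single float or an iterable such as a polars Series."""
    if isinstance(result, Real):
        return [float(result)]
    return [float(value) for value in result]


# Stand-ins for the two documented return shapes.
single = rowsim_values(0.875)
several = rowsim_values([0.629315, 0.772545])
```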
Examples
How similar are a beaver and a polar bear?
>>> from lace.examples import Animals
>>> animals = Animals()
>>> animals.rowsim([("beaver", "polar+bear")])
0.6059523809523808
What about if we weight similarity by columns and not the standard views?
>>> animals.rowsim([("beaver", "polar+bear")], col_weighted=True)
0.5698529411764706
Not much change. How similar are they with respect to how we model their swimming?
>>> animals.rowsim([("beaver", "polar+bear")], wrt=["swims"])
0.875
Very similar. But will all animals that swim be highly similar with respect to their swimming?
>>> animals.rowsim([("otter", "polar+bear")], wrt=["swims"])
0.375
Lace predicts an otter’s swimming for different reasons than a polar bear’s.
What is a Chihuahua more similar to, a wolf or a rat?
>>> from lace.examples import Animals
>>> engine = Animals()
>>> engine.rowsim(
...     [
...         ("chihuahua", "wolf"),
...         ("chihuahua", "rat"),
...     ]
... )
shape: (2,)
Series: 'rowsim' [f64]
[
    0.629315
    0.772545
]