API Reference#

Index#

class usearch.index.BatchMatches(keys: ndarray, distances: ndarray, counts: ndarray, visited_members: int = 0, computed_distances: int = 0)#

This class contains information about multiple retrieved vectors for multiple queries, i.e. it is a set of Matches instances.

computed_distances: int = 0#
count_matches(expected: ndarray, count: int | None = None) int#

Measures recall in the [0, len(expected)] range as the number of Matches that contain the corresponding expected entry anywhere among the results.

counts: ndarray#
distances: ndarray#
keys: ndarray#
mean_recall(expected: ndarray, count: int | None = None) float#

Measures recall in the [0, 1] range as the share of Matches that contain the corresponding expected entry anywhere among the results.

to_list() List[List[tuple]]#

Convert the result for each query to the list of tuples with information about its matches.

visited_members: int = 0#
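
A minimal sketch of evaluating a batch search with these helpers, assuming a small random dataset where every stored vector is also used as a query:

    import numpy as np
    from usearch.index import Index

    ndim, count = 96, 1_000
    keys = np.arange(count, dtype=np.uint64)
    vectors = np.random.rand(count, ndim).astype(np.float32)

    index = Index(ndim=ndim)
    index.add(keys, vectors)

    # Searching with a matrix of queries yields a BatchMatches
    batch_matches = index.search(vectors, 10)
    print(batch_matches.mean_recall(keys))    # share of queries whose own key was found
    print(batch_matches.count_matches(keys))  # absolute number of such queries
    print(batch_matches.to_list()[0])         # first query's matches as (key, distance) tuples
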
class usearch.index.Clustering(index: 'Index', matches: 'BatchMatches', queries: 'Optional[np.ndarray]' = None)#
property centroids_popularity: Tuple[ndarray, ndarray]#
members_of(centroid: uint64) ndarray#
property network#
plot_centroids_popularity()#
subcluster(centroid: uint64, **clustering_kwards) Clustering#
class usearch.index.CompiledMetric(pointer, kind, signature)#
kind: MetricKind#

Alias for field number 1

pointer: int#

Alias for field number 0

signature: MetricSignature#

Alias for field number 2
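
A sketch of building a custom metric with Numba and wrapping it in a CompiledMetric; it assumes the optional numba dependency is installed and that ndim matches the index it is attached to:

    from numba import cfunc, types, carray
    from usearch.index import Index, CompiledMetric, MetricKind, MetricSignature

    ndim = 256
    signature = types.float32(
        types.CPointer(types.float32),
        types.CPointer(types.float32))

    # Distance callback receiving raw pointers to two f32 vectors of known length
    @cfunc(signature)
    def inner_product_distance(a, b):
        a_array = carray(a, ndim)
        b_array = carray(b, ndim)
        c = 0.0
        for i in range(ndim):
            c += a_array[i] * b_array[i]
        return 1 - c

    metric = CompiledMetric(
        pointer=inner_product_distance.address,
        kind=MetricKind.IP,
        signature=MetricSignature.ArrayArray,
    )
    index = Index(ndim=ndim, metric=metric)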

class usearch.index.Index(*, ndim: int = 0, metric: str | ~usearch.compiled.MetricKind | ~usearch.index.CompiledMetric = <MetricKind.Cos: 99>, dtype: str | ~usearch.compiled.ScalarKind | None = None, connectivity: int | None = None, expansion_add: int | None = None, expansion_search: int | None = None, multi: bool = False, path: ~os.PathLike | None = None, view: bool = False, enable_key_lookups: bool = True)#

Fast vector-search engine for dense equi-dimensional embeddings.

Vector keys must be integers. Vectors must have the same number of dimensions within the index. Supports Inner Product, Cosine Distance, L^n measures like the Euclidean metric, as well as automatic downcasting to low-precision floating-point and integral representations.
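
A minimal construction sketch; the parameter values below are illustrative rather than defaults:

    from usearch.index import Index

    index = Index(
        ndim=256,             # dimensions per vector
        metric='cos',         # 'ip', 'l2sq', 'haversine', ... or a CompiledMetric
        dtype='f32',          # downcast storage to 'f16' or 'i8' if desired
        connectivity=16,      # optional: graph connectivity
        expansion_add=128,    # optional: indexing-time expansion factor
        expansion_search=64,  # optional: search-time expansion factor
    )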

add(keys: uint64 | Iterable[uint64] | int | Iterable[int] | ndarray | memoryview, vectors: ndarray | Iterable[ndarray] | memoryview, *, copy: bool = True, threads: int = 0, log: str | bool = False, progress: Callable[[int, int], bool] | None = None) int | ndarray#

Inserts one or more vectors into the index.

For maximal performance, the keys and vectors should conform to Python’s “buffer protocol” spec.

To index a single entry:

keys: int, vectors: np.ndarray.

To index many entries:

keys: np.ndarray, vectors: np.ndarray.

When working with extremely large indexes, you may want to pass copy=False, if you can guarantee the lifetime of the primary vector store for the duration of index construction.

Parameters:
  • keys (Optional[KeyOrKeysLike], can be None) – Unique identifier(s) for passed vectors

  • vectors (VectorOrVectorsLike) – Vector or a row-major matrix

  • copy (bool, defaults to True) – Should the index store a copy of vectors

  • threads (int, defaults to 0) – Optimal number of cores to use

  • log (Union[str, bool], defaults to False) – Whether to print the progress bar

  • progress (Optional[ProgressCallback], defaults to None) – Callback to report stats of the progress and control it

Returns:

Inserted key or keys

Type:

Union[int, np.ndarray]
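
A sketch of both calling conventions, continuing the 256-dimensional index constructed above:

    import numpy as np

    # Single entry: one integer key, one vector
    index.add(42, np.random.rand(256).astype(np.float32))

    # Many entries: one key per row of a row-major matrix
    keys = np.arange(1_000, 2_000, dtype=np.uint64)
    vectors = np.random.rand(1_000, 256).astype(np.float32)
    index.add(keys, vectors, copy=True, threads=0)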

property capacity: int#
clear()#

Erases all the vectors from the index, preserving the space for future insertions.

cluster(*, vectors: ndarray | None = None, keys: ndarray | None = None, min_count: int | None = None, max_count: int | None = None, threads: int = 0, log: str | bool = False, progress: Callable[[int, int], bool] | None = None) Clustering#

Clusters already indexed or provided vectors, mapping them to various centroids.

Parameters:
  • vectors (Optional[VectorOrVectorsLike]) – Externally provided vectors to cluster; if None, the already indexed vectors are clustered

  • max_count (Optional[int], defaults to None) – Upper bound on the number of clusters to produce

  • threads (int, defaults to 0) – Optimal number of cores to use

  • log (Union[str, bool], defaults to False) – Whether to print the progress bar

  • progress (Optional[ProgressCallback], defaults to None) – Callback to report stats of the progress and control it

Returns:

Clustering of the selected or indexed vectors

Return type:

Clustering
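
A sketch of clustering the members indexed above; the min_count and max_count values are illustrative:

    clustering = index.cluster(min_count=10, max_count=15)

    # Keys of the centroids and the number of members assigned to each
    centroid_keys, sizes = clustering.centroids_popularity

    # Keys of all members assigned to one of the centroids
    members = clustering.members_of(centroid_keys[0])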

property connectivity: int#
contains(keys: uint64 | Iterable[uint64] | int | Iterable[int] | ndarray | memoryview) bool | ndarray#
copy() Index#
count(keys: uint64 | Iterable[uint64] | int | Iterable[int] | ndarray | memoryview) int | ndarray#
property dtype: ScalarKind#
property expansion_add: int#
get(keys: uint64 | Iterable[uint64] | int | Iterable[int] | ndarray | memoryview, dtype: str | ScalarKind | None = None) ndarray | None | Tuple[ndarray | None]#

Looks up one or more keys from the Index, retrieving corresponding vectors.

Returns None, if a single key is requested and is not present. Returns a (row) vector, if the key maps to a single vector. Returns a (row-major) matrix, if the key maps to multiple vectors. If multiple keys are requested, composes many such responses into a tuple.

Parameters:

keys (KeyOrKeysLike) – One or more keys to lookup

Returns:

One or more keys lookup results

Return type:

Union[Optional[np.ndarray], Tuple[Optional[np.ndarray]]]
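
A short sketch, continuing the index populated above:

    import numpy as np

    vector = index.get(42)  # np.ndarray, or None if the key is absent
    several = index.get(np.array([42, 1_042, 1_043], dtype=np.uint64))  # tuple of arrays and/or None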

property hardware_acceleration: str#

Describes the kind of hardware-acceleration support used in that exact instance of the Index, for that metric kind, and the given number of dimensions.

Returns:

“auto”, if nothing is available; the ISA subset name otherwise

Return type:

str

property jit: bool#

True, if the provided metric was JIT-compiled.

Return type:

bool

join(other: Index, max_proposals: int = 0, exact: bool = False, progress: Callable[[int, int], bool] | None = None) Dict[uint64, uint64]#

Performs a “Semantic Join”, or pairwise matching, between self and the other index. Unlike search, no collisions are allowed in the resulting pairs. Uses the concept of “Stable Marriage” from combinatorics, famous for the 2012 Nobel Memorial Prize in Economic Sciences.

Parameters:
  • other (Index) – Another index.

  • max_proposals (int, optional) – Limit on candidates evaluated per vector, defaults to 0

  • exact (bool, optional) – Controls if underlying search should be exact, defaults to False

  • progress (Optional[ProgressCallback], defaults to None) – Callback to report stats of the progress and control it

Returns:

Mapping from keys of self to keys of other

Return type:

Dict[Key, Key]
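
A self-contained sketch joining two hypothetical indexes over the same embedding space; the tiny perturbation only makes the example non-trivial:

    import numpy as np
    from usearch.index import Index

    ndim = 32
    keys = np.arange(100, dtype=np.uint64)
    vectors = np.random.rand(100, ndim).astype(np.float32)

    first, second = Index(ndim=ndim), Index(ndim=ndim)
    first.add(keys, vectors)
    second.add(keys, vectors + 1e-3)  # slightly perturbed copies

    # Collision-free mapping from keys of `first` to keys of `second`
    mapping = first.join(second)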

property keys: IndexedKeys#
level_stats(level: int) IndexStats#

Get statistics for one level of the index - one graph.

Returns:

Statistics for one level of the index - one graph.

Return type:

_CompiledIndexStats

Statistics:
  • nodes (int): The number of nodes in that level.

  • edges (int): The number of edges in that level.

  • max_edges (int): The maximum possible number of edges in that level.

  • allocated_bytes (int): The amount of allocated memory for that level.

property levels_stats: List[IndexStats]#

Get the accumulated statistics for every level graph.

Returns:

Statistics for every level graph.

Return type:

List[_CompiledIndexStats]

Statistics:
  • nodes (int): The number of nodes in that level.

  • edges (int): The number of edges in that level.

  • max_edges (int): The maximum possible number of edges in that level.

  • allocated_bytes (int): The amount of allocated memory for that level.

load(path_or_buffer: str | PathLike | bytes | None = None, progress: Callable[[int, int], bool] | None = None)#
property max_level: int#
property memory_usage: int#
static metadata(path_or_buffer: str | PathLike | bytes) dict | None#
property metric: MetricKind | CompiledMetric#
property metric_kind: MetricKind | CompiledMetric#
property multi: bool#
property ndim: int#
property nlevels: int#
pairwise_distance(left: uint64 | Iterable[uint64] | int | Iterable[int] | ndarray | memoryview, right: uint64 | Iterable[uint64] | int | Iterable[int] | ndarray | memoryview) ndarray | float#
remove(keys: uint64 | Iterable[uint64] | int | Iterable[int] | ndarray | memoryview, *, compact: bool = False, threads: int = 0) int | ndarray#

Removes one or more vectors from the index.

When working with extremely large indexes, you may want to mark some entries as deleted instead of rebuilding a filtered index. In other cases, rebuilding is the recommended approach.

Parameters:
  • keys (KeyOrKeysLike) – Unique identifier for passed vectors, optional

  • compact (bool, optional) – Removes links to removed nodes (expensive), defaults to False

  • threads (int, optional) – Optimal number of cores to use, defaults to 0

Returns:

Number of removed vectors per key: an integer for a single key, an array of integers for many keys

Type:

Union[int, np.ndarray]

rename(from_: uint64 | Iterable[uint64] | int | Iterable[int] | ndarray | memoryview, to: uint64 | Iterable[uint64] | int | Iterable[int] | ndarray | memoryview) int | ndarray#

Rename existing member vector or vectors.

May be used in iterative clustering procedures, where one would iteratively relabel every vector with the name of the cluster an entry belongs to, until the system converges.

Parameters:
  • from_ (KeyOrKeysLike) – One or more keys to be renamed

  • to (KeyOrKeysLike) – New key or keys (of the same length as from_)

Returns:

Number of vectors that were found and renamed

Return type:

int
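
A short sketch of relabeling and then removing a member, continuing the index above:

    renamed = index.rename(42, 100_042)  # move the vector under a new key
    removed = index.remove(100_042)      # then drop it from the index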

reset()#

Erases all members from the index, closing files, and returning RAM to the OS.

static restore(path_or_buffer: str | PathLike | bytes, view: bool = False) Index | None#
save(path_or_buffer: str | PathLike | None = None, progress: Callable[[int, int], bool] | None = None) bytes | None#
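
A sketch of the serialization round-trip; the file name is arbitrary:

    index.save("index.usearch")                          # or `index.save()` to get `bytes`
    meta = Index.metadata("index.usearch")               # inspect a file without loading it
    copy = Index.restore("index.usearch")                # load into RAM
    mapped = Index.restore("index.usearch", view=True)   # or memory-map without copying
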
search(vectors: ndarray | Iterable[ndarray] | memoryview, count: int = 10, radius: float = inf, *, threads: int = 0, exact: bool = False, log: str | bool = False, progress: Callable[[int, int], bool] | None = None) Matches | BatchMatches#

Performs approximate nearest neighbors search for one or more queries.

Parameters:
  • vectors (VectorOrVectorsLike) – Query vector or vectors.

  • count (int, defaults to 10) – Upper limit on the number of matches to find

  • threads (int, defaults to 0) – Optimal number of cores to use

  • exact (bool, defaults to False) – Perform exhaustive linear-time exact search

  • log (Union[str, bool], optional) – Whether to print the progress bar, defaults to False

  • progress (Optional[ProgressCallback], defaults to None) – Callback to report stats of the progress and control it

Returns:

Matches for one or more queries

Return type:

Union[Matches, BatchMatches]
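
A sketch of single and batched queries, continuing the index above:

    import numpy as np

    query = np.random.rand(256).astype(np.float32)
    matches = index.search(query, 10)           # Matches
    print(matches[0].key, matches[0].distance)  # closest match
    print(matches.to_list())                    # all matches as (key, distance) tuples

    queries = np.random.rand(5, 256).astype(np.float32)
    batch = index.search(queries, 10)           # BatchMatches, one Matches per query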

property serialized_length: int#
property size: int#
property specs: Dict[str, str | int | bool]#
property stats: IndexStats#

Get the accumulated statistics for the entire multi-level graph.

Returns:

Statistics for the entire multi-level graph.

Return type:

_CompiledIndexStats

Statistics:
  • nodes (int): The number of nodes in that level.

  • edges (int): The number of edges in that level.

  • max_edges (int): The maximum possible number of edges in that level.

  • allocated_bytes (int): The amount of allocated memory for that level.
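
A short sketch of inspecting the graph, assuming a populated index:

    stats = index.stats  # the whole multi-level graph
    print(stats.nodes, stats.edges, stats.max_edges, stats.allocated_bytes)

    for level, level_stats in enumerate(index.levels_stats):
        print(level, level_stats.nodes, level_stats.edges)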

property vectors: ndarray#
view(path_or_buffer: str | PathLike | bytes | bytearray | None = None, progress: Callable[[int, int], bool] | None = None)#
class usearch.index.IndexedKeys(index: Index)#

Smart-reference for the range of keys present in a specific Index

class usearch.index.Indexes(indexes: Iterable[Index] = [], paths: Iterable[PathLike] = [], view: bool = False, threads: int = 0)#
merge(index: Index)#
merge_path(path: PathLike)#
search(vectors, count: int = 10, *, threads: int = 0, exact: bool = False, progress: Callable[[int, int], bool] | None = None)#
class usearch.index.Match(key: int, distance: float)#

This class contains information about a single retrieved vector.

distance: float#
key: int#
to_tuple() tuple#
class usearch.index.Matches(keys: ndarray, distances: ndarray, visited_members: int = 0, computed_distances: int = 0)#

This class contains information about multiple retrieved vectors for a single query, i.e. it is a set of Match instances.

computed_distances: int = 0#
distances: ndarray#
keys: ndarray#
to_list() List[tuple]#

Convert matches to a list of tuples, each containing a match’s key and the distance to it.

visited_members: int = 0#
usearch.index.search(dataset: ~numpy.ndarray, query: ~numpy.ndarray, count: int = 10, metric: str | ~usearch.compiled.MetricKind | ~usearch.index.CompiledMetric = <MetricKind.Cos: 99>, *, exact: bool = False, threads: int = 0, log: str | bool = False, progress: ~typing.Callable[[int, int], bool] | None = None) Matches | BatchMatches#

Shortcut for search that avoids index construction. Particularly useful for tiny datasets, where brute-force exact search is fast enough.

Parameters:
  • dataset (np.ndarray) – Row-major matrix.

  • query (np.ndarray) – Query vector or vectors (also row-major), to find in dataset.

  • count (int, optional) – Upper limit on the number of matches to find, defaults to 10

  • metric (MetricLike, defaults to MetricKind.Cos) – Kind of the distance function, or a Numba cfunc JIT-compiled object. Possible MetricKind values: IP, Cos, L2sq, Haversine, Pearson, Hamming, Tanimoto, Sorensen.

  • threads (int, optional) – Optimal number of cores to use, defaults to 0

  • exact (bool, optional) – Perform exhaustive linear-time exact search, defaults to False

  • log (Union[str, bool], optional) – Whether to print the progress bar, defaults to False

  • progress (Optional[ProgressCallback], defaults to None) – Callback to report stats of the progress and control it

Returns:

Matches for one or more queries

Return type:

Union[Matches, BatchMatches]
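
A self-contained sketch of index-free exact search over a random dataset:

    import numpy as np
    from usearch.index import search, MetricKind

    vectors = np.random.rand(10_000, 256).astype(np.float32)
    query = np.random.rand(256).astype(np.float32)

    one_in_many = search(vectors, query, 50, MetricKind.L2sq, exact=True)          # Matches
    many_in_many = search(vectors, vectors[:10], 50, MetricKind.L2sq, exact=True)  # BatchMatches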

IO#

usearch.io.guess_numpy_dtype_from_filename(filename) type | None#
usearch.io.load_matrix(filename: str, start_row: int = 0, count_rows: int | None = None, view: bool = False, dtype: type | None = None) ndarray | None#

Read *.ibin, *.bbin, *.hbin, *.fbin, *.dbin files with matrices.

Parameters:
  • filename – path to the matrix file

  • start_row – start reading vectors from this index

  • count_rows – number of vectors to read. If None, read all vectors

  • view – set to True to memory-map the file instead of loading to RAM

Returns:

parsed matrix

Return type:

numpy.ndarray

usearch.io.numpy_scalar_size(dtype) int#
usearch.io.save_matrix(vectors: ndarray, filename: str)#

Write *.ibin, *.bbin, *.hbin, *.fbin, *.dbin files with matrices.

Parameters:
  • vectors (numpy.ndarray) – the matrix to serialize

  • filename (str) – path to the matrix file
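
A round-trip sketch; by convention the file extension encodes the scalar type, e.g. *.fbin for float32:

    import numpy as np
    from usearch.io import save_matrix, load_matrix

    vectors = np.random.rand(1_000, 256).astype(np.float32)
    save_matrix(vectors, "vectors.fbin")

    reloaded = load_matrix("vectors.fbin")                              # copy into RAM
    window = load_matrix("vectors.fbin", start_row=100, count_rows=10)  # partial read
    mapped = load_matrix("vectors.fbin", view=True)                     # memory-map instead of loading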

Evaluation#

class usearch.eval.AddTask(keys: 'np.ndarray', vectors: 'np.ndarray')#
clusters(number_of_clusters: int) List[AddTask]#

Splits this dataset into smaller chunks.

property count#
inplace_shuffle()#

Reorders the vectors and keys. Often used for robustness benchmarks.

keys: ndarray#
property ndim#
slices(batch_size: int) List[AddTask]#

Splits this dataset into smaller chunks.

vectors: ndarray#
class usearch.eval.Dataset(keys: 'np.ndarray', vectors: 'np.ndarray', queries: 'np.ndarray', neighbors: 'np.ndarray')#
static build(vectors: str | None = None, queries: str | None = None, neighbors: str | None = None, count: int | None = None, ndim: int | None = None, k: int | None = None)#

Either loads an existing dataset from disk, or generates one on the fly.

Parameters:
  • vectors (Optional[str], optional) – Path to a file with dataset vectors, defaults to None

  • queries (Optional[str], optional) – Path to a file with query vectors, defaults to None

  • neighbors (Optional[str], optional) – Path to a file with ground-truth neighbor keys, defaults to None

  • count (Optional[int], optional) – Number of vectors to generate, if no files are given, defaults to None

  • ndim (Optional[int], optional) – Number of dimensions per generated vector, defaults to None

  • k (Optional[int], optional) – Number of neighbors to generate per query, defaults to None

crop_neighbors(k: int)#
keys: ndarray#
property ndim#
neighbors: ndarray#
queries: ndarray#
vectors: ndarray#
class usearch.eval.Evaluation(tasks: 'List[Union[AddTask, SearchTask]]', count: 'int', ndim: 'int')#
count: int#
static for_dataset(dataset: Dataset, batch_size: int = 0, clusters: int = 1) Evaluation#
ndim: int#
tasks: List[AddTask | SearchTask]#
class usearch.eval.SearchStats(index_size: int, count_queries: int, count_matches: int, visited_members: int, computed_distances: int)#

Contains statistics for one or more search runs, including the number of internal nodes that were fetched (visited_members) and the number of times the distance metric was invoked (computed_distances).

Other derivative metrics include mean_recall and mean_efficiency. Recall is the share of queried vectors that were successfully found. Efficiency describes the number of distances that had to be computed per query, normalized by the size of the index. The highest efficiency approaches one, the lowest is zero. The highest is achieved when the distance metric is computed just once per query; the lowest happens during exact search, when the distance to every present vector has to be computed.
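
The following sketch only illustrates that relationship; the formula is an assumption for illustration, not necessarily the library's exact internal computation:

    # Assumed reconstruction of mean efficiency: one minus the average number
    # of distance computations per query, normalized by the index size.
    def approximate_mean_efficiency(stats: "SearchStats") -> float:
        distances_per_query = stats.computed_distances / stats.count_queries
        return 1.0 - distances_per_query / stats.index_size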

computed_distances: int#
count_matches: int#
count_queries: int#
index_size: int#
property mean_efficiency: float#
property mean_recall: float#
visited_members: int#
class usearch.eval.SearchTask(queries: 'np.ndarray', neighbors: 'np.ndarray')#
neighbors: ndarray#
queries: ndarray#
slices(batch_size: int) List[SearchTask]#

Splits this dataset into smaller chunks.

class usearch.eval.TaskResult(add_operations: 'Optional[int]' = None, add_per_second: 'Optional[float]' = None, search_operations: 'Optional[int]' = None, search_per_second: 'Optional[float]' = None, recall_at_one: 'Optional[float]' = None)#
add_operations: int | None = None#
add_per_second: float | None = None#
property add_seconds: float#
recall_at_one: float | None = None#
search_operations: int | None = None#
search_per_second: float | None = None#
property search_seconds: float#
usearch.eval.dcg(relevances: ndarray, k: int | None = None) ndarray#

Calculate DCG (Discounted Cumulative Gain) up to position k.

Parameters:
  • relevances (np.ndarray) – True relevance scores, in the order in which they are ranked

  • k (int) – Position up to which DCG is computed

Returns:

The DCG score at position k

Return type:

float

usearch.eval.measure_seconds(f: Callable) Tuple[float, Any]#

Simple function-profiling helper.

Parameters:

f (Callable) – Function to be profiled

Returns:

Time elapsed in seconds and the result of the execution

Return type:

Tuple[float, Any]
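
A usage sketch; the timed callable here is arbitrary:

    from usearch.eval import measure_seconds

    seconds, result = measure_seconds(lambda: sum(range(1_000_000)))
    print(f"took {seconds:.4f}s, got {result}")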

usearch.eval.ndcg(relevances: ndarray, k: int | None = None) ndarray#

Calculate NDCG (Normalized Discounted Cumulative Gain) at position k.

Parameters:
  • relevances (np.ndarray) – True relevance scores, in the order in which they are ranked

  • k (int) – Position up to which NDCG is computed

Returns:

The NDCG score at position k

Return type:

float
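
A small worked example with hand-picked graded relevance scores:

    import numpy as np
    from usearch.eval import dcg, ndcg

    # Relevance of the top returned items, in ranked order
    relevances = np.array([3, 2, 3, 0, 1, 2], dtype=float)
    print(dcg(relevances, k=5))   # discounted cumulative gain over the first 5 positions
    print(ndcg(relevances, k=5))  # the same, normalized by the ideal ordering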

usearch.eval.random_vectors(count: int, metric: ~usearch.compiled.MetricKind = <MetricKind.IP: 105>, dtype: ~usearch.compiled.ScalarKind = <ScalarKind.F32: 11>, ndim: int | None = None, index: ~usearch.index.Index | None = None) ndarray#

Produces a collection of random vectors, normalized for the provided metric and matching the wanted dtype, both of which can be inferred from an existing index.

usearch.eval.relevance(expected: ndarray, predicted: ndarray, k: int | None = None) ndarray#

Calculate binary relevance scores for predicted keys against the expected ground-truth keys.

Parameters:
  • expected (np.ndarray) – ground-truth keys

  • predicted (np.ndarray) – predicted keys

usearch.eval.self_recall(index: Index, sample: float | int = 1.0, **kwargs) SearchStats#

Simplest benchmark of search quality: queries every existing member of the index to check that approximate search finds the point itself.

Parameters:
  • index (Index) – Non-empty pre-constructed index

  • sample (Union[float, int]) – Share (or number) of vectors to search, defaults to 1.0

Returns:

Evaluation report with key metrics

Return type:

SearchStats
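
A self-contained sketch over random vectors:

    import numpy as np
    from usearch.index import Index
    from usearch.eval import random_vectors, self_recall

    index = Index(ndim=96)
    vectors = random_vectors(count=10_000, ndim=96)
    index.add(np.arange(10_000, dtype=np.uint64), vectors)

    stats = self_recall(index, sample=1_000)  # query 1,000 of the present members
    print(stats.mean_recall, stats.mean_efficiency)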

Client#

Server#