cuvs.neighbors.brute_force

Applies to Linux

Submodules#

cuvs.neighbors.brute_force.brute_force

Classes#

Index

Brute Force index object. This object stores the trained Brute Force index, which can be used to perform nearest neighbor searches.

Functions#

`build`(*args[, resources])	build(dataset, metric=u'sqeuclidean', metric_arg=2.0, resources=None)
`load`(*args[, resources])	load(filename, resources=None)
`save`(*args[, resources])	save(filename, Index index, bool include_dataset=True, resources=None)
`search`(*args[, resources])	search(Index index, queries, k, neighbors=None, distances=None, resources=None, prefilter=None)

Package Contents#

class cuvs.neighbors.brute_force.Index#

Brute Force index object. This object stores the trained Brute Force index, which can be used to perform nearest neighbor searches.

static __reduce__(*args, **kwargs)#: Index.__reduce_cython__(self)

static __repr__(*args, **kwargs)#

static __setstate__(*args, **kwargs)#: Index.__setstate_cython__(self, __pyx_state)

cuvs.neighbors.brute_force.build(*args, resources=None, **kwargs)#

build(dataset, metric=u'sqeuclidean', metric_arg=2.0, resources=None)

Build the Brute Force index from the dataset for efficient search.

Parameters

dataset : CUDA array interface compliant matrix, shape (n_samples, dim)
Supported dtype [float32, float16]

metric : Distance metric to use. Default is sqeuclidean.

metric_arg : Value of ‘p’ for Minkowski distances.

resources : Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.

Returns

index : cuvs.neighbors.brute_force.Index

>>> import cupy as cp
>>> from cuvs.neighbors import brute_force
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> k = 10
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> index = brute_force.build(dataset, metric="cosine")
>>> distances, neighbors = brute_force.search(index, dataset, k)
>>> distances = cp.asarray(distances)
>>> neighbors = cp.asarray(neighbors)
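
The synchronization rule described for the resources parameter can be sketched as follows. This is a minimal illustration, not the library's recommended pattern: it assumes that the handle class is importable as `cuvs.common.Resources`, and the helper name `search_with_shared_resources` is hypothetical. The imports sit inside the function so that the snippet can be loaded on a machine without a GPU; calling it requires cuVS and a CUDA device.

```python
def search_with_shared_resources(dataset, queries, k):
    """Build and search on one shared handle, then synchronize once.

    `dataset` and `queries` are CUDA array interface compliant matrices,
    for example CuPy arrays of dtype float32.
    """
    # Assumption: the handle class lives in cuvs.common.
    from cuvs.common import Resources
    from cuvs.neighbors import brute_force

    resources = Resources()

    # Both calls reuse the same CUDA resources instead of allocating
    # new ones internally.
    index = brute_force.build(dataset, resources=resources)
    distances, neighbors = brute_force.search(
        index, queries, k, resources=resources
    )

    # Because a handle was supplied, the outputs must not be read
    # until the handle has been synchronized explicitly.
    resources.sync()
    return distances, neighbors
```

When no handle is passed, this explicit `sync()` call is unnecessary, because each function synchronizes before it returns.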

cuvs.neighbors.brute_force.load(*args, resources=None, **kwargs)#

load(filename, resources=None)

Loads index from file.

The serialization format can be subject to changes, therefore loading an index saved with a previous version of cuvs is not guaranteed to work.

Parameters

filename : string
Name of the file.

resources : Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.

Returns

index : Index

>>> import cupy as cp
>>> from cuvs.neighbors import brute_force
>>> n_samples = 50000
>>> n_features = 50
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build index
>>> index = brute_force.build(dataset)
>>> # Serialize and deserialize the brute_force index built
>>> brute_force.save("my_index.bin", index)
>>> index_loaded = brute_force.load("my_index.bin")
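
Because a file written by an older cuVS version may not load, callers may want a fallback that rebuilds the index. The sketch below is one possible approach, not part of cuVS: `load_or_rebuild` is a hypothetical helper that takes the load, build, and save steps as callables. The exception type raised on a format mismatch is not documented here, so the helper catches `Exception` broadly, and it cannot detect a load that fails silently.

```python
import os


def load_or_rebuild(path, load_fn, build_fn, save_fn):
    """Return an index loaded from `path`, rebuilding it if loading fails."""
    if os.path.exists(path):
        try:
            return load_fn(path)
        except Exception:
            # Assumption: an incompatible file raises. The exact exception
            # type is unknown, hence the broad catch.
            pass
    # No usable file: rebuild from the original data and overwrite the file.
    index = build_fn()
    save_fn(path, index)
    return index


# Possible use with cuVS (requires a GPU, so it is left commented out):
# index = load_or_rebuild(
#     "my_index.bin",
#     brute_force.load,
#     lambda: brute_force.build(dataset),
#     brute_force.save,
# )
```

Passing the steps as callables keeps the fallback logic independent of cuVS, so it can be exercised with stand-in functions.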

cuvs.neighbors.brute_force.save(*args, resources=None, **kwargs)#

save(filename, Index index, bool include_dataset=True, resources=None)

Saves the index to a file.

The serialization format can be subject to changes, therefore loading an index saved with a previous version of cuvs is not guaranteed to work.

Parameters

filename : string
Name of the file.

index : Index
Trained Brute Force index.

resources : Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.

>>> import cupy as cp
>>> from cuvs.neighbors import brute_force
>>> n_samples = 50000
>>> n_features = 50
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build index
>>> index = brute_force.build(dataset)
>>> # Serialize and deserialize the brute_force index built
>>> brute_force.save("my_index.bin", index)
>>> index_loaded = brute_force.load("my_index.bin")

cuvs.neighbors.brute_force.search(*args, resources=None, **kwargs)#

search(Index index, queries, k, neighbors=None, distances=None, resources=None, prefilter=None)

Find the k nearest neighbors for each query.

Parameters

index : Index
Trained Brute Force index.

queries : CUDA array interface compliant matrix, shape (n_queries, dim)
Supported dtype [float32, float16]

k : int
The number of neighbors.

neighbors : Optional CUDA array interface compliant matrix, shape (n_queries, k), dtype int64_t
If supplied, neighbor indices will be written here in-place. (default None)

distances : Optional CUDA array interface compliant matrix, shape (n_queries, k)
If supplied, the distances to the neighbors will be written here in-place. (default None)

prefilter : Optional cuvs.neighbors.cuvsFilter
Can be used to filter queries and neighbors based on a given bitmap. The bitmap should have a row-major layout and logical shape [n_queries, n_samples]: the first n_samples bits indicate, for queries[0], which rows of the index should have their distance computed, the next n_samples bits do the same for queries[1], and so on. (default None)

resources : Optional cuVS Resource handle for reusing CUDA resources.
If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.

>>> # Example without pre-filter
>>> import cupy as cp
>>> from cuvs.neighbors import brute_force
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build index
>>> index = brute_force.build(dataset, metric="sqeuclidean")
>>> # Search using the built index
>>> queries = cp.random.random_sample((n_queries, n_features),
...                                   dtype=cp.float32)
>>> k = 10
>>> # Note: configuring a pooling allocator (not shown here) reduces the
>>> # overhead of temporary array creation during search. This is useful
>>> # if multiple searches are performed with the same query size.
>>> distances, neighbors = brute_force.search(index, queries, k)
>>> neighbors = cp.asarray(neighbors)
>>> distances = cp.asarray(distances)
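
To make the result layout concrete, the following pure-Python sketch computes what a brute-force k-nearest-neighbor search returns: for each query, the k smallest distances and the corresponding dataset row indices, each with logical shape (n_queries, k). It is a CPU reference for illustration only and says nothing about how cuVS implements the search. The function names are hypothetical, and the Minkowski formula is the textbook definition, with `p` playing the role of metric_arg.

```python
import heapq


def sqeuclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def minkowski(a, b, p=2.0):
    """Textbook Minkowski distance of order p."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def brute_force_knn(dataset, queries, k, distance=sqeuclidean):
    """Return (distances, neighbors), each with n_queries rows of length k."""
    all_distances, all_neighbors = [], []
    for query in queries:
        # Score every dataset row against the query: no pruning, no index.
        scored = ((distance(query, row), i) for i, row in enumerate(dataset))
        nearest = heapq.nsmallest(k, scored)
        all_distances.append([d for d, _ in nearest])
        all_neighbors.append([i for _, i in nearest])
    return all_distances, all_neighbors


dataset = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[0.1, 0.0], [2.9, 3.0]]
distances, neighbors = brute_force_knn(dataset, queries, k=2)
# neighbors == [[0, 1], [3, 2]]
```

Every query is compared against every dataset row, so the cost grows with n_queries * n_samples * dim; that exhaustive comparison is the work the GPU implementation parallelizes.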

>>> # Example with pre-filter
>>> import numpy as np
>>> import cupy as cp
>>> from cuvs.neighbors import brute_force, filters
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build index
>>> index = brute_force.build(dataset, metric="sqeuclidean")
>>> # Search using the built index
>>> queries = cp.random.random_sample((n_queries, n_features),
...                                   dtype=cp.float32)
>>> # Build filters
>>> n_bitmap = np.ceil(n_samples * n_queries / 32).astype(int)
>>> # Create your own bitmap as the filter by replacing the random one.
>>> bitmap = cp.random.randint(1, 1000, size=(n_bitmap,), dtype=cp.uint32)
>>> prefilter = filters.from_bitmap(bitmap)
>>> k = 10
>>> # Note: configuring a pooling allocator (not shown here) reduces the
>>> # overhead of temporary array creation during search. This is useful
>>> # if multiple searches are performed with the same query size.
>>> distances, neighbors = brute_force.search(index, queries, k,
...                                           prefilter=prefilter)
>>> neighbors = cp.asarray(neighbors)
>>> distances = cp.asarray(distances)
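
The example above uses a random bitmap as a placeholder. One way to build a real one from a boolean mask is sketched below in pure Python. It rests on two assumptions that should be checked against the cuVS filter documentation: that a set bit means the sample is kept for that query, and that bits are packed least-significant-bit first within each 32-bit word. The helper name `pack_bitmap` is hypothetical.

```python
def pack_bitmap(mask):
    """Pack a row-major boolean mask [n_queries][n_samples] into 32-bit words.

    Returns a list of ceil(n_queries * n_samples / 32) unsigned integers.
    """
    n_queries = len(mask)
    n_samples = len(mask[0]) if mask else 0
    n_bits = n_queries * n_samples
    words = [0] * ((n_bits + 31) // 32)
    for q, row in enumerate(mask):
        for s, keep in enumerate(row):
            if keep:
                # Row-major position of the (query, sample) pair.
                bit = q * n_samples + s
                # Assumption: LSB-first bit order within each word.
                words[bit // 32] |= 1 << (bit % 32)
    return words


# Two queries over 40 samples: query 0 may match samples 0 and 33,
# query 1 may match sample 0 only.
mask = [[False] * 40 for _ in range(2)]
mask[0][0] = True
mask[0][33] = True
mask[1][0] = True
words = pack_bitmap(mask)
# words == [1, 258, 0]
```

The resulting list has the same length as `n_bitmap` in the example above, so it could be moved to the device with `cp.asarray(words, dtype=cp.uint32)` and passed to `filters.from_bitmap`.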