cuvs.neighbors.cagra#

10 min read time

Applies to Linux

Submodules#

Classes#

CompressionParams

CompressionParams(pq_bits=8, *, pq_dim=0, vq_n_centers=0, kmeans_n_iters=25, vq_kmeans_trainset_fraction=0.0, pq_kmeans_trainset_fraction=0.0)

Index

CAGRA index object. This object stores the trained CAGRA index state

IndexParams

IndexParams(metric=u'sqeuclidean', *, intermediate_graph_degree=128, graph_degree=64, build_algo=u'ivf_pq', nn_descent_niter=20, compression=None)

SearchParams

SearchParams(max_queries=0, *, itopk_size=64, max_iterations=0, algo=u'auto', team_size=0, search_width=1, min_iterations=0, thread_block_size=0, hashmap_mode=u'auto', hashmap_min_bitlen=0, hashmap_max_fill_rate=0.5, num_random_samplings=1, rand_xor_mask=0x128394)

Functions#

build(*args[, resources])

build(IndexParams index_params, dataset, resources=None)

load(*args[, resources])

load(filename, resources=None)

save(*args[, resources])

save(filename, Index index, bool include_dataset=True, resources=None)

search(*args[, resources])

search(SearchParams search_params, Index index, queries, k, neighbors=None, distances=None, resources=None, filter=None)

Package Contents#

class cuvs.neighbors.cagra.CompressionParams#

CompressionParams(pq_bits=8, *, pq_dim=0, vq_n_centers=0, kmeans_n_iters=25, vq_kmeans_trainset_fraction=0.0, pq_kmeans_trainset_fraction=0.0)

Parameters for VPQ Compression

pq_bits: int

The bit length of the vector element after compression by PQ. Possible values: [4, 5, 6, 7, 8]. The smaller the ‘pq_bits’, the smaller the index size and the better the search performance, but the lower the recall.

pq_dim: int

The dimensionality of the vector after compression by PQ. When zero, an optimal value is selected using a heuristic.

vq_n_centers: int

Vector Quantization (VQ) codebook size - number of “coarse cluster centers”. When zero, an optimal value is selected using a heuristic.

kmeans_n_iters: int

The number of iterations searching for kmeans centers (both VQ & PQ phases).

vq_kmeans_trainset_fraction: float

The fraction of data to use during iterative kmeans building (VQ phase). When zero, an optimal value is selected using a heuristic.

vq_kmeans_trainset_fraction: float

The fraction of data to use during iterative kmeans building (PQ phase). When zero, an optimal value is selected using a heuristic.

static __reduce__(*args, **kwargs)#

CompressionParams.__reduce_cython__(self)

static __setstate__(*args, **kwargs)#

CompressionParams.__setstate_cython__(self, __pyx_state)

get_handle()#

CompressionParams.get_handle(self)

class cuvs.neighbors.cagra.Index#

CAGRA index object. This object stores the trained CAGRA index state which can be used to perform nearest neighbors searches.

static __reduce__(*args, **kwargs)#

Index.__reduce_cython__(self)

static __repr__(*args, **kwargs)#
static __setstate__(*args, **kwargs)#

Index.__setstate_cython__(self, __pyx_state)

class cuvs.neighbors.cagra.IndexParams#

IndexParams(metric=u’sqeuclidean’, *, intermediate_graph_degree=128, graph_degree=64, build_algo=u’ivf_pq’, nn_descent_niter=20, compression=None)

Parameters to build index for CAGRA nearest neighbor search

metricstring denoting the metric type, default=”sqeuclidean”
Valid values for metric: [“sqeuclidean”, “inner_product”], where
  • sqeuclidean is the euclidean distance without the square root operation, i.e.: distance(a,b) = sum_i (a_i - b_i)^2

  • inner_product distance is defined as distance(a, b) = sum_i a_i * b_i.

intermediate_graph_degree : int, default = 128

graph_degree : int, default = 64

build_algo: string denoting the graph building algorithm to use, default = “ivf_pq”
Valid values for algo: [“ivf_pq”, “nn_descent”,

“iterative_cagra_search”], where

  • ivf_pq will use the IVF-PQ algorithm for building the knn graph

  • nn_descent (experimental) will use the NN-Descent algorithm for building the knn graph. It is expected to be generally faster than ivf_pq.

  • iterative_cagra_search will iteratively build the knn graph using CAGRA’s search() and optimize()

compression: CompressionParams, optional

If compression is desired should be a CompressionParams object. If None compression will be disabled.

static __reduce__(*args, **kwargs)#

IndexParams.__reduce_cython__(self)

static __setstate__(*args, **kwargs)#

IndexParams.__setstate_cython__(self, __pyx_state)

class cuvs.neighbors.cagra.SearchParams#

SearchParams(max_queries=0, *, itopk_size=64, max_iterations=0, algo=u’auto’, team_size=0, search_width=1, min_iterations=0, thread_block_size=0, hashmap_mode=u’auto’, hashmap_min_bitlen=0, hashmap_max_fill_rate=0.5, num_random_samplings=1, rand_xor_mask=0x128394)

CAGRA search parameters

max_queries: int, default = 0

Maximum number of queries to search at the same time (batch size). Auto select when 0.

itopk_size: int, default = 64

Number of intermediate search results retained during the search. This is the main knob to adjust trade off between accuracy and search speed. Higher values improve the search accuracy.

max_iterations: int, default = 0

Upper limit of search iterations. Auto select when 0.

algo: string denoting the search algorithm to use, default = “auto”
Valid values for algo: [“auto”, “single_cta”, “multi_cta”], where
  • auto will automatically select the best value based on query size

  • single_cta is better when query contains larger number of vectors (e.g >10)

  • multi_cta is better when query contains only a few vectors

team_size: int, default = 0

Number of threads used to calculate a single distance. 4, 8, 16, or 32.

search_width: int, default = 1

Number of graph nodes to select as the starting point for the search in each iteration.

min_iterations: int, default = 0

Lower limit of search iterations.

thread_block_size: int, default = 0

Thread block size. 0, 64, 128, 256, 512, 1024. Auto selection when 0.

hashmap_mode: string denoting the type of hash map to use.

It’s usually better to allow the algorithm to select this value, default = “auto”. Valid values for hashmap_mode: [“auto”, “small”, “hash”], where

  • auto will automatically select the best value based on algo

  • small will use the small shared memory hash table with resetting.

  • hash will use a single hash table in global memory.

hashmap_min_bitlen: int, default = 0

Upper limit of hashmap fill rate. More than 0.1, less than 0.9.

hashmap_max_fill_rate: float, default = 0.5

Upper limit of hashmap fill rate. More than 0.1, less than 0.9.

num_random_samplings: int, default = 1

Number of iterations of initial random seed node selection. 1 or more.

rand_xor_mask: int, default = 0x128394

Bit mask used for initial random seed node selection.

static __reduce__(*args, **kwargs)#

SearchParams.__reduce_cython__(self)

static __repr__(*args, **kwargs)#
static __setstate__(*args, **kwargs)#

SearchParams.__setstate_cython__(self, __pyx_state)

cuvs.neighbors.cagra.build(*args, resources=None, **kwargs)#

build(IndexParams index_params, dataset, resources=None)

Build the CAGRA index from the dataset for efficient search.

The build performs two different steps- first an intermediate knn-graph is constructed, then it’s optimized it to create the final graph. The index_params object controls the node degree of these graphs.

It is required that both the dataset and the optimized graph fit the GPU memory.

The following distance metrics are supported:
  • L2

  • InnerProduct

index_params : IndexParams object dataset : CUDA array interface compliant matrix shape (n_samples, dim)

Supported dtype [float, int8, uint8]

resourcesOptional cuVS Resource handle for reusing CUDA resources.

If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.

index: cuvs.cagra.Index

>>> import cupy as cp
>>> from cuvs.neighbors import cagra
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> k = 10
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> build_params = cagra.IndexParams(metric="sqeuclidean")
>>> index = cagra.build(build_params, dataset)
>>> distances, neighbors = cagra.search(cagra.SearchParams(),
...                                      index, dataset,
...                                      k)
>>> distances = cp.asarray(distances)
>>> neighbors = cp.asarray(neighbors)
cuvs.neighbors.cagra.load(*args, resources=None, **kwargs)#

load(filename, resources=None)

Loads index from file.

Saving / loading the index is experimental. The serialization format is subject to change, therefore loading an index saved with a previous version of cuvs is not guaranteed to work.

filenamestring

Name of the file.

resourcesOptional cuVS Resource handle for reusing CUDA resources.

If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.

index : Index

cuvs.neighbors.cagra.save(*args, resources=None, **kwargs)#

save(filename, Index index, bool include_dataset=True, resources=None)

Saves the index to a file.

Saving / loading the index is experimental. The serialization format is subject to change.

filenamestring

Name of the file.

indexIndex

Trained CAGRA index.

include_datasetbool

Whether or not to write out the dataset along with the index. Including the dataset in the serialized index will use extra disk space, and might not be desired if you already have a copy of the dataset on disk. If this option is set to false, you will have to call index.update_dataset(dataset) after loading the index.

resourcesOptional cuVS Resource handle for reusing CUDA resources.

If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.

>>> import cupy as cp
>>> from cuvs.neighbors import cagra
>>> n_samples = 50000
>>> n_features = 50
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build index
>>> index = cagra.build(cagra.IndexParams(), dataset)
>>> # Serialize and deserialize the cagra index built
>>> cagra.save("my_index.bin", index)
>>> index_loaded = cagra.load("my_index.bin")
cuvs.neighbors.cagra.search(*args, resources=None, **kwargs)#

search(SearchParams search_params, Index index, queries, k, neighbors=None, distances=None, resources=None, filter=None)

Find the k nearest neighbors for each query.

search_params : SearchParams index : Index

Trained CAGRA index.

queriesCUDA array interface compliant matrix shape (n_samples, dim)

Supported dtype [float, int8, uint8]

kint

The number of neighbors.

neighborsOptional CUDA array interface compliant matrix shape

(n_queries, k), dtype int64_t. If supplied, neighbor indices will be written here in-place. (default None)

distancesOptional CUDA array interface compliant matrix shape

(n_queries, k) If supplied, the distances to the neighbors will be written here in-place. (default None)

filter: Optional cuvs.neighbors.cuvsFilter can be used to filter

neighbors based on a given bitset.

(default None)

resourcesOptional cuVS Resource handle for reusing CUDA resources.

If Resources aren’t supplied, CUDA resources will be allocated inside this function and synchronized before the function exits. If resources are supplied, you will need to explicitly synchronize yourself by calling resources.sync() before accessing the output.

>>> import cupy as cp
>>> from cuvs.neighbors import cagra
>>> n_samples = 50000
>>> n_features = 50
>>> n_queries = 1000
>>> dataset = cp.random.random_sample((n_samples, n_features),
...                                   dtype=cp.float32)
>>> # Build index
>>> index = cagra.build(cagra.IndexParams(), dataset)
>>> # Search using the built index
>>> queries = cp.random.random_sample((n_queries, n_features),
...                                   dtype=cp.float32)
>>> k = 10
>>> search_params = cagra.SearchParams(
...     max_queries=100,
...     itopk_size=64
... )
>>> # Using a pooling allocator reduces overhead of temporary array
>>> # creation during search. This is useful if multiple searches
>>> # are performed with same query size.
>>> distances, neighbors = cagra.search(search_params, index, queries,
...                                     k)
>>> neighbors = cp.asarray(neighbors)
>>> distances = cp.asarray(distances)