Table of Contents
Features
Installation
Quick Start
Examples
Brute-Force Index
HNSW Index
Thread-Safe Index
Benchmark Results
API Reference
Development & CI
GPU Acceleration
Documentation
Contributing
License
Annie Examples
Interactive Examples:
You can run selected code blocks directly in your browser. Click the Try it button above a code block to execute it, and use the sliders to adjust parameters such as vector dimension or dataset size. Powered by Pyodide (Python in the browser).
Basic Usage
```python
import numpy as np
from rust_annie import AnnIndex, Distance

dim = 128    # vector dimension
size = 1000  # number of vectors

# Create index
index = AnnIndex(dim, Distance.EUCLIDEAN)

# Generate and add data
data = np.random.rand(size, dim).astype(np.float32)
ids = np.arange(size, dtype=np.int64)
index.add(data, ids)

# Single query
query = np.random.rand(dim).astype(np.float32)
neighbor_ids, distances = index.search(query, k=5)
print(neighbor_ids, distances)

# Batch queries
queries = np.random.rand(10, dim).astype(np.float32)
batch_ids, batch_dists = index.search_batch(queries, k=3)
print(batch_ids.shape, batch_dists.shape)
```
Filtered Search
```python
import numpy as np
from rust_annie import AnnIndex, Distance

# Create index with sample data
index = AnnIndex(3, Distance.EUCLIDEAN)
data = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0]
], dtype=np.float32)
ids = np.array([10, 15, 30], dtype=np.int64)
index.add(data, ids)

# Define filter function: keep only even IDs
def even_ids(id: int) -> bool:
    return id % 2 == 0

# Filtered search
query = np.array([1.0, 2.0, 3.0], dtype=np.float32)
filtered_ids, filtered_dists = index.search_filter_py(query, k=3, filter_fn=even_ids)
print(filtered_ids)  # Only IDs 10 and 30 are returned (15 is odd)
```
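Conceptually, a filtered search only considers vectors whose ID passes filter_fn, so it can return fewer than k results. The pure-Python sketch below reproduces those semantics with exact Euclidean distances, which is handy for checking small cases by hand. filtered_knn is a hypothetical helper; it does not claim to mirror how Annie applies the filter internally.

```python
import math
from typing import Callable, Sequence


def filtered_knn(
    vectors: Sequence[Sequence[float]],
    ids: Sequence[int],
    query: Sequence[float],
    k: int,
    filter_fn: Callable[[int], bool],
) -> list:
    """Exact k-NN over only the vectors whose ID passes filter_fn.

    Returns up to k (id, euclidean_distance) pairs, nearest first.
    """
    # Drop rejected IDs first, then rank the survivors by distance
    candidates = [
        (vid, math.dist(vec, query))
        for vec, vid in zip(vectors, ids)
        if filter_fn(vid)
    ]
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:k]


vectors = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
ids = [10, 15, 30]
result = filtered_knn(vectors, ids, [1.0, 2.0, 3.0], k=3, filter_fn=lambda i: i % 2 == 0)
print(result)  # IDs 10 and 30 only; 15 fails the filter, so fewer than k results come back
```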
HNSW Index
```python
import numpy as np
from rust_annie import PyHnswIndex

dim = 128      # vector dimension
size = 100000  # number of vectors

# Create HNSW index
index = PyHnswIndex(dims=dim)

# Add large dataset
data = np.random.rand(size, dim).astype(np.float32)
ids = np.arange(size, dtype=np.int64)
index.add(data, ids)

# Fast approximate search
query = np.random.rand(dim).astype(np.float32)
neighbor_ids, _ = index.search(query, k=10)
print(neighbor_ids)
```
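Because HNSW is approximate, some of the true nearest neighbors may be missing from its results. A common way to quantify this is recall@k: the fraction of the exact top-k neighbors (for example, from a brute-force AnnIndex built on the same data) that the approximate search also returned. A minimal sketch, with a hypothetical helper name and made-up IDs:

```python
from typing import Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the exact top-k neighbors that the approximate search also found."""
    if len(exact_ids) == 0:
        raise ValueError("exact_ids must be non-empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Made-up results for a single query with k = 5
exact = [3, 17, 42, 8, 91]   # e.g. from a brute-force index
approx = [3, 42, 17, 55, 8]  # e.g. from an HNSW index; missed 91
print(recall_at_k(approx, exact))  # 0.8
```

To apply it, run the same queries through AnnIndex and PyHnswIndex, pass both neighbor_ids results (converted with .tolist() if needed), and average over many queries for a stable estimate.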
Saving and Loading
```python
import numpy as np
from rust_annie import AnnIndex, Distance

# Create and save index
index = AnnIndex(64, Distance.COSINE)
data = np.random.rand(500, 64).astype(np.float32)
ids = np.arange(500, dtype=np.int64)
index.add(data, ids)
index.save("my_index")

# Load index
loaded_index = AnnIndex.load("my_index")
```
Thread-safe Operations
```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from rust_annie import ThreadSafeAnnIndex, Distance

index = ThreadSafeAnnIndex(256, Distance.MANHATTAN)

# Concurrent writes: each task adds a distinct block of 100 IDs
with ThreadPoolExecutor() as executor:
    write_futures = []
    for i in range(10):
        data = np.random.rand(100, 256).astype(np.float32)
        ids = np.arange(i * 100, (i + 1) * 100, dtype=np.int64)
        write_futures.append(executor.submit(index.add, data, ids))
    for f in write_futures:
        f.result()  # re-raise any exception from a worker

# Concurrent reads
with ThreadPoolExecutor() as executor:
    futures = []
    for _ in range(100):
        query = np.random.rand(256).astype(np.float32)
        futures.append(executor.submit(index.search, query, k=3))
    results = [f.result() for f in futures]
```
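A thread-safe wrapper coordinates access to shared index state so that concurrent add and search calls cannot corrupt it. ThreadSafeAnnIndex is implemented in Rust and its internals are not described here; the sketch below only illustrates the general wrapper idea in pure Python, using a single threading.Lock around a toy index. Both LockedIndex and ListIndex are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any


class LockedIndex:
    """Hypothetical wrapper (not part of rust_annie): one lock guards every call."""

    def __init__(self, inner: Any) -> None:
        self._inner = inner
        self._lock = threading.Lock()

    def add(self, *args: Any, **kwargs: Any) -> Any:
        with self._lock:  # writers never overlap with each other or with readers
            return self._inner.add(*args, **kwargs)

    def search(self, *args: Any, **kwargs: Any) -> Any:
        with self._lock:  # readers never observe a half-finished add()
            return self._inner.search(*args, **kwargs)


class ListIndex:
    """Hypothetical toy index over scalars; not thread-safe on its own."""

    def __init__(self) -> None:
        self.items: list = []

    def add(self, values, ids) -> None:
        self.items.extend(zip(ids, values))

    def search(self, query: float, k: int) -> list:
        ranked = sorted(self.items, key=lambda item: abs(item[1] - query))
        return [item_id for item_id, _ in ranked[:k]]


index = LockedIndex(ListIndex())
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(index.add, [float(i)], [i]) for i in range(100)]
    for f in futures:
        f.result()  # surface any exception raised in a worker

print(len(index._inner.items))  # 100
print(index.search(41.6, k=2))  # [42, 41]
```

A single mutex also serializes searches. A readers-writer lock would let searches run in parallel while still excluding writers, at the cost of more complex code; Python's standard library does not ship one.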
Minkowski Distance
```python
import numpy as np
from rust_annie import AnnIndex

# Create index with Minkowski distance (p = 2.5)
index = AnnIndex.new_minkowski(dim=64, p=2.5)
data = np.random.rand(200, 64).astype(np.float32)
ids = np.arange(200, dtype=np.int64)
index.add(data, ids)

# Search with Minkowski distance
query = np.random.rand(64).astype(np.float32)
neighbor_ids, dists = index.search(query, k=5)
```
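Minkowski distance generalizes the other metrics: p = 1 gives Manhattan, p = 2 gives Euclidean, and it approaches Chebyshev as p grows. The pure-Python sketch below implements the formula, which can help spot-check returned distances, assuming Annie reports the p-th-root form rather than the raw sum of powers (this page does not say).

```python
from typing import Sequence


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance (sum_i |a_i - b_i| ** p) ** (1 / p); a metric for p >= 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a, b = [0.0, 0.0], [3.0, 4.0]
print(minkowski(a, b, 1))    # 7.0 (Manhattan)
print(minkowski(a, b, 2))    # 5.0 (Euclidean)
print(minkowski(a, b, 2.5))  # between Chebyshev (4.0) and Euclidean (5.0)
```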
README
A lightning-fast, Rust-powered Approximate Nearest Neighbor library for Python with multiple backends, thread-safety, and GPU acceleration.
Features
- Multiple Backends:
  - Brute-force (exact) with SIMD acceleration
  - HNSW (approximate) for large-scale datasets
- Multiple Distance Metrics: Euclidean, Cosine, Manhattan, Chebyshev
- Batch Queries for efficient processing
- Thread-safe indexes with concurrent access
- Zero-copy NumPy integration
- On-disk Persistence with serialization
- Filtered Search with custom Python callbacks
- GPU Acceleration for brute-force calculations
- Multi-platform support (Linux, Windows, macOS)
- Automated CI with performance tracking
Installation
```bash
# Stable release from PyPI:
pip install rust-annie

# Install with GPU support (requires CUDA):
pip install "rust-annie[gpu]"

# Or install from source:
git clone https://github.com/Programmers-Paradise/Annie.git
cd Annie
pip install maturin
maturin develop --release
```
Quick Start
Brute-Force Index
```python
import numpy as np
from rust_annie import AnnIndex, Distance

# Create index
index = AnnIndex(128, Distance.EUCLIDEAN)

# Add data
data = np.random.rand(1000, 128).astype(np.float32)
ids = np.arange(1000, dtype=np.int64)
index.add(data, ids)

# Search
query = np.random.rand(128).astype(np.float32)
neighbor_ids, distances = index.search(query, k=5)
```
HNSW Index
```python
import numpy as np
from rust_annie import PyHnswIndex

index = PyHnswIndex(dims=128)
data = np.random.rand(10000, 128).astype(np.float32)
ids = np.arange(10000, dtype=np.int64)
index.add(data, ids)

# Search
query = np.random.rand(128).astype(np.float32)
neighbor_ids, _ = index.search(query, k=10)
```
Examples
Brute-Force Index
```python
from rust_annie import AnnIndex, Distance
import numpy as np

# Create index
idx = AnnIndex(4, Distance.COSINE)

# Add data
data = np.random.rand(50, 4).astype(np.float32)
ids = np.arange(50, dtype=np.int64)
idx.add(data, ids)

# Search
labels, dists = idx.search(data[10], k=3)
print(labels, dists)
```
Batch Query
```python
from rust_annie import AnnIndex, Distance
import numpy as np

# Create index
idx = AnnIndex(16, Distance.EUCLIDEAN)

# Add data
data = np.random.rand(1000, 16).astype(np.float32)
ids = np.arange(1000, dtype=np.int64)
idx.add(data, ids)

# Batch search
queries = data[:32]
labels_batch, dists_batch = idx.search_batch(queries, k=10)
print(labels_batch.shape)  # (32, 10)
```
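search_batch returns one row of k results per query, which is why the shape above is (32, 10). The NumPy sketch below computes the same kind of exact result for Euclidean distance by broadcasting all query/data differences at once. batch_knn is a hypothetical helper, and its intermediate array has n_queries × n_data × dim elements, so it only suits small inputs.

```python
import numpy as np


def batch_knn(data: np.ndarray, queries: np.ndarray, k: int):
    """Exact Euclidean k-NN for every query row at once.

    Returns (row_indices, distances), each of shape (n_queries, k), nearest first.
    """
    # (n_queries, 1, dim) - (1, n_data, dim) -> (n_queries, n_data, dim), reduced over dim
    dists = np.linalg.norm(queries[:, None, :] - data[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return nearest, np.take_along_axis(dists, nearest, axis=1)


rng = np.random.default_rng(0)
data = rng.random((1000, 16)).astype(np.float32)
queries = data[:32]  # each query is also a stored vector, so it is its own nearest neighbor
labels, dists = batch_knn(data, queries, k=10)
print(labels.shape)  # (32, 10)
```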
Thread-Safe Index
```python
from rust_annie import ThreadSafeAnnIndex, Distance
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Create thread-safe index
idx = ThreadSafeAnnIndex(32, Distance.EUCLIDEAN)

# Add data
data = np.random.rand(500, 32).astype(np.float32)
ids = np.arange(500, dtype=np.int64)
idx.add(data, ids)

# Concurrent searches
def task(q):
    return idx.search(q, k=5)

queries = np.random.rand(100, 32).astype(np.float32)
with ThreadPoolExecutor(max_workers=8) as executor:
    futures = [executor.submit(task, q) for q in queries]
    for f in futures:
        print(f.result())
```
Filtered Search
```python
from rust_annie import AnnIndex, Distance
import numpy as np

# Create index
index = AnnIndex(3, Distance.EUCLIDEAN)
data = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0]
], dtype=np.float32)
ids = np.array([10, 15, 30], dtype=np.int64)
index.add(data, ids)

# Filter function: keep only even IDs
def even_ids(id: int) -> bool:
    return id % 2 == 0

# Filtered search
query = np.array([1.0, 2.0, 3.0], dtype=np.float32)
filtered_ids, filtered_dists = index.search_filter_py(
    query,
    k=3,
    filter_fn=even_ids
)
print(filtered_ids)  # IDs 10 and 30 (15 is odd, so it is filtered out)
```
Build and Query a Brute-Force AnnIndex in Python (Complete Example)
This section demonstrates a complete, beginner-friendly example of how to build and query a brute-force AnnIndex using Python.
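A brute-force index compares the query with every stored vector and keeps the k closest. The pure-NumPy sketch below does exactly that, which makes it a handy ground truth for checking AnnIndex results on small data. It is an illustration, not Annie's Rust/SIMD implementation, and the function name brute_force_knn is hypothetical.

```python
import numpy as np


def brute_force_knn(data: np.ndarray, query: np.ndarray, k: int):
    """Exact k-NN by Euclidean distance against every stored vector.

    Returns (row_indices, distances), nearest first.
    """
    dists = np.linalg.norm(data - query, axis=1)  # one distance per stored vector
    k = min(k, len(dists))
    nearest = np.argpartition(dists, k - 1)[:k]  # the k smallest, in arbitrary order
    order = np.argsort(dists[nearest])           # sort only those k
    return nearest[order], dists[nearest[order]]


rng = np.random.default_rng(42)
data = rng.random((1000, 16)).astype(np.float32)
ids = np.arange(1000, dtype=np.int64)  # external IDs, one per row
query = data[10]
rows, dists = brute_force_knn(data, query, k=3)
print(ids[rows], dists)  # first hit is ID 10 at distance 0.0 (the query itself)
```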
Benchmark Results
Measured on a 6-core CPU:

| Operation | Dataset Size | Time (ms) | Speedup vs Python |
|---|---|---|---|
| Single Query (Brute) | 10,000 × 64 | 0.7 | 4× |
| Batch Query (64) | 10,000 × 64 | 0.23 | 12× |
| HNSW Query | 100,000 × 128 | 0.05 | 56× |

That's a ~4× speedup vs. NumPy for a single brute-force query.
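The page does not say how the baseline timings were collected. To take comparable measurements on your own hardware, a small harness like the following can help. It is a sketch: median_ms is a hypothetical helper, and the NumPy argsort baseline is an assumption, not necessarily the baseline used for the table.

```python
import time

import numpy as np


def median_ms(fn, repeats: int = 20) -> float:
    """Median wall-clock time of fn() in milliseconds, after one warm-up call."""
    fn()  # warm-up: caches, lazy initialization
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]


# Assumed NumPy baseline: exact top-5 search over a 10,000 x 64 dataset
rng = np.random.default_rng(0)
data = rng.random((10_000, 64), dtype=np.float32)
query = rng.random(64, dtype=np.float32)
numpy_ms = median_ms(lambda: np.argsort(np.linalg.norm(data - query, axis=1))[:5])
print(f"NumPy baseline: {numpy_ms:.3f} ms")
```

To time Annie on the same data, build an AnnIndex from data and pass lambda: index.search(query, k=5). Dividing the baseline time by Annie's time gives a speedup ratio, assuming that is how the table defines speedup.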
API Reference
AnnIndex
Create a brute-force k-NN index.
Enum: Distance.EUCLIDEAN, Distance.COSINE, Distance.MANHATTAN
ThreadSafeAnnIndex
Same API as AnnIndex, safe for concurrent use.
Core Classes

| Class | Description |
|---|---|
| AnnIndex | Brute-force exact search |
| PyHnswIndex | Approximate HNSW index |
| ThreadSafeAnnIndex | Thread-safe wrapper for AnnIndex |
| Distance | Distance metrics (Euclidean, Cosine, etc.) |
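For reference, the metrics named above have standard definitions, sketched below in NumPy for spot-checking returned distances. The cosine convention (1 minus cosine similarity) is an assumption, since this page does not say how Distance.COSINE is scaled.

```python
import numpy as np


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))


def manhattan(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.abs(a - b).sum())


def chebyshev(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.abs(a - b).max())


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Assumed convention: 1 - cosine similarity (0 for parallel vectors, 1 for orthogonal)
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])
for fn in (euclidean, manhattan, chebyshev, cosine_distance):
    print(fn.__name__, round(fn(a, b), 4))
```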
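Key Methods

| Method | Description |
|---|---|
| add(data, ids) | Add vectors to the index |
| search(query, k) | Single-query search; returns (ids, distances) |
| search_batch(queries, k) | Batch query search; returns arrays of shape (n_queries, k) |
| search_filter_py(query, k, filter_fn) | Filtered search; filter_fn(id) returns True to keep an ID |
| save(path) | Save index to disk |
| load(path) | Load index from disk (called on the class, e.g. AnnIndex.load(path)) |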
Development & CI
CI runs on GitHub Actions, building wheels on Linux, Windows, and macOS, plus:
- benchmark.py
- batch_benchmark.py
- compare_results.py
```bash
# Run tests
cargo test
pytest tests/

# Run benchmarks
python scripts/benchmark.py
python scripts/batch_benchmark.py

# Generate documentation
mkdocs build
```
CI pipeline includes:
- Cross-platform builds (Linux, Windows, macOS)
- Unit tests and integration tests
- Performance benchmarking
- Documentation generation
Benchmark Automation
Benchmarks are tracked over time.
GPU Acceleration
Enable GPU in Rust
Enable CUDA support for brute-force calculations:
```bash
# Install with GPU support
pip install "rust-annie[gpu]"

# Or build from source with GPU features
maturin develop --release --features gpu
```
Supported operations:
- Batch L2 distance calculations
- High-dimensional similarity search
Requirements:
- NVIDIA GPU with CUDA support
- CUDA Toolkit installed
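Batched L2 distances are often computed with the identity ||q - x||^2 = ||q||^2 - 2 q·x + ||x||^2, which turns most of the work into one matrix multiplication, the operation GPUs handle best. This page does not say whether Annie's CUDA path uses this formulation; the NumPy sketch below (pairwise_sq_l2 is a hypothetical name) only illustrates the technique on the CPU.

```python
import numpy as np


def pairwise_sq_l2(queries: np.ndarray, data: np.ndarray) -> np.ndarray:
    """Squared L2 distances between every row of queries (m, d) and data (n, d).

    Expands ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2 so the main cost is one
    (m, d) @ (d, n) matrix multiply.
    """
    q_norms = np.einsum("ij,ij->i", queries, queries)[:, None]  # (m, 1)
    x_norms = np.einsum("ij,ij->i", data, data)[None, :]        # (1, n)
    sq = q_norms - 2.0 * (queries @ data.T) + x_norms           # (m, n)
    return np.maximum(sq, 0.0)  # clamp tiny negatives caused by rounding


rng = np.random.default_rng(1)
queries = rng.random((5, 8))
data = rng.random((7, 8))
sq = pairwise_sq_l2(queries, data)
print(sq.shape)  # (5, 7)
```

The trade-off is numerical: subtracting large, nearly equal terms can lose precision for very close points, which is why the result is clamped at zero.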
Contributing
Contributions are welcome! See the main CONTRIBUTING guide for details.
License
This project is licensed under the MIT License. See LICENSE for details.