## ANN Search Filtering

This document explains how to use the filtering capabilities to improve Approximate Nearest Neighbor (ANN) search.

### Why Filtering?

Filters allow you to narrow down search results dynamically based on:
- Metadata (e.g., tags, IDs, labels)
- Numeric thresholds (e.g., only items above/below a value)
- Custom user-defined logic

This improves both precision and flexibility of search.

#### Example: Python API

```python
from rust_annie import AnnIndex
import numpy as np

# 1. Create an index with vector dimension 128
index = AnnIndex(dimension=128)

# 2. Add data with metadata
vector0 = np.random.rand(128).astype(np.float32)
vector1 = np.random.rand(128).astype(np.float32)

index.add_item(0, vector0, metadata={"category": "A"})
index.add_item(1, vector1, metadata={"category": "B"})

# 3. Define a filter function (e.g., only include items where category == "A")
def category_filter(metadata):
    return metadata.get("category") == "A"

# 4. Perform search with the filter applied
query_vector = np.random.rand(128).astype(np.float32)
results = index.search(query_vector, k=5, filter=category_filter)

print("Filtered search results:", results)
```

### Supported Filters

This library supports applying filters to narrow down ANN search results dynamically.

| Filter type | Example |
|---|---|
| Equals | `Filter.equals("category", "A")` |
| Greater than | `Filter.gt("score", 0.8)` |
| Less than | `Filter.lt("price", 100)` |
| Custom predicate | `Filter.custom(lambda metadata: ...)` |

Filters work on the metadata you provide when adding items to the index.
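The semantics of these filter types can be sketched in plain Python as predicates over a metadata dictionary. This is only an illustration: the helper names `equals`, `gt`, and `lt` are hypothetical stand-ins rather than the library's `Filter` class, and treating a missing key as "no match" is an assumption.

```python
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]
Predicate = Callable[[Metadata], bool]

# Hypothetical helpers mirroring the filter types in the table above.
def equals(key: str, value: Any) -> Predicate:
    return lambda md: md.get(key) == value

def gt(key: str, threshold: float) -> Predicate:
    # Assumption: items without the key never match.
    return lambda md: key in md and md[key] > threshold

def lt(key: str, threshold: float) -> Predicate:
    return lambda md: key in md and md[key] < threshold

# Toy metadata store keyed by item ID.
items: Dict[int, Metadata] = {
    0: {"category": "A", "score": 0.9, "price": 50},
    1: {"category": "B", "score": 0.7, "price": 150},
    2: {"category": "A", "score": 0.5, "price": 80},
}

def matching_ids(pred: Predicate) -> List[int]:
    """Return the IDs whose metadata satisfies the predicate."""
    return [item_id for item_id, md in items.items() if pred(md)]

print(matching_ids(equals("category", "A")))  # [0, 2]
print(matching_ids(gt("score", 0.8)))         # [0]
print(matching_ids(lt("price", 100)))         # [0, 2]
```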

### New Feature: Filtered Search with Custom Python Callbacks

The library now supports filtered search using custom Python callbacks, allowing for more complex filtering logic directly in Python.

#### Example: Filtered Search with Python Callback

```python
from rust_annie import AnnIndex, Distance
import numpy as np

# Create a 3-dimensional index using Euclidean distance
index = AnnIndex(3, Distance.EUCLIDEAN)
data = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0]
], dtype=np.float32)
ids = np.array([10, 21, 30], dtype=np.int64)
index.add(data, ids)

# Filter function: keep only items whose ID is even
def even_ids(item_id: int) -> bool:
    return item_id % 2 == 0

# Filtered search
query = np.array([1.0, 2.0, 3.0], dtype=np.float32)
filtered_ids, filtered_dists = index.search_filter_py(
    query,
    k=3,
    filter_fn=even_ids
)
print(filtered_ids)  # IDs 10 and 30 remain; 21 is odd, so it is filtered out
```
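Conceptually, a filtered search keeps only the candidates accepted by the callback and ranks the survivors by distance. The following pure-Python sketch shows that idea with an exhaustive scan; `filtered_knn` is a hypothetical helper written for illustration and is not how `search_filter_py` is implemented internally.

```python
import math
from typing import Callable, List, Sequence, Tuple

def filtered_knn(
    vectors: Sequence[Sequence[float]],
    ids: Sequence[int],
    query: Sequence[float],
    k: int,
    filter_fn: Callable[[int], bool],
) -> Tuple[List[int], List[float]]:
    """Exhaustive k-nearest-neighbour search restricted to IDs accepted by filter_fn."""
    # Keep only candidates the filter accepts, paired with their Euclidean distance.
    candidates = [
        (math.dist(vec, query), item_id)
        for vec, item_id in zip(vectors, ids)
        if filter_fn(item_id)
    ]
    # Rank the survivors by distance and keep at most k of them.
    candidates.sort(key=lambda pair: pair[0])
    top = candidates[:k]
    return [item_id for _, item_id in top], [dist for dist, _ in top]

vectors = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
ids = [1, 2, 3, 4]

found_ids, found_dists = filtered_knn(
    vectors, ids, query=[0.0, 0.0], k=3, filter_fn=lambda i: i % 2 == 0
)
print(found_ids)    # [2, 4]
print(found_dists)  # [1.0, 3.0]
```

In this sketch only two items pass the filter, so fewer than `k` results come back even though `k=3` was requested.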

### Sorting Behavior

`BruteForceIndex` now sorts with `total_cmp`, which defines a total order over floating-point values, including NaN. Any NaN values in the data are therefore ordered consistently, avoiding the failures that partial comparisons (`partial_cmp`) can cause when a NaN is involved.
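The effect can be illustrated with a Python analogue. This is not the library's Rust code: `nan_last_key` is a hypothetical sort key that makes every pair of distances comparable by placing all NaN values after the real numbers (Rust's `total_cmp` additionally orders negative NaN before negative infinity, which this sketch does not reproduce).

```python
import math
from typing import Tuple

def nan_last_key(distance: float) -> Tuple[bool, float]:
    """Sort key under which NaN compares after every ordinary number."""
    # False sorts before True, so non-NaN values come first.
    return (math.isnan(distance), distance)

distances = [0.5, float("nan"), 0.1, 0.3]

# NaN compares false against everything, so a plain sort gives no
# guarantee about where it ends up. The key removes that ambiguity.
ordered = sorted(distances, key=nan_last_key)
print(ordered)  # [0.1, 0.3, 0.5, nan]
```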

### Benchmarking Indices

The library now includes a benchmarking function to evaluate the performance of different index types, specifically `PyHnswIndex` and `AnnIndex`. This function measures the average, maximum, and minimum query times, providing insights into the efficiency of each index type.

#### Example: Benchmarking Script

```python
import numpy as np
import time
from rust_annie import PyHnswIndex, AnnIndex

def benchmark(index_cls, name, dim=128, n=10_000, q=100, k=10):
    print(f"\nBenchmarking {name} with {n} vectors (dim={dim})...")

    # Random data, sequential IDs, and random query vectors
    data = np.random.rand(n, dim).astype(np.float32)
    ids = np.arange(n, dtype=np.int64)
    queries = np.random.rand(q, dim).astype(np.float32)

    # Index setup
    index = index_cls(dims=dim)
    index.add(data, ids)

    # Warm-up: a few untimed queries so one-time setup costs do not skew the timings
    for i in range(min(10, q)):
        index.search(queries[i], k=k)

    # Timing: record each query's latency in milliseconds
    times = []
    for i in range(q):
        start = time.perf_counter()
        _ = index.search(queries[i], k=k)
        times.append((time.perf_counter() - start) * 1000)

    print(f"  Avg query time: {np.mean(times):.3f} ms")
    print(f"  Max query time: {np.max(times):.3f} ms")
    print(f"  Min query time: {np.min(times):.3f} ms")

if __name__ == "__main__":
    benchmark(PyHnswIndex, "HNSW")
    benchmark(AnnIndex, "Brute-Force")
```
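Average, maximum, and minimum can be dominated by a single slow query, so it may help to report percentiles as well. The sketch below is an optional extension, not part of the library: `percentile` and `summarize` are hypothetical helpers that use the nearest-rank method on a list of per-query times in milliseconds.

```python
import math
from typing import Dict, List

def percentile(sorted_times: List[float], p: float) -> float:
    """Nearest-rank percentile of an ascending, non-empty list (0 < p <= 100)."""
    rank = max(1, math.ceil(p / 100 * len(sorted_times)))
    return sorted_times[rank - 1]

def summarize(times_ms: List[float]) -> Dict[str, float]:
    """Summary statistics for a non-empty list of query latencies."""
    ordered = sorted(times_ms)
    return {
        "mean": sum(ordered) / len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
    }

# One outlier (100 ms) pulls the mean far above the typical latency.
stats = summarize([1.0, 2.0, 3.0, 4.0, 100.0])
print(stats["mean"])  # 22.0
print(stats["p50"])   # 3.0
print(stats["p95"])   # 100.0
```

Passing the `times` list collected by a benchmark loop to `summarize` would give the same kind of report alongside the existing output.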

### Integration & Extensibility

- Filters are exposed from Rust to Python via PyO3 bindings.
- New filters can be added by extending `src/filters.rs` in the Rust code.
- Filters integrate cleanly with the existing ANN index search logic, so adding or combining filters doesn't require changes in the core search API.
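Because a Python filter callback is an ordinary function returning a boolean, callbacks can also be combined on the Python side before being handed to a search call. A minimal sketch, assuming ID-based callbacks like the one shown earlier; the combinators `all_of`, `any_of`, and `negate` are hypothetical helpers, not part of the library:

```python
from typing import Callable

IdFilter = Callable[[int], bool]

def all_of(*filters: IdFilter) -> IdFilter:
    """Accept an ID only if every filter accepts it."""
    return lambda item_id: all(f(item_id) for f in filters)

def any_of(*filters: IdFilter) -> IdFilter:
    """Accept an ID if at least one filter accepts it."""
    return lambda item_id: any(f(item_id) for f in filters)

def negate(f: IdFilter) -> IdFilter:
    """Accept exactly the IDs that f rejects."""
    return lambda item_id: not f(item_id)

def is_even(item_id: int) -> bool:
    return item_id % 2 == 0

def above_15(item_id: int) -> bool:
    return item_id > 15

combined = all_of(is_even, above_15)
print([i for i in [10, 20, 30, 35] if combined(i)])  # [20, 30]

# The combined callable has the same shape as a single callback, so it
# could be passed wherever one is accepted, e.g. filter_fn=combined.
```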

### See also