The Evolution of Sparse Information Retrieval
In recent years, language models have learned to represent text not merely as “bags of words,” but as high‑dimensional vectors in which each coordinate reflects the importance of a term in context. These sparse representations combine the interpretability of classical methods (each dimension corresponds to a word) with the power of modern neural models. However, applying them at industrial scale has so far been impractical: classical inverted‑index search algorithms become too slow because learned sparse vectors have term and weight distributions that differ markedly from those of traditional bag‑of‑words models.
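To make this concrete, here is a minimal sketch of how a learned sparse embedding can be stored as a term‑to‑weight map and scored with a simple dot product. The vocabulary and weights are invented purely for illustration and are not taken from any particular model.

```python
# A minimal sketch of a learned sparse embedding: each text becomes a
# sparse vector over the vocabulary, stored as a {term: weight} map.
# Terms and weights below are invented purely for illustration.
query_vec = {"planet": 1.8, "orbit": 1.2, "mars": 2.5}
doc_vec = {"mars": 2.1, "rover": 1.7, "orbit": 0.9, "nasa": 0.6}

def dot_product(q: dict, d: dict) -> float:
    """Relevance score = dot product over the shared non-zero dimensions."""
    # Iterate over the smaller vector for efficiency.
    small, large = (q, d) if len(q) <= len(d) else (d, q)
    return sum(w * large.get(t, 0.0) for t, w in small.items())

print(dot_product(query_vec, doc_vec))  # 2.5*2.1 + 1.2*0.9 = 6.33
```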
The Scalability Challenge
Most prior evaluations stopped at collections of a few million documents, such as MS MARCO (≈ 8.8 M passages). But what happens when you move from roughly 10 million to over 100 million vectors? The new study examined a dataset of 138 million passages encoded with SPLADE, a learned sparse‑embedding model, and posed several crucial questions:
- How long does index construction take?
- How much memory does it require?
- How much slower is the retrieval phase?
- Does accuracy degrade on collections this large?
Two Families of Solutions Compared
Proximity Graphs (HNSW)
- Builds a graph in which each document is linked to its most similar neighbours.
- Searches are fast, but constructing the graph over 138 M nodes is extremely time‑ and memory‑intensive.
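To give a feel for why proximity graphs answer queries quickly, here is a simplified, single‑layer greedy search over a precomputed neighbour graph. It is only a toy version of the idea behind HNSW (the real algorithm uses multiple layers and careful construction heuristics), and the data structures and scoring function here are assumptions made for the sketch, not the implementation evaluated in the study.

```python
# Toy single-layer proximity-graph search. HNSW adds multiple layers and
# construction heuristics; this sketch only shows the core traversal idea.
import heapq

def search_layer(graph, vectors, query, entry, score, ef=10):
    """Best-first search over one graph layer.
    graph[node] is a list of neighbour ids; score(q, v) is a similarity
    (higher = more similar); ef is the number of results to keep."""
    visited = {entry}
    s0 = score(query, vectors[entry])
    candidates = [(-s0, entry)]      # max-heap via negated scores
    best = [(s0, entry)]             # min-heap of the current top-ef results
    while candidates:
        neg_s, node = heapq.heappop(candidates)
        # Stop when the best remaining candidate cannot improve the results.
        if -neg_s < best[0][0] and len(best) >= ef:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            s = score(query, vectors[nb])
            if len(best) < ef or s > best[0][0]:
                heapq.heappush(candidates, (-s, nb))
                heapq.heappush(best, (s, nb))
                if len(best) > ef:
                    heapq.heappop(best)   # drop the weakest result
    return sorted(best, reverse=True)     # (score, node) pairs, best first

# Toy usage with 2-D vectors and a dot-product score.
vectors = {0: (1.0, 0.0), 1: (0.9, 0.3), 2: (0.1, 1.0), 3: (0.8, 0.5)}
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
dot = lambda q, v: q[0] * v[0] + q[1] * v[1]
print(search_layer(graph, vectors, (1.0, 0.2), entry=0, score=dot, ef=2))
```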
Seismic (clustered inverted indexes)
- Splits each inverted list into “geometric blocks” and computes a summary vector for each block.
- At query time, the summaries are evaluated first so that entire blocks can be pruned without inspecting individual documents, which greatly accelerates retrieval.
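The pruning mechanism can be illustrated with a small sketch. This is my own simplification, not the authors' code: blocks here are fixed‑size groups (Seismic forms its “geometric blocks” by clustering postings) and the summary is a plain coordinate‑wise maximum, which upper‑bounds the score of every document inside the block.

```python
# Sketch of block pruning on one inverted list. Each posting is
# (doc_id, sparse_vector); each block keeps a coordinate-wise max
# "summary" that upper-bounds the score of any document it contains.
from collections import defaultdict

def build_blocks(postings, block_size=2):
    """Group postings into fixed-size blocks and compute one summary per
    block (a simplification of Seismic's clustered geometric blocks)."""
    blocks = []
    for i in range(0, len(postings), block_size):
        chunk = postings[i:i + block_size]
        summary = defaultdict(float)
        for _, vec in chunk:
            for term, w in vec.items():
                summary[term] = max(summary[term], w)
        blocks.append((dict(summary), chunk))
    return blocks

def dot(q, d):
    return sum(w * d.get(t, 0.0) for t, w in q.items())

def search_blocks(blocks, query, threshold=0.0):
    """Score each block's summary first; skip the whole block if even this
    optimistic upper bound cannot beat the current threshold."""
    hits = []
    for summary, chunk in blocks:
        if dot(query, summary) <= threshold:
            continue                      # prune the entire block
        for doc_id, vec in chunk:
            s = dot(query, vec)
            if s > threshold:
                hits.append((s, doc_id))
    return sorted(hits, reverse=True)

# Toy usage: terms and weights are invented for illustration.
postings = [
    (1, {"mars": 2.0, "rover": 1.0}),
    (2, {"mars": 1.5, "orbit": 0.5}),
    (3, {"moon": 1.0}),
    (4, {"moon": 0.8, "tide": 0.4}),
]
blocks = build_blocks(postings)
print(search_blocks(blocks, {"mars": 2.0, "orbit": 1.0}, threshold=1.0))
```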
Key Results
Index Construction
- Seismic (no k‑NN graph) builds the index in just 0.5 hours.
- HNSW (k‑NN graph) requires over 10 hours.
Query Latency @ 98 % Accuracy
- Seismic answers queries in approximately 3 ms while maintaining 98 % accuracy.
- HNSW takes about 21 ms to achieve the same accuracy level.
Slowdown vs. MS Marco (8.8 M vectors)
- Both methods slow down by less than a factor of 8 when scaling from 8.8 M to 138 M vectors, even though the collection grows by a factor of almost 16, so latency increases sub‑linearly with collection size.
- Seismic in particular remains under an 8× slowdown, delivering millisecond‑level latencies even on very large collections.
Practical Implications
- Compute Resources: Seismic without the graph allows rapid index updates, ideal for frequently changing data.
- Memory/Speed Trade‑off: Adding the k‑NN graph slightly improves accuracy but increases build time and memory usage.
- Real‑Time Applications: With latencies of only a few milliseconds, sparse embeddings become feasible for commercial search engines, virtual assistants, and large‑scale QA systems.
Future Directions
The authors suggest two avenues for further study:
- Out‑of‑Core Indexes: When main memory is insufficient, investigate disk‑based caching and access strategies.
- Low‑Resource Environments: Adapt algorithms for devices with limited CPU, RAM, or storage.
In summary, this work shows that approximate retrieval techniques can scale to hundreds of millions of sparse vectors, paving the way for fast, interpretable, and practical neural search systems at industrial scale.
Article originally posted on Medium.