Can Sparse Neural Search Scale to 138 Million Documents?

Neural search systems are becoming the standard in information retrieval. Among them, sparse neural embeddings, such as SPLADE, have gained attention because they combine strong performance with interpretability and compatibility with traditional search infrastructures.

But an important question remained open: Do these systems still work efficiently when the dataset becomes truly massive?

A recent large-scale study investigates this by testing modern sparse retrieval methods on MS MARCO v2, a collection of 138 million passages, roughly 15 times larger than the benchmark commonly used in research.

Why Sparse Neural Retrieval?

Sparse neural models represent a text as a high-dimensional vector, with most values set to zero. Each dimension corresponds to a term in the vocabulary, which makes the representation: 

  • Effective, comparable to dense neural models
  • Interpretable, since dimensions map to words
  • Compatible with inverted indexes, the core technology behind traditional search engines
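As a toy illustration of the representation (the term weights below are hypothetical, not actual SPLADE outputs), a sparse embedding can be stored as a term-to-weight map, with relevance scored as a dot product over the few terms the query and document share:

```python
# Toy sparse embeddings: term -> weight maps (hypothetical values,
# not actual SPLADE outputs). Most vocabulary terms have weight 0,
# so they are simply absent from the map.
query = {"neural": 1.2, "search": 0.9, "scale": 0.4}
doc = {"neural": 0.8, "retrieval": 1.1, "search": 0.5, "index": 0.3}

def dot_product(q, d):
    # Only terms present in both vectors contribute to the score.
    return sum(w * d[t] for t, w in q.items() if t in d)

score = dot_product(query, doc)
print(round(score, 2))  # 1.2*0.8 + 0.9*0.5 = 1.41
```

Because each nonzero dimension is a vocabulary word, the score decomposes into per-term contributions, which is exactly what makes these models interpretable and inverted-index friendly.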

However, these models behave differently from classical keyword search, and naïve implementations can be too slow. This has led researchers to design approximate retrieval algorithms that speed up search while keeping high accuracy.
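To see why naïve retrieval gets slow, here is a minimal sketch of exhaustive term-at-a-time scoring over an inverted index (all data hypothetical). Learned sparse queries carry many weighted terms with long posting lists, so merging all of them exhaustively is exactly the cost that approximate algorithms try to avoid:

```python
from collections import defaultdict

# Hypothetical toy corpus: doc_id -> sparse embedding.
docs = {
    0: {"neural": 0.8, "search": 0.5},
    1: {"sparse": 1.0, "index": 0.7},
    2: {"neural": 0.4, "index": 0.9},
}

# Inverted index: term -> list of (doc_id, weight) postings.
index = defaultdict(list)
for doc_id, vec in docs.items():
    for term, w in vec.items():
        index[term].append((doc_id, w))

def exhaustive_search(query, k=2):
    # Naïve baseline: walk every posting list of every query term.
    scores = defaultdict(float)
    for term, qw in query.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda x: -x[1])[:k]

print(exhaustive_search({"neural": 1.0, "index": 0.5}))
```

At 138 million documents, the posting lists touched by a single query can contain many millions of entries, which is why pruning strategies become essential.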

What Was Compared?

The study compares two main approaches:

1. Graph-Based Methods
These methods organize documents in a network structure. During search, the system navigates the graph to find the most relevant documents quickly.

Graph methods are widely used in dense retrieval and are known for strong performance — but they can be expensive to build at large scale.
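The navigation idea can be sketched in a few lines: starting from an entry node, greedily move to whichever neighbour is closer to the query until no neighbour improves. This is a simplified illustration of HNSW-style search, with a hypothetical graph and precomputed similarities, not a production implementation:

```python
# Hypothetical proximity graph: node -> list of neighbour nodes.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

# Precomputed similarity of each node's document to the query
# (in a real system this would be computed on the fly).
sim = {0: 0.2, 1: 0.5, 2: 0.4, 3: 0.9}

def greedy_search(entry):
    current = entry
    while True:
        # Jump to the best-scoring neighbour; stop at a local optimum.
        best = max(graph[current], key=sim.get)
        if sim[best] <= sim[current]:
            return current
        current = best

print(greedy_search(entry=0))  # → 3
```

Building such a graph requires computing many pairwise similarities per document, which is where the heavy construction cost at large scale comes from.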

2. Seismic
Seismic is designed specifically for sparse neural embeddings. Instead of relying primarily on a graph, it organizes the index so the system can skip large groups of irrelevant documents during search.

An enhanced version also adds a small neighbor graph to refine results when higher accuracy is needed.
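The skipping idea can be sketched as follows, under loose assumptions and with hypothetical data (this is an illustration of summary-based pruning, not the actual Seismic implementation): documents are grouped into blocks, each block keeps a "summary" holding the per-term maximum weight inside it, and any block whose summary-based upper bound falls below the score threshold is skipped without scoring a single document in it.

```python
# Blocks of documents. Each block's "summary" stores the per-term
# max weight inside the block, giving an upper bound on any
# document's score (hypothetical data).
blocks = [
    {"summary": {"neural": 0.9, "search": 0.6},
     "docs": {0: {"neural": 0.9, "search": 0.2}, 1: {"search": 0.6}}},
    {"summary": {"index": 0.3},
     "docs": {2: {"index": 0.3}}},
]

def upper_bound(query, summary):
    return sum(w * summary.get(t, 0.0) for t, w in query.items())

def search_with_skipping(query, threshold):
    results, skipped = [], 0
    for block in blocks:
        if upper_bound(query, block["summary"]) < threshold:
            skipped += 1  # whole block pruned, none of its docs scored
            continue
        for doc_id, vec in block["docs"].items():
            score = sum(w * vec.get(t, 0.0) for t, w in query.items())
            if score >= threshold:
                results.append((doc_id, score))
    return results, skipped

print(search_with_skipping({"neural": 1.0, "search": 0.5}, threshold=0.5))
```

Because the bound is computed from one small summary per block rather than from every posting, most of the collection can be discarded cheaply.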

What Happens at 138 Million Documents?

The results are encouraging.

Query Speed
Seismic is consistently faster than the graph-based method at the same accuracy levels.

At very high accuracy (close to exact retrieval), Seismic can answer queries in just a few milliseconds, while the graph method can be several times slower.

Index Construction Time
Here the difference is even clearer:

  • The graph-based index takes many hours to build.
  • Seismic (without the extra refinement graph) builds in about 30 minutes.
  • Adding the refinement graph increases build time significantly.

This highlights a practical trade-off: faster setup versus maximum possible accuracy.

Scalability
MS MARCO v2 is about 15 times larger than the original MS MARCO dataset.

Yet, search time does not increase 15-fold.
Both approaches scale much better than linear growth would suggest.
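One way to quantify "better than linear" is the effective scaling exponent. The numbers below are purely illustrative, not figures from the study: if the collection grows 15-fold while latency grows, say, only 3-fold, the implied exponent is well under 1.

```python
import math

# Hypothetical illustration (NOT results from the study): dataset
# grows 15x, latency grows 3x. The effective exponent alpha solves
# latency_ratio = size_ratio ** alpha, i.e. alpha = log(3)/log(15).
size_ratio, latency_ratio = 15.0, 3.0
exponent = math.log(latency_ratio) / math.log(size_ratio)
print(round(exponent, 2))  # ≈ 0.41, i.e. clearly sublinear
```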

In other words, approximate sparse retrieval remains efficient even at very large scale.

Why This Matters

For years, sparse neural retrieval has been seen as promising but potentially difficult to scale in production systems.

This study shows that:

  • It can handle hundreds of millions of documents.
  • It maintains low latency.
  • It remains competitive — or superior — to graph-based methods.
  • It can be built much faster than large graph indexes.

This is particularly relevant for applications where interpretability and lexical grounding matter, such as scientific search, legal search, biomedical information systems, and large document archives.

Looking Ahead

Two open challenges remain:

1. What happens when the index no longer fits entirely in memory?

2. Can these systems work efficiently on machines with limited resources?

Answering these questions will be key for broader real-world deployment.

In summary, sparse neural retrieval is no longer just an academic curiosity. With the right engineering choices, it is a realistic and scalable solution for truly large collections.

Article originally posted on Medium.
