Seismic: a Breakthrough Algorithm for Faster and More Interpretable AI Search

Researchers at the Italian National Research Council (CNR) have designed a new algorithm that redefines how artificial intelligence retrieves information. Called Seismic — short for Spilled Clustering of Inverted Lists with Summaries for Maximum Inner Product Search — this innovative approach dramatically increases the efficiency and accuracy of search systems based on large language models. Tested on massive text datasets, Seismic achieved performance up to 21 times faster than current state-of-the-art methods.

From Dense to Sparse Representations

Most AI-driven search engines use what are known as dense embeddings, numerical vectors that encode the meaning of words or entire texts. While effective, dense representations are computationally expensive and difficult to interpret. A promising alternative has recently emerged under the name Learned Sparse Retrieval (LSR). Instead of encoding text as a continuous vector, LSR represents it as a learned bag of words: a high-dimensional vector where each dimension corresponds to a specific term in the model’s vocabulary. This makes the representation more transparent and computationally lighter.
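The idea can be made concrete with a minimal sketch (the dictionaries and function below are illustrative, not SPLADE's actual representation): a learned sparse vector keeps only its non-zero vocabulary dimensions, and relevance is the inner product over the terms a query and a document share.

```python
# A learned sparse vector stored as a mapping from vocabulary terms to
# learned weights, keeping only the non-zero dimensions.
doc_vec = {"earthquake": 2.1, "seismic": 1.8, "wave": 0.9}
query_vec = {"seismic": 1.5, "algorithm": 1.2}

def sparse_dot(q: dict, d: dict) -> float:
    """Inner product over the shared non-zero terms only."""
    # Iterate over the smaller vector for efficiency.
    if len(d) < len(q):
        q, d = d, q
    return sum(w * d[t] for t, w in q.items() if t in d)

score = sparse_dot(query_vec, doc_vec)  # 1.5 * 1.8 = 2.7
```

Because each dimension is a real vocabulary term, the score is directly readable: here only "seismic" matches, which is exactly the transparency the article attributes to sparse representations.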

Sparse models of this kind have shown strong performance, often comparable to that of dense encoders, and they generalize better to data that differs from what they were trained on. Yet integrating them efficiently into standard indexing systems remains a challenge. Traditional inverted indexes — the core data structure of most search engines — were not designed to handle the statistical properties of these learned sparse vectors.

An Innovative Data Structure: How Seismic Works

To overcome this limitation, CNR researchers designed Seismic, a novel algorithm that reorganizes how information is indexed and retrieved. The key idea is to combine the interpretability of sparse models with the efficiency of approximate search methods, typical of the Approximate Nearest Neighbor (ANN) family.

Seismic restructures the standard inverted index into geometrically cohesive blocks of documents. Each block is associated with a concise summary vector — a statistical sketch summarizing the content of the documents inside it. During a search, Seismic can use these summaries to skip over large portions of the index that are clearly irrelevant, reducing computational time without sacrificing accuracy.
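The skipping mechanism can be sketched as follows (a simplified illustration under assumed data structures, not Seismic's implementation): if a block's summary stores, for each term, the maximum weight any document in the block assigns to it, then the query's inner product with the summary upper-bounds the score of every document inside, so low-bound blocks can be skipped wholesale.

```python
def build_summary(block):
    """Per-term maximum weight over all documents in the block."""
    summary = {}
    for doc in block:
        for term, w in doc.items():
            summary[term] = max(summary.get(term, 0.0), w)
    return summary

def upper_bound(query, summary):
    """Upper bound on the score of any document in the summarized block."""
    return sum(w * summary.get(t, 0.0) for t, w in query.items())

def search_blocks(query, blocks, threshold):
    """Score documents only in blocks whose bound clears the threshold."""
    hits = []
    for block in blocks:
        if upper_bound(query, build_summary(block)) < threshold:
            continue  # no document in this block can reach the threshold
        for doc in block:
            score = sum(w * doc.get(t, 0.0) for t, w in query.items())
            if score >= threshold:
                hits.append((score, doc))
    return sorted(hits, key=lambda h: h[0], reverse=True)
```

In a real index the summaries would be precomputed and compressed; the point of the sketch is the guarantee that skipping a block on its bound never discards a document that could have qualified.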

The algorithm also introduces multiple levels of optimization:

  • Static pruning, which keeps only the most relevant entries for each term;
  • Dynamic pruning, which decides on the fly which blocks to explore based on their estimated relevance;
  • Quantization and clustering, which compress and group data for faster access.

These mechanisms together allow Seismic to process queries efficiently, evaluating only the parts of the data that truly matter.
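Static pruning, the first of these mechanisms, admits a very short sketch (function and parameter names are assumptions for illustration, not Seismic's API): at index-build time, each term's posting list is truncated to its highest-weight entries, since low-weight entries contribute little to any inner product.

```python
def prune_postings(postings, keep_fraction=0.5):
    """Keep only the top-weighted fraction of each term's posting list.

    postings: dict mapping term -> list of (doc_id, weight) pairs.
    """
    pruned = {}
    for term, entries in postings.items():
        k = max(1, int(len(entries) * keep_fraction))  # keep at least one
        pruned[term] = sorted(entries, key=lambda e: e[1], reverse=True)[:k]
    return pruned
```

This trades a bounded loss in exactness for a smaller index and shorter lists to scan, which is the general shape of the efficiency/accuracy trade-off the article describes.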

Record-Breaking Results

In tests on widely used benchmarks — MSMARCO-v1, MSMARCO-v2, and Natural Questions (NQ) — Seismic consistently outperformed leading retrieval systems, including the winners of the 2023 BigANN Challenge at NeurIPS. Depending on the dataset and embedding model used (SPLADE or ESPLADE), Seismic was up to 21 times faster than competing approaches, while maintaining the same or higher accuracy.

When applied to large collections, such as MSMARCO-v2 with over 138 million embeddings, Seismic continued to scale effectively, achieving a 6.8× speedup at high accuracy levels. The algorithm’s robustness makes it suitable for real-world search scenarios involving billions of documents.

Enhancing Search with Graph Intelligence

Seismic can also be extended with a k-nearest neighbor (k-NN) graph, which connects each document to its most similar neighbors. After an initial retrieval phase, the algorithm uses these connections to refine the ranking of results. This hybrid design combines the speed of Seismic with the relational power of graph structures, improving accuracy with minimal computational overhead. In experiments, adding the k-NN graph yielded up to 2× faster query times while preserving precision.
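The refinement step can be sketched as a neighbour expansion (a simplified illustration with assumed data structures, not Seismic's implementation): each candidate from the first pass pulls in its precomputed nearest neighbours, and the enlarged set is rescored, which can recover relevant documents the initial retrieval missed.

```python
def dot(q, d):
    """Sparse inner product between two term->weight mappings."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())

def refine_with_knn(query, candidates, knn_graph, vectors):
    """Expand candidates with their k-NN neighbours, then rescore.

    candidates: iterable of doc ids from the first retrieval pass.
    knn_graph:  dict mapping doc id -> list of neighbour doc ids.
    vectors:    dict mapping doc id -> sparse term->weight vector.
    """
    expanded = set(candidates)
    for doc_id in list(expanded):
        expanded.update(knn_graph.get(doc_id, []))
    return sorted(expanded, key=lambda d: dot(query, vectors[d]), reverse=True)
```

The graph lookup is cheap because the neighbour lists are precomputed offline, so the extra work per query is a handful of additional scorings, consistent with the "minimal computational overhead" noted above.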

A Step Forward for Intelligent Information Access

By uniting interpretability, scalability, and speed, Seismic represents a major advance in the field of AI-based information retrieval. Its design offers an efficient alternative to dense encoders, paving the way for search engines capable of handling massive datasets — from web-scale archives to scientific and biomedical literature — while keeping computations affordable and understandable. In an era of ever-expanding digital content, Seismic shows how intelligent indexing and smart mathematical design can make AI-powered search both faster and more transparent.

Article originally posted on Medium.
