Congratulations to our partner CNR-ISTI for winning the Best Paper Runner-up award at the ACM SIGIR 2024 conference with the paper “Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations” by Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, and Rossano Venturini. ACM SIGIR is the premier international forum for presenting new research results and demonstrating new systems and techniques in information retrieval. The 47th edition of SIGIR was held from July 14 to 18, 2024, in Washington, D.C., USA.


The paper proposes a novel organization of the classic inverted index data structure that enables fast yet effective approximate retrieval over learned sparse embeddings, an attractive class of contextual embeddings for text retrieval. The proposed approach organizes inverted lists into geometrically cohesive blocks, each equipped with a summary vector. During query processing, the summaries are used to quickly determine whether a block must be evaluated. As the authors show experimentally, the technique is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and also outperforms state-of-the-art graph-based techniques.
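
To make the block-and-summary idea described above more concrete, here is a minimal Python sketch. It is not the authors' implementation and it makes several simplifying assumptions: sparse vectors are plain dictionaries with non-negative weights, blocks are formed by cutting each inverted list into fixed-size chunks (a stand-in for the geometric grouping described in the paper), and each summary is taken to be the coordinate-wise maximum of the vectors in its block. The names `build_index`, `search`, and `block_size` are hypothetical and chosen for illustration only.

```python
import heapq
from collections import defaultdict


def dot(a, b):
    """Inner product of two sparse vectors stored as {term_id: weight}."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def build_index(docs, block_size=2):
    """Build an inverted index whose lists are split into summarized blocks.

    Assumption: blocks are fixed-size chunks of each inverted list, and the
    summary is the coordinate-wise maximum of the block's document vectors.
    """
    postings = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term in vec:
            postings[term].append(doc_id)

    index = {}
    for term, ids in postings.items():
        blocks = []
        for start in range(0, len(ids), block_size):
            block = ids[start:start + block_size]
            summary = {}
            for doc_id in block:
                for t, w in docs[doc_id].items():
                    summary[t] = max(summary.get(t, 0.0), w)
            blocks.append((summary, block))
        index[term] = blocks
    return index


def search(index, docs, query, k=1):
    """Return the top-k (score, doc_id) pairs, skipping unpromising blocks."""
    heap = []      # min-heap holding the best k results found so far
    seen = set()   # documents that were already scored exactly
    for term in query:
        for summary, block in index.get(term, []):
            # Use the summary to decide whether the block is worth evaluating.
            if len(heap) == k and dot(query, summary) <= heap[0][0]:
                continue
            for doc_id in block:
                if doc_id in seen:
                    continue
                seen.add(doc_id)
                score = dot(query, docs[doc_id])
                if len(heap) < k:
                    heapq.heappush(heap, (score, doc_id))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)
```

In this simplified form, the summary score is an upper bound on the score of every document in the block as long as all weights are non-negative, so skipping a block never discards a better result. The approach described in the paper is approximate, so its results may differ from an exhaustive search in exchange for speed.
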
This technique is relevant to EFRA because the project relies heavily on Large Language Models (LLMs) to enable efficient AI analytics on textual data via learned representations. With this published result, we can speed up retrieval over these representations by up to two orders of magnitude compared with previous state-of-the-art techniques.