Can Smaller Language Models Compete with ChatGPT? A New Study Has the Answer

A recent study by an international team, led by the Institute of Information Science and Technologies “A. Faedo” of Italy’s National Research Council (CNR-ISTI), compared ChatGPT (the gpt-3.5-turbo version) with several smaller large language models (LLMs), exploring their strengths and weaknesses in conversational search.

What is “conversational search”?

Conversational search is an innovative mode of human-computer interaction: instead of entering simple keywords into a search engine, users engage in dialogue with a chatbot, asking questions in natural language. A crucial challenge in this context is query rewriting: the system’s ability to rephrase a user’s question so that it is complete and self-contained, removing ambiguities and implicit references to earlier turns of the conversation.
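To make the idea concrete, here is a minimal illustration; the conversation and the rewrite are invented for this example and do not come from the study:

```python
# A follow-up question that only makes sense in the context of the conversation...
history = [
    "Who directed the film Oppenheimer?",
    "Christopher Nolan directed it.",
]
follow_up = "What else has he directed?"

# ...and the kind of self-contained rewrite a conversational search system
# should produce, with the pronoun "he" resolved against the history:
rewritten = "What other films has Christopher Nolan directed?"

print(rewritten)
```

The rewritten query can then be sent to a standard search engine on its own, with no conversation history attached.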

Study Results

The researchers first tested ChatGPT using various prompting strategies, i.e., guiding the model with precise instructions, ranging from zero-shot (no examples) to few-shot (a few worked examples). Although ChatGPT demonstrated excellent query-rewriting capabilities, the team wondered whether smaller, fine-tuned models could offer comparable or even superior results.
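The difference between the two prompting strategies can be sketched as plain prompt construction. The instruction wording and the worked example below are illustrative assumptions, not the study's actual prompts:

```python
# Hypothetical instruction and demonstration for conversational query rewriting.
INSTRUCTION = (
    "Rewrite the last user question so that it is fully self-contained, "
    "resolving any pronouns or references to earlier turns."
)

EXAMPLE = (
    "History: Who wrote Dune? / Frank Herbert.\n"
    "Question: When was it published?\n"
    "Rewrite: When was the novel Dune by Frank Herbert published?"
)

def build_prompt(history, question, few_shot=False):
    """Assemble a zero-shot or few-shot prompt for an LLM."""
    parts = [INSTRUCTION]
    if few_shot:
        parts.append(EXAMPLE)  # one worked example; append more for k-shot
    parts.append("History: " + " / ".join(history))
    parts.append("Question: " + question)
    parts.append("Rewrite:")
    return "\n\n".join(parts)

print(build_prompt(["Who directed Oppenheimer?", "Christopher Nolan."],
                   "What else has he directed?", few_shot=True))
```

In the zero-shot case the model receives only the instruction and the conversation; in the few-shot case it also sees one or more demonstrations of the task before the query to rewrite.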

The answer? Yes.

Open-source models such as Llama-2 (13 billion parameters) and Flan-T5 (780 million parameters) were fine-tuned specifically for conversational query rewriting. The results show that Llama-2-13B outperformed ChatGPT in several tests, with gains of up to 10.58% in NDCG@3, a key metric of information-retrieval effectiveness.
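For readers unfamiliar with the metric, NDCG@3 scores how well the top three retrieved results are ordered by relevance. The sketch below uses the standard definition; the study's actual relevance grades and evaluation setup are not reproduced here:

```python
import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain: each result's relevance grade is
    # discounted by the log of its rank position (rank 1 -> log2(2)).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    # Normalize by the ideal DCG, i.e., the same grades sorted best-first,
    # so a perfect ranking scores exactly 1.0.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# A ranking that places a weakly relevant result above the best one
# is penalized relative to the ideal ordering:
print(ndcg_at_k([1, 3, 2], k=3))  # < 1.0
print(ndcg_at_k([3, 2, 1], k=3))  # 1.0 (ideal order)
```

A better query rewrite tends to surface the relevant documents earlier in the ranking, which is exactly what NDCG@3 rewards.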

Efficiency and Computational Costs

Another relevant aspect is efficiency: while ChatGPT requires considerable computational resources, the smaller open-source models proved faster and more cost-effective. For example, Flan-T5 cut the average query-rewriting time by more than a factor of six compared with the larger models, paving the way for more sustainable and accessible conversational systems.

Why This Research Matters

The study highlights that relying on gigantic proprietary models like ChatGPT to achieve good results is not always necessary. Well-trained open-source models can ensure high performance while promoting greater transparency, customization, and sustainability. This approach could be crucial for organizations looking to integrate AI into their platforms while maintaining control over their data.

Ultimately, the research marks a step toward more democratic and adaptable artificial intelligence, showing that “bigger” does not necessarily mean “better.”

Article originally posted on Medium.
