Using Machine Learning to Improve Search – Part 1 of 5

Large Language Models and Generative AI

Machine learning (ML) is a powerful tool to optimize search engines and natural language processing (NLP) capabilities. Using ML can result in more accurate and contextually relevant search results. ML algorithms can analyze vast amounts of data, including user queries, search patterns, and content, to improve search rankings and understand user intent. Another thing you can achieve with ML is searching images with text or even extracting information from images to enrich data.

I recently started exploring the fascinating area of combining ML with search and decided to write blog posts to explain the possibilities. This first blog post is part of a five-part series: Using Machine Learning to Improve Search. In this first part, I present how to leverage large language models and Generative AI to improve search.

When integrated with search systems, NLP techniques improve the search experience by understanding and responding to user queries in a more sophisticated and context-aware manner. Large language models, such as GPT-4 from OpenAI, represent a significant breakthrough in the field of NLP. These models are designed to understand and generate human-like text by learning patterns and structures from vast amounts of training data.

Short Introduction to LLMs

Large language models are advanced artificial intelligence systems designed to process and generate human-like text. They are built using deep learning techniques and consist of millions or even billions of parameters. These models are trained on massive amounts of text data to learn language patterns, grammar, and context. Once trained, they can perform a wide range of language-related tasks. This includes text completion, summarization, translation, and question-answering. LLMs have the ability to generate coherent and contextually relevant responses, mimicking human language with impressive fluency. Some of these cool things can also be applied to improve the search experience.

If you want to know more about LLMs, a lot has been written about them lately. I found this website helpful. It contains a lot of information about machine learning, and the particular page linked explains large language models.

Ways to Use LLMs to Improve Search

Reranking

After obtaining the initial list of documents, we can further improve the search results using a technique called reranking. Reranking involves taking the initial list of documents, generally produced by a more straightforward retrieval model, and applying a more sophisticated model to reorder the documents based on relevance. In this context, the LLM can be used to understand the semantic and contextual relevance of each document to the query and reorder the documents accordingly.
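To make the reranking step more concrete, here is a minimal Python sketch. It assumes the first-stage retriever has already produced a list of candidate documents. The `rerank` function and the `overlap_score` scorer are hypothetical names introduced for illustration; in a real setup `score_fn` would wrap a call to an LLM or a dedicated reranking model, and the simple term-overlap scorer is only a stand-in so the example runs without any external service.

```python
from typing import Callable, List


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 10,
) -> List[str]:
    """Reorder the first top_k candidates by a more expensive relevance score."""
    # Only the head of the list is rescored, which bounds the number of
    # (potentially slow and costly) model calls per query.
    head, tail = candidates[:top_k], candidates[top_k:]
    scored = [(doc, score_fn(query, doc)) for doc in head]
    # list.sort() is stable, so ties keep their first-stage order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # Candidates beyond top_k keep their original position after the head.
    return [doc for doc, _ in scored] + tail


def overlap_score(query: str, document: str) -> float:
    """Stand-in for an LLM relevance call: share of query terms in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


first_stage = [
    "Opening hours of the city library",
    "How to reset a forgotten password",
    "Password policy for new accounts",
]
print(rerank("reset password", first_stage, overlap_score, top_k=3))
```

Limiting the rescoring to `top_k` documents is one way to keep the latency and cost of the second-stage model under control.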

I recently came across a research paper in which the OpenAI GPT-4 model was benchmarked against other reranking models, and it seemed to outperform them in most areas. This suggests LLMs have real potential for reranking. If you’re interested in this research, the paper can be found here. Depending on the use case, there are downsides to using OpenAI or similar APIs, such as latency and cost. I explain more on this in the caveats section below.

This is a visual representation of how reranking can be implemented.


Match Explanation

A search engine works by examining the terms you entered and then comparing these to its index. The engine uses complex algorithms to determine the relevance of each indexed document to your search terms. Sometimes it is unclear to the user why a specific result matched the query, for example, if the result doesn’t contain any search terms. Understanding why a particular search result was ranked highly or presented to the user is crucial for building trust and providing transparent search experiences.

When given a search query and a search result, an LLM can analyze the text of the query and the result, drawing on its extensive training data to identify the likely relevant features. Based on my experience so far, an LLM can point out that the result contains many instances of the search terms, that the terms appear in important places like the title or first paragraph, or that the result’s content is closely related to topics associated with the search terms.

How you phrase the prompt (prompt engineering) is important here. The way you frame a prompt can guide the model’s responses in terms of length, detail, tone, context, or subject matter. A well-crafted prompt can help the model provide more useful and relevant responses. Because the user’s screen is limited and you want to avoid long texts on your result page, you should instruct the model in such a manner that it returns a short and clear explanation.
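As an illustration of such an instruction, the sketch below builds a prompt that asks for a one-sentence explanation with a word limit, and adds a small safety net for responses that ignore the limit. The function names, the prompt wording, and the 30-word default are all assumptions made for this example; the actual model call is left out on purpose.

```python
def build_explanation_prompt(
    query: str, title: str, snippet: str, max_words: int = 30
) -> str:
    """Build a prompt that asks for a short explanation of why a result matched."""
    return (
        "You explain search results to end users.\n"
        f"Search query: {query}\n"
        f"Result title: {title}\n"
        f"Result text: {snippet}\n"
        f"In at most {max_words} words and a single sentence, explain why this "
        "result is relevant to the query."
    )


def enforce_word_limit(explanation: str, max_words: int = 30) -> str:
    """Truncate a model response that ignored the length instruction."""
    words = explanation.split()
    if len(words) <= max_words:
        return explanation.strip()
    return " ".join(words[:max_words]) + "..."


prompt = build_explanation_prompt(
    query="waterproof hiking boots",
    title="Trail Master GTX",
    snippet="A mid-cut boot with a membrane lining that keeps feet dry.",
)
print(prompt)
```

Models do not always follow length instructions, so checking the response before rendering it keeps the result page tidy either way.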

Another option is to generate a list of the most semantically similar sentences in the document to the query and show these to the user.
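A rough sketch of this idea is shown below. It splits a document into sentences and ranks them by similarity to the query. Note the assumption: the `jaccard` function is a purely lexical stand-in, used here only to keep the example self-contained. To get semantic similarity you would pass a `similarity` function based on an embedding model instead. All function names are hypothetical.

```python
import re
from typing import Callable, List, Set, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Lexical stand-in for semantic similarity (NOT an embedding model)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def top_sentences(
    query: str,
    document: str,
    k: int = 2,
    similarity: Callable[[str, str], float] = jaccard,
) -> List[Tuple[str, float]]:
    """Return the k sentences most similar to the query, best first."""
    scored = [(s, similarity(query, s)) for s in split_sentences(document)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


doc = (
    "Our store opens at nine. "
    "Returns are accepted within thirty days. "
    "Shipping is free."
)
print(top_sentences("when are returns accepted", doc, k=1))
```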

I’ve created some sample code for you to check out and to easily try out match explanation.

Relevancy Judgement Assistance

A topic within the search community that is currently being discussed and experimented with is using LLMs to assist with relevancy judgments. Relevancy judgments are used in the field of information retrieval and search engines to evaluate and improve search results. This involves assessing the relevance of the results returned by a search engine in response to a particular query.

Usually, a certain scale is used to rate the relevance. For instance, a binary scale (relevant, not relevant) or a multilevel scale (highly relevant, somewhat relevant, not relevant) might be employed. In many cases, these judgments are done by human reviewers who manually evaluate the relevance of each search result to the original query. This can be a time-consuming and costly process, but it is often necessary for optimizing the accuracy of search results.

This research paper explains how this process can be automated with LLMs. While full automation is not ready yet, LLMs can already assist human reviewers with their judgments. Human reviewers can struggle to see a pertinent connection when they lack world knowledge. LLMs can generate rationales that explain such connections, similar to the match explanation above.
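One possible shape for such an assistant is sketched below: a prompt that asks for a label on the multilevel scale plus a one-sentence rationale, and a parser that turns the response into a grade. The scale values, the two-line response format, and the function names are assumptions for this example, not something taken from the paper. When the label cannot be parsed, the grade is `None`, so the item can simply be left to the human reviewer.

```python
from typing import Optional, Tuple

# Multilevel scale mapped to numeric grades (the grade values are an assumption).
SCALE = {"highly relevant": 2, "somewhat relevant": 1, "not relevant": 0}


def build_judgment_prompt(query: str, document: str) -> str:
    """Ask for a label from SCALE and a one-sentence rationale."""
    labels = ", ".join(SCALE)
    return (
        f"Query: {query}\n"
        f"Document: {document}\n"
        f"Rate the document as one of: {labels}.\n"
        "Answer in exactly two lines:\n"
        "Label: <one of the labels>\n"
        "Rationale: <one sentence>"
    )


def parse_judgment(response: str) -> Tuple[Optional[int], str]:
    """Extract (grade, rationale); grade is None if the label is not recognised."""
    grade: Optional[int] = None
    rationale = ""
    for line in response.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "label":
            grade = SCALE.get(value.lower().rstrip("."))
        elif key == "rationale":
            rationale = value
    return grade, rationale


example = "Label: Somewhat relevant\nRationale: Mentions the topic only briefly."
print(parse_judgment(example))
```

Showing the rationale next to the suggested grade lets the reviewer accept or overrule it quickly, which keeps the human in control of the final judgment.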

Content Enrichment

This might not seem search related, but the opposite is true. Quality content is at the heart of every good search experience. This is why I also wanted to mention this part.

Generative AI models have the capability to generate high-quality content that can significantly enhance data enrichment. For instance, these models can generate pertinent summaries, comprehensive product descriptions, or contextual details for search queries. This offers considerable value when managing product fact sheets for example. The created summaries and descriptions can subsequently be employed in search functions.

Moreover, generative AI models can offer assistance by expanding on existing content and synthesizing additional paragraphs, examples, or detailed explanations pertinent to a specific topic. This helps users understand better by providing them with lots of information and context, which expands their knowledge of the topic. Importantly, this wealth of information can also be incorporated into search operations.

As with match explanation, prompt engineering is important here too. In my experience, it’s even more important in this part because your content should match the tone of voice of the company and should fit in the data structure (e.g. not too short or too long).
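The sketch below shows one way to combine both concerns: the prompt carries the tone of voice and the length limits, and the generated text is only stored when it actually fits the target field. Everything here is hypothetical, including the `enrich_record` name, the `summary` field, and the character limits. `fake_generate` returns canned text in place of a real model call so the example is runnable.

```python
from typing import Callable, Dict


def enrich_record(
    record: Dict[str, str],
    generate: Callable[[str], str],
    tone: str = "friendly and concise",
    min_chars: int = 40,
    max_chars: int = 300,
) -> Dict[str, str]:
    """Return a copy of record with a generated 'summary' field if it fits."""
    facts = "\n".join(f"- {key}: {value}" for key, value in record.items())
    prompt = (
        f"Write a product summary in a {tone} tone of voice.\n"
        f"Use only these facts:\n{facts}\n"
        f"Keep it between {min_chars} and {max_chars} characters."
    )
    text = generate(prompt).strip()
    enriched = dict(record)
    # Only accept text that fits the target field; otherwise leave the record
    # unchanged so it can be retried or reviewed by a person.
    if min_chars <= len(text) <= max_chars:
        enriched["summary"] = text
    return enriched


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same canned text."""
    return "A lightweight trail shoe with a grippy sole, built for wet and rocky paths."


product = {"name": "Trail Runner X", "sole": "rubber, deep lugs", "weight": "240 g"}
print(enrich_record(product, fake_generate))
```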

Vector Search

One of the key challenges in search is understanding the context and intent behind a user’s query. Vector search is a technique used to find similar items based on their vector representations in a high-dimensional space. In the context of NLP, embeddings are vector representations of words, phrases, or documents. These embeddings capture semantic and contextual information, so similar items are represented by vectors that lie closer together in the embedding space.
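A toy example can make "closer together" tangible. The sketch below uses cosine similarity, a common measure of closeness between embeddings, and a brute-force search over a tiny in-memory index. The three-dimensional vectors are made up for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions, and production systems usually rely on an approximate nearest-neighbour index rather than a full scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(
    query_vec: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Brute-force search: score every indexed vector and keep the k best."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings: the first two items point in a similar direction.
index = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook computer": [0.8, 0.2, 0.1],
    "banana bread": [0.0, 0.1, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of the query "portable pc"
print(nearest(query_vec, index, k=2))
```

Even though the query shares no words with "laptop" or "notebook computer", their vectors point in nearly the same direction, which is why they rank above "banana bread".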

Embeddings are typically created by ML models which are trained on datasets of specific domains. LLMs, on the other hand, are trained on large text corpora, enabling them to develop a robust understanding of language and context. This means they are often better at creating more meaningful and contextually aware embeddings than traditional models, resulting in more accurate search results.

That being said, it’s important to note that the effectiveness of using LLMs for creating search embeddings depends on the specific use case and dataset. In some cases, simpler or more traditional models might perform just as well or even better, especially when computational resources or data are limited or your data is niche.

An easy way to combine LLMs with vector search is to use the LangChain framework. My colleague Jettro has written a blog post about that; you can read it here.

In Part 2 of this blog series, I dive deeper into vector search and embeddings.

Caveats

This all sounds very cool, but I feel it is important to also mention some things that should be taken into account before applying any of the above.

  1. Overfitting and irrelevant information: While LLMs are designed to respond based on patterns they observed during training, they can occasionally generate outputs that include irrelevant or inaccurate information because they over-generalize from the data they were trained on.
  2. Not up to date: LLMs were not designed for real-time learning or updating their knowledge. They are trained on a static dataset and do not have the ability to learn new information after training. This means they might not have information on recent events or developments.
  3. Data privacy: There could be potential privacy concerns if a search engine built on an LLM is not designed with strong data privacy protections. Users would need to be assured that their queries are handled confidentially, and that the system isn’t retaining or learning from their personal data.
  4. Niche data: LLMs can struggle with highly specialized terminology or context, as their understanding is based on patterns they’ve observed in their training data. If a niche topic has unique contexts that were not adequately represented in the training data, the model might not respond accurately.
  5. Latency: Larger models require more computational power to process and generate responses. This can lead to longer response times, especially if the model is not optimized or if hardware resources are limited. The length of the generated responses and the number of requests the model has to handle concurrently can impact latency too. Longer responses take more time to generate and transmit. If a model serves a high volume of queries simultaneously, response times may increase.
  6. Cost: If you’re using a closed-source model, you probably have to pay for each request sent to the API wrapped around the model. Larger and more comprehensive models tend to be more expensive. Experiment with different models to see which one aligns best with your requirements, and whether it justifies the associated expenses.
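To get a feeling for the cost caveat, a back-of-the-envelope calculation helps. The sketch below assumes a pricing model that charges per 1,000 input and output tokens, which is common for hosted LLM APIs. The traffic numbers and prices are invented for illustration and do not reflect any real provider's price list.

```python
def estimate_monthly_cost(
    queries_per_day: int,
    docs_per_query: int,
    input_tokens_per_doc: int,
    output_tokens_per_doc: int,
    price_in_per_1k: float,
    price_out_per_1k: float,
    days: int = 30,
) -> float:
    """Estimate the monthly API cost when every retrieved document is sent to the model."""
    calls = queries_per_day * docs_per_query * days
    input_cost = calls * input_tokens_per_doc / 1000 * price_in_per_1k
    output_cost = calls * output_tokens_per_doc / 1000 * price_out_per_1k
    return input_cost + output_cost


# Hypothetical scenario: rerank 10 documents for each of 1,000 daily queries.
cost = estimate_monthly_cost(
    queries_per_day=1000,
    docs_per_query=10,
    input_tokens_per_doc=500,
    output_tokens_per_doc=40,
    price_in_per_1k=0.01,   # assumed price, not a real quote
    price_out_per_1k=0.03,  # assumed price, not a real quote
)
print(f"Estimated monthly cost: {cost:.2f}")
```

With these made-up numbers the estimate comes to about 1,860 per month in whatever currency the prices are expressed in, which shows how quickly per-request pricing adds up when every result of every query goes through the model.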

Final Thoughts

The examples above are just a few of the capabilities of LLMs. Development in this area is progressing rapidly. Each day, these tools become more efficient, accurate, and easier to implement, signaling a transformative shift in search mechanisms.

There are caveats, but I firmly believe that with rigorous experimentation and refinement, we can navigate these hurdles. It’s essential to know that LLMs are not futuristic constructs. They are here, now, and accessible for everyone to use. Now is the time to start experimenting!
