Question Answering with your own data, LLMs and Java: meet Langchain4j


Python has become the de facto programming language for working on data-related tasks. I’ve recently started exploring the world of Machine Learning (ML), Large Language Models (LLMs) and vector databases. See my previous blog post about using LLMs and Generative AI to improve search.

This is also when I started ramping up my Python programming skills, because most companies and organizations release very neat Python client libraries for their products and services. But sometimes you want or need to use Java. In this blog post, I will explain how you can easily create a question answering system with Java and Langchain4j.

At the time of writing this blog post, I am doing research and preparation for conference talks and workshops with my colleague Jettro. These talks and workshops are all about creating question answering systems by combining LLMs with semantic (vector) search. There we use a powerful tool called LangChain, which makes it very easy to connect all the pieces. Jettro has written a blog post about using that tool with its official client, which is a Python library. While the main programming language in our talks and workshops is also Python, we would like to give attendees who are not comfortable with Python the opportunity to work with Java as well. Meet Langchain4j, a Java port of LangChain. Although not yet as feature-rich as the original, it already provides enough functionality to be useful. I will show you how easy it is to work with.

Embedding Model

First, you want to start by defining your embedding model. This model is used to convert your text into embeddings. Embeddings are mathematical (vector) representations of your text that make it possible to do calculations on it, such as determining how similar two pieces of text are. If you want to know more about this, see this blog post by Jettro, or watch out for the upcoming part 2 in my “Using Machine Learning to Improve Search” blog series.
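To make that a little more concrete: a common similarity measure for embeddings is cosine similarity. Here is a minimal plain-Java sketch of that calculation (just the math, not a Langchain4j API):

...
    // Minimal sketch: cosine similarity between two embedding vectors.
    // The closer the result is to 1, the more similar the two pieces of text are.
    public static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
...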

Langchain4j supports multiple clients for embedding models, such as OpenAI, HuggingFace, or even local in-process models. They are all really simple to set up. For instance, the OpenAI version looks like this:

...
    @Qualifier("openaiEmbeddingModel")
    @Bean
    public EmbeddingModel openaiEmbeddingModel() {
        return OpenAiEmbeddingModel.builder()
                .apiKey("your-key")
                .modelName(TEXT_EMBEDDING_ADA_002)
                .build();
    }
...

And the local in-process one looks like this:

...
    @Qualifier("inMemoryModel")
    @Bean
    public EmbeddingModel inMemoryEmbeddingModel() {
        return new InProcessEmbeddingModel(ALL_MINILM_L6_V2);
    }
...

The EmbeddingModel interface provides a couple of easy-to-use methods for converting text into embeddings. We’ll use one of them later to create an embedding for our question.
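To give you an idea already, this is what it looks like (a minimal sketch, assuming one of the EmbeddingModel beans defined above is injected):

...
    // Minimal sketch: convert a piece of text into an embedding.
    Embedding embedding = embeddingModel.embed("Langchain4j is a Java port of LangChain.");
    // The embedding wraps a float[] vector that can be stored and compared.
    float[] vector = embedding.vector();
...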

Embedding Store

After creating the embeddings, they need to be stored in an embedding store. This could be an in-memory embedding store, but Langchain4j also supports a few vector databases, like Weaviate and Pinecone. Just like the embedding models, the stores are very easy to set up. This is what it looks like for an in-memory store:

...
    @Qualifier("inMemoryEmbeddingStore")
    @Bean
    public EmbeddingStore<TextSegment> inMemoryEmbeddingStore() {
        return new InMemoryEmbeddingStore<>();
    }
...

But if you want to go with Weaviate, for example, it’s not complex either:

...
    @Qualifier("weaviateEmbeddingStore")
    @Bean
    public EmbeddingStore<TextSegment> weaviateEmbeddingStore() {
        return WeaviateEmbeddingStore.builder()
                .apiKey("your-key")
                .scheme("https")
                .host("your.weaviate.host")
                .build();
    }
...

The WeaviateEmbeddingStore builder has a few more methods you can use; you can explore those in the example section.

Data Ingestion

When you have your embedding model and store ready, you want to ingest your data into the embedding store. This can be done with an EmbeddingStoreIngestor:

...
    Document document = Document.from("text");
    DocumentSplitter documentSplitter = DocumentSplitters.recursive(300);
    EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
            .documentSplitter(documentSplitter)
            .embeddingModel(embeddingModel)
            .embeddingStore(embeddingStore)
            .build();
    ingestor.ingest(document);
...

In this example, I created a Document object from the string “text”, but in reality you would probably have a much larger text there. Langchain4j includes parsers for PDF, DocX (MS Word) and some other file types. These parsers also output a Document object, which can then be ingested into the store.
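If you don’t want to use one of the bundled parsers, you can also read a file yourself and wrap its content in a Document. A minimal sketch, reusing the ingestor from above (the file path is just a hypothetical example):

...
    // Minimal sketch: read a plain text file yourself and ingest it with the ingestor from above.
    // The path is hypothetical; any text you load can be wrapped in a Document.
    String text = Files.readString(Path.of("/path/to/devoxx-faq.txt"));
    ingestor.ingest(Document.from(text));
...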

A DocumentSplitter is needed to cut your long text into smaller chunks (300 characters in this case). The Python LangChain library also supports overlap between chunks, but Langchain4j doesn’t support that (yet). Cutting your text into chunks is an important step in vector search, because embeddings of larger chunks tend to be less accurate. Also, sending large texts results in higher token counts, which increases the cost of calls to your language model if you use OpenAI, for example. You should experiment with this thoroughly until you find the best chunk size for your data and use case.
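To get a feeling for the effect of the chunk size, you can also split a document yourself and inspect the resulting segments. A minimal sketch, assuming the same recursive splitter as in the ingestion example:

...
    // Minimal sketch: split a document and inspect the resulting chunks.
    DocumentSplitter splitter = DocumentSplitters.recursive(300);
    List<TextSegment> segments = splitter.split(document);
    segments.forEach(segment ->
            LOGGER.info("Chunk of {} characters: {}", segment.text().length(), segment.text()));
...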

Chat Language Model

Now we have an embedding model and a store with data in it. The next thing is to set up a chat language model that turns our search results into an actual answer to the question asked. For this part, we need an LLM. Langchain4j currently supports clients for OpenAI and HuggingFace. Here is an example for OpenAI:

...
    @Qualifier("openaiChatLanguageModel")
    @Bean
    public ChatLanguageModel openaiChatLanguageModel() {
        return OpenAiChatModel.builder()
                .apiKey("your-key")
                .modelName(GPT_3_5_TURBO)
                .temperature(0.8)
                .timeout(ofSeconds(15))
                .maxRetries(3)
                .logResponses(true)
                .logRequests(true)
                .build();
    }
...

Here you can set the timeout, the number of retries, the model you want to use and the sampling temperature. The sampling temperature is a parameter that governs the randomness/creativity of the model’s responses. It is typically a value between 0 and 1 (OpenAI accepts values up to 2). Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic, meaning you almost always get the same response to a given prompt.
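For a question answering system like this one, where you want the model to stick to the information you provide, a lower temperature is usually a good starting point. A minimal sketch of such a configuration, using the same builder as above:

...
    // Minimal sketch: a more deterministic configuration for question answering.
    ChatLanguageModel deterministicModel = OpenAiChatModel.builder()
            .apiKey("your-key")
            .modelName(GPT_3_5_TURBO)
            .temperature(0.2) // low temperature: focused, mostly repeatable answers
            .build();
...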

Getting Answers

When all of the above is done, it’s time to put the pieces together: query for relevant texts, send them together with the question to the ChatLanguageModel, and let it write an answer to your question based on the provided information.

...
    public String askQuestion(String question) {
        Embedding queryEmbedding = embeddingModel.embed(question);
        List<EmbeddingMatch<TextSegment>> relevant = embeddingStore.findRelevant(queryEmbedding, 4, 0.8);

        Map<String, Object> variables = new HashMap<>();
        variables.put("question", question);
        variables.put("information", relevant.stream().map(match -> match.embedded().text()).collect(Collectors.joining("\n\n")));

        PromptTemplate promptTemplate = PromptTemplate.from(
                "Answer the following question to the best of your abilities: \"{{question}}\"\n\n" +
                        "Base your answer on the following information:\n{{information}}");
        Prompt prompt = promptTemplate.apply(variables);

        LOGGER.info("Sending following prompt to LLM:\n{}", prompt.text());

        AiMessage aiMessage = chatLanguageModel.sendUserMessage(prompt.toUserMessage());
        return aiMessage.text();
    }
...

First, we use the EmbeddingModel to create an embedding for the question string, so it can be used to find semantically relevant pieces of text in the EmbeddingStore. The other two parameters of the findRelevant method are the maximum number of results we want to get back (4) and the minimum score the results should have (0.8). For these values you should experiment and find the sweet spot that works best for your use case and data.
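One way to find that sweet spot is to log the scores of the matches that come back and see which threshold separates relevant from irrelevant results. A minimal sketch (EmbeddingMatch exposes its similarity score via score()):

...
    // Minimal sketch: inspect the similarity scores to tune the minimum score threshold.
    relevant.forEach(match ->
            LOGGER.info("Score {} -> {}", match.score(), match.embedded().text()));
...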

The next step is to create a Map of variables which holds the question and the relevant texts (information). This map is then applied to the PromptTemplate, where you specify the text (or prompt) that goes to the ChatLanguageModel. Prompt engineering, the art of writing instructions for your language model, is a very important step to get the answers you want in the format you want. I would really advise spending time on writing the best prompt for your use case and data. Deeplearning.ai has a great free course on this.
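As a small example of such prompt tuning, you could extend the template with an explicit instruction for the case where the retrieved information does not contain the answer (the exact wording here is just an illustration):

...
    // Minimal sketch: a slightly more defensive variant of the prompt template.
    PromptTemplate promptTemplate = PromptTemplate.from(
            "Answer the following question to the best of your abilities: \"{{question}}\"\n\n" +
                    "Base your answer only on the following information:\n{{information}}\n\n" +
                    "If the information does not contain the answer, reply with \"I could not answer this question\".");
...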

Last but not least, we send the prompt to the ChatLanguageModel and get back an answer to the question, based on the information you provided yourself. It could also be that your data does not contain anything related to the question, in which case the answer will be something like “I could not answer this question”, depending on your settings and prompt.

Final Words

As you can see, it doesn’t take much code to create an awesome question answering system in Java with Langchain4j. In my GitHub repository you can find all the above code together with a couple of REST endpoints: one to ingest a PDF created from the Devoxx Belgium conference FAQ page, and another to ask questions, which uses the information from that FAQ page as its source. A question like “What is the address of the venue?” will result in something like “The address of the venue is Groenendaallaan 394, 2030 Antwerp, Belgium.”, which is very cool.

Keep in mind that for a production-ready system you need to do a lot of investigation and experimentation. You need to choose or train the right embedding model, pick the ideal chunk size for your data, and find the most fitting settings, such as the temperature for the ChatLanguageModel and the minimum score for matches. To see whether your system is performing well, you need a set of questions that users could realistically ask, so you can verify that the generated answers are close to what you expect.
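A minimal sketch of such a check, using the askQuestion method from this post and a hypothetical map of questions with the answers you expect (the generated answers are simply logged for manual review; judging them automatically is a topic of its own):

...
    // Minimal sketch: run a set of expected questions and log the generated answers for review.
    Map<String, String> expectedAnswers = Map.of(
            "What is the address of the venue?", "Groenendaallaan 394, 2030 Antwerp, Belgium");
    expectedAnswers.forEach((question, expected) -> {
        String answer = askQuestion(question);
        LOGGER.info("Question: {}\nExpected: {}\nGenerated: {}", question, expected, answer);
    });
...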

That being said, with the information in this post you can already start experimenting!