Answering questions from interviews

Given a set of interviews and a list of hypotheses, you might want to use the interviews to test those hypotheses. You could pass everything directly to an OpenAI model, but this has two flaws:

Not every interview answers every hypothesis.
You cannot give all the interviews to the model at once, because together they are too much text for a single prompt.
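
To get a feel for the second point, a rough size check can be sketched as follows. The helper names, the four-characters-per-token rule of thumb, and the 4000-token budget are all assumptions made for illustration; they are not values taken from any model's documentation.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; chars_per_token is a rule-of-thumb assumption."""
    return int(len(text) / chars_per_token)


def fits_in_context(texts, token_budget=4000):
    """Return True if all texts together stay under the (hypothetical) budget."""
    total = sum(estimate_tokens(t) for t in texts)
    return total <= token_budget


# Toy stand-ins for real interview transcripts.
sample_interviews = ["I use the product every day. " * 200] * 10
print(fits_in_context(sample_interviews))
```

A real check would use the tokenizer of the target model rather than a character count.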

To solve this, we can use a local embedding model to convert the interviews and each hypothesis (treated as a question) into vectors, then see which ones are similar. These models have been trained on question/answer pairs (usually from websites like Quora), so they produce similar vectors for a question and its answer.
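
As a minimal illustration of what "similar vectors" means, the sketch below compares hand-made 3-dimensional vectors with cosine similarity. The vectors are invented stand-ins, not real model output, and `cosine_similarity` is a helper written for this example. The script in this section uses Euclidean distance instead; for unit-length vectors the two measures rank neighbors identically.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hand-made 3-dimensional stand-ins for embeddings (hypothetical values).
question = np.array([0.9, 0.1, 0.0])
matching_answer = np.array([0.8, 0.2, 0.1])
unrelated_text = np.array([0.0, 0.2, 0.9])

# The question should score higher against its answer than against
# the unrelated text.
print(cosine_similarity(question, matching_answer))
print(cosine_similarity(question, unrelated_text))
```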

Once this is done, we can pick the closest interviews for each hypothesis and use them as the prompt to OpenAI. In practice this works well.

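import numpy as np
import openai
from sentence_transformers import SentenceTransformer

# Assumed inputs, not defined in this snippet:
#   hypothesis: list of hypothesis strings
#   interviews: list of interview texts, in the same order as data["Interview"]
#   data:       pandas DataFrame with an "Interview" column

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

# Get embeddings
hypothesis_vectors = model.encode(hypothesis)
interview_vectors = model.encode(interviews)

for i, hypo in enumerate(hypothesis_vectors):
    # Find the closest interviews (Euclidean distance between embeddings)
    distances = []
    for interview in interview_vectors:
        distances.append(np.linalg.norm(hypo - interview, ord=2))

    # Keep the three interviews with the smallest distance
    sorted_indices = np.argsort(distances)
    closest_interviews = data["Interview"].iloc[sorted_indices[0:3]]

    # Build the prompt. The prompt text is in French; in English:
    # "Here are user interviews about a product you are interested in:"
    prompt = """
    Voici des interviews utilisateurs sur un produit qui vous intéresse:

    """

    prompt += "\n---------------\n".join(closest_interviews)
    # "Based on these answers, confirm or reject the following hypothesis:"
    prompt += "\nA partir de ces réponses, confirme ou rejette l'hypothèse suivante: "

    prompt += hypothesis[i]

    # "Answer:"
    prompt += "\nRéponse:"

    # Legacy Completions interface (openai<1.0). text-davinci-003 has since
    # been retired by OpenAI, so a current model is needed to run this today.
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        temperature=0.3,
        max_tokens=150,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0,
    )
    # Print the answer from OpenAI.
    print(response["choices"][0]["text"])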
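
The nearest-interview search in the loop above can also be done for all hypotheses at once with NumPy broadcasting. This is only a sketch under stated assumptions: `closest_indices` is a hypothetical helper, not part of the original script, and the 2-dimensional arrays are toy values standing in for real embeddings.

```python
import numpy as np


def closest_indices(query_vectors, document_vectors, k=3):
    """For each query row, return indices of the k nearest document rows (L2)."""
    queries = np.asarray(query_vectors, dtype=float)
    documents = np.asarray(document_vectors, dtype=float)
    # Broadcasting: (n_queries, 1, dim) - (1, n_documents, dim)
    differences = queries[:, None, :] - documents[None, :, :]
    distances = np.linalg.norm(differences, axis=2)
    # Sort each row by distance and keep the first k columns.
    return np.argsort(distances, axis=1)[:, :k]


# Toy 2-dimensional vectors standing in for real embeddings.
toy_hypotheses = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_interviews = np.array([[0.0, 1.0], [9.0, 9.0], [5.0, 5.0], [0.5, 0.0]])
print(closest_indices(toy_hypotheses, toy_interviews, k=2))
```

Each row of the result could then be passed to `data["Interview"].iloc[...]` in place of `sorted_indices[0:3]`. The intermediate array grows with the number of hypotheses times the number of interviews times the embedding size, so the explicit loop may be preferable for very large collections.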