Using GPT for reviews-to-themes association

Problem

Can we use GPT-3.5 to associate reviews with themes?

Control points

Short Answer

Compared with the all-MiniLM-L6-v2 baseline, GPT-3.5 has better accuracy, but it finds a smaller share of the expected themes.

If we want to use GPT-3.5, then prompt 1 is better.

Note that GPT-4 performs much better than all other methods, but it is more expensive.

Recommendation: keep all-MiniLM-L6-v2 for now, and switch to GPT-4 if PMs complain.

How?

  • We use a prompt to ask GPT-3.5 to generate the associations
  • We use function calling to get the output as structured JSON

See code below for more details.
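To make the second step concrete, here is a minimal sketch of the JSON shape the function call is expected to return, and of how it can be turned into per-theme 0/1 flags. The review ids, theme labels, and payload are made up for illustration, and `to_flags` is a hypothetical helper; only the `results` / `review_id` / `categories` keys come from the schema used in the example at the end of this page.

```python
from __future__ import annotations

import json

# Hypothetical payload, shaped like the "arguments" string of a function call.
raw_arguments = json.dumps({
    "results": [
        {"review_id": "r1", "categories": ["Pricing", "Bugs"]},
        {"review_id": "r2", "categories": ["Unknown theme"]},
    ]
})


def to_flags(raw: str, labels: list[str]) -> dict[str, dict[str, int]]:
    """Map each review id to {label: 0 or 1}; labels outside `labels` are ignored."""
    data = json.loads(raw)
    flags = {}
    for item in data.get("results", []):
        found = set(item.get("categories", []))
        flags[str(item["review_id"])] = {
            label: int(label in found) for label in labels
        }
    return flags


flags = to_flags(raw_arguments, ["Pricing", "Bugs", "UX"])
```

Ignoring unknown labels is a design choice in this sketch: the model is free to return a string that is not one of the category labels, and silently dropping it keeps the output columns stable.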

Comparisons of multiple attempts

| Method | Themes found (TP) | Themes not found (FN) | Incorrect themes found (FP) | % themes found, TP÷(TP+FN) | % themes correct, TP÷(TP+FP) | Accuracy, (TP+TN)÷(TP+TN+FP+FN) | Commentary |
|---|---|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 65 | 58 | 113 | 52.85% | 36.52% | 72.24% | For reference |
| GPT-3.5, prompt 0 | 9 | 114 | 1 | 7.32% | 90.00% | 81.33% | Finds almost nothing: assigns zero categories to 3/4 of reviews |
| GPT-3.5, prompt 1 | 49 | 74 | 39 | 39.84% | 55.68% | 81.66% | Better results for this prompt |
| GPT-3.5, prompt 1 | 35 | 88 | 42 | 28.46% | 45.45% | 78.90% | Same as above |
| GPT-3.5, prompt 1 | 29 | 94 | 29 | 23.58% | 50.00% | 80.03% | Same as above |
| GPT-3.5, prompt 2 | 31 | 92 | 68 | 25.20% | 31.31% | 74.03% | Worse results here: adds too many themes to each review |
| GPT-3.5, prompt 2 | 24 | 99 | 64 | 19.51% | 27.27% | 73.54% | Same as above |
| GPT-4, prompt 1 | 69 | 54 | 22 | 56.10% | 75.82% | 87.66% | Tried GPT-4 just to see: better than everything, as expected |

  • Prompt 0: a review can be associated to zero, one, or multiple categories.
  • Prompt 1: a review should be associated to at least one category, or multiple if relevant.
  • Prompt 2: a review should be associated with multiple categories, and never zero.
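The three percentage columns of the table can be recomputed from the counts with the sketch below. True negatives are not reported, so the code derives them from a total of 616 review-theme pairs. That total is an inference, not a figure stated on this page: it is the value that reproduces the reported accuracy for every row (for instance, 88 reviews × 7 themes). `theme_metrics` is a hypothetical helper name.

```python
def theme_metrics(tp: int, fn: int, fp: int, total_pairs: int = 616) -> dict:
    """Return the table's three percentages, rounded to 2 decimals.

    total_pairs=616 is an assumption inferred from the reported accuracies.
    """
    tn = total_pairs - tp - fn - fp  # true negatives are not reported, so derive them
    return {
        "themes_found": round(100 * tp / (tp + fn), 2),    # recall
        "themes_correct": round(100 * tp / (tp + fp), 2),  # precision
        "accuracy": round(100 * (tp + tn) / total_pairs, 2),
    }


baseline = theme_metrics(tp=65, fn=58, fp=113)  # all-MiniLM-L6-v2 row
gpt4 = theme_metrics(tp=69, fn=54, fp=22)       # GPT-4, prompt 1 row
```

Read this way, "% themes found" is recall and "% themes correct" is precision. Accuracy is dominated by the large number of true negatives, which is why prompt 0 scores 81.33% while finding only 9 themes.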

Limitations

  1. Prices:

    | Model | Tokens used on 1 test (88 reviews) | Price spent on 1 test (88 reviews) | Tokens used on 1 app (approx. 400 reviews) | Price spent on 1 app (approx. 400 reviews) |
    |---|---|---|---|---|
    | GPT-3.5 (16K version) | 5 106 (on average) | $0.017 | 23 200 (2 requests) | $0.08 |
    | GPT-4 (8K version) | 5 226 (one test) | $0.22 | 23 700 (3 requests) | $1.00 |

  2. The test dataset may not be representative (although this matters less for generative models).

  3. Slower: GPT-4 takes 4 min 30 s to make the associations (instead of 1 to 2 minutes for the similarity approach).

  4. The prompt could be more advanced, with some examples or explanations about the global picture.
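The per-app prices in the table are consistent with a simple linear extrapolation from the 88-review test run, which the following sketch reproduces. It uses only the figures from the price table; the dictionary keys are plain labels, `estimate_cost` is a hypothetical helper, and the linear-scaling assumption ignores the fixed overhead of repeating the system prompt and category list in each request.

```python
# Figures taken from the price table: cost of one 88-review test run, in USD.
TEST_REVIEWS = 88
COST_PER_TEST_USD = {"gpt-3.5-16k": 0.017, "gpt-4-8k": 0.22}


def estimate_cost(model: str, n_reviews: int) -> float:
    """Linear extrapolation: assumes cost scales with the number of reviews."""
    return COST_PER_TEST_USD[model] / TEST_REVIEWS * n_reviews


cost_gpt35 = estimate_cost("gpt-3.5-16k", 400)
cost_gpt4 = estimate_cost("gpt-4-8k", 400)
```

Rounded to the cent, both estimates match the per-app prices in the table, so the same function gives a rough budget for apps with more reviews.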

Example

import json

import openai  # legacy openai<1.0 interface (openai.ChatCompletion)
import pandas as pd


def categorize_reviews(df_reviews: pd.DataFrame, df_categories: pd.DataFrame):
    # Serialize reviews as "id: content" blocks for the prompt
    reviews_text = "\n\n".join(
        [f"{row['id']}: {row['content']}" for _, row in df_reviews.iterrows()]
    )

    # Serialize categories as "label: description" blocks for the prompt
    categories_text = "\n\n".join(
        [
            f"{row['label']}: {row['description']}"
            for _, row in df_categories.iterrows()
        ]
    )

    # JSON schema of the function arguments the model must produce
    answer_schema = {
        "results": {
            "type": "array",
            "description": "List of categorized reviews.",
            "items": {
                "type": "object",
                "properties": {
                    "review_id": {
                        "type": "string",
                        "description": "Id of the review",
                    },
                    "categories": {
                        "type": "array",
                        "description": (
                            "List of categories associated with the review."
                        ),
                        "items": {
                            "type": "string",
                            "description": "One of the category labels: "
                            + ", ".join(df_categories["label"].tolist()),
                        },
                    },
                },
            },
        },
    }

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k",
        temperature=1,
        messages=[
            {
                "role": "system",
                "content": """
                You are an experienced Product Manager. You are tasked with associating reviews
                to categories. A review should be associated to at least one category, or
                multiple if relevant.
                """,
            },
            {
                "role": "user",
                "content": (
                    "Here is the list of reviews to categorize:\n\n"
                    + reviews_text
                    + "\n\n"
                    + "Here is the list of categories:\n\n"
                    + categories_text
                ),
            },
        ],
        functions=[
            {
                "name": "show_categorized_reviews",
                "description": (
                    "Display the categories associated with each review id"
                ),
                "parameters": {
                    "type": "object",
                    "properties": answer_schema,
                },
            }
        ],
        # Force the model to answer through the function, i.e. as JSON
        function_call={"name": "show_categorized_reviews"},
    )
    data = json.loads(
        response["choices"][0]["message"]["function_call"]["arguments"]
    )

    # Create a dictionary for faster lookup
    category_dict = {}
    for review in data["results"]:
        category_dict[review["review_id"]] = review["categories"]

    # Add a column for each category and set values.
    # The model returns ids as strings, so compare on str(x).
    for category in df_categories["label"]:
        df_reviews[category] = df_reviews["id"].apply(
            lambda x: 1 if category in category_dict.get(str(x), []) else 0
        )

    return df_reviews
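
The function above sends every review in a single request, whereas the price table mentions 2 to 3 requests for an app of about 400 reviews. One possible way to split the reviews is sketched below. This is not the method used for the measurements: `batch_reviews` is a hypothetical helper, the 12 000-token default budget is arbitrary, and the estimate of roughly 4 characters per token is only a rule of thumb for English text.

```python
from __future__ import annotations


def batch_reviews(reviews: list[str], max_tokens: int = 12000) -> list[list[str]]:
    """Greedily group reviews so each batch stays under a rough token budget."""
    batches, current, used = [], [], 0
    for text in reviews:
        cost = len(text) // 4 + 1  # assumption: ~4 characters per token
        # Start a new batch when adding this review would exceed the budget
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches


# Ten 400-character reviews with a small budget, to show the grouping
batches = batch_reviews(["x" * 400] * 10, max_tokens=500)
```

Each batch could then be passed to `categorize_reviews` separately and the resulting frames concatenated. A review longer than the budget still gets its own batch rather than being dropped.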