Using GPT for reviews-to-themes association
Problem
Can we use GPT-3.5 for reviews to themes association?
Control points
- I forced GPT-3.5 to output the associations through function calling.
- I compared the results using our evaluation method ("Evaluate the results of reviews-to-themes associations").
Short Answer
Compared with all-MiniLM-L6-v2, GPT-3.5 gives better accuracy but finds a smaller percentage of the themes.
If we use GPT-3.5, prompt 1 gives the best results.
GPT-4 performs much better than all other methods, but it is more expensive.
Recommendation: keep all-MiniLM-L6-v2 for the moment, and switch to GPT-4 if PMs complain.
How?
- We use a prompt to ask GPT-3.5 to generate the associations.
- We use function calling to force the output into JSON.
See the code below for more details.
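Function calling returns the model's answer as a JSON string in the function-call arguments, and the schema guides the model without guaranteeing that the returned ids and labels are valid. A minimal sketch of defensive parsing, assuming the `results` / `review_id` / `categories` structure of the schema used in the example code; `parse_associations` is a hypothetical helper and the labels in the usage are made up:

```python
import json


def parse_associations(arguments: str, review_ids, labels) -> dict:
    """Turn function-call arguments (a JSON string) into {review_id: set of labels}.

    Hypothetical helper. Review ids are compared as strings, because the schema
    declares review_id as a string while the source ids may be integers.
    """
    allowed_ids = {str(review_id) for review_id in review_ids}
    allowed_labels = set(labels)
    # Every known review starts with no category, so missing reviews count as "zero themes"
    associations = {review_id: set() for review_id in allowed_ids}

    data = json.loads(arguments)
    for item in data.get("results", []):
        review_id = str(item.get("review_id"))
        if review_id not in allowed_ids:
            continue  # drop ids the model invented or mangled
        for label in item.get("categories") or []:
            if label in allowed_labels:  # drop labels outside the category list
                associations[review_id].add(label)
    return associations


args = '{"results": [{"review_id": "1", "categories": ["Bugs", "Pricing", "Other"]}]}'
print(parse_associations(args, review_ids=[1, 2], labels=["Bugs", "Pricing", "UX"]))
```

Keeping every known review in the result, even with an empty set, makes reviews the model skipped visible when computing the evaluation metrics.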
Comparisons of multiple attempts
| Method | Themes found (TP) | Themes not found (FN) | Incorrect themes found (FP) | % themes found TP÷(TP+FN) | % themes correct TP÷(TP+FP) | Accuracy (TP+TN)÷(TP+TN+FP+FN) | Commentary |
|---|---|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 65 | 58 | 113 | 52.85% | 36.52% | 72.24% | For reference |
| GPT-3.5: prompt 0 | 9 | 114 | 1 | 7.32% | 90.00% | 81.33% | Finds almost nothing: assigns zero categories to 3/4 of the reviews |
| GPT-3.5: prompt 1 | 49 | 74 | 39 | 39.84% | 55.68% | 81.66% | Better results with this prompt |
| GPT-3.5: prompt 1 | 35 | 88 | 42 | 28.46% | 45.45% | 78.90% | Same prompt, another attempt |
| GPT-3.5: prompt 1 | 29 | 94 | 29 | 23.58% | 50.00% | 80.03% | Same prompt, another attempt |
| GPT-3.5: prompt 2 | 31 | 92 | 68 | 25.20% | 31.31% | 74.03% | Worse results: adds too many themes to each review |
| GPT-3.5: prompt 2 | 24 | 99 | 64 | 19.51% | 27.27% | 73.54% | Same prompt, another attempt |
| GPT-4: prompt 1 | 69 | 54 | 22 | 56.10% | 75.82% | 87.66% | Tried GPT-4 for comparison: better than everything else, as expected |
- Prompt 0: a review can be associated with zero, one, or multiple categories.
- Prompt 1: a review should be associated with at least one category, or with several if relevant.
- Prompt 2: a review should be associated with multiple categories, and never with zero.
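The three ratios in the table follow directly from the confusion counts. TN (review-theme pairs correctly left unassociated) is not reported, but every row's accuracy is consistent with 616 pairs in total, which would correspond to 88 reviews × 7 themes (for all-MiniLM-L6-v2: TN = 616 - 236 = 380, and (65 + 380) ÷ 616 ≈ 72.24%). A sketch of the computation; the helper names and the dict-of-sets input format are assumptions, not our evaluation method's actual code:

```python
def _ratio(numerator: int, denominator: int) -> float:
    # Convention chosen for this sketch: report 0.0 when the ratio is undefined
    return numerator / denominator if denominator else 0.0


def confusion_counts(truth: dict, predicted: dict, labels) -> tuple:
    """Count (TP, FN, FP, TN) over every (review, theme) pair.

    truth, predicted: dicts mapping review id -> set of theme labels.
    Only reviews present in `truth` are evaluated.
    """
    tp = fn = fp = tn = 0
    for review_id, true_labels in truth.items():
        predicted_labels = predicted.get(review_id, set())
        for label in labels:
            is_true, is_predicted = label in true_labels, label in predicted_labels
            if is_true and is_predicted:
                tp += 1
            elif is_true:
                fn += 1
            elif is_predicted:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def association_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """The three metric columns of the comparison table."""
    return {
        "pct_themes_found": _ratio(tp, tp + fn),      # recall
        "pct_themes_correct": _ratio(tp, tp + fp),    # precision
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
    }


print(association_metrics(tp=65, fn=58, fp=113, tn=380))  # all-MiniLM-L6-v2 row
```

Because most pairs are negatives, accuracy is dominated by TN: prompt 0 reaches 81.33% accuracy while finding only 7.32% of the themes, which is why the two percentage columns are worth reading alongside accuracy.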
Limitations
- Prices:
| Model | Tokens used on 1 test (88 reviews) | Price of 1 test (88 reviews) | Tokens used on 1 app (approx. 400 reviews) | Price of 1 app (approx. 400 reviews) |
|---|---|---|---|---|
| GPT-3.5 (16K version) | 5,106 (average) | $0.017 | 23,200 (2 requests) | $0.08 |
| GPT-4 (8K version) | 5,226 (one test) | $0.22 | 23,700 (3 requests) | $1.00 |
- The test dataset may not be representative (this matters less for generative models).
- Slower: GPT-4 takes 4 min 30 s to make the associations (versus 1 to 2 minutes for the similarity approach).
- The prompt could be more advanced, with some examples or explanations about the global picture.
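The per-app figures in the price table are consistent with a linear extrapolation of the 88-review test to 400 reviews: 5,106 × 400/88 ≈ 23,209 tokens and $0.017 × 400/88 ≈ $0.077 for GPT-3.5; $0.22 × 400/88 = $1.00 for GPT-4. A sketch of that arithmetic, assuming cost scales with the number of reviews, context windows of roughly 16,000 and 8,000 tokens, and ignoring the system prompt and category list repeated in each request; `extrapolate_to_app` is a hypothetical helper:

```python
import math


def extrapolate_to_app(test_tokens: int, test_price: float, test_reviews: int,
                       app_reviews: int, context_window: int) -> dict:
    """Scale the cost of one test run linearly to a whole app (hypothetical helper)."""
    factor = app_reviews / test_reviews
    tokens = round(test_tokens * factor)
    return {
        "tokens": tokens,
        # Assumption: reviews are split so that each request fits in the context window
        "requests": math.ceil(tokens / context_window),
        "price_usd": test_price * factor,
    }


print(extrapolate_to_app(5106, 0.017, 88, 400, context_window=16_000))  # GPT-3.5 16K
print(extrapolate_to_app(5226, 0.22, 88, 400, context_window=8_000))    # GPT-4 8K
```

This reproduces the 2 and 3 requests of the table; real requests would use somewhat more tokens, since the instructions and categories are sent again with every batch.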
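One way to give the model examples is few-shot prompting: hand-labelled example exchanges placed between the system message and the real request. A minimal sketch, assuming the examples are (reviews text, expected JSON answer) pairs that would still have to be written by hand; `build_few_shot_messages` is hypothetical, and its result would be passed as `messages` to the chat completion call:

```python
def _user_message(reviews_text: str, categories_text: str) -> dict:
    # Same wording as the user message built in categorize_reviews
    return {
        "role": "user",
        "content": (
            "Here is the list of reviews to categorize:\n\n"
            + reviews_text
            + "\n\nHere is the list of categories:\n\n"
            + categories_text
        ),
    }


def build_few_shot_messages(system_prompt: str, examples, reviews_text: str,
                            categories_text: str) -> list:
    """Chat messages with hand-labelled example exchanges before the real request.

    examples: list of (example_reviews_text, expected_answer_json) pairs (hypothetical).
    """
    messages = [{"role": "system", "content": system_prompt}]
    for example_reviews, expected_answer in examples:
        messages.append(_user_message(example_reviews, categories_text))
        messages.append({"role": "assistant", "content": expected_answer})
    messages.append(_user_message(reviews_text, categories_text))
    return messages


msgs = build_few_shot_messages(
    "You are an experienced Product Manager...",
    [("1: Crashes on login", '{"results": [{"review_id": "1", "categories": ["Bugs"]}]}')],
    "2: Too expensive for what it does",
    "Bugs: crashes and errors\n\nPricing: complaints about the price",
)
print([m["role"] for m in msgs])
```

Explanations about the global picture would naturally go into `system_prompt`. Since the example code forces a function call, another option (not shown) is to encode the example answers as assistant function calls instead of plain JSON content.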
Example
import json

import openai  # pre-1.0 SDK: openai.ChatCompletion was removed in openai>=1.0
import pandas as pd


def categorize_reviews(
    df_reviews: pd.DataFrame, df_categories: pd.DataFrame
) -> pd.DataFrame:
    """Associate each review with categories using GPT-3.5 (prompt 1).

    df_reviews needs "id" and "content" columns; df_categories needs "label"
    and "description" columns. Returns a copy of df_reviews with one 0/1
    column per category label.
    """
    # One review per paragraph, prefixed with its id so the model can refer to it
    reviews_text = "\n\n".join(
        f"{row['id']}: {row['content']}" for _, row in df_reviews.iterrows()
    )
    # One category per paragraph, as "label: description"
    categories_text = "\n\n".join(
        f"{row['label']}: {row['description']}"
        for _, row in df_categories.iterrows()
    )
    labels = df_categories["label"].tolist()

    # JSON schema of the arguments the model must produce when "calling" the function
    answer_schema = {
        "results": {
            "type": "array",
            "description": "List of categorized reviews.",
            "items": {
                "type": "object",
                "properties": {
                    "review_id": {
                        "type": "string",
                        "description": "Id of the review",
                    },
                    "categories": {
                        "type": "array",
                        "description": (
                            "List of categories associated with the review."
                        ),
                        "items": {
                            "type": "string",
                            "description": "One of the category labels: "
                            + ", ".join(labels),
                        },
                    },
                },
            },
        },
    }

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k",
        temperature=1,
        messages=[
            {
                "role": "system",
                "content": """
You are an experienced Product Manager. You are tasked with associating reviews
to categories. A review should be associated to at least one category, or
multiple if relevant.
""",
            },
            {
                "role": "user",
                "content": (
                    "Here is the list of reviews to categorize:\n\n"
                    + reviews_text
                    + "\n\n"
                    + "Here is the list of categories:\n\n"
                    + categories_text
                ),
            },
        ],
        functions=[
            {
                "name": "show_categorized_reviews",
                "description": (
                    "Display the categories associated with each review id"
                ),
                "parameters": {
                    "type": "object",
                    "properties": answer_schema,
                },
            }
        ],
        # Force the model to answer through the function, i.e. as JSON arguments
        function_call={"name": "show_categorized_reviews"},
    )

    # The answer is the function-call arguments, returned as a JSON string
    data = json.loads(
        response["choices"][0]["message"]["function_call"]["arguments"]
    )

    # Map review id -> categories for fast lookup. Ids are compared as strings:
    # the schema declares review_id as a string, while df_reviews["id"] may be numeric.
    category_dict = {
        str(review["review_id"]): review["categories"]
        for review in data["results"]
    }

    # Add one 0/1 column per category, on a copy so the caller's DataFrame is untouched
    review_ids = df_reviews["id"].astype(str)
    df_reviews = df_reviews.copy()
    for category in labels:
        df_reviews[category] = [
            int(category in category_dict.get(review_id, []))
            for review_id in review_ids
        ]
    return df_reviews
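`categorize_reviews` sends every review in a single request, while the price table reports 2 to 3 requests for an app of about 400 reviews. A sketch of how the reviews could be split into batches that each fit a token budget, so that `categorize_reviews` can be called once per batch; `split_into_batches` is hypothetical, and the default estimate of about 4 characters per token is a rough rule of thumb (a tokenizer such as tiktoken would give exact counts):

```python
def rough_token_estimate(text: str) -> int:
    # Rule of thumb: about 4 characters per token for English text (not exact)
    return len(text) // 4 + 1


def split_into_batches(texts, max_tokens: int, estimate_tokens=rough_token_estimate):
    """Greedily group the positions of `texts` into batches under a token budget.

    Returns lists of 0-based positions, usable with df_reviews.iloc[positions].
    A single text larger than max_tokens still gets its own batch.
    """
    batches, current, current_tokens = [], [], 0
    for position, text in enumerate(texts):
        cost = estimate_tokens(text)
        # Close the current batch if this text would push it over the budget
        if current and current_tokens + cost > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(position)
        current_tokens += cost
    if current:
        batches.append(current)
    return batches


print(split_into_batches(["a" * 8, "b" * 8, "c" * 8], max_tokens=6))
```

Each batch could then be processed with `categorize_reviews(df_reviews.iloc[positions], df_categories)` and the results combined with `pd.concat`. The budget should leave room for the system prompt, the category list and the model's answer, which share the same context window.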