Sentiment analysis

Say you have a pandas DataFrame of reviews and you want to predict a rating for each one automatically with a model. The following code does that:

from transformers import pipeline

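def compute_sentiment(df):
    """Add a "sentiment" column (1 to 5 stars) predicted from df["content"]."""
    print("Loading sentiment analysis model ...")

    sentiment_pipeline = pipeline(
        model="nlptown/bert-base-multilingual-uncased-sentiment"
    )
    # The pipeline returns one dict per review, e.g. {"label": "4 stars", "score": ...}
    sentiments = sentiment_pipeline(list(df["content"]))
    # The label starts with the star count, so its first character is the rating
    df["sentiment"] = [int(s["label"][0]) for s in sentiments]  # type: ignore
    return df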

I recommend the nlptown/bert-base-multilingual-uncased-sentiment model for its multilingual capabilities.
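The function above reads the rating from the first character of each label. A slightly more defensive way to do the same conversion is sketched below. It assumes the labels look like "1 star" to "5 stars". The helper name label_to_stars and the sample output are hypothetical and only there for illustration.

```python
import re

# Assumption: labels look like "1 star", "2 stars", ..., "5 stars".
_LABEL_PATTERN = re.compile(r"^([1-5]) stars?$")


def label_to_stars(label: str) -> int:
    """Convert a label such as '4 stars' to the integer 4."""
    match = _LABEL_PATTERN.match(label.strip())
    if match is None:
        # Fail loudly instead of silently storing a wrong rating.
        raise ValueError(f"Unexpected sentiment label: {label!r}")
    return int(match.group(1))


# Hypothetical pipeline output, shaped like the list the function iterates over.
example_output = [
    {"label": "5 stars", "score": 0.71},
    {"label": "1 star", "score": 0.64},
]
stars = [label_to_stars(item["label"]) for item in example_output]
print(stars)  # [5, 1]
```

The benefit over indexing the first character is that a different model with other label names raises an error right away instead of producing a confusing int() failure or a wrong number.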

I ran the function on Play Store reviews and compared the self-reported ratings (the "metadata" column) with the ratings predicted by the model (the "sentiment" column). Ratings are scores from 1 to 5.

>>> mean_absolute_error = (df["metadata"] - df["sentiment"]).abs().mean()
>>> mean_absolute_error
0.5015974440894568

>>> error_std = (df["metadata"] - df["sentiment"]).std()
>>> error_std
0.8585423638865525
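The two figures above are the mean of the absolute error and the standard deviation of the signed error. They answer different questions: the first says how far off the model is on average, the second how much the error varies from review to review. Here is a minimal sketch on made-up ratings, using only the standard library. The function name error_summary and the toy data are assumptions, not the Play Store data.

```python
from statistics import mean, stdev


def error_summary(true_ratings, predicted_ratings):
    """Summarise the gap between self-reported and predicted ratings."""
    if len(true_ratings) != len(predicted_ratings):
        raise ValueError("Both rating lists must have the same length")
    errors = [t - p for t, p in zip(true_ratings, predicted_ratings)]
    return {
        # Average size of the miss, ignoring its direction.
        "mean_absolute_error": mean([abs(e) for e in errors]),
        # Sample standard deviation of the signed error.
        "error_std": stdev(errors),
        # Positive means the model tends to rate lower than the user.
        "mean_error": mean(errors),
    }


# Toy data for illustration only.
true_ratings = [5, 4, 1, 3, 2]
predicted_ratings = [5, 5, 1, 2, 2]
summary = error_summary(true_ratings, predicted_ratings)
print(summary["mean_absolute_error"])  # 0.4
```

statistics.stdev computes the sample standard deviation, which is also what pandas' .std() does by default, so the numbers are comparable. With a DataFrame you could pass df["metadata"].tolist() and df["sentiment"].tolist().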

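Moreover, the model gave exactly the same rating as the user 59% of the time. In the reviews I inspected by hand, the model was sometimes off by one but never by two. Note that a mean absolute error of 0.50 alongside 59% exact matches implies that a few predictions in the full set are off by two or more, since 41% off-by-one errors alone would give 0.41.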
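To check this kind of claim over a whole dataset rather than a handful of reviews, one option is to count how often each error size occurs. The sketch below uses made-up ratings and a hypothetical helper named error_distribution. It is not the computation behind the 59% figure.

```python
from collections import Counter


def error_distribution(true_ratings, predicted_ratings):
    """Return the share of reviews at each absolute error (0, 1, 2, ...)."""
    abs_errors = [abs(t - p) for t, p in zip(true_ratings, predicted_ratings)]
    counts = Counter(abs_errors)
    total = len(abs_errors)
    return {size: counts[size] / total for size in sorted(counts)}


# Toy data for illustration only.
true_ratings = [5, 4, 1, 3, 2, 5, 1, 4]
predicted_ratings = [5, 5, 1, 2, 2, 3, 1, 4]
distribution = error_distribution(true_ratings, predicted_ratings)
print(distribution)  # {0: 0.625, 1: 0.25, 2: 0.125}
```

The entry at key 0 is the exact-match rate, and any key of 2 or more shows how often the model misses by a wide margin.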

This means that when a review comes without a rating, we can reasonably use the model to fill in the blank.
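Filling in the blanks could look like the sketch below. It works on plain dictionaries so it runs without downloading the model. The field names "content", "rating" and "rating_source", the function fill_missing_ratings and the stand-in predictor are all assumptions made for the example.

```python
def fill_missing_ratings(reviews, predict_rating):
    """Return copies of the reviews, with missing ratings predicted.

    `predict_rating` is any callable that maps review text to a 1-5 rating,
    for example a thin wrapper around the sentiment pipeline.
    """
    filled = []
    for review in reviews:
        if review["rating"] is None:
            predicted = predict_rating(review["content"])
            # Record where the rating came from so it can be filtered later.
            filled.append({**review, "rating": predicted, "rating_source": "model"})
        else:
            filled.append({**review, "rating_source": "user"})
    return filled


def fake_predictor(text):
    """Stand-in for the real model, for illustration only."""
    return 5 if "great" in text.lower() else 2


reviews = [
    {"content": "Great app, works well", "rating": None},
    {"content": "Crashes on startup", "rating": 1},
]
filled_reviews = fill_missing_ratings(reviews, fake_predictor)
print([r["rating"] for r in filled_reviews])  # [5, 1]
```

Keeping track of the source is a deliberate choice here: since the model does not always agree with the user, it helps to be able to tell predicted ratings apart from self-reported ones. With a DataFrame, the equivalent of the fill step would be df["metadata"].fillna(df["sentiment"]).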