Sentiment analysis
Let's say you have a pandas DataFrame of reviews and you want to compute a rating for each one automatically using a model. You can use the following code:
from transformers import pipeline


def compute_sentiment(df):
    """Add a "sentiment" column (1 to 5) predicted from the "content" column."""
    print("Loading sentiment analysis model ...")
    sentiment_pipeline = pipeline(
        model="nlptown/bert-base-multilingual-uncased-sentiment"
    )
    sentiments = sentiment_pipeline(list(df["content"]))
    # Each label starts with the star count (e.g. "4 stars"), so keep the first character.
    df["sentiment"] = [int(s["label"][0]) for s in sentiments]  # type: ignore
    return df
I recommend the nlptown/bert-base-multilingual-uncased-sentiment model for its
multilingual capabilities.
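The list comprehension in the function relies on each label starting with its star count. As far as I can tell the labels look like "1 star" through "5 stars"; assuming that format, a slightly more defensive parser could look like the sketch below. Note that `label_to_stars` is a hypothetical helper name of my own, not something provided by transformers.

```python
def label_to_stars(label: str) -> int:
    """Convert a label such as "4 stars" into the integer 4.

    Assumes the label starts with the star count followed by a space,
    which is the format this model appears to use.
    """
    stars = int(label.split()[0])
    if not 1 <= stars <= 5:
        raise ValueError(f"unexpected star count in label: {label!r}")
    return stars


# Results shaped like the pipeline's output (values made up for illustration).
example_results = [
    {"label": "5 stars", "score": 0.71},
    {"label": "1 star", "score": 0.64},
]
example_stars = [label_to_stars(r["label"]) for r in example_results]
```

Compared with taking the first character, this fails loudly if the label format ever changes instead of silently storing a wrong rating.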
I ran the function on Play Store reviews and compared the self-reported ratings (stored in the "metadata" column) with the ratings predicted by the model. Ratings are scores from 1 to 5.
>>> (df["metadata"] - df["sentiment"]).abs().mean()  # mean absolute error
0.5015974440894568
>>> (df["metadata"] - df["sentiment"]).std()  # standard deviation of the error
0.8585423638865525
Moreover, the model predicted the exact rating 59% of the time. In the individual reviews I looked at, it was sometimes off by one but never off by two.
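The exact-match rate and the largest miss can be checked over the whole dataset rather than by eye. Here is a minimal sketch, assuming the two rating columns are passed as plain sequences of integers; `agreement_summary` is a hypothetical name, and the toy ratings are made up for illustration.

```python
from collections import Counter


def agreement_summary(true_ratings, predicted_ratings):
    """Summarize how far predictions land from the self-reported ratings."""
    errors = [abs(t - p) for t, p in zip(true_ratings, predicted_ratings)]
    if not errors:
        raise ValueError("need at least one pair of ratings")
    counts = Counter(errors)  # maps each absolute error to how often it occurs
    return {
        "exact_match_rate": counts[0] / len(errors),
        "off_by_one_rate": counts[1] / len(errors),
        "max_abs_error": max(errors),
    }


# Toy ratings, made up for illustration (not the Play Store data).
summary = agreement_summary([5, 4, 1, 3], [5, 3, 1, 4])
```

On the real data this could be called as `agreement_summary(df["metadata"].tolist(), df["sentiment"].tolist())`.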
Given this level of agreement, when a review comes without a rating, we can reasonably use the model's prediction to fill in the blank.
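Filling in the blanks could look like the sketch below, assuming missing ratings are represented as `None` and the predictions are in the same order as the reviews. `fill_missing_ratings` is a hypothetical helper, and the values are made up.

```python
def fill_missing_ratings(ratings, predictions):
    """Return ratings with each missing entry (None) replaced by the prediction."""
    if len(ratings) != len(predictions):
        raise ValueError("ratings and predictions must have the same length")
    # Keep the self-reported rating when there is one, else use the model's.
    return [p if r is None else r for r, p in zip(ratings, predictions)]


# Toy values, made up for illustration.
filled = fill_missing_ratings([5, None, 2, None], [4, 3, 2, 1])
```

With a DataFrame, `df["metadata"].fillna(df["sentiment"])` should do the same thing, since `Series.fillna` accepts another Series and aligns it on the index.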