Multilingual models

Problem

Is it better to use a multilingual model than Google Translate + a monolingual model for reviews-to-themes association?

Control points

  • I tested multiple models from the docs and Hugging Face ✅

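Short Answer

No. Google Translate + a monolingual model gives better results overall: some multilingual models find more themes, but every one of them produces a lower share of correct associations.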

How?

I tested the multilingual models listed in the Sentence-Transformers documentation (https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models) and other multilingual models from Hugging Face (https://huggingface.co/models?library=sentence-transformers&language=multilingual&sort=downloads).
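
As a rough illustration of what a reviews-to-themes association with one of these models might look like, here is a minimal sketch. It assumes the association is done by embedding reviews and themes and keeping the pairs whose cosine similarity passes a threshold; this page does not specify the actual method. The names `associate` and `toy_embed` and the threshold value `0.5` are hypothetical. The toy embedder only stands in for a real model so that the example runs without downloads.

```python
import math
from typing import Callable, Dict, List, Sequence

Embedder = Callable[[Sequence[str]], Sequence[Sequence[float]]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate(
    reviews: Sequence[str],
    themes: Sequence[str],
    embed: Embedder,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from this page
) -> Dict[str, List[str]]:
    """Map each review to the themes whose embedding is similar enough."""
    review_vecs = embed(reviews)
    theme_vecs = embed(themes)
    return {
        review: [t for t, tv in zip(themes, theme_vecs) if cosine(rv, tv) >= threshold]
        for review, rv in zip(reviews, review_vecs)
    }


def toy_embed(texts: Sequence[str]) -> List[List[float]]:
    """Stand-in embedder: counts occurrences of a tiny fixed vocabulary."""
    vocab = ["price", "delivery", "quality"]
    return [[float(text.lower().count(word)) for word in vocab] for text in texts]


reviews = ["Delivery was late", "Great quality for the price"]
themes = ["delivery", "price", "quality"]
print(associate(reviews, themes, toy_embed))
```

With the sentence-transformers package installed, `toy_embed` could be replaced by the `encode` method of a loaded model, for example `SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2").encode`. For the monolingual setup, the reviews would first be translated to English and then passed to `all-MiniLM-L6-v2`.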

The methodology is described in "Evaluate the results of reviews-to-themes associations".

| Model | A: Themes found (true positives) | B: Themes not found (false negatives) | C: Falsely found (false positives) | % themes found = A/(A+B) | % found that are correct = A/(A+C) | Comment |
|---|---|---|---|---|---|---|
| all-MiniLM-L6-v2 (+ Google Translate) | 65 | 58 | 113 | 52.85 % | 36.52 % | Reference (monolingual model) |
| paraphrase-multilingual-MiniLM-L12-v2 | 63 | 60 | 169 | 51.22 % | 27.16 % | |
| distiluse-base-multilingual-cased-v2 | 8 | 115 | 65 | 6.50 % | 10.96 % | |
| paraphrase-multilingual-mpnet-base-v2 | 81 | 42 | 280 | 65.85 % | 22.44 % | More themes found but also a lot more false positives |
| intfloat/multilingual-e5-large | 62 | 61 | 201 | 50.41 % | 23.57 % | |
| intfloat/multilingual-e5-large + Google Translate | 73 | 50 | 244 | 59.35 % | 23.03 % | More themes found but also a lot more false positives |
| paraphrase-multilingual-MiniLM-L12-v2 + Google Translate | 67 | 56 | 173 | 54.47 % | 27.92 % | More themes found but also more false positives |
| distiluse-base-multilingual-cased-v1 | 9 | 114 | 75 | 7.32 % | 10.71 % | |
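
The two percentage columns are the usual recall, A/(A+B), and precision, A/(A+C). The short sketch below recomputes them from the raw counts for three of the rows above; the function names and the `counts` dictionary are illustrative only.

```python
def recall(found: int, not_found: int) -> float:
    """Share of expected themes that were found: A / (A + B)."""
    return found / (found + not_found)


def precision(found: int, falsely_found: int) -> float:
    """Share of found themes that are correct: A / (A + C)."""
    return found / (found + falsely_found)


# (A, B, C) counts copied from the results table.
counts = {
    "all-MiniLM-L6-v2 (+ Google Translate)": (65, 58, 113),
    "paraphrase-multilingual-mpnet-base-v2": (81, 42, 280),
    "distiluse-base-multilingual-cased-v2": (8, 115, 65),
}

for model, (a, b, c) in counts.items():
    print(f"{model}: recall {recall(a, b):.2%}, precision {precision(a, c):.2%}")
```

Reading both columns together is what drives the conclusion: paraphrase-multilingual-mpnet-base-v2 has the highest recall, but far fewer of the associations it returns are correct than with the reference model.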

Limitations

As stated in "Evaluate the results of reviews-to-themes associations", the test dataset might not be the best one for this comparison.