PCA before classification

Question: Does PCA help us classify interviews?

Current issue: When we group interviews into categories, the categories are too broad, and when GPT generates titles for them, we get titles like:

  • User Feedback: Pros, Cons, and Improvements Needed
  • Improvements and Usefulness of Restaurant Review App
  • Praise for an App that Helps Users Find Restaurants and Explore Local Attractions

The system can distinguish good reviews from bad ones, but it struggles with finer-grained classification. This is not always the case; some groups are more specific, such as:

  • Mixed Experiences with App: Bugs, Limited Gift Card Acceptance, and Lack of Introduction
  • "User Frustrations: Update Issues, Accessibility Problems, and Functional Dissatisfaction"

But the categories remain too broad.

Does applying PCA to the vector embeddings (see Embedding models) before feeding the vectors to the clustering algorithm yield better results?

Answer: probably not; it might even make it worse.

| Titles without PCA | Titles with PCA |
| --- | --- |
| Mixed Experiences with App: Bugs, Limited Gift Card Acceptance, and Lack of Introduction | Issues and Complaints with a Restaurant Platform |
| Improvements and Frustrations with Navigation and Search Features | Criticism and Frustration with Mobile Application Features |
| Issues and Concerns with Reservation System | Users' Mixed Reviews and Suggestions for Improvement |
| "User Frustrations: Update Issues, Accessibility Problems, and Functional Dissatisfaction" | User Frustration with Recent App Update |
| Improvements and Usefulness of Restaurant Review App | Enhancing User Experience with Restaurant and Hotel Selection |
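
For reference, here is a minimal sketch of the two pipelines being compared: clustering the raw embeddings versus clustering the PCA-reduced embeddings. It is not the code used for the experiment above. The synthetic data, the number of components (10), the number of clusters (3), and the use of k-means are all assumptions made for illustration, since this page does not say which clustering algorithm or settings were used. The helper names `pca_reduce` and `kmeans` are hypothetical.

```python
import numpy as np


def pca_reduce(embeddings, n_components):
    """Project row vectors onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (a stand-in for the real clustering step)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


# Synthetic stand-in for interview embeddings: 3 groups in 64 dimensions.
rng = np.random.default_rng(42)
group_centers = rng.normal(scale=5.0, size=(3, 64))
embeddings = np.vstack([c + rng.normal(size=(40, 64)) for c in group_centers])

# Pipeline A: cluster the raw embeddings.
labels_raw = kmeans(embeddings, k=3)

# Pipeline B: reduce with PCA first, then cluster.
reduced = pca_reduce(embeddings, n_components=10)
labels_pca = kmeans(reduced, k=3)

print("cluster sizes without PCA:", np.bincount(labels_raw, minlength=3))
print("cluster sizes with PCA:   ", np.bincount(labels_pca, minlength=3))
```

In a real run, the interviews in each cluster from both pipelines would then go to the title-generation step, and the resulting titles would be compared side by side as in the table above.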