PCA before classification
Question: Does PCA help us classify interviews?
Current issue: When we group interviews into categories, the categories come out too broad, and the titles GPT generates for them look like:
- User Feedback: Pros, Cons, and Improvements Needed
- Improvements and Usefulness of Restaurant Review App
- Praise for an App that Helps Users Find Restaurants and Explore Local Attractions
The system can distinguish good reviews from bad ones, but it struggles with finer-grained classification. This is not always the case, and some groups are more specific, such as:
- Mixed Experiences with App: Bugs, Limited Gift Card Acceptance, and Lack of Introduction
- "User Frustrations: Update Issues, Accessibility Problems, and Functional Dissatisfaction"
Overall, though, the categories remain too broad.
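Does applying PCA to the vector embeddings (see Embedding models) before feeding the vectors to the clustering algorithm yield better results?
Answer: probably not; it might even make the results worse. In the comparison below, the titles generated with PCA are generally vaguer than those generated without it.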
| Titles without PCA | Titles with PCA |
|---|---|
| Mixed Experiences with App: Bugs, Limited Gift Card Acceptance, and Lack of Introduction | Issues and Complaints with a Restaurant Platform |
| Improvements and Frustrations with Navigation and Search Features | Criticism and Frustration with Mobile Application Features |
| Issues and Concerns with Reservation System | Users' Mixed Reviews and Suggestions for Improvement |
| "User Frustrations: Update Issues, Accessibility Problems, and Functional Dissatisfaction" | User Frustration with Recent App Update |
| Improvements and Usefulness of Restaurant Review App | Enhancing User Experience with Restaurant and Hotel Selection |
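
The two pipelines compared above can be sketched roughly as follows. This is a minimal illustration, not the setup used for the table: it assumes scikit-learn's `PCA` and `KMeans`, since the note does not say which clustering algorithm is used. The helper `cluster_embeddings`, the number of clusters, the number of PCA components, and the random placeholder vectors are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_embeddings(embeddings, n_clusters, n_components=None, seed=0):
    """Cluster embedding vectors, optionally reducing them with PCA first.

    Hypothetical helper: KMeans stands in for whatever clustering
    algorithm the real pipeline uses.
    """
    vectors = np.asarray(embeddings, dtype=float)
    if n_components is not None:
        # Project the embeddings onto their first n_components principal axes.
        pca = PCA(n_components=n_components, random_state=seed)
        vectors = pca.fit_transform(vectors)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(vectors)


# Placeholder data standing in for real interview embeddings
# (200 interviews, 64 dimensions; both numbers are arbitrary).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 64))

# Same clustering step, once on the raw vectors and once after PCA.
labels_raw = cluster_embeddings(embeddings, n_clusters=5)
labels_pca = cluster_embeddings(embeddings, n_clusters=5, n_components=10)
```

Each label array assigns one cluster index per interview. In the comparison above, the interviews in each cluster would then be passed to GPT to generate a title, once for each of the two labelings.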