Spam Detection in Social Media Posts
1. Requirements Clarification
Candidate: Social media posts can include text, images, or videos. Should the spam classifier work on all of these, or only on text?
Interviewer: That's a great starting point. For now, we will focus only on text-based posts to keep the scope manageable.
Candidate: How do we define what counts as spam? Can you share some examples?
Interviewer: Sure. Posts like "Earn $5000 per week from home!! Sign up NOW" are good examples of spam. They are often misleading, overly promotional, or linked to scams.
Candidate: Once we identify a post as spam, what action should the system take?
Interviewer: For simplicity, we will block the post from being published. The author will be notified and asked to edit their content before trying again.
Candidate: Since posts must be screened before they go live, this has to be an online classifier that returns a prediction at submission time. Is this understanding correct?
Interviewer: Yes, you are right.
Candidate: What kind of latency is acceptable for this online model?
Interviewer: Ideally, we want the lowest possible latency to keep the user experience smooth. For this problem, let us assume predictions under 100 milliseconds are acceptable.
Refined Problem Statement:
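We want to build a spam classification model for text-based social media posts. The model will run in an online deployment setting, producing predictions in real time, with a latency requirement of under 100 milliseconds. Posts classified as spam are blocked before publication, and the author is notified and asked to edit the content before trying again.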
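To make these requirements concrete, here is a minimal Python sketch of the screening step. It is an illustration only, not the design the interview goes on to develop. The names `screen_post`, `predict_spam`, `ScreeningResult`, and `SPAM_PHRASES` are hypothetical, and the phrase-matching rule is a placeholder for a trained model. The only details taken from the discussion above are the block-and-notify action and the 100 millisecond budget.

```python
import time
from dataclasses import dataclass

# Latency requirement from the discussion: predictions under 100 ms.
LATENCY_BUDGET_MS = 100.0

# Hypothetical stand-in for a trained model: a few spam-like phrases.
SPAM_PHRASES = ("earn $", "from home", "sign up now")


@dataclass
class ScreeningResult:
    is_spam: bool
    action: str           # "block_and_notify" or "publish"
    latency_ms: float     # time spent producing the prediction
    within_budget: bool   # True if latency_ms is under the budget


def predict_spam(text: str) -> bool:
    """Placeholder classifier: flag a post containing two or more spam phrases."""
    lowered = text.lower()
    hits = sum(phrase in lowered for phrase in SPAM_PHRASES)
    return hits >= 2


def screen_post(text: str) -> ScreeningResult:
    """Classify a post before it is published and time the prediction."""
    start = time.perf_counter()
    is_spam = predict_spam(text)
    latency_ms = (time.perf_counter() - start) * 1000.0

    # Spam is blocked and the author is notified; everything else goes live.
    action = "block_and_notify" if is_spam else "publish"
    return ScreeningResult(is_spam, action, latency_ms,
                           latency_ms < LATENCY_BUDGET_MS)


if __name__ == "__main__":
    for post in ("Earn $5000 per week from home!! Sign up NOW",
                 "Had a great hike with friends today"):
        result = screen_post(post)
        print(f"{result.action}: {post!r}")
```

Timing each prediction alongside the decision is one way to check the latency requirement in production. A real system would more likely track a latency percentile across many requests than flag individual calls.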