
Spam Detection in Social Media Posts


Design a Spam Detection ML System for a Social Media Platform

To tackle this problem, we will use the structured framework discussed here. The diagram below outlines its key steps. We begin by asking clarifying questions to fully understand the problem statement, then move on to data collection, followed by model training and architecture design. After that, we cover model evaluation, and finally deployment and monitoring to ensure the system performs reliably in production.
Framework steps

1. Requirements Clarification


Candidate: Social media posts can include text, images, or videos. Should the spam classifier work on all of these, or only on text?
Interviewer: That's a great starting point. For now, we will focus only on text-based posts to keep the scope manageable.

Candidate: How do we define what counts as spam? Can you share some examples?
Interviewer: Sure. Posts like "Earn $5000 per week from home!! Sign up NOW" are good examples of spam. They are often misleading, overly promotional, or linked to scams.

Candidate: Once we identify a post as spam, what action should the system take?
Interviewer: For simplicity, we will block the post from being published. The author will be notified and asked to edit their content before trying again.

Candidate: Since posts must be screened before they go live, this has to be an online classifier that gives instant predictions. Is this understanding correct?
Interviewer: Yes, you are right.

Candidate: What kind of latency is acceptable for this online model?
Interviewer: Ideally, we want the lowest possible latency to keep the user experience smooth. For this problem, let us assume predictions under 100 milliseconds are acceptable.

Refined Problem Statement:
We want to build a spam classification model for text-based social media posts. The model will run in an online deployment setting, producing predictions in real time, with a latency requirement of under 100 milliseconds.
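
To make these requirements concrete, the screening step can be sketched as a small Python function. This is only an illustrative sketch under stated assumptions: the names `screen_post`, `ScreeningResult`, and `placeholder_spam_score` are hypothetical, the keyword heuristic merely stands in for the trained model that later sections would design, and the 0.5 decision threshold is an arbitrary choice. Only the 100 ms budget and the block-before-publish behavior come from the discussion above.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Latency budget taken from the refined problem statement.
LATENCY_BUDGET_MS = 100.0


@dataclass
class ScreeningResult:
    allowed: bool         # True if the post may be published
    spam_score: float     # score in [0, 1] returned by the scoring function
    latency_ms: float     # wall-clock time spent scoring the post
    within_budget: bool   # True if latency_ms is under LATENCY_BUDGET_MS


def placeholder_spam_score(text: str) -> float:
    """Hypothetical stand-in for the real model: a toy keyword heuristic."""
    suspicious_phrases = ("earn $", "sign up now", "from home")
    lowered = text.lower()
    hits = sum(phrase in lowered for phrase in suspicious_phrases)
    return hits / len(suspicious_phrases)


def screen_post(
    text: str,
    score_fn: Callable[[str], float] = placeholder_spam_score,
    threshold: float = 0.5,  # assumed cutoff, not specified in the requirements
) -> ScreeningResult:
    """Score a text post before publication and decide whether to block it."""
    start = time.perf_counter()
    score = score_fn(text)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return ScreeningResult(
        allowed=score < threshold,
        spam_score=score,
        latency_ms=latency_ms,
        within_budget=latency_ms < LATENCY_BUDGET_MS,
    )


if __name__ == "__main__":
    result = screen_post("Earn $5000 per week from home!! Sign up NOW")
    if not result.allowed:
        # Per the requirements, the post is blocked and the author is notified.
        print("Post blocked: please edit your content and try again.")
    print(f"Scored in {result.latency_ms:.3f} ms")
```

Passing the scoring function as a parameter keeps the screening contract (score, compare against a threshold, record latency) separate from whichever model is eventually chosen, so the placeholder can be swapped out without changing the calling code.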