Top 15 Data Scientist Interview Questions (And How to Answer Them)

Data science interviews are multi-layered: they test statistics, machine learning, coding, business intuition, and communication. The mix depends on the company. FAANG-style companies tend to lean heavily on ML and coding, while startups often care more about end-to-end delivery and business impact.


Statistics & ML

1. "Explain the bias-variance tradeoff."

Fundamental ML concept.

Answer: "Bias is error from oversimplifying the model, so it underfits. Variance is error from being too sensitive to the training data, so it overfits. High bias: the model misses patterns. High variance: the model memorizes noise. Expected error decomposes into bias squared, variance, and irreducible noise, so the sweet spot is the model complexity that minimizes their sum. Regularization, cross-validation, and ensemble methods help balance the two."
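
If the interviewer asks you to make the tradeoff concrete, a small simulation can help. The sketch below is illustrative only: the sine target, the noise level, and the two toy models (a constant mean predictor and a 1-nearest-neighbor predictor) are assumptions chosen for the demo, not part of any standard recipe. It refits each model on many resampled training sets and measures squared bias and variance at a single point.

```python
import math
import random
import statistics

def true_fn(x):
    # Assumed ground-truth function for the demo.
    return math.sin(x)

def sample_training_set(rng, n=20, noise_sd=0.5):
    # Draw n noisy observations of true_fn on [0, pi].
    xs = [rng.uniform(0, math.pi) for _ in range(n)]
    ys = [true_fn(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def predict_mean(xs, ys, x0):
    # High-bias model: ignores x0 and always predicts the training mean.
    return statistics.fmean(ys)

def predict_1nn(xs, ys, x0):
    # High-variance model: copies the label of the single nearest point.
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias_variance(predict, x0, trials=2000, seed=0):
    # Refit on many independent training sets and collect predictions at x0.
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs, ys = sample_training_set(rng)
        preds.append(predict(xs, ys, x0))
    bias_sq = (statistics.fmean(preds) - true_fn(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance

x0 = math.pi / 2
mean_bias_sq, mean_var = bias_variance(predict_mean, x0)
nn_bias_sq, nn_var = bias_variance(predict_1nn, x0)

print(f"mean predictor: bias^2={mean_bias_sq:.3f}, variance={mean_var:.3f}")
print(f"1-NN predictor: bias^2={nn_bias_sq:.3f}, variance={nn_var:.3f}")
```

The mean predictor should show the larger squared bias and the 1-NN predictor the larger variance, which is the pattern the answer describes.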

2. "How do you handle overfitting?"

Answer: "More training data if possible. Regularization (L1, L2). Cross-validation to detect it. Simpler model architecture. Dropout (for neural nets). Feature selection to remove noise. Early stopping. And always evaluate on a held-out test set that the model never sees during training or tuning."
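
Early stopping is the item on that list that is easiest to show in a few lines. The following is a minimal sketch, assuming you already have one validation loss per epoch. The function name, the `patience` default, and the loss values are hypothetical, and real frameworks ship their own callbacks for this.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Never ran out of patience: training used every epoch.
    return best_epoch, len(val_losses) - 1

# Made-up losses: validation loss bottoms out at epoch 3, then rises (overfitting).
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(best_epoch, stop_epoch)  # 3 6
```

In practice you would restore the weights saved at `best_epoch` rather than keep the ones from `stop_epoch`.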

3. "When would you use logistic regression vs. random forest vs. a neural network?"

Shows practical judgment, not just theory.

Answer: "Logistic regression: when I need interpretability, the relationship is roughly linear, and I have limited data. Random forest: when I want strong performance with minimal tuning and can sacrifice some interpretability. Neural network: when I have large datasets, complex non-linear relationships, and unstructured data (images, text, sequences). I always start simple and add complexity only if it improves results."

4. "Explain precision, recall, and F1 score. When do you optimize for each?"

Answer: "Precision: of all positive predictions, how many are actually positive (minimize false positives). Recall: of all actual positives, how many did we catch (minimize false negatives). F1: harmonic mean of both. Optimize for precision when false positives are costly (spam detection — don't want legit emails in spam). Optimize for recall when false negatives are costly (cancer screening — don't want to miss a case)."
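
It can be worth being able to compute these by hand on a whiteboard. Here is a short sketch that derives all three from raw labels. The label vectors are made up for illustration, and returning 0.0 when a denominator is zero is one convention among several.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    # Count the three confusion-matrix cells the metrics depend on.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical labels: 4 actual positives, 3 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.667 recall=0.500 f1=0.571
```

Here 2 of the 3 positive predictions are correct (precision 2/3) and 2 of the 4 actual positives are caught (recall 1/2).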

5. "How do you approach feature engineering?"

Where data science gets creative.

Answer: "I start with domain knowledge — what features would a human expert look at? Then: create interaction terms, aggregate features (rolling averages, counts), encode categoricals (one-hot, target encoding), extract from dates (day of week, month, time since event), handle missing values as a signal, and normalize/scale. I test feature importance and drop what doesn't help."
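
A few of those transformations can be sketched with the standard library alone. All helper names and the sample values below are hypothetical. In a real project you would more likely reach for pandas, but the logic is the same.

```python
from datetime import date

def date_features(event_date, reference_date):
    # Calendar features plus recency relative to a reference date.
    return {
        "day_of_week": event_date.weekday(),  # Monday = 0
        "month": event_date.month,
        "is_weekend": int(event_date.weekday() >= 5),
        "days_since_event": (reference_date - event_date).days,
    }

def rolling_mean(values, window):
    # Trailing mean over up to `window` values, including the current one.
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def one_hot(value, categories):
    # One binary column per known category.
    return {f"is_{c}": int(value == c) for c in categories}

def missing_indicator(value):
    # Treat missingness itself as a signal.
    return int(value is None)

features = date_features(date(2024, 3, 16), reference_date=date(2024, 3, 31))
smoothed = rolling_mean([10, 20, 30, 40], window=2)
encoded = one_hot("mobile", ["desktop", "mobile", "tablet"])

print(features)
print(smoothed)  # [10.0, 15.0, 25.0, 35.0]
print(encoded)   # {'is_desktop': 0, 'is_mobile': 1, 'is_tablet': 0}
```

The missing-value indicator is usually added alongside an imputed value, so the model sees both the filled-in number and the fact that it was missing.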


Coding & Technical

6. "Write a function to [data manipulation task]."

Expect Python or SQL live coding. Common tasks: cleaning data, merging datasets, computing aggregations, implementing an algorithm from scratch.

Tip: Use pandas fluently. Know groupby, merge, apply, and vectorized operations. For SQL: window functions, CTEs, and self-joins.

7. "How do you handle imbalanced datasets?"

Answer: "Options: oversample the minority class (for example with SMOTE, applied to the training split only), undersample the majority, adjust class weights in the model, evaluate with metrics that stay informative under imbalance (the precision-recall curve or F1 instead of accuracy; ROC AUC can look optimistic when positives are very rare), or collect more data for the minority class. The right approach depends on the dataset size and business context."
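
Class weighting is the easiest of these options to explain numerically. The sketch below uses the common inverse-frequency heuristic, n_samples / (n_classes * class_count), which is the formula scikit-learn documents for `class_weight="balanced"`. The function name and the 90/10 split are illustrative assumptions.

```python
from collections import Counter

def balanced_class_weights(labels):
    # Weight each class inversely to its frequency so that every class
    # contributes the same total weight to the loss.
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical dataset: 90 negatives, 10 positives.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)

# Why accuracy misleads here: always predicting the majority class
# already scores 90% while catching zero positives.
majority_accuracy = sum(1 for y in labels if y == 0) / len(labels)
print(majority_accuracy)  # 0.9
```

With these weights, each minority example counts nine times as much as a majority example, so both classes carry equal total weight.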

8. "Explain how you'd design an A/B test."

Answer: "Define the hypothesis and success metric. Calculate the sample size required for adequate statistical power. Randomly assign users to control and treatment. Run the experiment long enough to account for novelty effects and weekly patterns. Analyze with an appropriate statistical test (for example, a t-test for means or a chi-squared test for proportions). Check for segment effects. Make the decision against a significance level you committed to before the test started."
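
Two of those steps, sizing the test and analyzing it, can be sketched with the standard library. This assumes a conversion-rate metric, a two-sided test, and the usual normal approximation. The baseline rate, target lift, and function names are hypothetical, and a z-test on two proportions stands in here for the chi-squared test mentioned above.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    # Normal-approximation sample size for comparing two proportions.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    # Pooled two-sided z-test for a difference in conversion rates.
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so nothing to test
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical plan: detect a lift from 10% to 12% conversion.
n_per_arm = sample_size_per_arm(0.10, 0.12)
print(n_per_arm)

# Hypothetical results: 100/1000 conversions in control, 150/1000 in treatment.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z={z:.2f}, p={p_value:.4f}")
```

The sample size comes out at a few thousand users per arm, which is why small expected lifts often need long-running tests.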


Business & Communication

9. "Tell me about a project where your model drove a business decision."

Impact, not just accuracy.

Structure: Business problem → data available → approach → model performance → how results were communicated → what decision was made → business impact.

10. "How do you explain a complex model to a non-technical stakeholder?"

Answer: "I focus on what the model does and why it matters, not how it works. 'This model predicts which customers are likely to churn next month with 85% accuracy. Here are the top 3 factors driving churn. If we intervene with these 500 customers, we estimate saving $200K in revenue.' I use visuals — feature importance charts, simple decision trees, before/after comparisons."

11. "A model performs well in testing but poorly in production. What could be wrong?"

Answer: "Data drift — production data distribution differs from training data. Feature leakage — training included information not available at prediction time. Latency — model is too slow for real-time serving. Pipeline bugs — features computed differently in production. I'd check each systematically: compare feature distributions, audit the feature pipeline, and monitor model performance continuously."
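
Comparing feature distributions is the most mechanical of those checks. One simple option, sketched below, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs of a feature in training and in production. The integer samples are made up, and any alert threshold you put on the statistic is a judgment call for your own data.

```python
from bisect import bisect_right

def ks_statistic(train_values, prod_values):
    # Largest absolute gap between the two empirical CDFs.
    a, b = sorted(train_values), sorted(prod_values)
    max_gap = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)  # share of training values <= v
        cdf_b = bisect_right(b, v) / len(b)  # share of production values <= v
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

# Hypothetical feature values.
train = list(range(100))
prod_same = list(range(100))
prod_shifted = [x + 50 for x in train]

print(ks_statistic(train, prod_same))     # 0.0
print(ks_statistic(train, prod_shifted))  # 0.5
```

A statistic near 0 means the distributions match, and values approaching 1 mean they barely overlap. Running this per feature on a schedule gives a basic drift monitor.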


Behavioral

12. "How do you decide which problems to work on?"

Answer: "I evaluate by business impact (revenue, cost, user experience), feasibility (data availability, technical complexity), and timeline. I prefer quick wins that demonstrate value early, then build toward more complex projects. I also push back on 'ML for ML's sake' — sometimes a simple heuristic or SQL query solves the problem better than a model."

13. "Tell me about a project that failed or didn't get deployed."

This question is common in data science interviews because many models never make it to production.

14. "What tools and frameworks do you use?"

A typical stack: Python (pandas, scikit-learn, XGBoost, PyTorch/TensorFlow), SQL, Jupyter, Git, Airflow/dbt for pipelines, MLflow for experiment tracking, and Docker for deployment. Mention the cloud platforms you have worked with, such as AWS SageMaker, GCP Vertex AI, or Databricks.

15. "What questions do you have for us?"

Ask about: the data infrastructure, how models are deployed and monitored, the team structure (ML engineers, analysts, data engineers), what projects are in the pipeline, and how success is measured for the data science team.


Want questions tailored to your exact role? Paste the job description at PasteJob and get a personalized cheat sheet in 15 seconds.
