Precision-Recall Tradeoff in Real-World Use Cases

Lavanya Gupta · Published in Analytics Vidhya · Feb 19, 2021 · 5 min read


Ace your ML interview by quickly understanding which real-world use cases demand higher precision, which ones demand higher recall, and why.

Why should you read this article?

All machine learning interviews expect you to understand the practical application of the precision-recall tradeoff in real-world use cases, beyond just the definitions and formulas.

I have tried to capture this essence by defining a 🔑 “secret key” that you can exploit to ace your next ML interview and impress your interviewer by providing articulate justifications!

Definitions

💡 Precision measures: out of all the positive predicted examples, how many detections were correct?

💡 Recall measures: out of all the actual positive examples, how many were we able to identify?

Make sure you completely understand the above definitions, because all our further discussion will build upon them. Remembering just these two definitions is enough to answer any ML interview question on precision and recall that comes your way. Yes, it really is, trust me!

You do not even need to memorize the (very) confusing formulas for precision, recall, true positive rate, false positive rate, specificity, sensitivity, and the list goes on…
So, let’s go ahead and convert the above definitions into mathematical formulas.

Focus on the italicized parts of the two definitions: positive predicted examples and actual positive examples. These two quantities form the denominators of precision and recall, respectively.
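
For reference, the table below shows the standard 2×2 confusion matrix layout that the next two questions refer to:

                      Actual Positive        Actual Negative
Predicted Positive    True Positive (TP)     False Positive (FP)
Predicted Negative    False Negative (FN)    True Negative (TN)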

▶️ What quantities from the above table constitute positive predicted examples?
Look under the “predicted” heading, in the “positive” row. It comprises True Positive (TP) and False Positive (FP). So the denominator for precision is TP + FP.

▶️ What quantities from the above table constitute actual positive examples?
Look under the “actual” heading, in the “positive” column. It comprises True Positive (TP) and False Negative (FN). So the denominator for recall is TP + FN.

The only missing piece of the formulas is now the numerator. It is easy to remember because it is exactly the same in both: we are primarily interested in the number of correct positive predictions from our model, so the numerator is True Positive (TP).

Hence, the resultant formulas are:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
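
To make the formulas concrete, here is a minimal Python sketch (with made-up toy labels and predictions) that computes both metrics directly from the TP/FP/FN counts and cross-checks them against scikit-learn:

```python
from sklearn.metrics import precision_score, recall_score

# Toy ground-truth labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

# Count the confusion-matrix cells by hand
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)   # correct detections out of all positive predictions
recall = tp / (tp + fn)      # detected positives out of all actual positives

print(precision, recall)                        # 0.75 0.75
print(precision_score(y_true, y_pred),          # same values from scikit-learn
      recall_score(y_true, y_pred))
```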

🔑 The secret key 🔑

Finally, let’s look at the secret key that lets you easily differentiate the use cases that should prioritize precision over recall from those that should prioritize recall over precision.

When you cannot afford to have any false negatives, you prioritize recall.

When you cannot afford to have any false positives, you prioritize precision.

In other words, when you cannot afford to miss any detection, you look for high recall; and when you cannot afford any incorrect detections, you look for high precision.

A popular interview question around this concept is: if the cost of a FP is higher than that of a FN, what would you prefer, precision or recall?
I hope you can now easily answer this question and even provide a justification to the interviewer (bonus!).

4 Scenarios of Precision-Recall

Let’s move away from the tabular dataset examples that are usually used to explain precision-recall. Instead, I am going to use a simple yet intuitive example of an object detection model to help you understand the different scenarios of precision-recall; a small numeric sketch follows the list.

  1. High recall but low precision implies that most ground-truth objects have been detected, but most detections are incorrect (many false positives).
  2. High precision but low recall implies that most of the predicted boxes are correct, but most ground-truth objects have not been detected (many false negatives).
  3. High precision and high recall implies an ideal detector that has detected all ground-truth objects correctly.
  4. Low precision and low recall implies a poor detector that does not detect most ground-truth objects (many false negatives), and most detections are incorrect (many false positives).
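
As a rough numeric illustration (the counts below are made up for a hypothetical detector, not taken from any real model), here is how those four scenarios might translate into precision and recall values:

```python
# Hypothetical detection counts for an imaginary object detector
# evaluated against 100 ground-truth objects.
# TP = correct detections, FP = spurious detections, FN = missed objects.
scenarios = {
    "1. high recall, low precision":  {"tp": 90, "fp": 210, "fn": 10},
    "2. high precision, low recall":  {"tp": 30, "fp": 3,   "fn": 70},
    "3. high precision, high recall": {"tp": 95, "fp": 5,   "fn": 5},
    "4. low precision, low recall":   {"tp": 20, "fp": 80,  "fn": 80},
}

for name, c in scenarios.items():
    precision = c["tp"] / (c["tp"] + c["fp"])
    recall = c["tp"] / (c["tp"] + c["fn"])
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")
```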

Real-world use cases (with justifications)

If you have made it this far, congratulations! You now understand a lot about the practical usage of precision and recall.

Real-world problems typically attach a different cost to each type of error: a false positive vs. a false negative. In most situations, one is more important than the other.

Let’s dive right into what we have been waiting for — the different use-cases where we prioritize one metric over the other.

1. Medical test (e.g. cancer detection): Recall is more important
🔖 It is okay to classify a healthy person as having cancer (false positive) and follow up with more medical tests, but it is definitely not okay to miss a cancer patient by classifying them as healthy (false negative), since the person’s life is at stake.

2. Recommendation Systems: Precision is more important
🔖 Missing out on recommending a particular popular movie is okay (low recall), but the overall recommendations should be relevant. If the customer is shown a lot of irrelevant results (false positives), it makes for a very bad user experience.

3. Predicting a good day to launch a satellite based on weather conditions: Precision is more important
🔖 Missing out on predicting a good weather day is okay (low recall), but wrongly calling a bad day good (false positive) and launching the satellite on it can be disastrous.

4. Criminal death punishment: Precision is more important
🔖 Failing to punish a criminal is okay (low recall), but convicting an innocent person (false positive) is unacceptable.

5. Email spam detection: Precision is more important
🔖 Failing to detect a spam email is okay (low recall), but no legitimate or important email should end up in the spam folder (false positive).

6. Not Safe For Work (NSFW) images detection: Recall is more important
🔖 It is okay to classify a safe image as NSFW (false positive), since that can always be rectified by manual data quality checks later. But it is definitely not okay to classify an NSFW image as safe (false negative) and display it on your company’s website, since it will damage the company’s reputation.

7. Identifying good customers for a bank loan: Precision is more important
🔖 Failing to identify a good customer who is eligible for the loan is okay (low recall), but approving a loan for a bad customer (false positive) who may never repay it is undesirable.

8. Flagging fraudulent transactions: Recall is more important
🔖 It is okay to flag a legitimate transaction as fraudulent (false positive), since it can always be re-verified through additional checks. But it is definitely not okay to let a fraudulent transaction pass as legitimate (false negative).
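
In practice, “prioritizing recall” (or precision) usually comes down to where you set the classification threshold. As a rough sketch, assuming a scikit-learn workflow on purely synthetic “fraud” data (none of these numbers come from the article), you could pick the highest threshold that still meets a recall target:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

# Synthetic, imbalanced "fraud" data purely for illustration (~3% positives)
X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
scores = model.predict_proba(X)[:, 1]

precision, recall, thresholds = precision_recall_curve(y, scores)

# Prioritizing recall: pick the highest threshold that still keeps
# recall >= 0.95, accepting whatever precision (extra false positives)
# that implies. precision/recall have one more entry than thresholds,
# so drop their last element to align the arrays.
target_recall = 0.95
ok = recall[:-1] >= target_recall
chosen = thresholds[ok].max() if ok.any() else thresholds.min()
print(f"chosen threshold: {chosen:.3f}")
```

For a precision-first use case (e.g. spam filtering), you would do the mirror image: raise the threshold until precision meets your target, accepting the lower recall that results.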

Summary

I hope the above examples made it clear that, depending on the use case, one type of misclassification (classification error) can be much worse than the other.

If you have made it this far and found this article useful, then please hit 👏 . It goes a long way in keeping me motivated, thank you!


Lavanya Gupta
Analytics Vidhya

Carnegie Mellon Grad | AWS ML Specialist | Instructor & Mentor for ML/Data Science