It has been difficult for me to find a simple explanation of Mean Average Precision (MAP) on the internet, so I figured I would take what I’ve discovered through multiple sources and attempt to boil this evaluation metric down with an easy-to-understand example.
MAP is a performance metric of Information Retrieval (IR). IR is the (automated) process of returning appropriate results to a user, given some form of user input. The most obvious example of IR is a Google search result. The user inputs a text string, and Google returns the webpages that best match the user’s input. Another example is a streaming movie service that recommends a selection of movies based on the user’s viewing history. For this post, we will use a fictional streaming movie example to illustrate how MAP evaluates an IR system’s performance.
Let’s pretend that John, a streaming movie company customer, is an avid fan of horror movies. This is obvious because the movies he has watched so far on the streaming service have included The Exorcist, The Blair Witch Project, and Poltergeist. The goal of the streaming movie company’s IR system is to make additional recommendations to John that he will most likely watch next. The data frame below shows the next 5 movies that John actually watched compared to the next 5 movies that were recommended to John.
  user  watched  recommended
1 John  Jaws     Jaws
2 John  Psycho   Taken
3 John  Alien    Fear
4 John  Tremors  Alien
5 John  Fear     Titanic
So how well did the IR system perform on recommending the next 5 movies that John should watch?
Average Precision (the “AP” portion of MAP) concerns itself with two things: (1) how many of the recommended movies were actually watched and (2) whether or not the recommendations that were actually watched are towards the top of the recommendation list.
The first thing we need to do is calculate the cumulative accuracy of the recommendations. That is, for every correct recommendation the IR system made, we record the total number of accurate recommendations up to and including that point; incorrect recommendations record a 0. This calculation does not care whether the accurate recommendations appear in the same order as the movies actually watched.
  user  watched  recommended  cumul_acc
1 John  Jaws     Jaws         1
2 John  Psycho   Taken        0
3 John  Alien    Fear         2
4 John  Tremors  Alien        3
5 John  Fear     Titanic      0
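The cumulative-accuracy column above can be sketched in a few lines of plain Python (the movie lists are taken straight from the data frame; the variable names are my own):

```python
# Build the cumul_acc column: a running count of correct recommendations,
# with a 0 recorded for every incorrect recommendation.
watched = ["Jaws", "Psycho", "Alien", "Tremors", "Fear"]
recommended = ["Jaws", "Taken", "Fear", "Alien", "Titanic"]

watched_set = set(watched)  # order of watching doesn't matter
cumul_acc = []
hits = 0
for movie in recommended:
    if movie in watched_set:
        hits += 1
        cumul_acc.append(hits)  # a hit records the running hit count
    else:
        cumul_acc.append(0)     # a miss records 0

print(cumul_acc)  # [1, 0, 2, 3, 0]
```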
The next part of the calculation is to divide each row’s cumulative accuracy value by its row number, i.e., the movie’s rank in the recommendation list.
  user  watched  recommended  cumul_acc  ratio
1 John  Jaws     Jaws         1          1/1
2 John  Psycho   Taken        0          0/2
3 John  Alien    Fear         2          2/3
4 John  Tremors  Alien        3          3/4
5 John  Fear     Titanic      0          0/5
We then take the average of the ratios, which gives an AP@5 of .4833. You can see that positioning the accurate recommendations towards the top of all recommendations would result in a larger ratio. e.g., if Fear had been the 2nd recommendation instead of the 3rd, then its corresponding ratio would have equaled 1 (2/2) instead of 2/3.
If the IR system had accurately recommended every one of the next 5 movies watched (regardless of whether the recommendations were in the same order as the movies watched), then the AP@5 would have received a perfect score of 1.
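Putting the two steps together, here is a minimal AP@k sketch that follows the post’s recipe exactly (the function name is my own invention): divide each row’s cumulative hit count by its 1-based rank, then average the ratios across all k rows.

```python
def average_precision_at_k(watched, recommended, k=5):
    """AP@k per the post's recipe: average of cumul_acc/rank over k rows,
    where misses contribute a ratio of 0."""
    watched_set = set(watched)
    hits = 0
    ratios = []
    for rank, movie in enumerate(recommended[:k], start=1):
        if movie in watched_set:
            hits += 1
            ratios.append(hits / rank)
        else:
            ratios.append(0.0)
    return sum(ratios) / k

ap = average_precision_at_k(
    ["Jaws", "Psycho", "Alien", "Tremors", "Fear"],
    ["Jaws", "Taken", "Fear", "Alien", "Titanic"],
)
print(round(ap, 4))  # 0.4833
```

A quick sanity check: if every one of the next 5 movies had been recommended, the function returns a perfect 1.0 regardless of their order.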
Finally, the “M” portion of MAP comes into play by averaging the AP scores of an IR system across a set of recommendation lists. Let’s pretend that in addition to John’s AP@5 score of .4833, Jane had an AP@5 score of .6565 and Tom had an AP@5 score of .4449. The MAP@5 for this set would be (.4833 + .6565 + .4449)/3 = .5282.
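The final averaging step is a one-liner; this sketch uses the three per-user AP@5 scores from the example above:

```python
# MAP@5 is just the mean of the per-user AP@5 scores.
ap_scores = {"John": 0.4833, "Jane": 0.6565, "Tom": 0.4449}
map_at_5 = sum(ap_scores.values()) / len(ap_scores)
print(round(map_at_5, 4))  # 0.5282
```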