The Fallacy of Matching Algorithms

Netflix, Amazon, eHarmony, Match, and many other online businesses all try to predict what you would like. They use a matching algorithm that says something like, "This is what you liked before, so this is what you will likely enjoy."

In my view, there is a fallacy common to all of these types of algorithms: the idea that new content, or new types of data, can map cleanly onto old data types. Take online dating. The service tries to match your personality type and general ideals to another person's stated personality type and ideals. That other person represents a new type of data, and there are a lot of hidden elements. For example, you might want to date a person who is tall, and the service can match on height. But when you get the match, you might reject them because they live too far away. That was not something on the questionnaire.
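
The height example can be made concrete with a minimal sketch. Everything in it is hypothetical: the field names, the `questionnaire_match` function, and the sample profile are assumptions for illustration, not how any real dating service works. The point is only that a matcher scores the fields it was given and cannot see a deal-breaker that was never asked about.

```python
# Hypothetical sketch: a matcher that can only compare questionnaire fields.

def questionnaire_match(preferences, candidate):
    """Return True when the candidate satisfies every stated preference."""
    return candidate["height_cm"] >= preferences["min_height_cm"]

def would_actually_accept(hidden_limits, candidate):
    """The person's real decision, including a limit nobody asked about."""
    return candidate["distance_km"] <= hidden_limits["max_distance_km"]

preferences = {"min_height_cm": 180}      # what the questionnaire captured
hidden_limits = {"max_distance_km": 50}   # never asked, so never stored

# Made-up candidate: tall enough, but far away.
candidate = {"height_cm": 188, "distance_km": 400}

algorithm_says_match = questionnaire_match(preferences, candidate)
person_accepts = would_actually_accept(hidden_limits, candidate)
```

The matcher reports a match while the person declines, and nothing in the stored data explains why.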

The easiest way to see that all of these approaches fail, in one way or another, is to ask yourself: can your likes and dislikes, in art or in life, be reliably reproduced using just a questionnaire?

The basis they use to generate the predicted like or dislike hinges largely on their ability to create categories that blend the different types of data, and then to place new data into those categories. This premise breaks down, from first principles, whenever new data types come along. Think of a film you might like to see. Let's say you're into scary films.

What happens when a film comes along that scares you in a way you can't fully articulate? Suppose it gets so far under your skin that it activates a defense response. What if your first reaction to the film is that you hate it? Alternatively, you may not wish to admit you liked it. Some films carry theological baggage. For whatever reason, a film that actually scared you will be scored as 'hated'.
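
The scary-film case can be sketched the same way. This is a toy category-average predictor, assuming a made-up rating history and a hypothetical `predict_like` function. Real recommenders are far more elaborate, but the recorded rating is still the signal they work from.

```python
from collections import defaultdict

def category_averages(history):
    """Average the recorded ratings (1 to 5) per category."""
    ratings_by_category = defaultdict(list)
    for category, rating in history:
        ratings_by_category[category].append(rating)
    return {c: sum(r) / len(r) for c, r in ratings_by_category.items()}

def predict_like(history, category, threshold=3.0):
    """Predict a like when the category's average rating clears the threshold."""
    averages = category_averages(history)
    # A category with no history yields no prediction at all.
    if category not in averages:
        return None
    return averages[category] >= threshold

# Made-up history: two horror films enjoyed and rated highly.
history = [("horror", 5), ("horror", 4)]
before = predict_like(history, "horror")

# Two films that truly frightened the viewer get logged as 'hated' (1 star),
# because the defensive first reaction is all that was recorded.
history += [("horror", 1), ("horror", 1)]
after = predict_like(history, "horror")
```

Here the prediction for horror flips from like to dislike, even though the low ratings came from films that did exactly what a scary film is supposed to do.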

Again, on Netflix the primary criterion for collecting a film is in direct opposition to the criterion for building a film queue. The two are just not alike, and new data types in one are treated totally differently from those in the other. A queue entry is just something that is there so that I can conveniently access it when I am watching films on my TV. The interface for searching for films is so primitive and awful that I generate queue entries on my PC and then roll through the queue on my TV. When I want to collect a film, however, I usually ask myself, "Would I want to see that film twice?" If the answer is yes, I usually collect it. The queue is there to hold a place, and the collectible is there because I have a place for it.

Our decision-making process simply cannot be reproduced by questionnaire-style, empirical matching.