Reading Morozov’s To Save Everything, Click Here, I came across this nugget:
Thus, even tweeting that you don’t like your yogurt might bring police to your door, especially if someone who tweeted the same thing three years before ended up shooting someone in the face later.
Morozov is talking about predictive models of criminal behavior based on “tweets or Facebook updates that exploit nonthreatening verbal cues that tend to precede criminal acts.” He is obviously trying to be provocative, but the example reveals a misunderstanding of how prediction works that leads him to target the wrong problem in a relevant debate.
The survival of any company using predictive models depends on not making the kind of mistake Morozov uses as an argument. A model that recommends a rake to a user who bought a biography of Bismarck, just because one customer happened to make both purchases in the past, would have very little value.1 Similarly, it is very easy to beat an algorithm that suggests the police should chase everyone tweeting about yogurt.2 Although the relation between predictive models and substantive expertise is fairly strange, this is a very clear example of a type I error, the technical term for a false positive (classifying as criminal someone who isn’t). Minimizing false positives and false negatives is the whole point of a learning algorithm, so, if anything, Morozov is resting his claim on very weak ground: he would not be criticizing “algorithms,” but “bad algorithms.”
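A toy calculation, with entirely made-up numbers, shows why the “chase everyone tweeting about yogurt” rule fails on its own terms: its precision is so low that any evaluation against held-out data would reject it immediately.

```python
# Toy illustration (assumed numbers, not real data): suppose 100,000
# people tweet about yogurt and only 1 of them later commits a crime.
tweets = 100_000          # people who tweeted about yogurt (assumption)
actual_criminals = 1      # of those, how many turn out to be dangerous (assumption)

# A "chase everyone" rule classifies all 100,000 tweeters as criminal:
true_positives = actual_criminals
false_positives = tweets - actual_criminals

# Precision: fraction of people flagged by the rule who were actually criminal.
precision = true_positives / (true_positives + false_positives)
print(f"precision = {precision:.5f}")  # 0.00001 -> 99,999 innocent people flagged
```

Any learning algorithm trained to minimize false positives would beat this rule trivially, which is the point: Morozov’s scenario describes a bad algorithm, not algorithms in general.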
It is actually hard to find an example of a two-purchase bundle that is truly absurd. Paraphrasing my advisor, from when we used to discuss how our codebook always fell short in classifying political events: “you are always surprised by real events that exceed even the wildest imagination.” ↩
I am going to state the obvious, but the probability of being a criminal conditional on tweeting about yogurt is not the same as the probability of tweeting about yogurt conditional on being a criminal. ↩
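The footnote’s point can be made concrete with Bayes’ rule and some hypothetical numbers (all three probabilities below are assumptions chosen purely for illustration):

```python
# Hypothetical numbers to make the conditional-probability point concrete.
p_criminal = 1e-5                 # base rate of criminals in the population (assumption)
p_yogurt_given_criminal = 0.5     # half of criminals tweet about yogurt (assumption)
p_yogurt = 0.01                   # 1% of everyone tweets about yogurt (assumption)

# Bayes' rule: P(criminal | yogurt) = P(yogurt | criminal) * P(criminal) / P(yogurt)
p_criminal_given_yogurt = p_yogurt_given_criminal * p_criminal / p_yogurt
print(p_criminal_given_yogurt)    # 0.0005 -- nowhere near the 0.5 in the other direction
```

Even when most criminals tweet about yogurt, the probability that a given yogurt-tweeter is a criminal stays tiny, because the base rate of criminals is so low.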