The Digital Skeptic: The Dumb Reason the Web Doesn't Make Money
"Dumb algos are what work," Joseph Turian told me over cocktails on a rainy night here in New York last fall. (That's not the whole secret. You'll have to read on.)
Turian knows his way around the Big Data ordnance. He got a doctorate in computer science from NYU after an undergrad stint at Harvard. And he runs a gun-for-hire data analysis outfit called MetaOptimize.
Here's his drop on dumb algorithms: Complex computer programs that manage nasty so-called Big Data problems, including recognizing speech or buying and selling stock, can be mighty impressive. But it turns out, what really rocks the information house when it comes to brutally tough, data-heavy computer problems is programming simplicity to the point of stupidity. There's a catch, though.
"You need access to massive amounts of data to make these dumb algos work," he said.
Take Google Translate, the search giant's marvelously effective online translation app. Apparently, programs such as this tend to stay away from complex computer models that account for linguistic nuance such as object-oriented grammar or deep syntax. Rather, they tend to rely on basic concepts such as the simple probability that a set of words will appear in a given order. Then these not-so-smart algos do little more than rank and compare the number of times such multiword phrases occur in a language to quickly translate one equivalently ranked word set to another.
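For the curious, that rank-and-compare idea can be sketched in a few lines of Python. Treat this as a cartoon of the description above, not as Google's actual method: the sample sentences, the function names and the rule of matching phrases purely by frequency rank are all assumptions invented for illustration. Real statistical translation systems typically learn from text that has already been translated and weigh probabilities rather than simply lining up ranks.

```python
from collections import Counter


def phrase_counts(sentences, n=2):
    """Count every run of n consecutive words, sentence by sentence."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


def ranked_phrases(sentences, n=2):
    """List phrases from most frequent to least frequent."""
    return [phrase for phrase, _ in phrase_counts(sentences, n).most_common()]


def rank_match_table(source_sentences, target_sentences, n=2):
    """Pair phrases that hold the same frequency rank in each language."""
    return dict(zip(ranked_phrases(source_sentences, n),
                    ranked_phrases(target_sentences, n)))


def translate(phrase, table):
    """Look up a phrase in the table; return None if it was never seen."""
    key = tuple(phrase.lower().split())
    return " ".join(table[key]) if key in table else None


# Hypothetical toy corpora, hand-built so the frequency ranks line up.
english = ["good night"] * 3 + ["thank you"] * 2 + ["see you"]
spanish = ["buenas noches"] * 3 + ["muchas gracias"] * 2 + ["hasta luego"]

table = rank_match_table(english, spanish)
print(translate("thank you", table))
```

With six hand-picked sentences per language the ranks match only because they were arranged to. The trick has any hope of working on real language only when the counts come from enormous piles of text, which is exactly the catch Turian describes.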
"It's not exactly that simple, but it is not that far from it," Turian said. "The important thing is, have as much text as possible. And what Google probably does is grab every written word from all the websites on Earth to make Translate work."
Turian is not out on an algo limb here. I confirmed the "dumb algos work" sentiment with pretty much every data nerd I could get my hands on.
That includes Michael Selik, a New York economist and software engineer; my big data buddy Brian Dalessandro, vice president of data science for Media6Degrees, the New York advertising data shop; and most interestingly, Dennis Mortensen, who helped develop Yahoo!'s (YHOO) big data apps and now runs a New York media data analysis firm called Visual Revenue.
"When you have smaller data sets, you need to fill that with modeling," Mortensen confirmed to me over the phone. "It is not as good as having more data."
Now, take a deep breath and let this idea sink in. Don't dumb algos make the massive, inexplicable, intractable problems of the information age suddenly seem as clear as a moonless Antarctic midnight sky?
Peering at the edge of the Web
Think about it. No matter how smart Google's geniuses think they are and how effective heavily modeled algos can be for certain data problems, for the truly awesome consumer experience that Google (GOOG), Facebook (FB), Amazon (AMZN) and all the rest invest in, there is little choice but to harvest as much digital stuff as possible. But there's a rub: The grim, no-money economics created by these information age giants means even the Googles don't have the resources to pay for the data they need. So they have no choice but to lure users into giving up their information -- which almost invariably means offering ever-more complex and expensive services for nothing.