Data Mining 101: Finding Subversives with Amazon Wishlists
Wow. This blew me away.
Via Boing Boing.
Frequent Make contributor Tom Owad just published a mind-blowing how-to on his website explaining how to mine Amazon's wish list database to uncover "subversives."
Using a pair of 5-year-old computers, two home DSL connections, 42 hours of computer time, and 5 man hours, I now had documents describing the reading preferences of 260,000 U.S. citizens.
I downloaded all the files to an external 120 GB Firewire drive in UFS format. The raw data occupied little more than 5 GB. I initially wanted to move all the files into a single directory to facilitate searching, but as the directory contents exceeded 100,000 items, the speed became glacially slow, so I kept the data divided into chunks of 25,000 wishlists.
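The chunking described above, keeping each directory well below the size where listing and searching slow down, might look roughly like this in Python. This is only a sketch of the general idea, not Owad's actual method: the function name shard_directory, the chunk_NNNN folder naming, and the choice to move files rather than copy them are all assumptions made for illustration.

```python
import os
import shutil


def shard_directory(src_dir, dest_root, chunk_size=25000):
    """Move the files in src_dir into numbered subdirectories of dest_root.

    Each subdirectory receives at most chunk_size files. Returns the list
    of subdirectories created, in order. (Hypothetical helper; the names
    and layout are illustrative assumptions.)
    """
    # Collect regular files only, sorted so the split is reproducible.
    with os.scandir(src_dir) as entries:
        names = sorted(e.name for e in entries if e.is_file())

    created = []
    chunk_dir = None
    for count, name in enumerate(names):
        # Start a new subdirectory every chunk_size files.
        if count % chunk_size == 0:
            chunk_dir = os.path.join(dest_root, f"chunk_{count // chunk_size:04d}")
            os.makedirs(chunk_dir, exist_ok=True)
            created.append(chunk_dir)
        shutil.move(os.path.join(src_dir, name), os.path.join(chunk_dir, name))
    return created
```

With 260,000 files and the default chunk size, this would produce eleven directories: ten full ones and one holding the remaining 10,000 files.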
Next comes the fun part – what books are most dangerous? So many to choose from. Here's a sample of the list I made. Feel free to make up your own list if you decide to try some data mining. Send it to the FBI. I'm sure they'll appreciate your help in fighting terrorism.