In our case, we use Cascading as our step up in abstraction on top of Hadoop.
S3 -> EC2 -> Cloudera -> HDFS -> Hadoop -> Cascading -> Clojure. I’m not sure if those layers are in exactly the right order, but you get the point. The key is to keep layering until you encapsulate the plumbing and get to the level of abstraction that lets you focus on solving your problem.
(Link: How FlightCaster Squeezes Predictions from Flight Data » Data Wrangling Blog)


August 26, 2009
