(Full Story: Nate Silver Skeptical of Big Data Trends)
In other words, Facebook had to estimate its user count because actually counting them user by user would have been too taxing on its servers. So User No. 1,000,000,000’s identity will remain a mystery.
(Full Story: Who Is Facebook’s Billionth User? – Business Insider)
Identify the data you care about from the sources you work with (e.g. Excel spreadsheets, files, SQL Server databases).
Discover relevant data and services via automatic recommendations from the Windows Azure Marketplace.
Enrich your data by combining it and visualizing the results.
Collaborate with your colleagues to refine the data.
Publish the results to share them with others or power solutions.
(Full Story: Microsoft Codename “Data Explorer”)
It’s hard to understate the sophistication of the tools needed to instrument, track, move, and process data at scale. The development and implementation of these technologies is the responsibility of the data engineering and infrastructure team. The technologies have evolved tremendously over the past decade, with an incredible amount of collaboration taking place through open source projects.
(Full Story: Building data science teams – O’Reilly Radar)
Teiid is a data virtualization system that allows applications to use data from multiple, heterogenous data stores.
Teiid is comprised of tools, components and services for creating and executing bi-directional data services. Through abstraction and federation, data is accessed and integrated in real-time across distributed data sources without copying or otherwise moving data from its system of record.
(Link: Teiid is a data virtualization system that allows applications to use data from multiple, heterogenous data stores.)
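To make the federation idea above concrete, here is a minimal sketch in Python. It does not use Teiid or its API; it is only an analogy built on SQLite's `ATTACH DATABASE`, where two independent in-memory databases stand in for separate systems of record and a temporary view plays the part of the virtual layer. The source names (`crm`, `billing`), tables, and rows are all hypothetical.

```python
import sqlite3


def build_federated_connection():
    """Expose two independent stores through one connection (an analogy, not Teiid)."""
    conn = sqlite3.connect(":memory:")  # stands in for the virtualization layer

    # Each ATTACH of ':memory:' creates a separate database; both are hypothetical sources.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")

    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)",
        [(1, 120.0), (1, 30.0), (2, 75.5)],
    )

    # A TEMP view may reference tables in any attached database.
    # The rows stay in their own stores; only the view definition is shared.
    conn.execute(
        """
        CREATE TEMP VIEW customer_totals AS
        SELECT c.name AS name, SUM(i.amount) AS total
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    conn.commit()
    return conn


def customer_totals(conn):
    """Query the virtual view as if it were a single table."""
    return conn.execute(
        "SELECT name, total FROM customer_totals ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    print(customer_totals(build_federated_connection()))
```

The caller only sees `customer_totals`; which store holds which table is hidden behind the view, which is the abstraction half of the quoted description.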
PatientsLikeMe managed to block and identify the intruder: Nielsen Co., the privately held New York media-research firm. Nielsen monitors online “buzz” for clients, including major drug makers, which buy data gleaned from the Web to get insight from consumers about their products, Nielsen says.
“I felt totally violated,” says Bilal Ahmed
(Link: ‘Scrapers’ Dig Deep for Data on the Web – WSJ.com)
Here at Live Labs we’re all about experiments, and Pivot is our most ambitious to date. Pivot makes it easier to interact with massive amounts of data in ways that are powerful, informative, and fun. We tried to step back and design an interaction model that accommodates the complexity and scale of information rather than the traditional structure of the Web.
(Link: Microsoft Pivot – easier to interact with massive amounts of data in ways that are powerful)
In our case, we use Cascading as our step up in abstraction on top of Hadoop.
S3 -> EC2 -> Cloudera -> HDFS -> Hadoop -> Cascading -> Clojure. I’m not sure if those layers are exactly the right order, but you get the point. The key is to keep layering until you encapsulate the plumbing and get to the level of abstraction that lets you focus on solving your problem.
(Link: How FlightCaster Squeezes Predictions from Flight Data » Data Wrangling Blog)
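The layering idea in the quote can be illustrated with a toy sketch in Python. This is not FlightCaster's code and it uses neither Hadoop nor Cascading; an in-memory map/reduce runner stands in for the plumbing, and every function name and the flight record shape (`airline`, `delay_minutes`) are assumptions made up for the example.

```python
from collections import defaultdict


# Layer 1: the "plumbing". A generic in-memory map/shuffle/reduce runner.
def run_map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


# Layer 2: a reusable step that hides the mapper/reducer details.
def count_by(records, key_fn):
    return run_map_reduce(
        records,
        mapper=lambda record: [(key_fn(record), 1)],
        reducer=lambda key, values: sum(values),
    )


# Layer 3: the problem-level question, with no plumbing in sight.
def late_flights_per_airline(flights, threshold_minutes=15):
    late = (f for f in flights if f["delay_minutes"] > threshold_minutes)
    return count_by(late, lambda f: f["airline"])


if __name__ == "__main__":
    flights = [
        {"airline": "AA", "delay_minutes": 30},
        {"airline": "AA", "delay_minutes": 5},
        {"airline": "UA", "delay_minutes": 20},
        {"airline": "AA", "delay_minutes": 45},
    ]
    print(late_flights_per_airline(flights))
```

Each layer only calls the one directly beneath it, so the top-level function reads like the question being asked rather than like the machinery that answers it.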