A Year with MongoDB

Over the past 6 months, we’ve scaled MongoDB by moving data off of it. This process is an entire blog post itself, but the gist of the matter is that we looked at our data access patterns and chose the right tool for the job. For key-value data, we switched to Riak, which provides predictable read/write latencies and is completely horizontally scalable. For smaller sets of relational data where we wanted a rich query layer, we moved to PostgreSQL. A small fraction of our data has been moved to non-durable purely in-memory solutions if it wasn’t important for us to persist or be able to query later.

In retrospect, MongoDB was not the right solution for Kiip. Although it may be a bit more upfront effort, we recommend using PostgreSQL (or some traditional RDBMS) first, then investigating other solutions if and when you find them necessary. In future blog posts, we’ll talk about how we chose our data stores and the steps we took to migrate data while minimizing downtime.

(Full Story: A Year with MongoDB)

Is Your Product Ready for Its Close-up? | Inc.com

In general, here’s how software firms test for doneness. At the outset of a project, the company drafts something called a functional specifications document, or spec. The spec clearly documents what the product requires in order to be considered shippable. Once the items on the spec have been checked off, the product is ready to go.

This method works. But I’ve never much liked it, mostly because it requires you to define the finished product months or even years in advance. So we’ve always taken a different approach. We define, in general terms, the problems we’re trying to solve and begin by designing around those. We don’t try to predict the product’s final form, or even its full feature set. The only thing we know is where to begin.

(Full Story: Is Your Product Ready for Its Close-up? | Inc.com)

WalmartLabs is building big data tools and will then open source them — Cloud Computing News

Stephen O’Sullivan, senior director of global e-commerce at WalmartLabs, is prepping the retail giant to move from 10 different web sites to one and from a trial-sized 10-node Hadoop cluster to a 250-node Hadoop cluster. Along the way, his team will build several tools to migrate data from the current Oracle, Netezza and Greenplum gear, and he hopes to open source those tools.

(Full Story: WalmartLabs is building big data tools and will then open source them — Cloud Computing News)

Behind-the-scenes look at Facebook release engineering

When Rossi is about to roll out an update, he initiates a checkin procedure on IRC. All of the developers who have submitted code for inclusion in the pending update are notified in the channel and have to respond to verify that they are present and ready for the update to go out.

When a developer doesn’t respond within a few minutes, Rossi can send a command to a bot that will attempt to get the developer’s attention through several different communication channels, including e-mail and text messages. As Rossi explained to me, he typically prefers to have all of the contributing developers on hand when deploying an update.

An important aspect of Facebook’s development culture is the idea that developers are fully responsible for how their code behaves in production. This philosophy mirrors the “DevOps” movement, which encourages lowering the wall between software development and IT operations.

(Full Story: Behind-the-scenes look at Facebook release engineering)

NoSQL Data Modeling Techniques

SQL and the relational model in general were designed a long time ago to interact with the end user. This user-oriented nature had vast implications: the end user is often interested in aggregated reporting information, not in separate data items, and SQL pays a lot of attention to this aspect. No one can expect human users to explicitly control concurrency, integrity, consistency, or data type validity. That’s why SQL pays a lot of attention to transactional guarantees, schemas, and referential integrity.

On the other hand, it turned out that software applications are often not interested in in-database aggregation and are able, at least in many cases, to control integrity and validity themselves. Besides, eliminating these features had an extremely important influence on the performance and scalability of the stores. And this was where a new evolution of NoSQL data models began.

(Full Story: NoSQL Data Modeling Techniques)

Adoption of Open Source Software

Tests indicated that users of any OSS system have significantly higher revenues and assets than users of proprietary systems (see Table 2). Furthermore, two logistic regression analyses showed a positive relationship between assets or revenues and open source adoption (see Table 3).

(Full Story: Adoption of Open Source Software)

High Scalability – 7 Years of YouTube Scalability Lessons

Cheating – Know How to Fake Data
Awesome technique. The fastest function call is the one that doesn’t happen. When you have a monotonically increasing counter, like movie view counts or profile view counts, you could do a transaction every update. Or you could do a transaction every once in a while and update by a random amount, and as long as it changes from odd to even, people would probably believe it’s real. Know how to fake data.
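
As a rough illustration of the idea in that excerpt, here is a minimal Python sketch of a counter that skips most writes and occasionally bumps the stored value by a random amount. It is not YouTube's implementation: the class name `ApproximateViewCounter`, the `write_probability` parameter, and the in-memory `stored_count` (standing in for a database row) are all assumptions made for the example. The random bump is chosen so that its average equals the number of views skipped between writes, which keeps the stored count close to the true count over time.

```python
import random


class ApproximateViewCounter:
    """Hypothetical counter that writes rarely and bumps by a random amount."""

    def __init__(self, write_probability=0.1, rng=None):
        if not 0 < write_probability <= 1:
            raise ValueError("write_probability must be in (0, 1]")
        self.write_probability = write_probability
        self.rng = rng or random.Random()
        self.stored_count = 0  # stands in for the value kept in the database
        self.writes = 0        # number of simulated transactions performed
        # Bumps are uniform on [1, max_bump], so their mean is
        # (1 + max_bump) / 2, which is roughly 1 / write_probability.
        self.max_bump = max(1, round(2 / write_probability) - 1)

    def record_view(self):
        """Register one view; only occasionally touch the stored value."""
        if self.rng.random() < self.write_probability:
            self.stored_count += self.rng.randint(1, self.max_bump)
            self.writes += 1


if __name__ == "__main__":
    counter = ApproximateViewCounter(write_probability=0.1)
    for _ in range(100_000):
        counter.record_view()
    print("stored count:", counter.stored_count)
    print("writes performed:", counter.writes)
```

With `write_probability=0.1`, roughly one view in ten triggers a write, and the displayed number still moves by irregular amounts, so it looks like a live counter. The trade-off is that the stored value is only an estimate, which is acceptable for view counts but not for anything that must be exact.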

(Full Story: High Scalability – 7 Years of YouTube Scalability Lessons)

Facebook shares some secrets on making MySQL scale

Facebook’s Mark Callaghan, who spent eight years as a “principal member of the technical staff” at Oracle, explained that using open-source software lets Facebook operate with “orders of magnitude” more machines than people, which means lots of money saved on software licenses and lots of time put into working on new features (many of which, including the rather-cool Online Schema Change, are discussed in the talk).

Additionally, he said, the patch and update cycles at companies like Oracle are far slower than what Facebook can get by working on issues internally and with an open-source community. The same holds true for general support issues, which Facebook can resolve itself in hours instead of waiting days for commercial support.

(Full Story: Facebook shares some secrets on making MySQL scale)
