Archive | nosql

Why MongoDB Never Worked Out at Etsy

But what I can say is that if you are considering Mongo plus another database like MySQL, then in all likelihood you shouldn’t do it. The benefits of being schemaless are negated by the pain you will feel sorting out: Logging. Monitoring. Slow query optimization. init scripts. Graphing. Replication. Sharding strategy. Rebalancing strategy. Backups. Restoration. Probably like 50 other things Allspaw knows about that we developers don’t have to care about.

(Full Story: http://mcfunley.com/why-mongodb-never-worked-out-at-etsy)

Which freaking database should I use? – a CAP theorem primer

Part of the reason there are so many different types of NoSQL databases lies in the CAP theorem, aka Brewer’s Theorem. The CAP theorem states you can provide only two out of the following three characteristics: consistency, availability, and partition tolerance. Different datasets and different runtime rules cause you to make different trade-offs. Different database technologies focus on different trade-offs. The complexity of the data and the scalability of the system also come into play.

Just as we shouldn’t try to solve all of our problems with an RDBMS, we shouldn’t try to solve all of our math problems with set theory. Today’s data problems are getting complicated: The scalability, performance (low latency), and volume needs are greater. In order to solve these problems, we’re going to have to use more than one database technology.

(Full Story: Which freaking database should I use? – a CAP theorem primer)
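The trade-off the excerpt describes can be made concrete with a toy model. The sketch below is not taken from the linked article and does not model any real database: `TwoReplicaStore`, `Unavailable`, and `demo` are hypothetical names invented for illustration. It assumes a store with two replicas and shows the choice a system faces once the network partitions: reject writes to stay consistent ("CP"), or accept them and let replicas diverge ("AP").

```python
class Unavailable(Exception):
    """Raised when a CP-mode store refuses a request during a partition."""


class TwoReplicaStore:
    """Toy store with two replicas; hypothetical, not modeled on a real database."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [None, None]
        self.partitioned = False

    def write(self, replica_id, value):
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            self.replicas = [value, value]
        elif self.mode == "CP":
            # Consistency first: reject writes that cannot be replicated.
            raise Unavailable("write rejected during partition")
        else:
            # Availability first: accept locally and let the replicas diverge.
            self.replicas[replica_id] = value

    def read(self, replica_id):
        return self.replicas[replica_id]


def demo(mode):
    """Write once, partition the network, then attempt a second write."""
    store = TwoReplicaStore(mode)
    store.write(0, "v1")
    store.partitioned = True
    try:
        store.write(0, "v2")
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, store.read(0), store.read(1)


if __name__ == "__main__":
    print("CP:", demo("CP"))
    print("AP:", demo("AP"))
```

In "CP" mode the second write is refused and both replicas still agree on `v1`; in "AP" mode the write succeeds but replica 1 keeps serving the stale `v1`. Real systems sit at many points between these two extremes, so treat this as an intuition aid only.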

Seven Databases in Seven Weeks – The Pragmatic Bookshelf

Redis, Neo4J, CouchDB, MongoDB, HBase, Riak, and Postgres: with each database, you’ll tackle a real-world data problem that highlights the concepts and features that make it shine. You’ll explore the five data models employed by these databases: relational, key/value, columnar, document, and graph. See which kinds of problems are best suited to each, and when to use them.

(Full Story: Seven Databases in Seven Weeks – The Pragmatic Bookshelf)

A Year with MongoDB

Over the past 6 months, we’ve scaled MongoDB by moving data off of it. This process is an entire blog post itself, but the gist of the matter is that we looked at our data access patterns and chose the right tool for the job. For key-value data, we switched to Riak, which provides predictable read/write latencies and is completely horizontally scalable. For smaller sets of relational data where we wanted a rich query layer, we moved to PostgreSQL. A small fraction of our data has been moved to non-durable purely in-memory solutions if it wasn’t important for us to persist or be able to query later.

In retrospect, MongoDB was not the right solution for Kiip. Although it may be a bit more upfront effort, we recommend using PostgreSQL (or some traditional RDBMS) first, then investigating other solutions if and when you find them necessary. In future blog posts, we’ll talk about how we chose our data stores and the steps we took to migrate data while minimizing downtime.

(Full Story: A Year with MongoDB)

NoSQL Data Modeling Techniques

SQL and the relational model in general were designed a long time ago to interact with the end user. This user-oriented nature had vast implications: The end user is often interested in aggregated reporting information, not in separate data items, and SQL pays a lot of attention to this aspect. No one can expect human users to explicitly control concurrency, integrity, consistency, or data type validity. That’s why SQL pays a lot of attention to transactional guarantees, schemas, and referential integrity.

On the other hand, it turned out that software applications are often not interested in in-database aggregation and are able to control, at least in many cases, integrity and validity themselves. Besides, the elimination of these features had an extremely important influence on the performance and scalability of the stores. And this was where a new evolution of NoSQL data models began.

(Full Story: NoSQL Data Modeling Techniques)

Dealing With JVM Limitations in Apache Cassandra

Jonathan Ellis’s slides, presented at FOSDEM 2012, cover some of the problems with GC and the way Cassandra tackles them. While this is one of those presentations where the slides are not enough to understand the full picture, going through them will still give you a couple of good hints.

(Full Story: Dealing With JVM Limitations in Apache Cassandra)

Tenzing A SQL Implementation On The MapReduce Framework

Tenzing is a query engine built on top of MapReduce for ad hoc analysis of Google data. Tenzing supports a mostly complete SQL implementation (with several extensions) combined with several key characteristics such as heterogeneity, high performance, scalability, reliability, metadata awareness, low latency, support for columnar storage and structured data, and easy extensibility. Tenzing is currently used internally at Google by 1000+ employees and serves 10000+ queries per day over 1.5 petabytes of compressed data. In this paper, we describe the architecture and implementation of Tenzing, and present benchmarks of typical analytical queries.

(Full Story: Tenzing A SQL Implementation On The MapReduce Framework)
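To get a feel for what "SQL on MapReduce" means, here is a minimal in-memory sketch of how a query such as `SELECT dept, SUM(salary) FROM employees GROUP BY dept` can be broken into map, shuffle, and reduce phases. It is an illustration only and says nothing about how Tenzing itself is built: the table, the column names, and every function name below are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical input rows; a real engine would read these from distributed storage.
ROWS = [
    {"dept": "eng", "salary": 120},
    {"dept": "eng", "salary": 100},
    {"dept": "ops", "salary": 90},
]


def map_phase(rows, key_col, value_col):
    """Emit one (key, value) pair per input row, as a mapper would."""
    for row in rows:
        yield row[key_col], row[value_col]


def shuffle_phase(pairs):
    """Group values by key, standing in for the framework's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, aggregate=sum):
    """Apply the aggregate function to each key's list of values."""
    return {key: aggregate(values) for key, values in groups.items()}


def group_by_sum(rows, key_col, value_col):
    """Chain the three phases to answer a GROUP BY ... SUM(...) query."""
    pairs = map_phase(rows, key_col, value_col)
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(group_by_sum(ROWS, "dept", "salary"))  # {'eng': 220, 'ops': 90}
```

The GROUP BY column becomes the map output key and the aggregate becomes the reducer, which is the usual textbook way to express this kind of query on a MapReduce-style framework.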

4 Months with Cassandra, a love story | Cloudkick

Advantages of Cassandra
> Linear scalability
> Low operational costs
> Hybrid NoSQL

Administration and operational issues
> nodetool, previously known as nodeprobe
> Major compactions
> Tombstones
> Client reconnection
> Thrift issues

(Full Story: 4 Months with Cassandra, a love story | Cloudkick)

Google BigQuery Service

Google BigQuery Service is a web service that enables you to do interactive analysis of massively large datasets—up to billions of rows. Scalable and easy to use, BigQuery lets developers and businesses tap into powerful data analytics on demand.

(Full Story: Google BigQuery Service)
