(Full Story: The Etsy Shard Architecture: Starts With S and Ends With Hard)
High Scalability – 7 Years of YouTube Scalability Lessons
Cheating – Know How to Fake Data -
Awesome technique. The fastest function call is the one that doesn’t happen. When you have a monotonically increasing counter, like movie view counts or profile view counts, you could do a transaction every update. Or you could do a transaction every once in awhile and update by a random amount and as long as it changes from odd to even people would probably believe it’s real. Know how to fake data.
(Full Story: High Scalability – 7 Years of YouTube Scalability Lessons)
How Much Would Debian Cost to Develop? How About $19 Billion?
The developer version of Debian GNU/Linux (“wheezy”) contains 17,141 packages of software, or 419,776,604 lines of code. With that figure, James Bromberger estimates that Debian would cost about $19.1 billion to produce. Bromberger also looks at the cost of individual projects like PHP, Apache and MySQL. Even at more than $19 billion, the figure is likely far short of what it would actually cost to produce.
(Full Story: How Much Would Debian Cost to Develop? How About $19 Billion?)
Facebook Engineering – Building Timeline: Scaling up to hold your life story
The schedule for Timeline was very aggressive. When we sat down to build the system, one of our key priorities was eliminating technical risk by keeping the system as simple as possible and relying on internally-proven technologies. After a few discussions we decided to build on four of our core technologies: MySQL/InnoDB for storage and replication, Multifeed (the technology that powers News Feed) for ranking, Thrift for communications, and memcached for caching. We chose well-understood technologies so we could better predict capacity needs and rely on our existing monitoring and operational tool kits.
(Full Story: Facebook Engineering – Building Timeline: Scaling up to hold your life story)
Calculating number of Connection to MySQL | commandlinefu.com
$ mysql -u root -p -e”show processlist;”|awk ‘{print $3}’|awk -F”:” ‘{print $1}’|sort|uniq -c
(Full Story: Calculating number of Connection to MySQL | commandlinefu.com)
MySQL Leads Open Source DB Market Share
MySQL accounts for nearly half of the population to date, but MariaDB is coming on strong.
(Full Story: MySQL Leads Open Source DB Market Share)
Facebook trapped in MySQL ‘fate worse than death’
According to database pioneer Michael Stonebraker, Facebook is operating a huge, complex MySQL implementation equivalent to “a fate worse than death,” and the only way out is “bite the bullet and rewrite everything.”
(Full Story: Facebook trapped in MySQL ‘fate worse than death’)
Replicate MySQL to MongoDB with Tungsten Replicator
You can now replicate data from MySQL data to MongoDB using Tungsten Replicator, an open source data replication engine for MySQL. It’s sponsored by Continuent, makers of Tungsten Enterprise.
(Full Story: Replicate MySQL to MongoDB with Tungsten Replicator)
Tips for Tuning MySQL
- Innodb parameters – the 4 “Usual Suspects”
- innodb_buffer_pool_size
- innodb_log_buffer_siz
- einnodb_thread_concurrency
- innodb_flush_method
- Bonus #0 – flushing logs at transaction commit
- Bonus #1 – mysql logging
- Bonus #2 – linux ‘swappiness’
- Bonus #3 – my “Virtualization Rant”
(Full Story: Tips for Tuning MySQL)
Sample datasets for benchmarking and testing – MySQL Performance Blog
- The venerable sakila test database: small, fake database of movies.
- The employees test database: small, fake database of employees.
- The Wikipedia page-view statistics database: large, real website traffic data.
- The IMDB database: moderately large, real database of movies.
- The FlightStats database: flight on-time arrival data, easy to import into MySQL.
- The Bureau of Transportation Statistics: airline on-time data, downloadable in customizable ways.
- The airline on-time performance and causes of delays data from data.gov: ditto.
- The statistical review of world energy from British Petroleum: real data about our energy usage.
- The Amazon AWS Public Data Sets: a large variety of data such as the mapping of the Human Genome and the US Census data.
- The Weather Underground weather data: customize and download as CSV files.
(Full Story: Sample datasets for benchmarking and testing – MySQL Performance Blog)


May 25, 2012
