How Facebook dealt with its big data problem

Former technical lead Kannan Muthukkaruppan explains how the company changed its storage to cope with 250 billion photo uploads

Facebook manages billions of messages and photo uploads per year, but its ability to keep up with that volume did not come without some “growing pains”, according to former technical lead Kannan Muthukkaruppan.

Muthukkaruppan, who now works as an engineer at Nutanix in the United States, spent almost six years at Facebook, from 2007 to 2013, a period in which the social media company made significant changes to its data storage.

As an example, Muthukkaruppan said all user details, including people’s profiles and friend lists, were kept in an open source MySQL database.

“MySQL is good for some workloads but not data intensive workloads. Everything was bursting at the seams in 2007 so we had to look at this,” he said.

Within the company’s data warehouse, information was kept in the Hadoop Distributed File System (HDFS), an open source file system.

Muthukkaruppan said the social media company, which was using network-attached storage (NAS) for photos, was constantly buying new servers because it was running out of storage space.

“The challenge Facebook had was to design infrastructure out of simple building blocks that you don’t have to throw away every two years,” he said.

To solve the growth issues, Muthukkaruppan considered HBase, a distributed database that can scale to petabytes.

In 2010, Facebook messaging became the first application at the company to use HBase.

“[Facebook CEO] Mark Zuckerberg’s vision was to unify all mediums of communication into a single product. The new version of Facebook messages meant that every chat message had to be stored,” he said.

In 2010, Facebook chat was generating five billion messages per day with a user base of 350 million. Today, the number of messages is 10 billion.

In 2007, there were 1.7 billion photos uploaded to the site. By 2013, this had grown to 250 billion photos per year.

By November 2011, Facebook's disk usage was growing by 20 petabytes per year.

“All of this data needs to be protected for disaster recovery purposes so we gave up the traditional NAS. We went with servers and x86 Intel boxes. That gave us huge cost savings,” he said.

Follow Hamish Barwick on Twitter: @HamishBarwick

Follow Techworld Australia on Twitter: @Techworld_AU
