Techworld

How Facebook dealt with its big data problem

Former technical lead Kannan Muthukkaruppan explains how the company changed its storage to cope with 250 billion photo uploads

For a social media company that handles billions of messages and photo uploads every year, keeping on top of all that data did not come without some “growing pains”, according to former Facebook technical lead Kannan Muthukkaruppan.

Muthukkaruppan, who now works as an engineer at Nutanix in the United States, spent almost six years at Facebook, from 2007 to 2013, a period during which the company made significant changes to how it stored its data.

Muthukkaruppan gave an example: all user details, including people’s profiles and their lists of friends, were kept in MySQL, an open source database.

“MySQL is good for some workloads but not data intensive workloads. Everything was bursting at the seams in 2007 so we had to look at this,” he said.

Within the company’s data warehouse, information was kept in an open source file system, the Hadoop Distributed File System (HDFS).

Muthukkaruppan said the company was constantly buying new servers because it was running out of storage space for photos, which were held on network attached storage (NAS).

“The challenge Facebook had was to design infrastructure out of simple building blocks that you don’t have to throw away every two years,” he said.

To solve the growth issues, Muthukkaruppan considered HBase, a distributed database that can scale to petabytes.

Facebook Messages was the first application to use HBase, in 2010.

“[Facebook CEO] Mark Zuckerberg’s vision was to unify all mediums of communication into a single product. The new version of Facebook messages meant that every chat message had to be stored,” he said.

This was because Facebook chat was generating five billion messages per day in 2010, with a user base of 350 million. Today, that figure has grown to 10 billion messages a day.
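To make the idea concrete, the sketch below shows how a single chat message could be written into HBase using the standard Java client. The "messages" table, the "m" column family and the user-ID-plus-reversed-timestamp row key are hypothetical choices used only for illustration; they are not Facebook's actual schema.

// Minimal sketch: writing one chat message into an HBase table with the
// standard Java client. Table name, column family and row-key layout are
// hypothetical, chosen only to illustrate the general approach.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MessageStoreSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table messages = connection.getTable(TableName.valueOf("messages"))) {

            // Row key: recipient user ID followed by a reversed timestamp, so a
            // user's newest messages sort first and all of their messages sit
            // together in one contiguous, easily scanned range.
            long reversedTs = Long.MAX_VALUE - System.currentTimeMillis();
            byte[] rowKey = Bytes.add(Bytes.toBytes(42L), Bytes.toBytes(reversedTs));

            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("sender"), Bytes.toBytes(7L));
            put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("body"), Bytes.toBytes("hello"));
            messages.put(put);
        }
    }
}

Because HBase distributes table regions across many commodity servers, a store laid out this way can keep absorbing writes at that volume simply by adding machines.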

In 2007, there were 1.7 billion photos uploaded to the site. By 2013, this had grown to 250 billion photos per year.

By November 2011, Facebook's storage footprint was growing by 20 petabytes of disk space per year.

“All of this data needs to be protected for disaster recovery purposes so we gave up the traditional NAS. We went with servers and x86 Intel boxes. That gave us huge cost savings,” he said.

