The traffic inside Facebook's data centers is growing so fast that the company is changing the basic architecture of its networks in order to keep up.
The new design, which Facebook calls a data center fabric, doesn't use a new link speed to make the network run faster. Instead, it turns the whole system into a set of modules that are less expensive and more widely available than the equipment the company uses now. It's also easier to deploy and manage, according to Facebook's networking chief.
Unlike older hierarchical networks, the modular design can provide consistently fast links between any two servers in the data center. The new architecture was used in a 476,000-square-foot (44,000-square-meter) data center that goes online today in Altoona, Iowa. Facebook plans to use it in all newly built centers and retrofit older facilities as part of its regular upgrade cycle.
Facebook and other companies with sprawling Internet data centers have turned to homegrown or inexpensive white-box gear for networking as well as for computing. They add their own software on top of that hardware, which can mean they don't buy dedicated products from networking specialists such as Cisco Systems. Though most enterprises don't have the network scale or in-house expertise to do the same, software-defined technologies developed and shared by these trailblazers are changing some aspects of networking.
Facebook's current data-center networks are based on clusters, each of which may have hundreds of racks of servers linked together through a massive switch with high-speed uplinks to handle all the traffic that the servers generate. That's a traditional hierarchical design, which makes sense when most traffic flows to and from the Internet, said Najam Ahmad, vice president of network engineering.
The problem is, most of the communication in a Facebook data center now is just Facebook talking to itself. The applications that organize shared content, status updates and ads into the familiar news feed are highly distributed, so what the company calls "machine-to-machine" traffic is growing many times faster than the bits actually going out to the Internet.
Hundreds of racks per cluster meant hundreds of ports on the switch where all those racks link up. That's an expensive and specialized need, and it was getting worse.
"We were already buying the largest box you can buy in the industry, and we were still hurting for more ports," Ahmad said.
In addition, traffic between servers often has to get from one cluster to another, so the company had to constantly worry whether the links between those big clusters were fat enough.
What Facebook needed was a network that could keep carrying all those bits internally no matter how many there were or which servers they had to hit. So in place of those big clusters, it put together pods: much smaller groups of servers made up of just 48 racks.
Now Facebook just needs switches with 48 ports to link the racks in the pod and 48 more to connect to the switches that link the pod with the other pods. It's much easier to buy those, and Facebook could even build them, Ahmad said.
With the new architecture, Facebook can supply 40-Gigabit Ethernet pipes from any rack in the data center to any other. Rather than oversubscribing an uplink between two switches on the assumption that not all the racks will be sending data at full throttle at the same time, it can equip the data center to handle maximum traffic all the time, a so-called non-blocking architecture.
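The non-blocking idea comes down to a ratio: the capacity a switch accepts from the racks beneath it divided by the capacity it can send upward. A ratio of 1:1 means every rack can transmit at full speed at once; anything higher means the design is betting that the racks won't all peak together. The Python sketch here is illustrative only. It assumes the fabric-facing ports run at the same 40-Gigabit speed as the rack-facing ones, the "legacy" configuration of 48 10-Gigabit ports sharing four 40-Gigabit uplinks is hypothetical rather than anything Facebook described, and the function names are made up for the example.

```python
from fractions import Fraction


def oversubscription(down_ports: int, down_gbps: int,
                     up_ports: int, up_gbps: int) -> Fraction:
    """Capacity arriving from the racks divided by capacity leaving upward.

    A result of 1 means non-blocking: every rack-facing port can run at
    full speed simultaneously. Anything above 1 means the uplinks are
    oversubscribed.
    """
    return Fraction(down_ports * down_gbps, up_ports * up_gbps)


def ratio_label(ratio: Fraction) -> str:
    """Format a ratio the way network engineers usually write it, e.g. '3:1'."""
    return f"{ratio.numerator}:{ratio.denominator}"


# Fabric switch with 48 ports down and 48 ports up, as in the article
# (40 Gbps on both sides is an assumption).
fabric = oversubscription(48, 40, 48, 40)

# Hypothetical oversubscribed switch: 48 x 10 Gbps down, 4 x 40 Gbps up.
legacy = oversubscription(48, 10, 4, 40)

print("fabric switch:", ratio_label(fabric))               # prints: fabric switch: 1:1
print("hypothetical legacy switch:", ratio_label(legacy))  # prints: hypothetical legacy switch: 3:1
```

Using exact fractions rather than floats keeps ratios such as 3:2 from turning into rounding noise.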
The identical pods and standard fabric switches allow for easy expansion of both computing and network capacity, Facebook says.
"The architecture is such that you can continue to add pods until you run out of physical space or you run out of power," Ahmad said. In Facebook's case, the limiting factor is usually the amount of energy that's available, he said.
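Ahmad's rule of thumb, adding pods until either space or power runs out, amounts to taking the smaller of two limits. A minimal sketch, assuming three hypothetical planning inputs: how many pod slots the floor can hold, the facility's power budget, and an average draw per pod. None of these figures come from Facebook, and `max_pods` is an illustrative name.

```python
import math


def max_pods(pod_slots: int, power_budget_kw: float, kw_per_pod: float) -> tuple[int, str]:
    """Return how many identical pods fit and which resource runs out first.

    All three inputs are hypothetical planning figures. Ties are reported
    as 'space' for simplicity.
    """
    by_power = math.floor(power_budget_kw / kw_per_pod)
    if by_power < pod_slots:
        return by_power, "power"
    return pod_slots, "space"


# Hypothetical facility: room for 40 pods, 30 MW available, 1 MW per pod.
print(max_pods(pod_slots=40, power_budget_kw=30_000, kw_per_pod=1_000))
# prints: (30, 'power')
```

With these made-up numbers, power is the binding limit, which matches what Ahmad said is usually the case for Facebook.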
The company has also developed software to automatically discover and configure new components and automate many management tasks. In fact, the fabric switches it uses have only standard, basic capabilities, with most other networking functions carried out by Facebook's software.
Facebook has developed its own top-of-rack routing switch, called the Wedge, which wasn't ready for the initial deployment in Altoona but can be used in the network fabric in the future. The architecture calls for servers to connect to the top of the rack over 10-Gigabit Ethernet.
In Altoona, Facebook has been able to design the data center from the start for its new network architecture. It has deployed fiber throughout the facility, installed core network equipment in the center and used straight-line fiber runs that are as short and direct as possible. When needed, the network fabric can be upgraded from 40-Gigabit Ethernet to 100-Gigabit and beyond, Ahmad said.
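Because the pod layout stays the same, a faster link speed scales each fabric switch's capacity in proportion. The sketch assumes the 48 rack-facing and 48 fabric-facing port counts stay fixed and that both sides are upgraded together, so the 1:1 ratio holds. It is simple arithmetic to show the scale of the change, not a description of Facebook's actual upgrade plan.

```python
# Port counts from the article; keeping them fixed during an upgrade is an assumption.
RACK_FACING_PORTS = 48
FABRIC_FACING_PORTS = 48


def switch_capacity_gbps(port_speed_gbps: int) -> tuple[int, int]:
    """Aggregate (rack-facing, fabric-facing) capacity of one fabric switch."""
    return (RACK_FACING_PORTS * port_speed_gbps,
            FABRIC_FACING_PORTS * port_speed_gbps)


for speed in (40, 100):
    down, up = switch_capacity_gbps(speed)
    print(f"{speed}G ports: {down} Gbps down, {up} Gbps up, ratio {down // up}:1")
# prints:
# 40G ports: 1920 Gbps down, 1920 Gbps up, ratio 1:1
# 100G ports: 4800 Gbps down, 4800 Gbps up, ratio 1:1
```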
In addition to sharing the core concepts of the network fabric, Facebook says it may later share designs and code with other companies. The company has made some of its data-center technologies available through the Open Compute Project that it founded in 2011.