The Open Compute Project began in 2011 when Facebook published the designs of some homebrew servers it had built to make its data centers run more efficiently.
Facebook hoped that other companies would adopt and adapt its initial designs, pushing down costs and improving quality – and they have: Sales of hardware built to Open Compute Project designs topped $1.2 billion in 2017, double the previous year, and are expected to reach $6 billion by 2021.
Those figures, from IHS Markit, exclude hardware spending by OCP board members Facebook, Intel, Rackspace, Microsoft and Goldman Sachs, which all use OCP to some degree. The spend is still a small part of the overall market for data-center systems, which Gartner estimated was worth $178 billion in 2017, but IHS expects OCP’s part to grow 59 percent annually, while Gartner forecasts that the overall market will stagnate, at least through 2019.
Reasons to adopt OCP
When Facebook designed the hardware for its first dedicated data center in Prineville, Ore., it wanted to make savings on three fronts: energy, materials and money.
It boosted energy efficiency by cutting wastage in the power supply and by making the servers taller, which left room for bigger, more effective heatsinks and meant that it could use fans of a larger diameter, able to move more air with less energy.
By doing away with vanity faceplates, paint, logos, unneeded expansion slots and components such as video cards and even mounting screws, it saved more than 6 pounds of material per server.
That inevitably led to cost reductions, as you don’t pay for electricity you don’t consume or parts you don’t use. On top of that, it made savings on labor: Without the mounting screws, racking and unracking servers was quicker; standardization saved time dealing with spares, and overall, systems could be deployed more quickly.
Barriers to adopting OCP
In its 2018 spending study, IHS Markit identified the three main barriers to the adoption of OCP hardware as being concerns about security, sourcing, and integration.
One of the risks of giving everybody the specification to make OCP hardware is that anybody can make it: Somebody could tamper with it before delivery, and nobody would be any the wiser. In other words, supply chain security becomes a problem.
At the OCP Summit held in San Jose in March 2018, OCP leaders said they were addressing supply chain security through a new Security Project, focused on defining a standard hardware interface and protocols for ensuring boot code integrity.
Microsoft has already contributed its Project Cerberus, a hardware root of trust for firmware on the motherboard designed to comply with NIST SP 800-193, Platform Firmware Resiliency Guidelines.
Building on this base, they also plan to develop security firmware APIs, open-source firmware for dedicated security hardware, secure firmware provisioning methodologies, and tools to secure and verify all mutable storage, including flash for BIOS, microcontrollers and complex programmable logic devices (CPLDs). In this way, enterprises taking delivery of OCP hardware can be sure it’s only running the firmware they expect it to be running.
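The core of such a boot-integrity check is easy to illustrate. The sketch below is hypothetical and much simplified, not Cerberus’s actual interface: a root of trust measures the mutable firmware on each boot and compares it, in constant time, against a digest recorded at provisioning.

```python
import hashlib
import hmac

def measure(firmware_image: bytes) -> str:
    """Hash the mutable firmware, as a hardware root of trust would on each boot."""
    return hashlib.sha256(firmware_image).hexdigest()

def verify_boot(firmware_image: bytes, provisioned_digest: str) -> bool:
    """Release the platform from reset only if the firmware matches the
    known-good digest; compare_digest avoids timing side channels."""
    return hmac.compare_digest(measure(firmware_image), provisioned_digest)
```

In a real design, the known-good digest (or a signature over it) would live in tamper-resistant storage, and the check would run before the host CPU executes any mutable code.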
The project leads aren’t just concerned about new hardware: They’re also thinking of the second-hand gear. To secure the resale market, they will look into providing tools for recovering hardware from a compromised or untrusted state, and for tracking and changing its ownership.
Integrating hardware and software is getting easier – particularly at the operating system level since Microsoft joined the OCP board and contributed designs for the racks and servers it is now using to deliver Azure services to its customers.
There is still work to be done at other levels, including the very lowest, the firmware that enables OCP servers to boot up.
That’s where another new OCP initiative comes in: the Open System Firmware Project. It’s working on open-sourcing the code that initializes server chipsets so that it can be used on a variety of platforms and processor types. Building on projects like UEFI and LinuxBoot, it aims to provide support for all cloud operating systems and processor architectures found in the data center, including GPUs, FPGAs and other hardware optimized for applications such as machine learning.
With Open Compute hardware increasingly finding a role in network virtualization, there are also moves afoot to integrate open software and hardware here, too. Traditional networking equipment vendors like Cisco Systems or Juniper Networks tightly link the two, delivering proprietary software tailored to proprietary hardware.
OCP is working with the Linux Foundation to integrate its hardware with that organization’s Open Platform for NFV (OPNFV) software, and the two recently renewed their commitment to joint testing of hardware and software products meeting their respective specifications.
Where to buy OCP gear
Sourcing Open Compute Project hardware is getting easier. The project website features a marketplace through which you can research equipment specifications and contact Open Compute Project vendors.
More than 100 listed products have achieved either OCP Inspired or OCP Accepted recognition. The OCP Inspired label can only be used on products that comply fully with an existing, accepted OCP specification and that are made by an OCP Silver, Gold or Platinum member. OCP Accepted products can be made by anyone, but they too must comply fully with an existing, accepted OCP specification, and open-source design files must be made available for them.
Types of Open Compute Project hardware
The early focus on OCP servers and power supplies has grown to encompass racks, storage and Open Compute Project networking. The organization has even accepted a specification for open Wi-Fi hardware.
While OCP servers started out as simple, commodity devices, there have been moves since to tailor servers to different workloads and, inevitably, the computing demands of machine learning applications have influenced those designs.
Facebook in particular is continuing to push the OCP server envelope. At the 2018 U.S. OCP Summit in San Jose, it showed its third-generation machine learning platform, Big Basin v2. This uses Nvidia Tesla V100 GPUs, a step up from the P100s used in Big Basin v1. Individually, the new processors offer two-thirds more performance than the previous ones, and Facebook said that thanks to some other tweaks to the Big Basin design, it had managed to retain almost all that performance gain as the number of processors used increases.
Facebook also showed a new system, Fabric Aggregator, designed to link neighboring data centers within a region, and also data-center regions with one another. Built using Facebook’s own OCP 100G switch, the Wedge 100, and FBOSS (Facebook Open Switching System) software, Fabric Aggregator allows operators with networks like Facebook’s to scale intra-region and inter-region traffic independently.
Microsoft is hoping for a share of the open switching market too: Its containerized Software for Open Networking in the Cloud (SONiC) has found its way into new devices from Mellanox Technologies that allow enterprises to extend their spine and top-of-rack switches from their own premises into the Azure cloud. This could be a way for it to leverage hardware vendors’ sales into a market for its own cloud services.
Through another OCP initiative, Microsoft could end up changing the structure of the storage market too. The makers of flash storage devices and storage subsystems disagree about where the intelligence that handles address mapping, garbage collection and wear leveling should reside. Putting this intelligence in the storage subsystem may make sense for workstations or consumer devices, but in cloud services functions such as garbage collection become slow and wasteful if the controller is unaware of where the data is coming from. That’s because the storage system’s cache typically mingles data from different applications and VMs, data that will be freed up at different times. With its Project Denali, Microsoft wants to allow OCP storage device makers to move that intelligence higher up the stack, from the SSD drive into the host, allowing it to adapt drive behavior to particular workloads.
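The benefit of host-side intelligence can be sketched in a few lines. The class and method names below are illustrative, not Project Denali’s actual API: when the host tags each write with the workload it belongs to, data with the same lifetime lands in the same erase blocks, so freeing a workload reclaims whole blocks without relocating anyone else’s live pages.

```python
from collections import defaultdict

class HostPlacementLayer:
    """Toy host-side placement layer (illustrative; not Denali's real interface).
    Writes are grouped by workload so data with the same lifetime shares
    erase blocks."""

    def __init__(self, pages_per_block: int = 64):
        self.pages_per_block = pages_per_block
        self.blocks = defaultdict(list)  # workload id -> list of blocks (lists of pages)

    def write(self, workload: str, page: bytes) -> None:
        """Append a page to the workload's current block, opening a new one when full."""
        blocks = self.blocks[workload]
        if not blocks or len(blocks[-1]) == self.pages_per_block:
            blocks.append([])
        blocks[-1].append(page)

    def free_workload(self, workload: str) -> int:
        """Drop all of one workload's data: its blocks can be erased wholesale,
        with no live pages from other workloads to copy out first."""
        return len(self.blocks.pop(workload, []))
```

A drive-resident controller, by contrast, would see one undifferentiated stream of writes, leaving every erase block a mix of live and dead pages that garbage collection must painstakingly separate.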