How many ports are enough at the core of the data center? How does 1,024 sound?
That’s the configuration we used to assess Cisco Systems’ Nexus 9516 data center core switch. In this exclusive Clear Choice test, we assessed the Cisco data center core switch with more than 1,000 50G Ethernet ports. That makes this by far the largest 50G test, and for that matter the highest-density switch test, Network World has ever published.
As its name suggests, the Nexus 9516 accepts up to 16 N9K-X9732C-EX line cards, built around Cisco’s leaf-and-spine engine (LSE) ASICs. These multi-speed chips can run at 100G rates, for up to 512 ports per chassis; 50G rates for up to 1,024 ports; or 25G rates for up to 2,048 ports. We picked the 50G rate, and partnered with test and measurement vendor Spirent Communications to fully load the switch’s control and data planes.
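The port counts above follow from simple breakout arithmetic. The sketch below assumes 32 native 100G ports per line card, a figure inferred from the article's 512 ports across 16 cards rather than taken from a data sheet; the names are illustrative.

```python
# Assumption: 32 native 100G ports per line card (512 ports / 16 cards).
LINE_CARDS = 16
PORTS_PER_CARD_100G = 32

# Logical ports obtained from one 100G port at each speed.
BREAKOUT = {100: 1, 50: 2, 25: 4}


def chassis_ports(speed_gbps: int) -> int:
    """Return the maximum port count per chassis at the given speed."""
    return LINE_CARDS * PORTS_PER_CARD_100G * BREAKOUT[speed_gbps]


for speed in (100, 50, 25):
    print(f"{speed}G: {chassis_ports(speed):,} ports")
```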
The results were staggering. Among the key takeaways:
- Line-rate throughput for all frame sizes in tests involving IPv4, IPv6, and multicast traffic
- Support for more than 1 million IPv4 and 1 million IPv6 routes
- Support for 10,000 IP multicast groups and 10.2 million multicast routes. Both numbers are the highest levels ever achieved in multicast testing of a single system
- Power consumption between 13 and 22 watts per port
The recurrent theme through all tests is highly scalable performance. The Cisco switch forwarded every single frame we threw at it in every test and never dropped even one.
A Million Routes
Switch tests usually involve loading up the data plane with traffic, and we did that, but we didn’t stop there. Besides blasting the switch fabric with IPv4 unicast, IPv6 unicast, and IPv4 multicast traffic at line rate on all 1,024 50G Ethernet ports, we also fully loaded the switch’s control plane with routing state – a lot of routing state.
In the IPv4 and IPv6 unicast tests, we enabled Border Gateway Protocol (BGP) – now a common choice even inside large data centers – and advertised more than 1 million unique routes in each.
To get a sense of what a million routes represents, consider that the entire public Internet consists, at this writing, of around 671,000 IPv4 and 40,000 IPv6 routes. Thus, the Nexus 9516 could route traffic to every network reachable on the public Internet and still have plenty of headroom left.
In fairness, the Nexus 9516 is a data center switch, not a core router. Core routers typically communicate with multiple BGP peers, each with a different view of the Internet, resulting in a much larger control-plane routing table, called the routing information base (RIB).
But routers of all kinds also install just one optimal route for a given destination into hardware, called the forwarding information base (FIB). In this test, the Cisco device installed more than 1 million unique entries into its hardware FIB, and then, on the data plane, forwarded traffic at line rate to every route (see sidebar).
That’s meaningful because data center architects increasingly use BGP, rather than an interior gateway protocol, not only for global connectivity but also to reach huge numbers of hosts within each data center. In the largest data centers, such as those operated by web-scale companies and telcos, BGP can be the most stable choice, and the easiest to troubleshoot when something goes wrong.
To test unicast switching capability, we used the Spirent TestCenter traffic generator/analyzer to blast IPv4 and IPv6 traffic in a fully meshed pattern, the most stressful way to load up the switch fabric. We offered traffic to all 1,024 ports (see “How We Did It” sidebar). At the end of each test, we measured throughput, latency, and jitter for each of eight frame sizes.
In switch testing, throughput has a specific meaning: It’s the highest rate at which a switch forwards all offered traffic without dropping a single frame. For the Nexus 9516, reporting throughput is easy: It always moved traffic at line rate, regardless of frame size. As the test results table shows, the Cisco switch never dropped a single frame in any of our tests.
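The zero-loss definition of throughput can be expressed as a search over offered load. The sketch below uses a binary search, which is one common way to find that rate; it is an assumption for illustration, not a description of how the test instrument was configured. The `run_trial` hook and the two simulated devices are hypothetical.

```python
from typing import Callable


def find_throughput(run_trial: Callable[[float], int],
                    resolution_pct: float = 0.1) -> float:
    """Return the highest offered load (percent of line rate) with zero loss.

    run_trial(load_pct) is a hypothetical hook that offers traffic at the
    given load for one iteration and returns the number of dropped frames.
    """
    # A device that forwards everything at line rate needs only one trial.
    if run_trial(100.0) == 0:
        return 100.0

    low, high = 0.0, 100.0  # low: loss-free bound, high: lossy bound
    while high - low > resolution_pct:
        mid = (low + high) / 2
        if run_trial(mid) == 0:
            low = mid    # no loss, so try a higher load
        else:
            high = mid   # loss seen, so back off
    return low


# Simulated devices, for illustration only.
def lossless_device(load_pct: float) -> int:
    return 0


def device_dropping_above_80(load_pct: float) -> int:
    return 0 if load_pct <= 80.0 else 1


print(find_throughput(lossless_device))           # 100.0
print(round(find_throughput(device_dropping_above_80), 1))
```

For a switch that never drops a frame, as reported here, the search ends after the first trial at 100 percent of line rate.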
We should note that we turned off all unnecessary services to fully load the switch fabric with test traffic. This included protocols we weren't using, such as Internet Control Message Protocol (ICMP), which is an important part of IPv4 and especially IPv6 operation in production networks. We also disabled an internal diagnostic service that Nexus switches use for health checks of system components (using the “no diagnostic monitor module slot # test all” command, which is exposed to users). The only frames on the wire during our tests were test traffic, BGP keepalives and TCP ACK messages.
Some may claim, correctly, that this isn’t a “real-world” approach, but that complaint misses the point of benchmark testing. As described in RFC 2544, the industry-standard methodology for benchmarking network device performance, nonessential traffic should be disabled to make way for test traffic.
Our goal was to describe the Cisco device’s forwarding and delay characteristics under worst-case conditions (see sidebar on how we did it). Replicating some definition of real-world conditions wouldn’t work here because there’s no one definition of the term, and also because replaying production traffic won’t fully load the switch. A good benchmark must be stressful, and these tests produced the most stressful traffic conditions.
We also measured latency and jitter (latency variation), which are even more important metrics than throughput for delay-sensitive applications such as voice and video. Some high-performance computing and high-frequency trading applications also are highly sensitive to latency and jitter.
RFC 2544 also requires that latency be measured at, and only at, the throughput rate, and that’s what we’ve done here. Average latencies were less than 4 microseconds (µsec) in all cases, and maximum latencies were less than 6 µsec in all except one case (see Table 1, again). The lone exception is IPv6 traffic with 1,518-byte frames.
On the face of it, latencies in the low single microseconds seem very unlikely to affect application performance, since it usually takes milliseconds of delay before users notice something amiss.
Still, we should note that at 50G rates, it takes only about 10 nanoseconds to put a 64-byte frame on the wire, and only about 1.5 µsec to insert a maximum-length 9,216-byte jumbo frame. Why, then, the extra delay?
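The time needed to put a frame on the wire is just its length in bits divided by the link rate. The short sketch below works this out for 50G Ethernet; it counts frame bytes only and ignores the preamble and interframe gap, which is a simplifying assumption.

```python
def serialization_delay_ns(frame_bytes: int, rate_gbps: float) -> float:
    """Time to clock a frame onto the wire, in nanoseconds.

    Bits divided by Gbit/s yields nanoseconds directly.
    Preamble and interframe gap are ignored (simplifying assumption).
    """
    return frame_bytes * 8 / rate_gbps


for size in (64, 1518, 9216):
    delay = serialization_delay_ns(size, 50)
    print(f"{size:>5}-byte frame at 50G: {delay:8.2f} ns")
```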
As noted, we designed the test to be maximally stressful on the switch. In addition to using the most stressful traffic patterns, we also used dynamic routing protocols in both unicast and multicast tests.
These protocols add a little extra traffic, such as BGP keepalive messages every 30 seconds on each of the 1,024 ports. It’s not much, but given that the pipe is already full with test traffic, it means there are situations where at least two frames will arrive at the same destination port at the same instant. This forces the switch to buffer at least one of the frames, increasing latency and jitter. Given a long enough test duration (we ran all test iterations for 5 minutes), latency and jitter can rise well above theoretical minimums. We don’t consider latencies in the low single microseconds to be problematic, even at 50G Ethernet rates.
Multicast: Scaling New Heights
Not all enterprises make heavy use of IP multicast, but for those that do, scalability is a key concern. In certain industries, such as financial trading and cable TV, rapid delivery over multicast to the greatest number of users is critical, and represents literally billions of dollars in revenue.
This test broke new ground for multicast on at least three counts. We tested with 10,000 unique IPv4 multicast group addresses, nearly twice as many groups as in any previous Network World project. Moreover, we configured the switch to replicate multicast traffic to 1,023 ports, another record. And finally, because the Cisco switch used multicast routing to forward traffic, it had to build a routing table of 10.2 million unique entries (10,000 groups times 1,023 destination ports). All these attributes made this, by far, the largest-scale multicast test we’ve ever attempted.
To measure multicast performance, we configured the Spirent test tool to emulate multicast receivers on 1,023 of the switch’s 1,024 ports. Each receiver joined the same 10,000 multicast groups. Then we verified that the switch had correctly built a multicast table with 1,023 subscriber ports and 10.2 million unique multicast routes (mroutes). Finally, we had the Spirent instrument offer multicast traffic to one port, destined to all group addresses, forcing the switch to replicate every input frame 1,023 times.
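The scale of the multicast state and the replication load follow directly from the group and port counts. The sketch below reproduces that arithmetic; the aggregate egress figure is derived here from the article's numbers and is not a measured result.

```python
GROUPS = 10_000          # multicast groups joined by every receiver
TOTAL_PORTS = 1_024      # 50G ports on the chassis
PORT_RATE_GBPS = 50

receiver_ports = TOTAL_PORTS - 1       # one port is reserved for the source
mroutes = GROUPS * receiver_ports      # one entry per group per receiver port

# Each frame offered on the source port is copied to every receiver port.
egress_tbps = receiver_ports * PORT_RATE_GBPS / 1_000

print(f"Receiver ports: {receiver_ports:,}")
print(f"Multicast routes: {mroutes:,}")
print(f"Aggregate egress at line rate: {egress_tbps:.2f} Tbit/s")
```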
As in the unicast tests, we measured throughput, latency, and jitter for each of eight frame sizes. As for multicast throughput, the switch always ran at line rate for every frame size, and never dropped a single frame in any of our five-minute stress tests (see the results table, above, again).
Latency and jitter were low and constant across frame sizes. In fact, both metrics were significantly lower than similar measurements for unicast traffic, by up to 2 µsec. In part, this is because multicast’s point-to-multipoint nature is less stressful on switch fabrics than the fully meshed traffic pattern in the unicast tests.
Still, it’s worth noting that the highest maximum latency recorded in the multicast tests – 4.085 µsec, with 1,518-byte frames – is roughly one-third lower than the highest number from the IPv4 unicast tests (6.21 µsec maximum latency, again with 1,518-byte frames).
The difference in multicast jitter is even greater. Here, the worst-case maximum jitter result (345 nanoseconds with 64-byte frames) is more than eight times lower than the worst-case maximum jitter for the IPv4 unicast traffic (2.853 µsec with 1,518-byte frames).
Simply put, multicast latency and jitter are so low with the Cisco Nexus 9516 – even at an unprecedented scale – that the impact on application traffic will be negligible.
Power Consumption: A Greener Switch
Our final test measured how much power a switch needs when equipped with so many ports. Using two of the device’s 10 power supplies, we measured current draw when idle, and when handling 64-, 1,518-, and 9,216-byte frames at line rate, using the same parameters as in the IPv4 unicast tests.
After determining that the Cisco device uniformly load-shares power across all 10 power supplies, we were able to take our measurements for two power supplies and multiply by five to obtain total wattage.
Power consumption ranged from a low of around 13.7 kilowatts when idle to a maximum of 22.5 kW when fully loaded with 64-byte frames (see Power Consumption table). Short frames consume more power because they generate more electrical state transitions; the higher the frame rate, the higher the power draw. Power usage for 1,518- and 9,216-byte frames, both around 15 kW, was much closer to the idle number than the wattage with short frames.
These numbers come with two caveats. First, on its face, a 22,543-watt power budget for one switch seems like a lot, and it is. But remember that this is a rather large data center core switch, fully loaded with 1,024 50G Ethernet ports.
Expressed on a per-port basis, power consumption ranges from 13.04 watts when idle to 22.02 watts when moving 64-byte frames to 1 million routes across all ports. For multi-speed 25/50/100G ports, those are actually quite efficient power numbers.
Second, all power measurements include about 3.5 watts per port needed to power up each fiber-optic transceiver. Transceivers aren’t part of the switch per se, but then again a customer can’t forward traffic without them. That’s why we do not subtract transceiver power from these measurements.
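The power arithmetic can be sketched in a few lines: scale the reading from two supplies up to all 10, then divide by the port count. The two-supply reading below is hypothetical, back-computed from the 22,543-watt total rather than taken from the test logs, and the function names are illustrative.

```python
PORTS = 1_024
SUPPLIES_TOTAL = 10
SUPPLIES_MEASURED = 2
TRANSCEIVER_WATTS = 3.5   # approximate per-port transceiver draw


def total_watts(measured_watts: float) -> float:
    """Scale a reading from the measured supplies to the whole chassis.

    Valid only because the load is shared uniformly across supplies.
    """
    return measured_watts * SUPPLIES_TOTAL / SUPPLIES_MEASURED


def watts_per_port(chassis_watts: float) -> float:
    return chassis_watts / PORTS


# Hypothetical two-supply reading (22,543 W total divided by five).
reading = 4_508.6
chassis = total_watts(reading)
per_port = watts_per_port(chassis)

print(f"Chassis total: {chassis:,.0f} W")
print(f"Per port: {per_port:.1f} W")
print(f"Transceiver share of per-port power: {TRANSCEIVER_WATTS / per_port:.0%}")
```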
The Nexus 9516 broke a lot of new ground in this project: It has the highest 50G port count, and by far the highest port density, in any Network World switch test. It routed traffic to more than 1 million IPv4 and IPv6 routes, and to a record-breaking 10.2 million multicast routes. It delivered low and consistent latency and jitter, and never dropped a single frame in any of our stress tests. For such a large system, its power consumption is quite reasonable, especially when calculated on a per-port basis. Highly scalable numbers like these, both on the control and data planes, mean the Cisco Nexus 9516 offers network architects a degree of future proofing for today’s data centers and tomorrow’s.
Network World gratefully acknowledges the support of test and measurement vendor Spirent Communications throughout this project. In addition to supplying its Spirent TestCenter traffic generator/analyzer with quint-speed MX3 traffic modules, Spirent’s Brooks Hickman, Morgan MacDonald, Bob Paull, Vijai Raghu, and Bala Ramakrishnan provided extensive engineering support before, during, and after production testing.
(David Newman is president of Network Test, a network benchmarking and design consultancy in Westlake Village, Calif. He can be reached at email@example.com.)