The University of Florida is just wrapping up a remarkable year of IT upgrades. The school, which has a 2,000-acre campus and more than 900 buildings, installed a new supercomputer in a new data center, installed a 100Gbps link to Internet2, and upgraded its Campus Research Network from 20Gbps to 200Gbps while adding support for Software Defined Networking (SDN). Network World Editor in Chief John Dix got the lowdown on all of the developments from Erik Deumens, director of research computing.
You folks have accomplished an awful lot in one year. What got the ball rolling?
A few years ago the University of Florida hired a new CIO, Elias Eldayrie, and one of his primary goals was to improve the research computing infrastructure here. And when the Internet2 Innovation Platform movement got going he said we should be part of that. He talked to the president and provost and the VP for research and other administrators and got an agreement in principle that it would be a good thing to do.
We wrote a proposal to NSF for a CC-NIE award, which is for campus cyber infrastructure, and got funding for switch equipment to connect to the nearest Internet2 point that had been upgraded to 100 Gig, which is in Jacksonville. And we were lucky because we had another proposal in with the NSF MRI (Major Research Instrumentation) program that was funded to upgrade the internal campus research network from 20Gbps to 200Gbps.
With the awards in place the university agreed to provide some extra funding to pay for the missing things, because there are always components that cost more. And so on the 1st of February of 2013 we deployed the connection from the University of Florida campus network to Internet2 as an innovation platform. And then the month after that we upgraded the core of the campus research network. The full campus research network upgrade has been in place since September.
Was your network tapped out or does the higher capacity just open new doors?
It's a little bit of both. We were not fully tapped out on the 20 Gig Campus Research Network, but the outgoing link was only 10 Gig, and that reached maximum capacity of 9.6Gbps several times a week. So we really were close to needing to do something, and we decided that going to 100 Gig was the best way to do it.
One of the reasons we needed the extra capacity is to support the Compact Muon Solenoid (CMS) experiment at CERN's Large Hadron Collider. This is one of the experiments that contributed to the discovery of the Higgs boson, a discovery recognized with a Nobel Prize this year. We have a large research group at the University of Florida that manages what is called a Tier 2 distribution center for collider data. CERN takes the collider data and distributes it to about 10 labs across the world, with Fermilab here in the US being one of them. Within the US there are another 10 Tier 2 sites that the data gets replicated to, and the University of Florida is one of those.

So we get a lot of traffic from local researchers, but we are also serving up data to the nation. Any high-energy physics researcher who wants to analyze some of that data will request it from our site. That's why this network connection is very important to us and why it's so heavily used.
As I understand it, the networks also support your new supercomputer?
That's correct. The HiPerGator. We completed a new data center on Jan. 30. It's a 25,000-square-foot building with 10,000 square feet of machine room, 5,000 dedicated to research and 5,000 to enterprise computing. It's a Tier 3 data center and the new HiPerGator supercomputer is in the research section. That new building is connected at 200Gbps via our upgraded Campus Research Network to the point of presence where the University of Florida connects to the Florida Lambda Rail regional network and to the Internet2 access point, and also to other machine rooms on campus.
How did the Internet2 part of the project go? Any challenges?
We actually didn't encounter any challenges. Basically we carefully planned it and we got the funding and everybody was in agreement, even at the highest levels of the administration. The cost was about a million dollars. Some $380,000 of that was for a 100-Gig Brocade switch with extra 100-Gigabit ports for connection to the Campus Research Network, and the rest was for the Florida Lambda Rail regional network connection and the Internet2 fees to connect in Jacksonville.
Did you stick with Brocade for the new campus network?
Yes. With a $1.5 million NSF MRI grant we got at the same time we installed several other Brocade switches to upgrade the Campus Research Network.
Is the research network separate from the campus data network?
Yes. The University of Florida was actually a bit of a pioneer in that regard. In 2004, when we were seeing contention and governance conflicts between managing data for research and managing it for enterprise security and stability, we created a 20Gbps network with separate fiber links between machine rooms that housed large data processing equipment. That's the network we upgraded with this grant.
Is the whole network 200-Gig now?
The core links between the most important data centers are 200-Gig, and then there are a few outliers at 40-Gig and a few more at 10-Gig. But all of these are separate fibers, and they're completely separate from the standard campus network. They also have their own governance structure, because that's important in terms of keeping the security rules simple so we can have faster turnaround. On the research network, for example, we don't have a firewall. We just have ACLs (access control lists), which are higher performance.
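The performance point here is that an ACL makes a stateless, per-packet decision against an ordered rule list, with no connection tracking the way a stateful firewall does. A minimal sketch of that first-match logic (the rules and addresses below are hypothetical, not UF's actual policy):

```python
import ipaddress

# Ordered ACL: first matching rule wins; final rule is a default deny.
# These example rules are hypothetical, for illustration only.
ACL = [
    # (action, source network, destination port or None for any port)
    ("permit", ipaddress.ip_network("10.20.0.0/16"), None),   # campus HPC range
    ("permit", ipaddress.ip_network("192.0.2.0/24"), 2811),   # e.g. GridFTP control
    ("deny",   ipaddress.ip_network("0.0.0.0/0"),    None),   # everything else
]

def acl_check(src_ip: str, dst_port: int) -> bool:
    """Return True if the packet is permitted. Stateless: every packet
    is evaluated independently, with no per-connection state kept."""
    src = ipaddress.ip_address(src_ip)
    for action, net, port in ACL:
        if src in net and (port is None or port == dst_port):
            return action == "permit"
    return False  # unreachable given the catch-all deny, kept for safety

print(acl_check("10.20.5.7", 22))      # True: source in the permitted HPC range
print(acl_check("203.0.113.9", 2811))  # False: falls through to the default deny
```

Because each lookup is just a match against static rules, switches can do this at line rate in hardware, which is the trade-off the research network makes against a firewall's connection tracking.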
When you say 200-Gig, I presume you're talking about multiple 100-Gig interfaces, right?
Yes. They are both active, but they take different physical paths, so if a backhoe cuts one we still have the other.
The Campus Research Network connects what?
It links data-center-type rooms, machine rooms and special equipment rooms in about 10 buildings. It doesn't go to every building. So you have the campus network that goes to the Genetics Institute, but then there is one room inside the Genetics Institute where all the gene sequencers feed their data into a machine that is connected to the Campus Research Network so that data can be easily transferred to High Performance Computing (HPC) resources in another data center.
The other data center rooms on the research network have smaller clusters that are usually associated with certain advanced engineering labs. For instance, there is one lab called the Center for Autonomic Computing. It's an NSF-funded center with several grants to do advanced research on virtual machines, and they also provide Web services to a community of researchers. They're part of FutureGrid, another NSF-funded project with Purdue in Indiana, which allows them to connect in a more flexible way to reach collaborators across the nation.
Then there's another machine room that has a small cluster for ocean simulation. These clusters are separately managed by research groups, so they're usually smaller resources, whereas the HiPerGator is managed by my division, which is a department under Information Technology (UFIT) reporting to the CIO, and we provide services to everybody on campus.
What was powering the existing 20-Gigabit network before the upgrade?
Switches from Force10, a company that was later acquired by Dell. With this new network we needed high-speed capability, but also wanted a robust implementation and a good roadmap for future support of SDN and OpenFlow, because that is one of the thrusts we want to explore. We not only wanted to put in higher bandwidth, we also wanted to enable our computer science students and professors to do research on OpenFlow and software-defined networks. So that was a critical component in our selection for the upgrade, and we chose Brocade because they met the requirements very well.
So you spelled out the need for OpenFlow and SDN capabilities in your request for proposals?
We made that a clear case in both proposals. And that is actually a requirement of the Internet2 organization. When they upgraded their backbone with the big award from the federal government, they basically said, "We're creating a new class of member called an Internet2 Innovation Platform member, and there are three conditions: One, you need to connect to the backbone at 100Gbps; two, you have to have an assigned Science DMZ (demilitarized zone) so you can do research without having to go through the firewall of a production enterprise network; and third, you have to have active research in software-defined networks."
And it turned out that, when we went to the April annual meeting of Internet2, the University of Florida was the first university to meet all three conditions. We had our 100-Gig connection, we had the Science DMZ, and we had several researchers on campus doing active research with the NSF GENI (Global Environment for Network Innovations) and FutureGrid projects that involved SDN. So we're pretty proud of that.
What did you specify in terms of SDN support from Brocade? Did the equipment have to support OpenFlow out of the box?
Out of the box, yes. And we wanted to know their roadmap, to see they were going to stay on top of it in terms of development, because OpenFlow is evolving very rapidly. So if new features were added, we wanted to know they would commit to implement them and make them available quickly. Brocade has official statements about that, which was important. Some of the other vendors we considered made more wishy-washy statements so that's why they were ruled out.
Are you using the Brocade SDN capabilities yet?
We actually did something that I think is a bit innovative. While we required the Brocade switches to be OpenFlow- and SDN-enabled, we don't currently run them in that mode. What we did was buy a bunch of small Pronto 3920 OF and 3290 OF switches from Pronto Systems (now merged with Pica8) to put behind each of the Brocade switches. So when our computer scientists are doing early-stage experiments with OpenFlow, when the research is still a bit unstable, we can use these Pronto switches to support software-defined traffic, and they can break whatever they want without impeding any of the production traffic. And then once we validate that the work is stable using that second layer of Pronto switches, we can move it to the bigger switches.
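What those experiments exercise is the core OpenFlow abstraction: a priority-ordered table of match/action entries that a controller installs into the switch. A conceptual sketch of that model (greatly simplified, with hypothetical VLAN and port values; the real protocol adds wire formats, timeouts, and multi-table pipelines):

```python
# Conceptual sketch of an OpenFlow-style flow table. A controller
# installs prioritized match/action entries; the switch applies the
# highest-priority matching entry to each packet. Values below are
# hypothetical, not from UF's deployment.

FLOW_TABLE = []  # list of (priority, match_dict, action_string)

def install_flow(priority, match, action):
    """What a controller's flow-mod does: add an entry and keep the
    table ordered so the highest priority is checked first."""
    FLOW_TABLE.append((priority, match, action))
    FLOW_TABLE.sort(key=lambda entry: -entry[0])

def forward(packet):
    """Switch datapath: first (highest-priority) matching entry wins.
    In real OpenFlow a table miss punts the packet to the controller."""
    for _priority, match, action in FLOW_TABLE:
        if all(packet.get(field) == value for field, value in match.items()):
            return action
    return "send-to-controller"  # table miss

# Hypothetical experiment: steer one host on a research VLAN out a
# test port, with a lower-priority catch-all for the rest of the VLAN.
install_flow(100, {"vlan": 42, "dst_ip": "10.8.0.5"}, "output:port3")
install_flow(10,  {"vlan": 42},                       "output:port1")

print(forward({"vlan": 42, "dst_ip": "10.8.0.5"}))  # output:port3
print(forward({"vlan": 42, "dst_ip": "10.8.0.9"}))  # output:port1
print(forward({"vlan": 7}))                          # send-to-controller
```

Running unstable experiments against cheap Pronto switches means a buggy rule set only misroutes traffic on that sacrificial layer, which is exactly the isolation the two-tier design buys.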
So that is our long-term strategy to mitigate the risk and have a research network that at the same time can provide reliable production traffic.
I'm not familiar with Pronto Systems.
They make some of the cheapest SDN-capable, OpenFlow-capable switches you can get. They provide basic functionality, and if anything goes wrong with one we just buy another.
And you dedicate capacity between the Pronto switches?
Yes. These Pronto switches have just been deployed, and we're working with the researchers on getting some traffic going. But basically we have them on a separate VLAN, and for now they're not restricting any capacity because there isn't any. But we're observing the traffic and, if at some point it causes a problem, we will take appropriate action. Because the Campus Research Network is designed to enable innovative research, we don't want to put in something with strict policies that can cause problems.
Brocade doesn't offer its own SDN controller. Did that matter to you?
No, because controllers are general and generic. There's a basic open-source one that you can deploy on your own on a Linux box. And until people have more experience, I think it will be a while before they buy these as appliances from vendors. We have several controllers already and that's OK.
How about the importance of OpenFlow? It once seemed like it was going to be the one thing that united the SDN community, but now many vendors are shunning it.
We think OpenFlow is important, and other vendors, even if they support something else today, will eventually join in some version of OpenFlow support. There are some use cases for which you could quickly deploy something special instead of waiting for the standard to settle, and that may be what some vendors are seeing. And with SDN a lot of the complicated logic is done by the controller, and you can have a software implementation that performs very well; you don't necessarily have to do it in an ASIC. So I think that may be part of why there is diversity.
So SDN for you folks represents some research opportunities, but do you have any idea yet how it might benefit the running of your own network?
No. That's still quite open. We can see several use cases that we're exploring with both the scientists who need to move data and the computer scientists who have the graduate students and the knowledge to try and implement it in an OpenFlow program. Because most of those SDN applications are not going to be developed by the end users who need to move the data, we as research computing IT providers are the go-between.
Right now we're exploring some use cases where we think we'll be able to do things better. One thing we're considering: the Campus Research Network links specific machine rooms, and we don't necessarily want to build it out to reach other machine rooms, but there may be occasions where we want to support some activity in a building on the campus data network and, temporarily or on a regular but repeating basis, have traffic on that network treated like Campus Research traffic. We see the possibility of using SDN to temporarily set up a virtual route to meet that need, giving us a cost-effective and easy-to-manage way to provide better services to our users.
Some other campuses are already doing that well. Clemson, for example, doesn't have a campus research network that consists of separate physical fiber. They do it all with VLANs and VRFs, and we see that as a way we can grow without having to dedicate extra fiber.
Changing gears back to some of the other applications you support, I understand there is a lot of medical research on campus as well?
Yes, and it's a very big application and growing. The University of Florida has a big affiliated hospital and a big academic health center, and we also have a Clinical and Translational Science Institute that fosters collaboration between clinicians and scientists, social scientists, engineers, and computer scientists. We're providing more and more support for their research activity. One of the tasks I have to complete in the next six months is to create a HIPAA-compliant system so that data spread out over all these buildings can be stored and processed in one place. So all the infrastructure we're building for the Campus Research Network will need to store that HIPAA data and allow people access to it.
Is that hard to achieve on the Campus Research Network given the different approach to security you have adopted?
This is a problem everyone is struggling with, but some campuses, like Indiana, have done something that works, and we will look into copying that. It takes good planning and talking to all the stakeholders, but it can be done.
Speaking of big applications, what are you folks doing on the big data front?
We're deploying a virtual cloud environment so that people can provision their own Hadoop cluster. There are a couple of small Hadoop clusters in various computer science labs, but we want to provide that as a general service to everybody. So that is another project I'm working on.
The sales pitch I want to make is this: instead of the university bringing a big pipe to your desk, which is not scalable and which we don't have the money for, I will provide an infrastructure where you can load your data into a central repository, along with the compute resources to quickly and easily do all the modern research you want. That's where the Campus Research Network becomes even more important. That's the vision.
To complement the infrastructure that my department of Research Computing in UFIT supports, the University of Florida is creating an Informatics Institute to get faculty and researchers to do advanced research using that infrastructure.