The international competition to build an exascale supercomputer is gaining steam, especially in China and Europe, according to Peter Beckman, a top computer scientist at the U.S. Department of Energy's Argonne National Laboratory.
An exascale system will require new approaches in software, hardware and storage. That is why Europe and China, in particular, are marshaling scientists, research labs and government funding for exascale development. They see exascale systems as an opportunity to build homegrown technology industries, particularly in high-performance computing, according to Beckman.
An exascale system is measured in exaflops; an exaflop is 1 quintillion (or 1 million trillion) floating point operations per second. Such a system would be 1,000 times more powerful than the petaflop systems that are the fastest in use today.
The Department of Energy (DOE) is expected to deliver to Congress by Feb. 10 a report detailing this nation's plan to achieve exascale computing. The government recently received responses from 22 technology firms to its request for information (RFI) about the goal to develop an exascale system by 2019-2020 that uses no more than 20 megawatts (MW) of power. To put that power usage in perspective, a 20-petaflop system being developed by IBM, which will likely be considered one of the most energy-efficient in the world, will use seven to eight MW.
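The size of that efficiency gap can be worked out from the figures above. The following is a minimal sketch, not anything published by DOE or IBM: the function names are hypothetical, and it assumes the quoted speeds and power draws can be compared directly as sustained figures.

```python
# Back-of-the-envelope comparison using only the figures quoted in the article.
# Function names are illustrative; the speeds and power draws are assumed to be
# directly comparable sustained figures.

EXAFLOP = 1e18   # 1 quintillion floating point operations per second
PETAFLOP = 1e15  # 1,000 petaflops make 1 exaflop


def gigaflops_per_watt(flops: float, megawatts: float) -> float:
    """Return computing efficiency in gigaflops per watt."""
    watts = megawatts * 1e6
    return flops / watts / 1e9


def scaled_power_mw(flops_now: float, megawatts_now: float, flops_target: float) -> float:
    """Return the power needed at the target speed if efficiency did not improve."""
    return megawatts_now * (flops_target / flops_now)


target = gigaflops_per_watt(EXAFLOP, 20)            # DOE goal: 1 exaflop in 20 MW
ibm_at_7mw = gigaflops_per_watt(20 * PETAFLOP, 7)   # 20-petaflop system at 7 MW
ibm_at_8mw = gigaflops_per_watt(20 * PETAFLOP, 8)   # same system at 8 MW

print(f"DOE target: {target:.1f} gigaflops per watt")
print(f"20-petaflop system: {ibm_at_8mw:.2f} to {ibm_at_7mw:.2f} gigaflops per watt")
print(f"Efficiency gain needed: {target / ibm_at_7mw:.1f}x to {target / ibm_at_8mw:.1f}x")
print(f"Power at 1 exaflop with no gain: "
      f"{scaled_power_mw(20 * PETAFLOP, 7, EXAFLOP):.0f} to "
      f"{scaled_power_mw(20 * PETAFLOP, 8, EXAFLOP):.0f} MW")
```

Under those assumptions, the 20MW goal works out to 50 gigaflops per watt, roughly 17 to 20 times the efficiency of the 20-petaflop system. Scaling that system to an exaflop without any efficiency gain would take 350MW to 400MW.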
Beckman, the director of the Exascale Technology and Computing Institute at DOE's Argonne National Laboratory, talked with Computerworld about current developments in exascale. Excerpts from that interview follow:
The Department of Energy wants an exascale system by 2019-2020, and one that operates on no more than 20MW. What did DOE learn from the tech industry responses? About 22 companies replied. [DOE isn't disclosing the names of the responding companies.] They had a wide range of types of companies. Some were integrators; some were chip designers, software companies. All of them said that this is a great challenge and that we think we can make fantastic progress on this, but it will be really hard. We're setting pretty lofty goals, hard things. But if you start out saying that 100MW will be just fine, then you're not really pushing the envelope. The 20MW is very difficult to achieve, but we want to see new technology to make that happen, and so all of them, universally, said that was hard.
Did they ask you to adjust the 20MW requirements? All the responders said it would be a difficult target to reach without a strong investment. If we allowed them twice as much power, 40MW or 50MW, then it is much simpler. They also said that the system software and the whole software stack required an integrated approach. Most of the responses, I would say, were light on the data challenges. People know that data is a challenge, but they really focused, in the responses, on the computing.
What is the exascale data challenge? If we imagine that we have a machine that is an exascale, exaflop machine, generating petabytes and petabytes of data, it becomes its own, in some sense, computation problem. We can't solve the bandwidth storage problem by just buying more disks. A multi-level plan is what will have to evolve, including NVRAM and even novel technologies such as phase change memory. But there has to be a comprehensive data solution that includes analysis. It can't be, 'Oh, we just need to be able to store the data.' We need to look at the architecture necessary to analyze the data. If you look at Google and the other web-based technologies, they have come up with ways to store and analyze data -- a way in which you have a programming model where the storage and analysis are very close.
In computing we haven't done that yet. We've always had the model where the data is over here, the computing is over [there]; you ask for the data, you get a copy of it, you put it in the computer, you work on it a lot, and then you put it back. And so as we move to exascale, where this computing becomes really more powerful and the data sets become bigger, sloshing this back and forth is way too costly in terms of power and performance -- power, especially. It's movement that costs a lot of electrical power. We need to find ways to compute and then analyze and do the storage and analysis closer together.
Is there anything out there like that today? Some types of data lend themselves to spreading out the computation through the data -- satellite images and other things. People have had this sort of capability for certain types of data sets. But we really need to think broadly about the problem. What you want to do is figure out ways to slice and dice the data, and do analysis on the data in an integrated architecture. And that's something that will become more important at exascale that we haven't addressed very well yet.
What about the February exascale report due to Congress? What's that about? Congress asked the DOE for a written plan for exascale and it is to be delivered no later than Feb. 10. In the last couple of years, the labs, the scientists, have been driving this exascale discussion, because of a need to do the science, and these are big challenges: power, resilience, how to program these things. What hasn't happened is, in some sense, a formal plan from DOE for reaching exascale... [the] plan for getting us there.
Is this report the gateway to funding? Congress is not going to fund an exascale initiative without a clear plan, so real funding is gated on convincing through this plan, and through discussions, of the importance of this for the nation.
What's going on internationally to develop exascale computing? A year and a half ago, the Europeans got together as part of working in this space and said, 'We need to put together a European plan.' They created this plan over the last year. In October, I was at the meeting in Barcelona when they presented the plan to the European Commission and said, 'This is what we need for exascale -- two to three billion euros.' In addition to presenting this to the European Commission, which is favorably disposed, they have already bootstrapped three projects. It is a step along the way, but it is bold and it is already started and people are already working on it. If they are successful, it paves the way to put more funding into that and take it to the next level and eventually look at building a system.
Why is it so important for Europe to develop its own system? A good way to look at this is Airbus and Boeing. An IDC report said to the Europeans: You have all this technology but it's spread out through all of Europe. If you were to bring it together, you could, like Airbus, compete quite well. I don't want to put too much emphasis on this, but I think it's pretty clear that the Europeans want to develop a platform that can be sold at their supercomputer centers and sold back to us.
What about the Chinese? The Chinese are moving full speed ahead. They have a machine that is very similar in character to some of our machines. It's a water-cooled machine, with 16 cores on a die in a socket at about a petaflop in about nine racks. It's a pretty amazing feat and they are in it to win it. If you look at their investment in people, they're training up the scientists and building platforms to continue that innovation so that they can have their own homegrown industry as well, where they will own all the technology from the chip all the way to the software stack on top.
What does winning look like? Right now, if you look at China, a lot of their machines are still made from components in the U.S. However, this one machine that they built, the 16-core, has its own interconnects and uses Chinese technology. What they would like to do, just like any country, is to be able to reap the benefits of developing that technology across their entire infrastructure, so that everything that's in their cell phones all the way up to their supercomputers is jobs in China. And of course once that happens, they will be selling this to Brazil, to South America, to India. Whether or not they can sell it back into the U.S. is a good question, but the rest of the markets are open.
Intel says it can deliver an exascale system by 2018, ahead of the U.S. government's requested date. What do you think about that? I think Intel's technology is pretty exciting and they have mapped an aggressive roadmap. They have unmatched technology in the chip and in process, and if they want to go after this new piece, I think they will do very well.
Nvidia believes 2019 is possible, but also says government help will be needed. Given how far out in the future we are looking, it's pretty hard to predict what date people will finish their products by. We know that there are certain things that both companies (Nvidia and Intel) would not address unless we give them government funding. For example, for scientific computing, resilience is something we think is a really big issue. If you are selling a laptop you don't need to make it 1,000 times more fault resistant, but if you put it in an exascale system, you do. That will not be developed unless the government invests in it.
The second one is power. Most people around the planet are going to buy a couple of dozen racks. For them the price sensitivity, whether it's a couple of hundred kilowatts or twice that, [is not a] big deal. But when you are talking about a machine as big as ours, that is a big deal. So putting the investment in power, in making it extraordinarily lower, there probably isn't a market driver in their short-term time frame except in government exascale.
Patrick Thibodeau covers cloud computing and enterprise applications, outsourcing, government IT policies, data centers and IT workforce issues for Computerworld.