When Google first told the world about its Tensor Processing Unit, the strategy behind it seemed clear enough: Speed machine learning at scale by throwing custom hardware at the problem. Use commodity GPUs to train machine-learning models; use custom TPUs to deploy those trained models.
The new generation of Google’s TPUs is designed to handle both of those duties, training and deploying, on the same chip. That new generation is also faster, both on its own and when scaled out with others in what’s called a “TPU pod.”
But faster machine learning isn’t the only benefit from such a design. The TPU, especially in this new form, constitutes another piece of what amounts to Google building an end-to-end machine-learning pipeline, covering everything from intake of data to deployment of the trained model.
Machine learning: A pipeline runs through it
One of the largest obstacles to using machine learning right now is how tough it can be to put together a full pipeline for the data: intake, normalization, model training, and model deployment. The pieces are still highly disparate and uncoordinated. Companies like Baidu have hinted at wanting to create a single, unified, unpack-and-go solution, but so far that’s just a notion.
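To make those four stages concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not anything Google or Baidu ships: the function names (`intake`, `normalize`, `train`, `deploy`, `run_pipeline`) and the tiny one-feature linear model are assumptions chosen purely for illustration.

```python
from statistics import mean, pstdev

def intake(rows):
    """Intake: parse raw 'x,y' text records into pairs of floats."""
    return [tuple(float(v) for v in row.split(",")) for row in rows]

def normalize(pairs):
    """Normalization: rescale the feature to zero mean and unit variance."""
    xs = [x for x, _ in pairs]
    mu, sigma = mean(xs), pstdev(xs) or 1.0  # guard against zero spread
    scaled = [((x - mu) / sigma, y) for x, y in pairs]
    return scaled, (mu, sigma)

def train(pairs):
    """Training: closed-form least-squares fit of y = w*x + b."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    w = sum((x - mx) * (y - my) for x, y in pairs) / var if var else 0.0
    return w, my - w * mx

def deploy(model, stats):
    """Deployment: wrap the trained model in a prediction function
    that applies the same normalization used during training."""
    w, b = model
    mu, sigma = stats
    return lambda x: w * ((x - mu) / sigma) + b

def run_pipeline(rows):
    """Chain the four stages end to end (hypothetical helper)."""
    pairs, stats = normalize(intake(rows))
    return deploy(train(pairs), stats)

# Toy data that follows y = 2x + 1.
predict = run_pipeline(["1,3", "2,5", "3,7", "4,9"])
```

Even in this toy version, each stage hands a differently shaped result to the next, which hints at why stitching together separate real-world tools for each stage is so awkward.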
The most likely place for such a solution to emerge is in the cloud. As time goes by, much more of the data collected for machine learning (and everything else, really) lives there by default. So does the hardware needed to produce actionable results from it. Give people a single end-to-end, in-the-cloud workflow for machine learning, one with only a few knobs on it by default, and they’ll be happy to build on top of it.
Already mostly realized, Google’s vision is that each phase of the pipeline can be executed in the cloud, as close as possible to the data, for the best possible speed. With TPUs, Google also seeks to provide many of the phases with custom hardware acceleration that can be scaled out on demand.
The new TPUs are meant to boost pipeline acceleration in several ways. One speedup comes from being able to gang multiple TPUs together into a pod. Another comes from being able to train and deploy models from the same slab of silicon. With the latter, it’s easier to incrementally retrain models as new data comes in, because the data doesn’t have to be moved around as much.
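The idea of incremental retraining can be sketched in a few lines of plain Python. This is only an illustration of the general technique (per-sample stochastic gradient descent on a one-feature linear model), not TPU code or a TensorFlow API; the class name `OnlineLinearModel`, its `partial_fit` method, and the learning rate are all assumptions made for the example.

```python
class OnlineLinearModel:
    """Toy model y = w*x + b that is trained and served from the same object."""

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr  # learning rate, assumed small enough for this toy data

    def predict(self, x):
        """Serve a prediction from the current weights."""
        return self.w * x + self.b

    def partial_fit(self, batch, epochs=1):
        """Update the existing weights using only the new batch,
        one squared-error gradient step per sample."""
        for _ in range(epochs):
            for x, y in batch:
                err = self.predict(x) - y
                self.w -= self.lr * err * x
                self.b -= self.lr * err
        return self

model = OnlineLinearModel()

# Initial training on the first batch (toy data following y = 2x + 1).
model.partial_fit([(0.0, 1.0), (1.0, 3.0)], epochs=500)
before = model.predict(2.0)

# New data arrives: refine the same model in place, without
# revisiting the first batch or rebuilding the model elsewhere.
model.partial_fit([(2.0, 5.0), (3.0, 7.0)], epochs=500)
after = model.predict(4.0)
```

The point of the sketch is that the object being updated is the same one answering predictions, so nothing has to be exported, copied, and reloaded between the training step and the serving step.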
That optimization—operating on data where it is to speed up operations on it—is also right in line with other machine learning performance improvements in the works, such as some proposed Linux kernel fixes and common APIs for machine learning data access.
But are you willing to lock yourself into TensorFlow?
There’s one possible downside to Google’s vision: The performance boost provided by TPUs works only if you use the right kind of machine-learning framework with them. And that means Google’s own TensorFlow.
It’s not that TensorFlow is a bad framework; in fact, it’s quite good. But it’s only one framework of many, each suited to different needs and use cases. So TPUs’ limitation of supporting just TensorFlow means you have to use it, regardless of its fit, if you want to squeeze maximum performance out of Google’s ML cloud. Another framework might be more convenient to use for a particular job, but it might not train or serve predictions as quickly because it’ll be consigned to running only on GPUs.
None of this rules out the possibility that Google could introduce other hardware, such as customer-reprogrammable FPGAs, to give frameworks not directly sponsored by Google an edge as well.
But for most people, the inconvenience of being able to use TPUs to accelerate only certain things will be far outweighed by the convenience of having a managed, cloud-based everything-in-one-place pipeline for machine-learning work. So, like it or not, prepare to use TensorFlow.