Google Brain announced release 1.0 of its TensorFlow machine learning (ML) library yesterday at the TensorFlow Developer Summit in Mountain View. ML is a method of programming computers with data so that they make highly reliable predictions, instead of writing a program in a language like Java, C# or Python.
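To make the contrast concrete, here is a minimal Python sketch. It is an illustration written for this article, not TensorFlow code. Instead of hand-coding the rule y = 3x + 2, the program learns its two parameters from example input/output pairs using gradient descent. Real models have millions of parameters, but the principle is the same.

```python
# Training data: inputs and the outputs the program should reproduce.
# The underlying rule (y = 3x + 2) is never written into the code below.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]

def train(xs, ys, lr=0.02, steps=5000):
    """Fit y = w*x + b to the examples by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f}")
print("prediction for x=10:", round(w * 10 + b, 2))
```

The "program" here is just the learned values of `w` and `b`. Changing the behavior means supplying different data, not rewriting code.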
ML is a more efficient method of solving problems such as image recognition, language translation and ranking things like comments and recommendations. Google, along with Facebook, IBM and Microsoft, has used ML internally to solve problems such as ranking search results.
A little over a year ago, Google released TensorFlow, based on its experience with its proprietary ML library, DistBelief. TensorFlow is widely used inside Google: about 4,000 source code directories include TensorFlow model description files. It powers many applications, including Google Search, Maps and Gmail's spam filter.
According to Google senior fellow and artificial intelligence superstar Jeff Dean, TensorFlow is the most popular ML repository on GitHub. Among the measures he cited were 500 independent programmers who have developed and submitted code, 1,000 commits (new code modules and patches) per month, and 5,500 independent GitHub repositories with TensorFlow in their names. The University of Toronto, Berkeley and Stanford are using TensorFlow as the framework for their ML courses.
Creating original ML models requires highly skilled developers with advanced training in linear algebra, probability and ML, which many developers do not have. Other developers, though, can apply ML. There are three ways to apply ML, each suited to developers with a different skill set.
Experts: Original ML models, such as the Deep Q-Network model that learned to play Atari games better than a human, are built by a minority of developers with specialized skills in linear algebra, probability and machine learning.
Domain Experts: A second group of developers can take an existing model, such as the Inception V3 image recognition model, and apply it with domain-specific expertise to solve problems in their own fields. In the cases cited at the Summit, the field was healthcare. Inception V3 recognizes many different types of objects within images and can be retrained to recognize new types of images.
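One common way to retrain a model like Inception V3 is to keep its pre-trained layers frozen and train only a new final layer on the new categories. The pure-Python sketch below illustrates that idea under loudly stated assumptions. `extract_features` is a hypothetical stand-in for the frozen network and returns two toy statistics, not Inception features. The new "layer" is a small logistic-regression classifier. This is not how TensorFlow implements retraining.

```python
import math

def extract_features(image):
    """HYPOTHETICAL stand-in for a frozen pre-trained network such as Inception V3.
    It is never updated during training; here it returns two toy statistics."""
    mean = sum(image) / len(image)
    spread = max(image) - min(image)
    return [mean, spread]

def train_head(images, labels, lr=0.5, epochs=2000):
    """Train only a new logistic-regression output layer on the frozen features."""
    feats = [extract_features(img) for img in images]  # computed once: the base is frozen
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(w, b, image):
    """Probability that the image belongs to the new class."""
    f = extract_features(image)
    z = sum(wi * fi for wi, fi in zip(w, f)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy "images" (flattened pixel intensities in [0, 1]); label 1 = the new class.
images = [[0.9, 0.8, 0.95, 0.85], [0.7, 0.9, 0.8, 0.75],
          [0.1, 0.2, 0.15, 0.05], [0.2, 0.1, 0.3, 0.25]]
labels = [1, 1, 0, 0]
w, b = train_head(images, labels)
print([round(predict(w, b, img)) for img in images])
```

Because only the small final layer is trained, a domain expert needs far fewer labeled examples and far less compute than training the whole network from scratch would require.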
Stanford Ph.D. student Brett Kuprel presented an application of Inception V3, published in the journal Nature last month, that classifies images of skin lesions as benign or malignant with accuracy equal to that of human dermatologists.
Google's Lily Peng presented another application of Inception V3, published in the Journal of the American Medical Association, that diagnoses diabetic retinopathy, a leading cause of blindness, with slightly better accuracy than human ophthalmologists.
Application Developers: Developers skilled in the use of RESTful APIs (almost all web and mobile developers) can use pre-trained models to add features to existing applications. Examples include speech-to-text, language translation and comment ranking.
Abstraction brings ML to more developers and use cases
TensorFlow 1.0 brings higher-level constructs that let more developers, with varied skills, use TensorFlow for application-specific use cases.
The diagram above maps to the three developer groups described earlier. Experts build original models with the tools at the Layers level of the stack and below. Domain experts work at the Estimator and Keras Model level, and application developers work at the top, labeled Models in a Box.
Extensible hardware platforms
TensorFlow is a software platform that is extensible to hardware platforms at different scales, from experimentation to production.
The experimental XLA compiler compiles ML models for different hardware architectures using the open source LLVM framework. LLVM is a collection of modular and reusable compiler and toolchain technologies, written in C++, that decouples the compiler from specific target platforms such as x86, ARM and Nvidia GPUs.
XLA is designed to address two important hardware scale considerations. The first is that small models can be built to run on IoT and smartphone devices to extend their capabilities. Qualcomm demonstrated an 8X performance improvement on the Snapdragon 820 by moving an ML application from the CPU to the Hexagon DSP. Qualcomm did not use the XLA compiler, but it demonstrated the potential of running optimized ML models within the performance and power profile of a smartphone. So far, XLA has been applied to IBM's PowerAI distribution and the Movidius Myriad 2 accelerator.
The second consideration is that today's hardware is not fast enough to meet the production requirements of large ML models. It is fast enough for research, but in some cases the cycle time for researchers to train a model and get results can be days, weeks or even close to a month. Research can prove that a thesis works. Putting that result into production on hundreds of data center servers used by hundreds of thousands or millions of users is an entirely different problem.
The deep and wide ML programming paradigm is new. It calls for an architecture that accommodates large matrices and vectors and is very tolerant of reduced precision. Much of today's optimization work extends this type of computation to hardware architectures whose silicon has not yet been optimized for ML.
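To show what "tolerant of reduced precision" means in practice, the sketch below quantizes two float vectors to 8-bit integers and computes their dot product, the core operation of matrix-vector math, with integer multiply-accumulates. It then compares the result with the full-precision answer. The quantization scheme is an assumption chosen for illustration: symmetric linear scaling by the largest absolute value. It is not necessarily the scheme TensorFlow or any particular chip uses.

```python
def quantize(values, bits=8):
    """Symmetric linear quantization of floats to signed integers with `bits` bits."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

weights = [0.42, -1.37, 0.05, 0.88, -0.61, 1.12]
inputs = [1.5, 0.2, -0.7, 0.9, 0.3, -1.1]

exact = dot(weights, inputs)
qw, sw = quantize(weights)
qi, si = quantize(inputs)
# Integer multiply-accumulate, then a single rescale back to float.
approx = dot(qw, qi) * sw * si
rel_err = abs(approx - exact) / abs(exact)
print(f"float: {exact:.4f}  int8: {approx:.4f}  relative error: {rel_err:.2%}")
```

For these values the 8-bit result lands within a fraction of a percent of the full-precision one. Integer arithmetic is much cheaper in silicon and power, which is why ML hardware can trade precision for throughput.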
Keras, an open source neural network library integrated with TensorFlow and demonstrated by its creator François Chollet, is modular and extensible and is designed to enable fast experimentation with deep neural networks using minimal code. Chollet demonstrated an application of fewer than 20 lines of Keras code that analyzed a video to answer two questions: "What is the woman doing?" and "What color is her shirt?"
The first question is harder than it looks, because the model had to determine whether she was packing or unpacking. The demonstration showed the high level of abstraction and how little code was needed to accomplish a fairly complex task. In production, however, the latency of this model or larger ones could well be too high for a given application, and the model would then need optimization.
Within the stack, and within the scope of the Keras program, performance bottlenecks could be solved by rewriting the constraining code in native C++. Really big models, such as Google Search ranking, cannot run on a single GPU or even a few GPUs. They have to run on interconnected banks of GPUs that communicate over a high-speed internal system bus, or over fiber-optic links between multiple systems.
An ML model proven during the research phase may not scale economically to production, especially if it needs latency on the order of hundreds of milliseconds, like the response users expect from Google Search. Adding hardware would increase cost significantly and still might not solve the problem, because builders of ML at scale are still waiting for the capabilities they need to be implemented in silicon. TensorFlow addresses this with a set of distributed TensorFlow APIs that let an expert write native C++ code to distribute and coordinate the workload across a system bus and interconnected systems.
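A common strategy for spreading training across many devices is synchronous data parallelism. Each device computes gradients on its own shard of a batch, the gradients are averaged across devices (an "all-reduce"), and every device applies the same update. The sketch below illustrates that idea in plain Python, with threads standing in for GPUs or machines. It is an assumption-level illustration of the concept, not the distributed TensorFlow API. The helper names are hypothetical, and the batch size must divide evenly by the number of workers.

```python
from concurrent.futures import ThreadPoolExecutor

def shard_gradient(w, shard):
    """Gradient of mean squared error for the model y = w*x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    """Hypothetical stand-in for an all-reduce: every worker ends up with the mean gradient."""
    return sum(grads) / len(grads)

def data_parallel_step(w, batch, num_workers, lr):
    # Split the batch into equal shards, one per simulated worker (device).
    size = len(batch) // num_workers
    shards = [batch[i * size:(i + 1) * size] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        grads = list(pool.map(lambda s: shard_gradient(w, s), shards))
    # Every worker applies the same averaged update, keeping model copies in sync.
    return w - lr * all_reduce_mean(grads)

batch = [(x, 4.0 * x) for x in range(1, 9)]  # 8 examples of y = 4x
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, num_workers=4, lr=0.01)
print(round(w, 4))
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the four-worker step matches a single-worker step. The hard engineering problem at scale is making the all-reduce fast over buses and network links, which is the coordination the distributed APIs address.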
What’s in it for Google and why open source
Google built TensorFlow and the community around it for four reasons:
1.) Independently developed code contributions and new use cases will extend the capabilities of TensorFlow. No matter how good the perks at Google are, not every expert developer who wants to contribute to TensorFlow wants to work at Google. Nor do Google's own ML applications represent all the potential use cases that will define TensorFlow 2.0 and later releases. Open source licensing brings code and perspective.
2.) Google thrives on a steady stream of talent from open source communities and student interns, from which it can recruit the best.
3.) The academic and industry artificial intelligence and ML community is hyper-open, sharing and building on each other's research. It is so open that the R&D teams at Google and its competitor Facebook contribute to one another's papers. A proprietary approach would slow innovation.
4.) Google can monetize the innovation from the community by improving its products and offering TensorFlow on Google’s Cloud Platform.
TensorFlow is a platform, and Google’s intention is to build it to be as pervasive as Linux and as large as Android.