Today we got our first detailed look at the internals of Google’s Tensor Processing Unit, the hardware inference accelerator introduced last year whose details have been kept under wraps until now.
The design team behind the chip, led by Norman Jouppi, has released a paper that will be presented at the International Symposium on Computer Architecture this coming June in Toronto. While the reveal isn’t complete (the paper at one point refers the reader to a thicket of patent filings for further details; how’s that for a sense of humor?), there is enough to get a good sense of the chip.
The basic design philosophy is simple and direct: Neural nets don’t need fancy scheduling and inference doesn’t need floating point so junk all of that and jam enough multipliers and memory onto a chip to handle the biggest inference model you can think of without needing to go back to the CPU. This diagram (shamelessly lifted from the paper) shows what they did to achieve that end.
The heart of the die is a systolic array of 64K (2^16) 8-bit multiplier-accumulators, connected to 4096 256-element 32-bit accumulators and a 24MByte memory array. The MAC array is set up to produce 256 results per cycle, with a pipeline delay that depends on the instruction (matmul or conv) being executed. A matrix operation takes a variable-size B×256 input and multiplies it by a 256×256 constant weight matrix, taking B pipelined cycles to complete. 16x8 (or 8x16) multiplies run at half speed; 16x16 drops to one quarter. Interconnect widths between blocks are 256 bytes in order to keep the array busy and the results flowing back to memory.
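To make the core operation concrete, here is a minimal functional model of that B×256-by-256×256 multiply, assuming NumPy; the function name is my own and this models only the arithmetic (8-bit operands, 32-bit accumulation), not the systolic dataflow:

```python
import numpy as np

def tpu_matmul(inputs, weights):
    """Functional model of the TPU's core matrix operation: a B x 256
    batch of signed 8-bit activations multiplied by a 256 x 256 signed
    8-bit weight matrix, accumulated at 32 bits. On the real chip this
    is a systolic array producing 256 results per cycle; here we only
    model the numbers it produces."""
    assert inputs.shape[1] == 256 and weights.shape == (256, 256)
    # Widen before multiplying so products don't overflow 8 bits,
    # matching the 32-bit accumulators described in the paper.
    return inputs.astype(np.int32) @ weights.astype(np.int32)

rng = np.random.default_rng(0)
acts = rng.integers(-128, 128, size=(3, 256), dtype=np.int8)   # B = 3
wts = rng.integers(-128, 128, size=(256, 256), dtype=np.int8)
out = tpu_matmul(acts, wts)   # shape (3, 256), dtype int32
```

Each of the B rows of `out` corresponds to one pipelined cycle’s worth of 256 results emerging from the array.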
The chip performs two basic operations: matrix multiply/convolution and the various flavors of activation; the rest is moving data in and out of memory and keeping the multipliers busy. It acts as a coprocessor, being fed work by the CPU (rather than fetching its own work like a GPU), and is programmed using TensorFlow.
Google doesn’t give a die size, although it does reveal that the chip uses a 28nm process and is less than half the size of a Haswell Xeon E5-2699 v3 die, although there is no saying how much less. The claimed performance for the TPU running at 700MHz is 92 TOPS on 8-bit operands while dissipating 75W.
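That headline figure is easy to sanity-check from the numbers above: 64K MACs, each performing one multiply and one accumulate per cycle at 700MHz:

```python
macs = 2 ** 16       # 64K 8-bit multiplier-accumulators
ops_per_mac = 2      # one multiply plus one add per cycle
clock_hz = 700e6     # 700MHz clock
tops = macs * ops_per_mac * clock_hz / 1e12
print(tops)          # 91.75..., which rounds up to the quoted 92 TOPS
```

So the quoted 92 TOPS is simply the peak rate with every multiplier busy every cycle; sustained throughput on real models will depend on keeping the array fed.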
Naturally, the benchmarks quoted show the TPU blowing away both the CPU and GPU for both outright performance and performance per watt.
The paper finishes up with a discussion of where TPU design might go in the future, concluding that higher performance is all about memory bandwidth. Interesting.
The news this week that Apple intends to start using its own GPUs in eighteen months’ time has left a lot of people wondering if IMG can survive this latest blow. Last year, remember, IMG cut loose a lot of employees, including CEO Hossein Yassaie (and myself), and then posted a huge loss before selling off chunks of its business, such as the Pure digital radio business.
As usual, there is a lot of hyperventilation from people you might expect to have steadier nerves, but this is a defining moment for IMG, which has never really found itself after the ill-advised acquisition of MIPS back in 2013. Apple makes up about half of IMG’s royalty revenue, and it would be difficult for any company to recover from the sudden removal of that sort of income, but that’s not what is happening. Let’s look into the details a little.
First, some background: Apple has been hiring GPU architects and designers for years now. They snagged some big names, such as Bob Drebin and Raja Koduri from AMD as far back as 2009, and there have been persistent rumors (although not from people who know anything about the subject) that Apple has been busily designing its own GPUs all these years. There’s no doubt that they are capable of doing such a thing, and they have now announced they will be ready to do just that within a couple of years. But, eight years on, are some of the best GPU architects on the planet only just starting to produce results? Not very likely. In fact, Koduri went back to AMD in 2013 - with a great job offer, no doubt - but also, I believe, because he was not getting to design GPUs at Apple.
The role of Drebin and company seems to have been to drive the GPU suppliers of choice, Intel and IMG, to continue to excel at their jobs, and to ensure that the GPUs they produced were properly tailored to Apple’s needs. (I believe this was one of the sources of IMG’s problems; more on that later). Although they do seem to have been adding functions adjacent to graphics, Apple were taking IMG drivers and using them unmodified, certainly until last year and probably still to this day. That fact alone should put a stake through the heart of any argument that they have been using non-PowerVR graphics (or whatever it is you need to do to kill off zombie ideas that refuse to lie down).
So, something happened fairly recently that caused Apple to decide they needed to get into the GPU business themselves, and that event is fairly obvious. Things hadn’t been going well at IMG for several years, and it is clear that a number of key employees were looking for places to go. It’s inconceivable that they didn’t approach Apple (in fact a key senior person did join Apple in late 2015), and then last year the IMG board pushed out a CEO that Apple felt they could rely on. Two and a half years from design start to chip integration for a new-architecture GPU sounds about right to me: that right there - April 2016 - is when Apple kicked off their GPU design project.
Oh, by the way, if you ask me what all those engineers and architects have been doing in the meantime? Building vision accelerators, of course! In common with almost everyone else, they see that as the prime strategic investment and held off diverting resource from it for as long as possible until IMG’s problems last year forced their hand. This also gives the lie to the narrative of Apple as the faithless monster; I’m sure they are as ruthless as any other business but no more so and not in this case.
So, what now for IMG? All that guff about patents is just hot air; sure, they have lots of patents, but with at least a dozen other GPU companies successfully negotiating the so-called patent minefield, to suggest that Apple is incapable of doing the same is just laughable. That really doesn’t matter, though, because they are not going to implode, and with a steady hand they can turn this around - although I do think that acquisition in about three years is the best they can aim for.
First off, Apple revenues will continue unchanged for the eighteen months it will take them to introduce their own device and there will be a long tail of royalty-producing products stretching out after that. The only immediate effect will be the loss of the annual license fee (and they might even be able to protect that) so IMG need do absolutely nothing right now and in fact, if they have the nerve, they can seize the moment.
Remember I said earlier that being driven by Apple has been a huge problem? Always being pulled to service their biggest customer, always being pushed to design GPUs that satisfy Apple’s needs, and to design them first, has been a heavy burden for the company. IMG’s business had always been run on a relationship basis, and this suffered under the stress of working for Apple. Multiple opportunities were sacrificed in this way.
Free of that burden, IMG can now refocus their GPU design effort on products the rest of the market wants and also take the enormous engineering resource sucked up by Apple support and put it into vision - the strategic investment they have consistently failed to address.
There are a few other things the company needs to do, such as finish cutting loose businesses they have no chance of succeeding at but overall, I don’t see them as being dead yet. The big unknown is the management: this is not a time for caution. IMG is back into startup mode and this is when big ideas and big risks come into play. It could be there are too many suits and not enough rolled-up sleeves.
The Khronos Group recently launched an initiative to standardise the way VR applications access the many available hardware platforms that have arrived on the scene over the last couple of years. On its website Khronos identifies industry demand as driving the initiative and judging by the number of companies that have added their names to the announcement and the industry leaders who have supplied quotes, that is no exaggeration. Khronos supplies a graphic to show the extent of support:
And a selection of quotes will give a flavor of just how enthusiastic the industry is for this standard to come into being.
“With VR on the verge of rapid growth across all of the major platform families, this new Khronos open standards initiative is very timely. We at Epic Games will wholeheartedly contribute to the effort, and we'll adopt and support the resulting API in Unreal Engine,” Tim Sweeney, founder & CEO, Epic Games.
“Khronos’ open APIs have been immensely valuable to the industry, balancing the forces of differentiation and innovation against gratuitous vendor incompatibility. As virtual reality matures and the essential capabilities become clear in practice, a cooperatively developed open standard API is a natural and important milestone. Oculus is happy to contribute to this effort,” John Carmack, CTO, Oculus VR.
“Open standards which allow developers to more easily create compelling, cross platform experiences will help bring the magic of VR to everyone. We look forward to working with our industry colleagues on this initiative,” Mike Jazayeri, director of product management, Google VR.
There are more (and you can see them in full on the announcement page) but just those three cover a huge chunk of the industry, and include three companies that are critically important to the success of this move.
2017 is being billed as a crucial year for the VR industry, with headsets and software at last coming together in a way that might just provide a breakthrough - although in my opinion the jury hasn't even left the room yet. There are rumours of new headsets and new technologies, including eye tracking with foveated rendering and optical navigation systems, which require new sensor technologies and new software to make use of them, so standardisation can only help. A second graphic from Khronos helps to explain the problem and their proposed solution:
Their basic point is that the market is fragmented, with multiple proprietary runtimes and driver interfaces, and that this impedes the creation of widespread VR experiences that can easily run across multiple platforms. Developers can only afford to support a limited number of platforms, so fragmentation leads to less content choice for consumers and slower VR market growth. Khronos’ proposed standard will include cross-platform APIs for the various sensors, tracking devices, controllers and displays that go into a VR system, solving that problem and stimulating growth of a VR software ecosystem.
This is not the first time that Khronos has moved quickly to head off market fragmentation in the face of rapid innovation; the timely introduction of the Vulkan graphics API last year and the swift move to provide API and file format support for deployment of deep learning based vision systems are two other recent examples of this responsiveness.
This is unusual in a collaborative industry body of this type and is a testament to the organisation and its members. A standard like this can only help, and if the adoption is anything like as widespread as Vulkan, it could turn out to be one of the crucial factors in making this market work.
We've reached a significant milestone here at Visualise The World with the launch of the inaugural issue of The VPU Report!
What is it? It's a to-the-point, detailed analysis of vision processors, as SoC or as IP, from four of the leading companies in the field. Future quarterly issues of the report will present up-to-the-minute information on VPUs available from a different selection of companies, building up into a detailed resource for marketing and management professionals. The current issue starts the ball rolling with analysis of Movidius, Intel, Ceva and Inuitive: four of the most interesting companies operating in the consumer edge device category.
At just under fifty pages, packed with over twenty detailed diagrams plus feature and performance tables and concise descriptions, the report strips down to basics and presents just the facts, so that you can quickly absorb technical details in a convenient, easy-to-read style.
Along with that, you will find a quick summary of the companies reviewed (useful when so many entrants into this field are startups) and commentary to shed light on the potential and pitfalls of VPU design.
If you are interested or involved in the coming revolution in vision-enabled devices, or if you need to know about the chips and IP that will run the vision and neural network algorithms powering them, you will need to get this report. And occasionally there will be a scoop, such as, in this issue, a first public description of an as-yet-unreleased device!
The report actually speaks for itself, so I've attached a pdf of the contents and introduction. Take a look, then click on the link above and I'll get in touch to give you rates and subscription options.
Machine learning is an umbrella term for a group of related technologies which are critical to the development of visually-aware systems. There has been a huge explosion of interest in these technologies driven by their astonishing successes and a consequent explosion in the range and diversity of tools, formats and platforms available for developers to work with.
Neural networks in particular are proving successful at pattern matching and classification tasks, and their new popularity shows in the number of incompatible development and deployment tools that are emerging. Now The Khronos™ Group (www.khronos.org) is moving to reduce confusion and enable interoperability by announcing two new standardisation initiatives in that area.
A news release from Khronos today states that a new working group has been set up to define an “… API independent standard file format for exchanging deep learning data between training systems and inference engines.” and “… the OpenVX™ working group has released an extension to enable Convolutional Neural Network topologies to be represented as OpenVX graphs and mixed with traditional vision functions.” (The full press release is available here).
The first move will establish a standard way to move trained networks from the training phase, which is typically done offline using a wide range of rapidly developing tools and techniques, to the inference phase, where the network is actually used in an application. This phase, especially in embedded systems, is usually mapped onto highly optimised and constrained platforms, so by providing a standard exchange format, Khronos’ Neural Network Exchange Format (NNEF) will eliminate much of the need for development and deployment platforms to share implementation details.
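To make the idea concrete, here is a deliberately simplified sketch of what an exchange format buys you. Everything here - the JSON schema, the function names, the one-layer network - is my own invention for illustration; it is not actual NNEF, whose specification has not yet been published. The point is only that the training side emits a framework-neutral description of the graph plus its learned weights, and the inference side rebuilds and runs it without knowing anything about the training tool:

```python
import json
import numpy as np

def export_network(weights, biases):
    """Training side: dump a one-layer net (fully connected + ReLU)
    to a neutral JSON document, standing in for an exchange format."""
    return json.dumps({
        "ops": [{"type": "fully_connected", "activation": "relu"}],
        "weights": weights.tolist(),
        "biases": biases.tolist(),
    })

def import_and_run(doc, x):
    """Inference side: rebuild the net from the document and run it,
    with no knowledge of whatever framework trained it."""
    net = json.loads(doc)
    assert net["ops"][0]["type"] == "fully_connected"
    w = np.array(net["weights"])
    b = np.array(net["biases"])
    return np.maximum(x @ w + b, 0.0)   # ReLU(xW + b)

w = np.array([[1.0, -2.0], [0.5, 1.0]])
b = np.array([0.0, 1.0])
doc = export_network(w, b)                      # training -> format
y = import_and_run(doc, np.array([2.0, 4.0]))   # format -> inference
```

An embedded inference engine consuming such a document is free to quantise, fuse or otherwise optimise the network for its own hardware, which is exactly the decoupling NNEF is aiming for.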
By concentrating on a flexible format to exchange data, rather than on the structure of the actual networks being deployed, Khronos aims to avoid stifling innovation in a fast-moving technology while still achieving the objective of reducing deployment friction. According to Khronos President Neil Trevett: “So many companies are actively developing mobile and embedded processor architectures the market is in danger of fragmenting, creating barriers for developers seeking to configure and accelerate inferencing engines across multiple platforms.”
With the related move by the OpenVX working group to provide an API to a set of standardised neural network topologies as well as, crucially, to allow the import of non-standard nets developed elsewhere into the OpenVX graph structure, Khronos is again smoothing the path for deployment of machine learning without restricting innovation.
This move by Khronos is very timely, with many companies looking for a viable deployment path into end-user equipment while at the same time, researchers in both academia and industry are continuing to come up with novel ideas almost on a weekly basis.
Much of the impetus for the new working group came from Adasworks, one of the frontrunners in promoting machine learning software. Laszlo Kishonti, Adasworks’ CEO, commented that: “We see the growing need for platform-independent neural network-based software solutions in the autonomous driving space. We cooperate closely with chip companies to help them build low-power, high-performance neural network hardware and believe firmly that an industry standard, which works across multiple platforms, will be beneficial for the whole market.”
It is unusual to see a standards body get ahead of the market in this fashion, and it is to the credit of Khronos that they are doing it in a way that is sensitive to the needs of the technology. The new working group is only just starting up and there is no word on when a ratified standard will emerge, but based on the recent activity from other Khronos groups, which has been very rapid for a body like this one, perhaps we will see something within the next year.