Machine learning is an umbrella term for a group of related technologies that are critical to the development of visually aware systems. Interest in these technologies has exploded, driven by their astonishing successes, and with it the range and diversity of tools, formats and platforms available for developers to work with.
Neural networks in particular have proved successful at pattern matching and classification tasks, and their new popularity is reflected in the number of incompatible development and deployment tools that are emerging. Now The Khronos™ Group (www.khronos.org) is moving to reduce confusion and enable interoperability by announcing two new standardisation initiatives in this area.
A news release from Khronos today states that a new working group has been set up to define an “… API independent standard file format for exchanging deep learning data between training systems and inference engines.” and “… the OpenVX™ working group has released an extension to enable Convolutional Neural Network topologies to be represented as OpenVX graphs and mixed with traditional vision functions.” (The full press release is available here).
The first move will establish a standard way to move defined networks from the training phase, which is typically done offline using a wide range of rapidly developing tools and techniques, to the inference phase, where the network is actually used in an application. This phase, especially in embedded systems, is usually mapped onto highly optimised and constrained platforms, so by providing a standard exchange format, Khronos’ Neural Network Exchange Format (NNEF) will eliminate much of the need for development and deployment platforms to share implementation details.
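To make the idea concrete, here is a toy sketch of the training-to-inference handoff that such a format standardises. Every file name, field name and layout below is invented for illustration; this bears no relation to the actual NNEF specification, whose details have not yet been published.

```python
import json, struct

# A toy, platform-neutral description of a trained network: topology as
# JSON, weights as flat binary. Purely illustrative -- NOT the NNEF format.
network = {
    "inputs": ["image"],
    "layers": [
        {"name": "conv1", "op": "conv2d", "kernel": [3, 3], "filters": 16},
        {"name": "relu1", "op": "relu"},
        {"name": "fc1",   "op": "dense", "units": 10},
    ],
}
weights = [0.01, -0.2, 0.5]  # stand-in for the real trained parameters

# Training side: export topology and weights.
with open("net.json", "w") as f:
    json.dump(network, f)
with open("net.weights", "wb") as f:
    f.write(struct.pack(f"{len(weights)}f", *weights))

# Inference side: import, with no knowledge of the training framework.
with open("net.json") as f:
    topology = json.load(f)
with open("net.weights", "rb") as f:
    data = f.read()
restored = list(struct.unpack(f"{len(data) // 4}f", data))

print(topology["layers"][0]["op"], restored)
```

The point of the sketch is the decoupling: the inference side reads only the interchange files and never touches the training framework's internals.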
By concentrating on a flexible format to exchange data, rather than on the structure of the actual networks being deployed, Khronos aims to avoid stifling innovation in a fast-moving technology while still achieving the objective of reducing deployment friction. According to Khronos President Neil Trevett: “So many companies are actively developing mobile and embedded processor architectures the market is in danger of fragmenting, creating barriers for developers seeking to configure and accelerate inferencing engines across multiple platforms.”
With the related move by the OpenVX working group to provide an API to a set of standardised neural network topologies and, crucially, to allow the import of non-standard nets developed elsewhere into the OpenVX graph structure, Khronos is again smoothing the path for deployment of machine learning without restricting innovation.
This move by Khronos is very timely, with many companies looking for a viable deployment path into end-user equipment while at the same time, researchers in both academia and industry are continuing to come up with novel ideas almost on a weekly basis.
Much of the impetus for the new working group came from Adasworks, one of the frontrunners in promoting machine learning software. Laszlo Kishonti, Adasworks’ CEO, commented that: “We see the growing need for platform-independent neural network-based software solutions in the autonomous driving space. We cooperate closely with chip companies to help them build low-power, high-performance neural network hardware and believe firmly that an industry standard, which works across multiple platforms, will be beneficial for the whole market.”
It is unusual to see a standards body get ahead of the market in this fashion, and it is to the credit of Khronos that they are doing it in a way that is sensitive to the needs of the technology. The new working group is only just starting up and there is no word on when a ratified standard will emerge, but based on the recent activity from other Khronos groups, which has been very rapid for a body of this kind, perhaps we will see something within the next year.
First things first: Pokemon Go is not an augmented reality app. Yes, it has the little monsters superimposed on your phone camera feed and you can interact with them but the physical world seen through the camera has no role to play in the game. You can’t interact with it and it has no effect on anything you do. The much-publicised unintended outcomes where players have injured themselves or others simply serve to illustrate this point: the physical world is an impediment to the game, not part of it.
The only purpose of the phone in the game is to tell you that you have arrived in the right location and are oriented correctly to see the monster. The game uses GPS, compass and screen; that's all. The camera is superfluous, and the screen could just as easily display anything or nothing except the monsters without affecting the gameplay at all.
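Indeed, that position-and-heading check needs nothing but a distance and a bearing. A minimal Python sketch follows; the function names, the 40 m range and the 30° half-angle are all hypothetical values of my own, not anything from the actual game:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    R = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    x = math.sin(dl) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return (math.degrees(math.atan2(x, y)) + 360) % 360

def monster_visible(player_lat, player_lon, heading_deg,
                    monster_lat, monster_lon,
                    max_range_m=40.0, half_fov_deg=30.0):
    """True when the player is close enough and facing the monster."""
    if haversine_m(player_lat, player_lon, monster_lat, monster_lon) > max_range_m:
        return False
    wanted = bearing_deg(player_lat, player_lon, monster_lat, monster_lon)
    diff = abs((heading_deg - wanted + 180) % 360 - 180)  # smallest heading difference
    return diff <= half_fov_deg
```

Note what is absent: no camera input anywhere. GPS supplies the position, the compass supplies the heading, and the trigger fires regardless of what the camera sees.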
So what we have is an amazingly successful treasure hunt/geocaching game with almost none of the attributes of augmented reality. Even so, it still holds important lessons for how and why successful AR applications will make the grade, and for why VR will always be a niche application.
The key feature of Pokemon Go is that it is highly social. It's hardly a secret that the most successful online apps by far are social apps, and the more immediate an app can be, the more successful it will be. This is immediacy in the sense of responsiveness: real time, lightweight and disposable. The more an app is like chatting to real people in real life, the more appeal it has; snail mail is trumped by fax, which is trumped by email, and so on through text, Snapchat and the rest.
We see it also in games, where Quake Arena is better than Quake on its own. People want to be with people, and Pokemon Go gives anyone who already plays the game an excuse to go out and enjoy it in the company of others. Of course, it also helps that Pokemon is an established game in its own right and the new version was made available on a ubiquitous platform, so that more or less anyone could join in with no waiting, no learning curve and no real effort. Download, login and go. The trick now will be to keep them playing, but that is another story.
So, out of all of this, we can discern the outlines of what can lead to a successful AR application: it must enable some existing activity in an improved way, it must involve little to no effort to take part and it must bring people together to share the experience.
These last two are the requirements that will sink VR. While there is certainly a place for immersion, it isn't enough on its own, and its corollary, isolation, will be enough to restrict VR to a niche. Apple's Tim Cook alluded to this recently when explaining why Apple has not yet taken any public steps into VR: “Virtual reality sort of encloses ... the person into an experience that ... has a lower commercial interest over time”, and he's right; that's not compelling for most people.
If the requirements set out above seem daunting (how will it be possible to launch an AR app on a ubiquitous platform when it doesn't exist yet?), Pokemon Go has another lesson for us: baby steps are acceptable. Simply adding SLAM to a phone and using that as the basis for real-world interaction will take us to the next level, and thanks to Tango, we will be getting that within the year.
Advances in display technology like Epson’s Moverio, perhaps, will make the headset experience better but until then, we have evidence that head mounted displays are not necessary. AR doesn’t have to be immersive which means it doesn’t have to be hands-free, leaving developers the freedom to deliver a much wider range of AR experiences without needing to address the headset problems which plague VR.
So, while Pokemon Go has caused much eye-rolling in some quarters for stealing the AR limelight while not actually being AR, it has turned out to be a very positive thing. Simply by being different, it has broken down preconceived notions of what AR is, pointed out once again what is really important and shown us a way forward based on users, rather than technology.
Augmented reality is usually seen as a successor technology to virtual reality, a different application with a different (and harder) set of problems to solve which will come into its own in a few years.
VR is here today, the problems are being rapidly knocked down, billions of dollars are being invested and there is a huge user base just waiting for the hardware to cross that availability threshold and Bam! the whole thing is going to explode.
That’s the buzz, anyway, but a number of measured arguments are being made which cast serious doubt on the viability of VR as a mass consumer technology today, and there are good reasons to expect that the problems with VR can only be solved by merging it with AR.
Most of the arguments predicting limited success for the current crop of VR implementations are non-technical, which makes sense now that many of the technical barriers which doomed previous generations of VR are being rapidly demolished.
We have now had plenty of opportunity to try out a whole range of headsets, and it is clear that the classic problems (field of view, lens correction, responsiveness) are either solved or well on the way.
Issues like user isolation, the difficulty of mapping established interaction paradigms onto VR and the problem of providing an equivalent visual experience have not gone away, or even really been addressed in a meaningful way, and these are the focus of the arguments.
And then there’s the headset.
Those last two issues, a visual experience equivalent to the best of what is already available and the need for a headset, were the twin nemeses of 3D TV. VR boosters bristle at the mention of that epic failure, arguing that these are two different things, but that is not the point. People are the same, which is what matters, and visual quality is important to them, so we do need to pay attention to the lessons learned from that failure.
3D TV had the problem that, because of the need to deliver a different image to each eye, visual quality was degraded compared with standard video. This in itself was enough to dissuade many viewers, but gamers are more resilient in that regard than general consumers, so why is it still a problem for VR? The answer lies in the extreme nature of the requirements and the gap between where graphics hardware is today and where it needs to be to deliver a true, no-compromise experience.
The driving constraint of VR is latency, referred to as motion-to-photon delay: in addition to delivering a left/right pair, the system must meet a very tight deadline for getting the pixels on the screen. From sensing a motion of the head, modern systems aim to get the resulting image on screen within a single frame time, and they aim to do it consistently, without the variations which are acceptable in a standard gaming system.
Standard game systems observe the twitch limit, the tenth of a second or so which represents the limit of human reaction time. This means that the various stages of the game can be pipelined: each stage must stay within the target frame rate but it’s OK to have the final displayed image several frames behind, as long as the overall delay stays within the twitch limit.
VR systems don’t have that luxury. Everything, including responding to the sensors, animating the scene and then rendering it, must happen an order of magnitude faster, with a corresponding reduction in the sophistication of the models and the quality of the rendering. This is not a trivial point: it means that GPU performance must increase tenfold before the visual quality of today’s PC is available on a VR system, and according to one study the last order-of-magnitude increase took eight years to happen. There’s a reason the best-looking VR games are mostly static, and this is it.
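The arithmetic behind that claim can be sketched directly. The figures below (a 90 Hz headset, a 60 fps desktop pipeline with four pipelined stages) are illustrative assumptions for the sake of the comparison, not measurements of any particular system:

```python
# Rough latency-budget arithmetic for desktop gaming vs. VR.
TWITCH_LIMIT_MS = 100.0              # ~human reaction time a desktop game must beat
VR_REFRESH_HZ = 90.0                 # assumed VR headset refresh rate
VR_FRAME_MS = 1000.0 / VR_REFRESH_HZ # ~11.1 ms motion-to-photon target

# A desktop game can pipeline: each stage runs at 60 fps (16.7 ms) and the
# displayed image may lag several frames behind input without being noticed.
DESKTOP_FRAME_MS = 1000.0 / 60.0
pipeline_stages = 4                  # e.g. input, simulation, render, scan-out
desktop_latency_ms = pipeline_stages * DESKTOP_FRAME_MS  # ~66.7 ms, under 100 ms

# VR must fit sensing, animation and rendering inside ONE frame time.
vr_budget_ms = VR_FRAME_MS

print(f"desktop end-to-end latency: {desktop_latency_ms:.1f} ms "
      f"(acceptable, under the {TWITCH_LIMIT_MS:.0f} ms twitch limit)")
print(f"VR motion-to-photon budget: {vr_budget_ms:.1f} ms")
print(f"budget is {desktop_latency_ms / vr_budget_ms:.1f}x tighter")
```

Even with these generous assumptions the VR budget comes out several times tighter than the desktop pipeline's total latency, and every millisecond removed comes out of the time available for simulation and rendering.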
VR has to offer a compelling reason, even to gamers, if they are to adopt it. Immersion is not enough: gameplay and realism are at least as important, and if those are substandard then you are just immersed in a crappy game. Fortunately, VR has much more to offer than visual immersion, so we are seeing games like “The Climb” from Crytek, which tries to leverage that but has in fact been more useful in exposing more of the shortcomings of current equipment. Which brings us back to the headset.
If polarized glasses were an issue for 3D TV, the isolation and discomfort of a headset is the same issue in spades for VR.
I believe the hardware manufacturers when they say that the enormous sums being spent on design of lighter and more ergonomic headsets will result in devices no more inconvenient than a pair of sunglasses. But that only gets us to the same status as 3D glasses, which is still a problem, and it overlooks the real issue which is isolation. Isolation from the real environment and isolation from real people.
Once in a VR world, the user is blind to the real world, and now that we are starting to see apps and demos which encourage moving around, the disadvantages of that are starting to become obvious.
A world which gives no indication of its extent is not a comfortable place to be. Add a second VR user to the same space and the problem gets even worse; as long as the headsets are unaware of each other’s positions, there is the potential for collisions. This is a very poor user experience, no matter how compelling the immersion feels otherwise.
Without a solution to these problems, VR remains a solo activity conducted while seated or constrained to very limited movement (Google’s Tilt Brush app works nicely within those constraints but is hardly the basis for mass adoption of the technology). As long as that is the case, its consumer appeal will never cross over from enthusiast to mainstream.
Now, I am not going to argue that VR will fail completely but I am convinced that mass deployment is much further away than a lot of the hype would suggest. This is because the required technologies are coming from AR and are in a much earlier phase of development, which puts back the adoption curve by several years.
To be fair to the VR hardware companies, they are aware of the need for positioning, and they have come out with systems like Valve’s Lighthouse tracking (used by the HTC Vive) to address it. The problem is that these are partial solutions, and positioning is not the only problem.
Vision based systems can provide robust solutions to a lot of the problems outlined above, and we are starting to see VR headsets which include forward facing cameras (as well as internal, eye tracking cameras – an approach to solving the GPU performance problem).
With a camera the headset can autonomously map its environment, including stationary and moving objects such as other users, and can also eliminate the need for handheld controllers via gesture control.
More importantly, vision can convincingly locate the user within the virtual world in a way that is not possible today. The spooky disembodied hands in “The Climb” can be correctly connected to the user’s body, giving a feeling of immersion that goes well beyond the visual.
The point at which that becomes possible is the point at which a compelling user experience has been created. What makes it possible is vision, until now seen as the domain of augmented reality.
The downsides of including vision are that it adds expense, for the sensors plus a new vision-specific processing unit, as well as delay. Google’s Tango project, plus recent events in the auto industry, have shown us how much more there is to do on the software side, but my opinion is that the benefits outweigh the costs, especially since I am convinced that current-generation ‘blind’ headsets will soon be regarded as incomplete products.
So my conclusion is that we are not there yet for VR. It will take another hardware cycle to get the equipment right, and the successful products will leverage a whole new set of skills coming from a different set of companies.
This year’s JPR luncheon panel discussion took a look at the growing use of game engines in moviemaking but first, Jon took time out to give the Jon Peddie Technology Advancement Award to surprise guest Tim Sweeney.
Tim Sweeney accepts the Jon Peddie Technology Advancement Award.
According to Peddie, all of the currently available game engines were initially custom developed for specific games, and only later did their creators realise the value they brought and capitalise on it after the fact. Sweeney confirmed that this was certainly true of Epic and Unreal Engine, which surprised them so much that when they were initially approached by developers wanting to use it, they tried at first to discourage them: “They’d say, ‘what about that game engine?’ and we’d say, ‘yeah, it’s really, really expensive’, but they’d keep on wanting it!”
Kathleen Maher, Editor-in-Chief of JPR’s Techwatch discussed the present and future impact of game engines on moviemaking with a panel representing those ‘accidental’ businesses as well as the users of their products. Paul Doyle, CEO of Fabric Software; Jean-Colas Prunier, Creative Director at Crytek and Mark Schoennagel, a Unity evangelist were joined by Milica Zec and Winslow Turner Porter III, co-creators of VR movie ‘Giant’.
After hearing Zec and Porter describe their experience of VR moviemaking as a process of needing to “unlearn everything we knew about making movies”, a description that is quickly becoming standard as directors get to grips with the demands of movies in-the-round, the panel quickly established a consensus: game engines are one of the keys to giving directors the freedom needed at the visualisation stage for a successful VR movie industry, but integration is a problem.
Prunier made the point that the requirements placed on a game engine are still very different from those placed on a real-time renderer for movies, and appealed for the term ‘real-time renderer’ to be used instead of ‘game engine’ for that reason.
A brief foray into the world of very large dataset rendering pointed up the need for a real time tool for architectural visualisation. The need for portability and editability of models between tools is common to both industries and Prunier strongly asserted that the significance of Pixar’s open sourcing of USD should not be underestimated. This, he believes, will be a game changer that has the potential to catalyse a dramatic revision and optimisation of movie production practices.
The panel set a high bar with their wish list for future technology developments which included real time ray tracing and ultimately full light field rendering. That should keep the GPU industry in work for a while.
We are now well into the first cycle of storytelling using virtual reality, and the creative community is keen to feed back their experiences, good and bad, to the people who build the new tools of their trade.
We have had two great examples of VR movie-making this week: ‘Pearl’, a Google Spotlight production directed by Patrick Osborne which tells the story of a father and daughter road trip and ‘Giant’ from co-creators Milica Zec and Winslow Turner Porter III which deals with the much darker subject of parents struggling to safeguard the emotional wellbeing of a child while trapped in a terrifying situation.
Each of these pieces took a very different approach, ‘Pearl’ being animated while ‘Giant’ is live action, but both production teams zeroed in on a common set of new issues related to moving from a single point of view to an environment that the viewer can inhabit in multiple ways.
Milica Zec and Winslow Turner Porter III discuss their virtual reality movie 'Giant'.
While Osborne reminded us that the art of visual storytelling is still about framing the imagery – one reason for his choice of a road trip as his subject was that car windows provided convenient and quite literal windows into his story – he also pointed out that anticipating how the viewer will choose to move around the virtual world is a difficult issue. The story might be focussed on the view through the windshield but if the viewer develops an interest in the contents of the footwell, whether and how to enable that curiosity impacts both the production flow and the arc of the story. Just how far down the rabbit hole can the director allow his audience to travel; and how can he bring them back?
The more naturally contained world of ‘Giant’ might seem to mitigate these concerns but, in Zec’s words, “we had to unlearn everything we knew about filmmaking”, a process she likened to “walking through fire”. In their more constrained visual world, the story became more multi-sensory, with sound being used to augment the immersive experience both by intensifying the main action and by providing cues to guide the viewers’ attention within the 3D space.
The impact of all of this on workflow is profound: the high cost of providing paths into the story via multiple, somewhat unpredictable routes will constrain the viability of VR as an art form. When asked for their highest priority, both teams emphasised the importance of real-time, fully immersive visualisation tools that integrate as closely as possible with the overall production toolchain and that allow the director the same freedom given to the eventual viewer.
GPU hardware, allied to real-time renderers closely based on game engines such as Unreal, CryEngine and Unity, is set to play a critical enabling role in this process, and we can expect to see a rapid move in that direction, not only as an aid to productivity but also as a foundational technology for VR moviemaking.