It’s time to ask whether recent technological advances should make us revisit some long-abandoned dreams about interactive content

Make no mistake, this is an Oculus Rift impact projection

The ‘enhanced realism’ issues raised below are all driven by the potential implications of the extraordinary and unprecedented experiences of early reviewers of this device.

There is no question that the ‘experiential game-changer’ issues described below are now salient solely because of the advances this device represents, even in the raw, incomplete form in which it was recently demonstrated to invited guests in back rooms at CES.

If you aren’t convinced, watch the demo videos and remind yourself that, despite their exceptional reactions to the experience, none of these reviewers was actually playing a game (there wasn’t a game available to play yet, so they had to make do with a ‘game-like world’), and yet these comparatively hard-to-impress veterans of countless demos were nothing short of overwhelmed.

They were unquestionably engaging in the kind of ‘exploratory experience’ described in this article, something which might best be described as ‘virtual world tourism’, and it is reasonable to expect this experience to be about as immersive as the game itself might turn out to be (when that game is available for the Oculus Rift).

Non-game experiences in game-worlds (like this one) which turn out to be exceptionally compelling open up the possibility of making game world experiences attractive for ‘delivering’ non-game content to non-gamers, something which (bearing in mind all the caveats that I detail in this article) may turn out to be a stepping stone to finally being able to offer interactively structured consumer-shaped video content, which, as a potentially limitless source of new video content, is undoubtedly the holy grail of video content creation.

Long, long before YouTube started to give us any serious insight into what impact the widespread proliferation of ‘on-demand video’ and ‘participatory video content’ might ultimately have upon our ‘content consumption appetite’ (which up until that point was almost exclusively dominated by ‘passively’ consuming conventional broadcast network TV) industry insiders were intensely curious to see if something they called ‘interactive TV’ would become popular.

One important early idea explored by intrepid interactive TV pioneers (whose goals included many of those we still pursue today, such as ‘creating more immersive content’ and ‘turning content consumers into content creators’) was that you should be able to watch a movie (or an episode of a TV series) but instead of the ‘narrative flow’ of the content being ‘monolithic’ (where what you are watching follows a conventionally linear plot and dialogue script) you could ‘intercept’ the flow of the storyline and change the way that it all turns out, perhaps by pausing the action (or by the movie itself stopping at a critical moment in the story) and being posed a question along the lines of ‘what would you like to happen next?’ and being offered a menu of plot options.

It turned out, when anyone did research into this kind of content treatment, that interaction done this way proved expensive and showed insufficient promise (in terms of potential mass-market popularity) to attract any serious interest from the investment community (i.e., the experiments didn’t produce results where such things as ‘levels of immersion’ were consistently increased to a ‘transformative’ extent).

Part of the ‘economic’ argument against this kind of ‘interactive video content’ is that for every possibility of a ‘different outcome’ you would need to create a different scene and each of those scenes would in turn need to offer a selection of options, each with their own different scenes, and so on, creating an unmanageable escalation of content creation effort.

If a conventional, ‘linear’ two hour movie is made up of, say, sixty scenes, each two minutes long, the number of scenes in the ‘decision branching’ movie grows exponentially with the number of ‘outcome selection points’: many movies are already seen by the industry as costing too much to make, but this particular ‘content creation toy-box’ takes those costs to infinity and beyond.

As a way of massively reducing the effort required, implementing this kind of interaction only seems to be a practical option in the context of offering ‘alternate endings’ and while this is inevitably many orders of magnitude cheaper and more practically achievable than making the entire movie ‘interactively structured’, nobody sees the option of offering alternate endings (something which has already been included for many years as a feature on DVDs of a number of movies) as a realisation of the early promise or expectations associated with this kind of interactive TV.
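To put rough numbers on this escalation, here is a minimal sketch in Python. It assumes, purely for illustration, that every scene ends in a decision point offering the same number of options and that no two branches ever rejoin; the function names are hypothetical, and none of the figures come from any real production.

```python
def scenes_fully_branching(depth: int, options: int) -> int:
    """Scenes to produce when every scene ends in a choice between `options` plots.

    Level 0 has 1 scene, level 1 has `options` scenes, level 2 has options**2,
    and so on for `depth` levels (the length of any single viewing).
    """
    return sum(options ** level for level in range(depth))


def scenes_alternate_endings(depth: int, endings: int) -> int:
    """Scenes to produce for a linear movie whose final scene has `endings` variants."""
    return (depth - 1) + endings


LINEAR_SCENES = 60  # the two hour movie of sixty two-minute scenes

print(scenes_fully_branching(LINEAR_SCENES, 1))    # one option per scene: the linear movie
print(scenes_fully_branching(LINEAR_SCENES, 2))    # two options at the end of every scene
print(scenes_alternate_endings(LINEAR_SCENES, 3))  # linear movie with three endings
```

Under these assumptions, two options per scene already require 2^60 − 1 scenes (more than a quintillion), whereas three alternate endings require only 62. Real branching designs would let branches rejoin, which reduces the total, but the contrast between the two approaches remains.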

As if those shortcomings and their ramifications (all of which were foreseen quite early on in the history of interactive TV research) were not a sufficiently discouraging portent for the future of this kind of interactive TV, research also reveals that there is no guarantee that the viewer will ever prefer to engage in the effort required to ‘explore the story possibilities interactively’ rather than just wanting to watch a traditional linear movie in passive, decision-free ‘couch potato’ mode.

Interestingly enough, the reason this old chestnut is worth revisiting is not just that there is always an interest among content creators in exploring opportunities to ‘increase viewer engagement’.

The incentive for ‘content owners’ to explore ‘replacing passive content consumption with interaction’ is much more focused upon the notion that if:

a)      it does turn out that there is some way to get viewers to get involved in ‘shaping the content consumption experience’

and that as a result of this happening:

b)      ‘new content is created’ by the audience in the course of their interaction with ‘centrally originated content’

and it turns out that:

c)      content that is created in this way can appeal to a broad audience

then a new and potentially extremely lucrative ‘user generated/shaped content-based business model’ can be envisaged.

In the light of such possibilities, a question which needs asking at reasonably regular intervals is this:

Have we got to a stage where any of the obstacles to the widespread adoption of this kind of interactive TV have been overcome?

This breaks into two subsidiary questions:

  • What are the current obstacles to ‘fully interactive TV’ becoming popular (in the light of recent technological developments)?
  • What are the attractions that would make fully interactive TV compelling as a media format (and how do recent developments, technological, economic and cultural, impact upon our ability to provide those attractions)?

A topical debate on these questions would inevitably find itself looking to the world of realistic video games (as well as the tools employed to create them) for the relevant kinds of developments.

The underlying assumption which makes the consideration of video games unavoidably relevant in this context is this:

Animated actors can act out one of infinitely many potential plot variations resulting from the interactions of each of an unlimited number of viewers interacting simultaneously with the content. It’s obviously not so easy to do that with live actors.

So, many of the reasons why we might be reviewing ‘interactively structured video content’ would be to check to see if such things as ‘improvements in the quality of character animation’ were helping us answer the ‘are we there yet?’ question.

Not only this, but also, even if technological developments enable us to overcome the obstacles above, we would still need to see if there had been any new ideas on this subject coming from the ‘initial content developers’ (a term we would be forced to use to describe those developing the content that would be intended to be ‘re-shaped by viewers’, where in some sense the viewers themselves would also be content developers, once subsequent viewers were watching/interacting with content which has been shaped by its earlier viewers).

For instance, let us imagine that a certain TV series episode appears to the viewer to be ‘normal’ i.e., no warning of any need for ‘interaction’ is given at the outset (in fact this might be the second episode of a series, where the first episode was indeed entirely ‘normal’ in this way).

Suddenly, at some point, one of the characters in the story faces the viewer and asks them a question, where that question is aimed at getting some insight from the viewer as to how the plot should proceed, and the character waits for a response.

Think of the brief ‘asides to the camera’ by the lead character in ‘House of Cards’ as evidence that this can successfully be woven into the screenplay without undermining the overall immersiveness of the viewing experience: you just have to take that one step further.

The extent to which this nuance in any way represents an advance/improvement over ‘just stopping and presenting a textual menu of plot options’ is an open question (it isn’t as if this hasn’t been tried before in video games, the question is more about the extent to which this kind of aspect/feature will affect non-game video content, if other ‘barriers to engagement’ are also overcome).

Captions on the screen can perhaps make it clear that the story is not going to develop until the question is answered.

This does not mean that the screen needs to be ‘frozen’ or that the action needs to be in a continuously repeating ‘loop’.

The scene could instead ‘go ambient’ with the setting reverting to ‘real time’ where ‘background activity’ was going on (rather than ‘plot development’).
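One way to picture the ‘go ambient’ idea is as a small playback state machine. The sketch below is in Python and every name in it (PlaybackMode, Player and its methods) is hypothetical; it is only meant to illustrate that waiting for the viewer’s answer need not mean a frozen frame or a repeating loop.

```python
from enum import Enum, auto
from typing import List, Optional


class PlaybackMode(Enum):
    PLOT = auto()     # the scripted story is advancing
    AMBIENT = auto()  # the setting stays alive, but the plot is on hold


class Player:
    def __init__(self) -> None:
        self.mode = PlaybackMode.PLOT
        self.pending_question: Optional[str] = None
        self.answers: List[str] = []

    def ask_viewer(self, question: str) -> None:
        """A character turns to the viewer; the plot stops but the world does not."""
        self.pending_question = question
        self.mode = PlaybackMode.AMBIENT

    def tick(self) -> str:
        """Advance one beat and report what kind of material is on screen."""
        if self.mode is PlaybackMode.AMBIENT:
            return "background activity"
        return "plot development"

    def answer(self, reply: str) -> None:
        """The viewer responds, so the story resumes."""
        if self.pending_question is None:
            return  # nothing was asked, so stray input is ignored
        self.answers.append(reply)
        self.pending_question = None
        self.mode = PlaybackMode.PLOT
```

In this sketch, calling `tick()` repeatedly while a question is pending keeps returning background activity, and the plot only resumes once `answer()` has been called.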

Notice that none of this really makes much sense (or even constitutes an important improvement) unless such key factors as the ‘interactive content creation cost escalation problem’ have somehow already been solved.

Instead, the example above just constitutes an instance of a ‘nuanced’ attempt to address the ‘passive versus interactive engagement problem’ (once again, this assumes that the cost escalation problem has indeed been solved) and in practice only represents a comparatively minor (although potentially critical) variation in the kinds of things that were tried (and deemed unpromising) when the original research into interactively structured TV content was first conducted.

In a situation where the content cost escalation could be solved, the ‘viewer shaping the content’ success/failure assessment would revolve around not just whether the ‘stop and ask a question’ approach produced either a massive ‘viewer backlash’ or genuine viewer enthusiasm, but the extent to which the viewer’s answers (where such an answer was actually given, rather than, by contrast, producing a catastrophic ‘content abandoned in frustration/disenchantment’ response by the majority of viewers) produced a subsequent experience (which essentially reflected the viewer’s influence upon the plot development) which:

(a)    retained (or potentially even increased) the viewing audience in sufficient numbers (i.e., they weren’t put off by the disruptions caused by the ‘invitation/option to interact’)

(b)    produced a sufficiently positive ‘testimonial’ assessment of the experience by the audience (so that the ‘recommend to a friend’ factor would not be inhibited, such that it could ‘go viral’ if the overall experience was sufficiently compelling)

(c)    produced ‘customised/viewer shaped variants’ of the content (by dint of allowing the construction of a new plot structure as a result of a particular viewer’s responses/choices) that other viewers (who were given access to the ‘viewer-shaped’ content either in passive or interactive form) also appreciated (in a way clearly measurable in terms of viewing statistics and viewer ratings)

Bear in mind that the ‘shaping’ could be produced by just one ‘interaction pause’ or a succession of them.

However, as in the case of ‘alternate endings’, if the overall amount of interactivity in the viewer’s content consumption experience was low (few pauses and only ‘fixed’ options) then the whole ‘shaping’ exercise is likely to be seen by industry pundits as having been little more than ‘a/b testing of content variants’ (not exactly an earth-shaking innovation).

If, however, the interaction is more pervasive (from, for example, many more interaction points) or is much more sophisticated (where the ‘conversation’ at the interaction point is more ‘unconstrained’ and consequently opens up a wider range of plot development possibilities) and/or the ‘plot changes as a result of the interaction’ are more ‘shaped’ (i.e., they could not necessarily have been foreseen by the initial content developers) and this produces a disproportionately receptive response, then this will inevitably hold out the possibility of being seen as a pivotal moment in the history of interactively structured video.

From that moment until the result is either shown to be comparatively easy to replicate, or a subsequent failure to produce comparable results with different content and/or viewers consigns the initial success to having been a flash in the pan, the entire content industry will be on tenterhooks to discover whether the seemingly impossible dreams of the early proponents of interactively structured TV content have finally been realised.

It’s worth taking a brief look at the kinds of technological developments that hold out the possibility of solving the problems associated with interactively structured TV (as I have described it above; many other things are referred to under the ‘interactive TV umbrella’, but this article is focussing exclusively on the ‘user generated/shaped/structured content’ angle).

In order for the content production costs to be unaffected by the number of possible choices that the user could make, and the number of points at which those choices could be made, the content essentially has to be generated ‘on the fly’.

In this sense (and probably in this sense alone) the kind of interactive TV we are discussing is exactly like a video game.
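The difference between pre-authored branches and content generated ‘on the fly’ can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a description of any real engine: StoryState and generate_scene are hypothetical names, and the ‘scene’ is just a line of text standing in for the animation, speech and dialogue a real system would have to produce.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StoryState:
    """Everything the generator needs to know about the story so far."""
    choices: List[str] = field(default_factory=list)


def generate_scene(state: StoryState, choice: str) -> Tuple[StoryState, str]:
    """Compose the next scene from the current state and the viewer's choice.

    Nothing is looked up from a library of pre-made scenes, so the authoring
    effort does not depend on how many paths viewers might take.
    """
    new_state = StoryState(choices=state.choices + [choice])
    scene = f"Scene {len(new_state.choices)}: the story follows '{choice}'"
    return new_state, scene


state = StoryState()
for choice in ["confront the rival", "leave town"]:
    state, scene = generate_scene(state, choice)
    print(scene)
```

The point of the design is that the thing being authored is a single generator rather than a tree of scenes; whether such a generator can produce anything worth watching is exactly the open question discussed in the rest of this article.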

The aspect that would be profoundly different from any existing video game (the relevant kind of game in this comparison would be one which has an emphasis on ‘maxing-out’ visual realism) is that the ‘physical pursuit and conflict-driven elements’ which are prevalent in those games would not be the primary dynamic of ‘non-video game content’, and that key aspects of interaction with the content (and also the narrative form of the content itself) will need to be quite different.

For the moment, in this context, we can probably best label the interactively structured content that we are considering ‘non-game-like dramatic fiction’.

In this case, the only game-like feature would be the ‘contrivances’ in the script which make the ‘decision points’ compelling, or at least not unacceptably offputting.

A key question which arises at this stage (when we are considering the relationship between interactive TV and video games) is whether there is anything to be learned in terms of new opportunities to introduce increased ‘immersiveness’ into either the content or the content presentation experience.

The kinds of immersiveness (when they are successfully achieved) in passive viewing and in video games may seem similar if not identical, but (for the same reason that nobody can be sure whether the introduction of ‘stop and ask the viewer questions’ interaction with otherwise passively viewed content will be something that can somehow be done in an acceptable way) nobody is quite sure whether there is a way of successfully engaging the viewer (or gamer) when the content doesn’t have the same ‘plot dynamics’ as the video games that are popular today (which the content obviously won’t have if it is not a game).

There is a potential game-changer in immersiveness on the immediate horizon

It is not at all inconceivable that both the video-game industry and interactive non-game TV content industry will be shaken up by a new way of achieving a hitherto impossible level of ‘vicarious immersiveness’.

What if there was a way of having an experience which was much more convincingly like that of ‘physically being in a movie setting’ or ‘physically being in a game setting’, and which was so much more realistic than ‘watching the action on a screen’ that it completely changed the nature of either experience?

This is usually described by pundits as a ‘Holodeck experience’.

In this ‘more realistic environment’ the increased level of realism is so ‘immersive in itself’ that all the previously established levels of dependence upon such things as ‘fulfilling missions’ (which is an essential feature of most of the relevant kinds of games) or upon the need for ‘compelling plot dynamics’ (which is an essential feature of passive movie watching) are seriously eroded.

If this were true of a game setting, the realism of the setting might be so strong that instead of just wanting to get on with ‘playing the game’ the player might be tempted to pause the game and spend time ‘drinking in the experience’ of exploring the game’s setting (in a way which they would be far less likely to be tempted to do in a game with the current, less immersive level of realism) because the new, higher level of realism had made it at least as enjoyable to do this ‘non-game based exploring’ as it was to actually play the game.

In a non-game interactively structured TV content setting, this higher level of realism might make a ‘passive viewer’ have exactly the same experience (of finding the temptation to ‘explore the movie’s setting’ irresistible) as the game player would find in their newly discovered ‘more enticing-due to being more realistic’ game setting.

However, this higher level of realism might also produce two other effects which would constitute ‘overcoming the hitherto insurmountable barriers to interaction for passive viewers’.

Firstly, this enhanced realism could make the interaction so ‘pleasant’ that it would overcome whatever potential negative reaction there would be to ‘pausing and requiring input’ (which is often referred to as ‘lean forward’ or ‘computer user mode’ and is known to be unacceptably unattractive to us when we want to be exclusively in ‘lean back’, or TV watching mode).

Secondly, the extent to which this enhanced realism might somehow manage to render ‘game-like interaction’ (i.e., interaction which goes far beyond ‘stop and ask/answer a question’, despite the fact that the content is still nothing like the current ‘conflict and competition’ model of games) palatable to non-game players (or may even somehow manage to ensnare die-hard game-haters) points to what might essentially constitute a completely different kind of content and content consumption experience which has a ‘feel’ which is not quite like either a game or a movie, and possibly not even really anything like a combination of the two either.

We now need to consider the current state of other well-known aspects of ‘realism’ as it relates to creating interactively structured TV.

Realistically natural character animation

The ‘realism’ aspect is essentially addressing the extent to which ‘less than realistic characters and settings’ produce what is known in the trade as a ‘cartoony’ look, which, whilst it has an appeal all of its own in games (where in some cases the realism is deliberately reduced in order to give a more ‘stylised’ feel) has a totally ‘stigmatising’ effect on ‘live action content’.

Where are we now?

Judging by the latest games, we are getting very close to ‘live action quality’ in our animation technology, but it needs to be stated that there is a kind of ‘doubly uncanny valley’ about this: nobody is quite sure where the ‘threshold of acceptability’ of animation realism lies.

Does acceptability occur at a level where realism is not quite perfect (i.e., where it is still possible to detect that it is animation rather than live action) or is total undetectability the only widely acceptable offering, if viewers are not to reject the content because it is ‘really only just a cartoon’?

If perfection does turn out to be required, then it also needs to be stated that currently, we certainly can’t do animation with undetectable artifacts reliably in real time and even for ‘cut scenes’ (where the content was carefully ‘hand-crafted’ outside the viewing experience) few would claim that the uncanny valley has been comprehensively and irrevocably crossed.

The real-time vs. cut-scene distinction is crucial to the user generated/shaped content aspect of interactive TV.

Even if we totally nail the visuals, we still have a showstopping speech synthesis problem

Unless you can make the characters speak convincingly, even the most riveting real time plot development and totally realistic action will do little more than draw unbearably painful attention to the unacceptably robotic sounding speech emanating from the characters’ mouths.

Maybe this will mean that the very first examples of this may turn out to be comedy material (although it is most likely to be of the ‘unintentional humour’ variety, because even the most sophisticated AI yet developed is not very good at being able to write deliberately funny dialogue), a fact which brings us to the biggest problem of all:

We still don’t know how to make software that is able to write dialogue that anyone would find compelling for any other reason than to see how inhumanly clumsy (and unintentionally funny) it might be.

So the checklist (which enables us to know when ‘interactively structured video’ might be a viable source of content) would be:

Either:

Immersiveness which on its own overcomes both barriers to interaction and shortcomings of realism

when the enhanced realism of a game-like virtual reality technology platform succeeds in rendering non-game content so attractive as an ‘exploratory experience’ that it at least to some extent ‘undermines the passive viewer’s inherent resistance to interaction’ and/or it seriously undermines resistance to other well-established ‘passive viewing barriers’ like ‘cartoonyness’ and any shortcomings in terms of speech realism, as well as plot and dialogue ‘compellingness shortcomings’

Or advances in realism of synthetic visuals, speech, and script

    1. when real-time character animation becomes so realistic that it eliminates the stigma associated with ‘non-live-action’ content
    2. when speech synthesis becomes so natural that it ceases to distract the listener from its content and becomes ‘fit for purpose’ as a basis for delivering dialogue that is ‘composed on the fly’ in interactively structured audio (and video) content
    3. when AI software can successfully compose and implement compelling action scripts and dialogue on the fly in response to user interaction

At the moment, it looks as if 1 is the closest (a few years off at least, possibly something that the next but one generation of video game development engines may deliver), 2 is a bit further off (but probably less than ten years away) and 3 is still pretty much unknown.