Slide
1 / 60

Yet another SIGGRAPH is over. That was a great conference with lots of great stuff. But now it's time to look at some of the things a little closer.

[The following comments and notes are intended to explain the talk in more detail.]






Slide
2 / 60

How to Get From 30 to 60 Frames Per Second in Video Games for "Free". Presented as a part of "Split Second Screen Space" session on Monday, 26 July | 2:00 PM - 3:30 PM | ROOM 408 AB.

The talk is available on the DVD set as well as online at

http://siggraphencore.myshopify.com/products/2010-tl037

Additional material is available at

http://and.intercon.ru/releases/talks/rtfrucvg






Slide
3 / 60

As we will see later, the idea behind the presented technique is so simple that it might seem obvious. But the truth is that it's not obvious at all. Most of its parts require looking at them from different perspectives and understanding why something is the way it is: asking what the problem really is, why it's a problem, and so on.

Moreover, it turns out that most of the implementation issues are very specific to the rendering pipeline that you might have, your memory budgets and platform limitations. Other possible issues could be avoided completely at the art and design levels during production (if there are any).

Thus, the talk is designed to reach a broader audience of people who might be involved in video game production (engineers, artists, designers) and should not be understood or interpreted literally. For that reason, we are going to discuss our current real-time implementation only, as it shows most of the aspects of the technique one should think about when implementing it. The implementation itself is a lot less important.






Slide
4 / 60

A real-time live demo is shown running Star Wars: The Force Unleashed 2 on the XBox360, comparing regular 30 fps rendering without motion blur, 30 fps with motion blur (our current solution) and real-time conversion from 30 to 60 fps.

To better feel the difference between 30 and 60 fps, one should try to look at the background while it moves over the screen. Of course, it is not just about the background, and going from 30 to 60 might not look like a big deal at first. But once you get a feel for it, it's really hard to switch back to 30 fps, as it feels quite uncomfortable.






Slide
5 / 60

"How do we come up with new ideas?" is a very interesting topic. It is very different in every case but it probably is the most interesting part of it, which is usually not discussed.

It started on our way back home from SIGGRAPH 2008. Cory, Cedrick and I somehow (I would really like to remember how) got into this discussion about software video players on PC, and how they can do the same thing that those 120Hz HDTVs do: making regular movies look smoother by reconstructing the natural motion of objects.

I thought that was great, because you don't actually need a TV to play with it. I knew that there was WinDVD with its Trimension technology, and AviSynth with the MSU FRC filter that could be applied through the FFDShow codec. But it took that discussion to start thinking about it from a different perspective.

http://compression.ru/video/frame_rate_conversion/index_en_msu.html

So as soon as I got back home, I started to play with it, and soon after that realized that there are a lot of issues. Mostly artifacts of different kinds that appear in more or less complex scenes, as well as performance issues (it is really slow when done properly). And to better understand the problem, I made a very quick and simple prototype to play with.






Slide
6 / 60

Most of the techniques have two main steps. The first one is motion estimation. At this point we try to figure out how different parts of the picture move from frame to frame. The result of such estimation is what's called Motion Vector Field.

Diamond Search is the simplest way to do the motion estimation. It is usually used in video compression codecs as a part of the motion compensation solution to improve the compression ratio. But it is very limited. More sophisticated optical flow solutions require a lot more computational resources and memory, and they introduce extra latency when executed in real-time.

Some video players (try to) cheat by using motion vectors from the MPEG stream itself, but that is not quite correct (it doesn't always work). Since their main purpose is to help with compression, those vectors don't have to match the actual motion; it is really up to the encoder.






Slide
7 / 60

The second step is interpolation itself. At this point the inner frames (A1, A2, A3, etc) are constructed from the outer ones (A, B) based on the motion vector field, the result of motion estimation.

One example of how this can be done for video sequences is the MSU Frame Rate Conversion Method by Dr. Dmitry Vatolin and Sergey Grishin.

http://compression.ru/video/frame_rate_conversion/index_en.html

Obviously, there are a lot of different ways of doing it, but the point is that it gets really complicated once you have to adjust for scaling, rotation, transparency and dynamic lighting effects of any kind. In fact, most of the complexity related to interpolation is due to reconstruction of the original data and conditions.






Slide
8 / 60

While playing with the prototype, which obviously had a lot of issues related to the stability of the motion vector field since I used a very simple estimation technique, a new idea emerged.

I was trying to think of ways to implement optical flow on the GPU (just for the sake of it, no real reason) when I noticed some similarities. Usually, when working with video processing and compression, the motion vector field is visualized as a grid of arrows pointing in the corresponding directions. But when working with real-time graphics (in video games) it is more common to display that kind of data with color coding.

And as all the visualization that I had was working on the GPU through Direct3D, it was a lot easier to visualize the result of motion estimation with color coding. If you have ever implemented motion blur in games, it immediately becomes obvious that the two look the same or at least very similar.

This is where one story ends and another begins, because since then I never got back to the original problem and never got the optical flow working on the GPU. Maybe some other time :)






Slide
9 / 60

Now the question is "why?" Why do we even need to do any estimation when, in games, we already know everything?

We already know how things are moving as we have full control over them. This way we don't need to do any kind of estimation. Moreover, when we do the interpolation, we can handle different things differently, depending on the kind of quality we are happy with.

On top of that, we can use different interpolation techniques for different parts of the image, such as layers of transparency, shadows, reflections and even entire characters.






Slide
10 / 60

But before we go any further, let's see what it means "to run at 30 fps". There are at least two things. The first one has to do with the game itself.

Compared to 30 fps games, in the case of 60 fps we have to go with fairly simple, more old-school rendering techniques. It is not impossible to make a 60 fps game, obviously, but it requires a much stricter production process for art, design and engineering. It is fair to say that in a lot of cases, during pre-production, studios try to see what it would take to make a 60 fps game. Then they get something that doesn't look very pretty when running at 60, and realize that all the art, as well as the level and game design, has to be produced very carefully.

At that point the decision making happens and, usually, the decision is to go with 30 fps. Running at 30 fps allows for some sloppiness in the production process, which is not necessarily bad, since it lets the team focus on other things.






Slide
11 / 60

The second thing has to do with our perception. With the way our visual system and display devices work.

It might seem obvious that the more frames we can render the better. We might think that this is all there is to it. But then, at some point, we might see no difference and say that we just don't need more frames to get a better perception of motion, as if it saturates at some point. We can go on and on speculating about it, but the real question that we should ask is "why?". Why does it really look smooth when running at 60 fps?

A very simple experiment could be made. Capture your 30 fps game at a 60Hz rate and paint every other frame black. You will be surprised how good the motion looks. It will flicker really badly though, but in terms of motion it will look very smooth.

So let's see what it really is about.






Slide
12 / 60

Perception-wise it has to do with motion eye-tracking.

Our visual system has a certain temporal function. In fact, it is continuous. Numerous experiments have shown no signs of frame-based processing of any kind, even though we can observe effects such as the "wagon-wheel effect" in real life. But those are mostly due to either the stroboscopic effect or the "motion during-effect", when a motion after-effect becomes superimposed on the real motion.

Display devices, on the other hand, have very different kinds of temporal functions. The one on the slide shows an idealized LCD display.

http://en.wikipedia.org/wiki/Comparison_of_display_technology

Now, suppose we have a moving object that we eye-track. At some point the display goes on, projects it on our retina, and then goes off. Meanwhile, the eye keeps continually moving to the next, expected position of an object.

http://www.poynton.com/PDFs/Motion_portrayal.pdf

http://en.wikipedia.org/wiki/HDTV_blur

http://www.lifesci.sussex.ac.uk/home/George_Mather/Motion






Slide
13 / 60

But halfway to that point, the next expected position of the object, the exact same frame gets displayed again. Thus, instead of one stable picture of the object, we end up with two overlapping and, in fact, flickering pictures of the same thing.

It really looks as if the object moves one step backward and two steps forward all the time, and we really see all the steps. Not quite what our brain would expect from a real-life experience.

When the motion is slow, this temporal artifact is less significant. But as things start moving faster and are observed on a bigger screen (HDTV), which gives more room for eye-tracking, it becomes more of an issue.

How do we reduce this temporal artifact?






Slide
14 / 60

We don't allow fast motions :) or we use motion blur (not because we see it in movies).

There is still some flickering left. It is just that both images look pretty much the same. The difference between the two is very small, so there is very little chance that something very bright will overlap with something very dark. This way it looks more even.

But real (natural) motion blur happens in the EYE, not in a camera of any kind. So when we talk about virtual reality applications such as video games, we should think of the camera as more of a portal, a window, which doesn't have any temporal properties of its own.

Naturally, when looking at something moving, we try to eye-track it in order to reduce the blur. But when it's blurred relative to the camera and not to the eye, the object appears blurry all of the time, whether we eye-track it or not, thus the sharpness is reduced. Well, we can't even talk about any kind of sharpness at that point.

Ideally, motion blur would not be used at all if we could render and display a moving object at extremely high frame rates, since the photon flow into the eye would be continuous.






Slide
15 / 60

In movies, though, the use of motion blur is less of an issue, due to the non-interactive nature of the medium and the good assumptions cinematographers make about what the viewer's eyes are going to be looking at. Of course, those are more than just assumptions. There are artistic ways to control the eye (e.g. the use of depth of field), but it is not enforced in any way. Just for fun, the next time you go to a movie theater, try not to look at the center of the screen. Always look around and try to eye-track the background. You will see how bad they really are, especially the action scenes.

While eye controlling works for movies, it doesn't quite work for action games that way. You can't make those kinds of assumptions. Imagine you stand in the middle of a room and spin the camera around the character. Is it the character you are looking at? Or are you looking at the background, searching for a door or an enemy? Most likely it is going to be the second option, the opposite of the movie. Because whatever is near the character is taken care of by your hands on the controller, you are most likely to look around.

The picture above is an example of motion blur in games: our current implementation in Star Wars: The Force Unleashed 2 on the PlayStation3.






Slide
16 / 60

So how do we get both: the quality of 30 fps rendering and the fluid, natural motion of games running at 60 fps?






Slide
17 / 60





Slide
18 / 60





Slide
19 / 60

The idea is very very simple. We do frame rate up-conversion :)

Render the velocity buffer as we would do for motion blur. Build the middle frame. And present it at the moment in time it was intended for.

Note that in the case of 30 to 60 fps conversion, the inner frame has to be presented at the middle of the frame. This is all it is, no more, no less. The rest is the implementation itself, which is rather tricky.






Slide
20 / 60

In order to be practical, this whole process has to be very fast, with minimal memory and performance overhead. It might not seem like a big deal on PC, but in the console world it is. Looking for an extra megabyte of memory, as well as an extra millisecond of GPU time, might be very challenging. So, ideally, we need to minimize the overhead as much as possible (unless you don't have those issues), reusing as much of the available data as possible.

When running at 60 fps, we can get away without the motion blur, especially on consoles, when playing with a controller and not the mouse. Thus we cannot really spend more than 2-3 ms to do the job, which is the average cost of motion blur. Otherwise, it would not be practical.

It is possible and it is doable. Let's see.






Slide
21 / 60

The typical rendering pipeline of a modern video game looks something like this. We render the depth buffer first, to take advantage of Hi-Z early rejection, as well as to use it for further deferred screen-space techniques like shadows, lighting and SSAO (Screen Space Ambient Occlusion). Then we do the main color pass, using the results of shadowing, lighting and occlusion. Render transparent objects. And then do all the post processing effects at the end, resolving the rest of the HDR effects, depth of field, motion blur and anti-aliasing.

[DLAA (Directionally Localized Anti-Aliasing) - our custom anti-aliasing algorithm implemented in Star Wars: The Force Unleashed 2. In terms of picture quality it is comparable with MLAA, but due to its perception-based probabilistic nature it features high temporal stability, and it is implemented on both the GPU (X360, PC) and the PS3 SPUs.]






Slide
22 / 60

The trick is that between the beginning of the frame and the middle, where the interpolated frame has to be presented, there are 16.6 ms. If depth buffer rendering takes about 2-3 ms (it should not be more than 10%) and the technique itself 2-3 ms, then there is plenty of room to do everything before it gets presented.

Therefore, as soon as the depth buffer is ready (at the beginning of the frame), we render the velocity buffer and do the interpolation. Construct the inner frame, which is a new front buffer, and flip it in the middle. It is most likely that the GPU will be working on the current frame at the time the flip happens. That might require a custom flipping mechanism, though, which we will discuss later.






Slide
23 / 60

The simplest and most efficient solution is to do the interpolation in-place, during the current frame, while the previous one is on screen. This way the previous front buffer can be mapped as a texture (X360) and used for interpolation directly. In terms of latency there is something interesting going on. I said that there is no extra latency, which is true. But if you think about it, latency is actually reduced, because we get the new visual result 16.6 ms earlier. You see the result of your actions earlier.

All the cons are manageable in one way or another. Usually it depends on what kind of motion you have, camera as well as character motion.

For instance, in the case of a fixed camera there are almost no issues. But when the camera is moving freely, this might cause reflections to move in the wrong direction, whereas shadows are not interpolated at all. So shadows tend to look as if they were rendered at 30 fps and displayed at 60. In the case of soft, transparent shadows this is not very noticeable.

So it is really about the latency and the memory.






Slide
24 / 60

Interpolation, this is where the real fun begins.

Now I must emphasize one more time that there are many ways of doing the interpolation, and it should be customized or done individually in each particular situation. Moreover, it should be thought of as a way to construct a new frame from the cache, which is everything else (previous or next frames).

It turns out that most of it is artifact fighting, and the final quality depends on how well we can handle those artifacts. One interesting thing, though, is the fact that it doesn't have to be perfect. We perceive the interpolated frame in a sequence. While looking for an even better interpolation solution the other day, I found something funny. There was a bug in the code which led to very strong distortions of the frame, but I could not notice that there was something wrong with every other frame until I hit pause. It was a tree with lots of leaves, and every other frame it didn't look like a tree at all. More of a big green blob of noise.

That made it easier, because in the end it's all about perception. Interpolation should be implemented and tested in real situations and always tested in motion.






Slide
25 / 60

Alright, now let's discuss our current (one-frame-based, real-time) implementation step by step.

Let's do the simplest thing and sample the previous frame halfway backwards based on the current velocity (the direction depends on your particular convention for velocity vectors). It is not quite correct, because the velocity buffer is constructed for either the previous or the current frame. So it might have some issues with the edges, as we can see.
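To make that first step concrete, here is a minimal CPU-style C++ sketch of it, assuming a velocity convention where each pixel stores its screen-space motion from the previous frame to the current one, in pixels. The buffer layout, helper types and nearest-neighbor fetch are illustrative assumptions, not the actual shader used in the game.

#include <algorithm>
#include <vector>

struct Color    { float r, g, b; };
struct Velocity { float x, y; };   // per-pixel screen-space motion, in pixels

// Build the inner (interpolated) frame by sampling the previous frame
// halfway backwards along the current velocity.
void BuildInnerFrame(const std::vector<Color>& prevFrame,
                     const std::vector<Velocity>& currVelocity,
                     std::vector<Color>& innerFrame,
                     int width, int height)
{
    for (int y = 0; y < height; ++y)
    for (int x = 0; x < width;  ++x)
    {
        const Velocity v = currVelocity[y * width + x];

        // Go half a frame backwards; nearest-neighbor fetch for simplicity,
        // a real implementation would use bilinear filtering on the GPU.
        int sx = std::clamp(int(x - 0.5f * v.x + 0.5f), 0, width  - 1);
        int sy = std::clamp(int(y - 0.5f * v.y + 0.5f), 0, height - 1);

        innerFrame[y * width + x] = prevFrame[sy * width + sx];
    }
}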






Slide
26 / 60





Slide
27 / 60





Slide
28 / 60

Static geometry tails are not very noticeable in motion. The worst thing that can happen is that they will look as if they were rendered at 30 fps. Again, under standard 3rd person camera motion conditions they are very unlikely to cause any major problems. To prove the point, you can look at the supplementary video part 2 (live XBox360 demo). Even though we are going to fix them up, this feature was disabled in the demo that was captured.

Dynamic geometry tails, on the other hand, are more noticeable, due to the additional transformation of the characters and the fact that they can move in opposite directions. So characters need some special treatment.






Slide
29 / 60

In case of camera translation, distant objects tend to move in the same direction at almost the same speed due to perspective projection. In practice, it is just a few meters further away from the character where all those same motions happen. Thus pretty much any static geometry behind the character (the point of rotation of the camera) won't cause any noticeable issues.

But when some static objects get really close to the camera and start moving too fast, we can treat them as dynamic ones (just like characters, which are coming up).






Slide
30 / 60

Rotation around the character causes most of the issues, mainly for objects that are between the camera and the origin of rotation.

Again, in most situations, for game-play purposes, there aren't any such objects, because in that case the character would be occluded. So the camera system would always try to avoid that.

But if it happens, those objects can be treated as dynamic ones and handled in a special way, just like the characters.






Slide
31 / 60

There are many different ways to filter the velocity buffer as well :) It really depends on what kind of artifacts you get. The easiest one is velocity merging, where we take the previous velocity buffer and somehow (e.g. based on the depth value) combine it with the current one, which adds extrapolation regions in addition to interpolation.

In some motion cases, the extrapolation error might be very significant. So another option is to scatter the edges of the objects backwards iteratively, until they cover the problematic area. In the case of a GPU implementation, that would be gathering rather than scattering.

Now, what about all those dynamic objects and characters? Dynamic objects could be re-rendered completely at 60 fps, or rendered separately from the environment, so that we can handle the overlaps perfectly. But this works best for forward rendering, as it's easy to render things separately. Deferred techniques make this a lot more complicated. Thus, in our particular case, we need something else.

Additionally, we may use as much additional data as we can to help with artifact detection and fix-up. E.g. using previous frames, previous velocity buffers.






Slide
32 / 60

The best way to deal with interpolation artifacts is probably to use a two-frame-based solution, where both the previous and the next frames are available. This obviously introduces an extra frame of latency and requires more memory. There are two approaches to two-frame-based interpolation.

One frame can be used to patch the other one. This might have additional issues with dynamic lighting, as we assume that the other frame looks the same in terms of color. If, let's say, lightning strikes, the other frame will have very different colors and we technically can't use it. But we can avoid that by making the lightning raise the light level gradually, so that the possible error is minimized.

Alternatively, we can extrapolate the previous frame halfway toward the next frame, do the same thing to the next frame but backwards, and blend them together. This minimizes the artifacts related to semi-transparency, shadows and reflections. Part 1 of the supplementary video shows this particular interpolation technique (as a prototype, it doesn't do any extra artifact removal, as it was intended to test shadows, reflections and alpha).
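A minimal sketch of that second, blended variant, under the usual gather approximation (the velocity is read at the destination pixel rather than scattered from the source). It assumes one velocity buffer aligned with the previous frame and one aligned with the next frame, both storing motion from the previous to the next frame in pixels; the names, the fetch helper and the 50/50 weight are illustrative.

#include <algorithm>
#include <vector>

struct Color    { float r, g, b; };
struct Velocity { float x, y; };  // motion from the previous to the next frame, in pixels

static Color Fetch(const std::vector<Color>& img, int w, int h, float x, float y)
{
    int ix = std::clamp(int(x + 0.5f), 0, w - 1);
    int iy = std::clamp(int(y + 0.5f), 0, h - 1);
    return img[iy * w + ix];
}

void BlendTwoFrames(const std::vector<Color>& prevFrame,
                    const std::vector<Velocity>& velocityAtPrev,  // defined at previous-frame positions
                    const std::vector<Color>& nextFrame,
                    const std::vector<Velocity>& velocityAtNext,  // defined at next-frame positions
                    std::vector<Color>& innerFrame, int w, int h)
{
    for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x)
    {
        // Previous frame pushed half a frame forward (gathered backwards).
        Velocity vp = velocityAtPrev[y * w + x];
        Color a = Fetch(prevFrame, w, h, x - 0.5f * vp.x, y - 0.5f * vp.y);

        // Next frame pulled half a frame backwards (gathered forwards).
        Velocity vn = velocityAtNext[y * w + x];
        Color b = Fetch(nextFrame, w, h, x + 0.5f * vn.x, y + 0.5f * vn.y);

        // Blend the two halves; errors in one tend to be hidden by the other.
        innerFrame[y * w + x] = { 0.5f * (a.r + b.r),
                                  0.5f * (a.g + b.g),
                                  0.5f * (a.b + b.b) };
    }
}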






Slide
33 / 60

Usually, the velocity buffer is rendered by back-projecting the depth buffer. But it could also be re-rendered, which would give more accurate interpolation results, as the edges (objects) are at their exact position. This might be too slow to do though.

Taking into account that static geometry causes far fewer issues than dynamic geometry, we can re-render dynamic geometry only, which is what we will do.

It looks something like this: render static depth, resolve, downsample the depth, then continue depth rendering with the dynamic geometry (this applies to the main as well as all other passes). Now, back-project the downsampled depth to construct the static part of the velocity buffer, and render the dynamic objects at their exact, interpolated positions. At this point, dynamic objects can generate a mask (e.g. output some kind of id into the alpha channel).

In our case, all the dynamic objects have the same id in the alpha channel, as that works just fine in most cases. Having this mask allows us to do velocity filtering and interpolation more accurately.
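The back-projection step for the static part can be sketched roughly like this, in the spirit of the motion blur setup from GPU Gems 3 (Rosado) listed in the references. The matrix conventions, depth range and helper types here are assumptions for illustration, not the exact math from our pipeline.

#include <vector>

struct Float2  { float x, y; };
struct Float4  { float x, y, z, w; };
struct Matrix4 { float m[4][4]; };  // row-major, row-vector * matrix convention

static Float4 Transform(const Float4& v, const Matrix4& M)
{
    return { v.x*M.m[0][0] + v.y*M.m[1][0] + v.z*M.m[2][0] + v.w*M.m[3][0],
             v.x*M.m[0][1] + v.y*M.m[1][1] + v.z*M.m[2][1] + v.w*M.m[3][1],
             v.x*M.m[0][2] + v.y*M.m[1][2] + v.z*M.m[2][2] + v.w*M.m[3][2],
             v.x*M.m[0][3] + v.y*M.m[1][3] + v.z*M.m[2][3] + v.w*M.m[3][3] };
}

// Static-geometry velocity from depth: reproject each pixel of the current
// frame into the previous frame and take the screen-space delta.
void BackProjectStaticVelocity(const std::vector<float>& depth,   // post-projection depth [0..1]
                               const Matrix4& invViewProj,        // current frame, clip -> world
                               const Matrix4& prevViewProj,       // previous frame, world -> clip
                               std::vector<Float2>& velocity,     // output, in pixels
                               int width, int height)
{
    for (int y = 0; y < height; ++y)
    for (int x = 0; x < width;  ++x)
    {
        // Current pixel in normalized device coordinates.
        float ndcX = (x + 0.5f) / width  * 2.0f - 1.0f;
        float ndcY = 1.0f - (y + 0.5f) / height * 2.0f;
        Float4 clip = { ndcX, ndcY, depth[y * width + x], 1.0f };

        // Unproject to world space, then reproject with the previous camera.
        Float4 world = Transform(clip,  invViewProj);
        Float4 prev  = Transform(world, prevViewProj);
        float prevNdcX = prev.x / prev.w;
        float prevNdcY = prev.y / prev.w;

        // Screen-space motion of this (static) point, in pixels.
        velocity[y * width + x] = { (ndcX - prevNdcX) * 0.5f * width,
                                    (prevNdcY - ndcY) * 0.5f * height };
    }
}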






Slide
34 / 60

Velocity merging is the most basic but very efficient velocity filtering technique. The problem is that for static geometry (without re-rendering) the velocity buffers are defined for the current and the previous frames only, but not for the frame in-between. Therefore, going halfway backwards based on the current velocity buffer will not cover the tail of the object, as the tail is defined at the object's previous position. On the other hand, using the previous velocity buffer only and doing extrapolation will miss the head of the object, whereas the tail will be fine.

This simple method merges the two, such that most of the object is interpolated, whereas its tail is extrapolated. When the camera moves continually (interpolated), the difference between extrapolation and interpolation will be minimal. But if the direction changes significantly, it will no longer be correct. At higher speeds (very close objects) this might lead to a "heat distortion" effect. In those cases we can treat the objects as dynamic ones.

We can also test the direction while doing the merge, so that it won't make things look worse than if they were rendered at 30 fps. In the case of a 3rd person camera this should not happen.
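One possible merging rule, written as a small C++ sketch: where the current velocity buffer shows (almost) no motion but the previous one does, the pixel most likely belongs to a tail the object has just vacated, so we fall back to the previous velocity and extrapolate there. The threshold and the exact rule are illustrative assumptions, not the precise test used in the game.

#include <vector>

struct Velocity { float x, y; };  // screen-space motion in pixels

static float Dot(const Velocity& a, const Velocity& b) { return a.x * b.x + a.y * b.y; }

void MergeVelocity(const std::vector<Velocity>& current,
                   const std::vector<Velocity>& previous,
                   std::vector<Velocity>& merged,
                   int width, int height)
{
    const float kStillSq = 0.25f;  // "no motion" threshold, squared pixels

    for (int i = 0; i < width * height; ++i)
    {
        const Velocity c = current[i];
        const Velocity p = previous[i];

        // A pixel with no motion now but motion before is likely a freshly
        // vacated tail region: extrapolate with the previous velocity there.
        const bool tail = Dot(c, c) < kStillSq && Dot(p, p) >= kStillSq;

        merged[i] = tail ? p : c;
    }
    // A further refinement (mentioned in the notes) also tests the direction
    // of the previous velocity, so the merge never looks worse than plain 30 fps.
}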






Slide
35 / 60





Slide
36 / 60





Slide
37 / 60

Alright, what about the dynamic characters and deferred rendering? Re-rendering something is usually not an option. Hi-Z and Hi-stencil are already utilized to the maximum at almost each and every step. So new ideas were needed.

It happened that I was studying biological vision around the same time I was trying to solve this dynamic character problem. We have blind spots, two in each eye: one that affects our night vision at the center, and another where the optic nerve leaves the eye, which is slightly off-center. The fact that we don't notice them at all is very fascinating. It all happens in the brain, as it reconstructs the missing part of the view. Actually, you can train yourself to notice them.

Well, it would be really nice to make those characters and other problematic regions disappear just like that. So after some experimentation, there it was, working like a charm.






Slide
38 / 60

But whatever we do has to work in motion as well. What is so special about the motion?

Take a piece of paper and make a small hole in it, such that you can only see a little bit of the outside: a small patch of the environment, just enough to recognize the texture. Now, what are you most likely going to see when you move it around? The exact same texture! The exact same thing. If the area is too small, we can't recognize the texture, and then we cannot predict what we are going to see next. If the area is too big, then we can't predict anything either, as we will start to recognize the actual objects. So it has to be just the right size.

The regions that we are trying to fix up are due to occlusion. We see the object in front, and we can eye-track it; no problems there. But when the occluded region slides out from behind the occluder, we are not able to track it, because it was occluded. So we can show whatever we want instead of the real thing, as long as it doesn't violate our expectations. In theory it should work just fine. And, in fact, it does work :)






Slide
39 / 60

Assuming that, most likely, horizontal stays horizontal, vertical stays vertical and textures repeat, all the characters and other problematic regions can be removed from the frame by synthesizing the interior defined by the mask. This mask can be constructed in many ways. In the case of characters, it is an alpha mask contained in the velocity buffer itself or rendered directly. In fact, it marks all the characters and all the problematic regions with the exact same id.

The whole process could be illustrated with nothing but Photoshop. It is that simple :)






Slide
40 / 60

Once the mask is generated, we leak small neighboring image patches into the area inside of it by duplicating and shifting the original layer up, down, left and right.






Slide
41 / 60





Slide
42 / 60





Slide
43 / 60





Slide
44 / 60

In the case of the Photoshop illustration, those four layers are simply blended together with some additional transparency, as there is no way to do conditional accumulation.






Slide
45 / 60

After repeating this process a few times, the character is gone. Additionally, we can blur the interior slightly. Well, in theory we only need to do this on the edges between the patches, to fix up the stitching. If it happens that a patch changes gradually from side to side, then on the edges we suddenly get higher frequencies that were not present in the original patch. Moreover, the patches that we are talking about are not real patches, so it would be quite expensive to find and handle all the edges individually. That's why it is easier to just blur the interior slightly.

As you can see in the supplementary video Part 2 (live demo), this process works for multiple characters as well. It doesn't have to go all the way through, completely removing the character. It only depends on the types of motion that we get. The less movement we get, the fewer iterations of the algorithm we have to run.






Slide
46 / 60

The real implementation, obviously, is a little different. We don't need to copy and shift anything. The most basic algorithm is this: sample the current pixel; if it's a character pixel, sample four pixels around it with some offset, accumulate only the ones that are not part of the character, and normalize the result of the accumulation at the end. If there are no such pixels around, leave the current one as is. It can be improved further by allowing only patches with more horizontal elements than vertical to move horizontally, and vice versa.
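A single iteration of that basic algorithm could look roughly like this in C++ (the real version runs as a quarter-resolution shader/SPU pass; the offsets, mask encoding and ping-pong buffering here are illustrative assumptions).

#include <vector>

struct Color { float r, g, b; };

void RemoveCharacterPass(const std::vector<Color>& src,
                         const std::vector<unsigned char>& mask,  // non-zero = character
                         std::vector<Color>& dst,
                         std::vector<unsigned char>& dstMask,
                         int width, int height, int offset)
{
    const int dx[4] = { -offset, offset, 0, 0 };
    const int dy[4] = { 0, 0, -offset, offset };

    for (int y = 0; y < height; ++y)
    for (int x = 0; x < width;  ++x)
    {
        const int i = y * width + x;
        dst[i]     = src[i];
        dstMask[i] = mask[i];

        if (mask[i] == 0)            // not a character pixel, leave it alone
            continue;

        Color sum   = { 0.0f, 0.0f, 0.0f };
        int   count = 0;

        // Leak in the four neighboring patches, but only from pixels that are
        // not covered by the character themselves.
        for (int k = 0; k < 4; ++k)
        {
            const int sx = x + dx[k];
            const int sy = y + dy[k];
            if (sx < 0 || sx >= width || sy < 0 || sy >= height) continue;

            const int s = sy * width + sx;
            if (mask[s] == 0) { sum.r += src[s].r; sum.g += src[s].g; sum.b += src[s].b; ++count; }
        }

        if (count > 0)
        {
            dst[i]     = { sum.r / count, sum.g / count, sum.b / count };
            dstMask[i] = 0;          // this pixel is now filled, the working mask shrinks
        }
    }
}

// Repeating the pass a few times (ping-ponging src/dst, keeping a copy of the
// original mask for the interpolation step) grows the fill inwards until the
// character is gone, or close enough for the motion we expect.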

Our current implementation, shown in the demo, runs at quarter resolution in 3 fixed passes. The total performance hit is 0.4 ms on the XBox360 GPU with dynamic branching (one branch), which is not nearly as efficient as Hi-Z/Hi-Stencil, so it could be improved further. In fact, only about 10% of the pixels have to be processed on average. So you do the math.

The PlayStation3 SPU implementation is completely branchless, running 3 software-pipelined iterations in parallel and processing 2 pixels per cycle (16-bit fixed-point math), at 0.1 ms per pass on 5 SPUs. It could easily be improved as well, by doing an in-place pre-pass that generates a list of blocks to process and then just running through that list. Branching sucks, software pipelining rulez :)






Slide
47 / 60

Interpolation can be improved further once we have a frame without the characters. Sample the front buffer halfway backwards based on the current velocity buffer. Sample the synthesized frame without the characters (its alpha channel still contains the mask of where things used to be before they got removed). And select the synthesized frame if it had a character (non-zero mask) where the current velocity buffer does not.
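A minimal sketch of that selection step, assuming the synthesized frame is the previous front buffer with the characters painted out and its alpha still holding the original character mask, while the velocity buffer carries a per-pixel character id. The sampling coordinates and data layout are illustrative assumptions, not the exact shader.

#include <algorithm>
#include <vector>

struct Color    { float r, g, b, a; };      // a = original character mask
struct Velocity { float x, y; float id; };  // id != 0 marks a dynamic character

static const Color& Fetch(const std::vector<Color>& img, int w, int h, float x, float y)
{
    int ix = std::clamp(int(x + 0.5f), 0, w - 1);
    int iy = std::clamp(int(y + 0.5f), 0, h - 1);
    return img[iy * w + ix];
}

void BuildInnerFrameWithCharacters(const std::vector<Color>& prevFront,    // previous front buffer
                                   const std::vector<Color>& synthesized,  // same frame, characters removed
                                   const std::vector<Velocity>& currVel,   // current velocity + character id
                                   std::vector<Color>& inner, int w, int h)
{
    for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x)
    {
        const Velocity v = currVel[y * w + x];
        const float sx = x - 0.5f * v.x;
        const float sy = y - 0.5f * v.y;

        const Color moved = Fetch(prevFront,   w, h, sx, sy);
        const Color clean = Fetch(synthesized, w, h, sx, sy);

        // If the previous frame had a character here (clean.a != 0) but the
        // current velocity buffer says there is no character at this pixel,
        // this is a freshly revealed region: show the synthesized content.
        const bool revealed = (clean.a != 0.0f) && (v.id == 0.0f);

        inner[y * w + x] = revealed ? clean : moved;
    }
}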

What about the character overlapping with itself? In our particular situation it does not seem to be an issue, even though you might see it in frame-by-frame playback. But if it is an issue, any kind of patching technique could be used, as described in the two-frame-based case.

Additionally, if there are issues with multiple dynamic characters, character removal could be extended to take into account the depth and different character ids.

The point is that we are not trying to solve all the issues at once, which might have quite a significant performance impact. It makes more sense to solve them on a per-case basis, depending on the particular motion conditions. There is always some simple little trick that can do the job.






Slide
48 / 60





Slide
49 / 60





Slide
50 / 60





Slide
51 / 60

Once the inner frame is constructed, it has to be presented at the moment in time it was intended for. In theory, standard triple buffering will do it, if the frame rate never drops under 30.

Now let's look at the typical flipping mechanism on the PlayStation3, as it is easier to understand. When running at 30 fps, a single complete frame can be flipped as soon as the second VBlank impulse is received, or immediately if by that time we have received more than two VBlank impulses. The first condition holds when we always run at 30 fps and thus can do the vertical synchronization. But when the frame rate drops under 30, we don't want to wait for the next VBlank, as in that case the frame rate would drop down to 20 fps. So we flip the buffer immediately, sacrificing the vertical synchronization and getting some screen tearing.

There are different variations of this technique, but this is the most basic one and is easily implemented on PlayStation3 with immediate flip functionality provided by cellGcmSetFlipImmediate and related functions. On the XBox360 it is even easier, but only until you need to modify it, which we will need to do.






Slide
52 / 60

In order to flip the interpolated frame and maintain frame coherency, the original flipping mechanism has to be modified. The easiest way to do this is to flip the interpolated frame based on the time between the two previous regular flips, measured since the last one. When we run at a solid 30 fps, the predicted frame flipping will match the VBlank impulse quite nicely. But once we go under 30 fps, it will try to present the frame in between the two regular ones, assuming that the frame rate changes continuously. This can be done from a separate thread, spinning and waiting for the right time to come, and then doing the flip.
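A rough sketch of that predicted flip, using standard C++ threads and timers as stand-ins for the console-specific mechanisms; the actual flip call is abstracted away and synchronization is omitted for brevity.

#include <atomic>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

std::atomic<bool> gInnerFrameReady{false};
Clock::time_point gLastFlip  = Clock::now();                       // time of the last regular flip
Clock::duration   gFlipDelta = std::chrono::milliseconds(33);      // time between the two previous flips

void PresentInnerFrame() { /* platform-specific: make the interpolated buffer the front buffer */ }

// Called right after every regular (30 fps) flip to update the prediction.
void OnRegularFlip()
{
    const auto now = Clock::now();
    gFlipDelta = now - gLastFlip;
    gLastFlip  = now;
    gInnerFrameReady = true;   // the inner frame for this interval is (or will be) ready
}

// Separate thread: spin until the predicted mid-point of the frame, then flip.
void InnerFlipThread()
{
    for (;;)
    {
        if (gInnerFrameReady)
        {
            const auto target = gLastFlip + gFlipDelta / 2;  // halfway between regular flips
            while (Clock::now() < target)
                std::this_thread::yield();                   // spin/yield until it is time

            PresentInnerFrame();
            gInnerFrameReady = false;
        }
        std::this_thread::yield();
    }
}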

On the XBox360, though, implementing the same fallback mechanism is quite tricky. The Direct3D API provides almost no control over the hardware, which really sucks and makes implementing rather basic things quite complicated.

Of course, if we can guarantee that the frame rate never drops under 30 fps, then there are no issues at all. Just do the basic triple buffering through asynchronous swaps and that's it. What if it does drop? Well, we can always detect that, simply disable the whole thing and run at a regular 30 fps. That's an option. But what would it take to match the PlayStation3 flipping?






Slide
53 / 60

We have to manipulate a memory-mapped GPU register called D1GRPH_PRIMARY_SURFACE_ADDRESS, which is mapped to 0x7FC86110. It sets the address of the current front buffer and affects the video output immediately. This way, by using the SwapCallback and VerticalBlankCallback mechanisms, we can flip the front buffer manually. Since it is just one instruction to flip the buffer, it can safely be done in the callbacks themselves, at the time of deferred procedure calls (DPC). The inner frame is then flipped from a separate thread, based on time, in the same way.

The Swap method should still be called, but supplied with a set of fake surface handles, so that when the swap happens it won't override the value that we set manually. On top of that, we need to handle the system menu case manually as well (there are callbacks to do that).

Why 0x7FC86110? The GetRasterStatus method works at any time from any thread, and when you trace inside it, you will find a very short assembly routine which loads 0x7FC80000 as a base address and then reads something directly at a certain offset. As you can notice, it is outside of the 512MB range. Now it is not that hard to find which offset is related to the front buffer: it's 0x6110. The same offset can be found in AMD's (ATI) documentation for Linux driver developers.

It violates TCR #012, and it would be nice to have a real API for that.






Slide
54 / 60

Our current implementation on consoles features a one-frame-based solution with a very efficient character removal technique, which makes it work with all the deferred techniques, including deferred lighting, deferred shadows, SSAO and DOF.

In terms of memory, it introduces only one extra frame buffer (or two on the XBox360, depending on your frame presentation setup), whereas all other buffers are reused from the motion blur. The XBox360 implementation doesn't use any Hi-Z/Hi-Stencil rejection, so it could be done faster than the current 1.5 ms total, but the interpolation shader is texture-setup bound anyway.

The PlayStation3 implementation takes about 1.2 ms on 5 SPUs when running in parallel and is DMA bound for the most part. Despite that fact, all the SPU programs are software pipelined with SPA (SPU Pipelining Assembler) and are almost perfectly balanced between the even and the odd pipelines. Most of the math is performed on two pixels at the same time (per cycle) using 16-bit fixed-point math. It could also be optimized down to 0.8 ms. This cost doesn't include the extra GPU time (very little, ~0.3 ms) needed to render the masks and the dynamic character velocities.






Slide
55 / 60

In comparison, our current Motion Blur solution is a little slower than the interpolation technique. It leaves some room for further improvements and makes the interpolation technically free.

It is quite a challenge, actually, to make the motion blur run fast enough on both consoles and make it look good, especially with the lightsabers, which are semi-transparent and have to work properly with motion-blurred objects.

On the XBox360 it takes about 1.8-2.6 ms at full resolution, with a quarter-resolution velocity buffer and a quarter-resolution artifact fighting mechanism, performing a variable 5 to 11 samples per pixel with special edge conditions to avoid halos.

The PlayStation3 SPU solution is a lot more advanced in terms of quality and takes about 0.9-1.9 ms on 5 SPUs to do 16 samples and complex edge conditions.

It is worth saying that motion blur makes a huge difference when compared to running with no motion blur. But 60 fps rendering brings it to a different level. When running at 60 fps we can get away without the motion blur. It could still be used as an effect for things that are moving at non eye-trackable speeds.






Slide
56 / 60

We have presented a high-performance framework to increase the rendering frame rate of next-gen video games, allowing us to keep high scene complexity as well as fluid camera and character motion, giving the game a more interactive feel. Our method fits well with the common game rendering pipeline, deferred techniques and the standard front buffer flipping mechanism.

This is ongoing research and there are things that still have to be done in our particular case. But this technique can already be used, as is, in the production of many great games out there.

We are going to try to re-render the lightsabers, which might require a custom (maybe simplified) anti-aliasing solution. Remove the HUD completely, try using overlays, or restore it partially. Play more with interpolation from lower frame rates such as 20 or even 15; in fact, you can see that the environment doesn't look that bad at all (check out the video). Apply custom (camera-only) motion blur to shadows and reflections; actually, that does seem to work quite well for reflections, as they are rendered at very low resolution.

Additionally, we want to find a way to prevent any frame rate drops at all, by trying to predict how long a frame is going to take and doing all the necessary adjustments upfront, not after the fact, as we do now.






Slide
57 / 60

References

Simonyan, K., Grishin, S., and Vatolin, D. AviSynth MSU Frame Rate Conversion Filter.

Rosado, G. 2007. Motion Blur as a Post-Processing Effect. In GPU Gems 3, H. Nguyen, Ed., 575-582.

Castagno, R., Haavisto, P., and Ramponi, G. 1996. A Method for Motion Adaptive Frame Rate Up-Conversion. IEEE Transactions on Circuits and Systems for Video Technology 6, 5.

Pelagotti, A., and de Haan, G. 1999. High Quality Picture Rate Up-Conversion for Video on TV and PC. In Proc. Philips Conf. on Digital Signal Processing, paper 4.1, Veldhoven (NL).

Chen, Y.-K., Vetro, A., Sun, H., and Kung, S.Y. 1998. Frame-Rate Up-Conversion Using Transmitted True Motion Vectors. In Proc. IEEE Second Workshop on Multimedia Signal Processing, 622-627.

Poynton, C. 1996. Motion Portrayal, Eye Tracking, and Emerging Display Technology. In Proc. Advanced Motion Imaging Conference, 192-202.

Mather, G. 2006. Introduction to Motion Perception.






Slide
58 / 60

A few special words for SIGGRAPH 2010 attendees.






Slide
59 / 60

Thank you for your time and attention.

I would also like to thank my friends and colleagues for their criticism and support. Especially Szymon Swistun, Ruslan Abdikeev, Cory Bloyd, Cedrick Collomb, Axel Wefers. And the SIGGRAPH Audio/Video guys, for staying late on Sunday and helping with the XBox360 setup.






Slide
60 / 60

Alternatively, you can email me at andcoder@gmail.com