If you’ve been following recent game technology developments, you must have heard about data-oriented design. I believe Unity is pioneering this with its Entity Component System (ECS). We’ve even had a presentation where our experts explain this very idea. Data-oriented design revolves around the idea that you get a massive performance increase when your data is stored in a way that is compact and easy for the CPU to access. Today we will explore and test this theory in the easiest way possible!
First, let’s go through some theory
Let’s say we have a texture made up of pixels, with a width and a height. We humans picture this texture like this:

But since computer memory has only one dimension, our texture ends up in memory in this format:

You might think that since all of our data is stored in one chunk of memory, there are no further optimizations to make; the data is already all in one place. However, this example gets more interesting once we start exploring how that data is accessed.
Let’s say we want to fill this texture with a specific color. We have two basic algorithms for this job. The first fills every pixel in a row before moving down to the next row; I call it “horizontal fill.” The second fills every pixel in a column before moving to the next column; I’ll call it “vertical fill.”
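The two traversal orders can be sketched in C (the article’s actual benchmark is a C# solution; this is just my own minimal sketch, and the function names are made up):

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a width*height texture stored row by row in one flat array.
   "pixels" is indexed as pixels[y * width + x]. */

/* Horizontal fill: walk each row left to right before moving down. */
static void fill_horizontal(uint32_t *pixels, size_t width, size_t height,
                            uint32_t color)
{
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            pixels[y * width + x] = color; /* consecutive addresses */
}

/* Vertical fill: walk each column top to bottom before moving right. */
static void fill_vertical(uint32_t *pixels, size_t width, size_t height,
                          uint32_t color)
{
    for (size_t x = 0; x < width; x++)
        for (size_t y = 0; y < height; y++)
            pixels[y * width + x] = color; /* jumps "width" pixels each step */
}
```

Both functions write exactly the same pixels; the only difference is the order in which memory addresses are touched.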
Now here is a question for you: “Which algorithm is going to be faster?”
How does the CPU interpret this?
The answer to the question “Which is going to be faster?” is dictated by how fast the CPU can work with the texture’s memory in each algorithm. When you want to access or change a variable, the CPU cannot read just that one variable; it reads a bigger chunk of memory around it. We should take advantage of this and try to work with the data that is currently at the CPU’s “fingertips.” For our texture-filling situation, let’s say the CPU can read and work with one line of four pixels at a time:

Knowing this, we can take the horizontal and vertical fill algorithms and compile them into the CPU’s pseudo-code. First, let’s look at horizontal fill:

Looks quite nice: one read and one write per four clear operations. It looks like we cannot get any more efficient. But what about vertical fill:

Now, this one doesn’t look good at all. We have one read and one write operation per pixel, but each read and write still moves four pixels, so three of the four pixels in every line the CPU touches go unused. This method is going to be far slower than filling the texture horizontally, because the CPU has to read four times more data. Not good.
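We can make that “four times more data” claim concrete with a toy model of the scanline behavior described above: the CPU can only touch memory through a buffer holding one line of four consecutive pixels, and we count how many line reads each traversal order triggers. (The 4-pixel line and the counting model are illustrative assumptions, not a real cache simulator.)

```c
#include <stddef.h>

#define LINE_PIXELS 4 /* assumed size of one CPU "line", in pixels */

/* Count line loads for a row-by-row (horizontal) traversal of a
   width*height texture stored flat in row-major order. */
static size_t count_line_loads_horizontal(size_t width, size_t height)
{
    size_t loads = 0;
    long cached_line = -1; /* index of the line currently held, -1 = none */
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++) {
            long line = (long)((y * width + x) / LINE_PIXELS);
            if (line != cached_line) { loads++; cached_line = line; }
        }
    return loads;
}

/* Same count for a column-by-column (vertical) traversal. */
static size_t count_line_loads_vertical(size_t width, size_t height)
{
    size_t loads = 0;
    long cached_line = -1;
    for (size_t x = 0; x < width; x++)
        for (size_t y = 0; y < height; y++) {
            long line = (long)((y * width + x) / LINE_PIXELS);
            if (line != cached_line) { loads++; cached_line = line; }
        }
    return loads;
}
```

For a 16×16 texture this model gives 64 line loads for horizontal fill versus 256 for vertical fill, the four-to-one ratio described above: vertical traversal jumps a whole row (16 pixels) between consecutive writes, so every single write lands in a different line.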
How does this translate into the real world?
I’ve written a test that measures your CPU’s performance with both of these algorithms. You will need to compile a C# Visual Studio solution to get the results. Or, if you’d rather not, you can review my test results instead:
My test results[Archive]
My test results[Google Sheets]
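If you don’t want to build the C# solution, the measurement can be roughly re-created in C like this (a sketch under my own assumptions, not the original benchmark code; `clock()` is coarse, so use a texture much larger than your caches):

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef void fill_fn(uint32_t *p, size_t w, size_t h, uint32_t c);

/* Row-by-row fill: touches consecutive addresses. */
static void fill_horizontal(uint32_t *p, size_t w, size_t h, uint32_t c)
{
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            p[y * w + x] = c;
}

/* Column-by-column fill: strides w pixels between consecutive writes. */
static void fill_vertical(uint32_t *p, size_t w, size_t h, uint32_t c)
{
    for (size_t x = 0; x < w; x++)
        for (size_t y = 0; y < h; y++)
            p[y * w + x] = c;
}

/* Allocates a w*h texture, times one fill of it, and returns the time
   in seconds, or -1.0 on allocation failure. */
static double time_fill(fill_fn *fill, size_t w, size_t h)
{
    uint32_t *pixels = malloc(w * h * sizeof *pixels);
    if (!pixels)
        return -1.0;
    clock_t t0 = clock();
    fill(pixels, w, h, 0xFFFFFFFFu);
    double seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;
    free(pixels);
    return seconds;
}
```

Comparing, say, `time_fill(fill_horizontal, 4096, 4096)` against `time_fill(fill_vertical, 4096, 4096)` (a 64 MiB texture) should show the gap; the exact ratio depends on your cache hierarchy and compiler settings.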
Horizontal fill comes out on top every time; in some cases it is around 20x faster. The more memory you manage, the more effective data-oriented design patterns become. You can also clearly see the points where the CPU moves your data into a higher-capacity cache, or into RAM. Seeing this clear change in the performance gap between horizontal and vertical fill, you can take into account which cache your data will reside in and then decide whether it is worth optimizing parts of your program.
Conclusion
I think we can put this case to rest. Data-oriented design works and is very effective if we can organize our data and come up with algorithms that take full advantage of it. Keep it in mind when working on parts of your projects that access a lot of data in a short amount of time. Have fun!