Optimization is a must in computer graphics applications, particularly those devoted to real-time interaction, such as visualization and games. One possibility is to optimize everything that can be optimized, but in most cases this is a waste of time. Optimizing some segments of an application brings an overall gain in efficiency, but this is not true for all segments.
Optimizing without fully grasping where the performance bottlenecks and critical sections are is generally a waste of resources, mainly programmer resources. When this happens, time is spent on parts of the code without any performance benefit.
Furthermore, optimization tends to make the code and data structures more obscure, sacrificing code legibility by humans (yes, programmers are humans, well … most of them anyway). This in turn makes code maintenance and debugging harder and, as all programmers know, maintenance and debugging are where most of the time is spent over a typical application life cycle, and CG applications are no exception.
Software should be well designed and implemented with clean algorithms and clear data structures. This does not imply that software should be slow. Well-designed software can be fast, and performance should be taken into account in the design phase. However, this should be done without sacrificing the readability of the code and data structures.
As Donald Knuth once stated, “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”.
Knuth stated this in an article back in 1974, yet the concept is still applicable today. The article, “Structured Programming with go to Statements”, is well worth reading (see References).
Premature optimization means compromising the design by becoming totally immersed in small hacks to increase performance. Knuth didn’t mean that optimization should not be considered at all; he meant that one should approach it wisely.
In order to do so, it is necessary to identify which parts of the code are really taking the longest time. Applied to software, the Pareto Principle suggests that roughly 80% of the resources are consumed by 20% of the code (Pareto was an Italian economist, amongst other things, born in 1848). It is essential to correctly identify these performance-critical bits to be able to really make an application run faster.
Profiling is a requirement to fully understand where time is being spent in our application. Profiling is about measuring the milliseconds that each task really takes. Unless this measurement is performed, it is very difficult to optimize an application effectively. A profiler can also report on other resources being used, such as memory, but here we focus on performance.
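As a minimal sketch of what measuring the milliseconds of a task can look like on the CPU side, the helper below times a single call with `std::chrono::steady_clock`. The name `measureMilliseconds` is an illustrative assumption and is not part of VSProfileLib.

```cpp
#include <chrono>
#include <utility>

// Runs `task` once and returns the wall-clock time it took, in milliseconds.
// `measureMilliseconds` is a hypothetical helper, not a VSProfileLib function.
template <typename Task>
double measureMilliseconds(Task&& task) {
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = Clock::now();
    std::forward<Task>(task)();
    const auto stop = Clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as `measureMilliseconds([] { updateScene(); })`, where `updateScene` stands for any task of your own, returns the time of that one call. A single measurement is noisy, so in practice it is probably better to average over many frames.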
Profiling can show us where those critical parts are located. Only afterwards should optimization take place.
When considering a pipeline system, which is common in graphics applications, the system will only run as fast as its slowest stage. Hence it is essential to first identify this stage, the bottleneck, and then optimize it. Ideally the problematic code will be sufficiently optimized so that it is no longer a bottleneck. At that point there is no need to continue optimizing this code, since no further performance gains can be obtained from it. The bottleneck will now be located somewhere else in our application.
This is an iterative process where bottlenecks are located and (hopefully) eliminated until the performance gains are no longer significant. This approach provides yet another advantage: when the slowest stage can no longer be optimized further, the remaining stages can perform additional computation without hurting the application's performance, as the overall performance is only conditioned by the slowest stage. Note however that the slowest stage in one frame may not be the slowest stage in another frame. For instance, when the camera moves around the scene there will be times when there is a lot of geometry in view, and times when there is little. In these cases it is natural that the slowest stage is not the same for all frames. The book “Real-Time Rendering” (see References) discusses optimizations and provides more pointers for further information.
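The bottleneck reasoning above can be sketched in a few lines. Given per-stage timings for one frame, the slowest stage bounds the frame time, and the difference to every other stage is the room that stage has for extra work. The types and function names here are illustrative assumptions, and the timings would come from your own profiling.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One measured pipeline stage; the name is just a label chosen by the caller.
struct StageTiming {
    std::string name;
    double milliseconds;  // measured time for this stage in one frame
};

// Returns the slowest stage, i.e. the bottleneck for this frame.
const StageTiming& findBottleneck(const std::vector<StageTiming>& stages) {
    assert(!stages.empty());
    return *std::max_element(
        stages.begin(), stages.end(),
        [](const StageTiming& a, const StageTiming& b) {
            return a.milliseconds < b.milliseconds;
        });
}

// Extra time a stage could spend before it becomes the new bottleneck.
double slackMilliseconds(const StageTiming& stage,
                         const StageTiming& bottleneck) {
    return bottleneck.milliseconds - stage.milliseconds;
}
```

Since the bottleneck can change from frame to frame, this check is most useful when repeated over many frames rather than run once.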
In Computer Graphics applications we have another issue to deal with. Normally we are working with two different processors: CPU and GPU. It is fundamental to keep them both busy, and eliminate waiting periods where one processor is waiting for the other to finish some task before it goes to work.
Once again profiling can help by determining where these situations occur. A typical way to detect them is to check how long the CPU waits for the swap buffers command to complete.
If an application has too many of these situations then it can usually be optimized significantly. In some cases optimization can be achieved simply by rearranging the code so that both processors are at work at the same time.
After this optimization step is concluded, if one processor is still idle for some period of time, then the load of this processor can be increased without sacrificing the overall performance of the application. This is our chance to add more quality to already implemented features or even add new ones.
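One rough way to check the swap buffers wait is to record, per frame, how much of the CPU time is spent inside the swap call. The sketch below assumes the render and swap steps are passed in as callables; `SwapWaitStats` is a hypothetical name, and the swap callable would wrap whatever swap function your windowing toolkit provides.

```cpp
#include <chrono>

// Accumulates total frame time and the part of it spent in the swap call.
// Hypothetical helper for illustration; not part of VSProfileLib.
class SwapWaitStats {
public:
    template <typename RenderFn, typename SwapFn>
    void frame(RenderFn&& render, SwapFn&& swapBuffers) {
        using Clock = std::chrono::steady_clock;
        const auto t0 = Clock::now();
        render();       // CPU work that issues the frame's commands
        const auto t1 = Clock::now();
        swapBuffers();  // may block while the GPU catches up
        const auto t2 = Clock::now();
        frameMs_ += toMs(t2 - t0);
        swapMs_ += toMs(t2 - t1);
    }

    // Fraction of CPU frame time spent in the swap call, in [0, 1].
    double swapFraction() const {
        return frameMs_ > 0.0 ? swapMs_ / frameMs_ : 0.0;
    }

private:
    static double toMs(std::chrono::steady_clock::duration d) {
        return std::chrono::duration<double, std::milli>(d).count();
    }
    double frameMs_ = 0.0;
    double swapMs_ = 0.0;
};
```

A consistently high fraction hints that the CPU is waiting on the GPU. Keep in mind that drivers may queue commands, and vertical sync also makes the swap call wait, so the number is an indication rather than an exact measurement.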
Since the communication between these two processors is not synchronous, being able to measure the time it takes to execute a command on the GPU may also be important. For instance, this information can provide valuable comparisons between the options for submitting your geometry to the GPU. VSProfileLib, from version 0.2.0, uses OpenGL timer queries to profile the GPU calls. In order to keep the application from being stalled, VSProfileLib uses a double buffering scheme for the queries, as described in the OpenGL Timer Query tutorial.
There are several tools out there to profile your code. gDebugger is an excellent example. Visual Studio and other programming environments also provide some type of profiling.
VSProfileLib, a component of the Very Simple * Libraries, provides a profiler whose results you can display on top of your application using OpenGL. To display them in OpenGL you can use VSFontLib, another component of the Very Simple * Libraries. VSProfileLib lets you profile both the CPU and the GPU, the latter using OpenGL timer queries (see the OpenGL Timer Query tutorial for a quick introduction).
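The idea behind double buffering the queries can be sketched as follows: two query objects are alternated, and the result read in a given frame is the one written in the previous frame, so the CPU does not wait for the GPU to finish the current one. This is an illustrative sketch and not VSProfileLib's actual code. To keep it independent of an OpenGL context, the query calls sit behind a hypothetical `QueryBackend`; in a real program its members would forward to `glBeginQuery(GL_TIME_ELAPSED, id)`, `glEndQuery(GL_TIME_ELAPSED)` and `glGetQueryObjectui64v(id, GL_QUERY_RESULT, &ns)`, with the ids created by `glGenQueries`.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical thin wrapper around the timer query calls (see lead-in).
struct QueryBackend {
    std::function<void(unsigned)> begin;            // start timing into query id
    std::function<void()> end;                      // stop the active query
    std::function<std::uint64_t(unsigned)> result;  // elapsed ns of query id
};

// Times one GPU section per frame with two alternating query objects.
class DoubleBufferedGpuTimer {
public:
    DoubleBufferedGpuTimer(QueryBackend backend, unsigned idA, unsigned idB)
        : backend_(std::move(backend)), ids_{idA, idB} {}

    void beginSection() { backend_.begin(ids_[back_]); }

    // Ends this frame's section and returns the elapsed nanoseconds of the
    // previous frame's section, or 0 on the first frame (no result yet).
    std::uint64_t endSection() {
        backend_.end();
        std::uint64_t ns = 0;
        if (hasPrevious_) {
            ns = backend_.result(ids_[1 - back_]);  // query from last frame
        }
        back_ = 1 - back_;  // swap the roles of the two queries
        hasPrevious_ = true;
        return ns;
    }

private:
    QueryBackend backend_;
    unsigned ids_[2];
    int back_ = 0;  // index of the query written during this frame
    bool hasPrevious_ = false;
};
```

The price of this scheme is that the reported time lags one frame behind, which is usually acceptable for an on-screen profiler.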
On the next page you can see VSProfileLib in Action.
OpenGL Time Queries
Donald Knuth, “Structured Programming with go to Statements”, ACM Computing Surveys, Vol. 6, No. 4, Dec. 1974, p. 268.
Real-Time Rendering has a good general discussion on optimization.
Game Programming Gems I features the article “Real-Time In-Game Profiling”, by Steve Rabin, on which VSProfileLib is based.