OpenGL Atomic Counters

Atomic counters were introduced in OpenGL 4.2 via the ARB_shader_atomic_counters extension.

This feature adds the ability to create a counter that can be read, incremented, or decremented in a shader, and subsequently read back in the OpenGL application. Atomic counters are useful in many situations: building auxiliary data for data structures created with image load/store, debugging, and evaluating algorithms. For instance, when two algorithms should ideally produce similar images, such as shadow mapping vs. shadow volumes, an atomic counter can quantify the difference by counting how many pixels differ between the resulting images.

From a shader point of view, atomic counters are opaque types, and are declared as uniforms. This is close to the way we work with samplers. Being an opaque type, an atomic counter can only be used as an argument to a small set of functions, which we will introduce later. These functions allow us to read the value of the counter, and to increment or decrement it by one unit.

On the OpenGL side we must create buffers to provide storage for these counters. This is accomplished in a manner very similar to other OpenGL buffers, and in particular to uniform buffers. These buffers are bound to an index binding point (with glBindBufferBase), which is then referred to in the atomic counter declaration in the shader.

The OpenGL side of the equation

First we create the buffer(s). As mentioned before this procedure is very similar to the creation of a buffer for uniform variables. The main difference is that the buffer type is now GL_ATOMIC_COUNTER_BUFFER. A buffer can have many counters, and there can be more than one of these buffers. It’s up to us to determine how we organize our counters.

To create a buffer for atomic counters we can proceed as follows:

// declare and generate a buffer object name
GLuint atomicsBuffer;
glGenBuffers(1, &atomicsBuffer);
// bind the buffer and define its initial storage capacity
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicsBuffer);
glBufferData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint) * 3, NULL, GL_DYNAMIC_DRAW);
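// attach the buffer to index binding point 1; this index must match
// the "binding" value used in the shader declarations
glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 1, atomicsBuffer);
// unbind the buffer from the generic binding point
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, 0);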

Internally, an atomic counter is an unsigned integer, so it requires 4 bytes in the buffer. The buffer created above has room for three counters. We could supply initial data, since passing NULL to glBufferData leaves the buffer contents undefined. On the other hand, we will probably want to reset the buffer values on each frame, in which case we can simply perform the reset, and hence the initialization, at the beginning of each frame.

To reset the atomic counter buffers we can do:

// declare a pointer to hold the values in the buffer
GLuint *userCounters;
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicsBuffer);
// map the buffer, userCounters will point to the buffer's data
userCounters = (GLuint*)glMapBufferRange(GL_ATOMIC_COUNTER_BUFFER, 
                                         0 , 
                                         sizeof(GLuint) * 3, 
                                         GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT
                                         );
// set the memory to zeros, resetting the values in the buffer
memset(userCounters, 0, sizeof(GLuint) * 3);
// unmap the buffer
glUnmapBuffer(GL_ATOMIC_COUNTER_BUFFER);

In the above code, all three atomic counters were reset to zero.

A simpler approach is to use the function glBufferSubData as follows:

glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicsBuffer);

GLuint a[3] = {0,0,0};
glBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0 , sizeof(GLuint) * 3, a);
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, 0);

Finally to read back the values from the buffer we proceed as usual for other buffer types:

GLuint *userCounters;
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicsBuffer);
// again we map the buffer to userCounters, but this time for read-only access
userCounters = (GLuint*)glMapBufferRange(GL_ATOMIC_COUNTER_BUFFER, 
                                         0, 
                                         sizeof(GLuint) * 3,
                                         GL_MAP_READ_BIT
                                        );
// copy the values to other variables because...
redPixels = userCounters[0];
greenPixels = userCounters[1];
bluePixels = userCounters[2];
// ... as soon as we unmap the buffer
// the pointer userCounters becomes invalid.
glUnmapBuffer(GL_ATOMIC_COUNTER_BUFFER);

An alternative approach is to use the function glGetBufferSubData:

GLuint userCounters[3];
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, atomicsBuffer);
glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint) * 3, userCounters);
glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, 0);
redPixels = userCounters[0];
greenPixels = userCounters[1];
bluePixels = userCounters[2];

GLSL point of view

Declaration

When writing a shader that uses atomic counters we must know the index binding point for the atomic counter buffer. This info is required to declare the atomic counter. The data type is atomic_uint, and a layout is used to specify the buffer index binding point and the offset within the buffer, where the memory storage for the atomic counter can be found.

Some examples of valid declarations follow:

layout (binding = 1, offset = 0) uniform atomic_uint atRed;
layout (binding = 2, offset = 0) uniform atomic_uint atGreen;
layout (binding = 2, offset = 4) uniform atomic_uint atBlue;

The first variable, atRed, will use storage from the buffer with index binding point 1. Since it has a zero offset it will take the first four bytes of the buffer (recall that an atomic counter is an unsigned integer).

The second and third variables will both have their storage on the buffer with index binding point 2. The offset has been set to 0 for atGreen, and 4 for atBlue. This is the minimum offset difference between two counters. Offsets should be multiples of 4, otherwise we’ll risk overlap, and it will be messier to get back the results from the buffer.

Besides overlapping, the only other restriction is that there can only be one variable for each pair (binding, offset) in any individual shader stage, i.e., the following declaration will get us in trouble as both variables refer to the same atomic counter:

layout (binding = 1, offset = 0) uniform atomic_uint at1;
layout (binding = 1, offset = 0) uniform atomic_uint at2;

An atomic counter can be used in more than one shader stage of the pipeline using the same binding and offset values. In the following example both variables refer to the same atomic counter.

 
// vertex shader
layout (binding = 2, offset = 0) uniform atomic_uint atVertex;
...
// fragment shader
layout (binding = 2, offset = 0) uniform atomic_uint atFragment;

It is also possible to declare an array of atomic counters as follows:

layout (binding = 1, offset = 0) uniform atomic_uint at[3];
layout (binding = 1, offset = 12) uniform atomic_uint at2;

Notice the second offset: it has to skip three unsigned ints (3 x 4 bytes) to avoid overlapping the atomic counter array.
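
The offset arithmetic can be checked on the CPU before writing the layout qualifiers. The following is only a minimal C sketch and not part of the OpenGL API: the CounterDecl struct and both helper names are hypothetical, introduced here for illustration. It assumes, as stated above, that each counter takes 4 bytes and that offsets are multiples of 4.

```c
#include <assert.h>
#include <stdbool.h>

/* Size in bytes of one atomic counter in the buffer (an unsigned int). */
#define COUNTER_SIZE 4u

/* Hypothetical description of one declaration:
   layout (binding = B, offset = O) uniform atomic_uint name[count]; */
typedef struct {
    unsigned binding;  /* buffer index binding point */
    unsigned offset;   /* byte offset inside the buffer */
    unsigned count;    /* 1 for a single counter, N for an array */
} CounterDecl;

/* First byte offset that is free after this declaration. */
unsigned next_free_offset(CounterDecl d) {
    assert(d.offset % COUNTER_SIZE == 0); /* assumed: offsets are multiples of 4 */
    return d.offset + d.count * COUNTER_SIZE;
}

/* True if two declarations share at least one byte of the same buffer. */
bool decls_overlap(CounterDecl a, CounterDecl b) {
    if (a.binding != b.binding)
        return false; /* different buffers never overlap */
    return a.offset < next_free_offset(b) && b.offset < next_free_offset(a);
}
```

For an array of three counters at binding 1, offset 0, next_free_offset gives 12, which is the offset used for at2 above. A single counter declared at offset 8 in the same buffer would be reported as overlapping the array.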

Finally, as with other opaque types, for instance samplers, an atomic counter can’t be part of a uniform block, even though it is a uniform variable.

Usage

Inside the shader code an atomic counter can only be used with the following functions:

// returns the current value of the atomic counter
uint atomicCounter(atomic_uint c);

// decrements the value of the atomic counter and returns its new value
uint atomicCounterDecrement(atomic_uint c);

// increments the value of the atomic counter and returns its prior value
uint atomicCounterIncrement(atomic_uint c);
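
The different return conventions are easy to mix up, so here is a small CPU-side sketch that mimics them with C11 atomics from <stdatomic.h>. This is only an analogy to make the return values concrete: the cpu_* names are hypothetical and none of this is GLSL or OpenGL code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* CPU stand-in for a GLSL atomic_uint (illustration only). */
typedef _Atomic uint32_t cpu_atomic_uint;

/* Like atomicCounter: read the current value. */
uint32_t cpu_atomicCounter(cpu_atomic_uint *c) {
    return atomic_load(c);
}

/* Like atomicCounterIncrement: add one, return the value BEFORE the add. */
uint32_t cpu_atomicCounterIncrement(cpu_atomic_uint *c) {
    return atomic_fetch_add(c, 1u);
}

/* Like atomicCounterDecrement: subtract one, return the value AFTER it. */
uint32_t cpu_atomicCounterDecrement(cpu_atomic_uint *c) {
    return (uint32_t)(atomic_fetch_sub(c, 1u) - 1u);
}

/* Walks through the return values on a counter that starts at zero. */
void cpu_counter_demo(void) {
    cpu_atomic_uint c = 0;
    assert(cpu_atomicCounterIncrement(&c) == 0u); /* prior value, counter is now 1 */
    assert(cpu_atomicCounterIncrement(&c) == 1u); /* prior value, counter is now 2 */
    assert(cpu_atomicCounterDecrement(&c) == 1u); /* new value, counter is now 1 */
    assert(cpu_atomicCounter(&c) == 1u);
}
```

Because the arithmetic is unsigned and 32 bits wide, decrementing a counter that holds zero in this sketch yields 0xFFFFFFFF rather than a negative number.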

Now on to some examples. Let’s assume that we want to build a small histogram based on color, i.e. we want to count how many pixels are mostly red, mostly green and mostly blue. By mostly red I mean that the red color value is larger than or equal to both the green and blue values. A fragment shader for this purpose could be written as:

#version 420

layout (binding = 1, offset = 0) uniform atomic_uint atRed;
layout (binding = 1, offset = 4) uniform atomic_uint atGreen;
layout (binding = 1, offset = 8) uniform atomic_uint atBlue;

in VertexData {
	vec4 color;
} FragIn;

out vec4 colorOut;

void main() {

	if ((FragIn.color.r >= FragIn.color.g) && (FragIn.color.r >= FragIn.color.b))
		atomicCounterIncrement(atRed);
	else if (FragIn.color.g >= FragIn.color.b)
		atomicCounterIncrement(atGreen);
	else
		atomicCounterIncrement(atBlue);

	colorOut = FragIn.color;
}
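
One way to sanity-check the counters read back from the buffer is to run the same classification on the CPU over a known set of colors. The sketch below is an assumption-laden helper, not part of the original demo: the Color struct and the function names are hypothetical, and the counts only match the shader's if every color in the array corresponds to exactly one fragment (no overdraw).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-side color, components in [0, 1]. */
typedef struct { float r, g, b; } Color;

enum { MOSTLY_RED = 0, MOSTLY_GREEN = 1, MOSTLY_BLUE = 2 };

/* Same decision as the fragment shader: ties go to red first, then green. */
int classify(Color c) {
    if (c.r >= c.g && c.r >= c.b)
        return MOSTLY_RED;
    if (c.g >= c.b)
        return MOSTLY_GREEN;
    return MOSTLY_BLUE;
}

/* Fills counts[0..2] with the number of mostly red, green and blue colors. */
void count_colors(const Color *pixels, size_t n, unsigned counts[3]) {
    assert(pixels != NULL || n == 0);
    counts[0] = counts[1] = counts[2] = 0;
    for (size_t i = 0; i < n; ++i)
        counts[classify(pixels[i])]++;
}
```

The three entries of counts are in the same order as redPixels, greenPixels and bluePixels in the read-back code, so they can be compared directly.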

Another simple example is to compare two images, counting the number of different pixels.

#version 420

layout (binding = 1, offset = 0) uniform atomic_uint atDiff;

uniform sampler2D texUnit1, texUnit2;

in VertexData {
	vec4 texCoord;
} FragIn;

out vec4 colorOut;

void main() {

	if (texture(texUnit1, FragIn.texCoord.xy) != texture(texUnit2, FragIn.texCoord.xy))
		atomicCounterIncrement(atDiff);

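	// VertexData has no color member in this shader, so output the first image
	colorOut = texture(texUnit1, FragIn.texCoord.xy);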
}

Here is a different example:

#version 430

layout (binding = 1, offset = 0) uniform atomic_uint at;

uniform sampler2D texUnit;

in VertexData {
	vec4 color;
	vec4 texCoord;
	float texCount;
} FragmentIn;

out vec4 colorOut;

void main() {

	uint a = atomicCounterIncrement(at);
	uint b = atomicCounterDecrement(at);

	if (a == b)
		colorOut = texture(texUnit, FragmentIn.texCoord.xy);
	else
		colorOut = vec4(1.0, 0.0, 0.0, 1.0);
}

In the above example, at first glance we would expect the output to be the textured model, yet we get mostly red (the else path), with just a few pixels following the if path. If we check the atomic counter later, on the application side, we do get zero as expected. Hence the result makes complete sense globally (for the sum of the instances of the shader), but it is harder to understand locally (for each particular instance).

Considering a single instance of the shader, it may appear that the if path would always be selected. However, since we’re dealing with massive parallelism, we must imagine many groups of instances that are not synchronized with each other. Although each increment/decrement instruction is atomic, the pair of instructions is not.

In practice this means there will be several groups executing instances of the shader, and since these groups are not synchronized, each group may be executing a different instruction of the shader at any given time. Hence, for a particular group there is no guarantee that no other group alters the counter between its increment and its decrement.
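
The effect can be reproduced on the CPU by replaying a fixed order of operations. The following C sketch is a single-threaded simulation under simplifying assumptions (one counter, each simulated instance does exactly one increment followed by one decrement); the Invocation struct and the replay function are hypothetical names, not anything provided by OpenGL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_INVOCATIONS 16

/* What one simulated shader instance observed. */
typedef struct {
    uint32_t a;  /* result of the increment (prior value) */
    uint32_t b;  /* result of the decrement (new value) */
} Invocation;

/* A plain integer stands in for the counter: the simulation is
   single-threaded and only replays a given order of operations. */
static uint32_t increment(uint32_t *counter) { return (*counter)++; }
static uint32_t decrement(uint32_t *counter) { return --(*counter); }

/* Each schedule entry names the instance that runs its next instruction:
   the first time an instance appears it increments, after that it decrements.
   Returns the final counter value. */
uint32_t replay(const int *schedule, int steps, Invocation *inv, int n) {
    uint32_t counter = 0;
    bool did_increment[MAX_INVOCATIONS] = { false };
    assert(n <= MAX_INVOCATIONS);
    for (int s = 0; s < steps; ++s) {
        int i = schedule[s];
        assert(i >= 0 && i < n);
        if (!did_increment[i]) {
            inv[i].a = increment(&counter);
            did_increment[i] = true;
        } else {
            inv[i].b = decrement(&counter);
        }
    }
    return counter;
}
```

With the schedule {0, 0, 1, 1} each instance runs to completion before the next starts, and both see a == b. With the interleaved schedule {0, 1, 0, 1} instance 0 sees a = 0, b = 1 and instance 1 sees a = 1, b = 0, so both take the else path, yet the final counter value is still zero.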

Counting Vertices and Primitives

To count the number of primitives or total number of vertices we can also use an atomic counter in a geometry shader. Just increment the atomic counter each time we end a primitive or emit a vertex, respectively.

We cannot use an atomic counter in a vertex shader to count the total number of vertices. If we increment an atomic counter each time the vertex shader executes, we seem to get the number of vertices actually processed, not the total number of vertices in the draw call. It looks as if, when a vertex is found in the cache, the vertex shader is not run again and so the instruction to increment/decrement the atomic counter is not executed. If this is the case, then this feature can be put to good use to check whether a VAO is efficiently reusing its vertices.

Note: with Catalyst 12.9 I get an error when trying to use atomic counters in the vertex shader. The GLSL compiler reports: “No matching overload function found: atomicCounterIncrement”. With NVIDIA GeForce 305.67 drivers it works!

Querying the OpenGL implementation limits

The maximum size, in bytes, for an atomic counter buffer can be obtained with glGetIntegerv as follows:

GLint max;
glGetIntegerv(GL_MAX_ATOMIC_COUNTER_BUFFER_SIZE, &max);

The specification defines limits on both the number of atomic counters and the number of active atomic counter buffers. For each of these there is a global (combined) limit and a limit per shader stage. These limits can be obtained with glGetIntegerv, using the following constants:

GL_MAX_COMBINED_ATOMIC_COUNTER_BUFFERS
GL_MAX_VERTEX_ATOMIC_COUNTER_BUFFERS
GL_MAX_TESS_CONTROL_ATOMIC_COUNTER_BUFFERS
GL_MAX_TESS_EVALUATION_ATOMIC_COUNTER_BUFFERS
GL_MAX_GEOMETRY_ATOMIC_COUNTER_BUFFERS
GL_MAX_FRAGMENT_ATOMIC_COUNTER_BUFFERS
GL_MAX_COMBINED_ATOMIC_COUNTERS
GL_MAX_VERTEX_ATOMIC_COUNTERS
GL_MAX_TESS_CONTROL_ATOMIC_COUNTERS
GL_MAX_TESS_EVALUATION_ATOMIC_COUNTERS
GL_MAX_GEOMETRY_ATOMIC_COUNTERS
GL_MAX_FRAGMENT_ATOMIC_COUNTERS

Testing these values with AMD Catalyst 12.9 and GeForce 305.67 drivers provides very different results. AMD provides only 8 atomic counters for the whole pipeline. NVIDIA is far more generous and gives us 16384 counters per shader stage. Daniel Rákos points out that AMD has dedicated hardware for atomic counters, providing a faster implementation, although more limited in number.

I’ve tested a simple app to render a 3D model, similar to the demo that comes with VSL, to count the number of mostly red pixels, and got a huge hit on performance with an NVIDIA 460GTX. My Radeon 6990M on the other hand behaved as if the atomic counter operations were just any other operation.

On the other hand, using the same drivers and hardware, I’m unable to get more than one atomic counter working with AMD, but probably I’m doing something wrong…

Some Final Notes

Why are the GLSL operations not consistent, i.e. why does increment return the pre-increment value while decrement returns the post-decrement value?

The genesis of this behavior can be explained by a simple application that was in the mind of the creators of the extension. The idea was to implement a list of records (for instance with image load/store). When a record is added the increment function returns the index of last inserted record. When a record is deleted the decrement function returns the index of the last record on the list (after the deletion).
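
The list idea can be sketched on the CPU as follows. This is a hedged illustration in plain C, not the extension authors' code: RecordList, list_add and list_remove_last are hypothetical names, a plain integer plays the role of the atomic counter, and an array plays the role of the storage that would be written with image load/store. I read "the last record" as the record being removed, whose index equals the new record count.

```c
#include <assert.h>
#include <stdint.h>

#define LIST_CAPACITY 8u

/* Hypothetical record list: 'count' stands in for the atomic counter. */
typedef struct {
    uint32_t count;
    int records[LIST_CAPACITY];
} RecordList;

/* Increment returns the prior value, which is exactly the free slot
   where the new record goes. Returns the index that was used. */
uint32_t list_add(RecordList *l, int value) {
    assert(l->count < LIST_CAPACITY);
    uint32_t index = l->count++;   /* post-increment: prior value */
    l->records[index] = value;
    return index;
}

/* Decrement returns the new value, which is the index of the record
   that was last in the list and is now being removed. */
uint32_t list_remove_last(RecordList *l, int *value_out) {
    assert(l->count > 0u);
    uint32_t index = --l->count;   /* pre-decrement: new value */
    *value_out = l->records[index];
    return index;
}
```

With these conventions neither operation needs a "plus one" or "minus one" correction: the value returned is directly the array index to write to or read from.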

Careful:

If an atomic counter is used in more than one pipeline stage then it is counted as many times towards the global count of atomic counters.
Decrements and increments at the limit of the range [0, 2^32-1] will wrap.
Counters aggregated into arrays within a shader (using square brackets []) can only be indexed with dynamically uniform integral expressions, otherwise results are undefined.

Note: An expression is dynamically uniform if all instances of the shader, within the same draw call, get the same value as result. What this means in practice is that the following code is bound to get us in trouble:

layout (binding = 1, offset = 0) uniform atomic_uint myIntReds[256];
...
int myIntRed = int(255.0 * texture(texUnit, texCoord).r);
atomicCounterIncrement(myIntReds[myIntRed]);

In the above code, in each shader instance we may be accessing a different counter, depending on the texture return value, and this is not allowed. The above results will be undefined. The index for an atomic counter array must be either a compile-time constant, a uniform variable, or an expression composed of only compile-time constants and uniform variables.

What we can’t do:

Atomic counters have no location, hence they can not be set with glUniform*.
Being an opaque type, atomic counters can not be used as regular unsigned integers.

Atomic Counters in Action Elsewhere

Cyril Crassin and Simon Green wrote about a technique for octree based sparse voxelization using the GPU hardware rasterizer. A chapter was published in the book OpenGL Insights and it can be downloaded from the book’s companion site. This technique uses atomic counters to help construct a voxel fragment list. The end result is great, and it’s real-time!

A post on RenderingPipeline.com describes how to create an animation showing the GPU rasterizer pattern. In a first pass the atomic counter is used to keep track of the writing order of each fragment. A second pass is called repeatedly with a timer, and only fragments whose counter is below the timer value get drawn. The post has a couple of movies showing the final effect.

5 Responses to “OpenGL Atomic Counters”

Craig Reynolds says:

21/12/2015 at 2:41 AM

I basically followed your code to write my first usage of atomic counters. It would not work until I added a call to glBindBufferBase before drawing:

glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, m_AtomicCountersBuffer);

where that second argument is the “binding” value given in GLSL:

layout (binding = 0, offset = 0) uniform atomic_uint my_counter;

this was in OpenGL 4.5

msqrt says:

09/03/2014 at 9:57 AM

In the code you’re setting redBits, greenBits and redBits. I guess you meant blue to be somewhere too 🙂

Good read though, all of the important stuff in a nutshell.

- ARF says:
  
  09/03/2014 at 3:30 PM
  
  Hi,
  
  Many thanks for the bug report.
  
Jonas says:

02/07/2013 at 1:50 PM

Hi, thank you for the explanations. However, in the listing about getting the data is an error. It should be

glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint)*3, userCounters);

instead of

glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint) * 3, sizeof(GLuint), userCounters);

- ARF says:
  
  05/07/2013 at 1:08 AM
  
  Quite right. Fixed. Thanks for reporting the bug.
  