Gravatar Cool, nice work on the whole tutorial, I use standard desktop GL, not ES, but your tutorial is so easy I should be able to port it relatively easily. You did a good job explaining everything so I think I know just what I'm doing now, thanks.


Gravatar Thanks! As far as porting, the raw OpenGL stuff should be the same except for the functions that have OES in the name in which case you should just drop the OES characters of the function name.


Gravatar Finished! (For the moment)

I could tell I needed to drop the OES bits. My current VBO class is pretty limited, it only does GL_STATIC_DRAW, GL_QUADS, and doesn't use indices, but it works and does use texturing, thanks!

Just a tip for others, I got something (A cube) rendering with immediate mode, then I put those points into an array that I then still drew in immediate mode, but in a loop that didn't know anything about cubes. That sets you up for the last step where you actually use a VBO, at first I just drew the points, no texturing, then when I knew that worked I added the texture coordinates. In my RenderVBO function I still have an immediate mode loop commented out ready for debugging again if needed.


Gravatar Excellent article. I'm curious if you have done any performance comparisons between VBOs and regular old vertex arrays on the iPhone. I haven't noticed any difference, and wonder if VBOs actually help out or not.


Gravatar I have not done any performance analysis on the iPhone. However, I think I have seen your question asked, probably on the mac-opengl mailing list forums. I believe I have seen claims that VBOs offered nothing in terms of performance gains, while seeing counterclaims that VBOs were significant. I think it will come down to usage cases and good profiling of your apps.

One claim I have encountered multiple times is that dispatching lots of small arrays of vertices will be detrimental to performance (which seems to make sense). In the usage case of sending lots of small textured quads to the screen for a 2D type game, you might not get the performance you hope for. I recall in Apple's Mac OS X (desktop) OpenGL documentation that for display lists, you really wanted 16 vertices at a minimum. I would not be surprised if iPhone OS is optimized the same way.

So I have actually encountered claims from iPhone programmers that they will combine small arrays into larger arrays (on the CPU) before dispatching them to OpenGL, for example particle effects, to maximize performance. Obviously, there will be a balancing act between CPU and GPU in this case, and a trade off in flexibility and complexity, but apparently this can be used for overall performance gains.
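To make the batching idea concrete, here is a minimal CPU-side sketch in C. Everything in it (the VertexBatch type, the batch_reset and batch_append helpers, the 1024-vertex capacity, and the two-floats-per-vertex layout) is a hypothetical illustration, not code from the article. It only shows the "copy many small arrays into one big array" step; the actual OpenGL submission is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits chosen for illustration only. */
#define BATCH_MAX_VERTICES 1024
#define FLOATS_PER_VERTEX  2   /* x, y for a 2D game */

/* One large CPU-side array that many small arrays are copied into. */
typedef struct {
    float  data[BATCH_MAX_VERTICES * FLOATS_PER_VERTEX];
    size_t vertex_count;
} VertexBatch;

/* Empties the batch so it can be refilled for the next frame. */
static void batch_reset(VertexBatch *batch)
{
    assert(batch != NULL);
    batch->vertex_count = 0;
}

/* Copies `count` vertices to the end of the batch.
   Returns 1 on success, 0 if the batch does not have room. */
static int batch_append(VertexBatch *batch, const float *vertices, size_t count)
{
    assert(batch != NULL && vertices != NULL);
    if (count > BATCH_MAX_VERTICES - batch->vertex_count) {
        return 0; /* caller should submit the batch, reset it, and retry */
    }
    memcpy(batch->data + batch->vertex_count * FLOATS_PER_VERTEX,
           vertices,
           count * FLOATS_PER_VERTEX * sizeof(float));
    batch->vertex_count += count;
    return 1;
}
```

Once the batch is full (or the frame is done), the idea is to hand batch.data to OpenGL once and issue a single draw call for vertex_count vertices, instead of one call per small array. Whether that wins depends on how much CPU time the copying costs, so it still needs profiling.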


Gravatar Just 5 comments... this article deserves a lot more. It's one of the best written and covers a lot of tips and facts.

Thanks a lot for this article.


Gravatar One question: what is the purpose of this?
"(GLvoid*)((char*)NULL)"


Gravatar If it's to resolve an overload, why are you casting it twice?


Gravatar Thank you for the praise. I haven't done much (anything) to promote my site, so chances are that this article has gone unseen. (Feel free to promote my site.)

Anyway, about the casting...
The outer GLvoid* cast is to conform to the function signature which wants a GLvoid*. I think you figured that out already.

As I understand it, the inner cast of (char*) of NULL is to make a char* pointer. This will allow you to add numbers to NULL as an offset which are counted in bytes.

So for the glColorPointer case in my example:
glColorPointer(4, GL_UNSIGNED_BYTE, 0, (GLvoid*)((char*)NULL+vertex_size));

Let's pretend vertex_size is 12. It means we want to start looking for the color data 12 bytes from the beginning of the array.

If the cast was (float*) instead of (char*), I believe the +12 would actually look at 12*4=48 bytes because the size of float is 4 bytes and not 1 byte like the size of char.
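A small C sketch can check that claim. The two helper functions below are hypothetical names made up for this illustration. They do the arithmetic on a real array rather than on NULL, because the C standard does not define pointer arithmetic on a null pointer; the scaling rule being demonstrated is the same either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes does "pointer + count" move
   when the pointer type is char*? */
static size_t bytes_advanced_as_char(size_t count)
{
    float storage[64] = {0.0f};          /* real memory to point into */
    assert(count <= sizeof(storage));    /* stay inside the array */
    char *base  = (char *)storage;
    char *moved = base + count;          /* advances count * sizeof(char) */
    return (size_t)(moved - base);
}

/* Hypothetical helper: the same addition, but through a float*. */
static size_t bytes_advanced_as_float(size_t count)
{
    float storage[64] = {0.0f};
    assert(count <= 64);                 /* stay inside the array */
    float *base  = storage;
    float *moved = base + count;         /* advances count * sizeof(float) */
    return (size_t)((char *)moved - (char *)base);
}
```

With a count of 12, the char* version moves 12 bytes and the float* version moves 12 * sizeof(float) bytes, which is 48 on platforms where a float is 4 bytes. That is why the byte offset is added to a char* before the final cast to GLvoid*.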

In the case where we don't add any offset to NULL, the (char*) might be superfluous, though I haven't tried removing it. But I leave it in there to remind me that's what you need if you do need an offset.

Also, a lot of people like to write macros or inline functions for this. In that case, they will always have that code indirectly.

inline GLvoid* BufferObjectPtrOffset(size_t the_offset)
{
    /* Cast NULL to char* so the offset is counted in bytes. */
    return (GLvoid*)((char*)NULL + the_offset);
}

glColorPointer(4, GL_UNSIGNED_BYTE, 0, BufferObjectPtrOffset(12));

Hope this helps.


Gravatar Quick question - you mention that on the iPhone you should always be using GLfloats for vertex positions - is there a reason for that? If your vertex positions could be defined within a GLshort, wouldn't there be an optimization in terms of memory bandwidth by copying GLshort arrays (containing fewer bytes) rather than GLfloat arrays?


Gravatar So the quick answer is that Apple has been promoting that you should use floats for vertex positions because the hardware is designed for floating point vertices.

The not so quick answer is that you need to benchmark for your specific situation.

While you are correct that the memory bandwidth overhead will be smaller with shorts, it is not necessarily true that memory bandwidth will be your bottleneck.

One thing maybe I should have mentioned about interleaving vertex and colors, etc., is that word alignment for each type may (significantly) affect performance. In this case, if your interleaved sections are not word aligned, you are encouraged to pad extra bytes. Obviously, you create more bandwidth pressure, but this is a case where the bottleneck is not bandwidth.
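As a rough illustration of padding an interleaved layout, here is a C sketch. The PaddedVertex struct and its helper functions are hypothetical and not taken from the article; the layout assumes a position of three floats followed by a 3-byte attribute (for example, a normal packed as signed bytes), with one explicit pad byte so that every vertex starts on a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: the 3-byte normal would leave the
   next vertex misaligned, so one pad byte rounds it up to 4 bytes. */
typedef struct {
    float       position[3]; /* 3 * sizeof(float) bytes */
    signed char normal[3];   /* 3 bytes of real data */
    signed char pad;         /* 1 byte of explicit padding, never read */
} PaddedVertex;

/* Stride to pass to the gl*Pointer calls for this layout. */
static size_t padded_vertex_stride(void)
{
    size_t stride = sizeof(PaddedVertex);
    assert(stride % sizeof(float) == 0); /* each vertex starts float-aligned */
    return stride;
}

/* Byte offset of the normal data from the start of each vertex. */
static size_t padded_vertex_normal_offset(void)
{
    return offsetof(PaddedVertex, normal);
}
```

On a platform with 4-byte floats the stride comes out to 16 bytes instead of 15, so one byte per vertex is spent on padding in exchange for aligned reads. Using offsetof for the offsets avoids hand-counting bytes when the layout changes.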

Back to using GLshort instead of GLfloat for vertex positions, a lot will depend on your usage and the behind-the-scenes driver and hardware.

If you are doing computations to compute your vertex positions or computing other things based on vertex positions (e.g. collision detection), there is a good chance floating point is faster because the hardware provides FPUs and doing your own fixed point math will probably not win.

If you are just submitting static geometry like in this example, you should benchmark, as I don't know the real answer. But the worst-case scenario is that the GPU is not designed to handle shorts and must fall back to the CPU to do rendering. Being imaginative, this could entail things like the GPU pushing data back to the CPU to do short-to-float conversions and then having to push the data back to the GPU. I suspect that it probably isn't that bad, which is why you need to benchmark in case bandwidth is the bottleneck.

But with (mostly) static geometry, I don't think you should be too concerned about the bandwidth of floats vs shorts for vertices, particularly since this example was about VBOs. The idea is you get it to the video card and cache it in GPU memory, minimizing the need to retransfer the data across the bus on every draw.


Gravatar Thanks for the clarification Eric!! I'm rewriting my iPhone engine from the ground up to be more optimized (interleaved arrays, compressed texture atlases, VBOs, aligning on 4-byte boundaries, etc.) and was debating the GLshort approach - but you're right, floating point makes more sense if I do any math outside of the GPU. They did mention using GLshorts where possible at a WWDC OpenGL lecture "if you can get away with it", which leads me to believe there aren't any of the "float conversion" downsides on the iPhone OpenGL implementation. Great article on VBOs btw - really helped me get my head round them!


Gravatar Thank you very much for writing this. I was looking for a clear summary of the different options and this was exactly what I needed!



