I am noticing a large performance difference between Java & JOGL and C# & Tao.OpenGL when both loading PNGs from storage into memory, and when loading that BufferedImage (java) or Bitmap (C# - both are PNGs on hard drive) 'into' OpenGL.
This difference is quite large, so I assumed I was doing something wrong, however after quite a lot of searching and trying different loading techniques I've been unable to reduce this difference.
With Java I get an image loaded in 248ms and loaded into OpenGL in 728ms The same on C# takes 54ms to load the image, and 34ms to load/create texture.
The image in question above is a PNG containing transparency, sized 7200x255, used for a 2D animated sprite. I realise the size is really quite ridiculous and am considering cutting up the sprite, however the large difference is still there (and confusing).
On the Java side the code looks like this:
BufferedImage image = ImageIO.read(new File(fileName));
texture = TextureIO.newTexture(image, false);
texture.setTexParameteri(GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR);
texture.setTexParameteri(GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR);
The C# code uses:
Bitmap t = new Bitmap(fileName);
t.RotateFlip(RotateFlipType.RotateNoneFlipY);
Rectangle r = new Rectangle(0, 0, t.Width, t.Height);
BitmapData bd = t.LockBits(r, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
Gl.glBindTexture(Gl.GL_TEXTURE_2D, tID);
Gl.glTexImage2D(Gl.GL_TEXTURE_2D, 0, Gl.GL_RGBA, t.Width, t.Height, 0, Gl.GL_BGRA, Gl.GL_UNSIGNED_BYTE, bd.Scan0);
Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MIN_FILTER, Gl.GL_LINEAR);
Gl.glTexParameteri(Gl.GL_TEXTURE_2D, Gl.GL_TEXTURE_MAG_FILTER, Gl.GL_LINEAR);
t.UnlockBits(bd);
t.Dispose();
After quite a lot of testing I can only come to the conclusion that Java/JOGL is just slower here - PNG reading might not be as quick, or that I'm still doing something wrong.
Thanks.
Edit2:
I have found that creating a new BufferedImage with format TYPE_INT_ARGB_PRE decreases OpenGL texture load time by almost half - this includes having to create the new BufferedImage, getting the Graphics2D from it and then rendering the previously loaded image to it.
Edit3: Benchmark results for 5 variations. I wrote a small benchmarking tool, the following results come from loading a set of 33 pngs, most are very wide, 5 times.
testStart: ImageIO.read(file) -> TextureIO.newTexture(image)
result: avg = 10250ms, total = 51251
testStart: ImageIO.read(bis) -> TextureIO.newTexture(image)
result: avg = 10029ms, total = 50147
testStart: ImageIO.read(file) -> TextureIO.newTexture(argbImage)
result: avg = 5343ms, total = 26717
testStart: ImageIO.read(bis) -> TextureIO.newTexture(argbImage)
result: avg = 5534ms, total = 27673
testStart: TextureIO.newTexture(file)
result: avg = 10395ms, total = 51979
ImageIO.read(bis) refers to the technique described in James Branigan's answer below. argbImage refers to the technique described in my previous edit:
img = ImageIO.read(file);
argbImg = new BufferedImage(img.getWidth(), img.getHeight(), TYPE_INT_ARGB_PRE);
g = argbImg.createGraphics();
g.drawImage(img, 0, 0, null);
texture = TextureIO.newTexture(argbImg, false);
Any more methods of loading (either images from file, or images to OpenGL) would be appreciated, I will update these benchmarks.
Short Answer The JOGL texture classes do quite a bit more than necessary, and I guess that's why they are slow. I run into the same problem a few days ago, and now fixed it by loading the texture with the low-level API (glGenTextures, glBindTexture, glTexParameterf, and glTexImage2D). The loading time decreased from about 1 second to "no noticeable delay", but I haven't done any systematic profiling.
Long Answer If you look into the documentation and source code of the JOGL TextureIO, TextureData and Texture classes, you notice that they do quite a bit more than just uploading the texture onto the GPU:
I'm not sure which one of these is taking more time. But in many cases you know what kind of image data you have available, and don't need to do any premultiplication.
The alpha premultiplication feature is anyway completely misplaced in this class (from a software architecture perspective), and I didn't find any way to disable it. Even though the documentation claims that this is the "mathematically correct way" (I'm actually not convinced about that), there are plenty of cases in which you don't want to use alpha premultiplication, or have done it beforehand (e.g. for performance reasons).
After all, loading a texture with the low-level API is quite simple unless you need it to handle different image formats. Here is some scala code which works nicely for all my RGBA texture images:
val textureIDList = new Array[Int](1)
gl.glGenTextures(1, textureIDList, 0)
gl.glBindTexture(GL.GL_TEXTURE_2D, textureIDList(0))
gl.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MIN_FILTER, GL.GL_LINEAR)
gl.glTexParameterf(GL.GL_TEXTURE_2D, GL.GL_TEXTURE_MAG_FILTER, GL.GL_LINEAR)
val dataBuffer = image.getRaster.getDataBuffer // image is a java.awt.image.BufferedImage (loaded from a PNG file)
val buffer: Buffer = dataBuffer match {
case b: DataBufferByte => ByteBuffer.wrap(b.getData)
case _ => null
}
gl.glTexImage2D(GL.GL_TEXTURE_2D, 0, GL.GL_RGBA, image.getWidth, image.getHeight, 0, GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, buffer)
...
gl.glDeleteTextures(1, textureIDList, 0)
I'm not sure that it will completely close the performance gap, but you should be able to use the ImageIO.read method that takes a InputStream and pass in a BufferedInputStream wrapping a FileInputStream. This should greatly reduce the number of native file I/O calls that the JVM has to perform. It would look like this:
File file = new File(fileName); FileInputStream fis = new FileInputStream(file); BufferedInputStream bis = new BufferedInputStream(fis, 8192); //8K reads BufferedImage image = ImageIO.read(bis);
Have you looked into JAI (Java Advanced Imaging) by any chance, it implements native acceleration for tasks such as png compressing/decompression. The Java implementation of PNG decompression may be the issue here. Which version of jvm are you using ?
I work with applications which load and render thousands of textures, for this I use a pure Java implementation of DDS format - available with NASA WorldWind. DDS Textures load into GL faster since it is understood by the graphics card.
I appreciate your benchmarking and would like to use your experiments to test out DDS load times. Also tweak the memory available to JAI and JVM to allow loading of more segments and decompression.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With