I'm trying to understand how the surface-to-surface approach works with MediaCodec. In a ByteBuffer-only approach, decoded data is placed in the decoder's output buffers. This raw data can then be processed manually and passed to the input buffers of an encoder.
If we look at an example from the Android MediaCodec CTS that uses the surface-to-surface approach to pass data between a decoder and an encoder, the decoder is configured to output the decoded data onto a Surface called outputSurface, and the encoder is configured to receive its data from a Surface called inputSurface.
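For reference, that configuration looks roughly like this (a simplified sketch, not the exact CTS code; width, height, decoderInputFormat, outputSurface and the MIME type are placeholders):

MediaFormat outputFormat = MediaFormat.createVideoFormat("video/avc", width, height);
outputFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT,
        MediaCodecInfo.CodecCapabilities.COLOR_FormatSurface);
outputFormat.setInteger(MediaFormat.KEY_BIT_RATE, 2_000_000);
outputFormat.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
outputFormat.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);

MediaCodec encoder = MediaCodec.createEncoderByType("video/avc");
encoder.configure(outputFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
// No input buffers will be used; the encoder hands us a Surface instead.
Surface inputSurface = encoder.createInputSurface();   // called after configure(), before start()
encoder.start();

MediaCodec decoder = MediaCodec.createDecoderByType("video/avc");
// outputSurface wraps a SurfaceTexture; the decoder renders decoded frames onto it
// instead of filling ByteBuffer output buffers with raw pixel data.
decoder.configure(decoderInputFormat, outputSurface, null, 0);
decoder.start();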
In the documentation, createInputSurface and the usage of this Surface in the encoder's configuration are described as follows:
createInputSurface(): Requests a Surface to use as the input to an encoder, in place of input buffers.
In other terms, and this is visible in the CTS example's ByteBuffer declarations: there simply are no input buffers for the encoder. Instead of enqueuing data into the encoder's input buffers, you have these lines of code:
outputSurface.awaitNewImage();  // wait until the decoder has rendered a new frame into the SurfaceTexture
outputSurface.drawImage();      // draw that frame with OpenGL onto the current EGL surface (the encoder's input)
inputSurface.setPresentationTime(videoDecoderOutputBufferInfo.presentationTimeUs * 1000);  // µs -> ns
inputSurface.swapBuffers();     // submit the rendered frame to the encoder
How is the content of the decoder's outputSurface passed to the inputSurface of the encoder? What is concretely happening behind the curtain?
The decoder's output Surface and the encoder's input Surface are each a specially configured piece of memory (physically contiguous, reserved, etc.) that specialised hardware (for example GPUs or hardware-accelerated codecs) or software modules can use in the fashion best suited to their performance needs (using features such as hardware acceleration, DMA, etc.).
More specifically, in the current context, the decoder's output Surface is backed by a SurfaceTexture, so that it can be used as an external texture in an OpenGL environment for any kind of processing before the frame is rendered onto the Surface from which the encoder reads and encodes it into the final video frame. Not coincidentally, OpenGL can only render to such a Surface.
So the decoder acts as the provider of raw video frames, the Surface(Texture) is the carrier, and OpenGL is the medium that renders them onto the encoder's input Surface, the destination of the frames to be encoded.
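To make the moving parts concrete, here is a stripped-down sketch of that pipeline (roughly what the CTS OutputSurface/InputSurface helpers wrap; EGL initialisation, the GLES shader program and error checking are omitted, and eglDisplay, eglConfig, encoderInputSurface and info are placeholders):

// Decoder side: its output Surface wraps a SurfaceTexture bound to an external GL texture.
int[] tex = new int[1];
GLES20.glGenTextures(1, tex, 0);
SurfaceTexture surfaceTexture = new SurfaceTexture(tex[0]);    // GL_TEXTURE_EXTERNAL_OES
Surface decoderOutputSurface = new Surface(surfaceTexture);    // pass this to decoder.configure()

// Encoder side: turn encoder.createInputSurface() into an EGL window surface,
// so OpenGL renders straight into the encoder's input.
EGLSurface encoderEglSurface = EGL14.eglCreateWindowSurface(
        eglDisplay, eglConfig, encoderInputSurface, new int[] { EGL14.EGL_NONE }, 0);

// Per frame, after decoder.releaseOutputBuffer(index, true):
surfaceTexture.updateTexImage();      // latch the newest decoded frame into the GL texture
// ... draw a full-screen quad sampling the external texture (any editing happens here) ...
EGLExt.eglPresentationTimeANDROID(eglDisplay, encoderEglSurface,
        info.presentationTimeUs * 1000);              // EGL timestamps are in nanoseconds
EGL14.eglSwapBuffers(eglDisplay, encoderEglSurface);  // queue the rendered frame to the encoder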
To further satiate your curiosity, check Editing frames and encoding with MediaCodec for more details.
[Edit]
You can check the subprojects in grafika, in particular Continuous Camera or Show + capture camera, which render camera frames (fed to a SurfaceTexture) to a video file (and to the display). Essentially, the only change is that a MediaCodec decoder feeds frames to the SurfaceTexture instead of the Camera (see the sketch below).
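As a minimal sketch of that change (assuming decoder was configured with the Surface that wraps the SurfaceTexture; info and TIMEOUT_US are placeholders):

// Where grafika calls camera.setPreviewTexture(surfaceTexture), a decoder instead
// renders into the Surface wrapping that SurfaceTexture:
int outIndex = decoder.dequeueOutputBuffer(info, TIMEOUT_US);
if (outIndex >= 0) {
    // render=true sends the decoded frame to the Surface, which triggers the
    // SurfaceTexture.OnFrameAvailableListener just like a camera frame would.
    decoder.releaseOutputBuffer(outIndex, /* render= */ true);
}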
The Google CTS DecodeEditEncodeTest does exactly this and can be used as a reference to make the learning curve smoother.
To start from the very basics, as fadden pointed out, use the Android graphics tutorials.