I am missing something fundamental in translating the UDP stream of an SDP session into a decodable H.264 stream. I am testing with an H.264-capable camera and can play its stream directly in a player. When I try to play the translated stream, the player does not recognize it (missing header error). However, I have to decode the UDP stream myself to integrate it into a Java application, for which there are some decoders around.
I have already seen very good answers to the following questions:
Both have some small differences which are confusing (see below).
But first let us look at the easy part. As far as I can see, the camera sends SPS and PPS packets. All the remaining packets are fragmented frames, indexed or not.
For all the packets without frames (only NALUnitType 7 and 8 in my case) I strip off the RTP header (12 bytes) and prepend the start bytes, 3 x 0 bytes and 1 x 1 (00 00 00 01).
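The step above could be sketched in Java roughly as follows. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the RTP header is always the fixed 12 bytes described above (no CSRC entries and no header extension, which a real stream may contain).

```java
// Hypothetical helper, not taken from any library.
final class SingleNalUnit {
    // Assumption from the question: fixed 12-byte RTP header.
    static final int RTP_HEADER_LENGTH = 12;
    // Annex B start code 00 00 00 01.
    static final byte[] START_CODE = {0, 0, 0, 1};

    // The NAL unit type is the low 5 bits of the first payload byte.
    static int nalUnitType(byte[] rtpPacket) {
        return rtpPacket[RTP_HEADER_LENGTH] & 0x1F;
    }

    // Strips the RTP header and prepends the start code.
    static byte[] toAnnexB(byte[] rtpPacket, int length) {
        int payloadLength = length - RTP_HEADER_LENGTH;
        if (payloadLength < 1) {
            throw new IllegalArgumentException("packet too short");
        }
        byte[] out = new byte[START_CODE.length + payloadLength];
        System.arraycopy(START_CODE, 0, out, 0, START_CODE.length);
        System.arraycopy(rtpPacket, RTP_HEADER_LENGTH, out, START_CODE.length, payloadLength);
        return out;
    }
}
```

Here `length` would be the number of bytes actually received in the datagram, which can be smaller than the receive buffer.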
For all fragmented frame packets I reconstruct them according to the description in answer 1. In detail this means: strip off the RTP header (just use it for data verification), then decode the fragment information from the payload:
First byte: [ 3 NAL UNIT BITS | 5 FRAGMENT TYPE BITS]
Second byte: [ START BIT | END BIT | RESERVED BIT | 5 NAL UNIT BITS]
If the start bit is set, a new payload header is constructed like this: [3 NAL UNIT BITS (from first byte) | 5 NAL UNIT BITS (from second byte)]
This gives a NALUnitType of 1 for a non-IDR slice or 5 for an IDR slice, which is in line with the specification.
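As a rough illustration of the bit layout above, the header reconstruction might look like this in Java. The class and method names are hypothetical. The value 28 is the FU-A fragment type defined in RFC 6184, and the three high bits of the first byte are what that RFC calls the F and NRI fields.

```java
// Hypothetical helper for the two bytes that precede each fragment.
final class FuHeader {
    // First byte: low 5 bits hold the fragment type (28 = FU-A).
    static boolean isFuA(byte indicator) {
        return (indicator & 0x1F) == 28;
    }

    // Second byte: highest bit is the start bit.
    static boolean isStart(byte fuHeader) {
        return (fuHeader & 0x80) != 0;
    }

    // Second byte: second-highest bit is the end bit.
    static boolean isEnd(byte fuHeader) {
        return (fuHeader & 0x40) != 0;
    }

    // New one-byte header: top 3 bits of the first byte,
    // low 5 bits (original NAL unit type) of the second byte.
    static byte nalHeader(byte indicator, byte fuHeader) {
        return (byte) ((indicator & 0xE0) | (fuHeader & 0x1F));
    }
}
```

For example, a first byte of 0x7C together with a second byte of 0x85 (start bit set, type 5) yields the header 0x65, an IDR slice.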
I take this new payload header (1 byte) and attach the payload, without its 2-byte header, to form a new packet. All consecutive fragments are added the same way (strip off the 12-byte RTP header, strip off the 2 bytes of unit type information) until a set end bit is seen. When the end is seen, I put the start bytes (00 00 00 01) in front of this packet and write it out to the stream.
The problem is that it cannot be decoded, for an unknown reason. The difference in answer 2 is that the second byte of the payload header might be put into the translated packet as well. I tried both and still had no luck.
Is there something else missing in the newly constructed stream? Or am I making a mistake in the defragmentation?
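The reassembly described above might be sketched in Java as below. This is a sketch under stated assumptions, not a complete depacketizer: `FuAReassembler` and `push` are made-up names, the RTP header is assumed to be exactly 12 bytes, packets are assumed to arrive in order, and RTP sequence numbers are not checked, so a lost middle fragment would go unnoticed.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical reassembler for FU-A fragments (fragment type 28).
final class FuAReassembler {
    private static final int RTP_HEADER_LENGTH = 12; // assumption: fixed header
    private static final byte[] START_CODE = {0, 0, 0, 1};

    private final ByteArrayOutputStream current = new ByteArrayOutputStream();
    private boolean inProgress = false;

    // Returns a complete start-code-prefixed NAL unit once the end
    // fragment arrives; returns null while the unit is incomplete.
    byte[] push(byte[] rtpPacket, int length) {
        if (length < RTP_HEADER_LENGTH + 2) {
            return null; // too short to hold the two fragment bytes
        }
        int indicator = rtpPacket[RTP_HEADER_LENGTH] & 0xFF;
        int fuHeader = rtpPacket[RTP_HEADER_LENGTH + 1] & 0xFF;
        if ((indicator & 0x1F) != 28) {
            return null; // not an FU-A packet
        }
        boolean start = (fuHeader & 0x80) != 0;
        boolean end = (fuHeader & 0x40) != 0;

        if (start) {
            current.reset();
            current.write(START_CODE, 0, START_CODE.length);
            // Reconstructed one-byte NAL header.
            current.write((indicator & 0xE0) | (fuHeader & 0x1F));
            inProgress = true;
        } else if (!inProgress) {
            return null; // start fragment was never seen, drop
        }
        // Append the fragment data without the two fragment bytes.
        current.write(rtpPacket, RTP_HEADER_LENGTH + 2, length - RTP_HEADER_LENGTH - 2);

        if (!end) {
            return null;
        }
        inProgress = false;
        return current.toByteArray();
    }
}
```

Each non-null result would be written to the output stream as is. The reconstructed header byte is written only once, for the start fragment, and the second fragment byte itself is never copied into the output.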
Thomas,
I'm trying to understand all of this myself. It looks to me, from reading "How to process raw UDP packets so that they can be decoded by a decoder filter in a directshow source filter", that your "start bytes" are off by one byte. I think it's 3 bytes, not 4, as in: 00 00 01
Maybe that's where it's having trouble.
See "Problem to Decode H264 video over RTP with ffmpeg (libavcodec)" for the answer. It has the correct implementation!
And @Thomas, yes, the start code does have 4 bytes if an SPS, PPS or SEI NAL unit is present.