
How to process raw UDP packets so that they can be decoded by a decoder filter in a DirectShow source filter

Long Story:

  1. There is an H264/MPEG-4 source.
  2. I am able to connect to this source with the RTSP protocol.
  3. I am able to get raw UDP packets with the RTP protocol.
  4. Then I send those raw UDP packets to a Decoder [h264/mpeg-4] [DS Source Filter].
  5. But those "raw" UDP packets cannot be decoded by the Decoder [h264/mpeg-4] filter.

In short:

How do I process that raw UDP data so that it can be decoded by an H264/MPEG-4 decoder filter? Can anyone clearly identify the steps I have to take with an H264/MPEG-4 stream?

Extra Info:

I am able to do this with FFmpeg, but I cannot figure out how FFmpeg processes the raw data so that it is decodable by a decoder.

Novalis asked Oct 05 '11 17:10




2 Answers

Piece of cake!

1. Get the data

As far as I can see, you already know how to do that (start an RTSP session, SETUP an RTP/AVP/UDP;unicast transport, and receive the datagrams), but if you are in doubt, ask.

No matter the transport (UDP or TCP) the data format is mainly the same:

  • RTP data: [RTP Header - 12 bytes][Video data]
  • UDP: [RTP Data]
  • TCP: [$ - 1 byte][Transport Channel - 1 byte][RTP data length - 2 bytes][RTP data]

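So to get the video data from UDP, you only have to strip off the RTP header: the first 12 bytes, plus 4 bytes per CSRC entry and the header extension if either is present (RFC 3550). But beware: you need the header for the video timing information (the timestamp) and, for MPEG4, for the packetization information (the marker bit)!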
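To make the header layout concrete, here is a minimal Python sketch (not from the answer) that splits one UDP datagram into the RTP header fields and the video payload. It assumes the datagram is a complete RTP packet as laid out in RFC 3550; the names `RtpPacket` and `parse_rtp` are made up for illustration. Unlike a plain 12-byte strip, it also skips the CSRC list, a header extension and any trailing padding.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp(datagram: bytes) -> RtpPacket:
    """Split one RTP datagram into header fields and payload (RFC 3550)."""
    if len(datagram) < 12:
        raise ValueError("shorter than the fixed 12-byte RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", datagram[:12])
    version = b0 >> 6
    has_padding = bool(b0 & 0x20)
    has_extension = bool(b0 & 0x10)
    csrc_count = b0 & 0x0F

    offset = 12 + 4 * csrc_count          # skip the CSRC list, if any
    if has_extension:
        if offset + 4 > len(datagram):
            raise ValueError("truncated header extension")
        # 16-bit profile-specific id, then a 16-bit length in 32-bit words
        (ext_words,) = struct.unpack("!H", datagram[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(datagram)
    if has_padding:
        end -= datagram[-1]               # last byte counts the padding bytes
    if offset > end:
        raise ValueError("malformed RTP packet")
    return RtpPacket(version, bool(b1 & 0x80), b1 & 0x7F, seq, ts, ssrc,
                     datagram[offset:end])
```

The `marker` and `timestamp` fields are the parts of the header you still need after the payload has been extracted.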

For TCP, read bytes until you get a $ (0x24). The next byte is the transport channel the following data belongs to (when the server responds to the SETUP request with Transport: RTP/AVP/TCP;unicast;interleaved=0-1, VIDEO DATA will have TRANSPORT_CHANNEL=0 and VIDEO RTCP DATA will have TRANSPORT_CHANNEL=1). You want the VIDEO DATA, so expect 0. Then read a 2-byte length (big-endian, i.e. network byte order) of the RTP data that follows, read that many bytes, and process them exactly as for UDP.
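A minimal Python sketch of the TCP case, assuming everything received so far is already in one buffer (a real reader would keep the bytes of an incomplete block for the next read). The function name is hypothetical; the framing itself (`$`, channel byte, 16-bit big-endian length) is the interleaved format defined in RFC 2326.

```python
import struct
from typing import Iterator, Tuple


def read_interleaved(stream: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (channel, rtp_data) for each '$'-framed block in the buffer.

    Bytes before a '$' (for example an RTSP response) are skipped; an
    incomplete trailing block is ignored, since more data is still to come.
    """
    pos = 0
    while True:
        pos = stream.find(b"$", pos)
        if pos < 0 or pos + 4 > len(stream):
            return
        channel = stream[pos + 1]
        (length,) = struct.unpack("!H", stream[pos + 2:pos + 4])  # big-endian
        start = pos + 4
        if start + length > len(stream):
            return                      # wait for the rest of this block
        yield channel, stream[start:start + length]
        pos = start + length
```

With `interleaved=0-1`, keeping only the blocks where `channel == 0` gives the RTP video packets, which are then handled like UDP datagrams.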

2. Depacketize data

H264 and MPEG4 data are usually packetized because there is a limit, called the MTU, on how much data one endpoint can send in a single packet. It is usually 1500 bytes or less. So if a video frame is larger than that (and it usually is), it needs to be fragmented (packetized) into MTU-sized fragments. This can be done by the encoder/streamer on both TCP and UDP transport, or you can rely on IP to fragment the frame and reassemble it on the other side; the first is much better if you want smooth, error-resilient video over UDP and TCP. For H264, the SDP has a packetization-mode parameter that can have the values 0, 1 and 2; RFC 6184 describes what each of them means and how to depacketize it.
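As an illustration of the sending side, here is a Python sketch of how an H264 NAL unit larger than the allowed payload can be split into FU-A fragments (RFC 6184, used in packetization-mode 1). The name `fragment_fu_a` and the default of 1400 bytes are assumptions for the example; the byte layout it produces is the H264 fragment layout this answer describes.

```python
from typing import List


def fragment_fu_a(nal: bytes, max_payload: int = 1400) -> List[bytes]:
    """Split one H.264 NAL unit (no start code) into RTP payloads.

    max_payload is the largest RTP payload allowed (MTU minus IP/UDP/RTP
    headers); 1400 is only an illustrative default.
    """
    if max_payload < 3:
        raise ValueError("max_payload too small for FU-A")
    if len(nal) <= max_payload:
        return [nal]                         # fits: single NAL unit packet
    header = nal[0]
    fu_indicator = (header & 0xE0) | 28      # keep F + NRI, type 28 = FU-A
    nal_type = header & 0x1F                 # goes into the FU header
    chunk = max_payload - 2                  # room for FU indicator + header
    body = nal[1:]                           # the NAL header is not sent
    payloads = []
    for offset in range(0, len(body), chunk):
        start = offset == 0
        end = offset + chunk >= len(body)
        fu_header = (0x80 if start else 0) | (0x40 if end else 0) | nal_type
        payloads.append(bytes([fu_indicator, fu_header]) + body[offset:offset + chunk])
    return payloads
```

The receiver undoes exactly this: it rebuilds the NAL header from the FU indicator and FU header and concatenates the fragment bodies.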

H264: To check whether the RTP data (which arrived over UDP, or interleaved over TCP) holds a fragment of a larger H264 video frame, you must know what a packetized fragment looks like:

H264 FRAGMENT

    First byte (FU indicator):  [ 3 NAL UNIT BITS | 5 FRAGMENT TYPE BITS ]
    Second byte (FU header):    [ START BIT | END BIT | RESERVED BIT | 5 NAL UNIT BITS ]
    Other bytes:                [ ... VIDEO FRAGMENT DATA ... ]

Now put the VIDEO DATA (the RTP payload, after the RTP header) in a byte array called Data and extract the following:

    int fragment_type = Data[0] & 0x1F;
    int nal_type = Data[1] & 0x1F;
    int start_bit = Data[1] & 0x80;
    int end_bit = Data[1] & 0x40;

If fragment_type == 28, the data that follows is a fragment of a video frame. Next, check whether start_bit is set; if it is, this fragment is the first one in the sequence. Use it to reconstruct the original NAL header byte by taking the first 3 bits of the first payload byte (3 NAL UNIT BITS) and combining them with the last 5 bits of the second payload byte (5 NAL UNIT BITS), so you get a byte like this: [3 NAL UNIT BITS | 5 NAL UNIT BITS]. Then write that NAL byte first into an empty buffer, followed by the VIDEO FRAGMENT DATA from that fragment. (If fragment_type is between 1 and 23 instead, the payload is a single, complete NAL unit and needs no reassembly.)

If start_bit and end_bit are both 0, just append the VIDEO FRAGMENT DATA (skipping the first two payload bytes that identify the fragment) to the buffer.

If start_bit is 0 and end_bit is 1, this is the last fragment: append its VIDEO FRAGMENT DATA (again skipping the first two bytes that identify the fragment) to the buffer, and you have your video frame reconstructed!

Bear in mind that the RTP data holds the RTP header in its first 12 bytes, that for a fragmented frame you never write the first two payload bytes into the defragmentation buffer, and that you need to reconstruct the NAL byte and write it first. If you mess something up here, the picture will be partial (half of it will be gray or black, or you will see artifacts).
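The H264 steps above can be sketched in Python as a small reassembler fed with RTP payloads (header already removed) in sequence-number order. This is a hypothetical helper, not FFmpeg's or any library's code: it handles only single NAL unit packets and FU-A, and it drops a fragmented NAL unit whose start fragment is missing. A real receiver should also check RTP sequence numbers to detect lost middle fragments.

```python
from typing import List, Optional


class H264Depacketizer:
    """Reassemble NAL units from RTP payloads (RTP header already removed)."""

    def __init__(self) -> None:
        self._fu: Optional[bytearray] = None   # NAL unit being reassembled

    def push(self, payload: bytes) -> List[bytes]:
        """Feed one RTP payload; return the NAL units it completes."""
        if len(payload) < 1:
            return []
        fragment_type = payload[0] & 0x1F
        if 1 <= fragment_type <= 23:
            return [payload]                  # complete NAL unit as-is
        if fragment_type != 28 or len(payload) < 2:
            return []                         # STAP/MTAP/FU-B not handled here
        fu_header = payload[1]
        start_bit = fu_header & 0x80
        end_bit = fu_header & 0x40
        if start_bit:
            # 3 NAL UNIT BITS of byte 0 + 5 NAL UNIT BITS of byte 1
            nal_header = (payload[0] & 0xE0) | (fu_header & 0x1F)
            self._fu = bytearray([nal_header])
        if self._fu is None:
            return []                         # start fragment was lost: drop
        self._fu += payload[2:]               # skip FU indicator + FU header
        if end_bit:
            nal, self._fu = bytes(self._fu), None
            return [nal]
        return []
```

Each returned NAL unit still has no start code; that is added when the byte stream for the decoder is built.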

MPEG4: This is an easy one. You need to check the MARKER_BIT in the RTP header (the most significant bit of its second byte). That bit is set (1) on the packet that completes a video frame, so a packet that carries a whole frame has it set, and it is 0 on every fragment except the last one. So to depacketize, look at MARKER_BIT: if it is 1 and no fragments are pending, that's it, just read the video data bytes.

WHOLE FRAME:

   [MARKER = 1] 

PACKETIZED FRAME:

   [MARKER = 0], [MARKER = 0], [MARKER = 0], [MARKER = 1] 

The first packet with MARKER_BIT=0 is the first fragment of the video frame; all the packets that follow, up to and including the first one with MARKER_BIT=1, are fragments of the same video frame. So what you need to do is:

  • While MARKER_BIT=0, append the VIDEO DATA to the depacketization buffer
  • Append the VIDEO DATA of the next packet, the one with MARKER_BIT=1, to the same buffer
  • The depacketization buffer now holds one whole MPEG4 frame

3. Process data for decoder (NAL byte stream)

Once you have depacketized video frames, you need to build a NAL byte stream (for H264, the Annex B byte stream format). It has the following format:

  • H264: 0x000001[SPS], 0x000001[PPS], 0x000001[VIDEO FRAME], 0x000001...
  • MPEG4: 0x000001[Visual Object Sequence Start], 0x000001[VIDEO FRAME]

RULES:

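  • Every H264 NAL unit MUST be prepended with the 3-byte start code 0x000001. For MPEG4, the depacketized frames and the config value normally already begin with their own start codes (0x000001B0 for the Visual Object Sequence, 0x000001B6 for a VOP), so check before adding another one
  • Every stream MUST start with the CONFIGURATION INFO: for H264 that is the SPS and PPS, in that order (sprop-parameter-sets in SDP, a comma-separated list of base64-encoded NAL units), and for MPEG4 the VOS configuration (the hex-encoded config parameter in SDP)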

So build a configuration buffer (for H264, each parameter set prefixed with the 3 bytes 0x000001), send it to the decoder first, and then send each depacketized video frame, prefixed the same way.
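For H264, building the configuration buffer and the stream can be sketched in Python as follows. It assumes the sprop-parameter-sets value has been copied out of the SDP `a=fmtp` line and that the NAL units come from a depacketizer without start codes; both function names are made up for the example.

```python
import base64
from typing import Iterable

START_CODE = b"\x00\x00\x01"


def config_from_sprop(sprop_parameter_sets: str) -> bytes:
    """Turn sprop-parameter-sets (comma-separated base64 NAL units, SPS
    first, then PPS) into start-code-prefixed configuration data."""
    out = bytearray()
    for item in sprop_parameter_sets.split(","):
        item = item.strip()
        if item:
            out += START_CODE + base64.b64decode(item)
    return bytes(out)


def annex_b_stream(sprop_parameter_sets: str, nal_units: Iterable[bytes]) -> bytes:
    """Configuration first, then every NAL unit with its own start code."""
    out = bytearray(config_from_sprop(sprop_parameter_sets))
    for nal in nal_units:
        out += START_CODE + nal
    return bytes(out)
```

In a live pipeline you would send the configuration once and then prefix frames as they arrive, rather than building one large buffer.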

If you need any clarification, just comment. :)

Cipi answered Nov 07 '22 02:11


I have an implementation of this at https://net7mma.codeplex.com/

Here is the relevant code

    using System;
    using System.IO;
    using System.Linq;

    /// <summary>
    /// Implements Packetization and Depacketization of packets defined in <see href="https://tools.ietf.org/html/rfc6184">RFC6184</see>.
    /// </summary>
    public class RFC6184Frame : Rtp.RtpFrame
    {
        /// <summary>
        /// Annex B start code written in front of every Nal Unit
        /// </summary>
        static readonly byte[] NalStart = { 0x00, 0x00, 0x01 };

        public RFC6184Frame(byte payloadType) : base(payloadType) { }

        public RFC6184Frame(Rtp.RtpFrame existing) : base(existing) { }

        public RFC6184Frame(RFC6184Frame f) : this((Rtp.RtpFrame)f) { Buffer = f.Buffer; }

        public MemoryStream Buffer { get; set; }

        /// <summary>
        /// Creates any <see cref="Rtp.RtpPacket"/>'s required for the given nal
        /// </summary>
        /// <param name="nal">The nal, without a start code</param>
        /// <param name="mtu">The largest payload to put in one packet</param>
        public virtual void Packetize(byte[] nal, int mtu = 1500)
        {
            if (nal == null || nal.Length == 0) return;

            if (mtu < 3) throw new ArgumentOutOfRangeException("mtu");

            int nalLength = nal.Length;

            //Fits in one packet: Single Nal Unit Packet (type 1 - 23)
            if (nalLength <= mtu)
            {
                Add(new Rtp.RtpPacket(2, false, false, true, PayloadTypeByte, 0, SynchronizationSourceIdentifier, HighestSequenceNumber + 1, 0, nal));
                return;
            }

            //FU indicator: F and NRI bits of the nal header, type 28 (FU - A)
            byte fuIndicator = (byte)((nal[0] & 0xE0) | 28);

            //The FU header carries the original nal type in its low 5 bits
            byte nalType = (byte)(nal[0] & 0x1F);

            //Leave room for the FU indicator and FU header
            int fragmentSize = mtu - 2;

            //The nal header itself is not sent; the receiver rebuilds it
            int offset = 1;

            while (offset < nalLength)
            {
                int size = Math.Min(fragmentSize, nalLength - offset);

                bool start = offset == 1;

                bool end = offset + size >= nalLength;

                byte fuHeader = (byte)((start ? 0x80 : 0) | (end ? 0x40 : 0) | nalType);

                //The marker bit is only set on the last fragment
                Add(new Rtp.RtpPacket(2, false, false, end, PayloadTypeByte, 0, SynchronizationSourceIdentifier, HighestSequenceNumber + 1, 0,
                    new byte[] { fuIndicator, fuHeader }.Concat(nal.Skip(offset).Take(size)).ToArray()));

                offset += size;
            }
        }

        /// <summary>
        /// Creates <see cref="Buffer"/> with an H.264 Annex B byte stream from the contained packets
        /// </summary>
        public virtual void Depacketize() { bool sps, pps, sei, slice, idr; Depacketize(out sps, out pps, out sei, out slice, out idr); }

        /// <summary>
        /// Parses all contained packets and writes any contained Nal Units, each prefixed with a start code, to <see cref="Buffer"/>.
        /// </summary>
        /// <param name="containsSps">Indicates if a Sequence Parameter Set was found</param>
        /// <param name="containsPps">Indicates if a Picture Parameter Set was found</param>
        /// <param name="containsSei">Indicates if Supplemental Enhancement Information was found</param>
        /// <param name="containsSlice">Indicates if a Slice was found</param>
        /// <param name="isIdr">Indicates if an IDR Slice was found</param>
        public virtual void Depacketize(out bool containsSps, out bool containsPps, out bool containsSei, out bool containsSlice, out bool isIdr)
        {
            containsSps = containsPps = containsSei = containsSlice = isIdr = false;

            DisposeBuffer();

            this.Buffer = new MemoryStream();

            //Get all packets in the frame, accumulating the flags of every packet
            foreach (Rtp.RtpPacket packet in m_Packets.Values.Distinct())
            {
                bool sps, pps, sei, slice, idr;

                ProcessPacket(packet, out sps, out pps, out sei, out slice, out idr);

                containsSps |= sps;
                containsPps |= pps;
                containsSei |= sei;
                containsSlice |= slice;
                isIdr |= idr;
            }

            //Order by DON?
            this.Buffer.Position = 0;
        }

        /// <summary>
        /// Writes the start code for a nal of the given type and records what kind of nal it is
        /// </summary>
        void WriteStartCode(byte nalType, ref bool containsSps, ref bool containsPps, ref bool containsSei, ref bool containsSlice, ref bool isIdr)
        {
            switch (nalType)
            {
                case 1: containsSlice = true; break;
                case 5: isIdr = true; break;
                case 6: containsSei = true; break;
                case 7: containsSps = true; break;
                case 8: containsPps = true; break;
            }

            //SEI (6), SPS (7) and PPS (8) get a 4 byte start code: 0x00000001
            if (nalType >= 6 && nalType <= 8) Buffer.WriteByte(0);

            Buffer.Write(NalStart, 0, 3);
        }

        /// <summary>
        /// Depacketizes a single packet, writing every Nal Unit it contains (prefixed with a start code) to <see cref="Buffer"/>.
        /// </summary>
        internal protected virtual void ProcessPacket(Rtp.RtpPacket packet, out bool containsSps, out bool containsPps, out bool containsSei, out bool containsSlice, out bool isIdr)
        {
            containsSps = containsPps = containsSei = containsSlice = isIdr = false;

            //Starting at offset 0
            int offset = 0;

            //Obtain the data of the packet (without source list or padding)
            byte[] packetData = packet.Coefficients.ToArray();

            //Cache the length
            int count = packetData.Length;

            //Need more than 2 bytes
            if (count <= 2) return;

            //The low 5 bits of the first byte give the nal type (or the aggregation / fragmentation unit type)
            byte firstByte = packetData[offset];

            byte nalUnitType = (byte)(firstByte & Common.Binary.FiveBitMaxValue);

            //Determine what to do
            switch (nalUnitType)
            {
                //Reserved - Ignore
                case 0:
                case 30:
                case 31:
                    return;
                case 24: //STAP - A
                case 25: //STAP - B
                case 26: //MTAP - 16
                case 27: //MTAP - 24
                    {
                        //Move to Nal Data
                        ++offset;

                        //Todo Determine if need to Order by DON first.
                        //Skip the 16 bit DON (STAP - B) or DONB (MTAP); STAP - A has none
                        if (nalUnitType != 24) offset += 2;

                        //Consume the rest of the data from the packet
                        while (offset + 2 <= count)
                        {
                            //The nal unit size, which includes the nal header
                            int tmp_nal_size = Common.Binary.Read16(packetData, offset, BitConverter.IsLittleEndian);

                            offset += 2;

                            //Skip DOND (8 bits) and TS offset (16 bits for MTAP - 16, 24 bits for MTAP - 24)
                            if (nalUnitType == 26) offset += 3;
                            else if (nalUnitType == 27) offset += 4;

                            //Stop on an empty or truncated nal
                            if (tmp_nal_size <= 0 || offset + tmp_nal_size > count) return;

                            //Read the nal type but don't move the offset
                            byte nalType = (byte)(packetData[offset] & Common.Binary.FiveBitMaxValue);

                            //Write the start code
                            WriteStartCode(nalType, ref containsSps, ref containsPps, ref containsSei, ref containsSlice, ref isIdr);

                            //Write the nal header and data
                            Buffer.Write(packetData, offset, tmp_nal_size);

                            //Move the offset past the nal
                            offset += tmp_nal_size;
                        }

                        return;
                    }
                case 28: //FU - A
                case 29: //FU - B
                    {
                        /*
                         Informative note: When an FU-A occurs in interleaved mode, it
                         always follows an FU-B, which sets its DON.
                         Informative note: If a transmitter wants to encapsulate a single
                         NAL unit per packet and transmit packets out of their decoding
                         order, STAP-B packet type can be used.
                        */

                        //Read the FU Header
                        byte FUHeader = packetData[++offset];

                        bool Start = (FUHeader & 0x80) != 0;

                        //bool End = (FUHeader & 0x40) != 0;

                        //Move to data
                        ++offset;

                        //Todo Determine if need to Order by DON first.
                        //DON Present in FU - B
                        if (nalUnitType == 29) offset += 2;

                        //Determine the fragment size
                        int fragment_size = count - offset;

                        //Nothing to write if the size was not valid
                        if (fragment_size <= 0) return;

                        //If the start bit was set
                        if (Start)
                        {
                            //Reconstruct the nal header
                            //Use the first 3 bits of the first byte and the last 5 bits of the FU Header
                            byte nalHeader = (byte)((firstByte & 0xE0) | (FUHeader & Common.Binary.FiveBitMaxValue));

                            //Compare only the type bits, not the F / NRI bits
                            WriteStartCode((byte)(nalHeader & Common.Binary.FiveBitMaxValue), ref containsSps, ref containsPps, ref containsSei, ref containsSlice, ref isIdr);

                            //Write the re-constructed header
                            Buffer.WriteByte(nalHeader);
                        }

                        //Write the data of the fragment.
                        Buffer.Write(packetData, offset, fragment_size);

                        return;
                    }
                default:
                    {
                        //Single Nal Unit Packet (type 1 - 23)
                        WriteStartCode(nalUnitType, ref containsSps, ref containsPps, ref containsSei, ref containsSlice, ref isIdr);

                        //Write the nal header and data
                        Buffer.Write(packetData, offset, count - offset);

                        return;
                    }
            }
        }

        internal void DisposeBuffer()
        {
            if (Buffer != null)
            {
                Buffer.Dispose();
                Buffer = null;
            }
        }

        public override void Dispose()
        {
            if (Disposed) return;
            base.Dispose();
            DisposeBuffer();
        }

        //To go to an Image...
        //Look for a SliceHeader in the Buffer
        //Decode Macroblocks in Slice
        //Convert Yuv to Rgb
    }
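The aggregation branch of ProcessPacket (types 24-27) covers a case the step-by-step answer does not: several small NAL units packed into one RTP payload. For STAP-A, the aggregation type allowed in non-interleaved mode (packetization-mode 1), the parsing reduces to the following Python sketch. The function name is hypothetical; the layout (1-byte STAP-A header, then repeated 16-bit big-endian size plus NAL unit) is the one RFC 6184 defines.

```python
import struct
from typing import List


def split_stap_a(payload: bytes) -> List[bytes]:
    """Split an RFC 6184 STAP-A payload (type 24) into its NAL units."""
    if not payload or (payload[0] & 0x1F) != 24:
        raise ValueError("not a STAP-A payload")
    nal_units = []
    offset = 1                             # skip the STAP-A header byte
    while offset + 2 <= len(payload):
        (size,) = struct.unpack("!H", payload[offset:offset + 2])
        offset += 2
        if size == 0 or offset + size > len(payload):
            break                          # truncated or padded: stop here
        nal_units.append(payload[offset:offset + size])
        offset += size
    return nal_units
```

Each returned NAL unit (already including its own NAL header) can then be prefixed with a start code like any other.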

There are also implementations for various other RFCs, which help get the media playing in a MediaElement or in other software, or just saved to disk.

Writing to a container format is underway.

Jay answered Nov 07 '22 01:11