
How to debug SEGV_ACCERR

I have an app that streams video using Kickflip and ButterflyTV libRTMP

The app works fine 99% of the time, but from time to time I get a native segmentation fault that I am not able to debug, since the messages are too cryptic:

01-24 10:52:25.576 199-199/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
01-24 10:52:25.576 199-199/? A/DEBUG: Build fingerprint: 'google/hammerhead/hammerhead:6.0.1/M4B30Z/3437181:user/release-keys'
01-24 10:52:25.576 199-199/? A/DEBUG: Revision: '11'
01-24 10:52:25.576 199-199/? A/DEBUG: ABI: 'arm'
01-24 10:52:25.576 199-199/? A/DEBUG: pid: 14302, tid: 14382, name: MuxerThread  >>> tv.myapp.broadcast.dev <<<
01-24 10:52:25.576 199-199/? A/DEBUG: signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x9fef1000
01-24 10:52:25.636 199-199/? A/DEBUG: Abort message: 'Setting to ready!'
01-24 10:52:25.636 199-199/? A/DEBUG:     r0 9c6f9500  r1 9c6f94fc  r2 9fee900c  r3 00007ff4
01-24 10:52:25.636 199-199/? A/DEBUG:     r4 9fee9010  r5 9fef0ffd  r6 00007ff1  r7 9fef0d88
01-24 10:52:25.636 199-199/? A/DEBUG:     r8 cfe40980  r9 9e0a6900  sl 00007ff4  fp 9c6f94fc
01-24 10:52:25.636 199-199/? A/DEBUG:     ip 9c6f9058  sp 9c6f94dc  lr 000000e9  pc b3a33cb6  cpsr 800f0030
01-24 10:52:25.650 199-199/? A/DEBUG: backtrace:
01-24 10:52:25.651 199-199/? A/DEBUG:     #00 pc 00004cb6  /data/app/tv.myapp.broadcast.dev-2/lib/arm/librtmp-jni.so
01-24 10:52:25.651 199-199/? A/DEBUG:     #01 pc 00005189  /data/app/tv.myapp.broadcast.dev-2/lib/arm/librtmp-jni.so (rtmp_sender_write_video_frame+28)
01-24 10:52:25.651 199-199/? A/DEBUG:     #02 pc 00005599  /data/app/tv.myapp.broadcast.dev-2/lib/arm/librtmp-jni.so (Java_net_butterflytv_rtmp_1client_RTMPMuxer_writeVideo+60)
01-24 10:52:25.651 199-199/? A/DEBUG:     #03 pc 014e84e7  /data/app/tv.myapp.broadcast.dev-2/oat/arm/base.odex (offset 0xa66000) (int net.butterflytv.rtmp_client.RTMPMuxer.writeVideo(byte[], int, int, int)+122)
01-24 10:52:25.651 199-199/? A/DEBUG:     #04 pc 014dbd55  /data/app/tv.myapp.broadcast.dev-2/oat/arm/base.odex (offset 0xa66000) (void io.kickflip.sdk.av.muxer.RtmpMuxerMix.writeThread()+2240)
01-24 10:52:25.651 199-199/? A/DEBUG:     #05 pc 014d8c41  /data/app/tv.myapp.broadcast.dev-2/oat/arm/base.odex (offset 0xa66000) (void io.kickflip.sdk.av.muxer.RtmpMuxerMix.access$000(io.kickflip.sdk.av.muxer.RtmpMuxerMix)+60)
01-24 10:52:25.651 199-199/? A/DEBUG:     #06 pc 014d819f  /data/app/tv.myapp.broadcast.dev-2/oat/arm/base.odex (offset 0xa66000) (void io.kickflip.sdk.av.muxer.RtmpMuxerMix$1.run()+98)
01-24 10:52:25.651 199-199/? A/DEBUG:     #07 pc 721e78d1  /data/dalvik-cache/arm/system@framework@boot.oat (offset 0x1ed6000)

Again, in a 2 hour stream this might not ever happen or it might happen 10 minutes into the stream. It is super hard to debug because I cannot force the bug to happen.

Is there any way to improve the debugging information I get? What exactly does SEGV_ACCERR mean? I've read that it "means you tried to access an address that you don't have permission to access", but I am unsure what that means in practice, as I can stream for hours without the bug happening.

Is there any way to catch the signal and just continue?

EDIT: to add more information, this is the part of the native library where the app crashes (found using ndk-stack):

JNIEXPORT jint JNICALL
Java_net_butterflytv_rtmp_1client_RTMPMuxer_writeVideo(JNIEnv *env, jobject instance,
                                                       jbyteArray data_, jint offset, jint length,
                                                       jint timestamp) {
    jbyte *data = (*env)->GetByteArrayElements(env, data_, NULL);
    jint result = rtmp_sender_write_video_frame(data, length, timestamp, 0, 0);
    (*env)->ReleaseByteArrayElements(env, data_, data, 0);

    return result;
}


int rtmp_sender_write_video_frame(uint8_t *data,
                                  int size,
                                  uint64_t dts_us,
                                  int key,
                                  uint32_t abs_ts)
{


    uint8_t * buf;
    uint8_t * buf_offset;
    int val = 0;
    int total;
    uint32_t ts;
    uint32_t nal_len;
    uint32_t nal_len_n;
    uint8_t *nal;
    uint8_t *nal_n;
    char *output ;
    uint32_t offset = 0;
    uint32_t body_len;
    uint32_t output_len;

    buf = data;
    buf_offset = data;
    total = size;
    ts = (uint32_t)dts_us;

    //ts = RTMP_GetTime() - start_time;
    offset = 0;

    nal = get_nal(&nal_len, &buf_offset, buf, total);

(...)


}



static uint8_t * get_nal(uint32_t *len, uint8_t **offset, uint8_t *start, uint32_t total)
{
    uint32_t info;
    uint8_t *q ;
    uint8_t *p  =  *offset;
    *len = 0;




    if ((p - start) >= total)
        return NULL;

    while(1) {
        info =  find_start_code(p, 3);

        if (info == 1)
            break;
        p++;
        if ((p - start) >= total)
            return NULL;
    }
    q = p + 4;
    p = q;

    while(1) {
        info =  find_start_code(p, 3);

        if (info == 1)
            break;
        p++;
        if ((p - start) >= total)
            //return NULL;
            break;
    }


    *len = (p - q);
    *offset = p;
    return q;
}


static uint32_t find_start_code(uint8_t *buf, uint32_t zeros_in_startcode)
{
    uint32_t info;
    uint32_t i;

    info = 1;
    if ((info = (buf[zeros_in_startcode] != 1)? 0: 1) == 0)
        return 0;

    for (i = 0; i < zeros_in_startcode; i++)
        if (buf[i] != 0)
        {
            info = 0;
            break;
        };

    return info;
}

The crash happens at buf[zeros_in_startcode] in find_start_code. I have also removed a few android_log lines (I don't think this matters?).

To my understanding this buffer should be accessible, so it makes no sense that it crashes only "sometimes".

PS. this is where I call the native code from Java:

private void writeThread() {

       while (true) {

           Frame frame = null;
           synchronized (mBufferLock) {
              if (!mConfigBuffer.isEmpty()) {
                   frame = mConfigBuffer.peek();
               } else if (!mBuffer.isEmpty()) {
                   frame = mBuffer.remove();
               }
               if (frame == null) {
                   try {
                       mBufferLock.wait();
                   } catch (InterruptedException e) {
                   }
               }
           }

           if (frame == null) {
               continue;
           } else if (frame instanceof Sentinel) {
               break;
           }


           int writeResult = 0;

           synchronized (mWriteFence) {
               if (!mConnected) {
                   debug(WARN, "Skipping frame due to disconnection");
                   continue;
               }

               if (frame.getFrameType() == Frame.VIDEO_FRAME) {              
                   writeResult = mRTMPMuxer.writeVideo(frame.getData(), frame.getOffset(), frame.getSize(), frame.getTime());
               } else if (frame.getFrameType() == Frame.AUDIO_FRAME) {
                   writeResult = mRTMPMuxer.writeAudio(frame.getData(), frame.getOffset(), frame.getSize(), frame.getTime());

               }

               if (writeResult < 0) {
                       mRtmpListener.onDisconnected();
                       mConnected = false;
               } else {
                   //Now we remove the config frame, only if sending was successful!
                   if (frame.isConfig()) {
                       synchronized (mBufferLock) {
                           mConfigBuffer.remove();
                       }
                   }
               }
           }

       }

   }

Note that the crash happens even when I don't send audio at all.

Gabriel Sanmartin asked Jan 24 '17 10:01


1 Answer

"You can store the data in a byte[]. This allows very fast access from managed code. On the native side, however, you're not guaranteed to be able to access the data without having to copy it."

See https://developer.android.com/training/articles/perf-jni.html

Analysis

Some musings and things to try:

  • The code where it falls over looks very generic, so the bug is probably not there.
  • More likely the frame data has been removed, damaged, locked, or moved.
  • Has the Java garbage collector removed or relocated the data?
  • You could write detailed debug output to a file, overwriting it on every frame, so that you only keep a small log with the last frame's debug info.
  • Send a local copy of the frame data (using a direct ByteBuffer) to mRTMPMuxer.writeVideo.
    Unlike a regular byte[], the storage of a direct ByteBuffer is not allocated on the managed heap, and it can always be accessed directly from native code.
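
One more thing worth checking before ruling out the scanning code. In the question's get_nal, the loop only verifies that (p - start) < total, while find_start_code(p, 3) reads p[0] through p[3], so the scan can read up to three bytes past the end of the frame. The crash dump seems consistent with that: if r2 (0x9fee900c) is the start of the data and r3 (0x7ff4) is the length (an interpretation of the registers, not something the dump states), then start + length = 0x9fef1000, which is exactly the page-aligned fault address, and r5 + 3 gives the same value. That would also explain why it only fails when a frame happens to end on a page boundary. Below is a minimal sketch of a bounded scan. The names is_start_code_at and get_nal_bounded are hypothetical and not part of librtmp, and it assumes 4-byte start codes (00 00 00 01) as in the original.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if a 4-byte start code (00 00 00 01) begins at p, 0 otherwise.
 * Never reads at or beyond `end`. The caller must ensure p <= end. */
static int is_start_code_at(const uint8_t *p, const uint8_t *end)
{
    if ((size_t)(end - p) < 4)   /* fewer than 4 bytes left: cannot match */
        return 0;
    return p[0] == 0 && p[1] == 0 && p[2] == 0 && p[3] == 1;
}

/* Hypothetical bounded variant of get_nal(): finds the next NAL unit in
 * [start, start + total), beginning the search at *cursor.
 * Returns a pointer to the NAL payload and stores its length in *len,
 * or returns NULL when no further start code exists. */
static const uint8_t *get_nal_bounded(size_t *len, const uint8_t **cursor,
                                      const uint8_t *start, size_t total)
{
    const uint8_t *end = start + total;
    const uint8_t *p = *cursor;
    const uint8_t *q;

    *len = 0;

    /* Find the start code that opens this NAL unit. */
    while (p < end && !is_start_code_at(p, end))
        p++;
    if (p >= end)
        return NULL;

    q = p + 4;   /* payload begins right after the start code */
    p = q;

    /* The payload runs until the next start code or the end of the buffer. */
    while (p < end && !is_start_code_at(p, end))
        p++;

    *len = (size_t)(p - q);
    *cursor = p;
    return q;
}
```

The only behavioural difference from the original is that positions with fewer than four bytes remaining are treated as "no start code" instead of being read. If this is the cause, switching to a direct ByteBuffer would not remove the over-read, it would only change what memory sits after the frame.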

Implementation

//allocates memory from the native heap (not the managed heap)
byte[] src = frame.getData();
ByteBuffer data = ByteBuffer.allocateDirect(src.length);
//copy the frame bytes INTO the direct buffer (put, not get)
data.put(src, 0, src.length);
data.flip();
//alternative without a direct buffer: pass a copy of the array, frame.getData().clone()
int offset  = frame.getOffset();
int size    = frame.getSize();
int time    = frame.getTime();
//writeVideo must now be declared to take a ByteBuffer instead of a byte[]
writeResult = mRTMPMuxer.writeVideo(data, offset, size, time);

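JNIEXPORT jint JNICALL
Java_net_butterflytv_rtmp_1client_RTMPMuxer_writeVideo(
    JNIEnv *env,
    jobject instance,
    jobject data_, //a direct ByteBuffer, NOT jbyteArray data_
    jint offset,
    jint length,
    jint timestamp) 
{
    //GetDirectBufferAddress NOT GetByteArrayElements (C calling syntax, as in the original library)
    uint8_t *data = (uint8_t *)(*env)->GetDirectBufferAddress(env, data_);
    if (data == NULL) {
        //not a direct buffer, or direct access is unsupported;
        //-1 is an assumed error value (the Java side treats < 0 as a failure)
        return -1;
    }
    jint result = rtmp_sender_write_video_frame(data, length, timestamp, 0, 0);
    //no ReleaseByteArrayElements call is needed for a direct buffer
    return result;
}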

Debugging

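Some code adapted from the SO question "Catching exceptions thrown from native code". Note that this only applies if the library is built as C++ (the code in the question is C), and that try/catch handles C++ exceptions only: a SIGSEGV is a signal, not an exception, so it will not be caught this way. The JNIEnv pointer also has to be passed down to the function:

    static uint32_t find_start_code(JNIEnv *env, uint8_t *buf, uint32_t zeros_in_startcode){
    //...
    try {
        if ((info = (buf[zeros_in_startcode] != 1)? 0: 1) == 0) return 0;//your code
    }
    // You can catch std::exception for more generic error handling
    catch (const std::exception &e){
        throwJavaException (env, e.what());//see method below
    }
    //...
    }

Then a new method (declare it before find_start_code):

    void throwJavaException(JNIEnv *env, const char *msg)
    {
     // You can put your own exception here
     jclass c = env->FindClass("java/lang/RuntimeException");
     if (NULL == c)
     {
         //B plan: null pointer ...
         c = env->FindClass("java/lang/NullPointerException");
     }
     env->ThrowNew(c, msg);
    }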

Don't get too hung up on SEGV_ACCERR: you have a segmentation fault, SIGSEGV (caused by a program trying to read or write an illegal memory location; a read in your case).
From siginfo.h:

SEGV_MAPERR means you tried to access an address that doesn't map to anything. SEGV_ACCERR means you tried to access an address that you don't have permission to access.

Other

This may be of interest:

Q: I noticed that there was RTMP support. But a patch which remove RTMP had been merged.
Q: Could you tell me why ?
A: We don't think RTMP serves the mobile broadcasting use case as well as HLS,
A: and so we don't want to dedicate our limited resources towards supporting it.

see: https://github.com/Kickflip/kickflip-android-sdk/issues/33

I suggest you file an issue with both projects:
https://github.com/Kickflip/kickflip-android-sdk/issues
https://github.com/ButterflyTV/LibRtmp-Client-for-Android/issues

Jon Goodwin answered Sep 26 '22 03:09