
Why does my long-running python script crash with "invalid pointer" after running for about 3 days?


I wrote a Python 3 script which tests an SPI link to an FPGA. It runs on a Raspberry Pi 3. The test works like this: after putting the FPGA in test mode (a push switch), send the first byte, which can be any value. Further bytes are then sent indefinitely, each one incremented by the first value sent and truncated to 8 bits. Thus, if the first value is 37, the FPGA expects the following sequence:

37, 74, 111, 148, 185, 222, 3, 40 ...
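The expected sequence can be reproduced with a few lines of Python. This is only a sketch of the rule described above: `expected_sequence` is a hypothetical helper, not part of the test script, whose `_next_val()` method is not shown here. Working the wrap-around by hand, 222 + 37 = 259, and keeping the low 8 bits of 259 gives 3.

```python
def expected_sequence(seed, count):
    """Return the first `count` bytes the FPGA should see for a given seed.

    Hypothetical helper for illustration; it mirrors the rule
    "add the seed, truncate to 8 bits" described in the question.
    """
    values = []
    value = seed & 0xFF
    for _ in range(count):
        values.append(value)
        value = (value + seed) & 0xFF  # add the seed, keep the low 8 bits
    return values


if __name__ == "__main__":
    print(expected_sequence(37, 8))
```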

Some additional IO pins are used to signal between the devices: RUN (RPi output) starts the test (necessary because the FPGA times out after about 15 ms while it is expecting a byte), and ERR (FPGA output) signals an error. Errors can thus be counted at both ends.

In addition, the RPi script writes a one-line summary of the bytes sent and the number of errors every million bytes.

All of this works just fine. But after running for about 3 days, I get the following error on the RPi:

free(): invalid pointer: 0x00405340

I get exactly the same error on two identical test setups, even with the same memory address. The last report says "4294M bytes sent, 0 errors".

I seem to have proved the SPI link, but I am concerned that this long-running program crashes for no apparent reason.

Here is the important part of my test code:

    def _report(self, msg):
        now = datetime.datetime.now()
        os.system("echo \"{} : {}\" > spitest_last.log".format(now, msg))

    def spi_test(self):
        global end_loop
        input("Put the FPGA board into SPI test mode (SW1) and press any key")
        self._set_run(True)
        self.END_LOOP = False
        print("SPI test is running, CTRL-C to end.")
        # first byte is sent without LOAD, this is the seed
        self._send_byte(self._val)
        self._next_val()
        end_loop = False
        err_flag = False
        err_cnt = 0
        byte_count = 1
        while not end_loop:
            mb = byte_count % 1000000 
            if mb == 0:
                msg = "{}M bytes sent, {} errors".format(int(byte_count/1000000), err_cnt)
                print("\r" + msg, end="")
                self._report(msg)
                err_flag = True
            else:
                err_flag = False
            #print("sending: {}".format(self._val))
            self._set_load(True)
            if self._errors and err_flag:
                self._send_byte(self._val + 1)
            else:
                self._send_byte(self._val)
            if self.is_error():
                err_cnt += 1
                msg = "{}M bytes sent, {} errors".format(int(byte_count/1000000), err_cnt)
                print("\r{}".format(msg), end="")
                self._report(msg)
            self._set_load(False)
            # increase the value by the seed and truncate to 8 bits
            self._next_val()
            byte_count += 1

        # test is done
        input("\nSPI test ended ({} bytes sent, {} errors). Press ENTER to end.".format(byte_count, err_cnt))
        self._set_run(False)

(Note for clarification: there is a command line option to artificially create an error every million bytes. Hence the "err_flag" variable.)

I've tried using python3 in console mode, and there seems to be no issue with the size of the byte_count variable (there shouldn't be, according to what I have read about python integer size limits).
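As a quick check of that point, the sketch below pushes a Python integer past 2^32 and works out which report would be the last one printed before that boundary. It assumes only what the script shows (one report per million bytes). `REPORT_INTERVAL` and `last_report_before` are hypothetical names introduced for illustration. It shows that the counter itself does not overflow and that 4294M sits just below 2^32 = 4,294,967,296 bytes; it does not by itself identify the cause of the crash.

```python
REPORT_INTERVAL = 1_000_000  # the script reports every million bytes


def last_report_before(limit):
    """Millions shown in the last report printed by the time `limit` bytes are sent."""
    return limit // REPORT_INTERVAL


# A Python int has arbitrary precision, so it does not wrap at 32 bits.
byte_count = 2**32 - 1
byte_count += 1  # an unsigned 32-bit C counter would wrap to 0 here

if __name__ == "__main__":
    print("byte_count after crossing 2**32:", byte_count)
    print("last report before 2**32: {}M bytes sent".format(last_report_before(2**32)))
```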

Anyone have an idea as to what might cause this?

danmcb asked Apr 08 '19 08:04




1 Answer

This issue affects only spidev versions older than 3.5. The comments below were written on the assumption that I was using the upgraded version of spidev.
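Since the installed version matters here, it may be worth checking it before starting a long run. The following is a minimal sketch, not an official spidev facility: it uses the standard library's `importlib.metadata` (available from Python 3.8, so newer than the 3.7.3 mentioned below), and `is_affected` and `installed_spidev_version` are hypothetical helper names. It also assumes a plain numeric version string such as "3.4".

```python
from importlib import metadata  # standard library, Python 3.8+

FIXED_IN = (3, 5)  # per the note above: versions older than 3.5 are affected


def is_affected(version):
    """Return True if a version string such as '3.4' is older than 3.5."""
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor < FIXED_IN


def installed_spidev_version():
    """Return the installed spidev version string, or None if it is not installed."""
    try:
        return metadata.version("spidev")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_spidev_version()
    if version is None:
        print("spidev is not installed")
    elif is_affected(version):
        print("spidev {} is older than 3.5; consider upgrading".format(version))
    else:
        print("spidev {} is 3.5 or newer".format(version))
```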

#############################################################################

I can confirm this problem. It is persistent on both the RPi3B and the RPi4B, using Python 3.7.3 on both. The spidev versions I tried were 3.3, 3.4 and the latest, 3.5. I was able to reproduce this error several times by simply looping over this single line:

spidevice2.xfer2([0x00, 0x00, 0x00, 0x00])

It takes up to 11 hours, depending on the RPi version. After 1073014000 calls (rounded to the nearest 1000), the script crashes with "invalid pointer". The total number of bytes sent is about the same as in danmcb's case. It seems as if 2^32 bytes represents a limit.
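The arithmetic behind that observation can be checked directly. The sketch below uses only the figures quoted above (the rounded call count and 4 bytes per `xfer2` call); the constant names are introduced here for illustration.

```python
CALLS_BEFORE_CRASH = 1_073_014_000  # figure quoted above, rounded to 1000
BYTES_PER_CALL = 4                  # xfer2([0x00, 0x00, 0x00, 0x00])

bytes_sent = CALLS_BEFORE_CRASH * BYTES_PER_CALL
shortfall = 2**32 - bytes_sent      # how far below 2**32 the crash occurred

if __name__ == "__main__":
    print("bytes sent : {:,}".format(bytes_sent))
    print("2**32      : {:,}".format(2**32))
    print("shortfall  : {:,} bytes ({:.3f} % of 2**32)".format(
        shortfall, 100 * shortfall / 2**32))
```

The crash therefore lands within a fraction of a percent of 2^32 bytes, which is consistent with, but does not prove, a 32-bit limit somewhere below the Python level.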

I tried different approaches, for example calling close() from time to time, followed by open(). This did not help.

Then I tried to create the SpiDev object locally, so that it would be re-created for every batch of data.

def spiLoop():
    spidevice2 = spidev.SpiDev()
    spidevice2.open(0, 1)
    spidevice2.max_speed_hz = 15000000
    spidevice2.mode = 1 # Data is clocked in on falling edge
    
    for j in range(100000):
        spidevice2.xfer2([0x00, 0x00, 0x00, 0x00])
        
    spidevice2.close()

It still crashed after approx. 2^30 calls of xfer2([0x00, 0x00, 0x00, 0x00]), which corresponds to approx. 2^32 bytes.

EDIT1

To speed up the process, I sent the data in blocks of 4096 bytes using the code below, again creating the SpiDev object locally each time. It took 2 hours to reach a count of 2^32 bytes.

def spiLoop():
    spidevice2 = spidev.SpiDev()
    spidevice2.open(0, 1)
    spidevice2.max_speed_hz = 25000000
    spidevice2.mode = 1 # Data is clocked in on falling edge
    
    to_send = [0x00] * 2**12 # 4096 bytes
    for j in range(100):
        spidevice2.xfer2(to_send)
        
    spidevice2.close()
    del spidevice2

def runSPI():
    for i in range(2**31 - 1):
        spiLoop()            
        print((2**12 * 100 * (i + 1)) / 2**20, 'Mbytes')

SPI crashed after sending 2^32 bytes.

EDIT2

Reloading the spidev module on the fly does not help either. I tried this code on both the RPi3 and the RPi4 with the same result:

import importlib
def spiLoop():
    importlib.reload(spidev)
    spidevice2 = spidev.SpiDev()
    spidevice2.open(0, 1)
    spidevice2.max_speed_hz = 25000000
    spidevice2.mode = 1 # Data is clocked in on falling edge
    
    to_send = [0x00] * 2**12 # 4096 bytes
    for j in range(100):
        spidevice2.xfer2(to_send)
        
    spidevice2.close()
    del spidevice2

def runSPI():
    for i in range(2**31 - 1):
        spiLoop()            
        print((2**12 * 100 * (i + 1)) / 2**20, 'Mbytes')

Reloading the spidev package does not help.

EDIT3

Executing the code snippet with exec() did not isolate the problem either. It crashed after the 4th chunk of 1 Gbyte of data was sent.

program = '''
import spidev
spidevice = None

def configSPI():
    global spidevice
    
    # We only have SPI bus 0 available to us on the Pi
    bus = 0
    #Device is the chip select pin. Set to 0 or 1, depending on the connections
    device = 1

    spidevice = spidev.SpiDev()
    spidevice.open(bus, device)
    spidevice.max_speed_hz = 250000000
    
    spidevice.mode = 1 # Data is clocked in on falling edge

def spiLoop():
    to_send = [0xAA] * 2**12
    loops = 1024
    for j in range(loops):
        spidevice.xfer2(to_send)
    
    return len(to_send) * loops    

configSPI()
bytes_total = 0

while True:
    bytes_sent = spiLoop()
    bytes_total += bytes_sent            
    print(int(bytes_total / 2**20), "Mbytes", int(1000 * (bytes_total / 2**30)) / 10, "% finished")
    if bytes_total > 2**30:
        break

'''
for i in range(100):
    exec(program)
    print("program executed", i + 1, "times, bytes sent > ", (i + 1) * 2**30)


Dmitri M. answered Oct 09 '22 18:10