
How to do memory test on Arm Architecture Hardware? (something like Memtest86)

Tags:

linux

arm

Is there a way to do a complete memory test on an Android device's RAM?

I'm developing a driver, but at random times certain physical addresses hold a wrong value, which puts the driver into a wrong state. I read back from RAM when I hit the problem. I think certain portions of the RAM on my device are corrupted.

Vector asked Jul 24 '12 22:07



2 Answers

"Complete" is an ambiguous word. It may mean testing at different temperatures and voltages, and across a range of devices with different component tolerances. As you cite MemTest86, I think I understand what you mean. Most projects I have seen are C based and cannot test everything.

Here is one running under Linux - http://www.madsgroup.org/~quintela/memtest/

There are documented algorithms such as walking bits, etc. A lot depends on your RAM type. I guess you have some type of SDRAM. There are many different cycles with SDRAM: single-beat reads/writes, bank-to-bank transfers, terminated bursts, etc.
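
As an illustration of the walking-bits idea, here is a minimal sketch in C. The function names `walking_ones` and `walking_address` are made up for this example, and the code assumes the region under test is mapped uncached (or the data cache is off); otherwise the reads may be served from the cache instead of the SDRAM.

```c
#include <stdint.h>
#include <stddef.h>

/* Data-bus check: walk a single 1 bit through every bit position of one
 * 32-bit word.  Returns 0 on success, or the first pattern that did not
 * read back correctly.  (Hypothetical helper, not from any library.) */
uint32_t walking_ones(volatile uint32_t *cell)
{
    uint32_t pattern;

    for (pattern = 1; pattern != 0; pattern <<= 1) {
        *cell = pattern;
        if (*cell != pattern)
            return pattern;
    }
    return 0;
}

/* Address-bus check: put a known pattern at every power-of-two word
 * offset, then overwrite offset 0 with the inverse.  If any of the
 * power-of-two offsets changed, two addresses alias each other.
 * Returns the first bad word offset, or 0 if none was found. */
size_t walking_address(volatile uint32_t *base, size_t nwords)
{
    const uint32_t pattern = 0xAAAAAAAAu;
    const uint32_t inverse = 0x55555555u;
    size_t off;

    for (off = 1; off < nwords; off <<= 1)
        base[off] = pattern;

    base[0] = inverse;

    for (off = 1; off < nwords; off <<= 1)
        if (base[off] != pattern)
            return off;
    return 0;
}
```

These two checks mostly catch stuck or shorted data and address lines. They say little about timing margins, which is where the burst-style tests further down come in.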

Personally, we had a system where 5% of the boards would show problems when doing an SSH transfer over Ethernet (DMA). SSH involves encryption, which is CPU/memory intensive, and the DMA engine often performs different SDRAM cycles than the CPU (with cache).

Here are some requirements:

  1. Non-SDRAM memory for code to reside.
  2. Bare metal framework (no data cache, interrupts, DMA, etc.)
  3. Turn off the DCache.
  4. Turn on the ICache for the code.

Another limiting requirement is the time to run. A complete SDRAM test could take years to run on a single board. I have found that a pseudo-random address/data test works well. Just take numbers that are relatively prime to the size of the SDRAM and use them as increments. The simplest case is 1. You might wish to find others that constantly change rows, banks and device size (bank size - 1, for example); however, prime numbers work better because a different number of bits changes on every step.

With the cache off, you can use char, short, int, and long long pointers to test some different burst lengths. These tests will be slow. Full SDRAM bursts are more common with the cache on, so with the cache off you need ldm/stm pairs to simulate them. This is also one of the fastest tests.
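
To make the "relatively prime increment" idea concrete, here is a rough sketch in portable C. It is not the macro used in the listing below; `stride_test`, `addr_stride` and `data_incr` are names invented for this example. Because the stride is coprime to the number of words, stepping by it (modulo the size) visits every word exactly once before repeating.

```c
#include <stdint.h>
#include <stddef.h>

/* Greatest common divisor, used only to validate the stride. */
static size_t gcd(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Write every word of the region once, in stride order, with data that
 * changes by data_incr each step; then replay the same sequence and
 * compare.  Returns the number of mismatches, or (size_t)-1 if the
 * stride is not coprime to nwords (it would then skip some words). */
size_t stride_test(volatile uint32_t *base, size_t nwords,
                   size_t addr_stride, uint32_t data_incr)
{
    size_t addr, i, errors = 0;
    uint32_t value;

    if (nwords == 0 || gcd(addr_stride % nwords, nwords) != 1)
        return (size_t)-1;
    addr_stride %= nwords;

    /* Write pass. */
    addr = 0;
    value = data_incr;
    for (i = 0; i < nwords; i++) {
        base[addr] = value;
        value += data_incr;
        addr += addr_stride;
        if (addr >= nwords)
            addr -= nwords;
    }

    /* Read pass: regenerate the identical address/data sequence. */
    addr = 0;
    value = data_incr;
    for (i = 0; i < nwords; i++) {
        if (base[addr] != value)
            errors++;
        value += data_incr;
        addr += addr_stride;
        if (addr >= nwords)
            addr -= nwords;
    }
    return errors;
}
```

For a power-of-two sized region any odd stride is coprime to the size, so something like `stride_test(ram, nwords, 9999991, 3999971)` walks the whole region in a scattered order. Separating the write and read passes also means each word has to hold its value for a while before it is checked.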

typedef unsigned char      b8;
typedef unsigned short     b16;
typedef unsigned long      b32;
typedef unsigned long long b64;

/* Use a macro to speed code.  The compiler will use constants for
 * _incr and _wrap instead of registers which cause spilling.  A
 * macro centralizes the memory test logic.
 */
#define MEMTEST(name,type,_incr,_wrap) ...

/* Sequential tests. */
MEMTEST(do_mem_seq8,   b8, 97, 1)
MEMTEST(do_mem_seq16, b16, 50839, 1)
MEMTEST(do_mem_seq32, b32, 3999971, 1)
MEMTEST(do_mem_seq64, b64, 3999971, 1)

/* Random tests. These tests try to randomize both the data and the
 * address access.
 */

/* 97/0x61 prime for char and 9999991/0x989677 prime for 64MB. */
MEMTEST(do_mem_rnd8,b8,97,9999991)
/* 50839/0xC697 large prime for 64k and 9999991/0x989677 prime for 64MB. */
MEMTEST(do_mem_rnd16,b16,50839,9999991)
/* 3999971/0x3D08E3 prime and 9999991/0x989677 prime for 64MB. */
MEMTEST(do_mem_rnd32,b32,3999971,9999991)
/* 3999971/0x3D08E3 prime and 9999991/0x989677 prime for 64MB. */
MEMTEST(do_mem_rnd64,b64,3999971,9999991)

Here _incr is the data increment and _wrap is the address increment. The algorithm for the burst test is the same. Here is some inline gcc assembler:

    register ulong t1 asm ("r0")  = 0;                              \
    register ulong t2 asm ("r4")  = t1 + incr;                      \
    register ulong t3 asm ("r6")  = t2 + incr;                      \
    register ulong t4 asm ("r8")  = t3 + incr;                      \
        /* Write four 32-bit values as one burst line. */           \
        __asm__ __volatile__ (" stmia  %[ptr], {%0,%1,%2,%3}\n" : : \
                 "r" (t1), "r" (t2), "r" (t3), "r" (t4),            \
                 [ptr]"r" (start + (addr<<2)) :                     \
                 "memory" );                                        \
        /* Read the four 32-bit values back.  The volatile and   */ \
        /* "memory" clobber keep gcc from moving or dropping it. */ \
        __asm__ __volatile__ (" ldmia  %[ptr], {%0,%1,%2,%3}\n" :   \
                 "=r" (t1), "=r" (t2), "=r" (t3), "=r" (t4) :       \
                 [ptr]"r" (start + (addr<<2)) :                     \
                 "memory" );                                        \

These tests are simple and should fit in the code cache, which maximizes stress on the RAM. Our main issue was the DQS delay, which is critical for DDR-SDRAM. It can be temperature and voltage dependent and will vary with PCB layout and materials.

Cachebench can be used if you are tuning the memory controller registers for your SDRAM chips. It may also be useful for testing.

See also: Unix Stack Exchange (same question). I used these C-based test suites under Linux, but they didn't expose any issues in our case. The memtest86 algorithms may not be as stressful (for PCB glitches) as what I describe above, although test 7 or the burnBX test is close. I think memtest86 is geared toward finding DRAM chip issues as opposed to board design issues.

Edit: Another issue is transients/crosstalk with the SDRAM chips. If your driver controls a high-current or high-frequency device, the SDRAM interface can possibly pick up crosstalk, or get a double clock due to supply variations. So a RAM test may show no issues, and the SDRAM error only happens when a particular portion of the hardware is used. Also be careful that the Android device doesn't use dynamic clocking and change the SDRAM frequency. Signals may cross a resonance as the clock changes.

artless noise answered Sep 18 '22 05:09


Das U-Boot is perhaps the most widely used boot loader on ARM boards, and it includes some memory test features.

Interestingly, its README suggests an alternative approach that might be more portable and/or more effective:

The best known test case to stress a system like that is to boot Linux with root file system mounted over NFS, and then build some larger software package natively (say, compile a Linux kernel on the system) - this will cause enough context switches, network traffic (and thus DMA transfers from the network controller), varying RAM use, etc. to trigger any weak spots in this area.

While you're building the Linux kernel, you might be interested in the CONFIG_MEMTEST=y option, which builds the kernel's built-in memory test. This used to be available for the x86 architecture only, but I believe recent versions support it on other architectures as well, perhaps even ARM.

The memtester tool is already built and available in some Linux distributions, for various architectures, including ARM.
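
To give a feel for what a user-space test can do, here is a minimal sketch in C. It is not memtester's code, only an illustration of the general approach: allocate a buffer, try to pin it with `mlock` so the pages stay resident in RAM, then write and verify patterns. The names `user_ram_check` and `fill_and_verify` are invented for this example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Fill the buffer with one pattern, then read it all back.
 * Returns the number of words that did not match. */
static long fill_and_verify(volatile uint32_t *buf, size_t nwords,
                            uint32_t pattern)
{
    long errors = 0;
    size_t i;

    for (i = 0; i < nwords; i++)
        buf[i] = pattern;
    for (i = 0; i < nwords; i++)
        if (buf[i] != pattern)
            errors++;
    return errors;
}

/* Test `bytes` of heap memory with two complementary patterns.
 * Returns the mismatch count, or -1 if nothing could be allocated. */
long user_ram_check(size_t bytes)
{
    size_t nwords = bytes / sizeof(uint32_t);
    uint32_t *mem;
    long errors;
    int locked;

    if (nwords == 0)
        return -1;
    mem = malloc(nwords * sizeof *mem);
    if (mem == NULL)
        return -1;

    /* Locking may fail (for example because of RLIMIT_MEMLOCK); the
     * test still runs, but the pages could then be swapped or moved. */
    locked = (mlock(mem, nwords * sizeof *mem) == 0);

    errors  = fill_and_verify(mem, nwords, 0x55555555u);
    errors += fill_and_verify(mem, nwords, 0xAAAAAAAAu);

    if (locked)
        munlock(mem, nwords * sizeof *mem);
    free(mem);
    return errors;
}
```

A test like this only covers whatever physical pages the kernel happens to hand out, and with the data cache enabled many accesses never reach the SDRAM, so treat a clean result as weak evidence.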

The kernel-memtest project might interest you as well.

Bear in mind that no tool can test the memory that it's running from (so a program in a running OS will have significant blind spots) and basic read/write tests won't reveal every type of defect or other error. Set your expectations accordingly, and if you have reason to suspect bad memory, consider trying several different test tools.

ʇsәɹoɈ answered Sep 20 '22 05:09