I'm having the strangest bug I've ever seen with a linux system right now and there seem to be only two possible explanations for it -
Well let me give you some background. I'm currently writing a c++ program for raspberry pi gpio manipulation. There are no visible error in the program as far as I know & since it works with sudo successfully and with delays successfully too. So here's how rpi's gpio work -
First you've to export one, to reserve it for manipulation, it will create a new directory as
gpio+number
with several files in it.
echo 17 > /sys/class/gpio/export
Then set it's direction(in means read and out means write)
echo "out" > /sys/class/gpio/gpio17/direction
Then write the value (0 or 1 for off and on)
echo 1 > /sys/class/gpio/gpio17/value
At the end, unexport it back, the directory will get deleted.
echo 17 > /sys/class/gpio/unexport
It doesn't matter whether you do this through bash commands or through c/c++ or any other language IO, since in unix these are just files and you just need to read/write to them. Everything works fine till now. I've tested this manually and it works, so my manual test passes.
Now I've a simple test written for my program which looks like this -
TEST(LEDWrites, LedDevice)
{
Led led1(17, "MyLED");
// auto b = sleep(1);
EXPECT_EQ(true, led1.on());
}
The Led class constructor
does the export part - echo 17 > /sys/class/gpio/export
, while the .on()
call sets the direction - echo "write" > /sys/class/gpio/gpio17/direction
and outputs the value as well - echo 1 > /sys/class/gpio/gpio17/value
. Forget about unexport here since it is handled by destructor and plays no role here.
If you're curious, these functions handle I/O like this -
{
const std::string direction = _dir ? "out" : "in";
const std::string path = GPIO_PATH + "/gpio" + std::to_string(powerPin) + "/direction";
std::ofstream dirStream(path.c_str(), std::ofstream::trunc);
if (dirStream) {
dirStream << direction;
} else {
// LOG error here
return false;
}
return true;
}
means basic c++ file/io. Now let me explain the bug.
First, here are 3 runs of same test -
Normal run
FAILS
[isaac@alarmpi build]$ ./test/testexe
Running main() from gtest_main.cc
[==========] Running 2 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from LEDConstruction
[ RUN ] LEDConstruction.LedDevice
[ OK ] LEDConstruction.LedDevice (1 ms)
[----------] 1 test from LEDConstruction (1 ms total)
[----------] 1 test from LEDWrites
[ RUN ] LEDWrites.LedDevice
../test/test.cpp:20: Failure
Value of: led1.on()
Actual: false
Expected: true
[ FAILED ] LEDWrites.LedDevice (2 ms)
[----------] 1 test from LEDWrites (3 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 2 test cases ran. (6 ms total)
[ PASSED ] 1 test.
[ FAILED ] 1 test, listed below:
[ FAILED ] LEDWrites.LedDevice
1 FAILED TEST
run with sudo
PASSES
[isaac@alarmpi build]$ sudo ./test/testexe
[sudo] password for isaac:
Running main() from gtest_main.cc
[==========] Running 2 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from LEDConstruction
[ RUN ] LEDConstruction.LedDevice
[ OK ] LEDConstruction.LedDevice (1 ms)
[----------] 1 test from LEDConstruction (2 ms total)
[----------] 1 test from LEDWrites
[ RUN ] LEDWrites.LedDevice
[ OK ] LEDWrites.LedDevice (2 ms)
[----------] 1 test from LEDWrites (2 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 2 test cases ran. (5 ms total)
[ PASSED ] 2 tests.
wtf delay run
PASSES has uncommented // auto b = sleep(1);
[isaac@alarmpi build]$ ./test/testexe
Running main() from gtest_main.cc
[==========] Running 2 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from LEDConstruction
[ RUN ] LEDConstruction.LedDevice
[ OK ] LEDConstruction.LedDevice (1 ms)
[----------] 1 test from LEDConstruction (2 ms total)
[----------] 1 test from LEDWrites
[ RUN ] LEDWrites.LedDevice
[ OK ] LEDWrites.LedDevice (1001 ms)
[----------] 1 test from LEDWrites (1003 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 2 test cases ran. (1005 ms total)
[ PASSED ] 2 tests.
The only difference b/w delay and normal run is of single uncommented line - // auto b = sleep(1);
Everything is same including device, directory structure, build conf and everything. The only things that explains this is linux might be creating that file and its friends sometimes later or it takes some time? and I call .on()
before that. Well that could explain it...
But then why does sudo invocation with no delay passes? Does it makes those writes faster/instant or does it puts the delay statement by itself? Is this the cause of some kind of buffering? Please say no :/
If it matters, I'm using following dev rule for getting non-sudo access to gpio directory -
SUBSYSTEM=="bcm2835-gpiomem", KERNEL=="gpiomem", GROUP="gpio", MODE="0660"
SUBSYSTEM=="gpio", KERNEL=="gpiochip*", ACTION=="add", PROGRAM="/bin/sh -c 'chown root:gpio /sys/class/gpio/export /sys/class/gpio/unexport ; chmod 220 /sys/class/gpio/export /sys/class/gpio/unexport'"
SUBSYSTEM=="gpio", KERNEL=="gpio*", ACTION=="add", PROGRAM="/bin/sh -c 'chown root:gpio /sys%p/active_low /sys%p/direction /sys%p/edge /sys%p/value ; chmod 660 /sys%p/active_low /sys%p/direction /sys%p/edge /sys%p/value'"
EDIT - As @charles mentioned, I used std::flush
after every write I made on I/O operations. Still failing.
Strace to the rescue
Let's see the execution of the failing build command -
open("/sys/class/gpio/export", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/unexport", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/export", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/gpio17/value", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)
open("/sys/class/gpio/gpio17/direction", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = -1 EACCES (Permission denied)
open("/sys/class/gpio/unexport", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
..., 0666) = -1 EACCES (Permission denied)
Okaaay, here's something, that explains why it is passing with sudo. But why is it passing with delay? Let's check that too,
open("/sys/class/gpio/export", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/unexport", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/export", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/gpio17/value", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/gpio17/direction", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 4
open("/sys/class/gpio/unexport", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
No wait, wtf? This means the permission denied must be for if files aren't created at that time. But how does using sudo
solves that?
Here's relevant output for sudo -
open("/sys/class/gpio/export", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/unexport", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/export", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/gpio17/value", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
open("/sys/class/gpio/gpio17/direction", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 4
open("/sys/class/gpio/unexport", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
There is a race between udev and your program. When you write to /sys/class/gpio/export
, the write will not return until the GPIO is fully created. However, once it has been created, you have two processes that simultaneously take action on the new device:
/sys/class/gpio/gpio17/value
./sys/class/gpio/gpio17/value
.So there is some chance that your program will open the value
file before udev has changed its ownership and permissions. This is in fact very likely, because your udev handler does an execve of a shell which then execve's chown and chmod. But even without that, the scheduler will normally give priority to the task that was already running when returning from a syscall, so your program will usually open the value
file before udev has even woken up.
By inserting a sleep, you allow udev to do its thing. So to make it robust, you could poll the file with access() before opening it.
It would also help by giving udev higher priority. E.g. chrt -f -p $(pidof systemd-udevd) 3
. This gives udev real-time priority, which means it will always run before your program. It can also make your system unresponsive so take care.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With