I have been playing with ftrace recently to monitor some behavior characteristics of my system. I've been handling switching the trace on/off via a small script. After running the script, my system would crash and reboot itself. Initially, I believed that there might be an error with the script itself, but I have since determined that the crash and reboot is a result of echo
ing some tracer to /sys/kernel/debug/tracing/current_tracer when current_tracer
is set to function_graph.
That is, the following sequence of commands will produce the crash/reboot:
echo "function_graph" > /sys/kernel/debug/tracing/current_tracer
echo "function" > /sys/kernel/debug/tracing/current_tracer
Durning the reboot after the crash caused by the above echo
statements, I see a lot of output that reads:
clearing orphaned inode
<inode>
I tried to reproduce this problem by replacing the current_tracer
value from function_graph to something else in a C program:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
int openCurrentTracer()
{
int fd = open("/sys/kernel/debug/tracing/current_tracer", O_WRONLY);
if(fd < 0)
exit(1);
return fd;
}
int writeTracer(int fd, char* tracer)
{
if(write(fd, tracer, strlen(tracer)) != strlen(tracer)) {
printf("Failure writing %s\n", tracer);
return 0;
}
return 1;
}
int main(int argc, char* argv[])
{
int fd = openCurrentTracer();
char* blockTracer = "blk";
if(!writeTracer(fd, blockTracer))
return 1;
close(fd);
fd = openCurrentTracer();
char* graphTracer = "function_graph";
if(!writeTracer(fd, graphTracer))
return 1;
close(fd);
printf("Preparing to fail!\n");
fd = openCurrentTracer();
if(!writeTracer(fd, blockTracer))
return 1;
close(fd);
return 0;
}
Oddly enough, the C program does not crash my system.
I originally encountered this problem while using Ubuntu (Unity environment) 16.04 LTS and confirmed it to be an issue on the 4.4.0 and 4.5.5 kernels. I have also tested this issue on a machine running Ubuntu (Mate environment) 15.10, on the 4.2.0 and 4.5.5 kernels, but was unable to reproduce the issue. This has only confused me further.
Can anyone give me insight on what is happening? Specifically, why would I be able to write()
but not echo
to /sys/kernel/debug/tracing/current_tracer?
Update
As vielmetti pointed out, others have had a similar issue (as seen here).
The
ftrace_disable_ftrace_graph_caller()
modifies jmp instruction atftrace_graph_call
assuming it's a 5 bytes near jmp (e9 ). However it's a short jmp consisting of 2 bytes only (eb ). Andftrace_stub()
is located just below theftrace_graph_caller
so modification above breaks the instruction resulting in kernel oops on theftrace_stub()
with the invalid opcode like below:
The patch (shown below) solved the echo
issue, but I still do not understand why echo
was breaking previously when write()
was not.
diff --git a/arch/x86/kernel/mcount_64.S b/arch/x86/kernel/mcount_64.S
index ed48a9f465f8..e13a695c3084 100644
--- a/arch/x86/kernel/mcount_64.S
+++ b/arch/x86/kernel/mcount_64.S
@@ -182,7 +182,8 @@ GLOBAL(ftrace_graph_call)
jmp ftrace_stub
#endif
-GLOBAL(ftrace_stub)
+/* This is weak to keep gas from relaxing the jumps */
+WEAK(ftrace_stub)
retq
END(ftrace_caller)
via https://lkml.org/lkml/2016/5/16/493
One of the most common uses of ftrace is the event tracing. Throughout the kernel is hundreds of static event points that can be enabled via the tracefs file system to see what is going on in certain parts of the kernel. See events. txt for more information.
how does ftrace work? ftrace is a dynamic tracing system. This means that when I start ftracing a kernel function, the function's code gets changed. So – let's suppose that I'm tracing that do_page_fault function from before.
The kernel function tracing_off() can be used within the kernel to disable writing to the ring buffer, which will set this file to "0". User space can re-enable tracing by echoing "1" into the file. Note, the function and event trigger "traceoff" will also set this file to zero and stop tracing.
Looks like you are not the only person to notice this behavior. I see
as a report of the problem, and
as a patch to the kernel that addresses it. Reading through that whole thread it appears that the issue is some compiler optimizations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With