I wrote a C program to test the Linux scheduler. This is my code:
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>
void Thread1()
{
sleep(1);
int i,j;
int policy;
struct sched_param param;
pthread_getschedparam(pthread_self(),&policy,&param);
if(policy == SCHED_OTHER)
printf("SCHED_OTHER\n");
if(policy == SCHED_RR)
printf("SCHED_RR 1 \n");
if(policy==SCHED_FIFO)
printf("SCHED_FIFO\n");
/* for(i=1;i<100;i++) */
while(1)
{
for(j=1;j<5000000;j++)
{
}
printf("thread 1\n");
}
printf("Pthread 1 exit\n");
}
void Thread2()
{
sleep(1);
int i,j,m;
int policy;
struct sched_param param;
pthread_getschedparam(pthread_self(),&policy,&param);
if(policy == SCHED_OTHER)
printf("SCHED_OTHER\n");
if(policy == SCHED_RR)
printf("SCHED_RR\n");
if(policy==SCHED_FIFO)
printf("SCHED_FIFO\n");
/* for(i=1;i<10;i++) */
while(1)
{
for(j=1;j<5000000;j++)
{
}
printf("thread 2\n");
}
printf("Pthread 2 exit\n");
}
void Thread3()
{
sleep(1);
int i,j;
int policy;
struct sched_param param;
pthread_getschedparam(pthread_self(),&policy,&param);
if(policy == SCHED_OTHER)
printf("SCHED_OTHER\n");
if(policy == SCHED_RR)
printf("SCHED_RR \n");
if(policy==SCHED_FIFO)
printf("SCHED_FIFO\n");
/* for(i=1;i<10;i++) */
while(1)
{
for(j=1;j<5000000;j++)
{
}
printf("thread 3\n");
}
printf("Pthread 3 exit\n");
}
int main()
{
int i;
i = getuid();
if(i==0)
printf("The current user is root\n");
else
printf("The current user is not root\n");
pthread_t ppid1,ppid2,ppid3;
struct sched_param param;
pthread_attr_t attr3,attr1,attr2;
pthread_attr_init(&attr1);
pthread_attr_init(&attr3);
pthread_attr_init(&attr2);
param.sched_priority = 97;
pthread_attr_setschedpolicy(&attr1,SCHED_RR);
pthread_attr_setschedparam(&attr1,&param);
pthread_attr_setinheritsched(&attr1,PTHREAD_EXPLICIT_SCHED);
param.sched_priority = 98;
pthread_attr_setschedpolicy(&attr2,SCHED_RR);
pthread_attr_setschedparam(&attr2,&param);
pthread_attr_setinheritsched(&attr2,PTHREAD_EXPLICIT_SCHED);
pthread_create(&ppid3,&attr3,(void *)Thread3,NULL);
pthread_create(&ppid2,&attr2,(void *)Thread2,NULL);
pthread_create(&ppid1,&attr1,(void *)Thread1,NULL);
pthread_join(ppid3,NULL);
pthread_join(ppid2,NULL);
pthread_join(ppid1,NULL);
pthread_attr_destroy(&attr2);
pthread_attr_destroy(&attr1);
return 0;
}
In this program, I create one thread with default attributes and two threads whose scheduling policy is SCHED_RR with specific priorities. My question is: when I run the program, I can barely see any output from thread 1. How can this happen? I think that thread 1 and thread 2 are real-time tasks and thread 3 is a normal one, so thread 3 should never run until thread 1 and thread 2 exit. But in my program thread 1 and thread 2 never exit, so I expect that only thread 2 can actually run. Why can I see the output of thread 2 and thread 3 but not the output of thread 1?
Thread 3 can run because 0.05s of every second is reserved for non-realtime tasks. You can disable this via echo -1 > /proc/sys/kernel/sched_rt_runtime_us:
The default values for sched_rt_period_us (1000000 or 1s) and sched_rt_runtime_us (950000 or 0.95s). This gives 0.05s to be used by SCHED_OTHER (non-RT tasks). These defaults were chosen so that a run-away realtime tasks will not lock up the machine but leave a little time to recover it. By setting runtime to -1 you'd get the old behaviour back.
sched-rt-group.txt
Thread 1 cannot run because thread 2 has a higher priority (note that 99 is the highest RT priority in pthread calls; this is the opposite of the internal Linux numbering, as explained here), and round-robin scheduling is only performed within a queue of processes with the same priority:
SCHED_RR tasks are scheduled by priority, and within a certain priority they are scheduled in a round-robin fashion. Each SCHED_RR task within a certain priority runs for its allotted timeslice, and then returns to the bottom of the list in its priority array queue.
Understanding the Linux 2.6.8.1 CPU Scheduler
Some notes on how to find out why this is happening. To do so, we need the source code and a dynamic tracer such as SystemTap. Thread switches (more precisely, context switches) can be traced via the scheduler.ctxswitch probe, which is a wrapper around the sched_switch tracepoint.
Checking the source code around that tracepoint shows that the new task is chosen in the __schedule function, which calls pick_next_task:
next = pick_next_task(rq, prev, cookie);
...
if (likely(prev != next)) {
...
trace_sched_switch(preempt, prev, next);
Digging further into the source code leads us to pick_next_task_rt, which under certain conditions returns NULL instead of one of our threads. It's SystemTap time!
# stap -e 'probe kernel.function("pick_next_task_rt").return {
if ($return == 0) {
println($rq->rt$) } }' -c ./a.out
...
{.active={...}, .rt_nr_running=2, .highest_prio={...}, .rt_nr_migratory=2, .rt_nr_total=2,
.overloaded=1, .pushable_tasks={...}, .rt_throttled=1, .rt_time=950005330, .rt_runtime=950000000,
.rt_runtime_lock={...}, .rt_nr_boosted=0, .rq=0xffff8801bfc16c40,
.leaf_rt_rq_list={...}, .tg=0xffffffff81e3d480}
SCHED_OTHER
So it seems that rt_time has exceeded the 950ms rt_runtime budget and the rt_throttled flag is set when we switch to SCHED_OTHER. Further googling leads to this answer and the documentation linked above.