Operating system question

Question

I recently asked myself: when a program, such as Mozilla Firefox, is started, control must somehow be given to it. But when the program crashes, why doesn't my whole system crash, as it did in early Windows versions?

How can Windows take control back from the program, or avoid giving it full control in the first place?

(Note: This is not my homework. I go to school, but the people in my informatics class would answer "Can I eat that?" if I asked them about kernels. Same with my teacher.)

Pavel Radzivilovsky · Accepted Answer

This is a story about protection rings and exceptions. An access violation transfers control to a pre-set OS handler, which decides what to do. A program may also install its own handler; if it doesn't, the result is an unhandled access violation, which is one of the things you call a crash.

In some cases, this mechanism is used for good things. For example, this is how page faults work when disk imitates actual memory: the OS catches the fault, loads the needed data, and then resumes program execution as if nothing had happened.
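The fault-and-resume cycle described above can be observed from user space. The following is a minimal sketch, assuming a Unix-like system where Python's `resource` module is available; `touch_fresh_pages` is a name made up for this example. It counts "minor" page faults, which are resolved without touching the disk, but the trap-into-the-OS-and-resume mechanism is the same one used when data has to be read back from disk.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE


def touch_fresh_pages(num_pages: int = 2048) -> int:
    """Map fresh anonymous memory, write one byte per page, and return how
    many minor page faults the kernel reports for this process meanwhile."""
    before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    region = mmap.mmap(-1, num_pages * PAGE)  # anonymous mapping, not yet backed by RAM
    for offset in range(0, num_pages * PAGE, PAGE):
        region[offset] = 1  # first touch of a page may fault; the OS maps it and resumes us
    after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    region.close()
    return after - before


if __name__ == "__main__":
    print("minor page faults while touching memory:", touch_fresh_pages())
```

The exact count depends on the kernel, the page size, and whether huge pages are in use, and some sandboxed environments do not report it at all, so treat the number as indicative only. The point is that the loop never notices the faults.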

Other things may cause a crash.

An invalid instruction will also be caught by the OS. If it is a valid instruction from a newer instruction set that the CPU does not support, the OS may emulate it in software (this is how floating-point instructions were handled on CPUs without a floating-point unit). If not, it will declare an unhandled exception and shut down your process.

Access to hardware ports from a process that is not running in the proper mode will also cause the program to crash.

Blue screens are caused by a deliberate call to a special function, KeBugCheckEx(). The call is made by the kernel or by device drivers running in kernel mode. They use it to announce that they have reached an inconsistent logical state, and they are important enough to treat this as a reason to bring the whole system down immediately, to avoid further damage to the hardware or other components.

BlueRaja - Danny Pflughoeft · Answer

Usually, a crash will cause an interrupt to occur in the processor. The OS has handlers set up for each of these interrupts, so at that point control is given back to the OS.

Not all interrupts are bad (e.g., I/O interrupts for reading from disk or the network). However, when the OS does encounter a bad interrupt, it will either:

Ignore it and allow the program to continue running
Shut down the program and (on most OSes) notify the user

If the program causes an interrupt the OS can't ignore and is unable to recover from, the OS itself will crash.

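As for how the OS can avoid giving full control to programs: modern processors track a privilege level that determines whether code is running with full privileges (kernel mode) or limited privileges (user mode). On x86, the PE bit in the CR0 register turns on protected mode, and the current privilege level (ring 0 for the kernel, ring 3 for user programs) is what separates the two. User-mode programs are isolated from one another and must go through the OS ("system calls") to communicate with each other.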
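To make the "through the OS" part concrete, here is a small sketch in Python. The functions `os.pipe`, `os.write`, and `os.read` are thin wrappers over the corresponding system calls; `send_through_kernel` is a name invented for this example. Both ends of the pipe live in one process here to keep the sketch short, but the data still travels through a kernel buffer, exactly as it would between two separate user-mode programs.

```python
import os


def send_through_kernel(message: bytes) -> bytes:
    """Write bytes into a pipe and read them back out.

    Each os.* call below traps into kernel mode; the user-mode code never
    touches the pipe buffer directly.
    """
    read_fd, write_fd = os.pipe()          # the kernel creates the buffer
    try:
        os.write(write_fd, message)        # user mode -> kernel buffer
        return os.read(read_fd, len(message))  # kernel buffer -> user mode
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(send_through_kernel(b"hello via the kernel"))
```

The message is kept small so that it fits in the pipe buffer and the write cannot block.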

Daniel A.A. Pelsmaeker · Answer

It is actually very simple. Because Windows is a multitasking operating system, it continually switches (every few milliseconds) from one application to the next. By giving each program a very short time to run, very often, it creates the illusion that the programs are working simultaneously.

When an application hangs, it is probably stuck in a long (possibly endless) loop. Windows keeps giving the application its short time slices and doesn't notice anything wrong (unless you try to interact with the application and it doesn't respond within a few seconds). This is the first type of 'crash'.
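The following sketch shows that a hung program does not take anything else down with it: a parent process starts a child that loops forever, stays in control, and kills the child after a deadline. It assumes only the Python standard library; `run_with_deadline` is a hypothetical helper name, and the one-second deadline is arbitrary.

```python
import subprocess
import sys


def run_with_deadline(code: str, seconds: float) -> bool:
    """Run `code` in a child Python process.

    Returns True if the child overran the deadline and had to be killed,
    False if it finished on its own.
    """
    try:
        subprocess.run([sys.executable, "-c", code], timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising the timeout.
        return True
    return False


if __name__ == "__main__":
    hung = run_with_deadline("while True: pass", seconds=1.0)
    print("child was hung and got killed:", hung)
```

The parent only gets to run its timeout logic because the scheduler keeps taking the CPU away from the spinning child.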

In the second type, a real crash, some serious error has occurred and Windows cannot let the program continue. For example, the program attempts to write to a memory area that is reserved for some other program or for Windows itself. The processor has a built-in mechanism that generates an interrupt (a sort of event for the processor) when this happens. Windows is programmed to react to this interrupt, and because it has no way to fix the problem, it treats the program as 'crashed' and terminates it immediately.

As mentioned, writing to the wrong memory address automatically causes a (protection) interrupt from the processor. Other things that may cause such an interrupt for an unrecoverable error include:

Reading from a memory address the process is not allowed to access
Insufficient memory for this specific application (however, paging mostly removes this problem)
Attempt to execute unexecutable memory (for example, data)
Jumping to an invalid address (e.g. in the middle of a machine instruction)
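A "real crash" of this kind is easy to provoke in a throwaway child process. The sketch below is an illustration under assumptions, not a recipe: it uses `ctypes.string_at(0)` to make the child read from address 0, and `run_crashing_child` is a name made up here. On Linux and similar systems the child is typically killed by a signal, which `subprocess` reports as a negative return code; on Windows, `ctypes` may instead turn the access violation into a Python exception, which still ends the child with a non-zero exit code.

```python
import subprocess
import sys

# The child reads memory at address 0, which no normal process has mapped.
CRASHING_CODE = "import ctypes; ctypes.string_at(0)"


def run_crashing_child() -> int:
    """Start a child that performs an illegal read and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", CRASHING_CODE],
        stderr=subprocess.DEVNULL,  # hide the child's crash output
    )
    return result.returncode


if __name__ == "__main__":
    status = run_crashing_child()
    print("child ended with status", status)
    print("parent is still running")
```

Only the offending process is terminated; the parent, and the rest of the system, carry on.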

Windows constructs special tables that are used by the Memory Management Unit (MMU) on the processor and that contain information about which areas of memory the current process can access. These tables are different for each process, because each process resides at a different location in physical memory and has to be able to access its own data and code.

So the special access tables maintained by the OS, combined with the protection interrupts fired by the processor, are the main reason that a crashing program doesn't take the whole operating system down with it. In addition, timesharing allows the rest of the OS and the other programs to continue when a program is hanging.

JustJeff · Answer

In early Windows, there was no real isolation between processes and no real preemptive scheduling. Multiple processes had to share the system resources under a scheme known as 'cooperative multitasking'. So if one process stopped cooperating, even by accident, your whole system was toast.

Modern OSes (and Windows, since NT/2K anyway) isolate processes from one another by means of virtual memory, and control is periodically transferred from one process to the next by a timing mechanism driven by a hardware interrupt, known as preemptive multitasking. If one process goes bonkers and gets into a tight loop, it's only a matter of time (milliseconds!) before the dud process is preempted, the OS gets control, and transfers it to the next process. If a process goes berserk and dereferences a bad pointer, it cannot corrupt another process's data, because the memory management unit (MMU) has each process's virtual memory mapped to different areas of physical memory.
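The memory isolation described above can be seen with two processes that start out with identical contents. This is a minimal sketch, assuming a POSIX system where `os.fork` is available (it will not run on Windows); `shared_state` and `scribble_in_child` are names invented for the example. After the fork, parent and child have the same virtual addresses, but a write in the child lands in the child's own physical copy of the page.

```python
import os

# Despite the name, this dictionary is NOT shared after fork():
# each process ends up with its own private copy.
shared_state = {"value": "original"}


def scribble_in_child() -> int:
    """Fork, let the child overwrite the dictionary, return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child process: this write is invisible to the parent.
        try:
            shared_state["value"] = "overwritten by child"
        finally:
            os._exit(0)  # leave immediately without running parent cleanup code
    # Parent process: wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    scribble_in_child()
    print("parent still sees:", shared_state["value"])
```

The same mapping that hides the child's deliberate write here is what stops an accidental write through a bad pointer from reaching another process.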

Now, detecting when a program is off the rails is another matter. Maybe you WANT to spin in a tight loop; is it up to the OS to decide that's a crash? So generally you don't see a program that has gone into a loop get terminated, but you will see it load down the CPU. How much load depends on the details of the OS scheduler, but generally the system soldiers on. Bad pointers are easier to recognize, the null pointer being the most obvious one. Modern CPUs usually have segment descriptors that can be used to recognize when an illegal memory reference has been attempted, for example, when a process has used up all the stack space allotted to it. The MMU will usually allow programs fairly liberal access to the address space, but if the OS designer so desires, the MMU can be configured to put certain virtual addresses off limits. If a program tries to access one of these areas, an exception results, which lets the OS immediately seize control and deal with the offending process.

Tags:

operating-system

kernel

ual
