Saturday, January 7, 2012

Who does the cleanup, the process or the kernel?


For a long time I believed that cleanup is work done entirely by the kernel. However, that is not completely true: part of the cleanup is done by the process itself. Let's dive into the details.

Ways to terminate a program


There are a few ways to terminate a process from within the process itself (sometimes also referred to as "normal termination"):

  • exit()

  • _exit()

  • return (from main())


(break cannot be used for this: it only leaves the enclosing switch or loop.)

A process can also be terminated from outside by sending it a signal. A signal can be sent from a terminal by typing

[sourcecode light="true"]
kill [processID]
[/sourcecode]

or by pressing CTRL+C while the program is running in the terminal. By default kill sends SIGTERM, while CTRL+C sends SIGINT. There are also ways to send signals from within the process itself, for example the function abort(). abort() raises SIGABRT and thus terminates the process (with the default SIGABRT handler in place). However, since the same signal can also be sent explicitly by the user from outside the process (from the terminal), we can put this way of terminating a process in the same basket as the other "abnormal" ways of termination.
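From the parent's point of view, a child that raised a signal on itself and a child that was signalled from outside look alike: all the parent learns is which signal ended the child. Here is a minimal sketch, assuming a POSIX system: one child calls abort() on itself, the other is killed by its parent with kill() and SIGTERM (the signal the kill command sends by default). The helper terminating_signal() is a name made up for this example.

[sourcecode language="cpp"]
#define _POSIX_C_SOURCE 200809L /* expose kill() and other POSIX declarations */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reaps the child and returns the signal that killed it,
   0 if it terminated normally, or -1 if waitpid() failed. */
int terminating_signal(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

int main(void) {
    /* Child 1 terminates itself: abort() raises SIGABRT. */
    pid_t self = fork();
    if (self == -1) { perror("fork"); return 1; }
    if (self == 0)
        abort();

    /* Child 2 just sleeps until a signal arrives from outside. */
    pid_t other = fork();
    if (other == -1) { perror("fork"); return 1; }
    if (other == 0)
        for (;;)
            pause();
    kill(other, SIGTERM); /* what "kill [processID]" sends by default */

    printf("child 1: signal %d (SIGABRT = %d)\n", terminating_signal(self), SIGABRT);
    printf("child 2: signal %d (SIGTERM = %d)\n", terminating_signal(other), SIGTERM);
    return 0;
}
[/sourcecode]

The program prints the signal numbers instead of assuming them, since their numeric values are platform-specific.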

Do something on exit (atexit)


This is not strictly part of cleanup, but I mention it because it helps to understand what happens when a process terminates.

On every normal termination of a process we can run some specific code. Say I want something printed on the screen every time the process terminates normally:

[sourcecode language="cpp"]
#include <stdio.h>
#include <stdlib.h>

void onexit(void) {
    puts("Process terminated normally.");
}

int main(void) {
    atexit(onexit); /* register onexit() to run on normal termination */
    /* do stuff here */
    return 0;
}
[/sourcecode]

Now every time main returns, our onexit() function is invoked. Note that even "return 1;" triggers it, as that is still considered a normal termination of the process. "Normal termination" has nothing to do with the value returned; it is about what caused the process to terminate: was it a signal, or was the exit function invoked (directly or by returning from main)?
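Two more details are worth seeing in action: handlers registered with atexit() run in the reverse order of their registration, and calling exit() from any function, with any status, is still a normal termination. A small sketch; the function names are only for illustration.

[sourcecode language="cpp"]
#include <stdio.h>
#include <stdlib.h>

void first(void)  { puts("handler registered first, runs last"); }
void second(void) { puts("handler registered second, runs first"); }

/* Ends the process from somewhere other than main(). Any status,
   including a non-zero one, is still a normal termination. */
void quit_from_helper(int status) {
    printf("calling exit(%d)\n", status);
    exit(status);
}

int main(void) {
    atexit(first);
    atexit(second);
    quit_from_helper(0);
    return 0; /* never reached */
}
[/sourcecode]

Running it prints the "calling exit(0)" line, then the second handler's line, then the first handler's line. Passing 1 instead of 0 changes only the exit status, not which handlers run.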

A more complicated example using atexit() is freeing some memory when the program exits:

[sourcecode language="cpp"]
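#include <stdlib.h>

void *a; /* needs global scope so that freeSomeMemory() can see it */

void *allocSomeMemory(void) {
    return malloc(10);
}

void freeSomeMemory(void) {
    free(a);
}

int main(void) {
    a = allocSomeMemory();
    if (a == NULL)
        return 1; /* allocation failed */
    atexit(freeSomeMemory); /* free(a) will run on normal termination */
    /* do stuff with variable a */
    return 0;
}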
[/sourcecode]

This allocates 10 bytes and stores the pointer in the global variable a. Then we register a handler (a function that takes care of something) that frees the allocated memory whenever the process exits normally.

This works, but whether it is practical is debatable. The kernel frees all of the process's memory anyway, so freeing it explicitly on exit only adds a little delay to termination. In some cases, however, it can be good practice for debugging, for example to keep memory-checking tools from reporting the block as still allocated when the program ends.

What is process cleanup?


Cleanup is simply returning the kernel's resources to the state they were in before the program ran. That means freeing memory, flushing buffers, closing files, removing the process ID from the kernel's process table, decrementing counters for open files, removing kernel timers, sending signals to the parent of the process, and much more.

There are two main players when it comes to cleanup:

  • The kernel

  • The process


We will start with the process's own cleanup, which takes a bit more explaining.

Cleanup from the process


The process's own cleanup happens only on normal termination, that is, when exit() is called or main returns. In fact, returning from main invokes exit() automatically with the returned value. If the process is instead killed by a signal, the steps below are skipped. The cleanup logic itself is not something we write; it comes from the C library that is linked into the program when we compile it. The process's cleanup consists of these steps:

  1. Call the functions registered with atexit(), in the reverse order of their registration

  2. Flush all unwritten buffered data

  3. Close open streams

  4. Remove all files created by the function tmpfile()

  5. Return the exit status and control to the kernel
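To make the order of these steps concrete, here is a toy model of exit(). my_atexit() and my_exit() are hypothetical names; the real exit() is part of the C library and also closes the streams and removes tmpfile() files, which this sketch reduces to a single fflush(NULL). It assumes a POSIX system for _exit().

[sourcecode language="cpp"]
#include <stdio.h>
#include <unistd.h>

#define MAX_HANDLERS 32

/* Hypothetical stand-ins for atexit()/exit(), for illustration only. */
static void (*handlers[MAX_HANDLERS])(void);
static int handler_count = 0;

int my_atexit(void (*fn)(void)) {
    if (handler_count == MAX_HANDLERS)
        return -1;
    handlers[handler_count++] = fn;
    return 0;
}

void my_exit(int status) {
    /* Step 1: run the registered handlers, last registered first. */
    while (handler_count > 0)
        handlers[--handler_count]();
    /* Steps 2-3 (simplified): flush every open output stream. */
    fflush(NULL);
    /* Step 4 (removing tmpfile() files) is left out of this sketch. */
    /* Step 5: hand the exit status and control to the kernel. */
    _exit(status);
}

static void goodbye(void) { printf("goodbye from a handler\n"); }

int main(void) {
    my_atexit(goodbye);
    printf("still in the buffer... "); /* no newline: not written yet */
    my_exit(0);
    return 0; /* not reached */
}
[/sourcecode]

The text printed without a newline still shows up, followed by the handler's output, because my_exit() flushes the buffers before handing over to the kernel.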


Personally, I don't think point 1 is really cleanup in the broader sense, but it gives the programmer the opportunity to add default behavior to every normal termination, which in many cases is used to tidy things up.

All of this happens when exit() is called. There is another exit function that bypasses the first four steps and goes directly to step 5: _exit(). Calling _exit() instead of exit() hands control directly to the kernel.
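The difference is easy to see with text that is still sitting in the stdout buffer. A minimal sketch, assuming a POSIX system; finish() is a made-up helper. Run without arguments, the program prints "buffered text"; run with any argument, _exit() skips the flush and nothing appears.

[sourcecode language="cpp"]
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Leaves text in the stdout buffer, then terminates in one of two ways. */
void finish(int skip_cleanup) {
    printf("buffered text"); /* no newline, so it stays in the buffer */
    if (skip_cleanup)
        _exit(0); /* straight to the kernel: the buffer is never written */
    exit(0);      /* flushes stdout first, so the text appears */
}

int main(int argc, char *argv[]) {
    (void)argv;
    finish(argc > 1); /* pass any argument to try _exit() */
    return 0;         /* not reached */
}
[/sourcecode]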

"Ok, so what's the big deal with using _exit() instead of exit() and vice versa?"

In most cases exit() is the way to go. Sometimes, however, you don't want the same things "cleaned twice". One example is fork(). With fork(), a child process is created. The child inherits a lot of things from the parent process, among them copies of the parent's stdio buffers. If the child calls exit(), its copies of the inherited buffers are flushed. Later, when the parent exits, it flushes its own buffers as well. In this scenario any data still sitting in the buffers at the time of fork() is written twice.

By using _exit() in the child, we skip the flushing in the child process and avoid unwanted side effects such as double output.
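A sketch of the fork() scenario, assuming a POSIX system; demo() is a made-up helper. One caveat: when stdout is a terminal it is line-buffered, so the newline flushes the text before fork() and no doubling happens. Redirect the output to a file or a pipe (for example ./a.out | cat) to see the line printed twice when the child uses exit(), and once when the program is given an argument so the child uses _exit().

[sourcecode language="cpp"]
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Prints one line, forks, and ends the child with either exit() or _exit(). */
void demo(int child_uses_raw_exit) {
    printf("printed once before fork()\n");
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {
        if (child_uses_raw_exit)
            _exit(0); /* the child's copy of the buffer is thrown away */
        exit(0);      /* the child flushes its copy of the buffer */
    }
    waitpid(pid, NULL, 0); /* the parent flushes its own copy later, at exit */
}

int main(int argc, char *argv[]) {
    (void)argv;
    demo(argc > 1); /* pass any argument to make the child use _exit() */
    return 0;
}
[/sourcecode]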

Cleanup from the kernel


Whether exit() or _exit() is used (or the process is killed by a signal), in the end the kernel is the big reaper. We won't go too deep into exactly what the kernel does, but a few points in its cleanup routine are:

  • destroying kernel structures that were created for the process

  • freeing the memory allocated for the process

  • decrementing the reference counts of open files

  • sending a signal (SIGCHLD) to the parent process
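The last point can be observed from the parent's side: when a child terminates, the kernel sends the parent SIGCHLD. A minimal sketch, assuming a POSIX system; child_exit_notifies_parent() is a hypothetical helper, and the five-second limit is arbitrary, only there so the example cannot hang.

[sourcecode language="cpp"]
#define _POSIX_C_SOURCE 200809L /* expose sigaction() */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld = 0;

/* Only sets a flag: very little is safe to do inside a signal handler. */
static void on_sigchld(int sig) {
    (void)sig;
    got_sigchld = 1;
}

/* Forks a child that exits immediately and reports whether the
   kernel delivered SIGCHLD to this (parent) process. */
int child_exit_notifies_parent(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    got_sigchld = 0;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0); /* the child ends right away */

    /* sleep() is cut short when the signal arrives; give up after ~5 s. */
    for (int i = 0; i < 5 && !got_sigchld; i++)
        sleep(1);
    waitpid(pid, NULL, 0); /* collect the child's exit status */
    return got_sigchld;
}

int main(void) {
    printf("SIGCHLD received: %s\n",
           child_exit_notifies_parent() == 1 ? "yes" : "no");
    return 0;
}
[/sourcecode]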


At this point the process is dead: it no longer has any memory or code loaded. Only a very few structures remain in the kernel, such as its process ID and exit status, in case the parent process is interested in them. A process in this state is called a zombie process.

For these last structures to be destroyed, the parent must wait() for the child process. Once it does, the zombie disappears and all resources associated with the dead process are freed.
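Both stages can be watched on Linux, where /proc/&lt;pid&gt;/stat shows a process's state letter ('Z' for a zombie). This sketch is Linux-specific and the helper process_state() is made up for the example. It uses waitid() with WNOWAIT to wait until the child has exited without reaping it, then reaps it with waitpid() and reads the exit status that the zombie was kept around for.

[sourcecode language="cpp"]
#define _POSIX_C_SOURCE 200809L /* expose waitid() and WNOWAIT */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Linux-specific: returns the state letter from /proc/<pid>/stat
   ('Z' means zombie), or '?' if there is no such entry. */
char process_state(pid_t pid) {
    char path[64];
    char state = '?';
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    /* The line starts with "pid (name) state ..." */
    if (fscanf(f, "%*d (%*[^)]) %c", &state) != 1)
        state = '?';
    fclose(f);
    return state;
}

int main(void) {
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }
    if (pid == 0)
        _exit(7); /* the child is done */

    /* Wait until the child has exited, but leave it unreaped (WNOWAIT). */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1) {
        perror("waitid");
        return 1;
    }
    printf("before wait(): state %c\n", process_state(pid));

    /* Now reap it for real: this frees the last kernel structures. */
    int status;
    waitpid(pid, &status, 0);
    if (WIFEXITED(status))
        printf("exit status collected by the parent: %d\n", WEXITSTATUS(status));
    printf("after wait(): state %c\n", process_state(pid));
    return 0;
}
[/sourcecode]

On Linux it should report state Z before the wait, exit status 7, and '?' afterwards, because the /proc entry is gone once the child has been reaped.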