Basic concurrency with fork()

2021-06-23 programming linux unix concurrency c

When programming on Linux/Unix, the function fork(2) allows us to create new processes. This is a short and example-driven introduction to the basics of using that function.

First stab

After a brief look at the manual page for fork(2) we can tell that it returns a value of type pid_t and does not take any arguments:

NAME
     fork – create a new process

SYNOPSIS
     #include <unistd.h>

     pid_t
     fork(void);

Let us simply call fork and see what happens and what values it returns.

#include <unistd.h>
#include <stdio.h>

int main(void)
{
    pid_t p;

    p = fork();
    printf("p = %d\n", p);

    return 0;
}

When we compile and run the above program, we can see that printf runs twice (the PID will differ from run to run, and the two lines may appear in either order):

p = 0
p = 29837

This is because when we call fork, the system creates a new process (the child) that is an almost exact copy of the original (parent) process; fork(2) lists the exceptions.

The new process is created inside the fork call, and both processes continue from the point where fork returns. Each of them therefore executes printf, at almost the same time and in no guaranteed order.

The return value of fork indicates whether the call was successful, and also lets us check whether we are in the parent or in the child process.

RETURN VALUES
     Upon successful completion, fork() returns a value of 0 to the child
     process and returns the process ID of the child process to the parent
     process.  Otherwise, a value of -1 is returned to the parent process, no
     child process is created, and the global variable errno is set to
     indicate the error.

Let us now modify the above code:

#include <unistd.h>
#include <stdio.h>

int main(void)
{
    pid_t p;

    p = fork();

    if (p > 0)
        printf("Parent says: p = %d\n", p);
    else if (p == 0)
        printf("Child says: p = %d\n", p);
    else
        printf("Something went wrong\n");

    return 0;
}

When we run the above program, we get an output similar to this:

Parent says: p = 15334
Child says: p = 0

Process IDs

Each process can check its own process ID (PID) by using getpid(2). The parent process knows the child’s PID: it is the value returned by the fork(2) call. The child process can obtain its parent’s PID by using getppid(2).

Life and death

Let us see what happens when the parent process exits before the child process:

#include <unistd.h>
#include <stdio.h>

int main(void)
{
    pid_t p;
    const char *s;

    p = fork();

    if (p > 0) {
        s = "Parent";
    } else if (p == 0) {
        s = "Child";
        sleep(5);
    } else {
        s = "Problem";
        printf("Something went wrong\n");
    }

    printf("%s finished (p = %d)\n", s, p);

    return 0;
}

When we run the above program we notice that the parent process exits and we get our shell prompt. Then, after 5 seconds the child process prints out Child finished (p = 0):

% ./a.out
Parent finished (p = 98878)
% Child finished (p = 0)

Note the odd % character before Child finished, which does not appear in the code. It is our shell’s prompt: the shell waits only for the parent process, so as soon as the parent exits, the shell prints its prompt again. Five seconds later the child wakes up and writes to the same terminal, so its output lands after the %. To make the parent wait for the child to exit, we can call wait(2) in the parent process.

Executing other files

When we want to execute an external binary from our code, we can use execve(2). Let us try the following example, where we execute env(1):

#include <unistd.h>
#include <stdio.h>

int main(void)
{
    char *const a[] = { "/usr/bin/env", NULL };
    char *const e[] = { "A=1", "B=2", NULL };

    execve(a[0], a, e);

    printf("Hello\n");

    return 0;
}

When we run the above code, env prints exactly the variables that we passed as the third argument of execve:

% ./a.out
A=1
B=2
%

However, we do not see the Hello from the final printf in the code. What is going on?!

This is actually explained in the first sentence of the execve(2) manual page:

DESCRIPTION
     execve() transforms the calling process into a new process.

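It means that the program running in our process is replaced by env: on success execve never returns, so nothing after the call gets executed (it returns, with a value of -1, only when it fails). We can solve this by calling fork first and running execve in the child: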

#include <unistd.h>
#include <stdio.h>

int main(void)
{
    char *const a[] = { "/usr/bin/env", NULL };
    char *const e[] = { "A=1", "B=2", NULL };

    if (!fork()) {
        execve(a[0], a, e);
        /* execve returns only on failure; stop the child here */
        perror("execve");
        _exit(1);
    }

    printf("Hello\n");

    return 0;
}

Now we get the following output:

% ./a.out
Hello
A=1
B=2
%

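Notice how Hello appears before the output of env. This is because the two processes run concurrently, and loading and starting the env binary takes longer than a single printf call in the parent. The order is not guaranteed, though: it depends on how the system schedules the two processes.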