Transcript
  • Servers and Tools

Forking

From the class: Processes

You've already seen a couple of examples of parent-child processes, where a parent creates a child process, like when we run a program in Bash. Now I'm going to show you how we can do that manually and what it actually does. I'm using Node.js because it has a really useful module called cluster, which we can use to create a cluster of processes.

When a process creates another process, that's called forking. So the terminology is to fork another process. And what forking means is that we take our entire current process that's running, so keep this diagram right here in mind, and copy it. We take the stack, the code, everything that's in memory, the file descriptors, and literally make another copy of it. That copy becomes a new process with its own pid, running independently but as a child of the parent process.

So forked processes are independent. They have their own address space, their own memory, their own stack. But they start out as a copy of their parent. What we're doing here is creating two worker processes, and then listening for the exit event. On exit we'll just print something out to the console.

And then if we're not the master, if we're actually the forked copy, which Node gives us a nice way of checking with an if statement, we'll create an HTTP server. So what we're going to have here is two HTTP servers running, and they'll do what we had before: listen on port 3000 and respond with "Hello World". So let's try out this program and see what it actually does.

I've named this program cluster.js, and we can run it with Node.js. It tells me that the master process is online with a pid of 9408, and the two workers are online as well. Notice that each one gets its own process identifier, because each is its own independent process.

Now down below I can use the ps command with the af options to see my master process running here. Notice that it is a child of Bash, because we created it up here; it's just running in the foreground. And it has two child processes, which were created as copies of this parent, but each one has its own pid. So 9411, 9410, and 9408.

And just like before, if we kill the parent process then all the children will die automatically, but we can kill the individual children and that's OK. First of all, let's just make sure it's doing what we expect. So I'm going to run curl http://localhost:3000. That sends an HTTP request, and we get back "Hello World". So that's all good.

And now let's send a SIGTERM signal to the process ID 9411. Oops, OK. Now at the top it says that worker 9411 has died. Let's take a look at our process tree again down below. I'm going to clean this up and run ps af again. Notice that there's only one child now underneath cluster.js, only one child process of this parent process. However, we can still come down here and make HTTP requests, because there's still one worker running.

There we go. So it's still working. What was happening under the hood, in case you're curious, is that Node.js is round-robinning these requests: sending them first to one worker, then to the second, then back to the first, back and forth, distributing the traffic between the two workers.

And now that 9411 has died, it just sends every request to the one worker that's still alive. We're going to see a couple of examples later where programs like Meteor actually create several child processes that they then manage from one master process.

Before we wrap up, let me talk about one last concept called threading, which is similar to creating new processes, but a little bit different, and in case you hear the terminology, you should know what it means. A thread is a kind of copy as well, but a lighter-weight one. Each thread gets its own call stack, but all the threads share the code, the memory, the file descriptors, and the security attributes.

And so the main thing is that they share memory with the main master thread. We can use threads to execute code simultaneously, and either take advantage of multiple CPUs or do disk IO at the same time that we're using the CPU, something like that. So threads allow us to execute code simultaneously, but they share memory with the master process, whereas processes get their own memory, their own code, their own stack.

So processes are a little bit more heavyweight, but sometimes simpler to work with, because you don't need to worry about them conflicting with each other, since they're not working with the same memory. I just wanted to give you a quick highlight of what threads are in contrast to processes. I'll cover them in more detail in another class if enough of you are interested.