Forking with Kernel#fork
We've spent this week shelling out with system, backticks, exec, and spawn. fork is a different beast: instead of running an external command, it clones the current Ruby process and executes a block of Ruby code in the child process.
pid = fork do
  puts "[child] pid=#{Process.pid}, parent_pid=#{Process.ppid}"
  sleep 1
  puts "[child] done"
end
puts "[parent] spawned child #{pid}"
Process.wait(pid)
puts "[parent] child finished"
# Output:
# [parent] spawned child 12345
# [child] pid=12345, parent_pid=12344
# [child] done
# [parent] child finished
For comparison, the closest Node.js counterpart needs two files:
// child.js:
console.log(`[child] pid=${process.pid}, parent_pid=${process.ppid}`);
setTimeout(() => {
  console.log("[child] done");
  process.exit(0);
}, 1000);
// parent.js:
const { fork } = require("child_process");
// NOTE: Node's fork starts a new Node process running another file.
// It does not clone the current process memory like Ruby/Unix fork().
const child = fork("./child.js");
child.on("exit", () => {
  console.log("[parent] child finished");
});
console.log(`[parent] spawned child ${child.pid}`);
// Output:
// [parent] spawned child 12345
// [child] pid=12345, parent_pid=12344
// [child] done
// [parent] child finished
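The unique superpower of fork is copy-on-write memory sharing: parent and child share the same physical memory pages until one of them writes to a page, at which point only that page is copied. This makes it possible to preload a big Ruby app once, then fork multiple worker processes that share most of that memory. Popular Ruby web servers such as Unicorn and Puma (in cluster mode) use this technique to achieve multi-core parallelism with minimal memory overhead.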
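To make the sharing concrete, here is a minimal sketch. APP_DATA is a hypothetical stand-in for a preloaded application, and the pipe is just one possible way for the child to report back. The child can read everything the parent loaded before the fork, while its writes land in a private copy and never reach the parent.

```ruby
# Hypothetical stand-in for an expensive, preloaded application.
APP_DATA = Array.new(10_000) { |i| "record-#{i}" }
counter = 0

# A pipe lets the child send a short report back to the parent.
reader, writer = IO.pipe

pid = fork do
  reader.close
  counter += 1 # changes the child's private copy only
  writer.puts "child sees #{APP_DATA.size} records, counter=#{counter}"
  writer.close
end

writer.close
child_report = reader.read.chomp
Process.wait(pid)

puts child_report
puts "parent counter=#{counter}"

# Output:
# child sees 10000 records, counter=1
# parent counter=0
```

The preloaded data is visible in the child without being reloaded, and the parent's counter stays at 0 because the child's increment happened in its own copy.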
A tiny example of a pre-forking server might look like this:
require "socket"
NUMBER_OF_WORKERS = ENV["WEB_CONCURRENCY"]&.to_i || 4
puts "Starting server at localhost:9292"
puts "Connect with: nc localhost 9292"
server = TCPServer.new("127.0.0.1", 9292)
NUMBER_OF_WORKERS.times do
  fork do
    loop do
      socket = server.accept
      socket.puts "Handled by PID #{Process.pid}"
      socket.close
    end
  end
end
Process.waitall
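The server above runs until it is interrupted, which makes it awkward to experiment with. The following self-contained variation is only a sketch under a few assumptions of my own: it binds to port 0 so the OS picks a free port, uses two workers, and makes six connections from the parent. It shows that every forked worker accepts on the same inherited listening socket, and that the parent can stop and reap the workers when it is done.

```ruby
require "socket"

# Port 0 asks the OS for any free port (an assumption made here so the
# sketch cannot collide with something already listening on 9292).
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]

# Each worker inherits the listening socket and accepts on it.
worker_pids = Array.new(2) do
  fork do
    loop do
      socket = server.accept
      socket.puts Process.pid
      socket.close
    end
  end
end

# Every connection is answered by whichever worker accepts it first.
handled_by = Array.new(6) do
  TCPSocket.open("127.0.0.1", port) { |client| client.gets.to_i }
end

# Stop the workers and reap them so no zombie processes are left behind.
worker_pids.each { |pid| Process.kill(:TERM, pid) }
Process.waitall

puts "workers:    #{worker_pids.inspect}"
puts "handled by: #{handled_by.uniq.inspect}"
```

How the connections are spread across the workers is up to the operating system, so the second line may list one PID or both.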
The idea of forking to conserve memory has evolved further in recent years with Shopify's pitchfork server. It first preforks workers and then, after a warmup period, reforks them from a warmed-up process, so the new workers also share the memory pages that were populated during warmup.
Another use case for fork is isolating risky code: if a child crashes, bloats memory or hangs, the parent process can kill it and keep running unaffected. This technique has been used in some job processing systems, such as Resque, which originated at GitHub.
Job processing example:
TIMEOUT_SECONDS = 2
jobs = [
  { name: "good-job", simulate: :success },
  { name: "crashy-job", simulate: :crash },
  { name: "stuck-job", simulate: :hang }
]
jobs.each do |job|
  pid = fork do
    STDOUT.sync = true
    puts "[child #{job[:name]}] pid=#{Process.pid} started"
    case job[:simulate]
    when :success
      sleep 0.5
      puts "[child #{job[:name]}] done"
      exit 0
    when :crash
      abort "[child #{job[:name]}] simulated crash"
    when :hang
      sleep 10
    end
  end
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + TIMEOUT_SECONDS
  loop do
    # Non-blocking wait: returns nil while the child is still running.
    waited_pid, status = Process.waitpid2(pid, Process::WNOHANG)
    if waited_pid
      if status.success?
        puts "[parent] #{job[:name]} succeeded"
      else
        puts "[parent] #{job[:name]} crashed"
      end
      break
    end
    # Kill the child once the timeout deadline has passed.
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill(:KILL, pid) # SIGKILL is signal 9
      Process.wait(pid) # reap the killed child so it does not linger as a zombie
      puts "[parent] #{job[:name]} timed out (signal 9)"
      break
    end
    sleep 0.05
  end
end
puts "[parent] still running after isolated failures"
# Output:
# [child good-job] pid=20795 started
# [child good-job] done
# [parent] good-job succeeded
# [child crashy-job] pid=20798 started
# [child crashy-job] simulated crash
# [parent] crashy-job crashed
# [child stuck-job] pid=20799 started
# [parent] stuck-job timed out (signal 9)
# [parent] still running after isolated failures
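The parent above reports every unsuccessful exit as a crash. The Process::Status object it receives carries more detail, which can be useful when deciding whether to retry a job. A minimal sketch, where the describe helper and the three sample children are hypothetical names chosen for illustration:

```ruby
# Hypothetical helper: turns a Process::Status into a readable summary.
def describe(status)
  if status.signaled?
    "killed by signal #{status.termsig}"
  elsif status.success?
    "exited cleanly"
  else
    "exited with code #{status.exitstatus}"
  end
end

# Three sample children that end in different ways.
children = {
  "clean"  => proc { exit 0 },
  "failed" => proc { exit 3 },
  "killed" => proc { Process.kill(:KILL, Process.pid) }
}

results = children.to_h do |name, body|
  pid = fork(&body)
  _, status = Process.wait2(pid)
  [name, describe(status)]
end

results.each { |name, summary| puts "#{name}: #{summary}" }

# Output:
# clean: exited cleanly
# failed: exited with code 3
# killed: killed by signal 9
```

The signaled? check comes first because success? returns nil, not false, for a child that was terminated by a signal.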
That said, using fork to survive chaotic workloads has largely fallen out of favor in job processing, because forking a process for every job comes with a large performance cost compared to executing jobs in the same process.
History
Kernel#fork has been in Ruby since the beginning and is modeled directly on the Unix fork(2) system call.
Not every language exposes fork directly, because it is not supported on every platform. On Windows, for example, Ruby's fork raises NotImplementedError. But since Ruby set out to be a Unix-oriented language, fork has been a core part of its concurrency story from the start.
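Because fork is not available everywhere, code that needs to be portable can check for it before relying on it. The sketch below assumes that falling back to a fresh interpreter via Process.spawn is acceptable. That fallback is my own illustration, not something this article prescribes, and the strategy variable is a hypothetical name.

```ruby
require "rbconfig"

if Process.respond_to?(:fork)
  # fork is available: run the child as a clone of this process.
  pid = fork { exit 0 }
  strategy = "fork"
else
  # No fork on this platform: start a fresh Ruby interpreter instead.
  pid = Process.spawn(RbConfig.ruby, "-e", "exit 0")
  strategy = "spawn"
end

_, status = Process.wait2(pid)
puts "ran child via #{strategy}, success=#{status.success?}"
```

The spawned child does not inherit the parent's loaded code or memory, so the two branches are only interchangeable for work that can start from scratch.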