Last Update: 2011-07-31 09:50:50 -0400

Automatically Running Tasks in Parallel

Consider these task definitions:

task :default => [:a, :b]
task :a => [:x, :y]
task :b

To create the dependency graph, let nodes represent tasks and child nodes represent prerequisites,

     / \
    /   \
   a     b
  / \
 /   \
x     y

Notice the dependency graph already contains the information necessary for parallelizing tasks. In particular, tasks x, y, and b may run in parallel. When x and y are finished, a and b may run in parallel.

So why, then, does multitask exist? Rake already has the information it needs — why must we provide it a second time? The short answer is probably that multitask was easy to implement, while dynamically finding parallelizable tasks and safely executing them in separate threads requires a little more code.

Rake now supports the -j option which does exactly this. To run up to three tasks in parallel,

% rake -j3

or equivalently,

% rake --threads 3

Without a number,

% rake -j

means as many tasks as possible will run in parallel: the dependency graph alone determines the number.

A Note on Dependencies

Without multitask or -j, Rake invokes tasks in a predetermined order. Rakefile authors inevitably come to rely on this order. Even though their dependency graph may be wide and hierarchical, the only graph which has ever been tested is linear: one which results from visiting each node in sequence.


task :a => [:x, :y, :z]

When Rake runs in single-threaded mode (the default), tasks x,y,z will be invoked in that order before a is invoked, assuming no other rules exist involving these tasks. However with -j N for N > 1, one should not expect any particular order of execution. Since there is no dependency specified between x,y,z above, Rake is free to run them in any order.

The problem of underspecified dependencies plagues Makefiles as well (GNU make has supported -j for some time).

Migrating to -j

Suppose -j produces errors from insufficient dependencies in your Rakefile. Do you want to bother fixing them? If you are satisfied with your build time then there is really no reason to use -j.

If on the other hand your build takes twenty minutes to complete, you may be interested in getting the full dependency graph correct in order to take advantage of multiple CPUs or cores.

Though Rake cannot fathom what you mean by a correct dependency, there is a tool available which may help you get closer to saying what you mean:

% rake --randomize[=SEED]

where the optional SEED is an integer. This will randomize the order of sibling prerequisites for each task. Runs with the same SEED will produce the same sibling permutations. If no SEED is given then one will be generated. SEED is reported when Rake exits.

Though this option may cause an error due to underspecified dependencies, with SEED at least it will be an error which is exactly the same on each run. In addition you’ll have the major debugging advantage of using a single thread.

Task#invoke inside Task#invoke

Parallelizing tasks means surrendering control over the micro-management of their execution. Manually invoking tasks inside other tasks is rather contrary to this notion, throwing a monkey wrench into the system. An exception will be raised when this is attempted in -j mode.

What is the Difference Between -j and Multitask?

What if you replaced task with multitask everywhere in your Rakefile? Isn’t that the same as -j?

It is effectively the same when the number of tasks is small. However an all-multitask Rakefile becomes problematic as the number of tasks and dependencies increase. The number of lines in the dependency graph would be equal to the number of threads running simultaneously. Multitask blindly fires off N threads for the N prerequisites of a task.

A simplistic multitask setup for compiling 100 C files might spawn 100 threads running 100 compiler processes all at the same time. To reduce the load you could make several reasonably-sized multitasks then tie them together with a regular task. Or you could just use -j.