Wednesday, January 13, 2010

Co...routines

Since I'm currently occupied with implementing Coroutines for the MLVM project (and thus for the Hotspot VM) I decided to share my thoughts and findings:

Co...what?
Coroutines allow a subroutine to return without terminating, so that the next time it is called execution continues after the return statement (most often called yield instead of return).
In contrast to other programming language concepts (e.g. continuations) coroutines were never precisely defined. Coroutines might not even be recognizable as such, and depending on their semantics and expressive power they are called generators, coexpressions, fibers, iterators, green threads, greenlets, tasklets, cooperative threads, etc. (did I miss an important one?)

Coroutines were introduced by [Conway, 1963], and [Marlin, 1980] defines some basic characteristics:
  • Local data (variables, ...) is restored on successive calls.
  • Execution of a coroutine is only suspended when it yields, to be resumed when the coroutine is called again.
Some useful classifications of coroutines are given in [Moura and Ierusalimschy, 2009]:
  • Coroutines that can only yield from within their main method are called stackless. They can be implemented by compile-time transformations but are only useful for a limited range of applications. If coroutines can yield in subsequently called methods they are called stackful.
  • Only coroutines that are not tied to specific language constructs (like a for-each loop) are considered to be first-class coroutines.
  • Symmetric coroutines are controlled by a scheduler-like component that decides which coroutine should run next after a yield. With asymmetric coroutines each yield needs to specify which coroutine should come next.
I propose another classification depending upon how the coroutines interact:
  • Simple coroutines communicate by their actions, without passing a value.
  • Coexpressions can receive and return values, which most of the time will only make sense for asymmetric coroutines (because when passing a value one wants to know where it will go). Coexpressions that only receive a value act as a sink (writing a file, etc.), coexpressions that only return values act as generators (reading a file, etc.) and coexpressions that both receive and return a value can perform complex transformations that need to keep an internal state.


Co...why?
During last years JVM Language Summit it became clear that there is an urgent need for coroutines in Java. Of course every algorithm that is implemented using coroutines can be manually split into an explicit state machine, but in many cases this will be prohibitively expensive. (not only in implementation effort but also performance-wise: compilers create much better code when dealing with local variables and the stack is great at ensuring memory locality!)

Let me introduce a simple example: the SAX parser (an XML parser that doesn't have to keep the whole file in memory because it passes one entity after the other to a callback)
For large or incomplete (streaming) XML documents a SAX parser might be the only way to go, but it is hard to use because the callback needs to keep track of its state and act accordingly.
Using coroutines we have two ways of solving this:
  • We can turn the main program into a coroutine that will be called by the callback or
  • We can use a few lines of code to turn the SAX parser into a coexpression that returns one entity each time we call it. (which IMHO is the most elegant solution)
Simula contains an early implementation of first-class stackful symmetric and asymmetric coroutines which is used for simulation applications. Scripting languages for games (which might be seen as a modern-day equivalent to simulations) also often provide support for coroutines (e.g. lua) because they allow a simulation method to suspend itself until the next event happens.

Co...how?
I'll give a short descritpion of how I think a coroutine API should look like in Java:

Symmetric and asymmetric coroutines should be implemented by the framework (although they could in theory be simulated using one another) in order to provide useful semantics out-of-the-box.
The simplest way of keeping an ordered set of symmetric coroutines is a doubly-linked list (adding, removing and changing order are constant time operations). The simplest scheduling algorithm always yields to the next coroutine in the ring, and a mechanism to implement custom schedules should be provided.

Because of the fundamental differences between symmetric and asymmetric coroutines (and between simple coroutines and coexpressions) I think that these should be provided by distinct interfaces:
Coroutine (for simple symmetric coroutines) and Coexpression
These should in principle work like Threads (created with a Runnable or subclassed).

Coexpression should be generic - this leads to a cleaner API: Coexpression<InT, OutT>

This is what I imagine the API could look like:

class Coroutine {
  public Coroutine();
  public Coroutine(Runnable target);

  public void run();

  public static void yieldNext();
  public static <OutT> OutT yieldCall(Coexpression<Void, OutT> expr);
  public static <InT> void yieldCall(Coexpression<InT, Void> expr, InT input);
  public static <InT, OutT> OutT yieldCall(CoExpression<InT, OutT> expr, InT input);
}

class Coexpression<InT, OutT> {
  public Coexpression();
  public Coexpression(CoRunnable<InT, OutT> target);

  public OutT run(InT input);

  public static <OutT> OutT yieldCall(Coexpression<Void, OutT> expr);
  public static <InT> void yieldCall(Coexpression<InT, Void> expr, InT input);
  public static <InT, OutT> OutT yieldCall(CoExpression<InT, OutT> expr, InT input);

  public InT yieldReturn(OutT output);
}

class Test extends Coroutine {
  public void main(String[] args) {
    new Test();
    yieldNext();
    yieldNext();
  }

  public void run() {
    yieldNext(); 
  }
} 

The example program starts executing in the main method, where it creates a coroutine (which is automatically added to the current thread). Then it switches back and forth between main and run two times (the return from run is an implicit yieldNext).

Whats the current state of this?
I have an early prototype of this running which uses multiple stacks per thread. It is very early (fails horribly on GC, etc.) but I have managed to get very encouraging performance numbers. (an equivalent of ~7 method calls per coroutine switch and a maximum of ~1000000 concurrent coroutines using C2 on a 32-bit machine)
It works by allocating a fixed number of stacks per thread. If there are too many coroutines it will start to copy the stack contents to and from the heap in order to share stacks.
I'll certainly experiment with more esoteric ways of implementing coroutines, but for now it provides a baseline for experimenting with the API, etc.

8 comments:

  1. Can't this be solved by using a normal class? You will create one with a single method (your coroutine equivalent). The instance will have it's state and you will be able to invoke the method as many time as you like. The method could alter the instance state as needed. Am i missing something? Andrea

    ReplyDelete
  2. This is of course an oversimplified example. Lets look at a more complicated one, the aforementioned SAXParser inversion. There's lots and lots of state hidden within the SAX parsers recursive stackframes, which would be incredibly difficult to put into explicit objects!

    public class CoSAXParser extends Coexpression<Void, String> {
      public static void main(String[] args) {
        CoSAXParser coParser = new CoSAXParser();

        while (!coParser.isFinished()) {
          String element = yieldCall(coParser);
          System.out.println(element);
        }
      }

      @Override
      public String run(Void v) {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("content.xml"), new DefaultHandler() {
          public void startElement(String uri, String localName, String name, Attributes attributes) {
            yieldReturn(name);
          }
        });
        return null;
      }
    }

    ReplyDelete
  3. This looks promising, nice work!

    I have a question about the example code of the CoSAXParser: how will "coParser.isFinished()" be set to true eventually? I was thinking the "run" method would do something like "yieldFinished()" at the end of its body, but it simply does "return null"?

    Also, if you would do another "yieldCall" after it 's finished: then would the parser start all over again, returning the first element? or would it throw some Exception?

    Thanks in advance & keep up the good work :)
    Anthony

    ReplyDelete
  4. Good questions - As a general rule I'd say that a Coroutine should behave like a Thread as much as possible.
    This would mean that isFinished returns true as soon as the coroutine has returned from its run method, and that it is not possible to restart the coroutine without creating a new CoExpression object. (and that an exception will be thrown if one tries to call a coroutine that is already finished)

    Having said this two problems remain (in our SAXParser example): we cannot know if the coroutine will terminate upon the next call (because the SAXParser might or might not finish), and we have to pass a return value upon termination (because yieldCall has to return a value in this case, too).

    Maybe I should have written the loop differently:

    for (String element = yieldCall(coParser); element != null; element = yieldCall(coParser))
      System.out.println(element);

    ReplyDelete
  5. Coroutines are a fabulous match for building scalable systems with blocking I/O. If coroutines can implicitly yeild whenever they hit the disk or blocking network I/O, it vastly simplifies running thousands of concurrent connections without the overhead of the state machine and manually managing nio.

    ReplyDelete
  6. You may also want to look at vtd-xml which is the latest and most advanced xml processing model that combines processing perfomrane, low memory usage and ease of use

    ReplyDelete
  7. Nice work, Lukas!
    Could you elaborate a bit on the difference between co-routines and continuations? Is it the same thing? Or are they different?

    ReplyDelete
  8. Coroutines are equivalent to one-shot continuations. As such they have less expressive power than full continuations. Coroutines will never "fork", and thus coroutines can be implemented by allocating a separate stack for them. If they are implemented this way then switching to another coroutine is a constant-time operation.
    From another point of view coroutines can be thought of as threads that need to pass control to one another explicitly.

    ReplyDelete