Saturday, March 7, 2009

Offloading "arbitrary" JS Objects to Workers / Async drawing for bespin

While doing the experiment to marry Joose and bespin I got sucked directly into bespin development. I started digging into ways to offload part of bespin's painting work into web workers. Web workers are independent JavaScript processes which communicate with the primary page using message events. Because they run in different threads they don't block the UI of the web page and can thus safely do complex calculations without hampering it's responsiveness.

Offloading arbitrary objects into workers
I started with a prototype to offload the work bespin is doing to do syntax highlighting into workers. bespin currently cheats by only looking at the lines which are actually visible (many editors do this). This strategy is fast but might lead to errors when there is not enough information within the currently visible lines.
I'm a lazy person and I had some free time so I decided to build a more general solution which could be used to put all kinds of things into workers. Thus I build a framework that creates a facade for any given JS object that acts like the original object but delegates all work to another objects which actually has all the functions and state of the original object but lives inside a worker. Clear? :)


Turning the whole syntax highlighting framework of bespin into a system that runs in the background is now as easy as this:
this.syntaxModel = new bespin.worker.WorkerFacade(new bespin.syntax.Model());

Of course, there is a little more to it:
There are some important limitations for the objects which can be passed to workers:
  • The objects may not reference DOM nodes (because workers have no access to them)
  • The objects may not have circular references (this restriction could be lifted)
  • And most importantly: None of the functions may be closures. This might seem harsher than it is. It basically means that you need to maintain all state inside the object.
Also, all method calls on the facade will be async. I chose to do a jQuery-style fluid-interface for registering callbacks.

Aside: The Web Worker API
Creating a web worker is as easy as calling new Worker("myWorker.js"). The WorkerPool implementation is Google Gears also included an API to create Workers from a string of JavaScript source. Unfortunately this part did not make it into the official spec. My system needs this feature, so I tried a couple of work arounds. First I tried to put the source into a data URI and use that. I found out that this was even suggested when discussing the spec. Unfortunately at least Safari 4 does not seem to support data URIs for workers. (Should this be reported as a bug to the WebKit team?) The next solution I came up with was to use a small bootstrapping worker that loads the source from the hash-part of its URI. This works well but might have security implications.

Building the Facade
The actual facade object is basically a copy of the original object with all methods exchanged with methods to call into the background worker. The source is quite scary. If you, like me, like source code, please enjoy, otherwise no need to read it:
createFacade: function (obj) {

var facade = this;

for(var prop in obj) {
if(prop.charAt(0) != "_") { // supposedly we dont need "private" methods. Delete if assumption is wrong
(function () { // make a lexical scope
var val = obj[prop];
var method = prop
if(typeof val == "function") { // functions are replaced with code to call the worker
facade[prop] = function () {
var self = this;
var index = CALL_INDEX++ // each call gets a globally unique index
var paras = Array.prototype.slice.call(arguments);
if(this.__hasWorkers__) {
var data = {
callIndex: index,
method: method,
paras: paras
}
if(!USE_GEARS) data = dojo.toJson(data) // here we should really test whether our postMessage supports structured data. Safari 4 does not
// send the method to a worker
this.__getWorker__().postMessage(data)
} else {
// No worker implementation available. Use an async call using
// setTimeout instead
var self = this;
window.setTimeout(function () {
var retVal = self.__obj__[method].apply(self.__obj__, paras);
var callback = self.__callbacks__[index];
delete self.__callbacks__[index]
if(callback) {
callback(retVal)
}
}, 0)
}
// Return an object to create a "fluid-interface" style callback generator
// callback will be applied against context
// callback will be part of the mutex
// paras is an array of extra paras for the callback
return {
and: function (context, mutex, paras, callback) {
var func = callback
if(mutex instanceof bespin.worker.Mutex) {
mutex.start()
func = function () {
callback.apply(this, arguments)
mutex.stop()
}
}

self.__callbacks__[index] = function () {
paras = Array.prototype.slice.call(arguments).concat(paras)
func.apply(context, paras)
}
}
}
}
}
else {
// put instance vars here, too?
}
})()
}
}
},
Inside the source you see this:
this.__getWorker__()
By encapsulating access to the actual worker behind the facade it is possible to put multiple workers behind one object. I currently implemented a simple round robin scheduling, but this could of course be extended to use a more sophisticated mechanism (e.g. picking any currently idle worker) (Beware that multiple workers only work reliably for "stateless" objects).

Gears WorkerPool VS. Web Workers
The Gears WorkerPool API predates the Web Workers API. It is very likely that the Gears API will eventually change to more closely follow the W3C API. For now I implemented a simple facade for the Gears API to make it look like the Web Worker API.
This works quite well with some limitations:
  • DOM Message Events are currently not supported (Support could be added)
  • The current postMessage interface which is used to communicate with workers does not support structured data. My implementation uses Gear's native support for structured data and uses JSON for standard workers
Again here is the source for the facade if your care:
// If there is no Worker API (http://www.whatwg.org/specs/web-workers/current-work/) yet,
// try to build one using Google Gears API
if(typeof Worker == "undefined") {
BespinGearsInitializeGears() // this functions initializes Gears only if we need it
if(window.google && google.gears) {
USE_GEARS = true; // OK, gears is here

var wp = google.gears.factory.create('beta.workerpool');
var workers = {};
Worker = function (uri) { // The worker class
this.isGears = true;

// We can pass the source directly. So we decode the source ourselves.
// To make this more general purpose we could of course also load the
// actual JS file.
var source = uriDecodeSource(uri)

this.id = wp.createWorker(source)
workers[this.id] = this;
}

Worker.prototype = { // we can post messages to the worker
postMessage: function (data) {
wp.sendMessage(data, this.id)
}
}

// upon receiving a message we call our onmessage callback
// DOM-Message-Events are not supported
wp.onmessage = function (a, b, message) {
var worker = workers[message.sender];
var cb = worker.onmessage;
if(cb) {
cb.call(worker, {
data: message.body
})
}
}
}
}

Mutexes / Semaphores / etc.
Once you do real multi-process programming with JavaScript you need to start to worry about some of the issues that come with programming parallel systems (not all of them, because workers are more like processes so there are no issues related to pure multi-threading).
Bespin has a quite complicated paint() method which does the actual drawing of the editing area using a canvas-tag. This paint method calls the syntax highlighter for every line that is draws upon every paint. Because all those calls happen asynchronously, potentially in parallel and in undefined order, I needed a mechanism which basically says: Execute this code right after all those async calls finished.
My current solutions is to have an object which maintains a counter of method calls belonging to the same group. Starting an async method increments the counter, calling the method's callback decrements it. You can schedule yet more callbacks which execute once the counter reaches zero again (when all async method calls are finished). I called the class Mutex which is probably wrong but at least on topic.
Even more source for the interested:
//** {{{ bespin.worker.Mutex }}} **
//
// Object that maintains a counter of running workers/async processes.
// Calling after(callback) schedules a function to be called until all
// async processes are finished
//
// Is Mutex the correct term?
dojo.declare("bespin.worker.Mutex", null, {
constructor: function(name, options) {
this.name = name;
this.count = 0;
this.afterJobs = [];
this.options = options || {}
},
start: function () {
this.count = this.count + 1
},
stop: function () {
this.count = this.count - 1
if(this.count == 0) {
if(this.options.onlyLast) {
var last = this.afterJobs[this.afterJobs.length-1];
if(last) {
last()
}
} else {
for(var i = 0; i < this.afterJobs.length; ++i) {
var job = this.afterJobs[i];
job()
}
}
this.afterJobs = []
}
},
after: function (context, func) {
this.afterJobs.push(function () {
func.call(context)
})
}
})

Conclusion
While my current patch will not make it into bespin (because the current syntax highlighter is so fast that the overhead of the Worker is larger than the win) I still think that the approach above has a lot of potential (e.g. analyzing more lines than the currently visible ones). "Transparently" moving objects into workers while for the most part maintaining the original interface at least gives JavaScript a nice Erlang touch (as @psvensson put it).

Update:
I should clarify that it is no problem for functions to pass the worker-bounday which generate closures. Non of the functions which are already present inside the object which is sent to a worker may include a closure, though.
Post a Comment