Showing posts with label WebWorkers. Show all posts

Wednesday, July 22, 2009

Google Chrome's Very Incomplete Web Worker Support

In bespin we use a facade that lets us run JavaScript code in Web Workers, in Gears workers, or, if neither is available, in the main thread.

Google Chrome does not yet support Web Workers but since it has Google Gears built in it should use the appropriate fallback. Turns out it didn't with our original code.

I naively implemented object detection to check for the availability of the Worker-object like this:
if (typeof Worker == "undefined") {
For some reason this condition is false in Chrome 2: Worker is defined there even though, as stated above, support for the Worker API has not been implemented.

I then tried to instantiate the Worker object. All this does is throw an exception with the message "Worker is not enabled". This looks like an unfinished implementation that was only partially removed.

This code handles the special case:
// Plain object detection first
var noWorker = typeof Worker == "undefined";
if (!noWorker) {
    try {
        new Worker("test.js");
    } catch (e) {
        // Chrome 2 defines Worker but throws as soon as it is instantiated
        e = e + "";
        if (e.indexOf("Worker is not enabled") != -1) {
            noWorker = true;
        }
    }
}
I'd be very interested to know why this fragment of the worker implementation was left in the code. Most likely it is a bug, but it is questionable whether it will be fixed, since the next version of Chrome will actually support the Worker API.
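The same check can be wrapped in a small function. This is a sketch rather than the code bespin ships: the name hasUsableWorker and its scope and probeUrl parameters are made up for illustration, and unlike the fragment above it treats any exception thrown during instantiation as "no workers", not just the Chrome 2 message.

```javascript
// Returns true only if scope.Worker exists AND can actually be instantiated.
// `scope` is the global object to inspect, `probeUrl` a script to try loading.
// Both parameters are illustrative assumptions, not part of bespin.
function hasUsableWorker(scope, probeUrl) {
  if (typeof scope.Worker == "undefined") {
    return false;
  }
  try {
    var probe = new scope.Worker(probeUrl);
    // Clean up the probe if the implementation offers terminate().
    if (probe && typeof probe.terminate == "function") {
      probe.terminate();
    }
    return true;
  } catch (e) {
    // Chrome 2 ends up here with "Worker is not enabled".
    return false;
  }
}
```

In a page this would be called as hasUsableWorker(window, "test.js"), and the result decides whether to fall back to Gears or to the main thread.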

Monday, May 11, 2009

One event loop to rule them all

While experimenting with web workers in bespin I made a small change that makes working with them a lot nicer. In bespin we make heavy use of custom events to loosely bind things together in the application. So you see a lot of
bespin.publish("something:happened", { ... })
and
bespin.subscribe("something:happened", function () { ... })
The change I made is that you can now do the same thing in workers and it will Do The Right Thing (tm). From within a worker you can subscribe to events that are triggered by the UI or anywhere else, and you can publish events that are observed by handlers living in the "main page". Overall this makes working with workers much more seamless and first class from a bespin perspective: as long as one is not doing any direct UI work (as opposed to sending events to UI components), one can do everything from the worker without building custom interfaces.
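To make the idea concrete, here is a minimal sketch of how two event buses can be bridged over a postMessage-style channel. It is not bespin's implementation: createBus, deliver, receive and the topic names are invented for this example, and the "worker" is simulated with a synchronous fake channel so the snippet runs anywhere.

```javascript
// Minimal stand-in for an event bus that can forward events to a remote side.
function createBus() {
  var handlers = {};
  var bus = {
    remote: null, // anything with postMessage(string), e.g. a Worker
    subscribe: function (topic, fn) {
      (handlers[topic] = handlers[topic] || []).push(fn);
    },
    // Publishing runs local handlers and forwards the event to the other side.
    publish: function (topic, data) {
      bus.deliver(topic, data);
      if (bus.remote) {
        bus.remote.postMessage(JSON.stringify({ topic: topic, data: data }));
      }
    },
    // Runs local handlers only. Used for events that arrived from the other
    // side, so they are not echoed back.
    deliver: function (topic, data) {
      var list = handlers[topic] || [];
      for (var i = 0; i < list.length; ++i) {
        list[i](data);
      }
    },
    // Hook this up as the onmessage handler of the channel.
    receive: function (event) {
      var msg = JSON.parse(event.data);
      bus.deliver(msg.topic, msg.data);
    }
  };
  return bus;
}

// Simulate page and worker with two buses wired by a synchronous fake channel.
var pageBus = createBus();
var workerBus = createBus();
pageBus.remote = { postMessage: function (s) { workerBus.receive({ data: s }); } };
workerBus.remote = { postMessage: function (s) { pageBus.receive({ data: s }); } };

var seenInWorker = [];
workerBus.subscribe("editor:changed", function (data) {
  seenInWorker.push(data.line);
});
pageBus.publish("editor:changed", { line: 42 });
```

The important design point is the split between publish and deliver: events coming in over the channel only run local handlers, otherwise the two sides would bounce every event back and forth forever.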

For me working on bespin is really an experiment in how to design event-driven client-side web applications (or postmodern web applications). One of the things that felt kinda awkward up until now was how to know the right point in time to initialize a particular component. You might want to wait until the settings have been loaded (asynchronously via Ajax) and another component has been initialized. All of this is, of course, signaled by custom events, but the order might be totally random and the events might have already happened when we start looking for them. The solution I came up with (whether it is a good one remains to be seen) is a function that checks a list of events against all events that have already fired, then waits for the rest of the events to fire, and finally calls a callback when this is done:
// ** {{{ assertEvents }}} **
//
// Given an array of topics, fires given callback as soon as all of the topics have
// fired at least once
assertEvents: function (topics, callback) {
    var count = topics.length;
    var done = function () {
        if (count == 0) {
            callback();
        }
    };
    for (var i = 0; i < topics.length; ++i) {
        var topic = topics[i];
        if (this.eventLog[topic]) {
            --count;
        } else {
            (function () { // one "fired" flag per topic
                var fired = false;
                bespin.subscribe(topic, function () {
                    if (fired) return; // count each topic only once
                    fired = true;
                    --count;
                    done();
                });
            })();
        }
    }
    done(); // check once after the loop, in case every topic had already fired
},
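Here is a small usage sketch of the idea. The bus object below is a hypothetical stand-in (bespin's real publish/subscribe and eventLog live elsewhere), and its assertEvents is a compact variant written for this example that counts each topic only once even if it fires repeatedly.

```javascript
// Tiny stand-in bus that records which topics have fired at least once.
var bus = {
  eventLog: {},
  handlers: {},
  subscribe: function (topic, fn) {
    (this.handlers[topic] = this.handlers[topic] || []).push(fn);
  },
  publish: function (topic) {
    this.eventLog[topic] = true;
    var list = (this.handlers[topic] || []).slice();
    for (var i = 0; i < list.length; ++i) {
      list[i]();
    }
  },
  assertEvents: function (topics, callback) {
    var self = this;
    var pending = {}; // topics we are still waiting for
    var count = 0;
    function arrived(topic) {
      if (pending[topic]) {
        delete pending[topic]; // repeated firings count only once
        if (--count == 0) callback();
      }
    }
    for (var i = 0; i < topics.length; ++i) {
      if (!self.eventLog[topics[i]] && !pending[topics[i]]) {
        pending[topics[i]] = true;
        ++count;
      }
    }
    if (count == 0) { // everything has fired already
      callback();
      return;
    }
    for (var topic in pending) {
      (function (t) {
        self.subscribe(t, function () { arrived(t); });
      })(topic);
    }
  }
};

var ready = 0;
bus.publish("settings:loaded"); // fires before anyone waits for it
bus.assertEvents(["settings:loaded", "editor:initialized"], function () {
  ready++;
});
bus.publish("settings:loaded"); // a repeat must not complete the wait
var readyAfterRepeat = ready;   // still waiting for editor:initialized
bus.publish("editor:initialized"); // now the callback runs
bus.publish("editor:initialized"); // later firings are ignored
```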

Saturday, March 7, 2009

Offloading "arbitrary" JS Objects to Workers / Async drawing for bespin

While doing the experiment to marry Joose and bespin I got sucked directly into bespin development. I started digging into ways to offload part of bespin's painting work into web workers. Web workers are independent JavaScript processes which communicate with the primary page using message events. Because they run in different threads they don't block the UI of the web page and can thus safely do complex calculations without hampering its responsiveness.

Offloading arbitrary objects into workers
I started with a prototype to offload the work bespin is doing to do syntax highlighting into workers. bespin currently cheats by only looking at the lines which are actually visible (many editors do this). This strategy is fast but might lead to errors when there is not enough information within the currently visible lines.
I'm a lazy person and I had some free time, so I decided to build a more general solution which could be used to put all kinds of things into workers. Thus I built a framework that creates a facade for any given JS object. The facade acts like the original object but delegates all work to another object, which has all the functions and state of the original object but lives inside a worker. Clear? :)


Turning the whole syntax highlighting framework of bespin into a system that runs in the background is now as easy as this:
this.syntaxModel = new bespin.worker.WorkerFacade(new bespin.syntax.Model());

Of course, there is a little more to it:
There are some important limitations for the objects which can be passed to workers:
  • The objects may not reference DOM nodes (because workers have no access to them)
  • The objects may not have circular references (this restriction could be lifted)
  • And most importantly: None of the functions may be closures. This might seem harsher than it is. It basically means that you need to maintain all state inside the object.
Also, all method calls on the facade will be async. I chose to do a jQuery-style fluid-interface for registering callbacks.
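The closure restriction is easiest to see in a small experiment. The sketch below assumes that functions cross the worker boundary as source text (the reviveFunction helper is hypothetical and only stands in for whatever serialization is used). A method that keeps its state on the object survives the round trip, a method that relies on a closure variable does not.

```javascript
// State lives on the object itself: safe to move.
var good = {
  count: 0,
  increment: function () {
    this.count = this.count + 1;
    return this.count;
  }
};

// State lives in a closure: lost when only the source text is transferred.
function makeBad() {
  var hidden = 0;
  return {
    increment: function () {
      hidden = hidden + 1;
      return hidden;
    }
  };
}

// Rebuild a function from its source text, the way a worker would have to.
// Hypothetical helper for this example only.
function reviveFunction(fn) {
  return new Function("return (" + fn.toString() + ")")();
}

var revivedGood = { count: 0, increment: reviveFunction(good.increment) };
var goodResult = revivedGood.increment(); // works, state is on the object

var revivedBad = { increment: reviveFunction(makeBad().increment) };
var badError = null;
try {
  revivedBad.increment(); // "hidden" no longer exists on the other side
} catch (e) {
  badError = e;
}
```

The revived closure fails with a ReferenceError because the variable it captured was never part of the function's source.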

Aside: The Web Worker API
Creating a web worker is as easy as calling new Worker("myWorker.js"). The WorkerPool implementation in Google Gears also includes an API to create workers from a string of JavaScript source. Unfortunately this part did not make it into the official spec. My system needs this feature, so I tried a couple of workarounds. First I tried to put the source into a data URI and use that. I found out that this was even suggested when discussing the spec. Unfortunately at least Safari 4 does not seem to support data URIs for workers. (Should this be reported as a bug to the WebKit team?) The next solution I came up with was to use a small bootstrapping worker that loads the source from the hash part of its URI. This works well but might have security implications.
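The hash trick boils down to an encode/decode pair. The following is only a sketch under assumptions: the helper names and the "bootstrap.js" file name are invented here, and the real bootstrapping worker may look different.

```javascript
// Pack JavaScript source into the hash part of the bootstrap worker's URI.
// Hypothetical helper names; "bootstrap.js" is an assumed file name.
function encodeSourceInUri(bootstrapUrl, source) {
  // encodeURIComponent also escapes "#", so the first "#" is always the separator
  return bootstrapUrl + "#" + encodeURIComponent(source);
}

// Counterpart used inside the bootstrap worker (or by a Gears shim).
function decodeSourceFromUri(uri) {
  var hashAt = uri.indexOf("#");
  if (hashAt == -1) {
    return "";
  }
  return decodeURIComponent(uri.substring(hashAt + 1));
}

var workerSource = "onmessage = function (e) { postMessage(e.data + ' #1'); };";
var workerUri = encodeSourceInUri("bootstrap.js", workerSource);
var roundTripped = decodeSourceFromUri(workerUri);
```

Inside the bootstrap worker one would decode location.href this way and evaluate the result. That evaluation step is exactly where the security implications come from: whoever controls the URI controls the code that runs.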

Building the Facade
The actual facade object is basically a copy of the original object with all methods exchanged with methods to call into the background worker. The source is quite scary. If you, like me, like source code, please enjoy, otherwise no need to read it:
createFacade: function (obj) {

    var facade = this;

    for (var prop in obj) {
        if (prop.charAt(0) != "_") { // supposedly we dont need "private" methods. Delete if assumption is wrong
            (function () { // make a lexical scope
                var val = obj[prop];
                var method = prop;
                if (typeof val == "function") { // functions are replaced with code to call the worker
                    facade[prop] = function () {
                        var self = this;
                        var index = CALL_INDEX++; // each call gets a globally unique index
                        var paras = Array.prototype.slice.call(arguments);
                        if (this.__hasWorkers__) {
                            var data = {
                                callIndex: index,
                                method: method,
                                paras: paras
                            };
                            // here we should really test whether our postMessage supports structured data. Safari 4 does not
                            if (!USE_GEARS) data = dojo.toJson(data);
                            // send the method to a worker
                            this.__getWorker__().postMessage(data);
                        } else {
                            // No worker implementation available. Use an async call using
                            // setTimeout instead
                            window.setTimeout(function () {
                                var retVal = self.__obj__[method].apply(self.__obj__, paras);
                                var callback = self.__callbacks__[index];
                                delete self.__callbacks__[index];
                                if (callback) {
                                    callback(retVal);
                                }
                            }, 0);
                        }
                        // Return an object to create a "fluid-interface" style callback generator
                        // callback will be applied against context
                        // callback will be part of the mutex
                        // paras is an array of extra paras for the callback
                        return {
                            and: function (context, mutex, paras, callback) {
                                var func = callback;
                                if (mutex instanceof bespin.worker.Mutex) {
                                    mutex.start();
                                    func = function () {
                                        callback.apply(this, arguments);
                                        mutex.stop();
                                    };
                                }

                                self.__callbacks__[index] = function () {
                                    paras = Array.prototype.slice.call(arguments).concat(paras);
                                    func.apply(context, paras);
                                };
                            }
                        };
                    };
                }
                else {
                    // put instance vars here, too?
                }
            })();
        }
    }
},
Inside the source you see this:
this.__getWorker__()
By encapsulating access to the actual worker behind the facade it is possible to put multiple workers behind one object. I have currently implemented simple round-robin scheduling, but this could of course be extended to use a more sophisticated mechanism (e.g. picking any currently idle worker). Beware that multiple workers only work reliably for "stateless" objects.
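Round-robin selection itself is only a few lines. This is a generic sketch, not the code behind __getWorker__; createRoundRobin is a name made up for this example and plain strings stand in for Worker objects.

```javascript
// Returns a function that hands out the given workers in rotation.
function createRoundRobin(workers) {
  var next = 0;
  return function getWorker() {
    var worker = workers[next];
    next = (next + 1) % workers.length; // wrap around after the last worker
    return worker;
  };
}

var getWorker = createRoundRobin(["w0", "w1", "w2"]); // strings stand in for Workers
var picked = [getWorker(), getWorker(), getWorker(), getWorker()];
// picked is ["w0", "w1", "w2", "w0"]
```

Because consecutive calls land on different workers, each with its own copy of the object, any state changed by one call is invisible to the next. That is why this only works reliably for "stateless" objects.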

Gears WorkerPool VS. Web Workers
The Gears WorkerPool API predates the Web Workers API. It is very likely that the Gears API will eventually change to more closely follow the W3C API. For now I implemented a simple facade for the Gears API to make it look like the Web Worker API.
This works quite well with some limitations:
  • DOM Message Events are currently not supported (Support could be added)
  • The current postMessage interface which is used to communicate with workers does not support structured data. My implementation uses Gears' native support for structured data and uses JSON for standard workers
Again, here is the source for the facade if you care:
// If there is no Worker API (http://www.whatwg.org/specs/web-workers/current-work/) yet,
// try to build one using Google Gears API
if (typeof Worker == "undefined") {
    BespinGearsInitializeGears(); // this function initializes Gears only if we need it
    if (window.google && google.gears) {
        USE_GEARS = true; // OK, gears is here

        var wp = google.gears.factory.create('beta.workerpool');
        var workers = {};
        Worker = function (uri) { // The worker class
            this.isGears = true;

            // We can pass the source directly. So we decode the source ourselves.
            // To make this more general purpose we could of course also load the
            // actual JS file.
            var source = uriDecodeSource(uri);

            this.id = wp.createWorker(source);
            workers[this.id] = this;
        };

        Worker.prototype = { // we can post messages to the worker
            postMessage: function (data) {
                wp.sendMessage(data, this.id);
            }
        };

        // upon receiving a message we call our onmessage callback
        // DOM-Message-Events are not supported
        wp.onmessage = function (a, b, message) {
            var worker = workers[message.sender];
            var cb = worker.onmessage;
            if (cb) {
                cb.call(worker, {
                    data: message.body
                });
            }
        };
    }
}
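Since messages arrive as structured data under Gears and as JSON strings with standard workers, the receiving side has to cope with both. The sketch below shows one way a worker-side dispatcher could do that. It is an assumption-laden illustration: createDispatcher is a made-up name, the reply shape (callIndex plus returnValue) is my guess at a sensible counterpart to the call message, and native JSON stands in for dojo.toJson.

```javascript
// Hypothetical worker-side dispatcher.
// `target` is the real object living in the worker, `reply` stands in for the
// worker's postMessage, `useJson` says whether replies must be strings.
function createDispatcher(target, reply, useJson) {
  return function onmessage(event) {
    // Standard workers send a JSON string, Gears sends a structured object
    var call = typeof event.data == "string" ? JSON.parse(event.data) : event.data;
    var returnValue = target[call.method].apply(target, call.paras);
    var answer = { callIndex: call.callIndex, returnValue: returnValue };
    reply(useJson ? JSON.stringify(answer) : answer);
  };
}

var replies = [];
var model = {
  prefix: "> ",
  quote: function (line) { return this.prefix + line; }
};

// JSON path, as with a standard worker
var handleJson = createDispatcher(model, function (m) { replies.push(m); }, true);
handleJson({ data: JSON.stringify({ callIndex: 7, method: "quote", paras: ["hello"] }) });

// Structured path, as with Gears
var handleStructured = createDispatcher(model, function (m) { replies.push(m); }, false);
handleStructured({ data: { callIndex: 8, method: "quote", paras: ["world"] } });
```

The callIndex is echoed back so the facade can find the callback that belongs to the call.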

Mutexes / Semaphores / etc.
Once you do real multi-process programming with JavaScript you need to start to worry about some of the issues that come with programming parallel systems (not all of them, because workers are more like processes so there are no issues related to pure multi-threading).
Bespin has a quite complicated paint() method which does the actual drawing of the editing area using a canvas tag. On every paint, this method calls the syntax highlighter for every line that it draws. Because all those calls happen asynchronously, potentially in parallel and in undefined order, I needed a mechanism which basically says: execute this code right after all those async calls have finished.
My current solution is to have an object which maintains a counter of method calls belonging to the same group. Starting an async method increments the counter, calling the method's callback decrements it. You can schedule yet more callbacks which execute once the counter reaches zero again (when all async method calls are finished). I called the class Mutex, which is probably wrong but at least on topic.
Even more source for the interested:
//** {{{ bespin.worker.Mutex }}} **
//
// Object that maintains a counter of running workers/async processes.
// Calling after(callback) schedules a function to be called once all
// async processes are finished
//
// Is Mutex the correct term?
dojo.declare("bespin.worker.Mutex", null, {
    constructor: function (name, options) {
        this.name = name;
        this.count = 0;
        this.afterJobs = [];
        this.options = options || {};
    },
    start: function () {
        this.count = this.count + 1;
    },
    stop: function () {
        this.count = this.count - 1;
        if (this.count == 0) {
            if (this.options.onlyLast) {
                var last = this.afterJobs[this.afterJobs.length - 1];
                if (last) {
                    last();
                }
            } else {
                for (var i = 0; i < this.afterJobs.length; ++i) {
                    var job = this.afterJobs[i];
                    job();
                }
            }
            this.afterJobs = [];
        }
    },
    after: function (context, func) {
        this.afterJobs.push(function () {
            func.call(context);
        });
    }
});
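To see the counter in action without dojo, here is a plain-JavaScript stand-in with the same start/stop/after behavior (minus the onlyLast option). CallGroup is a name invented for this sketch, and the "async calls" are simulated by functions that are invoked by hand in scrambled order.

```javascript
// Plain-JavaScript stand-in for bespin.worker.Mutex, for illustration only.
function CallGroup() {
  this.count = 0;
  this.afterJobs = [];
}
CallGroup.prototype.start = function () {
  this.count += 1;
};
CallGroup.prototype.stop = function () {
  this.count -= 1;
  if (this.count == 0) {
    var jobs = this.afterJobs;
    this.afterJobs = [];
    for (var i = 0; i < jobs.length; ++i) {
      jobs[i]();
    }
  }
};
CallGroup.prototype.after = function (context, func) {
  this.afterJobs.push(function () {
    func.call(context);
  });
};

var group = new CallGroup();
var log = [];

// Simulate three async highlighter calls that finish in arbitrary order.
var finishers = [];
for (var line = 0; line < 3; ++line) {
  group.start();
  (function (n) {
    finishers.push(function () {
      log.push("line " + n);
      group.stop();
    });
  })(line);
}

group.after(null, function () {
  log.push("painted");
});

finishers[2]();
finishers[0]();
finishers[1]();
// log is ["line 2", "line 0", "line 1", "painted"]
```

One thing to watch out for with this design: a job registered through after() when the counter is already at zero does not run right away. It waits until the next group of calls has started and finished.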

Conclusion
While my current patch will not make it into bespin (because the current syntax highlighter is so fast that the overhead of the Worker is larger than the win) I still think that the approach above has a lot of potential (e.g. analyzing more lines than the currently visible ones). "Transparently" moving objects into workers while for the most part maintaining the original interface at least gives JavaScript a nice Erlang touch (as @psvensson put it).

Update:
I should clarify that it is no problem for functions which generate closures to pass the worker boundary. None of the functions which are already present inside the object which is sent to a worker may include a closure, though.