Saturday, March 14, 2009

On-the-fly Syntax Checker and Code Outline View for Bespin

I continued to work on bespin and submitted my first complex patch for review. The patch builds on my earlier work on moving JS objects into web workers and implements syntax checking as well as a simple code outline view for JavaScript. All these features are automatically disabled when web workers are not available, because syntax checking is too expensive to run inside the UI thread.

The outline view is activated by typing the outline command.

The function names are clickable and move the cursor to the line where the function is defined.

Syntax errors are displayed as a short message at the bottom of the screen.

It might be possible to underline the offending code, but I would need to dive deeper into bespin to do this (and get more information out of the JS parser).

While working on the code I made some discoveries about Gears and Web Workers:
  • Apparently you are not allowed to define a constant called Block inside a Gears worker
  • Safari 4 currently does not support any way to load source code into a worker (neither importScripts nor XMLHttpRequest is implemented). For now I pasted all the source into a bootstrap script. Another possible solution would be to send a message to the main page asking it to make the HTTP request and send back the result.
  • I removed the ugly hack that sent the source to the worker inside the hash part of the URL; instead, the source is now sent to the worker via postMessage immediately after loading.
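The postMessage-based bootstrap described in the last bullet can be pictured as a worker whose first message carries the object's source and whose later messages are method calls. This is a minimal sketch under stated assumptions: the helper name makeBootstrap, the requirement that the source is a single object literal, and the reply format { callIndex, retVal } are all hypothetical and not taken from the actual patch.

```javascript
// Hypothetical bootstrap-worker logic (names and reply format are assumptions).
// The first message carries the object's source as an object literal;
// every later message is a method call { callIndex, method, paras }.
function makeBootstrap(reply) {
  var target = null;
  return function onmessage(event) {
    var data = event.data;
    if (target === null) {
      // Indirect eval evaluates the source in global scope.
      target = (0, eval)("(" + data + ")");
      return;
    }
    // Calls arrive as JSON strings where postMessage lacks structured data.
    var call = typeof data === "string" ? JSON.parse(data) : data;
    var retVal = target[call.method].apply(target, call.paras);
    reply({ callIndex: call.callIndex, retVal: retVal });
  };
}

// Inside the worker script this would be wired up as:
//   onmessage = makeBootstrap(function (msg) { postMessage(msg); });
```

Keeping the dispatch logic in a plain function (rather than assigning onmessage directly) makes it possible to exercise it outside a worker.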
I tested the system in Safari 4, FF 3.1 beta and FF 3.0.x with Google Gears. The current patch only supports syntax checking for JavaScript. Other languages could be added in a way similar to the language-dependent syntax highlighters, though.

Tuesday, March 10, 2009

bespin learns JavaScript

Today I hooked up bespin to Brendan Eich's JavaScript parser written in JavaScript.
Some evil recursive tree wrestling later, we can now find the names and the line and column numbers of all functions in a JS document. I also added some logic to look up the tree from anonymous functions to see whether they were declared in an object literal, so we can use the property key as the inferred name.
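The function-collecting traversal described above can be sketched roughly as follows. The node shape here is invented for illustration and is not the real node format of Brendan Eich's parser; the names collectFunctions, FUNCTION and PROPERTY are assumptions. Instead of walking up from each anonymous function, this sketch passes the parent down during the recursion, which yields the same information for a one-level lookup.

```javascript
// Hypothetical AST shape (NOT the real node format of Eich's parser):
//   { type: "FUNCTION", name?, line, column, children: [...] }
//   { type: "PROPERTY", key, children: [valueNode] }   // object-literal entry
// Any other node just has { type, children }.
function collectFunctions(node, parent, out) {
  out = out || [];
  if (node.type === "FUNCTION") {
    var name = node.name;
    // Anonymous function declared as an object-literal value:
    // infer its name from the property key.
    if (!name && parent && parent.type === "PROPERTY") {
      name = parent.key;
    }
    out.push({ name: name || "(anonymous)", line: node.line, column: node.column });
  }
  var children = node.children || [];
  for (var i = 0; i < children.length; i++) {
    collectFunctions(children[i], node, out);
  }
  return out;
}
```

The result is in document order (depth-first, pre-order), which is the order an outline view would want to display.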

The parse tree could be used for all kinds of interesting things, like outlines and even IntelliSense(TM)(R). Because JS is a very dynamic language, however, this only goes so far. In particular, systems like Joose or even dojo's simplistic object system do not lend themselves well to static analysis.

As a next step, I'll move the parser to a web worker (I designed the API to be async, so this should be easy) and implement a simple outline view.

Monday, March 9, 2009

bespin extensibility / auto-indenting

bespin can be extended with JavaScript. Naturally, this JavaScript is edited with bespin itself (I love StrangeLoops(TM)).
The easiest way to do this is to edit the config.js file in the standard BespinSettings project. Here is a sample that enables auto-indenting after line breaks: when you hit enter, the next line gets the same leading white space as the current line, and when you hit enter after an opening {, it adds two extra spaces. (Because this is my personal config.js, I don't have to worry about a configurable tab width :)
bespin.publish("bespin:editor:bindkey", {
  key: "ENTER",
  action: function (args) {
    console.log("Pressed enter");
    var editor = bespin.get("editor");
    // insert the line break itself
    editor.ui.actions.newline(args);
    // leading white space of the current line
    var line = editor.model.getRowArray(args.modelPos.row).join("");
    var match = line.match(/^(\s*)/);
    var leadingWs = match[1];
    var chunk = leadingWs;
    // does the line end with an opening brace?
    var newBlock = /\{\s*$/.test(line);

    if (newBlock) chunk += "  "; // two extra spaces

    args.chunk = chunk;
    editor.ui.actions.insertChunk(args);
  }
});
Note1: More Info on the bespin custom events
Note2: For this to work you may need to enable the setting 'autoconfig': 'on' first
Note3: There currently seems to be no easy way for a key event handler to say "ignore me in this case", which would make it easy to do the right thing when there is an active selection.
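The indentation rule in the handler above does not depend on the editor at all, so it can be pulled out into a pure function and tested on its own. This is a sketch: computeIndent is a hypothetical helper, not a bespin API, and the indentUnit parameter is an assumption that would also cover the configurable tab width mentioned above.

```javascript
// Hypothetical helper (not a bespin API): the indentation rule from the
// ENTER handler above as a pure function, so it can be tested without
// an editor. Returns the white space for the start of the new line.
function computeIndent(line, indentUnit) {
  var leadingWs = line.match(/^\s*/)[0];   // keep the current indentation
  var opensBlock = /\{\s*$/.test(line);    // line ends with an opening brace
  return opensBlock ? leadingWs + indentUnit : leadingWs;
}

console.log(JSON.stringify(computeIndent("    foo();", "  ")));   // "    "
console.log(JSON.stringify(computeIndent("  if (x) {", "  ")));    // "    "
```

Inside the handler, the chunk would then simply be computeIndent(line, "  ").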

Saturday, March 7, 2009

Offloading "arbitrary" JS Objects to Workers / Async drawing for bespin

While experimenting with marrying Joose and bespin, I got sucked directly into bespin development. I started looking for ways to offload part of bespin's painting work into web workers. Web workers are independent JavaScript processes that communicate with the main page through message events. Because they run in separate threads, they don't block the page's UI and can safely perform complex calculations without hampering its responsiveness.

Offloading arbitrary objects into workers
I started with a prototype that offloads bespin's syntax-highlighting work into workers. bespin currently cheats by looking only at the lines that are actually visible (many editors do this). This strategy is fast but can produce errors when the visible lines do not contain enough information (for example, when a multi-line comment starts above the visible area).
I'm a lazy person and had some free time, so I decided to build a more general solution that could be used to put all kinds of things into workers. I built a framework that creates a facade for any given JS object: the facade acts like the original object but delegates all work to another object, which has all the functions and state of the original but lives inside a worker. Clear? :)


Turning the whole syntax highlighting framework of bespin into a system that runs in the background is now as easy as this:
this.syntaxModel = new bespin.worker.WorkerFacade(new bespin.syntax.Model());

Of course, there is a little more to it.
There are some important limitations on the objects that can be passed to workers:
  • The objects may not reference DOM nodes (because workers have no access to them)
  • The objects may not have circular references (this restriction could be lifted)
  • And most importantly: none of the functions may be closures. This is less harsh than it sounds; it basically means that you need to keep all state inside the object.
Also, all method calls on the facade are asynchronous. I chose a jQuery-style fluent interface for registering callbacks.
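The closure restriction in the list above follows from how an object has to cross into a worker: it must be turned into source text and rebuilt on the other side. The sketch below is an illustration under assumptions (objectToSource is a hypothetical helper, not bespin's actual serializer). It relies on Function.prototype.toString, which returns only a function's text, so state kept on the object survives while variables captured by a closure are lost.

```javascript
// Minimal sketch, NOT bespin's actual serializer: rebuild an object from
// source text, the way an object has to cross into a worker.
// Functions are serialized with Function.prototype.toString, which keeps
// only the function's text and drops any variables it closes over.
// Assumes methods are written as function expressions (not shorthand) and
// the object has no circular references (JSON.stringify would throw).
function objectToSource(obj) {
  var parts = [];
  for (var prop in obj) {
    var val = obj[prop];
    var src = typeof val === "function" ? val.toString() : JSON.stringify(val);
    parts.push(JSON.stringify(prop) + ": " + src);
  }
  return "({" + parts.join(",\n") + "})";
}

// State kept inside the object survives the round trip...
var model = { count: 1, inc: function () { return ++this.count; } };
var copy = (0, eval)(objectToSource(model)); // indirect eval: global scope
console.log(copy.inc()); // 2

// ...but state captured by a closure does not.
var hidden = (function () {
  var n = 41;
  return { get: function () { return n + 1; } };
})();
var broken = (0, eval)(objectToSource(hidden));
try {
  broken.get();
} catch (e) {
  console.log(e instanceof ReferenceError); // true: `n` is undefined in the copy
}
```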

Aside: The Web Worker API
Creating a web worker is as easy as calling new Worker("myWorker.js"). The WorkerPool implementation in Google Gears also includes an API to create workers from a string of JavaScript source. Unfortunately, this part did not make it into the official spec. My system needs this feature, so I tried a couple of workarounds. First I tried putting the source into a data URI; this was even suggested during the discussion of the spec. Unfortunately, at least Safari 4 does not seem to support data URIs for workers. (Should this be reported as a bug to the WebKit team?) The next solution I came up with was a small bootstrapping worker that loads the source from the hash part of its URI. This works well but might have security implications.
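The hash-part trick can be sketched as a pair of encode/decode helpers. The Gears facade further down calls a helper named uriDecodeSource whose definition the post does not show; the version below is only one plausible definition (an assumption), and bootstrapUri is a hypothetical name. Percent-encoding the source guarantees that the first "#" in the URI is the separator, since encodeURIComponent also escapes any "#" inside the source.

```javascript
// Hypothetical helpers: the bootstrap worker's URL carries the source
// in its hash, e.g. "bootstrap.js#<percent-encoded source>".
function bootstrapUri(baseUri, source) {
  return baseUri + "#" + encodeURIComponent(source);
}

// Inverse operation (one plausible definition, an assumption). Inside the
// worker, self.location.href would be passed in as `uri`.
function uriDecodeSource(uri) {
  var hashPos = uri.indexOf("#");
  return hashPos === -1 ? "" : decodeURIComponent(uri.slice(hashPos + 1));
}

// Main page: new Worker(bootstrapUri("bootstrap.js", source));
```

Very large sources may run into browser limits on URL length, which is one more reason to prefer sending the source via postMessage.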

Building the Facade
The actual facade object is basically a copy of the original object with every method replaced by one that calls into the background worker. The source is quite scary. If you, like me, like source code, please enjoy; otherwise there is no need to read it:
createFacade: function (obj) {

  var facade = this;

  for (var prop in obj) {
    if (prop.charAt(0) != "_") { // supposedly we don't need "private" methods. Delete if assumption is wrong
      (function () { // make a lexical scope
        var val = obj[prop];
        var method = prop;
        if (typeof val == "function") { // functions are replaced with code to call the worker
          facade[prop] = function () {
            var self = this;
            var index = CALL_INDEX++; // each call gets a globally unique index
            var paras = Array.prototype.slice.call(arguments);
            if (this.__hasWorkers__) {
              var data = {
                callIndex: index,
                method: method,
                paras: paras
              };
              // here we should really test whether our postMessage supports structured data. Safari 4 does not
              if (!USE_GEARS) data = dojo.toJson(data);
              // send the method to a worker
              this.__getWorker__().postMessage(data);
            } else {
              // No worker implementation available. Use an async call using
              // setTimeout instead
              window.setTimeout(function () {
                var retVal = self.__obj__[method].apply(self.__obj__, paras);
                var callback = self.__callbacks__[index];
                delete self.__callbacks__[index];
                if (callback) {
                  callback(retVal);
                }
              }, 0);
            }
            // Return an object to create a "fluent-interface" style callback generator
            // callback will be applied against context
            // callback will be part of the mutex
            // extraParas is an array of extra paras for the callback
            return {
              and: function (context, mutex, extraParas, callback) {
                var func = callback;
                if (mutex instanceof bespin.worker.Mutex) {
                  mutex.start();
                  func = function () {
                    callback.apply(this, arguments);
                    mutex.stop();
                  };
                }

                self.__callbacks__[index] = function () {
                  // result arguments first, then the extra paras (if any)
                  var args = Array.prototype.slice.call(arguments).concat(extraParas || []);
                  func.apply(context, args);
                };
              }
            };
          };
        }
        else {
          // put instance vars here, too?
        }
      })();
    }
  }
},
Inside the source you see this:
this.__getWorker__()
Encapsulating access to the actual worker behind the facade makes it possible to put multiple workers behind one object. I currently implemented simple round-robin scheduling, but this could of course be extended to a more sophisticated mechanism (e.g. picking any currently idle worker). Beware that multiple workers only work reliably for "stateless" objects.
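The post does not show __getWorker__ itself. A minimal sketch of round-robin selection might look like this, assuming the facade keeps its workers in an array; makeRoundRobin is a hypothetical name. Because consecutive calls land on different workers, each worker holds its own copy of the object's state, which is exactly why only "stateless" objects behave correctly here.

```javascript
// Hypothetical sketch of round-robin scheduling behind __getWorker__.
// `workers` is any non-empty array of objects with a postMessage method.
function makeRoundRobin(workers) {
  var next = 0;
  return function getWorker() {
    var worker = workers[next];
    next = (next + 1) % workers.length; // wrap around after the last worker
    return worker;
  };
}

// Possible wiring (assumption): facade.__getWorker__ = makeRoundRobin(workerList);
```

Picking an idle worker instead would require tracking outstanding callIndex values per worker, which the same closure could hold.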

Gears WorkerPool VS. Web Workers
The Gears WorkerPool API predates the Web Workers API. It is very likely that the Gears API will eventually change to more closely follow the W3C API. For now I implemented a simple facade for the Gears API to make it look like the Web Worker API.
This works quite well with some limitations:
  • DOM Message Events are currently not supported (Support could be added)
  • The current postMessage interface used to communicate with standard workers (at least in Safari 4) does not support structured data. My implementation uses Gears' native support for structured data and JSON for standard workers.
Again, here is the source for the facade if you care:
// If there is no Worker API (http://www.whatwg.org/specs/web-workers/current-work/) yet,
// try to build one using the Google Gears API
if (typeof Worker == "undefined") {
  BespinGearsInitializeGears(); // this function initializes Gears only if we need it
  if (window.google && google.gears) {
    USE_GEARS = true; // OK, Gears is here

    var wp = google.gears.factory.create('beta.workerpool');
    var workers = {};
    Worker = function (uri) { // the worker class
      this.isGears = true;

      // Gears can create a worker directly from source, so we decode the
      // source from the URI ourselves. To make this more general purpose
      // we could of course also load the actual JS file.
      var source = uriDecodeSource(uri);

      this.id = wp.createWorker(source);
      workers[this.id] = this;
    };

    Worker.prototype = { // we can post messages to the worker
      postMessage: function (data) {
        wp.sendMessage(data, this.id);
      }
    };

    // upon receiving a message we call our onmessage callback
    // DOM Message Events are not supported
    wp.onmessage = function (messageText, senderId, message) {
      var worker = workers[message.sender];
      var cb = worker.onmessage;
      if (cb) {
        cb.call(worker, {
          data: message.body
        });
      }
    };
  }
}

Mutexes / Semaphores / etc.
Once you do real multi-process programming with JavaScript, you need to start worrying about some of the issues that come with programming parallel systems (not all of them: workers behave more like processes than threads and share no state, so there are no issues related to shared-memory multi-threading).
Bespin has a quite complicated paint() method that does the actual drawing of the editing area using a canvas tag. This paint method calls the syntax highlighter for every line it draws on every paint. Because all those calls happen asynchronously, potentially in parallel and in undefined order, I needed a mechanism that basically says: execute this code right after all those async calls have finished.
My current solution is an object that maintains a counter of method calls belonging to the same group. Starting an async method increments the counter; calling the method's callback decrements it. You can schedule yet more callbacks that execute once the counter reaches zero again (when all async method calls are finished). I called the class Mutex, which is probably the wrong term but at least on topic.
Even more source for the interested:
//** {{{ bespin.worker.Mutex }}} **
//
// Object that maintains a counter of running workers/async processes.
// Calling after(context, func) schedules a function to be called once all
// async processes are finished
//
// Is Mutex the correct term?
dojo.declare("bespin.worker.Mutex", null, {
  constructor: function (name, options) {
    this.name = name;
    this.count = 0;
    this.afterJobs = [];
    this.options = options || {};
  },
  start: function () {
    this.count = this.count + 1;
  },
  stop: function () {
    this.count = this.count - 1;
    if (this.count == 0) {
      if (this.options.onlyLast) {
        // only run the most recently scheduled job
        var last = this.afterJobs[this.afterJobs.length - 1];
        if (last) {
          last();
        }
      } else {
        for (var i = 0; i < this.afterJobs.length; ++i) {
          var job = this.afterJobs[i];
          job();
        }
      }
      this.afterJobs = [];
    }
  },
  after: function (context, func) {
    this.afterJobs.push(function () {
      func.call(context);
    });
  }
});
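To see the counter at work outside bespin, here is a dojo-free port of the Mutex together with a simulated paint() run. This is an illustration, not bespin code: the plain constructor replaces dojo.declare, the "replies" are ordinary functions standing in for worker callbacks, and the port clears the job list before running the jobs (the original clears it afterwards), so a job may safely schedule work for the next round.

```javascript
// Dojo-free port of bespin.worker.Mutex (an illustration, not bespin code).
// One difference: the job list is cleared before the jobs run, so a job
// may safely schedule new jobs for the next round.
function Mutex(name, options) {
  this.name = name;
  this.count = 0;
  this.afterJobs = [];
  this.options = options || {};
}

Mutex.prototype.start = function () {
  this.count += 1;
};

Mutex.prototype.stop = function () {
  this.count -= 1;
  if (this.count !== 0) return;
  var jobs = this.afterJobs;
  this.afterJobs = [];
  if (this.options.onlyLast) jobs = jobs.slice(-1); // run only the newest job
  for (var i = 0; i < jobs.length; i++) jobs[i]();
};

Mutex.prototype.after = function (context, func) {
  this.afterJobs.push(function () { func.call(context); });
};

// Usage: a simulated paint() fires three async highlight calls and wants
// to flush the canvas only after all of them have answered.
var log = [];
var paintMutex = new Mutex("paint");
var pendingReplies = [];
[0, 1, 2].forEach(function (row) {
  paintMutex.start();
  pendingReplies.push(function () {
    log.push("highlighted line " + row);
    paintMutex.stop();
  });
});
paintMutex.after(null, function () { log.push("flush canvas"); });

// Replies arrive in arbitrary order, as they might from workers.
pendingReplies[2]();
pendingReplies[0]();
pendingReplies[1]();
console.log(log.join(" | "));
```

Note that in both versions a job scheduled with after() while the counter is already zero waits until the next start()/stop() pair completes.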

Conclusion
While my current patch will not make it into bespin (the current syntax highlighter is so fast that the overhead of the worker outweighs the gain), I still think the approach above has a lot of potential (e.g. analyzing more lines than the currently visible ones). "Transparently" moving objects into workers while mostly maintaining the original interface at least gives JavaScript a nice Erlang touch (as @psvensson put it).

Update:
I should clarify that functions which generate closures can pass the worker boundary without problems. None of the functions already present inside the object that is sent to a worker may be closures themselves, though.

Wednesday, March 4, 2009

Closures VS. Properties / arguments.callee is expensive

I did some quick benchmarking to test whether it makes a difference to substitute a closure with a function that has a property (sometimes called inside-out objects). Both are basically the same thing, except that the function property is mutable from the outside, while access to the closure's bound lexical variables is private to the closure.

Here is the source for two simple "getter-makers" that use the different strategies:
var makeGetterProperty = function (name) {
  var func = function () {
    return this[arguments.callee.__prop_name__];
  };
  func.__prop_name__ = name;
  return func;
};

var makeGetterClosure = function (name) {
  return function () {
    return this[name];
  };
};
I timed both the execution of the "make" functions and of the actual getters. Making the getters is almost equally fast in both versions; however, executing the actual getters differs by a factor of about 150, with the closure variant being significantly faster. (These numbers only hold for FF 3.0.x on OS X.) Interestingly, turning the property variant into a closure that references itself to access the name of the property makes both almost equally fast.

Conclusion: If you need to build inside-out objects without closures (e.g. because you need to serialize the function state), use self-referencing functions instead of accessing arguments.callee.

Monday, March 2, 2009

VisualWorks for JavaScript: Using Bespin to edit Joose components

When I first saw Bespin, the new "code-in-the-cloud" editor created by Dion and Ben at Mozilla, and its multi-pane editor, ideas kept popping up in my head that this might be the ideal platform to complete my ultimate secret masterplan: recreate Smalltalk's VisualWorks for JavaScript.

In VisualWorks, one navigated (some people actually still do this in the present tense) the loaded classes and their components, like methods and instance variables, by progressively clicking through multiple panes. As soon as one reached an item that could be edited as source, the edit view switched to that item. Saving after editing an item would directly compile the new source and put it into the running system.

Don't get me started on the debugger, which was simply awesome, and in my opinion no IDE has ever reached its elegance.

Anyway, back to the secret masterplan. Bespin also has a multi-pane view to navigate files which makes it the perfect candidate to act as a platform for VisualWorks for Joose :)

The goals for this platform would be:
  • Allow editing of loaded Joose components
  • Allow creating classes, methods, instance vars, etc. using some kind of UI
  • Saving a change should update the loaded class. One could, thus, edit the loaded application while it is running (This is the most important feature for VisualWorks-feeling)
  • Persisting changes to the server would be nice, too :)
I started work on a prototype. After the first 4 hours of hacking, the system allows navigating loaded Joose classes using the multi-pane view and loading method source code into the edit view.




To make this actually work, there are, however, still many challenges to overcome:
  • The whole persistence part is practically unsolvable with the current state of Joose (however, Nickolay is working on this problem).
  • Bespin switches to a different "physical" URL to edit a document. This isn't really cool if you want to edit stuff that is loaded on your current page.
The architecture would look as follows:
  • Joose objects created using the Bespin editor would have an extra property to store the actual source that was used for their creation.
  • One could also create extra properties to store stuff like method level documentation, etc.
  • When "saving", we actually serialize all loaded Joose classes to a text representation. The text representation will probably not look like a classic Joose class but rather be a sequence of statements like MyClass.meta.addMethod("test", function () { return "test" })
  • Loading loads that representation and then uses reflection to show the actual elements.
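The "save" step in the architecture above can be sketched as a small serializer that emits one addMethod statement per method. This is a hypothetical sketch, not working Joose code: serializeClasses is an invented name, classes are represented as plain objects of methods (reading them from real Joose meta objects is not shown), and the __source__ property stands in for the "extra property to store the actual source" from the first bullet.

```javascript
// Hypothetical sketch of the "save" step: turn loaded classes into a
// sequence of MyClass.meta.addMethod(...) statements.
// Assumptions: `classes` maps class names to plain objects of methods
// (reading them from real Joose meta objects is not shown), and a method
// may carry its original editor source in a made-up __source__ property.
function serializeClasses(classes) {
  var statements = [];
  Object.keys(classes).forEach(function (className) {
    var methods = classes[className];
    Object.keys(methods).forEach(function (methodName) {
      var fn = methods[methodName];
      // prefer the source the editor stored, else the engine's own text
      var source = fn.__source__ || fn.toString();
      statements.push(className + ".meta.addMethod(" +
        JSON.stringify(methodName) + ", " + source + ");");
    });
  });
  return statements.join("\n");
}

var text = serializeClasses({
  MyClass: { test: function () { return "test"; } }
});
console.log(text);
// MyClass.meta.addMethod("test", function () { return "test"; });
```

Loading would then evaluate these statements against the already defined classes, after which reflection can show the individual elements again.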