Tuesday, June 23, 2009

The Internet is broken

For a couple of years now we have been breaking the internet. Every time a popular short URL provider like tinyurl, bit.ly or tr.im has downtime, loses a database or even goes out of business, we lose massive numbers of links across the World Wide Web.

The way to fix this would be to do away with the need for short URL providers. Until then I think we should start archiving all short URLs and their mappings to long URLs so that we can still resolve them when the original provider can no longer do it.

For this purpose I created a little Google App Engine app, permanize.org.
It resolves short URLs and stores the mapping; on subsequent resolves it serves the answer from its own database. You can also download the database to distribute the network (a real P2P architecture would be cool here) and make the archiving safer.
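
The core idea is simple enough to sketch. The following is an illustration only, not the actual permanize.org code (which runs on App Engine); the in-memory map stands in for the real datastore, and the example URL is made up:

  // Minimal sketch of resolve-and-archive: follow the short URL's
  // redirect once, then answer later lookups from our own store.
  var http = require('http');

  var mappings = {}; // shortUrl -> longUrl; stands in for a real datastore

  function resolveShortUrl(shortUrl, callback) {
    if (mappings[shortUrl]) {
      return callback(mappings[shortUrl]); // provider no longer needed
    }
    http.get(shortUrl, function (res) {
      res.resume(); // discard the body; the redirect header is enough
      var longUrl = res.headers.location; // 301/302 target
      if (longUrl) mappings[shortUrl] = longUrl; // archive the mapping
      callback(longUrl);
    });
  }

  resolveShortUrl('http://tinyurl.com/example', function (longUrl) {
    console.log('resolved to ' + longUrl);
  });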
I also created a very experimental Firefox plugin with the ultra-cool Mozilla Jetpack that replaces short URLs with long URLs (only the href values of links for now) and automatically archives all short URLs that you visit with the browser. To install it, download Jetpack and then go to the permanize site.
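
The core of the href rewriting is little more than a walk over the page's links. Roughly, and leaving out the Jetpack wiring (resolveShortUrl is an assumed helper, not a Jetpack API; in the browser the lookup would go through the permanize service rather than Node's http module):

  // Rough sketch of the href rewriting, minus the Jetpack plumbing.
  var SHORT_HOSTS = /^https?:\/\/(tinyurl\.com|bit\.ly|tr\.im)\//;

  function isShortUrl(url) {
    return SHORT_HOSTS.test(url);
  }

  function replaceShortLinks(doc) {
    var links = doc.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
      (function (link) {
        if (!isShortUrl(link.href)) return;
        resolveShortUrl(link.href, function (longUrl) {
          if (longUrl) link.href = longUrl; // swap in the archived target
        });
      })(links[i]);
    }
  }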

Monday, June 15, 2009

Scalable PubSub with Node.js

I made some extensions to my server-side PubSub implementation based on Node.js.

When a client establishes a comet connection to listen for events published by the server, it receives in return the internal URL of the node that it is attached to.
When the client later publishes an event to any server (or actually a different node process running on a particular machine) and the server in turn publishes an event to the client, the node that received the event relays the response via that URL to the correct node, which then uses the open comet connection to send the message to the client.
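
The relay step is easy to sketch. This is an illustration of the idea rather than the actual changeset; the /subscribe and /publish routes, the X-Node-Url header and the NODE_URL variable are all made-up names:

  // Sketch of the relay idea: each node holds its own comet connections
  // and forwards messages meant for clients on other nodes via plain HTTP.
  var http = require('http');

  var NODE_URL = process.env.NODE_URL || 'http://127.0.0.1:8001';
  var cometConnections = {}; // clientId -> response held open on this node

  function deliver(clientId, message) {
    var res = cometConnections[clientId];
    if (!res) return false;
    delete cometConnections[clientId];
    res.end(JSON.stringify(message)); // answer the long poll
    return true;
  }

  http.createServer(function (req, res) {
    var url = new URL(req.url, NODE_URL);
    var clientId = url.searchParams.get('client');

    if (url.pathname === '/subscribe') {
      // Hold the response open and tell the client which node it is
      // attached to, so publishes can be routed back here later.
      res.setHeader('X-Node-Url', NODE_URL);
      res.flushHeaders();
      cometConnections[clientId] = res;
    } else if (url.pathname === '/publish') {
      var nodeUrl = url.searchParams.get('node');
      if (deliver(clientId, { event: url.searchParams.get('event') })) {
        res.end('delivered locally\n');
      } else if (nodeUrl && nodeUrl !== NODE_URL) {
        // Not our client: relay once to the node holding the connection.
        http.get(nodeUrl + '/publish?' + url.searchParams, function (r) {
          r.resume();
          res.end('relayed to ' + nodeUrl + '\n');
        });
      } else {
        res.statusCode = 404; // no open comet connection for this client
        res.end();
      }
    } else {
      res.statusCode = 404;
      res.end();
    }
  }).listen(Number(new URL(NODE_URL).port));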

Overall, this implements PubSub between client and server without the need for stateful (sticky) load balancing.

Here is the changeset for the interested.

Sunday, June 7, 2009

Server-Side PubSub via Node.js

A couple of days ago ry released the very exciting JavaScript web server Node.js, which is built purely on event-based IO using V8. (Btw: did anyone notice that this blog is now hosted under the domain nonblocking.io? :)
From a JS perspective this means that building a scalable comet application is as easy as not sending a response immediately, but rather sending it from a callback function whenever an event happens. As long as there is only a single node, writing an event queue is also easy: just use an object that maps queue ids to listeners.
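
As a minimal sketch of that pattern (the /wait and /fire routes and the queue ids are made up for illustration):

  // Minimal sketch of the "respond later" comet pattern on one node.
  var http = require('http');

  var listeners = {}; // queue id -> callbacks waiting for the next event

  http.createServer(function (req, res) {
    var url = new URL(req.url, 'http://localhost:8000');
    var id = url.searchParams.get('id');

    if (url.pathname === '/wait') {
      // Do not respond now; park a callback until an event is fired.
      (listeners[id] = listeners[id] || []).push(function (data) {
        res.end(JSON.stringify(data));
      });
    } else if (url.pathname === '/fire') {
      (listeners[id] || []).forEach(function (cb) {
        cb({ event: id });
      });
      delete listeners[id];
      res.end('fired\n');
    } else {
      res.statusCode = 404;
      res.end();
    }
  }).listen(8000);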
To give Node.js a try I ported the work I did to allow servers to subscribe to custom events on the client. It turned out to be really straightforward. Here is the result.

Saturday, June 6, 2009

Saving the environment and fixing IE6 issues with Omniture tracking

One of the bugs in Internet Explorer's JavaScript engine that can really ruin your day if it bites you is the fact that string performance degrades drastically, far worse than linearly, with the number of operations.
One of the evil parts of this bug is that it only starts really hurting once you go above a certain threshold of operations. This means that the part that appears slow might not be the part to fix; it may simply be whatever happens to run just after your application passes the threshold. This is why the behavior of Omniture's client-side tracking code that I will describe might never show up on your site. But it can start at any time as you add more JavaScript code, and when it does, we see loading of the Omniture tracking code take up to 25 seconds (yes, 25 seconds, aka an eternity) in Internet Explorer 6.
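
To make the failure mode concrete, here is the shape of workload that triggers it. This is an illustration only: harmless on a modern engine, but roughly quadratic on IE6, where every concatenation copies the whole string:

  // Illustration only: repeated concatenation on a growing string.
  // Modern engines handle this fine; on IE6 the cost explodes once the
  // string is long enough, because every += copies the whole string.
  var s = '';
  var start = new Date().getTime();
  for (var i = 0; i < 100000; i++) {
    s += 'chunk' + i;
  }
  console.log('built ' + s.length + ' chars in ' +
              (new Date().getTime() - start) + 'ms');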

Saving the environment
Now you might not care about performance issues in Internet Explorer 6 because Internet Explorer 6 should die, but there are other issues to consider, too:
Every time a visitor does a page view on a site that is using Omniture for tracking, the Omniture code runs through the following steps:
  1. Eval the script
  2. Deobfuscation Part 1 (The deobfuscation operations include multiple substitutions and shuffling of parts of the string using a "key")
  3. Deobfuscation Part 2
  4. Eval of the result of the deobfuscation.
Steps 1-3 are totally unnecessary; all they do is slow down every page load and waste energy on operations that add absolutely no value to your site. The good thing is that there is an easy fix:
Obfuscating JavaScript code is obviously a futile endeavour, because the script eventually has to produce regular, executable JavaScript. This is good news, because all we have to do is take that JavaScript just before it is evaled and use it as the Omniture tracking code, eliminating steps 1-3 and saving a little bit of energy upon every page load :)

Now it is not quite that easy; there is one more catch: the guys who created the Omniture obfuscation code tried to be smart and make our lives harder. Step 3 of the deobfuscation adds bullshit characters like \0 to the script, which can be passed to a string eval but which cannot exist in regular JavaScript files. There might be other ways around this, but I took the easy path: instead of directly pasting the output of step 3 into my page, I used the standard escape() function to escape the string. The resulting string can then be passed to unescape() to recreate the real thing, which can then be evaled.

This is how you can fix your tracking code until Omniture releases a fix for the issue (works with H.19.3 but should work in later versions); a sketch of the patched result follows the list:
  1. Somewhere in your tracking code there is a part saying: c = s_d(c)
  2. The function might be called something else, but the name should end in "_d"
  3. Add a JavaScript statement here that says something like console.log(escape(s_d(c)));
  4. Copy the result of the console.log to your clipboard
  5. Substitute the statement c = s_d(c) with c = unescape("PASTE_FROM_CLIPBOARD")
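
Put together, the patched snippet has roughly this shape (the blob is of course your own console.log output; everything around c and s_d is hypothetical):

  // Before (stock tracking code, deobfuscating on every page view):
  //   c = "...obfuscated blob shipped by Omniture...";
  //   c = s_d(c);  // substitutions and shuffling, repeated per page view
  //   eval(c);
  // After (one-time manual deobfuscation, pasted back in):
  c = unescape("PASTE_FROM_CLIPBOARD"); // your escaped console.log output
  eval(c); // same result as before, without steps 1-3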
We tried to contact Omniture about this issue, but they haven't responded to our paid support inquiry in weeks. Meanwhile, their Twitter account is much more responsive, but couldn't help us either.

The fix has been running without issues on one of our customers' sites, saving 30 million deobfuscations per month already. Writing this blog article took about four times as long as figuring out the deobfuscation itself. The competitive advantage that might be gained from the extra "security" mechanism is thus only worth a couple of minutes.