Nerdy tidbits from my life as a software engineer

Wednesday, July 15, 2009

Why The Web Will Always Be Second Best

For all of the euphoria surrounding the exciting things coming out on the Internet these days, I think it’s important to remind ourselves of the limitations that web technologies naturally impose on us.

All code needs to be compiled into machine code before it can run on the local machine, and the process looks roughly like this:

source code → lexer → token stream → parser → syntax tree → code generation (or evaluation) → machine code

Now I don’t profess to be an expert on compilers, but I know enough to draw this conclusion about JavaScript: it will never be as fast as native or intermediate code.  The reason is that in order to execute a super-heavy JavaScript library, everything you see above needs to happen as soon as you open a webpage.

This may not seem like a big deal, but remember that text parsing is actually quite slow.  For the uninitiated, the process of converting source code into a recognizable stream of tokens (i.e. keywords such as “int” and “class”) is typically done via regular expressions.  Regular expression matching is time consuming, and there is quite simply no way around this.  Perhaps the algorithm can be sped up to some degree, but its complexity cannot be reduced: for a given text file of N characters, every character needs to be scanned from start to finish – and this takes O(N) time.
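As a rough sketch of that linear scan, here is a regex-driven tokenizer in JavaScript. The token set and rule list are hypothetical – no real engine’s lexer looks like this – but it shows the single left-to-right pass:

```javascript
// Minimal illustrative tokenizer: one pass over the source, matching the
// text at the current position against a list of regular expressions.
// The token set here is hypothetical, chosen just for this example.
function tokenize(source) {
  const rules = [
    ["whitespace", /^\s+/],
    ["keyword",    /^(?:int|class|return)\b/],
    ["number",     /^\d+/],
    ["identifier", /^[A-Za-z_]\w*/],
    ["punct",      /^[{}();=+*-]/],
  ];
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let matched = false;
    for (const [kind, re] of rules) {
      const m = re.exec(source.slice(pos));
      if (m) {
        if (kind !== "whitespace") tokens.push({ kind, text: m[0] });
        pos += m[0].length;  // advance past the match; never revisit consumed text
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error("Unexpected character at position " + pos);
  }
  return tokens;
}
```

Running `tokenize("int x = 42;")` yields five tokens – a keyword, an identifier, two punctuation marks, and a number – and the scan never revisits a consumed character, which is where the O(N) bound comes from.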

The parser then takes this stream of tokens and converts it into a syntax tree, which can then be converted into native code (or evaluated, in the case of JavaScript).  If we imagine that an entire program can be converted into one big tree of some unknown height, we can conclude that the complexity of parsing this tree and executing it is equal to its height – or roughly O(log N), for some base that I can only estimate.
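To make the parse-then-evaluate step concrete, here is a toy syntax tree for the expression 2 + 3 * 4 and a recursive evaluator. This is an illustrative sketch only – real engines use far more elaborate representations:

```javascript
// Toy syntax tree for the expression 2 + 3 * 4.
// Multiplication binds tighter, so it sits deeper in the tree.
const tree = {
  op: "+",
  left:  { value: 2 },
  right: { op: "*", left: { value: 3 }, right: { value: 4 } },
};

// Evaluate by walking the tree: leaves are literals, interior
// nodes combine the results of their children.
function evaluate(node) {
  if ("value" in node) return node.value;  // leaf: a literal
  const l = evaluate(node.left);
  const r = evaluate(node.right);
  switch (node.op) {
    case "+": return l + r;
    case "*": return l * r;
  }
}
```

`evaluate(tree)` returns 14. Note that the walk visits every node once, so in practice evaluation cost tracks the size of the tree, not just its height.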

I’m just guesstimating here, but I imagine that the total complexity of compiling an application is about O(N log N), which makes it roughly equal to the complexity of quicksort.  So in addition to downloading an entire JavaScript application – in its verbose, text-based, uncompiled form – a JavaScript application needs to go through all of the overhead above.  For small applications, this additional overhead is just about negligible.  But as applications grow larger and larger, it will become more and more pronounced.  In the end, it will be the largest barrier preventing JavaScript from becoming the language of choice for highly-featured web-based applications.

Keep in mind that the largest JavaScript applications on the Internet are a few megabytes or so in size.  Loading and running these applications right now, while fast, still takes noticeable time (look at the progress bar on GMail, for instance).  But if you consider that most large commercial applications consist of many tens of millions of lines of code which take up many gigabytes of space and take many hours to compile, you can start to see the natural limitations of JavaScript.  A JavaScript application of that size, while perhaps theoretically possible, would take so long to load that it wouldn’t be usable – no matter what tricks you use to speed it up.

You may think that this is a limitation that will go away over time as new technologies and techniques arrive that speed things up.  But years from now, future applications will be even larger than they are today.  So even if JavaScript applications eventually catch up with today’s desktop applications, the bar will rise, our standards will increase, and today’s applications will look puny by tomorrow’s standards.  Of course, there are technologies emerging that speed JavaScript up significantly, including many within Microsoft.  These are exciting and will no doubt raise the limit of what can be done in a browser.  But ultimately, no advancement will ever bring the two environments on par with each other, because the cost of compiling or interpreting on the fly can never be eliminated.

I think it’s time we recognize JavaScript for what it is: a scripting language that is being used for purposes beyond what it was conceived for.  If we really want rich applications to be delivered over the internet and hosted in a web browser, we will need to think of a better technology for doing so.


Dr Loser said...

Interesting that nobody commented on this, Michael.

There are two ways to do regexps -- the wrong way (almost universal, thanks to early versions of Perl) and the right way.

Then there are ways to lex and parse a language. These are pretty straightforward, although not entirely solved. It's difficult to imagine a lexer that isn't based upon regular expressions -- please feel free to come up with one.

The fact that regular expressions are, at best, O(n log n), is typically irrelevant. You can have a token that's (say) 256 characters long. So what? With a decent FSM, that's invisible.

Unless you're using back-tracking. Which is a mistake.

The Web will always be second best because it's a disgusting and uncontrollable interface; not because of anything else.

It will survive because "worse is better." QV Unix, an appallingly simplistic OS, and Linux, a retarded student playaround derivative.

And NT 3.51, a perfectly decent derivative of VMS (possibly the worst of the three Multics derivatives -- I prefer VOS), turned into a badly-designed and uncontrollable GUI freak-show. Missing all the original bits of NT.

That's life for you.

Michael J. Braude said...

@DrLoser - People did respond to this, but since I ported these posts over from MSDN this morning, the comments couldn't come too. Bummer. You can check out the original post & comments here: