Broken Links

tbray.org | Feb 9th 2011

I’ve been getting madder and madder about the increasing use of dorky web links; for example, twitter.com/timbray has become twitter.com/#!/timbray. Others have too; see Breaking the Web with hash-bangs and Going Postel. It dawns on me that a word of explanation might be in order for those who normally don’t worry about all the bits and pieces lurking inside a Web address.

How It Works · Suppose I point my browser at http://twitter.com/timbray. What happens is:

The browser connects to twitter.com over the Internet and sends a query whose payload is the string /timbray.
Twitter’s server knows what /timbray means and sends back the HTML which represents my tweetstream.
The browser soaks up that HTML and displays it. The HTML will contain links to all sorts of graphics and chunks of Javascript, which the browser uses to decorate and enhance the display.

On the other hand, when I point the browser at http://twitter.com/#!/timbray, here’s what happens:

The browser pulls apart that address, saving the part after the “#”, namely !/timbray, locally. This part is technically called the “fragment identifier”, but let’s say “hashbang”.
It connects to twitter.com and sends a more-or-less empty query, because the address was all hashbang.
The server doesn’t know whose tweetstream is being asked for because that information was in the hashbang. But what it does send includes a link to a (probably large and complex) chunk of JavaScript.
The browser fetches the JavaScript and runs that code.
The JavaScript fishes the hashbang (!/timbray, remember?) out of browser memory and, based on its value, pulls down a bunch more bits and pieces from the server and constructs something that looks like my tweetstream.
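
To make the difference between the two walkthroughs concrete, here is a minimal sketch in Python. It is only an illustration: it uses the standard library's urlsplit to separate what goes over the wire from what stays in the browser, and the helper name what_the_server_sees is made up for this example.

```python
from urllib.parse import urlsplit


def what_the_server_sees(url):
    """Split a URL into (host, path sent to the server, fragment kept locally).

    Hypothetical helper for illustration; a real browser does much more,
    but the fragment is never part of the HTTP request.
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path is requested as "/"
    return parts.netloc, path, parts.fragment


for url in ("http://twitter.com/timbray", "http://twitter.com/#!/timbray"):
    host, path, fragment = what_the_server_sees(url)
    print(f"{url}: connect to {host}, request {path!r}, keep {fragment!r} locally")
```

For the plain link the server is asked for /timbray. For the hashbang link it is asked only for /, and !/timbray never leaves the client.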

How To Be Found · Web pages are retrieved by lots of things that aren’t browsers. The most obvious examples are search-engine crawlers like Google’s. These things also peel off the hashbang, and since they typically don’t run JavaScript, if you’re not careful, your site just vanished from the world’s search engines.
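
A rough sketch of why this hurts, assuming a crawler that simply drops the fragment before fetching: every hashbang link on a site collapses to the same address. The function name crawl_target and the second account name are hypothetical, used only for illustration.

```python
from urllib.parse import urldefrag


def crawl_target(url):
    """Return the URL a fragment-ignoring client would actually fetch."""
    return urldefrag(url).url


# Two different people's pages, as hashbang links.
# "example_user" is a made-up account name.
links = [
    "http://twitter.com/#!/timbray",
    "http://twitter.com/#!/example_user",
]

# A set keeps only the distinct URLs the crawler would request.
targets = {crawl_target(link) for link in links}
print(targets)
```

Both links reduce to http://twitter.com/, so a crawler that does not run JavaScript sees one page where there should be two.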

It turns out that you can use hashbang links and still have a way to be indexed and searched; Google provides instructions. I would describe this process as a hideous kludge; not because the Googlers who cooked it up went off the road, but because the misguided rocket scientists drove these links so far off the road that there wasn’t an unkludgey way back onto it.
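
For readers who have not seen the kludge, the gist of Google's scheme is that a crawler rewrites a "#!" address into an ordinary query using a parameter named _escaped_fragment_, and the server is expected to answer that query with a static snapshot. The sketch below is an approximation under that assumption: the function name is invented, and the exact set of characters to escape is defined in Google's documentation, not here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url):
    """Rewrite a hashbang URL into the form a crawler would request.

    Approximation for illustration: the precise escaping rules are an
    assumption; consult Google's AJAX crawling documentation for them.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hashbang URL; leave it untouched
    # Drop the leading "!" and percent-escape the rest of the fragment.
    token = "_escaped_fragment_=" + quote(parts.fragment[1:], safe="/=")
    # Append to any existing query string rather than replacing it.
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(escaped_fragment_url("http://twitter.com/#!/timbray"))
```

Note that the identifying string ends up back in the part of the address the server sees, which is where it was before the hashbang moved it.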

Contracts · Before events took this bad turn, the contract represented by a link was simple: “Here’s a string, send it off to a server and the server will figure out what it identifies and send you back a representation.” Now it’s along the lines of: “Here’s a string, save the hashbang, send the rest to the server, and rely on being able to run the code the server sends you to use the hashbang to generate the representation.”

Do I need to explain why this is less robust and flexible? This is what we call “tight coupling” and I thought that anyone with a Computer Science degree ought to have been taught to avoid it.

History · There was a time when there was a Web but there weren’t search engines. As you might imagine, the Web was much less useful. Because everything was tied together using simple links, when people first had the idea of crawling and indexing, there was nothing getting in the way of building the first search engines. I know because I was one of the people who first had those ideas and built one.

So, my question is: what is the next great Web app that nobody’s built yet that depends on the simple link-identifies-a-resource paradigm, but that now we won’t be seeing?

Why? · There is no piece of dynamic AJAXy magic that requires beating the Web to a bloody pulp with a sharp-edged hashbang. Please stop doing it.

Original Page: http://www.tbray.org/ongoing/When/201x/2011/02/09/Hash-Blecch#c1297361288.84169
