A Challenge

Over here on his blog, Thomas has this “neat little HTML encoding trick in JavaScript”:

I came across a really neat trick in javascript lately that actually goes ahead and kind of cheats in a very effective manner!

Here’s the problem with javascript, every time you send information from a form to a server side script, such as with AJAX, with HTML brackets it will return an error.  You can either manually fix this, or use a really neat trick called “escape”.

 function escapeHTMLEncode(str) {
     var div = document.createElement('div');
     var text = document.createTextNode(str);
     div.appendChild(text);
     return div.innerHTML;
 }

What this simple little function does is take the internal HTML conversion code from your browser and returns a string converted to HTML.  It’s an awesome trick with how simple it is.

See, you have to note that whenever a browser creates an element in javascript, and a text node is created, the browser will go ahead and make sure that string comes out as raw code, and not as HTML, thus the term textnode and not innerHTML.

Now if they only had a way to reverse it that was that easy. lol

If only. Well, I’m here for ya, buddy. Yeah, turns out that browsers have a nifty little HTML escape/unescape machine built in, and we can harness that built-in power.

Sample HTML Conversions
< .......... &lt;
> .......... &gt;
& .......... &amp;

I spent some time in Firebug playing with innerHTML. Stuff a string into innerText, and angle brackets get turned into their HTML codes in the innerHTML field.

Like Magic

Now stuff HTML codes into innerHTML and *POP*, out come angle brackets in the innerText field. Fun! I took a slightly more direct route and managed to turn this trick both ways.

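    // Escape: assign plain text, read it back as HTML.
    var escapeHTMLEncode = function(str) {
        var div = document.createElement('div');
        // Set both fields; each browser uses the one it knows.
        div.textContent = div.innerText = str;
        // Safari leaves ">" alone, so convert it by hand.
        return div.innerHTML.replace(/>/g, "&gt;");
    };

    // Unescape: assign HTML, read it back as plain text.
    var unescapeHTMLDecode = function(str) {
        var div = document.createElement('div');
        div.innerHTML = str;
        return div.textContent || div.innerText;
    };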

Safari Manages to Piss Me Off

Complications? I don’t need to tell you that IE causes trouble, do I? This time it’s by using the field innerText rather than textContent. We handle that in the escape by just setting both fields. What does it hurt? This is a phantom div that will never be attached to a page anyway. In the unescape we || them together. One of them will be undefined and || gives us the one that’s not.
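Here's a rough sketch of why the || does the job. It fakes the two flavors of div with plain objects, so nothing here touches a real DOM; the readText helper and the standardDiv and ieStyleDiv objects are made up for illustration.

```javascript
// Hypothetical helper: return whichever text property is actually filled in.
function readText(node) {
    return node.textContent || node.innerText;
}

// Plain-object stand-ins for the phantom div (assumption: not real DOM nodes).
var standardDiv = { textContent: "<b>bold</b>" };  // innerText is undefined
var ieStyleDiv = { innerText: "<b>bold</b>" };     // textContent is undefined

console.log(readText(standardDiv));  // the textContent value
console.log(readText(ieStyleDiv));   // the innerText value
```

One wrinkle to keep in mind: an empty string is falsy too, so if the field that exists holds "", the || falls through to the other field and you get undefined back.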

IE always causes problems, so no surprise there. But this time Safari offends as well by leaving “>” unconverted. This is mystifying to me. I did some searching and sure enough, that quirk has pissed off some developers. We convert it by hand with a replace.

This is all loads of fun (I’m dancing in my chair from the thrill of it all), but it shows that any time you rely on the browser to do something, you need to remember that you’re relying on four major (and untold minor) browsers to carry a load. In this case, probably safest to make the conversions yourself with a string of chained replace calls.
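For what it's worth, here's a minimal sketch of what that chain of replace calls might look like. It only covers the three characters in the conversion table above, and the names escapeHTMLByHand and unescapeHTMLByHand are my own invention, so treat it as a starting point rather than a finished library.

```javascript
// Escape the three characters from the conversion table.
// The ampersand goes first so the "&" in "&lt;" is not escaped a second time.
function escapeHTMLByHand(str) {
    return String(str)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

// Reverse the same three conversions.
// The ampersand goes last so "&amp;lt;" comes back as "&lt;" and not "<".
function unescapeHTMLByHand(str) {
    return String(str)
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&amp;/g, "&");
}

console.log(escapeHTMLByHand('<a href="#">R&D</a>'));
// prints: &lt;a href="#"&gt;R&amp;D&lt;/a&gt;
```

The order of the calls is the only tricky part. Quotes and other named codes are not handled here, so the unescape will leave anything beyond these three untouched.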

True story.
