Sentinels in JavaScript

Can you find the substring?

In older procedural languages, the return values that came back from a function were restricted. If you said you were going to return a number, you returned a number. If you wanted to sometimes return a number, but other times return an indication of failure, you resorted to what is known as a “sentinel value” to return the failure.

A sentinel value is a number that doesn’t represent the answer to the problem that the function was asked to solve, but instead flags the caller to the fact that something rather out-of-the-ordinary has happened. For example, a value of -1 can indicate the end of a file being read in.

The problem with a sentinel is that it means nothing special to the language. The programmer has to keep in mind that the sentinel exists and that it has to be handled on every call that could possibly generate the sentinel value. Programmers are notorious for paying attention to and handling sentinels only when they crash a program.

Also, the programmer has to be careful in choosing the value. Is it 0? -1? 99? 255? 999? -999? MAXINT? The choice can bite you if you’ve misunderstood the possible values that can be generated in your function. I regret to inform you that I once wrote a BASIC program that had three different sentinel values returned from three different subroutines!

Sentinel values sound old and busted, don’t they? Their usage fell a bit once we could pass around pointers to data structures. With a little more elbow room, we could put a “success” boolean right up front in the data structure and do away with a lot of sentinels.

They Live!

But sentinels are still around. In JavaScript, the string method “indexOf” returns a -1 if the needle (substring) can’t be found in the haystack (the string to be searched).
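
Here is a quick sketch of why that -1 is awkward. The sample strings are made up for illustration. Forget the check, and the -1 flows into the next call as if it were a real position.

```javascript
var haystack = "find the needle here";

var found = haystack.indexOf("needle");  // 9
var missing = haystack.indexOf("pin");   // -1, the sentinel

// Forgetting to check: slice() reads -1 as "one character from the end",
// so we silently get the wrong substring instead of an error.
var oops = haystack.slice(missing);      // "e"
```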

Let’s look at a few different ways we can deal with that awkward -1.

function wrappedIndexOf(needle,haystack) {
    var res=haystack.indexOf(needle);
    if (res===-1) {
        res=false;
    }
    return res;
}

And you call it like this:

var position,
    res=wrappedIndexOf(needle,haystack);
if (res!==false) {
    position=res;  // not "location": in a browser, assigning that navigates away
}

We haven’t done much here but replace the need to check explicitly for -1 with a need to check explicitly for false. Too weird to sometimes return a number and sometimes return a boolean? Yeah, probably. I like it a bit better than the numeric sentinel because the calling code makes the exception check a bit more obvious. Because JavaScript functions can return just about anything (including wild things like anonymous functions), you always need to think about what can come out of a function, so what’s important is a consistent convention.

Similarly, you could return null or undefined.
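
A minimal sketch of the null flavor, assuming a hypothetical function name (indexOfOrNull is not from any library). The thing to watch is that 0 is a perfectly good position, so the caller should compare against null rather than rely on truthiness.

```javascript
// Hypothetical variant of wrappedIndexOf that returns null on failure.
function indexOfOrNull(needle, haystack) {
    var res = haystack.indexOf(needle);
    return res === -1 ? null : res;
}

var atStart = indexOfOrNull("fi", "fish");   // 0, a valid position
var notThere = indexOfOrNull("x", "fish");   // null

// Compare against null explicitly; a plain truthiness test would
// treat the valid position 0 as a failure.
var foundAtStart = atStart !== null;         // true
```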

But let’s move on to another solution–returning an object.

function wrappedIndexOf(needle,haystack) {
    var res=haystack.indexOf(needle);
    if (res===-1) {
        res={"success":false,"value":-1};
    } else {
        res={"success":true,"value":res};
    }
    return res;
}

And you call it like this:

var position,
    obj=wrappedIndexOf(needle,haystack);
if (obj.success) {
    position=obj.value;  // not "location": in a browser, assigning that navigates away
}

Note that I’ve left the -1 in obj.value, so you can still use that as a sentinel if you like.

What makes JavaScript really good for this task? Its object literals. You don’t need to have a structure or class around to hold the extra info. You just build the object on the fly and return it.

Eliminating Duplicates

Multiplicity

Yesterday I ran into a programming problem and I didn’t like any of the solutions I came up with. I went to bed kind of grouchy about it.

Here’s the problem: Given a huge number of strings, return an array that has each string appear just once. Eliminate all duplicates.

This is a common enough problem in programming that there exists a large body of literature about it.

The solutions I found, though, were pretty ugly. They either involved nested loops and a lot of comparisons, or brain-busting regular expressions.

Now, I’m always willing to look at the regular expression solution to a problem, but for me a RegEx solution is a lot of work. It’s like trying to dredge up my high school Latin. You know the joke–there are 10 kinds of programmers, those who understand Regular Expressions, and those who don’t.

I wanted small, simple code. I couldn’t find anything that really felt right.

Going with the Flow

What kept bugging me? In the back of my head, I kept thinking that all these solutions are coming from other languages. They’re not taking advantage of JavaScript.

Isn’t there something in JavaScript that naturally removes duplicates? And finally, it hit me: object keys.

Objects in JavaScript are hashes. Each entry has two parts: the key on the left and the value on the right.

{"left":right}

The key is unique, but of course the value can be duplicated.

Then I had it: The “key” is the key. All I have to do is loop through the strings and assign them to the keys of an object. Hashes are an automatic duplicate removal machine.

And it works. JavaScript does the work of eliminating the duplicates naturally. I’m not doing the heavy lifting in JavaScript. Instead, JavaScript is doing the heavy lifting internally in some kick-ass compiled C loop.

My code stays clean and minty fresh because I’m relying on heavily tested and optimized code inside the JavaScript interpreter of the browser.

function eliminateDuplicates(arr) {
  var i,
      len=arr.length,
      out=[],
      obj={};

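  // The loop body was cut off in the source; this is a reconstruction.
  for (i=0;i<len;i++) {
    // A key can exist only once, so a repeated string finds its key already set.
    if (!Object.prototype.hasOwnProperty.call(obj,arr[i])) {
      obj[arr[i]]=true;
      out.push(arr[i]);
    }
  }
  return out;
}
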
var a=[];
var b=[];

a[0]="fish";
a[1]="fishes";
a[2]="fish";
a[3]="1";
a[4]="fishes";

b=eliminateDuplicates(a);
console.log(a);
console.log(b);

---

["fish", "fishes", "fish", "1", "fishes"]
["fish", "fishes", "1"]

Further

Armed with this concept, I was able to simplify my code even more. My data didn't start out as an array. Instead, it was coming from a file, so I just grabbed the strings as they came in and stuffed them right into the object as keys.
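
A rough sketch of that streaming version, assuming an ES5-or-later engine. The names seen and addString are hypothetical, and a plain array stands in for the file. One caveat: keys that look like whole numbers (such as "1") may be listed ahead of the others by Object.keys, so don't count on arrival order for those.

```javascript
// No prototype, so inherited names like "toString" can't collide with data.
var seen = Object.create(null);

// Hypothetical helper: call this once for each string as it arrives.
function addString(s) {
    seen[s] = true;   // a repeat just overwrites the same key
}

// Stand-in for strings arriving from a file.
["fish", "fishes", "fish", "cat", "fishes"].forEach(addString);

var unique = Object.keys(seen);   // ["fish", "fishes", "cat"]
```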

Every language has its own strengths and weaknesses. You get a nice feeling when you finally see the problem you're trying to solve in the context of the tool you're using to do the job.

You Could Lose Your Mind

What happens if you initialize an object with the same key (name) twice? I tried it in Firefox, Internet Explorer, Safari, and Opera.

In all cases, the last assignment (the rightmost one) is the winner.

var obj={a:1,b:2,a:3,b:4};

You end up with an object like this:

{a:3,b:4}

I’m not sure that’s mandated by the language spec. It might depend on the implementation. Anyone know?
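
As far as I can tell, ES2015 and later do pin this down: the properties in an object literal are created left to right, so a repeated key is simply overwritten by the later value. (ES5 strict mode treated a repeated key as a syntax error, a restriction that ES2015 dropped.) A small sketch for checking your own engine:

```javascript
var obj = {a: 1, b: 2, a: 3, b: 4};

var keys = Object.keys(obj);   // ["a", "b"]: each key appears once
var a = obj.a;                 // 3: the rightmost assignment wins
var b = obj.b;                 // 4
```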

Missing and Default Parameters in JavaScript

Passing parameters to JavaScript functions seems simple enough. Since the language is weakly-typed, all the parameters in the call are simply comma-separated.

And in the definition of the function it can get no more complicated than that–a comma-separated list of parameters.

Not much room for flexibility, is there?

But let’s take a closer look. Suppose you don’t pass all the parameters that a function is expecting. What happens?

If you shortchange a function, all the parameters you failed to supply are undefined.

What happens if you give more parameters than the function expects? They are ignored, but you can find they are still accessible in an array-like list called arguments. (No, it’s not an array. While it does have a length, it’s missing the methods that come with arrays.)
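
Both cases can be seen in one small sketch. The function name show is made up for illustration. Because arguments has no array methods of its own, the sketch borrows slice from Array.prototype to copy the extras into a real array.

```javascript
function show(a, b) {
    return {
        a: a,
        b: b,
        count: arguments.length,
        // arguments has no slice() of its own, so borrow the array version
        extra: Array.prototype.slice.call(arguments, 2)
    };
}

var tooFew = show(1);            // b is undefined, count is 1
var tooMany = show(1, 2, 3, 4);  // extra is [3, 4], count is 4
```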

Now What?

So we’ve revealed a bit of flexibility–JavaScript won’t come to a screaming halt if you supply more or fewer parameters than a function expects. It’s more flexible in handling parameters than we might have suspected.

Some languages allow you to specify default parameters. Can JavaScript do that? Well, before ES2015 added default parameter syntax it wasn’t in the spec, so JavaScript programmers commonly provide default parameters using JavaScript’s || operator.

a = a || 1;

If a is undefined, a will be assigned 1. But a will also be assigned 1 if it is 0, or any other falsy value such as "", null, false, or NaN. That’s not always what you want, so sometimes a condition is best:

if (a===undefined) {
  a=1;
}
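
To make the difference concrete, here is a side-by-side sketch. The function names withOr and withCheck are invented for the comparison.

```javascript
function withOr(a) {
    a = a || 1;              // replaces every falsy value, including 0
    return a;
}

function withCheck(a) {
    if (a === undefined) {   // replaces only a missing value
        a = 1;
    }
    return a;
}

var orZero = withOr(0);          // 1: the caller's 0 was thrown away
var checkZero = withCheck(0);    // 0: the caller's 0 survives
var checkNone = withCheck();     // 1: the default kicks in
```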

Let’s try one. We’ll make a function that takes three parameters.

  • a will have the default value of 1
  • b will have the default value of 2
  • c will have the default value of 3

function test(a,b,c) {
    a=a||1;
    b=b||2;
    c=c||3;
    console.log(a,b,c);
}

We can call it like this:

test(1,2,5);
test(undefined,3,undefined);
test();

And get output like this:

1 2 5
1 3 3
1 2 3

If you have a lot of parameters coming in, that can get messy fast. Is there a better way?

There Is a Better Way

More than a few programmers have taken a stab at providing a nice default parameter implementation, and JavaScript is flexible enough to provide many ways to do it.

Most of these solutions feel over-engineered to me, so I spent some time thinking about the problem and came up with my own solution. I wrote a function called defaultHandler. This is going to be easy! Take a look:

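    var defaultHandler=function(defaults,params) {
        var i;
        for (i in params) {
            // copy only the caller's own properties over the defaults
            if (Object.prototype.hasOwnProperty.call(params,i)) {
                defaults[i]=params[i];
            }
        }
        return defaults;
    };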

We start with the “defaults” object that was passed in. Obviously, this holds all the default values for the parameters we expect to get passed in. Then we spin through everything that was passed in as a parameter and add it to the defaults, overwriting values when we run into identical keys.

Simple. Too simple, perhaps. The trick must be in how we use it. Is that it? Well, here’s an example.

    function test(obj) {
        var params=defaultHandler({a:1,b:2,c:3},obj);
        console.log(params);
    }

Hmm. Not much going on there, either. I told you this was going to be easy!

Here, I call our test function:

    test({a:5,x:"tuna"});
    test({});
    test({b:10,y:5});

And get the following output:

Object a=5 b=2 c=3 x=tuna
Object a=1 b=2 c=3
Object a=1 b=10 c=3 y=5

That’s all there is to it. Instead of passing parameters, we always pass in one single object that holds all the parameters.

In effect, we’re ignoring JavaScript’s “almost an array, but not really” parameter-passing system, and using a full-blown object to do the parameter passing work. Since objects can hold anything (numbers, booleans, arrays, objects, functions, probably even regex descriptions), nothing has changed except the packaging and naming of parameters. And we’ve gained tons of flexibility.

Come to think of it, you don’t even really need the default handler. That’s just a nice way of pruning down the parameters which need to be passed.
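
A sketch of what going without the handler might look like. The function describePet and its option names are hypothetical. Each option is read straight off the object, with an explicit undefined check so that a value like 0 is not mistaken for a missing one.

```javascript
// Hypothetical function that reads its options directly, with no handler.
function describePet(opts) {
    opts = opts || {};   // tolerate a call with no argument at all
    var name = opts.name === undefined ? "anonymous" : opts.name;
    var count = opts.count === undefined ? 1 : opts.count;
    return name + " x" + count;
}

var full = describePet({name: "buster", count: 2});   // "buster x2"
var partial = describePet({count: 0});                // "anonymous x0"
var empty = describePet();                            // "anonymous x1"
```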

Why Would I Do This?

Imagine you’ve written an operating system. It has a nice windowing user interface. You have function calls like “OpenWindow” that take some parameters. In a later version of the interface you want to start adding parameters to the function.

It can get messy. You want to keep backwards compatibility so old applications don’t break. So maybe you have a slew of new functions like “NewOpenWindow.” Then you come out with another revision and you want to add still more parameters.

When it gets silly enough, you look for another solution. The Amiga guys solved this problem with taglists. Taglists were much like an object–a list of names and corresponding values. You only supply the parameters you want to.

That was the first time I saw this sort of solution. They added an OpenWindowTags() function and that solved the problem forever. Unfortunately, in the case of the Amiga, “forever” didn’t last very long.

It’s a great solution for APIs of JavaScript libraries. You can add new stuff in the future without breaking any old calls to your library.

A good rule of thumb is to use JavaScript’s normal parameter conventions whenever the number of parameters is small and it’s inconceivable that they might change. Otherwise, pass in an object.
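
Here is how that might play out for a library call. This is only a sketch: openWindow and its option names (title, width, resizable) are invented, and the handler repeats the idea of the defaultHandler shown earlier so the example runs on its own.

```javascript
// Same idea as the defaultHandler shown earlier, repeated so this runs alone.
function defaultHandler(defaults, params) {
    var i;
    for (i in params) {
        defaults[i] = params[i];
    }
    return defaults;
}

// Hypothetical library call. Imagine version 1 knew only title and width;
// "resizable" was added later without touching any existing caller.
function openWindow(options) {
    return defaultHandler({title: "Untitled", width: 640, resizable: true}, options);
}

// A call written before "resizable" existed still works unchanged.
var oldStyle = openWindow({title: "Notes"});

// A newer call opts in to the newer option.
var newStyle = openWindow({title: "Tools", resizable: false});
```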

Quickie: Nameless functions in arrays

In JavaScript, you can write functions that look like this:

    function add(a,b) {
        return a+b;
    }

    function multiply(a,b) {
        return a*b;
    }

It’s the C way of writing functions, so it should look pretty familiar and comfortable. But it’s really just a shortcut. There’s another way to write named functions.

    var add = function(a,b) {
        return a+b;
    };

    var multiply = function(a,b) {
        return a*b;
    };

If we like, we can put functions into objects.

ops={add: function(a,b) {return a+b;},
     multiply: function(a,b) {return a*b;}};

Now we can call the functions ops.add(a,b) and ops.multiply(a,b). Or, you can use another notation and call the functions with ops["add"](a,b) and ops["multiply"](a,b).
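
The bracket notation earns its keep when the name isn't known until run time. A small sketch, where the variable chosen is a hypothetical stand-in for a name that arrives as a string:

```javascript
var ops = {
    add: function(a, b) { return a + b; },
    multiply: function(a, b) { return a * b; }
};

// Hypothetical: the operation name arrives as a string at run time.
var chosen = "multiply";

var product = ops[chosen](3, 4);   // 12
var sum = ops["add"](3, 4);        // 7, same as ops.add(3, 4)
```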

In a similar manner, because arrays can hold objects, we can do this:

ops=[function(a,b) {return a+b;},
     function(a,b) {return a*b;}];

I’ve stripped out the names of the functions and now they look like the anonymous functions that are so common in JavaScript. In this case we still have a way to call the function–by indexing into the array. Normally we create anonymous functions on the fly and bind them to an event like a button click and never call the function directly in our code.

To get to these almost-anonymous functions I’ve defined in the array, call ops[0](a,b) or ops[1](a,b).
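
Since the functions live in an array, you can also run the whole set in one pass. A short sketch, assuming an engine with Array.prototype.map (ES5 or later):

```javascript
var ops = [function(a, b) { return a + b; },
           function(a, b) { return a * b; }];

// Call each function in turn with the same pair of arguments.
var results = ops.map(function(op) {
    return op(3, 4);
});   // [7, 12]

var first = ops[0](3, 4);   // 7, calling one function by index
```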

How Weird are JavaScript Arrays?

How weird are JavaScript arrays? Really weird.

In most programming languages that I’ve worked with, arrays and objects are two different types. An array is a pie, and an object is a cake.

JavaScript is so object-oriented that it’s difficult to draw a line between the pie and the cake. In JavaScript, arrays are cheesecake. This can be disturbing. It can make some programmers angry. I’m going to ask you to try to remain calm. If I upset you, please, just walk away.

It all starts out normally (like the first scene of a Hitchcock film). No surprises yet. Let’s look at an object and an array.

var obj={"species":"dog","name":"buster"};
var arr=["dog","buster"];
alert(obj.species); //dog
alert(arr[0]); //dog

Pretty close. The object uses braces, rather than square brackets. While the object labels its elements, the array numbers them. You use dot notation to get at the info in an object, and the square brackets (array notation) to get at the info in an array.

Now it Starts to Get Interesting

We can also get at object values with this array-like syntax:

alert(obj["species"]); //dog

Now, our object looks almost like an array.

In most languages, arrays are optimized for speed. Each element is the same type and size and can be quickly accessed. In JavaScript, arrays are so loose that there’s really no speeding them up. Each array element can hold a different type.

arr[0]='pig';
arr[1]=true;
arr[2]=[5,2,7,3];
arr[3]={'a':1,'b':2};

An array element can even hold a function (or an array of functions)!

This is massively handy, if unusual.

Sit Down, I Have Something Scary to Tell You

I’m sorry, but it’s time for me to do something that may weird you out. Look at this…

arr["state"]=0;

That’s upsetting. Arrays can do more than just hold their numbered elements; they can also act just like objects, holding named values. If you want to kick something, or yell, I don’t blame you. If you need to take a few deep breaths, I understand.
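
Before you decide how you feel about it, here is a sketch of how a named value behaves next to the numbered ones. It assumes an engine with Object.keys and JSON.stringify (ES5 or later).

```javascript
var arr = ["dog", "buster"];
arr["state"] = 0;

var len = arr.length;             // 2: named values are not counted
var keys = Object.keys(arr);      // ["0", "1", "state"]
var json = JSON.stringify(arr);   // '["dog","buster"]': state is left out
var state = arr.state;            // 0: still reachable by name
```

So the named value rides along with the array, but length and JSON serialization only see the numbered elements.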

You have a choice to make. You need to get all cold, steely, and rational and decide if it makes sense to take advantage of what JavaScript offers, or if you’re better off backing off and doing things the way you’d do them in other languages. That depends on your situation. Are you the only coder who will ever see the code? If you’re working with others, you may very well blow their minds if you treat an array like an object. Or maybe you’re working with some like-minded cheesecake aficionados.

Are You Insane?

But there are some very nice things you can do with this. Having all information that’s associated with the array (maybe even the array’s methods) live inside the array can make your code easier to deal with and understand, but only if you understand that an array is a cheesecake.

If you don’t take advantage of this, but you do want to keep everything associated with the array together, you’ll use an object to wrap everything up, and the array will be part of the object. This is more clear to the traditional thinker, but it does have a cost–the overhead of all the extra symbols you’ll be typing to access things.

Here are an array and an object, both designed to hold the same things.

var arr=[],obj={};
arr.state=0;
arr[0]='dog';
arr[1]='cat';
obj.state=0;
obj.data=[];
obj.data[0]='dog';
obj.data[1]='cat';

I can make a pretty good case that the array is “better” in this case, assuming you’re a big fan of cheesecake.

True Story.

I once took a cat into the vet. It was my first time there so I had to fill out a form. One of the things they wanted to know was the pet’s species.

Crap. I dug deep into the part of my brain that was supposed to be holding on to my high school Latin, but I just couldn’t quite get it. I made a stab–felis domesticatus–no, can’t be right.

I walked up to the lady at the desk.

“Do people really know this?”

“Know what?”

“The species of their pet.”

“Yeah, usually.”

I pointed at my cat. “What’s this?”

She blinked, then answered, “It’s a cat.”