Wednesday, 28 March 2012

Store numbers as numbers

Have you ever had your inner voice say to you 'you idiot' or 'you tool' or just 'Bravo' whilst clapping sarcastically (Homer: The Last Temptation of Homer)? Yeah, me too, twice this week in fact. The second instance was particularly embarrassing, so I think I should share it. It was more embarrassing as I was just complaining that morning about how the MongoDB drivers should probably implement basic functions such as this.

Now the eagle-eyed among you will spot this straight away, so please don't shout out the answer and spoil it for everyone else.

Imagine you have a collection - I'll keep it as small as possible so that the error stands out more - and you want to find the maximum value of "someNumber". Simples; I've done this a thousand times, well hundreds of times. I think for the first five times I had to find the max I implemented a different solution each time; back when I first started with Mongo.

>db.coll.insert({"someNumber": "1", "someOtherRubbish": "blah"});
>db.coll.insert({"someNumber": "10", "someOtherRubbish": "meh"});
>db.coll.insert({"someNumber": "20", "someOtherRubbish": "feh"});
>db.coll.insert({"someNumber": "50", "someOtherRubbish": "eh"});
>db.coll.insert({"someNumber": "90", "someOtherRubbish": "blahblah"});
>db.coll.insert({"someNumber": "100", "someOtherRubbish": "fehblah"});
>db.coll.insert({"someNumber": "110", "someOtherRubbish": "mehblah"});
>db.coll.insert({"someNumber": "120", "someOtherRubbish": "blaheh"});
>db.coll.insert({"someNumber": "150", "someOtherRubbish": "ehblah"});
>db.coll.insert({"someNumber": "200", "someOtherRubbish": "fehfehblah"});
>db.coll.find()
{ "_id" : ObjectId("4f7320a3d7378e0737d6406f"), "someNumber" : "1", "someOtherRubbish" : "blah" }
{ "_id" : ObjectId("4f7320a4d7378e0737d64070"), "someNumber" : "10", "someOtherRubbish" : "meh" }
{ "_id" : ObjectId("4f7320a4d7378e0737d64071"), "someNumber" : "20", "someOtherRubbish" : "feh" }
{ "_id" : ObjectId("4f7320a4d7378e0737d64072"), "someNumber" : "50", "someOtherRubbish" : "eh" }
{ "_id" : ObjectId("4f7320a4d7378e0737d64073"), "someNumber" : "90", "someOtherRubbish" : "blahblah" }
{ "_id" : ObjectId("4f7320a4d7378e0737d64074"), "someNumber" : "100", "someOtherRubbish" : "fehblah" }
{ "_id" : ObjectId("4f7320a4d7378e0737d64075"), "someNumber" : "110", "someOtherRubbish" : "mehblah" }
{ "_id" : ObjectId("4f7320a4d7378e0737d64076"), "someNumber" : "120", "someOtherRubbish" : "blaheh" }
{ "_id" : ObjectId("4f7320a4d7378e0737d64077"), "someNumber" : "150", "someOtherRubbish" : "ehblah" }
{ "_id" : ObjectId("4f7320a4d7378e0737d64078"), "someNumber" : "200", "someOtherRubbish" : "fehfehblah" }
view raw jandefence hosted with ❤ by GitHub
I'm going to show only one way of finding the maximum value, I'm choosing this particular method for no other reason than it's probably the simplest and easiest to understand.

In this slightly contrived example I only have 10 documents in the collection and in my defence the collection I was working on had a few hundred thousand - not much of a defence granted, but henceforth I'll refer to it as the Jan Defence. So, initially I ran this query:

> var cursor = db.coll.find().sort({"someNumber":1});
> var max = -1;
> while(cursor.hasNext()) {
... var current = cursor.next();
... if(current.someNumber > max) {
... max = current.someNumber;
... }
... }
90
> print(max);
90
view raw jandefence1 hosted with ❤ by GitHub
Worked first time I thought. Rock on. Hang on... that number seems a bit low. Let's put in some logging:

> var cursor = db.coll.find().sort({"someNumber":1});
> var max = -1;
> var count = 0;
> while(cursor.hasNext()) {
... count++;
... var current = cursor.next();
... if(current.someNumber > max) {
... max = current.someNumber;
... }
... }
90
> print(max);
90
> print(count);
10
view raw jandefence2 hosted with ❤ by GitHub
Instead of the 10 represented here, replace that with a number as per the Jan Defence. Lets put in some more logging.

> var cursor = db.coll.find().sort({"someNumber":1});
> var max = -1;
> var count = 0;
> while(cursor.hasNext()) {
... count++;
... var current = cursor.next();
... print(max + ' ' + current.someNumber);
... if(current.someNumber> max) {
... print(max + ' ' + current.someNumber);
... max = current.someNumber;
... }
... }
-1 1
-1 1
1 10
1 10
10 100
10 100
100 110
100 110
110 120
110 120
120 150
120 150
150 20
150 20
20 200
20 200
200 50
200 50
50 90
50 90
90
> print(max);
90
> print(count);
10
view raw jandefence3 hosted with ❤ by GitHub
Again, remember the Jan Defence. Surely 50 isn't bigger than 200.

<penny drops/>

It is, if it is a String. 'You tool'.

> var cursor = db.coll.find().sort({"someNumber":1});
> var max = -1;
> while(cursor.hasNext()) {
... var current = parseInt(cursor.next().someNumber);
... print(max + ' ' + current);
... if(current > max) {
... print(max + ' ' + current);
... max = current;
... }
... }
-1 1
-1 1
1 10
1 10
10 100
10 100
100 110
100 110
110 120
110 120
120 150
120 150
150 20
150 200
150 200
200 50
200 90
> print(max);
200
view raw jandefence4 hosted with ❤ by GitHub
Et voilà!

Fortunately I didn't burn too much time with this.

The moral of the story? You should store your numbers as numbers.

If anyone would like to see more implementations of finding the max and min values, I'm happy to share.

No comments:

Post a Comment