Wednesday 28 March 2012

Store numbers as numbers

Have you ever had your inner voice say to you 'you idiot' or 'you tool' or just 'Bravo' whilst clapping sarcastically (Homer: The Last Temptation of Homer)? Yeah, me too, twice this week in fact. The second instance was particularly embarrassing, so I think I should share it. It was all the more embarrassing as I had been complaining that very morning about how the MongoDB drivers should probably implement basic functions such as this.

Now the eagle-eyed among you will spot this straight away, so please don't shout out the answer and spoil it for everyone else.

Imagine you have a collection - I'll keep it as small as possible so that the error stands out more - and you want to find the maximum value of "someNumber". Simples; I've done this a thousand times, well hundreds of times. I think for the first five times I had to find the max I implemented a different solution each time; back when I first started with Mongo.

I'm going to show only one way of finding the maximum value; I'm choosing this particular method for no other reason than it's probably the simplest and easiest to understand.

In this slightly contrived example I only have 10 documents in the collection and in my defence the collection I was working on had a few hundred thousand - not much of a defence granted, but henceforth I'll refer to it as the Jan Defence. So, initially I ran this query:

Worked first time I thought. Rock on. Hang on... that number seems a bit low. Let's put in some logging:

Instead of the 10 represented here, replace that with a number as per the Jan Defence. Let's put in some more logging.

Again, remember the Jan Defence. Surely 50 isn't bigger than 200.

<penny drops/>

It is, if it is a String. 'You tool'.
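For anyone who wants to see the penny drop for themselves, here is a minimal sketch in plain JavaScript. It is not the query from the post: the documents are a made-up in-memory stand-in for the collection, and the collection name in the comment (things) is hypothetical. It mimics "sort descending and take the first one" over the same values stored as strings and as numbers.

```javascript
// Hypothetical stand-in for the collection, with the numbers stored as strings.
const asStrings = [
  { someNumber: "50" },
  { someNumber: "200" },
  { someNumber: "1000" },
  { someNumber: "35" },
];

// The same documents with the values stored as real numbers.
const asNumbers = asStrings.map((doc) => ({ someNumber: Number(doc.someNumber) }));

// Mimics "sort descending, take the first document". In the mongo shell the
// rough equivalent would be:
//   db.things.find().sort({ someNumber: -1 }).limit(1)
function maxBySortDescending(docs) {
  const sorted = [...docs].sort((a, b) => {
    if (a.someNumber < b.someNumber) return 1;
    if (a.someNumber > b.someNumber) return -1;
    return 0;
  });
  return sorted[0].someNumber;
}

// Strings compare character by character, so "5" beats "2" and "1".
const stringMax = maxBySortDescending(asStrings); // "50"
const numberMax = maxBySortDescending(asNumbers); // 1000

console.log(stringMax, numberMax);
```

Same data, same "query", two different answers, and only one of them is the maximum.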

Et voilà!

Fortunately I didn't burn too much time with this.

The moral of the story? You should store your numbers as numbers.
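One way to act on that moral is to convert at the door, before anything gets stored. The helper below is a hypothetical sketch (the name toNumber is mine, not something from a MongoDB driver): it accepts numbers and numeric strings and refuses everything else, rather than quietly storing a string.

```javascript
// Hypothetical helper: turn incoming values into real numbers before storing them.
function toNumber(value) {
  if (typeof value === "number" && Number.isFinite(value)) {
    return value;
  }
  // Number("") and Number("  ") are 0, so reject blank strings explicitly.
  if (typeof value === "string" && value.trim() !== "") {
    const parsed = Number(value);
    if (Number.isFinite(parsed)) {
      return parsed;
    }
  }
  throw new TypeError("not a number: " + JSON.stringify(value));
}

// The document that would be handed to the driver now carries a number.
const doc = { someNumber: toNumber("200") };
console.log(typeof doc.someNumber, doc.someNumber);
```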

If anyone would like to see more implementations of finding the max and min values, I'm happy to share.

Monday 12 March 2012

MongoDB MapReduce scope variables

I recently had a requirement for conditional emission from a map function. Essentially I only wanted to emit where the date was within a given range.

In SQL, grouping a count for a given time granularity within a date range would look something like this:

I wasn't 100% sure about the 'right way' to achieve this in NoSQL/MongoDB, so this is one solution.
It requires you to know about scope variables. Problem is, I found that scope variables are not very well documented. You can find more about scope variables in the MongoDB documentation MapReduce-Overview. The relevant parts are:

      [, scope : <object where fields go into javascript global scope >]
and
      scope - can pass in variables that can be access from map/reduce/finalize.

Back to this example. First, let's define some data:

Before we implement this, let's get it working at the command-line, in our mongo shell:

What is happening here?

Well, we're selecting the sub-document of this collection where the value of "meh" is "meh". Then we've defined two dates, from and to, to represent the boundaries of the date range, and we're including these within the MapReduce function call. Basically what this means is that we can use whatever is defined here in the Map function (btw, we can also use them in the Reduce and Finalize functions).
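To make the idea concrete without a server to hand, here is a small sketch that can be run in plain JavaScript. The documents, the daily grouping and the runMapReduce helper are all my own assumptions for illustration; runMapReduce is only a local stand-in that copies the scope fields (and emit) into the global scope, which is what the scope option does for the real map function.

```javascript
// Hypothetical sample documents.
const docs = [
  { meh: "meh", date: new Date("2012-03-01T10:00:00Z") },
  { meh: "meh", date: new Date("2012-03-01T15:30:00Z") },
  { meh: "meh", date: new Date("2012-03-05T09:00:00Z") },
  { meh: "meh", date: new Date("2012-04-01T09:00:00Z") }, // outside the range
];

// `this` is the current document; `from`, `to` and `emit` are resolved as
// globals, the way scope variables are on the server.
function map() {
  if (this.date >= from && this.date <= to) {
    emit(this.date.toISOString().slice(0, 10), 1); // group by day
  }
}

function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Local stand-in for the server-side runner (an assumption, not a MongoDB API).
// In the mongo shell the real call would look roughly like:
//   db.things.mapReduce(map, reduce,
//     { out: { inline: 1 }, query: { meh: "meh" }, scope: { from: from, to: to } })
function runMapReduce(input, mapFn, reduceFn, scope) {
  const groups = new Map();
  Object.assign(globalThis, scope, {
    emit: (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    },
  });
  input.forEach((doc) => mapFn.call(doc));
  const out = {};
  for (const [key, values] of groups) {
    // Like the server, only call reduce when a key has more than one value.
    out[key] = values.length > 1 ? reduceFn(key, values) : values[0];
  }
  return out;
}

const counts = runMapReduce(docs, map, reduce, {
  from: new Date("2012-03-01T00:00:00Z"),
  to: new Date("2012-03-31T23:59:59Z"),
});
console.log(counts);
```

The April document never reaches reduce, because map simply declines to emit it.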

Once we have this working from the shell, it is straightforward to implement it. This is the very same thing implemented in Java.

Caveat: This is an example of how to use scope variables. I'm sure if you go to any of the Events or the gmane.comp.db.mongodb.user group you'll get some advice straight from the 10gen guys.

Friday 9 March 2012

Changing date types; from JavaScript UTC to Mongo ISODate

The scenario is this.

You have an HTML form that allows your customer to add an arbitrary amount of stuff to something. stuff is a partial JSON document which is a list of String/String and String/Date NVPs contained within something. Mongo will by default store the date as an ISODate. With that said, the something collection looks something like this:



The simple solution to the format issue is to display the Date as an ISODate on the form, thus when the form is submitted, the date can be treated just like any other string (though you still need to tell mongo to treat it as a Date and not a String, so you need to 'find' it and change the type; more on that later).

Would display: 2011-11-01T11:51:46.000Z

But that would be way too easy; in the real world, easy is considered as rare as unicorn herders. The Use Case/User Story/Whatever acceptance criteria is that we display the Date in the full UTC form. OK then.

Which would display: Tue Nov 01 2011 11:51:46 GMT+0000 (GMT Standard Time)

This is easily done, in fact, it is easier than displaying it as an ISODate, but now we have to parse it and convert to an ISODate, which involves finding it in the JSON document first.
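The round trip can be sketched in plain JavaScript, which is handy for checking the expected values even though the post does the parsing in Java. The function name displayToIso is hypothetical, and I'm assuming the long form is what Date's toString() produced in the browser; its exact text depends on the machine's timezone, so the sketch uses the literal string from above rather than generating it.

```javascript
// 1 Nov 2011, 11:51:46 UTC (months are zero-based in Date.UTC).
const stored = new Date(Date.UTC(2011, 10, 1, 11, 51, 46));

// The "easy" form: an ISO 8601 string.
const isoForm = stored.toISOString();

// The long form the acceptance criteria ask for, copied from the post.
const displayed = "Tue Nov 01 2011 11:51:46 GMT+0000 (GMT Standard Time)";

// Hypothetical helper: parse the long form back and re-emit it as ISO.
function displayToIso(text) {
  // Drop the trailing "(GMT Standard Time)" zone name before parsing.
  const withoutZoneName = text.replace(/\s*\([^)]*\)\s*$/, "");
  const parsed = new Date(withoutZoneName);
  if (Number.isNaN(parsed.getTime())) {
    throw new RangeError("unparseable date: " + text);
  }
  return parsed.toISOString();
}

console.log(isoForm);
console.log(displayToIso(displayed));
```

Both lines print the same ISO string, which is the form that can then be handed on for conversion to an ISODate.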

Parsing a UTC date into an ISO date is trivial but should have been easier than this - I needed to faff around with the DateFormat string before I got rid of the ParseExceptions.

This is roughly how I did it. Doesn't look too pretty tbh.


It's amazing how simple this should be. You set it up as a date, you pass in a date, yet you still need to explicitly convert it, initially from a String to a Date, and then from UTC to ISO.

Tuesday 6 March 2012

Neo4j: Mechanically Sympathetic

About a month ago I was 5 minutes late for a #skillsmatter event, a talk on #neo4j and mechanical sympathy. The talk was given by a #neo4j employee, thus given by someone in-the-know.

Because of my tardiness I missed why neo4j is mechanically sympathetic; the whole point of the talk. I should have had a chat with the presenter after the talk to clear up my lack of understanding, however the Slaughtered Lamb is not as conducive to conversation as it once was...

Are we saying that neo4j is mechanically sympathetic in terms of the records being powers of 2, or that the talk allows us to understand how neo4j works in the same way that Jackie Stewart knew how his cars worked because he had previously worked as a mechanic? I'm guessing the latter as the neo4j cheat sheet shows records as 5, 9, 25 and 33 bytes, not exactly powers of 2.

A question came up during the session about how many nodes you could have; the maximum NodeRecordIdSize. The response was that it wasn't 4 bytes as shown on the cheat sheet, but 4 bytes, plus 7 bits. The suggestion was that you could use the 7 'spare' bits of the 'in use' byte as only 1 bit of this byte is actually used.
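To put numbers on that, here is the arithmetic as a small sketch. It is purely illustrative: the packing functions are hypothetical and say nothing about how neo4j actually lays out its records. They only show what "1 bit for in use, 7 bits borrowed for the id" would mean.

```javascript
// Address space with a plain 4-byte id versus 4 bytes plus 7 borrowed bits.
const maxIdsFourBytes = 2 ** 32;     // 4,294,967,296
const maxIdsWithSpareBits = 2 ** 39; // 549,755,813,888

// Hypothetical layout of the 'in use' byte: bit 0 is the flag, bits 1-7 hold
// the high bits of the id.
function packInUseByte(inUse, highBits) {
  return ((highBits & 0x7f) << 1) | (inUse ? 1 : 0);
}

// Rebuild the full id from the 'in use' byte and the 4-byte part.
// Multiplication is used because JavaScript bitwise operators stop at 32 bits.
function unpackId(inUseByte, low32) {
  const highBits = (inUseByte >> 1) & 0x7f;
  return highBits * 2 ** 32 + (low32 >>> 0);
}

const largestId = unpackId(packInUseByte(true, 0x7f), 0xffffffff);
console.log(maxIdsFourBytes, maxIdsWithSpareBits, largestId);
```

The largest id comes out as 2^39 - 1, so the 7 bits buy a factor of 128 over the plain 4 bytes.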

With that said, my question is, how would that affect the 4 byte reference allocation in, for instance, the RelationshipRecord, which is only allocated 4 bytes? Where are the 7 'spare bits' stored?

From form to collection

I know I haven't posted anything for a while. The reason for that has been a mixture of being quite busy at work and home, sheer laziness and being on leave.

I've been hacking around for the last hour trying to find the easiest/quickest way to update an array of name/date pairs within a collection from a form, and it's slowly dawning on me that there is no elegant or even cheeky shortcut. It would be trivial if we were representing the date as a plain string, however, I want to use the mongo ISODate; besides, there is no point in making it easy for myself.

(Before I go any further I should point out I should be doing this in the DAO, not on the client. This involves pulling out the array from the JSON object, iterating over each name/date pair and using java.util.Date to create the dates.)

So, let's say I have this form with stuff on it, plus a bunch of name/date pairs that we create dynamically and populate with their current values, like this...


Now let's say I want to be able to edit the dates, and PUT the form. Rock on, we can use the code from a previous post: Convert HTML form to JSON and POST using jQuery. But hold on a second, Mongo is going to expect the date as a date type. This means we need to convert it from a Date to an ISODate. Fortunately this is trivial enough.


So we need to update formToJSON to look for specific fields. Rather than fall into some Turing tarpit, I'm going to hack it so I'm solving this problem only - and then go for a swim in the tarpit later.
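As a rough idea of what "look for specific fields" could mean, here is a hypothetical sketch. It is not the formToJSON from the earlier post: the function name, the dateFields parameter and the field names are all made up, and the input is assumed to be the array of name/value pairs that jQuery's serializeArray() returns.

```javascript
// Hypothetical variant of the form-to-JSON step: fields named in `dateFields`
// are converted to ISO strings, everything else is copied across untouched.
function formToJsonWithDates(pairs, dateFields) {
  const result = {};
  for (const { name, value } of pairs) {
    // toISOString() throws a RangeError if the value is not a parseable date.
    result[name] = dateFields.includes(name)
      ? new Date(value).toISOString()
      : value;
  }
  return result;
}

// Same shape as the output of $("form").serializeArray().
const pairs = [
  { name: "title", value: "stuff" },
  { name: "when", value: "Tue Nov 01 2011 11:51:46 GMT+0000" },
];

const converted = formToJsonWithDates(pairs, ["when"]);
console.log(converted);
```

Naming the date fields explicitly is the hack: it solves this one problem and leaves the tarpit for later.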


Now we need to make sure that mongo knows it is a date type and not just a string, so we need to use $date. OK.
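A minimal sketch of that wrapping step, assuming the wrapper is built by hand as a piece of text (the function name toDateWrapper is hypothetical). $date is the key MongoDB's extended JSON uses to mark a date; whether the value should be an ISO string, as here, or milliseconds since the epoch depends on the server or REST layer in front of it, so treat the exact shape as an assumption.

```javascript
// Hypothetical helper: wrap an ISO date string in a $date marker, as text.
function toDateWrapper(isoString) {
  return '{"$date":"' + isoString + '"}';
}

const wrapper = toDateWrapper("2011-11-01T11:51:46.000Z");
console.log(wrapper);
```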


Well, after hacking that in, we should be about there. Well, not quite. The next hurdle is the stringify method. It will escape the double quotes and PUT the string with \" instead of ". What a pain. So after we create our JSON string we need to do a bit more "tidying up". We need to replace \" with ". We can simply use replace(), right? Of course, JavaScript's replace() only replaces the first instance when it is given a plain string to find, so we need to be a bit cheeky and pass in the find string as a regular expression with the /g modifier.
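Here is the escaping problem and the replace() gotcha in miniature. The field name when and the wrapper text are assumptions for illustration; the behaviour of JSON.stringify and replace() is standard JavaScript.

```javascript
// A date wrapper that was built as text and then stored as a field value.
const wrapperText = '{"$date":"2011-11-01T11:51:46.000Z"}';

// stringify treats it as an ordinary string, so every inner quote is escaped.
const escaped = JSON.stringify({ when: wrapperText });

// A plain string pattern only replaces the first \" it finds...
const firstOnly = escaped.replace('\\"', '"');

// ...whereas a regular expression with the g flag replaces them all.
const allReplaced = escaped.replace(/\\"/g, '"');

console.log(escaped);
console.log(firstOnly);
console.log(allReplaced);
```

The last line still is not valid JSON, because the wrapper is sitting inside a pair of quotes it no longer needs.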


Great, we're there. Almost. Finally we need to strip the extraneous " around the date value.
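A sketch of that last bit of tidying, under the assumption that the "date value" is a $date wrapper left sitting inside a stray pair of quotes (the field name and the regular expression are mine). For comparison, the last two lines build the same payload by nesting the wrapper as an object before a single stringify call, which needs no clean-up at all.

```javascript
// Where the un-escaping step leaves us: the wrapper is still inside quotes.
const almost = '{"when":"{"$date":"2011-11-01T11:51:46.000Z"}"}';

// Remove the quote before {"$date": and the one after the closing brace,
// keeping the wrapper itself via the capture group.
const cleaned = almost.replace(/"(\{"\$date":"[^"]*"\})"/g, "$1");

// The result is valid JSON again.
const parsed = JSON.parse(cleaned);

// The same payload built with a nested object instead of text.
const nested = JSON.stringify({ when: { $date: "2011-11-01T11:51:46.000Z" } });

console.log(cleaned);
console.log(nested === cleaned);
```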


Well, that was easy (-;

I'm off to factor all this into what was a relatively elegant form to json conversion.
There is an easier way to do this. However, easy is no fun.