Sunday, 4 December 2011

A simple guide to finding distinct array values in a MongoDB collection

So you want to find the unique values of an array within a document in a collection? A reasonable request.
In ANSI SQL you would reach for DISTINCT, JOINs and GROUP BY, the tools you're used to, but in MongoDB your best bet is mapReduce.
It might seem like hard work, and a little intimidating at first, but it is worth learning: mapReduce lets you run your own JavaScript over every document and aggregate the results, which makes it an extremely powerful tool.

Set up the collection, write the map and reduce functions, and run the mapReduce command:
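Here is a minimal sketch of this step. It assumes a hypothetical `things` collection whose documents hold an array field named `tags`. The `map` and `reduce` functions follow MongoDB's conventions: map reads the current document through `this` and calls `emit(key, value)`, and reduce receives a key plus an array of values. The hypothetical `runMapReduce` helper imitates the server's grouping step so the logic can be checked without a server. A JavaScript predicate stands in for the `query` document that restricts the run to one document.

```javascript
// Hypothetical sample data standing in for a `things` collection.
const things = [
  { _id: 1, tags: ["dog", "cat", "dog", "fish"] },
  { _id: 2, tags: ["cat", "bird"] },
];

// Map: runs once per document, with the document bound to `this`.
// Emits each array element as the key and a count of 1 as the value.
function map() {
  this.tags.forEach(function (tag) {
    emit(tag, 1);
  });
}

// Reduce: receives one key and every value emitted for it; sums the counts.
function reduce(key, values) {
  return values.reduce(function (total, n) { return total + n; }, 0);
}

// Hypothetical helper imitating the server: filter documents with an optional
// predicate (standing in for the `query` option), group emits by key, then
// reduce each key. Like MongoDB, it skips reduce for a key with a single value.
function runMapReduce(docs, mapFn, reduceFn, query) {
  const groups = new Map();
  globalThis.emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.filter(query || (() => true)).forEach((doc) => mapFn.call(doc));
  delete globalThis.emit;
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value: value });
  }
  // Sort by key so the output is deterministic and easy to read.
  return results.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

// Distinct tags within document 1 only, with how often each occurs.
const single = runMapReduce(things, map, reduce, (doc) => doc._id === 1);
console.log(single);
// In the mongo shell the equivalent call would be roughly:
// db.things.mapReduce(map, reduce, { query: { _id: 1 }, out: { inline: 1 } })
```

Emitting a count alongside each element costs nothing extra and gives you the frequency of each distinct value as well as the list of values.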



But now you want the unique values of that array across every document in the entire collection.
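For the whole collection the same map and reduce functions work unchanged; the only difference is that no query restricts which documents are mapped. This sketch assumes a hypothetical `things` collection with a `tags` array. It uses a hypothetical `mapReduceAll` helper in place of the server. In the mongo shell the run would be along the lines of `db.things.mapReduce(map, reduce, { out: { inline: 1 } })`.

```javascript
// Hypothetical sample collection: each document has a `tags` array.
const things = [
  { _id: 1, tags: ["dog", "cat", "dog", "fish"] },
  { _id: 2, tags: ["cat", "bird"] },
  { _id: 3, tags: [] }, // an empty array simply emits nothing
];

// Map: emit (element, 1) for every element of this document's array.
function map() {
  this.tags.forEach(function (tag) {
    emit(tag, 1);
  });
}

// Reduce: sum the counts emitted for one key.
function reduce(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) total += values[i];
  return total;
}

// Hypothetical stand-in for the server: with no query, every document is mapped.
function mapReduceAll(docs, mapFn, reduceFn) {
  const groups = new Map();
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn.call(doc));
  delete globalThis.emit;
  return [...groups]
    .map(([key, values]) => ({
      _id: key,
      value: values.length === 1 ? values[0] : reduceFn(key, values),
    }))
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

const counts = mapReduceAll(things, map, reduce);
// The distinct values across the whole collection are the result keys.
const distinctTags = counts.map((r) => r._id);
console.log(counts);
console.log(distinctTags);
```

The distinct values are the `_id` fields of the result; the `value` fields tell you how often each one appears across the collection.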


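The map function in this example runs once for each document: it iterates over the elements of the array and emits a key/value pair for each one, with the element itself as the key and a count as the value.
The reduce function is called with a key and all of the values emitted for that key, and combines them into a single value. In this example that means adding up the counts, so the output holds one entry per unique element along with the number of times it appeared.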
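To make the division of labour concrete, this sketch captures what map emits for a single document and then feeds the grouped values to reduce. The `tags` field and the `emitsFor` helper are hypothetical. It also checks a property MongoDB relies on. The server may call reduce more than once for the same key, passing earlier reduce results back in. So reduce must return a value of the same shape as the values map emits.

```javascript
// Map and reduce in the MongoDB style (hypothetical `tags` field).
function map() {
  this.tags.forEach(function (tag) {
    emit(tag, 1);
  });
}
function reduce(key, values) {
  return values.reduce(function (total, n) { return total + n; }, 0);
}

// Hypothetical helper: capture what map emits for one document by
// supplying a temporary global emit() that records each pair.
function emitsFor(doc) {
  const pairs = [];
  globalThis.emit = (key, value) => pairs.push([key, value]);
  map.call(doc);
  delete globalThis.emit;
  return pairs;
}

// Three pairs: "dog", "cat" and "dog" again, each with the value 1.
const pairs = emitsFor({ _id: 1, tags: ["dog", "cat", "dog"] });
console.log(pairs);

// The server groups values by key, so reduce sees key "dog" with [1, 1].
const dogValues = pairs.filter(([k]) => k === "dog").map(([, v]) => v);
console.log(reduce("dog", dogValues)); // 2

// Re-reduce: combining partial results gives the same total as reducing
// all values at once, because reduce returns a plain count like map emits.
const partial = [reduce("dog", [1, 1]), reduce("dog", [1])];
console.log(reduce("dog", partial) === reduce("dog", [1, 1, 1])); // true
```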
If you're looking for the distinct array elements of a single document, pass a query that selects just that document (for example, by its _id). For the entire collection, just leave the query out. *simples*
