Thursday, June 24, 2010

SQL Trees

Represeting tree structures in SQL is fun.
While direct modeling techniques in object oriented wisdom dictates the use of collections to references of children or references to parents, the story while doing some SQL could be quite different.

A naive implementation would have this shape: {id, parent_id}. parent_id maybe nullable or not. The non nullable approach would designate an id for the root virtual node, probably set as 0.

Another implementation would go the encapsulation route. The principle is extremely simple, but first here's the shape of a record: {id, lft, rgt}, where lft and rgt represent left and right respectively. How this record could represent relationships you may be keen to ask? The rules are very simple:

For all a and b we have the following rules:
* a.lft belongs to the interval ]b.lft, b.rgt[ <=> a is a child of b
* a.lft < b.rgt => a.rgt < b.rgt

The first rule defines the ancestry relationship.
The second rule defines validity of the data.

All operations on this table must respect these propositions.

The operation specification is actually very straightforward and intuitive once you accept the above rules and internalize them.

Insertion into a node is about pushing elements after that node to make space inside the node for the new one.

A data base may start with one single node as follows:
{0, 1, 2} ---remember the tuple representation mentioned above.
Inserting a new node under no other would result with:
{0, 1, 2}
{1, 3, 4}
Inserting another node under 0 would push the right property of node 0, in addition to the properties of all the following nodes (here only one with id=1)
{0, 1, 4}
{1, 5, 6}
{2, 2, 3}

This is of course not my ingenious invention. I read about it somewhere on the web, but ultimately the source I relied on to get the foundation concrete was Joe Celko. Look the name up, I recommend his books - condensed amounts of wisdom.

No comments:

Post a Comment