In need of a fresh perspective
This post is about a problem I’ve been trying to solve for a while now. I have a large data structure that represents, in the most general case, all the relationships between the data in a relational database. It’s like a big directory of all the contents in a database with tons of symbolic links.
My problem is that it needs to stay in sync with all that data. That’s not so much a problem when it comes to deleting or updating data, but when adding new data, figuring out where it should be referenced in this giant data structure without rebuilding the entire thing seems pretty difficult to me.
Why am I doing this? Hell, there might be a better approach to the whole damn thing if I explain, so let me give some back-story.
I’m working on this thing called Vantage. One of the things it does is provide a high level solution to help your users navigate the data in your application. The original idea was to have a tree menu that was procedurally created using the relationships of the data in the system. This would allow the user to browse to any item via associations, an idea I liked for two reasons:
First, as a navigation scheme, it maps directly to the actual data relationships. Not only would this simplify your system by eliminating a layer of abstraction, but now the underlying data structure is exposed to the user, giving them valuable insight into how the system really works.
Second, it encourages you to design your data structures in a way that will manifest intuitive navigation for your user, something that should also benefit your data model. Productivity++
That was the idea. There’s some flaws with this and I’ll go over those in a future post, but Vantage officially started as a prototype for this concept. The data structure I was using to build the prototype partially modeled a system to manage service plans and subscriptions of a hosting company. Here’s a rundown of that model, focusing on associations:
A service is type of service the company provides. It’s like a category. It has many service packages and one account.
A package is the description of a service plan. It belongs to a service and has many subscriptions.
A subscription represents a user’s subscription to a service package. It belongs to both one package and one user.
A user is pretty self-explanatory. It has many subscriptions and one account.
An account in this case is the type used to track financial transactions. Normally this would be part of an accounting system. It’s mostly here just to add a little more complexity to the example.
Since Vantage is built on top of Rails, imagine generating models for these and specifying the relationships (Vantage saves you some typing and tries to do this for you). Then imagine grabbing some icons and throwing in some test data. Your application would then get something like in the image to the right.
The tree structure used to only exist on the client-side. The children would lazily load and get computed on the fly when expanding a node. This became too much of a hassle and eventually it made more sense to pre-compute the entire tree on the server and maintain the entire structure there.
The client still lazy-loads the data, but it’s much faster. When things change that the client already has, the server will send the client some JavaScript to update the tree. So at the moment, the server maintains a completely expanded version of the tree, acting as a sort of relational index, and that’s where the operations happen.
So now I think we have enough information to get back to the problem. If we rule out rebuilding the entire tree any time a new item is created, how can I figure out where to put new nodes in this tree structure that persists on the server? Remember, it can show up in more than one place. In the example above, if I created a new subscription, it would show up under both the package and the user it was for.
In a recent revision, nodes that acted as a parent to items would know the type of items it contained, along with the query to get those items. For example, Services, the first node in the example image, knows it contains services and remembers how to get those same services again. With this information, I could go back and just rebuild the nodes that contained the type of item that was created. If I were to create a service, it would find service container nodes and rebuild from there. This still results in lots of useless rebuilding, though less than rebuilding the whole structure.
Is there any better way? Am I doing something completely retarded? Do I have to rebuild? I’m no indexing expert, but most of the search indexing engines I’ve worked with have to rebuild their entire index. I’m sure there’s indexers that don’t need to. Maybe looking at some of them could help?
February 22nd, 2006 at 5:24 pm
Seems like you have more of a graph structure than a tree structure, http://flickr.com/photos/termie/103078662/ if you take into account the number of places subscriptions are referenced (and accounts too for that matter as you’d probably have a Total Revenue account and such as well), and Hosting should know its users, too, not just subscriptions, and so on and so forth.
So, my helpful perspective suggestion is to look at your tree as graph, when you make a new addition just link it in by assigning its type, its parent, whatever, and instead of expanding the whole tree, build ‘perspectives’ so you can be looking at the data through a service’s perspective, through a user’s perspective… blah blah blah for any of your types. It’s a graph, you pick a center node and go out one level, if everything has a type associated with it you know which icon to put next to it.
February 22nd, 2006 at 11:37 pm
I think this is a similar suggestion to what Adam suggested last night. I mean, the actual data is a graph, but it’s hard to model that in the UI. A tree is the closest thing, but it has the constraint that a node can only have one parent. You can still represent the graph this way, you just have multiple nodes that represent the same thing. This just requires separate representation… there’s no such thing as a pointer DOM node.
Even if there was a single representation for an item node, you’d still have to make all the hook-ups with every parent. That is where the problem is. In some cases you can do a reverse lookup of associations, but there’s many cases where the parent isn’t going to be another item, but a node that represents a query. You can’t do a check to see if something falls under a query without running the query.
I think I’m just going to go with rebuilding the whole tree. Recent revisions have made this a more viable solution and I only have to do it on item creation. Because the overall structure is now defined by a schema metalanguage, I can let the programmer specify hints to help optimize the rebuilding process by narrowing it down to specific parts of the tree.
February 27th, 2006 at 4:38 am
look up ODBMS and ODL, it has functions for managing relationships and the corresponding inverse relationships to any number of objects in any relational schema you choose