Sep 20, 2012

Categorize or Search?

SOA design time governance products come with a variety of methods to categorize services or assets.  Interestingly, I’m currently working on a project involving a document management system, and the selected document management tool comes with an almost identical selection of categorization methods.  These include trees, taxonomies, and domains, among others.

In some of the earlier SOA design time governance implementations I performed, we spent significant time working on the categorizations – trying to make the catalog and its information easy to traverse for the various categories of users who would encounter it.  (In the case of design time governance, this might be analysts, architects, developers, QA, and IT management.)

We invariably found that designing the categories and approaches took a tremendous amount of time, with every constituency having different ideas and requesting various adjustments to the approach.  To some extent it became the never-ending quest for the perfect structure, never to be found.

We saw this same pattern emerge and fail on the World Wide Web.  Those with a little Internet history behind them may remember the early Internet indexing and search sites, such as Lycos, AltaVista, Netscape and Yahoo.  Each took its own approach to indexing, categorizing and presenting the Internet through a taxonomy.

One day a newcomer named Google arrived and presented one simple function: search.  Within a short time all the indexing and taxonomy sites were dead.

The way we do things is limited by our technology.  When we’re doing things manually, our technology may be bookshelves or filing cabinets, paper index cards and human retrieval methods – or even human memory capacity.  As we automate processes with newer technology, it’s perfectly normal to take the previous process and “enhance” it with the new technology – but the previous process still exists.

Once we truly understand the possibilities and capabilities of the new technology, we often supersede the previous process.  In the case of cataloging our information, our models came from books, libraries, index cards, filing cabinets, etc.  Even when we were storing our information in high-speed relational databases, our access models were based on our previous process – catalogs, sorted indexes, grouped information.

Yet Google has shown us that this model is significantly less efficient and of less value than a capable search.

When we’re modeling today’s software abilities, user interfaces and data organization approaches, search should be the first and primary approach.  Categorization is just pushing the old, much less efficient approach forward.
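
To make the search-first idea concrete, here is a minimal sketch in Python of a keyword search over a small asset catalog.  Everything in it is an assumption for illustration: the asset names, the descriptions, and the helper functions (tokenize, build_index, search) are hypothetical and do not come from any particular governance or document management product.  It builds a simple inverted index (word to asset names) and returns the assets that contain every word in the query, with no category tree involved.

```python
import re
from collections import defaultdict

# Hypothetical catalog entries; names and descriptions are invented for illustration.
CATALOG = {
    "CustomerLookup": "SOAP service that returns customer account details by ID",
    "InvoicePdf": "Generates invoice documents as PDF for the billing domain",
    "AccountAudit": "Records audit events for customer account changes",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(catalog):
    """Map each word to the set of asset names whose name or description contains it."""
    index = defaultdict(set)
    for name, description in catalog.items():
        for word in tokenize(name + " " + description):
            index[word].add(name)
    return index


def search(index, query):
    """Return the asset names that contain every word in the query, sorted by name."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


INDEX = build_index(CATALOG)
print(search(INDEX, "customer account"))
```

Note that nobody had to decide up front whether AccountAudit belongs under "Customer", "Audit" or "Compliance"; the words in the description do the work.  A real product would add ranking, stemming and metadata fields, which this sketch leaves out.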