Is your Taxonomy a Map or a Maze?

By Daniel O'Connor

As per usual, inspiration struck me while I was biking through my neighbourhood one weekend. Before you assume incorrectly, I am not an avid cyclist, nor an athlete of any note. It is a form of masochistic torture I endure to avoid feeling my actual age while making myself feel way beyond my actual age.

The inspiration was this: My neighbourhood is right next to the worst maze of streets I've ever encountered. To understand this, let me set the stage. I left my house and turned left on the first major street. I then turned left on the next major street. A mile later I turned left on the next major street. Seems simple, doesn't it? I could concentrate on breathing, staying straight, monitoring my heart rate, and cursing myself for being on the bike again.

Then I decided to take the tunnel under the highway back to my house. I could avoid having to cross four lanes of traffic that tends to do 20 miles per hour over the speed limit, and it seemed like a logical shortcut. This involved riding into a neighbourhood that is a series of cul-de-sacs and dead ends. I turned left into the neighbourhood, which was my first mistake. I then turned right. And then left. Another left. A right. Down the hill, and to the right again. Left. Right. Completely lost. And by blind luck I found the road to the tunnel. There was no sense to the layout of the roads in this neighbourhood. There were three unconnected roads with the same name. The road I could have followed all the way to the tunnel had two breaks in it that I had to navigate around. It was truly a maze of streets.

Imagine these two different sets of roads are your taxonomy. I realize that most site navigation specialists will state, "Nobody uses taxonomy anyway... they all use search." This is arguable, but not at all my point. This is about data collection, and how most data collection starts from a point of failure: placing an item correctly. How many times have you browsed a Web site to see a boot rack in the middle of a bath towel experience, an Easy-Bake oven in the appliances section, or other obviously miscategorised items? How many times didn't you see the miscategorised items?

Imagine you are attempting to place an item into your taxonomy. Which would you rather have: a map, or a maze? This should be one of the first concerns when setting up a taxonomy, especially if you have a dedicated data collection taxonomy. If an item cannot be easily placed in the correct node in your taxonomy, every data quality element that comes afterwards is suspect. How can your data inputters possibly give correct responses to your data questions if they are answering for attributes that were never designed to be asked about that item? Correct item placement is the most important starting point in data collection, and often the most overlooked.

In my years I have seen many methodologies to solve this problem. Most involve remediating the issue after it occurs, which readers of my posts know is my least favourite response. Scraping your Web site to find items that are incorrectly classified, or waiting for customers to point them out, bleeds confidence in your Web site. It's unprofessional, and your Web site experience suffers.

I've also seen process attempted as a solution for this. Having a human with vague knowledge of a taxonomy layout make decisions to influence others with vague knowledge of that taxonomy is, for lack of a better description, silly. The academic world and the non-retail world have been using automated classification techniques for years with varying levels of sophistication. At Taxonomy Boot Camp last year the non-retail environments weren't talking about how to do automated classification: they had already completed it and were on to the next level. Retail is still playing catch-up.

Automated systems are not simple, nor are they maintenance-free. Retail taxonomies are fluid, and therefore automation systems require similar maintenance. This is an investment in data quality, not a one-time endeavour. For smaller taxonomies, taxonomies that aren't dedicated to data collection, or cases where the maintenance costs are too great, this may not be a palatable solution.

This is where taxonomy development becomes the solution. Simply put, if your taxonomy is a maze, it will be difficult for your data inputters to put their items in the correct location. What does a taxonomy maze look like? There is no single silver-bullet answer to this. It involves many different factors. Here is the short list:

  • The top-level nodes are not mutually exclusive. Ambiguous top-level nodes lead to confusion.
  • The naming of nodes is not in common language. Calling a sledgehammer a "Macro Adjuster" doesn't describe the items you expect in that node.
  • There are no definitions or synonyms documented and available for each node, parent or child.
  • There are multiple paths to set up a single item.
  • There are "miscellaneous" categories, more commonly known as dumping grounds.

All of these factors lead to items being set up in the incorrect node. Conversely, a taxonomy map has the exact opposite traits. Its nodes are mutually exclusive, it has documentation available, and it uses common terms. The ambiguity is limited. (Let's face it... Ambiguity exists in almost every taxonomy. If they were perfect we'd never have to maintain them.) Item classification is simplified, and therefore there are fewer points of failure in your item setup process. Data quality improves just by having items in the node with the correct attribution to describe them.
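For readers who like to see things in code, here is a minimal sketch of how a few of the maze traits listed above could be flagged automatically. Everything in it is an assumption made for illustration: the `Node` shape, the `DUMPING_GROUND_WORDS` list, and the `find_maze_traits` function are hypothetical and not taken from any particular taxonomy tool. It only catches the mechanical traits (missing definitions, dumping grounds, and repeated node names that hint at multiple paths). Whether top-level nodes are mutually exclusive, or whether a name is common language, still needs a human.

```python
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical node shape, assumed for this sketch only.
@dataclass
class Node:
    name: str
    definition: str = ""
    synonyms: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


# Assumed list of words that suggest a "miscellaneous" dumping ground.
DUMPING_GROUND_WORDS = {"misc", "miscellaneous", "other", "general"}


def walk(node, path=()):
    """Yield (path, node) for every node in the tree, parents first."""
    here = path + (node.name,)
    yield here, node
    for child in node.children:
        yield from walk(child, here)


def find_maze_traits(root):
    """Return human-readable warnings for the traits we can check mechanically."""
    warnings = []
    nodes = list(walk(root))
    for path, node in nodes:
        label = " > ".join(path)
        if not node.definition:
            warnings.append(f"no definition: {label}")
        if any(word in DUMPING_GROUND_WORDS for word in node.name.lower().split()):
            warnings.append(f"dumping ground: {label}")
    # The same node name in several places hints at multiple paths for one item.
    counts = Counter(node.name.lower() for _, node in nodes)
    for name, n in counts.items():
        if n > 1:
            warnings.append(f"name used {n} times: {name}")
    return warnings
```

A check like this is not a substitute for taxonomy development. At best it is a cheap way to find where the maze is thickest before a person goes in to straighten the roads.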

And here is the most important part: The people who classify items in your taxonomy generally think the same way your customers do. If you only have display taxonomies, having a map taxonomy is more intuitive for those guests who, believe it or not, still use navigation to filter to items. (I'm one of those people... we do exist.) A more natural map-like taxonomy, where the path to one node is guided by the path from the previous one, can only improve site experience.

Even if your business has a data collection taxonomy, the fundamentals of data quality should always include improving your taxonomy towards meeting these goals. Neglecting item placement is another way to let data quality issues occur, and the rework involved in resolving these kinds of issues is expensive and time-consuming. Starting items in the correct classification is paramount to having good controls over your data quality.

Once again, this isn't an entire solution. This is one piece of the puzzle for both taxonomy development and data quality. There are dozens, and possibly hundreds, of other ways you can influence your data quality that may be cheaper and provide results in a shorter time frame. However, neglecting to understand that your taxonomies are assets to be maintained instead of costs to be incurred, and that your taxonomies are the starting point for all item data, is another way to fail at data quality.

Finally, sometimes the shortcut isn't as short as you think it is.

Dan O'Connor is a business process manager for data solutions at Target Corporation and has worked on retail taxonomies for multiple Fortune 500 retailers.