My kid plants tomatoes every spring. She waters them, gives them some kind of fertilizer, and makes sure they get plenty of sunshine. In turn, the plants produce dozens of little bite-sized tomatoes for her enjoyment. I’m always amazed at how little effort it takes to get all those tiny tomatoes.
Like tomatoes, taxonomies require attention. If a taxonomy is left to the weeds, it gradually loses its ability to return thorough, relevant search results as new terms enter the lexicon.
One example is the acronym HIX (Health Insurance Exchange), a term that had little prominence until the advent of the Affordable Care Act. Once editors discovered how this new acronym was being used, it was a simple matter to add it to the collection of terms that drive various Health Care searches and thus improve those results.
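As a rough illustration of why adding one term matters, here is a minimal Python sketch of term-based category matching. The term list, the headline and the function names (`build_pattern`, `matches_category`) are all hypothetical and are not drawn from our production system; the point is only that an article using the new acronym goes unmatched until the acronym joins the list.

```python
import re

def build_pattern(terms):
    # Longest terms first so multi-word phrases win over shorter overlaps.
    alternatives = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    # \b keeps "HIX" from matching inside a longer word.
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

def matches_category(text, terms):
    # True if any term in the (hypothetical) category list appears in the text.
    return build_pattern(terms).search(text) is not None

# Hypothetical term list for a Health Care search.
health_care_terms = {"health insurance exchange", "Affordable Care Act"}
headline = "States race to launch HIX portals before enrollment opens"

before = matches_category(headline, health_care_terms)

# An editor notices the new acronym and adds it to the list.
health_care_terms.add("HIX")
after = matches_category(headline, health_care_terms)

print(before, after)
```

The headline is missed on the first check and caught on the second, which is the whole effect of the one-line update.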
Improving the taxonomy over time involves more than just identifying new terms. An important part of the work centers on disambiguation, especially where people's names are involved. Being able to tell the difference between Ferguson, Missouri and Bob Ferguson, the Attorney General of Washington, is critically important to keeping the relevance of search results as high as possible. Similarly, there are two people named Will Smith in the news (one a high-profile actor, the other a recently murdered former New Orleans Saints football player), and a quality taxonomy will be able to tell them apart.
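One simple way to picture disambiguation is to score each candidate entity by how many of its context cues appear in the article. The sketch below assumes a hand-built cue table (`CANDIDATES`) and a hypothetical `disambiguate` function; it illustrates the idea only and does not describe how our system actually resolves names.

```python
import re

# Hypothetical context cues for each entity that shares the surface name.
CANDIDATES = {
    "Ferguson": {
        "Ferguson, Missouri (place)": {"missouri", "louis", "city", "protests"},
        "Bob Ferguson (person)": {"bob", "attorney", "general", "washington", "lawsuit"},
    },
}

def tokenize(text):
    # Lowercased set of alphabetic words in the text.
    return set(re.findall(r"[a-z]+", text.lower()))

def disambiguate(mention, text):
    # Count how many cues for each candidate appear in the article text.
    tokens = tokenize(text)
    scores = {entity: len(cues & tokens) for entity, cues in CANDIDATES[mention].items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # No evidence, or a tie between the top two: leave it for an editor.
    if ranked[0][1] == 0 or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]

print(disambiguate("Ferguson", "Attorney General Bob Ferguson filed a lawsuit in Washington"))
print(disambiguate("Ferguson", "Protests continued in Ferguson, Missouri on Monday"))
```

Returning `None` when the evidence is missing or tied is a deliberate choice in this sketch: an ambiguous mention is better flagged for review than filed under the wrong entity.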
One of the things we pay close attention to is the care and feeding of our taxonomy. We improve, refine and update category and entity lists to avoid as many ambiguity problems as possible. We categorize about 400,000 new articles every day into roughly 4,500 taxonomic categories with a system that filters through and identifies millions of people, places, companies, brands and more. Our goal is to revise 4% of our major (most used) filters every month. At that pace roughly half of them (about 48%) are refreshed each year, which gives our most important resources a half-life of about one year.
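The arithmetic behind the one-year figure can be checked in a few lines. The sketch below compares two assumed revision models, since the post does not say which one applies: a rotation in which no filter is revised twice before the others have had a turn, and a random draw in which each month's 4% is picked independently. The variable names are illustrative.

```python
import math

monthly_rate = 0.04  # share of major filters revised each month

# Model A (assumed): strict rotation, so no filter is revised twice in a cycle.
revised_in_year_rotation = monthly_rate * 12  # about 0.48, close to one half

# Model B (assumed): each month's 4% is drawn at random, so repeats can occur.
untouched_after_year = (1 - monthly_rate) ** 12  # about 0.61
half_life_months = math.log(0.5) / math.log(1 - monthly_rate)  # about 17 months

print(f"Rotation: {revised_in_year_rotation:.0%} revised after 12 months")
print(f"Random:   {untouched_after_year:.0%} untouched after 12 months")
print(f"Random:   half-life of {half_life_months:.1f} months")
```

Under the rotation model the one-year half-life holds almost exactly; under random selection it stretches to roughly seventeen months, so the figure is best read as describing a planned rotation.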
With an effective taxonomy that is constantly reviewed and regularly updated, the search for content runs as efficiently as possible and returns the best results it can.