Saturday, June 12, 2004
Don't get me wrong, I do think there is a role for metadata. It's great for record-keeping purposes. Say I want to find all articles authored by a certain person, or created in a particular year. In these circumstances, accurate metadata is essential.
Such search activities are measured in the information retrieval community by the recall metric: of the entire set of relevant documents, the fraction that has been retrieved once some given number of documents overall has been retrieved.
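To make the recall definition concrete, here is a minimal sketch in Python. The function name `recall_at_k` and the sample document IDs are made up for illustration; the calculation itself is just the standard definition described above.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents found in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        # With no relevant documents the ratio would divide by zero.
        raise ValueError("recall is undefined when nothing is relevant")
    retrieved = set(ranked_ids[:k])
    return len(retrieved & relevant) / len(relevant)


# Hypothetical ranking returned by a search, best match first.
ranking = ["a", "b", "c", "d"]
# Hypothetical set of documents a human judged relevant.
relevant = {"a", "d", "e"}

recall_top2 = recall_at_k(ranking, relevant, 2)  # only "a" found so far
recall_top4 = recall_at_k(ranking, relevant, 4)  # "a" and "d" found
```

Note that recall can only stay the same or rise as more documents are retrieved, which is why it is always quoted at some cutoff.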
In a modern context, however, assuming the existence of a content management system such as Sytadel, this activity is much better left to the CMS, both in assigning the metadata accurately and in carrying out the retrieval (which is typically just a straightforward database query).
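As a rough illustration of how straightforward such a query is, here is a sketch using Python's built-in `sqlite3` module. The `articles` table, its columns, the `find_articles` helper and the sample rows are all hypothetical; this is not how Sytadel or any particular CMS stores its metadata.

```python
import sqlite3


def find_articles(conn, author=None, year=None):
    """Return titles matching the given metadata fields exactly."""
    clauses, params = [], []
    if author is not None:
        clauses.append("author = ?")
        params.append(author)
    if year is not None:
        clauses.append("year = ?")
        params.append(year)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT title FROM articles" + where + " ORDER BY title"
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical schema and made-up rows, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("Annual report", "A. Author", 2003),
        ("Budget notes", "B. Writer", 2004),
        ("Style guide", "A. Author", 2004),
    ],
)

by_author = find_articles(conn, author="A. Author")
by_author_and_year = find_articles(conn, author="A. Author", year=2004)
```

Because the CMS assigned the fields itself, an exact-match lookup like this returns every matching record, which is the record-keeping case where metadata earns its keep.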
But in terms of improving your users' ability to search for documents in the way people expect to search these days (that is, by issuing two or three query terms to the search engine and getting back a relevant set of results in a couple of seconds), metadata is completely irrelevant (excuse the pun).
For more reading on this issue, see Cory Doctorow's delightful article Metacrap - Putting the torch to seven straw men of the meta-utopia.
For more reading on the fundamentals of search, see Tim Bray's series On Search. (Tim takes the broad view of metadata, not the narrow schema-based view I discuss here. I also think he ascribes too much weight to Google's PageRank value as a significant component of Google's result ranking algorithm, but that's another story.)
I'm indebted to David Hawking for discussions over several years on the subject of metadata and search. I'm also looking forward to an upcoming study from him and Justin Zobel that he mentioned to me yesterday, which sets about objectively measuring the effectiveness of metadata-based search versus non-metadata-based search in an enterprise setting with extensive metadata.