Friday, January 20, 2006

Entropy and metadata aggregation 

Stefano Mazzocchi
One thing we figured out a while ago is that merging two (or more) datasets with high-quality metadata results in a new dataset with much lower-quality metadata.

Independently, the two datasets are very coherent, and a lot of time, money, and energy was spent on them. Together, even assuming the ontology/schema crosswalks were done so that the owners of the two datasets would agree (which is not a given at all, but let's assume that for now), they look and feel like a total mess (especially when browsing them with a faceted browser like Longwell).

The standard solution in the library/museum world is to map against a higher-order taxonomy, something that brings order to the mix. But either no such thing exists for that particular metadata field, or one exists but was incredibly expensive to make and to maintain. Such taxonomies also have a tendency, almost by definition, to become very hard to displace once you commit to one of them.
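To make the "mess" a bit more tangible, here is a minimal sketch in Python. It assumes that the Shannon entropy of a facet's value distribution is a rough proxy for how messy that facet looks in a faceted browser; that proxy is an assumption made for illustration, not something measured in Longwell. The "creator" values, the `crosswalk` table and the `facet_entropy` helper are all hypothetical.

```python
from collections import Counter
from math import log2


def facet_entropy(values):
    """Shannon entropy (in bits) of the distribution of facet values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())


# Hypothetical "creator" facet from two collections. Each one is
# internally consistent, but they follow different naming conventions.
dataset_a = ["Picasso, Pablo", "Picasso, Pablo", "Monet, Claude", "Monet, Claude"]
dataset_b = ["Pablo Picasso", "Pablo Picasso", "Claude Monet", "Claude Monet"]

# Hypothetical mapping of both conventions onto one shared vocabulary,
# standing in for the higher-order taxonomy.
crosswalk = {
    "Picasso, Pablo": "picasso",
    "Pablo Picasso": "picasso",
    "Monet, Claude": "monet",
    "Claude Monet": "monet",
}

merged = dataset_a + dataset_b
mapped = [crosswalk[value] for value in merged]

entropy_a = facet_entropy(dataset_a)     # two equally likely values
entropy_b = facet_entropy(dataset_b)     # two equally likely values
entropy_merged = facet_entropy(merged)   # four labels for the same two people
entropy_mapped = facet_entropy(mapped)   # back to two values after mapping

print(entropy_a, entropy_b, entropy_merged, entropy_mapped)
```

In this toy case each dataset alone has 1 bit of entropy, the naive merge has 2 bits, and the mapped merge is back to 1 bit. The mapping table is exactly the expensive part: somebody has to build and maintain it. Note also that entropy rises when genuinely new content is merged in, so the measure only says something about mess when the extra values are different labels for the same things.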

I find this discovery a little ironic: the semantic web, by adding more structure to the model while increasing the diversity in the ontological space, might become even *more* messy than the current web, not less.

So, are we doomed to turn the web into a Babel of languages? And are we doomed to dilute the quality of pure data islands simply by mixing them together?

Luckily, no, not really.

What's missing is the feedback loop: a way for people to inject information back into the system that keeps the system stable. Mixing high-quality metadata results in lower-quality metadata, but the individual qualities are not lost; they are just diluted. Additional information/energy needs to be injected into the system for the quality to return to its previous level (or higher!). This energy can be the one already condensed in the efforts made to create controlled vocabularies and mapping services, or it can be distributed across a bunch of people united by common goals/interests and social practices that keep the system stable, trustworthy and socio-economically feasible.

Topics: Meaning | Entropy | Metadata | RDF

