Tuesday, May 19, 2009

Big Data and the problem of metadata 

Jon via Michael E. Driscoll
The meta-data problem is enormous. Say you’ve got a field called “diagnosis date” for some disease. It’s in a database with a date type, so there’s no format issue. What exactly does that date mean? First appointment with a family doctor? First appointment with a specialist? Is it self-reported? Has the date always meant the same thing, or did the meaning of the field change over time as personnel changed? Those are all problems with interpreting data that is all sitting in one institution’s private database. It’s hard to “integrate” data that is supposedly already integrated.


