Friday, June 02, 2006
To me, Web 2.0 is really Data 2.0 (on the web). By Data 2.0 I mean a new level of abstraction that sits above our traditional thinking about centralized SQL relational database systems. The totalitarian authority of a fixed, predefined relational schema as the central pivot around which all data processing revolves may soon give way to a more flexible and variable set of data store options. We are already starting to see this with XML, RSS, Atom and RDF data stores, and with the increased use of custom file-based metaphors for storing and caching data.

What has made the relational model so powerful is its grounding in the table/row metaphor, which provided a very intuitive set of constraints that allowed humans to understand a wide variety of problems in a way that also provided an efficient technical solution. What we are now seeing is that intuitive table metaphor being abstracted from a specific technical solution, so that it can work with a variety of different storage and query solutions. It will also mean that "database/table schemas" become a much more flexible and dynamic concept.

To me, the most interesting manifestation of this trend is the Microsoft LINQ project. I think one of the most interesting questions for the future, which Jon Udell was emphasising in this interview with Anders Hejlsberg, is the extent to which these additional capabilities will be delivered by expanding the scope of traditional players like Oracle and SQL Server, or whether they will instead arrive as new, unexpected pluggable storage solutions that are more focused and "domain specific".

Some of the specific things that I would like to see include:
- To be able to accept any data without a schema.
- To be able to define multiple schemas for working with the data, after the data is loaded.
- These virtual schemas should appear as strongly typed relational tables with a primary key.
- It should be possible to dynamically apply efficient indexing strategies along with the dynamic application of new schema metadata.
- These virtual tables should be available through a declarative “SQL like” syntax, for run time definition and querying, in a query tool or directly in the syntax of any programming language.
- Queries can be filtered using full-text and relational criteria within the same query, with results returned a page at a time in rank order (as Google does).
- As data progresses through workflows, it should become a member of an ever-changing set of schemas that match the transformation of the data, where the virtual table definitions are controlled by referential integrity.
- As the data is updated and extended through additional metadata, the updates are reflected in all dynamic/virtual projections of the data.
- Also, to be able to cache aggregate views based on an “as of date”, which represent a sequence of consistent snapshots of the data, based on a cycle (hourly, daily), or based on data change events.
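To make the first few wishes above a little more concrete, here is a minimal Python sketch of "load first, define schemas later". It is only an illustration under my own assumptions: the names `SchemalessStore` and `VirtualSchema` are hypothetical, nothing here is LINQ or any real product's API, and indexing, full-text ranking and snapshots are left out entirely.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List


@dataclass(frozen=True)
class VirtualSchema:
    """Hypothetical typed view: a name, a primary key, and per-column casts."""
    name: str
    primary_key: str
    columns: Dict[str, Callable[[Any], Any]]  # column name -> type coercion


class SchemalessStore:
    """Hypothetical store that accepts any record and applies schemas later."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []
        self._schemas: Dict[str, VirtualSchema] = {}

    def load(self, record: Dict[str, Any]) -> None:
        # Accept any data: no schema is checked at load time.
        self._records.append(dict(record))

    def define_schema(self, schema: VirtualSchema) -> None:
        # Schemas can be added at any time, after the data is loaded.
        if schema.primary_key not in schema.columns:
            raise ValueError("primary key must be one of the columns")
        self._schemas[schema.name] = schema

    def rows(self, schema_name: str) -> Iterator[Dict[str, Any]]:
        # Project the raw records through the schema on every call, so
        # later loads show up in every virtual table automatically.
        schema = self._schemas[schema_name]
        seen_keys = set()
        for record in self._records:
            if not all(col in record for col in schema.columns):
                continue  # record does not belong to this virtual table
            try:
                row = {col: cast(record[col])
                       for col, cast in schema.columns.items()}
            except (TypeError, ValueError):
                continue  # value cannot be coerced to the declared type
            key = row[schema.primary_key]
            if key in seen_keys:
                continue  # enforce primary key uniqueness (first one wins)
            seen_keys.add(key)
            yield row


store = SchemalessStore()
store.load({"id": "1", "title": "Data 2.0", "tags": "web"})
store.load({"id": "2", "title": "LINQ", "rating": "4.5"})
store.load({"note": "no id here"})

# Two different virtual tables over the same untyped data.
store.define_schema(VirtualSchema("posts", "id", {"id": int, "title": str}))
store.define_schema(VirtualSchema("rated", "id", {"id": int, "rating": float}))

posts = list(store.rows("posts"))
rated = list(store.rows("rated"))
```

Because each virtual table is recomputed from the raw records when it is queried, a record loaded later appears in every projection it qualifies for. A real system would obviously need indexes rather than a full scan, which is exactly the "dynamically apply efficient indexing strategies" item in the list.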