Remembering the Future

Friday, June 16, 2006

Understanding the Future of Data: Data 2.0

I think the future of data lies in creating a virtual database over web services and other sources of data: Data 2.0, if you will. If we had such a virtual database spanning web services, life would be much easier for Web 2.0 application developers.
. . .
Why? Well, you (the Web 2.0 developer) probably don't own all the data you want to use, and the real owner of that data probably isn't going to give you access to their database!
What we need is something to pull other people's data together and make it look like it is ours!
So for the last month or so I've been thinking a lot about pulling data together from all over the web to create a virtual internet database.
What I see as the real key to creating a virtual internet database, or Data 2.0, is upgrading the idea of the foreign key.
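To make the idea a little more concrete, here is a minimal Python sketch of one possible reading of an "upgraded" foreign key: instead of holding a local primary key, the column holds a URI, and a pluggable resolver fetches the referenced record at query time. Everything here is an assumption for illustration: `WebForeignKey`, `join_remote` and `fake_resolve` are hypothetical names, the example.com URIs are placeholders, and a dict of canned responses stands in for real HTTP calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class WebForeignKey:
    """Hypothetical 'upgraded' foreign key: it names the referenced record
    by URI, so the target row can live in someone else's web service."""
    uri: str


# A resolver turns a URI into a record. In practice this might be an HTTP GET
# plus parsing; here it is any callable (an assumption for the sketch).
Resolver = Callable[[str], Dict[str, Any]]


def join_remote(rows: Iterable[Dict[str, Any]], column: str,
                resolve: Resolver) -> Iterator[Dict[str, Any]]:
    """Join local rows to remote records through a WebForeignKey column."""
    cache: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        key: WebForeignKey = row[column]
        if key.uri not in cache:
            cache[key.uri] = resolve(key.uri)  # fetch each distinct URI once
        merged = dict(row)
        # Prefix remote fields with the column name, e.g. "station.name".
        merged.update({f"{column}.{k}": v for k, v in cache[key.uri].items()})
        yield merged


# Canned responses standing in for the web (placeholder URIs, no real service).
FAKE_WEB = {
    "http://example.com/stations/wnyc": {"name": "WNYC", "city": "New York"},
}
calls: List[str] = []


def fake_resolve(uri: str) -> Dict[str, Any]:
    calls.append(uri)  # record each "network" fetch
    return FAKE_WEB[uri]


shows = [
    {"id": 1, "title": "Morning Edition",
     "station": WebForeignKey("http://example.com/stations/wnyc")},
    {"id": 2, "title": "All Things Considered",
     "station": WebForeignKey("http://example.com/stations/wnyc")},
]

for row in join_remote(shows, "station", fake_resolve):
    print(row["title"], "-", row["station.name"], row["station.city"])
```

The per-query cache means two local rows pointing at the same remote record cost only one fetch, which matters once the "table" on the other end of the key is a web service rather than a local index.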

Topics: Database | Data2.0 | Architecture

# posted by Guy : 10:01 AM


Monday, June 05, 2006

The network is the network: public radio on the web

Jon Udell's Weblog

Here's what I like even more. John has correctly observed that, if a bunch of radio stations were to insert location and topic tags into their RSS feeds, it would be trivial for an aggregator to scoop them up. Listeners could then create -- and share -- custom radio programming.
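As a rough illustration of why such an aggregator would be "trivial", here is a minimal Python sketch that scans several RSS feeds and builds a custom programme from items matching chosen tags. It assumes stations tag items with the standard RSS 2.0 `<category>` element and use its `domain` attribute to mark a value as a location or a topic; that tagging convention, the station names and the item titles are all hypothetical, and a real aggregator would fetch the feeds over HTTP rather than read inline strings.

```python
import xml.etree.ElementTree as ET

# Two tiny RSS 2.0 feeds. Using <category domain="location|topic"> is an
# assumed convention for this sketch, not an agreed standard.
FEEDS = [
    """<rss version="2.0"><channel><title>Station A</title>
    <item><title>City council recap</title>
      <category domain="location">Boston</category>
      <category domain="topic">politics</category></item>
    <item><title>Jazz hour</title>
      <category domain="location">Boston</category>
      <category domain="topic">music</category></item>
    </channel></rss>""",
    """<rss version="2.0"><channel><title>Station B</title>
    <item><title>Election preview</title>
      <category domain="location">Seattle</category>
      <category domain="topic">politics</category></item>
    </channel></rss>""",
]


def items(feed_xml):
    """Yield (station, title, tags) for every item in one feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    station = channel.findtext("title")
    for item in channel.iter("item"):
        # One value per domain; a repeated domain keeps the last category.
        tags = {c.get("domain"): c.text for c in item.findall("category")}
        yield station, item.findtext("title"), tags


def custom_program(feeds, **wanted):
    """Collect items across all feeds whose tags match every wanted value."""
    return [
        (station, title)
        for feed in feeds
        for station, title, tags in items(feed)
        if all(tags.get(k) == v for k, v in wanted.items())
    ]


print(custom_program(FEEDS, topic="politics"))
# [('Station A', 'City council recap'), ('Station B', 'Election preview')]
```

Because each listener's "programme" is just a set of tag filters, it is easy to save and share.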

Topics: Web2.0 | NPR

# posted by Guy : 8:22 AM


Friday, June 02, 2006

Data 2.0

To me, Web 2.0 is really Data 2.0 (on the web), where by Data 2.0 I mean a new level of abstraction that sits above our traditional thinking about centralized SQL relational database systems.

The totalitarian authority of a fixed, predefined relational schema as the central pivot around which all data processing revolves may soon be giving way to a more flexible and varied set of data store options. We are already starting to see these in XML, RSS, Atom and RDF data stores, and in the increased use of custom file-based metaphors for storing and caching data.

What has made the relational model so powerful is its grounding in the table/row metaphor, which provided a very intuitive set of constraints that let humans understand a wide variety of problems in a way that also led to an efficient technical solution.

What we are now seeing is that intuitive table metaphor being abstracted away from any one technical solution, so that it can work with a variety of different storage and query solutions. This will also mean that "database/table schemas" become a much more flexible and dynamic concept.
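A minimal Python sketch of what "the table metaphor abstracted from the storage" can look like: two very different stores (CSV text and an XML document) are each reduced to an iterable of rows, and a single query function runs unchanged over either. The function names and sample data are hypothetical, and this is only an illustration of the idea, not how LinQ or any database engine implements it.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]


# Each storage format only has to expose its data as an iterable of rows;
# the query code below never knows where the rows came from.
def rows_from_csv(text: str) -> Iterator[Row]:
    yield from csv.DictReader(io.StringIO(text))


def rows_from_xml(text: str, record_tag: str) -> Iterator[Row]:
    for elem in ET.fromstring(text).iter(record_tag):
        yield {child.tag: (child.text or "") for child in elem}


def query(rows: Iterable[Row], where: Callable[[Row], bool],
          columns: List[str]) -> List[Row]:
    """A tiny 'SELECT columns FROM rows WHERE predicate'."""
    return [{c: r[c] for c in columns} for r in rows if where(r)]


CSV_DATA = "name,city\nAlice,Paris\nBob,Oslo\n"
XML_DATA = "<people><person><name>Carol</name><city>Paris</city></person></people>"


def in_paris(row: Row) -> bool:
    return row["city"] == "Paris"


print(query(rows_from_csv(CSV_DATA), in_paris, ["name"]))          # [{'name': 'Alice'}]
print(query(rows_from_xml(XML_DATA, "person"), in_paris, ["name"]))  # [{'name': 'Carol'}]
```

Adding an RSS, Atom or RDF source would mean writing one more `rows_from_...` adapter; the query side stays the same.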

To me, the most interesting manifestation of this trend is the Microsoft LinQ project. One of the most interesting questions for the future, which Jon Udell was emphasising in this interview with Anders Hejlsberg, is whether these additional capabilities will be delivered by expanding the scope of traditional players like Oracle and SQL Server, or whether they will come from new, unexpected, pluggable storage solutions focused on more "domain specific" problems.

Some of the specific things that I would like to see include:

To be able to accept any data without a schema.
To be able to define multiple schemas for working with the data, after the data is loaded.
These virtual schemas should appear as strongly typed relational tables with a primary key.
It should be possible to apply efficient indexing strategies dynamically, as new schema metadata is applied.
These virtual tables should be available through a declarative, “SQL-like” syntax for run-time definition and querying, whether in a query tool or directly in the syntax of any programming language.
Queries should be able to filter on full-text and relational criteria within the same query, and return results a page at a time in rank order (as Google does).
As data progresses through workflows, it should become a member of an ever-changing set of schemas that match the transformations of the data, with the virtual table definitions controlled by referential integrity.
As the data is updated and extended with additional metadata, the updates should be reflected in all dynamic/virtual projections of the data.
Also, to be able to cache aggregate views based on an “as of” date, representing a sequence of consistent snapshots of the data taken on a cycle (hourly, daily) or triggered by data change events.
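The first few wishes (accept data with no schema, apply several typed schemas with a primary key afterwards, index dynamically, and see updates in every projection) can be sketched in a few lines of Python. This is a toy under stated assumptions: `STORE`, `VirtualTable` and its methods are hypothetical names, the "store" is an in-memory list of dicts, and both the projection and the index are naively recomputed on every call rather than maintained incrementally as a real engine would.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Type

# Schemaless store: records are plain dicts with whatever fields arrived.
STORE: List[Dict[str, Any]] = [
    {"id": "1", "title": "Data 2.0", "words": "420"},
    {"id": "2", "title": "Public radio on the web", "station": "NPR"},
]


@dataclass
class VirtualTable:
    """A schema defined after the data is loaded: typed columns plus a key."""
    name: str
    key: str                  # primary-key column (must be one of `columns`)
    columns: Dict[str, Type]  # column name -> type to coerce raw values to

    def rows(self) -> Dict[Any, Dict[str, Any]]:
        """Project the store through this schema, keyed by primary key.

        Recomputed on every call, so updates to STORE show up in every
        virtual table; a duplicate key keeps the last record seen."""
        out: Dict[Any, Dict[str, Any]] = {}
        for rec in STORE:
            if all(col in rec for col in self.columns):  # skip records that don't fit
                row = {col: typ(rec[col]) for col, typ in self.columns.items()}
                out[row[self.key]] = row
        return out

    def index(self, column: str) -> Dict[Any, List[Any]]:
        """Naive secondary index built on demand: value -> primary keys."""
        idx: Dict[Any, List[Any]] = {}
        for pk, row in self.rows().items():
            idx.setdefault(row[column], []).append(pk)
        return idx


# Two schemas over the same raw records.
articles = VirtualTable("articles", key="id", columns={"id": int, "title": str})
lengths = VirtualTable("lengths", key="id", columns={"id": int, "words": int})

print(articles.rows())
print(lengths.rows())

# Extending a record with new metadata makes it appear in the second schema.
STORE[1]["words"] = "180"
print(sorted(lengths.rows()))
```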
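The wish to mix full-text and relational criteria in one query, with ranked, paged results, can be sketched like this in Python. The ranking here is a deliberately naive term-frequency count, not Google's algorithm or any real search engine's scoring; `tokenize`, `search`, `POSTS` and the page parameters are hypothetical names chosen for the example.

```python
import re
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens; a stand-in for a real full-text analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(rows: Iterable[Row], text_col: str, terms: str,
           where: Callable[[Row], bool], page: int = 1,
           page_size: int = 10) -> List[Row]:
    """One query mixing a relational predicate with full-text terms,
    returning a single page of results in descending rank order."""
    wanted = tokenize(terms)
    scored = []
    for row in rows:
        if not where(row):  # relational criterion
            continue
        words = tokenize(row[text_col])
        score = sum(words.count(t) for t in wanted)  # naive term-frequency rank
        if score > 0:  # full-text criterion: at least one term matches
            scored.append((score, row))
    # sort() is stable, so rows with equal scores keep their input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    start = (page - 1) * page_size
    return [row for _, row in scored[start:start + page_size]]


POSTS = [
    {"id": 1, "year": 2006, "body": "Data 2.0 and the virtual database"},
    {"id": 2, "year": 2006, "body": "Public radio on the web"},
    {"id": 3, "year": 2005, "body": "Data data data"},
    {"id": 4, "year": 2006, "body": "XML, RSS, Atom and RDF data stores for data"},
]


def in_2006(row: Row) -> bool:
    return row["year"] == 2006


for page in (1, 2):
    print(page, [r["id"] for r in search(POSTS, "body", "data", in_2006, page, page_size=1)])
```

Post 3 mentions "data" most often but is dropped by the relational filter, which is the point of evaluating both kinds of criteria in the same query.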
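The last wish, cached aggregate views "as of" a date, can be sketched in Python by replaying an append-only change log up to a cutoff and memoizing the result per date, so a daily cycle yields one consistent, cached snapshot per day. The event data, `totals_as_of` and `daily_snapshots` are hypothetical; the sketch also assumes the log never changes, since `functools.lru_cache` would otherwise serve stale snapshots (a real system would invalidate them on data change events).

```python
from datetime import date, timedelta
from functools import lru_cache
from typing import Dict, Tuple

# Append-only change log: (effective date, station, change in listeners).
# Kept as a tuple here because the cache below assumes it never changes.
EVENTS = (
    (date(2006, 6, 1), "WNYC", 100),
    (date(2006, 6, 2), "KQED", 50),
    (date(2006, 6, 2), "WNYC", 25),
    (date(2006, 6, 5), "KQED", -10),
)

Snapshot = Tuple[Tuple[str, int], ...]


@lru_cache(maxsize=None)
def totals_as_of(as_of: date) -> Snapshot:
    """Aggregate view as of a date; each snapshot is computed only once."""
    totals: Dict[str, int] = {}
    for when, station, delta in EVENTS:
        if when <= as_of:
            totals[station] = totals.get(station, 0) + delta
    # An immutable, sorted tuple, so cached snapshots cannot be altered later.
    return tuple(sorted(totals.items()))


def daily_snapshots(start: date, days: int) -> Dict[date, Snapshot]:
    """Snapshots on a daily cycle: one cached view per as-of date."""
    return {start + timedelta(days=i): totals_as_of(start + timedelta(days=i))
            for i in range(days)}


for day, snap in daily_snapshots(date(2006, 6, 1), 5).items():
    print(day, snap)
```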

Topics: Web2.0 | Representation | LinQ

# posted by Guy : 10:33 AM
