
Saturday, December 20, 2003

XML search: We Don’t Know the Answer 

via Tim Bray:

To return to maybe the most interesting question: given that XML text contains a rich, nested, sequenced, labeled structure, how do we use that to add value to search? . . .

The following example finds the set of all paragraphs in the introduction that contain one or more footnotes and also one or more cross references.

set1 = Set('Paragraph') within Set('Introduction')
set2 = set1 including Set('footnote')
set3 = set2 including Set('xref')

The following example finds the set of paragraphs in the introduction that contain either a cross reference or a footnote but not both.

set1 = Set('Paragraph') within Set('Introduction')
set2 = set1 including Set('footnote')
set3 = set1 including Set('xref')
set4 = (set2 + set3) - (set2 ^ set3)
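
To make the set algebra concrete, here is a rough sketch of the same two queries in plain Python with the standard library's ElementTree. It is not the commercial implementation referred to here; the sample document and the helper names `within`, `including`, and `ids` are invented for illustration. It also assumes that `+` means union and `^` means intersection in the notation above, so the sketch spells them as Python's `|` and `&` (in Python itself, `^` on sets is symmetric difference).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this sketch.
DOC = """
<Doc>
  <Introduction>
    <Paragraph id="p1">Text <footnote>n1</footnote> and <xref target="a"/></Paragraph>
    <Paragraph id="p2">Only a <footnote>n2</footnote></Paragraph>
    <Paragraph id="p3">Only an <xref target="b"/></Paragraph>
    <Paragraph id="p4">Plain text</Paragraph>
  </Introduction>
  <Body>
    <Paragraph id="p5">Outside <footnote>n3</footnote></Paragraph>
  </Body>
</Doc>
"""

root = ET.fromstring(DOC)


def within(name, container):
    """Elements named `name` that are descendants of a `container` element."""
    return {el for c in root.iter(container) for el in c.iter(name) if el is not c}


def including(elements, name):
    """Members of `elements` that have at least one descendant named `name`."""
    return {el for el in elements if any(d is not el for d in el.iter(name))}


def ids(elements):
    """Sorted id attributes, just to make the results easy to read."""
    return sorted(el.get("id") for el in elements)


paragraphs = within("Paragraph", "Introduction")
with_footnote = including(paragraphs, "footnote")
with_xref = including(paragraphs, "xref")

# First query: footnote AND cross reference.
both = with_footnote & with_xref

# Second query: union minus intersection, i.e. one or the other but not both.
either_not_both = (with_footnote | with_xref) - (with_footnote & with_xref)

print(ids(both))
print(ids(either_not_both))
```

The paragraph under `Body` never enters the result because it is filtered out by `within` before any of the set operations run.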

Of course, you can combine element sets with search:

set1 = Set('Title', contains="introduction")
set2 = Set('Title', attribute=("xml:lang", "en-US"))
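
As a rough illustration of mixing element sets with search, the sketch below mimics the two calls above using Python's ElementTree. The function `element_set`, the helper `qname`, and the sample titles are all hypothetical, and treating `contains` as a case-insensitive substring match is my assumption; the original system may well define it differently.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical sample document, made up for this sketch.
DOC = """
<Doc>
  <Title xml:lang="en-US">Introduction to XML Search</Title>
  <Title xml:lang="fr">Introduction</Title>
  <Title xml:lang="en-US">Conclusions</Title>
</Doc>
"""

root = ET.fromstring(DOC)


def qname(name):
    """Translate an 'xml:'-prefixed attribute name into ElementTree's form."""
    if name.startswith("xml:"):
        return "{%s}%s" % (XML_NS, name[4:])
    return name


def element_set(name, contains=None, attribute=None):
    """Elements named `name`, optionally filtered by text and by attribute."""
    result = []
    for el in root.iter(name):
        text = "".join(el.itertext())
        # Assumption: `contains` is a case-insensitive substring test.
        if contains is not None and contains.lower() not in text.lower():
            continue
        if attribute is not None:
            key, value = attribute
            if el.get(qname(key)) != value:
                continue
        result.append(el)
    return result


set1 = element_set("Title", contains="introduction")
set2 = element_set("Title", attribute=("xml:lang", "en-US"))

print([el.text for el in set1])
print([el.text for el in set2])
```

Because both filters return ordinary collections of elements, their results can be fed straight back into set operations like the ones in the earlier examples.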

I think this approach is worth considering because, unlike some of the other approaches below, it has been commercially implemented and has proven to work well for searching XML.

. . . In case it’s not obvious, we haven’t figured out what the right way to search XML is. It’s worse than that, here’s a list of the things that we don’t know:

