Saturday, December 20, 2003
XML search: We Don’t Know the Answer
set1 = Set('Paragraph') within Set('Introduction')
set2 = set1 including Set('footnote')
set3 = set2 including Set('xref')
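The following example finds the set of paragraphs in the introduction that contain either a cross reference or a footnote but not both (reading + as set union and ^ as set intersection, so the last line is the union minus the intersection).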
set1 = Set('Paragraph') within Set('Introduction')
set2 = set1 including Set('footnote')
set3 = set1 including Set('xref')
set4 = (set2 + set3) - (set2 ^ set3)
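To make the set algebra concrete, here is a minimal runnable sketch that models element sets as plain Python sets of ElementTree elements. The helper names element_set, within, and including, and the sample document, are hypothetical stand-ins for illustration; they are not the API of the commercial product. Note that Python's own set operators differ from the notation above: union is |, intersection is &, and ^ is already symmetric difference.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = """
<Doc>
  <Introduction>
    <Paragraph id="p1">Text <footnote>n</footnote></Paragraph>
    <Paragraph id="p2">Text <xref target="x"/></Paragraph>
    <Paragraph id="p3">Text <footnote>n</footnote> <xref target="x"/></Paragraph>
    <Paragraph id="p4">Plain text</Paragraph>
  </Introduction>
  <Body>
    <Paragraph id="p5">Text <footnote>n</footnote></Paragraph>
  </Body>
</Doc>
"""

root = ET.fromstring(DOC)


def element_set(name):
    """All elements with the given tag name, as a Python set."""
    return set(root.iter(name))


def within(inner, outer):
    """Members of `inner` that are descendants of some member of `outer`."""
    descendants = {d for o in outer for d in o.iter() if d is not o}
    return inner & descendants


def including(outer, inner):
    """Members of `outer` that have some member of `inner` as a descendant."""
    return {o for o in outer
            if any(d in inner for d in o.iter() if d is not o)}


set1 = within(element_set("Paragraph"), element_set("Introduction"))
set2 = including(set1, element_set("footnote"))
set3 = including(set1, element_set("xref"))
# Union minus intersection: a footnote or an xref, but not both.
set4 = (set2 | set3) - (set2 & set3)

ids = sorted(e.get("id") for e in set4)
print(ids)  # ['p1', 'p2']
```

In Python the last step could also be written as set2 ^ set3, since symmetric difference is built in.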
Of course, you can combine element sets with search:
set1 = Set('Title', contains="introduction")
set2 = Set('Title', attribute=("xml:lang", "en-US"))
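As a rough illustration of how a text or attribute filter could feed into the same set algebra, the sketch below builds filtered element sets with ElementTree. The element_set function, its contains and attribute parameters, the case-insensitive matching, and the sample titles are all assumptions made for this example, not the behaviour of any particular product.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document, invented for this sketch.
DOC = """
<Doc>
  <Title xml:lang="en-US">Introduction to Element Sets</Title>
  <Title xml:lang="fr-FR">Introduction aux ensembles</Title>
  <Title xml:lang="en-US">Query Operators</Title>
</Doc>
"""

root = ET.fromstring(DOC)


def element_set(name, contains=None, attribute=None):
    """Elements named `name`, optionally filtered by text or attribute.

    contains:  substring that must appear in the element's text
               (case-insensitive here, which is an assumption).
    attribute: (key, value) pair the element must carry.
    """
    result = set()
    for elem in root.iter(name):
        text = "".join(elem.itertext())
        if contains is not None and contains.lower() not in text.lower():
            continue
        if attribute is not None:
            key, value = attribute
            if elem.get(key) != value:
                continue
        result.add(elem)
    return result


set1 = element_set("Title", contains="introduction")
set2 = element_set("Title", attribute=(XML_LANG, "en-US"))
# Because both results are ordinary sets, they combine like any others.
both = set1 & set2

titles = sorted(e.text for e in both)
print(titles)  # ['Introduction to Element Sets']
```

The point of the design is that search results and structural results are the same kind of object, so a full-text hit can be intersected with, or subtracted from, a structurally defined set without any special casing.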
I think this approach deserves attention because, unlike some of the other approaches below, it has been implemented commercially and has proven to work well and to be very useful for searching XML.
. . . In case it’s not obvious, we haven’t figured out what the right way to search XML is. It’s worse than that, here’s a list of the things that we don’t know:
Whether there’s going to be a lot of XML around in repositories to search. XML these days is more used in interchange rather than archival applications.
Whether the rewards to be found in enhancing search based on XML’s flexible, dynamic structure are great enough to justify the cost of building search systems that can deal with XML’s flexible, dynamic structure.
If there is a lot of XML around to be searched, and if people actually want to make the effort to use the structure to support searching, which kind of approach—minimal like Element sets, SQL-integrated, or the brave new world of XQuery—will prove to be the winner.