
Saturday, December 20, 2003

Responses to Topic Map/RDF comment 12/18 

Dec 18 2003

Rik Morikaw

A basic difference between RDF and TMs is their models, i.e.

RDF: subject (resource) - predicate (property) - object (value)
TM: topic - association - topic

So in many ways you are mixing two very different semantic standards, especially when considering capabilities (topic maps provide a richer semantic environment). If the purpose of your db is to provide higher-level semantics and/or represent knowledge, then I think you would note some significant differences between the two. However, at the simpler "object level" (a term used by M. Lacher and S. Decker), you would find some similarities, and thus be able to represent RDF triples in a TM (the object level is where identities are assigned to objects, and where binary relationships can be described in graphical terms).

Therefore, if your db is of the simpler kind, you could simply implement triples in the relational db, as was suggested previously. If you want to model your db with a TM, then your interpretation of the RM might be: Subject as an x-node, Predicate as a t-type, Object as an x-node. Note that the t-type gives you an assertion type.

Beyond lower-level applications of RDF and TMs, however, the two can quickly diverge. A telling factor can be seen in their graphical representations: RDF is represented by directed graphs, while TMs are non-directional and have been described using hypergraphs (i.e. greater complexities that must be mathematically modeled). A consideration for your db is who your audience is and the level of sophistication (i.e. semantics) needed. If your db and application are to be interoperable with other external dbs, then you might want to see what semantic standard they are using. As of today there seem to be too few ways to accomplish interoperability between RDF and TMs.
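[Editor's aside] As a point of reference, a minimal sketch of the "implement triples in the relational db" option mentioned above, using Python's sqlite3 module. The single-table layout and the example subjects and predicates are invented for illustration; they are not from the discussion.

# A minimal sketch of a reified triple store in a relational db:
# one generic table, one row per statement. Names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE triple (
           subject   TEXT NOT NULL,
           predicate TEXT NOT NULL,
           object    TEXT NOT NULL
       )"""
)

# Each statement becomes one row; no schema is forced on the data.
conn.executemany(
    "INSERT INTO triple VALUES (?, ?, ?)",
    [
        ("#puccini", "#born-in", "#lucca"),
        ("#lucca", "#located-in", "#italy"),
    ],
)

# Querying is plain SQL over the single table.
for row in conn.execute(
    "SELECT object FROM triple WHERE subject = ? AND predicate = ?",
    ("#puccini", "#born-in"),
):
    print(row[0])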
Jan Algermissen

Jason Cupp wrote:

> The RM has no mention of Data Model things like occurrences, scope, names, variants, etc... only assertions and sub-graphs of the assertion graph that are targets for reification in the n-ary relationship model: t-topics, c-topics, r-topics, x-topics, a-topics.

The RM operates at a lower level than the SAM. Since occurrences, names, variants etc. are essentially all just relationships between subjects, assertions are sufficient to represent all of them. The Standard Application Model (SAM) is then simply the appropriate set of assertion types and properties. The goal of the RM is to enable the definition of such 'sets of semantics' as the SAM (in the current RM version, such a definition is called a 'Disclosure').

> But if I'm implementing the DM using the new RM, do I have to do the following things?

In general, the RM does in no way constrain an application; the only thing you have to do is to document (and thus make explicit) how your application maps to the RM. For example, the a-sidp property might well be hidden in a certain method on objects of a certain class.

> 1) reify every name and occurrence resourceData, so they become x-topics in assertions relating them to their topics. But reifying literal strings gives me a new reification operator "subjectLiteral", along with subjectLocator and subjectIndicator?

First let me say that it is really great that you understood (or are very close to understanding) the essence of the RM - believe me, I know that it takes some effort to make the necessary mind-shift. To answer your question: note that 'reification' has a different meaning in the SAM world than in the RM world. In the RM understanding, 'to reify' means to provide a surrogate (a topic, in the RM sense) for a subject. So the question of whether you want to reify certain subjects is a matter of your needs: if you want to make statements about certain subjects, you need to reify them; otherwise not. There are a lot of implications behind such a decision, especially concerning the power the Application (the Disclosure) you define will have. If you want, I'll go into details on this - please let me know.

A remark: in my personal opinion the current SAM draft really falls short of the power it could have, because certain subjects are NOT reified (in the RM sense!), which in turn limits the degree of, hmmm, 'information interconnection'. OTOH, one of the purposes of the RM is to provide a means for investigating and discussing these decisions and their implications, so people have a sound basis to judge the quality of an Application ('Disclosure'). Bottom line: you can pretty much do what you like (you can also make everything - occurrences, names, etc. - a property), but you need to document what you do, thus maximizing the interchangeability of your topic maps.

> 2) does DM scope then become the total of all the other assertions (those not included in the DM) made about these other assertions?

Here are two completely different, yet possible, ways to do scope in terms of the RM:

1) Scope as a property, consisting of a set of topics

- define a property in your Application/Disclosure that has the value type 'set of topics'. Let's call this property 'ScopingTopics'. It is an OP (other property) because it does not serve as a basis for merging.
- in your syntax processing model, define how, for example, XTM's <scope> element and its children are to be processed, and that the set of topics that make up the scope is to be assigned as the value of the 'ScopingTopics' property of the topics that surrogate relationships.

Think of the result as t_a --ScopingTopics--> {t1,t2,t3}

Major advantage of this approach: ease of implementation.
Major disadvantage: the scoping set is not reified, so you do not get the possibility to directly see all scoped relationships from a given scope (set of topics). This can only be added by a (proprietary!) indexing mechanism of a certain piece of software.

2) Scope as an assertion between relationships and a subject that is a set of topics

- define an assertion type 'at-association-scope' and two roles, 'role-association' and 'role-scope'
- define a property 'SetMembers' that provides the identity for topics that surrogate sets of topics. It is an SIDP, because topics with the same value for 'SetMembers' must merge (topics that represent the same set represent the same subject and therefore must merge).
- have your processing model define the correct way to construct the 'SetMembers' property and the required assertion(s).

Your result will look like this:
            at-association-scope
                    |
 role-association   |   role-scope
             |      |      |
             |      |      |
    t_a------C------A------C------x_s[SetMembers={t1,t2,t3}]

(where t_a is, as above, the topic that represents a given relationship and x_s is the topic that represents the set of topics that is the scope).

Major advantage: all the other relationships that x_s is a scope of will be connected to x_s in the same way, so you can directly see from x_s all the relationships it scopes (e.g. to answer the question "What are all occurrences in scope {t1,t2,t3}?"). With this approach the information is directly available; no proprietary indexing mechanisms are needed (meaning that no matter what SAM-conformant software you use, you'll always have that information, likely also in the API with some scope.getAllScopedAssociations() method).
Major disadvantage: the Topic Map (in store) increases in size.

But again, this is a matter of taste and of the desired power of your Application.

> Just curious... jason

You are welcome! Do you plan to implement something RM-based? I'd be happy to give you any information you need.
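[Editor's aside] To make Jan's two scope options concrete, here is a rough sketch in plain Python. The dictionaries and identifiers are invented for illustration and are not part of the RM, the SAM, or any Topic Maps API.

# 1) Scope as an (unreified) property on the topic that surrogates the
#    relationship: easy to implement, but finding "everything scoped by
#    {t1,t2,t3}" means scanning or building a private index.
associations = {
    "t_a": {"ScopingTopics": frozenset({"t1", "t2", "t3"})},
    "t_b": {"ScopingTopics": frozenset({"t1", "t2", "t3"})},
}
scoped_by_scan = [a for a, props in associations.items()
                  if props["ScopingTopics"] == frozenset({"t1", "t2", "t3"})]

# 2) Scope as an assertion against a reified set-of-topics subject: the
#    set gets its own surrogate topic (x_s) whose identity is its members
#    (an SIDP, so equal sets merge), and at-association-scope assertions
#    connect it to everything it scopes. The reverse lookup is then just
#    the assertions attached to x_s.
set_topics = {"x_s": frozenset({"t1", "t2", "t3"})}  # SetMembers (SIDP)
assertions = [
    ("at-association-scope", {"role-association": "t_a", "role-scope": "x_s"}),
    ("at-association-scope", {"role-association": "t_b", "role-scope": "x_s"}),
]
scoped_by_x_s = [roles["role-association"] for atype, roles in assertions
                 if atype == "at-association-scope" and roles["role-scope"] == "x_s"]

print(scoped_by_scan, scoped_by_x_s)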
Thompson, Bryan

Guy and I have discussed this also. I think of this as de-normalizing the RDBMS schema for the reified triple store. Or perhaps re-normalizing it. In any case, it seems to me that this transform (of multiple rows of reified triples/arcs into a single row of a schema-specific table) can only be a post-merge operation. Prior to merging, it seems that the data needs to stay in the "normalized" form of the reified triple store model so that you can correctly recognize the various merging pre-conditions without undue overhead and complexity. Also, once you transform the reified triples into the schema-specific table, you sort of lose the subject of the individual assertions that are being flattened into a single row of that schema-specific table. I am curious whether this matches your experience. If so, then I can see how this transform can provide very fast query mechanisms whenever there is some schema (and I am reading "schema" loosely here, as some constraint on patterns of reified triples, so association templates and RDF Schema would both qualify). However, it will not help scaling for pre-merge operations. Unless you assume that all of your data is governed by some set of schemas that is known in advance?

Guy Lukes

I think that this is the key to building practical applications. Unless the data is being controlled by some kind of structural constraints, you are limited to processing the equivalent of "reified triples". By grinding everything down to "amino acids", to use a biological metaphor, it seems like it could be a lot of work to build everything back up into an appropriate output structure ("protein"). In a relational database this usually means something equivalent to numerous self-joins. The other option is to use the equivalent of "schema-specific tables and indexes" to cache higher-order structure. These structural commitments, however, limit your flexibility and can be difficult to keep in sync. I am also interested to hear what practical experiences people have had in dealing with this problem.
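[Editor's aside] A rough sqlite sketch of the trade-off Bryan and Guy describe, with invented table and column names: the generic reified-triple table needs one self-join per extra hop in a pattern, while a schema-specific table caches the assembled rows (as a post-merge step, per Bryan's caveat).

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT);
    INSERT INTO triple VALUES ('#puccini', '#born-in',    '#lucca');
    INSERT INTO triple VALUES ('#lucca',   '#located-in', '#italy');

    -- "Schema-specific" cache of the same pattern, one row per match.
    CREATE TABLE person_birthplace (person TEXT, city TEXT, country TEXT);
    """
)

# Generic store: every extra hop in the pattern is another self-join.
rows = conn.execute(
    """
    SELECT a.subject, a.object, b.object
      FROM triple a JOIN triple b ON a.object = b.subject
     WHERE a.predicate = '#born-in' AND b.predicate = '#located-in'
    """
).fetchall()

# Post-merge, the result can be flattened into the specific table,
# trading flexibility for cheap reads.
conn.executemany("INSERT INTO person_birthplace VALUES (?, ?, ?)", rows)
print(conn.execute("SELECT * FROM person_birthplace").fetchall())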
Thomas B. Passin

> So in many ways you are mixing two very different semantic standards; especially when considering capabilities (i.e. topic maps provide a richer semantic environment).

Wait, not so fast here! Let us limit ourselves, temporarily, to binary associations, so as to make the closest possible comparison to RDF, and see wherein they are all that different semantically.

RDF: resource - represents an element of the world of discourse (WoD).
TM: topic - represents (in a computer) an element of the WoD.
RDF: predicate - relates two elements of the WoD.
TM: association (binary) - relates two elements of the WoD.
RDF: explicit roles - subject, object.
TM: explicit roles - whatever you decide.
RDF: implicit roles - built into the definition of the predicates.
TM: implicit roles - not normally used, but one could use a generic role type.
TM: subject identity - possible to define fairly well within TM.
RDF: subject identity - not always easy to define within RDF.
TM: scope - a native concept (but not clearly and unambiguously defined).
RDF: scope - not a native concept.
TM: statement identity - can be established by giving the association an id.
RDF: no such notion; RDF "reification" is not really equivalent.

To the extent that an occurrence can be considered to be a convention for an association, occurrences are covered by the above. So I would say that there is not all that much difference between RDF and TM, as long as we stick to binary associations. And in RDF, a resource that serves as a hub can act very much like a topic map association that has more than two topics attached.

What TM offers, IMHO, is a pattern where certain structures are pre-defined. They happen to be convenient structures (I mean for the type of modeling that we are likely to be interested in with TM), but you can pretty much build them in RDF if you want to. In RDF, though, it would be a matter of convention, and so an ordinary RDF processor would not be able to understand and use those patterns to good advantage. This gives TM a lot of power for topic-map-like tasks. But if you want to accumulate a lot of atomic facts, RDF is simpler. If you want to take advantage of OWL's ability to constrain classes, RDF+OWL is (currently) more powerful. If you need to do a lot of complex type-inferencing, OWL processors will give you a lot of power.

> If the purpose for your db is to provide higher level semantics and/or represent knowledge, then I think you would note some significant differences between the two.

If you want "higher level semantics", then the semantics of RDF - and let's extend it with RDFS and OWL - are much better specified than for topic maps, except for subject identity.

Murray Altheim

I don't mean to be snooty about it, nor does the world wait for my opinion on this, but I'm *really* tired of RDF and Topic Map comparisons. They are apples and oranges. There have probably been many hundreds of messages on the subject, and most of them compare structures. Okay. So there are some similar structures. But they are designed for such different purposes, operate at different levels, etc., that comparisons really are like those between hammers and screwdrivers. Both are functional, but not typically interchangeable unless you like hammering in screws or poking holes with a screwdriver to push in your nails. For instance, the whole idea that OWL is more powerful is again an apples-and-oranges comparison. I'm currently using my screwdriver to build what somebody else might have done with a hammer: using Topic Maps as a tool, I'm gradually (as a single researcher) building up a toolkit that has at least as much "power" as OWL. Gradually. If I were a team (and sometimes I'm almost as tall as one), I'd probably be more powerful than OWL and able to leap tall buildings too. (The limitations of being one person always bother me, so I try to sleep less...)

> If you want "higher level semantics", then the semantics of RDF - and let's extend it with RDFS and OWL - are much better specified than for topic maps, except for subject identity.

This is simply untrue, insofar as it's entirely possible (i.e., I'm doing it) to specify a set of semantics within Topic Maps for doing precisely the kinds of tasks that RDFS/OWL does. I happen not to be limiting my work to Description Logics, and absent the kind of background in higher mathematics and logic back in university that I now wish I had, I'd probably have some form of modal or second-order categorical logic running right now, something suitable for my tasks. OWL is just one possible schema developed using RDFS on top of RDF. The same kind of thing can be done with Topic Maps -- there's no limitation to it in this sense, and I believe that the inherent subject identity issues that have plagued RDF (and which are already solved in TM) give us an edge. Some might argue that we need a constraint language, but application conventions can certainly do in a pinch. We do need the constraint language in order to share, just as OWL does.

Nikita Ogievetsky

It is interesting that these discussions are coming back again. At this address [1] I had placed Semantic Web Glasses [2] (since 2001, I think) that do exactly this: transform XML Topic Maps into one of several RDF representations. I checked it last week and fixed a few bugs in the RTM translator (the one based on the TMPM4 processing model). In the paper [2] I used the RTM translation to query Topic Maps with RDF query languages. As TMPM4 is one of the RM interpretations of topic maps, it can serve as one of the RM-based translations of topic maps into RDF. I wonder what people think of it now. The second translator (to the QTM schema) is based on Quantum Topic Maps; I used it to validate a Topic Map against a DAML ontology [4].

[1] http://www.swglasses.com/swglasses/index.aspx
[2] http://www.cogx.com/swglasses.html
[3] http://www.cogx.com/xtm2rdf/extreme2001/
[4] http://www.cogx.com/kt2002/
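[Editor's aside] For readers unfamiliar with what such a translator does, a toy sketch of the general idea: walk an XTM document and emit RDF-style triples. It is unrelated to the actual Semantic Web Glasses / RTM code, handles only topics and base names, and the predicate name is invented.

import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"

xtm_doc = """
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">
  <topic id="puccini">
    <baseName><baseNameString>Giacomo Puccini</baseNameString></baseName>
  </topic>
</topicMap>
"""

triples = []
root = ET.fromstring(xtm_doc)
for topic in root.findall(f"{XTM}topic"):
    subject = "#" + topic.get("id")
    for name in topic.findall(f"{XTM}baseName/{XTM}baseNameString"):
        # One possible mapping: topic id -> base name string as a literal.
        triples.append((subject, "xtm:baseName", name.text))

print(triples)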

