
Tuesday, December 30, 2003

A Generalization Hierarchy of Theories 

From topicmapmail: via Murray Altheim
 ~-----------------------------------------------------------------~
   |                                                                 |
   |                 Tautologies                                     |
   |         ________/  |  \________________                         |
   |        /           |   \               \                        |
   |       /            |    \               \                       |
   |   Antisymmetry     |     \          Symmetry                    |
   |       \            |      \             /                       |
   |        \   Transitivity  Reflexivity   /                        |
   |         \        /    \  /   |        /                         |
   |          \      /      \/    |       /                          |
   |           \    /       /\    |      /                           |
   |            \  /       /  \   |     /                            |
   |         PartialOrdering  Equivalence                            |
   |           /      \                                              |
   |          /        \                                             |
   |       Trees      Lattices                                       |
   |         \          /    \_______________                        |
   |          \        /      |      \       \                       |
   |           \      /       |       \       \                      |
   |      LinearOrdering  Theories  Types  Collections               |
   |        /    \               ___________/    \                   |
   |       /      \             /                 \                  |
   |  Sequences  Numbers     Sets              Mereology             |
   |      \      / \        /   \              /   |   \             |
   |       \    /   \      /     \            /    |    \            |
   |    Integers  Reals  ZF-Sets  VNGB-Sets  /     |     \           |
   |                                        /      |      \          |
   |                                   Discrete  Lumpy  Continuous   |
   |                                                                 |
   ~-----------------------------------------------------------------~
   Figure 2.14: A generalization hierarchy of theories
                from "Knowledge Representation", John Sowa (p.95)



Workflow, RSS and business documents 

I had been trying to integrate ideas from some old conversations with Bob Haugen and Bryan Thompson with a newer conversation that Bryan and I have been having about the nature of workflow and its relationship to technologies like Topic Maps, RDF, REST and RSS. Bryan has been very interested in thinking of workflow as a conversation, and in using RSS as a technology for supporting that kind of interaction. One important insight is the dual nature of messages as both an item and a channel. Bob has a lot of experience with workflow systems and has emphasized workflow as a managed system of business commitments. Key to this approach is an understanding of the roles of business documents and business objects, and how they differ. At first I was a little resistant to the idea of using RSS to encode workflow. While it allowed a natural way of structuring activities as items, I saw it as an obfuscation of the natural structure of a language for expressing business objects (shared state, contractual commitments, etc.). Rereading the old messages and presentations from Bob and Bryan has helped me sort things out a little. For example:

1. Someone must store the instance of a business object (BO). It may contain links to a highly distributed collection of information, but its identity must be maintained by a coherent source. An RSS item provides a good container for distributing a reference to, and possibly a historical copy of, the BO.

2. An item as a channel is often used to log comments about an item. In a similar way, a BO has a log of business documents that provides a history of its development. Each business document provides a kind of commitment that affects the evolution of the BO.

3. The difference is that while an item is often considered a static statement which is the subject of an appending log of comments, a BO is the current state: a dynamic aggregation of the effects of a stream of actions/commitments expressed as business documents. A conversation would lose its coherence if its subject were a moving target, with no record of the actual item to which the conversation comments were responding. The collection of business documents, on the other hand, provides a kind of transaction log that can be replayed to regenerate the state of a business object.

4. This also relates to the subtle issues that surround identity and reification in topic maps and RDF. You need an id for the initial subject of a business conversation, as well as an id for each responding business document. In addition, each business document needs references to: the id of the subject (or stream of subjects) from which it derives its own identity; the id of a snapshot of the conversation to which it is responding; and the id of the current state of the conversation where the aggregation of commitments surrounding the initial business subject can be acquired.

The resolution of this sorting process will have to wait till later.

See also:
Bryan: Workflow Choreography for RestWebServices
Bob: Commitment Oriented Orchestration
Guy: The New Workflow
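The "transaction log" idea in point 3 can be sketched as a small event-sourcing loop. Everything here (the document kinds, the `apply_document` and `replay` names) is a hypothetical illustration, not part of any system Bob or Bryan described:

```python
# Hypothetical sketch: a business object (BO) as the replayed state of a
# log of business documents. Names and document kinds are illustrative.

def apply_document(state, doc):
    """Fold one business document (a commitment) into the BO state."""
    kind = doc["kind"]
    if kind == "order":
        state = {"items": dict(doc["items"]), "status": "ordered"}
    elif kind == "amendment":
        state["items"].update(doc["items"])
    elif kind == "acceptance":
        state["status"] = "accepted"
    return state

def replay(log):
    """Regenerate the current BO state from its document log."""
    state = {}
    for doc in log:
        state = apply_document(state, doc)
    return state

log = [
    {"kind": "order", "items": {"widgets": 10}},
    {"kind": "amendment", "items": {"widgets": 12}},
    {"kind": "acceptance"},
]
print(replay(log))  # → {'items': {'widgets': 12}, 'status': 'accepted'}
```

The point of the sketch is that the BO itself is never edited in place; its current state is always derivable from the document log, which is exactly what lets a late-joining participant catch up.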


Saturday, December 20, 2003

Responses to Topic Map/RDF comment 12/19 

Dec 19 2003

Lars Marius Garshol

| What TM offers, IMHO, is a pattern where certain structures are
| pre-defined. They happen to be convenient structures (I mean for
| the type of modeling that we are likely to be interested in with
| TM), but you can pretty much build them in RDF if you want to.

That's true. However, there are things in topic maps, like scope and reification, that become very awkward to handle in RDF. You can do it, but there's no pretty way to do it.

| But in RDF, it would be a matter of convention, and so an ordinary
| RDF processor would not be able to understand and use those patterns
| to good advantage.

Yep.

| This gives TM a lot of power for topic-map-like tasks. But if you
| want to accumulate a lot of atomic facts, RDF is simpler.

Also true, and I think this is pretty much the essence of the comparison. Topic maps are high-level, RDF is low-level. Compare the Omnigator and BrownSauce for one illustration of that.

Lars Marius Garshol

| Consider a service that translates a topic map document, e.g., XTM,
| into this internal model and then re-serializes it, e.g., as a
| "consistent topic map" using the XTM syntax. I would say that this
| is a "topic map processor".

So would I. The plan is that ISO 13250-3: Topic Maps -- XML Syntax will pretty much say that, though it will word it differently. In other words: so long as you interpret the syntax correctly, and detect all invalid topic maps, you are a conformant processor.

| Perhaps this was one of the points of the topic maps reference model
| (or the topic maps processing model) -- that you can re-represent
| the XTM syntax as a set of assertions using an ARCx (assertion node,
| role node, casting node, player node).

I know.
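The remark above that scope becomes awkward in RDF can be illustrated with a plain-Python sketch (no RDF library; the statement-node pattern mirrors rdf:Statement reification, while `ex:scope` and the example subjects are made up for illustration):

```python
# Illustrative sketch only: a scoped statement in topic-map style vs.
# RDF-reification style, using plain dicts and tuples.

# Topic-map style: scope is carried natively by the association itself.
tm_association = {
    "type": "born-in",
    "roles": {"person": "puccini", "place": "lucca"},
    "scope": {"biography"},          # native scope on the association
}

# RDF style: a triple cannot carry a scope, so we mint a statement node
# (rdf:Statement reification) and annotate that surrogate node instead.
triples = [
    ("stmt1", "rdf:type", "rdf:Statement"),
    ("stmt1", "rdf:subject", "puccini"),
    ("stmt1", "rdf:predicate", "born-in"),
    ("stmt1", "rdf:object", "lucca"),
    ("stmt1", "ex:scope", "biography"),  # ex:scope is an invented property
]

# Five triples, plus a convention every consumer must know about, to say
# what the topic map association says directly.
print(len(triples))  # → 5
```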


Responses to Topic Map/RDF comment 12/18 

Dec 18 2003

Rik Morikaw

A basic difference between RDF and TMs is their models, i.e.
RDF: subject (resource) - predicate (property) - object (value)
TM: topic - association - topic
So in many ways you are mixing two very different semantic standards, especially when considering capabilities (i.e. topic maps provide a richer semantic environment). If the purpose of your db is to provide higher level semantics and/or represent knowledge, then I think you would note some significant differences between the two. However, at the simpler "object level" (term used by M. Lacher and S. Decker), you would find some similarities, and thus be able to represent RDF triples in TM (i.e. the object level is where identities are assigned to objects, and where binary relationships can be described in graphical terms). Therefore, if your db is of the simpler kind, you could simply implement triples in the relational db as was suggested previously. If you want to model your db with a TM, then your interpretation of the RM might be:

Subject as x-node, Predicate as a t-type, Object as x-node

Note that a t-type gives you an assertion type. Beyond lower level applications of RDF and TMs, however, the two can quickly diverge. A telling factor can be seen in their graphical representations: RDF is represented by directed graphs, while TMs are non-directional and have been described using hypergraphs (i.e. greater complexities that must be mathematically modeled). A consideration for your db is who your audience is and the level of sophistication (i.e. semantics) needed. If your db and application are to be interoperable with other external dbs, then you might want to see what semantic standard they are using. As of today there seem to be too few ways to accomplish interoperability between RDF and TMs.

Jan Algermissen

Jason Cupp wrote:
> The RM has no mention of Data Model things like occurrences, scope, names,
> variants, etc... only assertions and sub-graphs of the assertion graph that
> are targets for reification in the n-ary relationship model: t-topics,
> c-topics, r-topics, x-topics, a-topics.

The RM operates at a lower level than the SAM. Since occurrences, names, variants etc. are essentially all just relationships between subjects, assertions are sufficient to represent all of them. The Standard Application Model (SAM) is then simply the appropriate set of assertion types and properties. The goal of the RM is to enable the definition of such 'sets of semantics' as the SAM (in the current RM version, such a definition is coined a 'Disclosure').

> But if I'm implementing the DM using the new RM, do I have to do the
> following things?

In general, the RM does not in any way constrain an application; the only thing you have to do is to document (and thus make explicit) how your application maps to the RM. For example, the a-sidp property might well be hidden in a certain method on objects of a certain class.

> 1) reify every name and occurrence resourceData, they become x-topics in
> assertions for relating , , and occurrence>. But reifying
> literal strings gives me a new reification operator "subjectLiteral",
> along with subjectLocator, subjectIndicator?

First let me say that it is really great that you understood (or are very close to understanding) the essence of the RM - believe me, I know that it takes some effort to make the necessary mind-shift. To answer your question: note that 'reification' has a different meaning in the SAM world than in the RM world. In the RM understanding, 'to reify' means to provide a surrogate (a topic, in the RM sense) for a subject. So the question of whether you want to reify certain subjects is a matter of your needs: if you want to make statements about certain subjects, you need to reify them; otherwise not. There are a lot of implications behind such a decision, especially concerning the power of the Application (the Disclosure) you define. If you want, I'll go into details on this - please let me know.

A remark: in my personal opinion the current SAM draft really falls short of the power it could have, because certain subjects are NOT reified (in the RM sense!), which in turn limits the degree of, hmmm, 'information interconnection'. OTOH, one of the purposes of the RM is to provide a means for investigating and discussing these decisions and their implications, so people have a sound basis to judge the quality of an Application ('Disclosure'). Bottom line: you can pretty much do what you like (you can also make everything (occurrences, names, etc.) a property), but you need to document what you do, thus maximizing the interchangeability of your topic maps.

> 2) does DM scope then become the total of all the other assertions (those
> not included in the DM) made about these other assertions: a) ,
> b) , c) , d) ...)

Here are two completely different, yet possible, ways to do scope in terms of the RM:

1) Scope as a property, consisting of a set of topics
- define a property in your Application/Disclosure that has value type 'set of topics'. Let's call this property 'ScopingTopics'. It is an OP (other property) because it does not serve as a basis for merging.
- in your syntax processing model, define how, for example, XTM's element and children are to be processed, and that the set of topics that make up the scope is to be assigned as the value of the 'ScopingTopics' property of topics that surrogate relationships ().

Think of the result as: t_a --ScopingTopics--> {t1,t2,t3}

Major advantage of this approach: ease of implementation.
Major disadvantage: the scoping set is not reified, so you do not get the possibility to directly see all scoped relationships from a given scope (set of topics). This can only be added by a (proprietary!) indexing mechanism of a certain piece of software.

2) Scope as an assertion between relationships and a subject that is a set of topics
- define an assertion type 'at-association-scope' and two roles, 'role-association' and 'role-scope'
- define a property 'SetMembers' that provides the identity for topics that surrogate sets of topics. It is an SIDP, because topics with the same value for 'SetMembers' must merge (topics that represent the same set represent the same subject and therefore must merge).
- have your processing model define the correct way to construct the 'SetMembers' property and the required assertion(s). Your result will look like this:
            at-association-scope
                    |
 role-association   |   role-scope
             |      |      |
             |      |      |
    t_a------C------A------C------x_s[SetMembers={t1,t2,t3}]

(where t_a is, as above, the topic that represents a given relationship, and x_s is the topic that represents the set of topics that is the scope).

Major advantage: all other relationships that x_s is a scope of will be connected to x_s in the same way, so you can directly see from x_s all the relationships it scopes (e.g. to answer the question: "What are all occurrences in scope {t1,t2,t3}?"). With this approach, this information is directly available; no proprietary indexing mechanisms are needed (meaning that no matter what SAM-conformant software you use, you'll always have that information, likely also in the API, with some scope.getAllScopedAssociations() method).
Major disadvantage: the topic map (in store) increases in size.

But again, this is a matter of taste and the desired power of your Application.

> Just curious... jason

You are welcome! Do you plan to implement something RM-based? I'd be happy to give you any information you need.

Thompson, Bryan

Guy and I have discussed this also. I think of this as de-normalizing the RDBMS schema for the reified triple store. Or perhaps re-normalizing it. In any case, it seems to me that this transform (of multiple rows of reified triples / ARCs into a single row of a schema-specific table) can only be a post-merge operation. Prior to merging, it seems that the data needs to stay in the "normalized" form of the reified triple store model so that you can correctly recognize the various merging pre-conditions without undue overhead and complexity. Also, once you transform the reified triples into the schema-specific table, you sort of lose the subject of the individual assertions that are being flattened into a single row of that schema-specific table. I am curious whether this matches your experience. If so, then I can see how this transform can provide very fast query mechanisms whenever there is some schema (and I am reading "schema" loosely here as some constraint on patterns of reified triples, so association templates and RDF Schema would both qualify). However, it will not help scaling for pre-merge operations. Unless you assume that all of your data is governed by some set of schemas that is known in advance?

Guy Lukes

I think that this is the key to building practical applications. Unless the data is being controlled by some kind of structural constraints, you are limited to processing the equivalent of "reified triples". By grinding everything down to "amino acids", to use a biological metaphor, it seems like it could be a lot of work to build everything back up to an appropriate output structure ("protein"). In a relational database this is usually something equivalent to numerous self-joins. The other option is to use the equivalent of "schema-specific tables and indexes" to cache higher-order structure. These structural commitments, however, limit your flexibility and can be difficult to keep in sync. I am also interested in hearing what practical experiences people have had in dealing with this problem.

Thomas B. Passin

> So in many ways you are mixing two very different semantic standards;
> especially when considering capabilities (i.e. topic maps provide a
> richer semantic environment).

Wait, not so fast here! Let us limit ourselves, temporarily, to binary associations, so as to make the closest possible comparison to RDF, and see wherein they are all that different semantically.

RDF: resource - represents an element of the world of discourse (WoD).
TM: topic - represents (in a computer) an element of the WoD.
RDF: predicate - relates two elements of the WoD.
TM: association (binary) - relates two elements of the WoD.
RDF: explicit roles: subject, object.
TM: explicit roles: whatever you decide.
RDF: implicit roles - built into the definition of the predicates.
TM: implicit roles - not normally used, but one could use a generic role type.
TM: subject identity - possible to define fairly well within TM.
RDF: subject identity - not always easy to define within RDF.
TM: scope - a native concept (but not clearly and unambiguously defined).
RDF: scope - not a native concept.
TM: statement identity - can be established by giving the association an id.
RDF: no such notion. RDF "reification" is not really equivalent.

To the extent that an occurrence can be considered to be a convention for an association, occurrences are covered by the above. So I would say that there is not all that much difference between RDF and TM, as long as we stick to binary associations. And in RDF, a resource that serves as a hub can act very much like a topic map association that has more than two topics attached.

What TM offers, IMHO, is a pattern where certain structures are pre-defined. They happen to be convenient structures (I mean for the type of modeling that we are likely to be interested in with TM), but you can pretty much build them in RDF if you want to. But in RDF, it would be a matter of convention, and so an ordinary RDF processor would not be able to understand and use those patterns to good advantage. This gives TM a lot of power for topic-map-like tasks. But if you want to accumulate a lot of atomic facts, RDF is simpler. If you want to take advantage of OWL's ability to constrain classes, RDF+OWL is (currently) more powerful. If you need to do a lot of complex type-inferencing, OWL processors will give you a lot of power.

> If the purpose of your db is to
> provide higher level semantics and/or represent knowledge, then I
> think you would note some significant differences between the two.

If you want "higher level semantics", then the semantics of RDF - and let's extend it with RDFS and OWL - are much better specified than for topic maps, except for subject identity.
Murray Altheim

I don't mean to be snooty about it, nor does the world wait for my opinion on this, but I'm *really* tired of RDF and Topic Map comparisons. They are apples and oranges. There have probably been many hundreds of messages on the subject, and most of them compare structures. Okay. So there are some similar structures. But they are designed for such different purposes, operate at different levels, etc. that comparisons really are like comparing hammers and screwdrivers. Both are functional, but not typically interchangeable, unless you like hammering in screws or poking holes with a screwdriver to push in your nails. For instance, the whole idea that OWL is more powerful is again an apples and oranges comparison. I'm currently using my screwdriver to build what somebody else might have done with a hammer: using Topic Maps as a tool, I'm gradually (as a single researcher) building up a toolkit that has at least as much "power" as OWL. Gradually. If I were a team (and sometimes I'm almost as tall as one), I'd probably be more powerful than OWL and able to leap tall buildings too. (The limitations of being one person always bother me, so I try to sleep less...)

> If you want "higher level semantics", then the semantics of RDF - and
> let's extend it with RDFS and OWL - are much better specified than for
> topic maps, except for subject identity.

This is simply untrue, insofar as it's entirely possible (i.e., I'm doing it) to specify a set of semantics within Topic Maps for doing precisely the kinds of tasks that RDFS/OWL does. I happen not to be limiting my work to Description Logics, and were it not for the lack of the kind of background in higher mathematics and logic that I now wish I had picked up back in university, I'd probably have some form of modal or second-order categorical logic running right now, something suitable for my tasks. OWL is just one possible schema developed using RDFS on top of RDF. The same kind of thing can be done with Topic Maps -- there's no limitation to it in this sense, and I believe that the inherent subject identity issues that have plagued RDF (and that are already solved in TM) give us an edge. Some might argue that we need a constraint language, but application conventions can certainly do in a pinch. We do need the constraint language in order to share, just as OWL does.

Nikita Ogievetsky

It is interesting that these discussions are coming back again. At this address [1] I had placed Semantic Web glasses [2] (since 2001, I think) that do exactly this: transform XML Topic Maps into one of several RDF representations. I checked it last week and fixed a few bugs in the RTM translator (the one based on the TMPM4 processing model). In the paper [2] I used the RTM translation to query Topic Maps with RDF query languages. As TMPM4 is one of the RM interpretations of topic maps, it can serve as one of the RM-based translations of topic maps into RDF. I wonder what people think of it now. The second translator (to the QTM schema) is based on Quantum Topic Maps; I used it to validate a Topic Map against a DAML Ontology [4].

[1] http://www.swglasses.com/swglasses/index.aspx
[2] http://www.cogx.com/swglasses.html
[3] http://www.cogx.com/xtm2rdf/extreme2001/
[4] http://www.cogx.com/kt2002/
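A toy flavor of what such a topic-map-to-RDF translation does (this is an illustrative sketch only, not Nikita's actual RTM or QTM schemas; the `ex:` names and the example association are invented):

```python
# Illustrative sketch: flattening one topic map association into RDF
# triples, one triple per role played. Not any published translation.

def association_to_triples(assoc_id, assoc_type, roles):
    """Emit a typing triple plus one triple per (role, player) pair."""
    triples = [(assoc_id, "rdf:type", assoc_type)]
    for role, player in roles.items():
        triples.append((assoc_id, role, player))
    return triples

t = association_to_triples(
    "a1", "ex:born-in", {"ex:person": "puccini", "ex:place": "lucca"})
for triple in t:
    print(triple)
```

Note what the thread keeps circling around: the association node itself ("a1") must become a first-class resource in RDF, which is exactly the reified-statement / hub-resource pattern Passin and Bryan describe.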


Responses to Topic Map/RDF comment 12/15-12/18 

15 Dec 2003

Guy Lukes

To start with, I must say that I am not a topic map expert. But in my recent attempt (with Bryan Thompson) to implement a topic map in a relational database, I was shocked to discover that the underlying database structure was only a slight modification of RDF triples:

The subject is the a-node
The predicate is the r-node
The object is the x-node

All that was missing was a c-node to reify the triple, and a set of PSIs to implement subject roles (predicates/r-nodes/roles) to support topic map merging behavior. This gives you the simplicity of semantics-free RDF triples, with the power of topic map subjects and merging (if you need it), plus the ability to leverage all the work that is going on in the RDF community. Is there something I am missing that is going to cause me problems down the road?

Jan Algermissen

It is true that a topic map graph a la the Reference Model can be represented using RDF triples, but that does not take you very far, because your RDF triples would only represent the particular graph structure and carry no information about the original relationship you are actually representing. You'd still have to put that information somewhere; it is not inside the RDF! If you just need a graph representation technique to store the assertion structures, I suggest you use relational tables and not RDF (which is overly complex for this sort of thing). Here is what I mean by 'relational table': suppose you have the relationship "Jack is father of Jim", which gives you the following assertion:

                    T
                    |
             R1     |      R2
             |      |      |
x1[jack]-----C1-----A1-----C2-----x2[jim]

T: assertion type father-son
R1:          role father
R2:          role son
Mathematically, an assertion can be understood as an ordered tuple of the form (T,A1,R1,C1,x1,R2,C2,x2,...,Rn,Cn,xn). (Note that it is the order that preserves the structure.) Here is where relational tables are handy (and extremely efficient for storage), because you can translate that tuple into a row in a table representing all assertions of type T:

Assertion type T:

 A  | R1-casting | R1-player | R2-casting | R2-player |
----+------------+-----------+------------+-----------+
 A1 | C1         | x1        | C2         | x2        |   <--- this is the above assertion

Patrick Durusau

Both RDF and topic maps have a substantial amount of work devoted to them, and I don't see them as even competing technologies. They use different underlying conceptions of how to organize information, and as a result work best in particular, and often differing, domains. I am not trying to start an RDF vs. topic map flame war, so note that I will not be responding to any posts of that sort. My sole purpose was to point out that the two are distinct, and nothing more. Which one you choose depends upon your domain, which is more familiar, etc.

Jack Park

Patrick, they may be distinct, but that doesn't mean you cannot have both. It is possible to build a triple-store database and a set of mappers such that topic maps can be persisted that way. If the same database is serving other applications, you gain the opportunity for discovery directly from the database itself.

Lars Marius Garshol

Well, what you describe here does not have any obvious resemblance to topic maps as they are described in ISO 13250:2000 or ISO 13250:2003, nor to anything that is currently scheduled to go into the next edition of ISO 13250. In short, it's not clear that what you describe here could be called topic maps. It certainly does not follow any published standard by that name.

Guy Lukes

Lars, I am afraid I have misrepresented my goals. I am not trying to "implement a topic map in a relational database".
What I am trying to do is develop a data store into which a variety of input processors can dump data. These inputs could be in RDF, XTM, or, more typically, some XML language like RSS. The idea is not to directly support a topic map application, but to provide a data store that "marks all the distinction boundaries for incoming data" and allows merging based on identity. The goal of these input processors is to provide a common repository for heterogeneous data that preserves the identity-based information within the input. Output processors could then be written to cast/collapse this data into a (typically XML-based) output format (based on the distinctions that make a difference) that supports a variety of semantics unanticipated by the input source. These output formats could be XTM, RDF, merged RSS feeds, etc., and involve a great deal of filtering and aggregation. A round trip from XTM to data to XTM would probably not preserve the original syntactic structure (although a system like that is certainly possible). See XML vs RDF NxM vs N+M: http://bitsko.slc.ut.us/blog/xml-vs-rdf.html

Thomas B. Passin

Consider this thought experiment. Take any RDF triple, and replace it with a topic map association with exactly two role-players, where there is one role called "subject" and one role called "object". This, I suggest, is essentially equivalent to the triple. Notice that the roles are implicit in the RDF version, so they cannot be talked about within RDF itself. The topics that would play the corresponding roles as the subject and object could be given an id equal to their RDF URI. The subject indicator - well, who knows?

Thompson, Bryan

I think that the granularity of Guy's comment was directed at a normalized RDBMS schema that encodes the assertions found in a topic map document. Of course, the behaviors (semantics) of those topic map assertions still need to be implemented for a processor to be a "topic map processor".
Consider a service that translates a topic map document, e.g., XTM, into this internal model and then re-serializes it, e.g., as a "consistent topic map" using the XTM syntax. I would say that this is a "topic map processor". Guy made the point that the internal model for the processor could be a set of reified triples. Perhaps this was one of the points of the topic maps reference model (or the topic maps processing model): that you can re-represent the XTM syntax as a set of assertions using an ARCx (assertion node, role node, casting node, player node). In this context you can see how the ARCx model can be readily aligned with the RDF model as a reified triple, which is what Guy was suggesting.
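Bryan's normalized-versus-schema-specific distinction can be sketched with an in-memory SQLite store. The table layouts and role names below are hypothetical illustrations (they echo Jan's "Jack is father of Jim" example), not the schema of any actual topic map processor:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# "Normalized" form: one generic table of reified triples / ARCs.
# Each row casts one player (x-node) into one role (r-node) of one
# assertion (a-node) via a casting node (c-node).
cur.execute("CREATE TABLE arcs (a TEXT, r TEXT, c TEXT, x TEXT)")
cur.executemany("INSERT INTO arcs VALUES (?, ?, ?, ?)", [
    ("A1", "role-father", "C1", "jack"),
    ("A1", "role-son",    "C2", "jim"),
])

# "De-normalized" post-merge form: the self-join collapses the two ARC
# rows into a single schema-specific row per father-son assertion.
cur.execute("""
    CREATE TABLE father_son AS
    SELECT f.a AS a, f.c AS father_casting, f.x AS father,
           s.c AS son_casting,   s.x AS son
    FROM arcs f JOIN arcs s ON f.a = s.a
    WHERE f.r = 'role-father' AND s.r = 'role-son'
""")

row = cur.execute("SELECT a, father, son FROM father_son").fetchone()
print(row)  # → ('A1', 'jack', 'jim')
```

The self-join is exactly the "numerous self-joins" cost Guy mentions, and the `father_son` table is the schema-specific cache that only makes sense post-merge, since pre-merge the generic `arcs` rows are what the merging rules inspect.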


Responses to Topic Map/RDF comment 12/1-12/15 

Warning: many of these comments need to be taken within the context of the original messages (see links).

Jan Algermissen

[added from another thread for context]

"Occurrences are essentially a specialized kind of association,..." (5.7)
"Essentially, a base name is a specialized kind of occurrence,..." (5.5)

Then you say it is ok and common practice to use occurrences to represent properties. In addition, there are specialized properties in the model (subjectAddress, subjectIndicators, sourceLocators) that are *NOT* represented as occurrences. Why are all the specializations in place? Why not use associations for everything, since all the other stuff is *essentially* just an association? I see absolutely no reason for all the specialized items. Can you tell me (and hopefully convince me) why they are there?

. . .

> > So, why is the model more complex than it needs to be?
>
> Because if you don't you end up with RDF. Sorry, that's a flippant
> answer.

Please keep RDF out of here... it is really so different and has nothing to do with what I am talking about.

. . .

Well, you can simply get the core semantics from a core set of association types and simplify the model by throwing out occurrence and basename. IOW, if this can be done for class-instance and superclass-subclass (both are part of the core semantics, yes?), why for those and not for occurrence and basename? It would only make the model simpler. Isn't that a reasonable goal for a standard?

Mourad OUZIRI

Hello world, what are the advantages of Topic Maps compared to RDF, RDF Schema and DAML in terms of representing data/resource semantics? Thank you in advance.

Thomas Schwotzer

Try this: http://www.ontopia.net/topicmaps/materials/tmrdf.html

Jack Park

This, because I think there is a bit of tribal behavior going on, and I also think that, sometimes, the two tribes (topic maps/RDF) don't really understand each other.
First, a snippet of background as a means of revealing my own bias: both tribes belong on this planet, and both need to learn to work with each other. I come to this point of view by means of my own work, which has evolved to the suspicion that I can perform such miraculous things as walking on water if I build a persistence mechanism which uses triples as the underlying abstract engine, and build what I call "cassettes" (read: cartridges, if you're an Oracle jockey) which perform the mappings between the many graph dialects which now exist, or will exist, 'out there' in knowledge manipulation land.

* Lars Marius Garshol

| If you replace RM with TMCL here you've got it. RM itself is much
| closer to what you said about RDF: RM "absent an [application] is just
| a graph language, like GXL. It has almost no built-in semantics." It
| certainly isn't a schema language.

* Murray Altheim

| ...every possible opportunity to minimize the importance of the RM
| it seems... why not give *that* a rest?

Because I stand by what I said. The RM is not a schema language, and I think its creators would be the first to admit that it is not. I think they would also approve of the rest of what I've said, given that they have, by their own admission, gone to great lengths to keep the ontological commitments of the RM to a minimum.

links to this post (0) comments

XML search: We Don’t Know the Answer 

via Tim Bray To return to maybe the most interesting question: given that XML text contains a rich, nested, sequenced, labeled structure, how do we use that to add value to search? . . . The following example finds the set of all paragraphs in the introduction that contain one or more footnotes and also one or more cross references.
set1 = Set('Paragraph') within Set('Introduction')
set2 = set1 including Set('footnote')
set3 = set2 including Set('xref')

The following example finds the set of paragraphs in the introduction that contain either a cross reference or a footnote but not both.

set1 = Set('Paragraph') within Set('Introduction')
set2 = set1 including Set('footnote')
set3 = set1 including Set('xref')
set4 = (set2 + set3) - (set2 ^ set3)

Of course, you can combine element sets with search:

set1 = Set('Title', contains="introduction")
set2 = Set('Title', attribute=("xml:lang", "en-US"))

I think that this approach is important to think about because, unlike some of the other approaches below, it has been commercially implemented and has been proven to work just fine and be very useful for searching XML.
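The element-set operations in the examples above can be sketched in a few lines of ordinary code. This is an illustrative sketch only, using Python's stdlib ElementTree; the `within`/`including` names follow the post's pseudocode, and the sample document structure is invented:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<book>
  <Introduction>
    <Paragraph id="p1"><footnote/></Paragraph>
    <Paragraph id="p2"><footnote/><xref/></Paragraph>
    <Paragraph id="p3"><xref/></Paragraph>
  </Introduction>
  <Paragraph id="p4"><footnote/></Paragraph>
</book>
""")

def elem_set(tag):
    """All elements in the document with the given tag."""
    return set(doc.iter(tag))

def within(inner, outer):
    """Members of `inner` that are descendants of some member of `outer`."""
    return {e for e in inner
            if any(e in set(o.iter()) and e is not o for o in outer)}

def including(outer, inner):
    """Members of `outer` that contain at least one member of `inner`."""
    return {o for o in outer
            if any(d in inner for d in o.iter() if d is not o)}

# The second example: paragraphs in the introduction containing a
# cross reference or a footnote but not both.
set1 = within(elem_set('Paragraph'), elem_set('Introduction'))
set2 = including(set1, elem_set('footnote'))
set3 = including(set1, elem_set('xref'))
set4 = (set2 | set3) - (set2 & set3)   # union minus intersection

ids = sorted(p.get('id') for p in set4)
```

With the sample document, `set4` ends up holding p1 and p3, the two introduction paragraphs with exactly one of footnote/xref.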

. . . In case it’s not obvious, we haven’t figured out what the right way to search XML is. It’s worse than that, here’s a list of the things that we don’t know:



Wednesday, December 17, 2003

The dangers of monocultural programmers 

via Joel on Software . . . What are the cultural differences between Unix and Windows programmers? There are many details and subtleties, but for the most part it comes down to one thing: Unix culture values code which is useful to other programmers, while Windows culture values code which is useful to non-programmers. This is, of course, a major simplification, but really, that's the big difference: are we programming for programmers or end users? Everything else is commentary. . . . Suppose you take a Unix programmer and a Windows programmer and give them each the task of creating the same end-user application. The Unix programmer will create a command-line or text-driven core and occasionally, as an afterthought, build a GUI which drives that core. This way the main operations of the application will be available to other programmers who can invoke the program on the command line and read the results as text. The Windows programmer will tend to start with a GUI, and occasionally, as an afterthought, add a scripting language which can automate the operation of the GUI interface. . . . There are too many monocultural programmers who, like the typical American kid who never left St. Paul, Minnesota, can't quite tell the difference between a cultural value and a core human value. I've encountered too many Unix programmers who sneer at Windows programming, thinking that Windows is heathen and stupid. Raymond all too frequently falls into the trap of disparaging the values of other cultures without considering where they came from. . . . The very fact that the Unix world is so full of self-righteous cultural superiority. . . stems from a culture that feels itself under siege, unable to break out of the server closet and hobbyist market and onto the mainstream desktop. 
This haughtiness-from-a-position-of-weakness is the biggest flaw of The Art of UNIX Programming, but it's not really a big flaw: on the whole, the book is so full of incredibly interesting insight into so many aspects of programming that I'm willing to hold my nose during the rare smelly ideological rants because there's so much to learn about universal ideals from the rest of the book. Indeed I would recommend this book to developers of any culture in any platform with any goals, because so many of the values which it trumpets are universal. See also The dangers of Gender Monoculture Quantitative and Qualitative Change


Tuesday, December 16, 2003

Who Groks RSS for Corporate Info? Nokia  

via Dan Gillmor Wow. If you're into RSS and wonder how a company can use it to empower a wider community, check out Nokia Content Syndication Program. From the FAQ:
Q: Why is Nokia allowing other web sites to publish Nokia content?
A: Forum Nokia will never be the only resource developers use when creating mobile applications and mobile content, and there's no reason it should be. Instead, Nokia wants to help other developer communities by providing simple, free access to quality Nokia content like toolkits, documents, images, and videos.


The dangers of distributed objects and relational mapping  

via Artima.com - Inappropriate Abstractions In this sixth installment, Hejlsberg and other members of the C# team discuss the trouble with distributed systems infrastructures that attempt to make the network transparent, and object-relational mappings that attempt to make the database invisible.


A Short Course in IBIS Methodology 

A touchstone white paper . . . IBIS (pronounced "eye-bis") stands for Issue-Based Information System, and was developed by Horst Rittel and colleagues during the early 1970's. IBIS was developed to provide a simple yet formal structure for the discussion and exploration of "wicked" problems. . . . How to Tell if a Problem is Wicked A common concern for users of IBIS and QuestMap is knowing when to take the trouble to use them. The short answer is: Use IBIS and QuestMap when the problem is wicked! The list below will help you get a better idea if you are working on a wicked problem: See also IBIS Vocabulary The Cognitive Web


XML vs. RDF :: N x M vs. N + M 

via Ken MacLeod Yet, in the "just XML" world there is no one that I know of working on a "layer" that lets applications access a variety of XML formats (schemas) and treat similar or even logically equivalent elements or structures as if they were the same. This means each XML application developer has to do all of the work of integrating each XML format (schema): N x M. Forget the AI strawman, forget even the RDF model and format for a moment, and tell me that's not a problem -- today or foreseeable. The RDF model along with the logic and equivalency languages, like OWL (née DAML+OIL), altogether referred to as "the Semantic Web", is the current W3C effort to address that problem. Factoring those equivalencies into a common layer allows application developers to work with an already-integrated model, and the libraries to do the work of mapping each schema to the integrated model using a shared schema definition: N + M. One can take potshots at RDF for how it addresses the problem, and at the Semantic Web for possibly reaching too far too quickly in making logical assertions based on relations modeled in RDF, but to dismiss it out of hand or resort to strawmen to attack it, all while not recognizing the problem it addresses or offering an alternative solution, simply tells me they don't see the problem, and therefore have no credibility in knocking RDF or the Semantic Web for trying to solve it.
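The N x M vs. N + M point can be made concrete with a toy sketch: without a shared model, every consumer needs an adapter per format; with one, each format is mapped once into a common model and every consumer is written once against it. The two "formats" and all field names here are invented for illustration:

```python
# Two hypothetical feed formats carrying the same logical data:
format_a = {"headline": "XML search", "link": "http://example.org/1"}
format_b = {"title": "XML search", "url": "http://example.org/1"}

# One mapping per format into a shared model (the "N" side):
def from_a(rec):
    return {"title": rec["headline"], "uri": rec["link"]}

def from_b(rec):
    return {"title": rec["title"], "uri": rec["url"]}

# Consumers are written once against the shared model (the "M" side):
def render(item):
    return f'{item["title"]} <{item["uri"]}>'

# Logically equivalent records become interchangeable:
assert render(from_a(format_a)) == render(from_b(format_b))
```

Each new format costs one `from_x` mapping rather than a change to every consumer, which is the whole of the N + M argument.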


The boundary between standalone and server-based applications 

via William Grosso Sunday Afternoon Thoughts on the Design of RSS Aggregators . . . what's the boundary line between "standalone application" and "server-based" application? That is, when should an application live entirely on an end-user's machine, and when should it live on a server and be accessed through a client program? (This distinction gets hazier in the case of RSS Aggregators, which are, in a loose sense, web-clients anyway.) See also Thick or Thin -- That is the Question?


Rethinking Model-View-Controller 

via Bryan Thompson Beyond MVC: A New Look at the Servlet Infrastructure This article is the first of two that examine in depth the origins of the Model-View-Controller (MVC) design pattern and its misapplication to servlet framework architectures. The purpose of this first piece is threefold. First, it attempts to provide an accurate description of the problems brought about by MVC in servlet middleware. Second, it suggests techniques and strategies for coming up with a new design, one better suited to the needs of servlet infrastructure developers. Third, it offers an example of a completely new, nonderivative pattern we might use moving forward. The second article backs up my assertions by introducing and exploring a reference implementation of the new design. Ultimately, my goal with these two articles is to convince the servlet middleware community once and for all to put the dark days of MVC behind us and to lay the groundwork for completely new, nonderivative servlet middleware architectures that better address our common needs. Please bear with me; I wouldn't write a piece about MVC in the servlet tier unless I believed it was a funeral dirge.


Sunday, December 14, 2003

Topicmap and RDF unification 

It seems that there is a growing consensus that RDF and Topic Maps are, at some abstract level, the same thing. Each started from different disciplines and perspectives, and each has developed and emphasised different concerns. I have posted a message on this to the topicmap mailing list and it has generated some heat; I will comment more when things settle down. [topicmapmail] TM and RDF/S Guy


Python running fast on .NET  

via Miguel de Icaza Jim Hugunin (the creator of Jython) reports that his early .NET compiler for Python is performing very well, in contrast with ActiveState's previous attempt, in some cases better than CPython. It always bothered me that ActiveState discontinued the work and came to a conclusion on .NET performance rather quickly, with JScript and VB.NET being proof that better speeds could be achieved.


The dangers of Gender Monoculture 

via Jon Udell So, for the record, I think this gender issue we all keep tiptoeing around is quite real, and affects technological choices and strategies far more deeply than many of us XY types would dare imagine. See also: Quantitative and Qualitative Change


Friday, December 12, 2003

Automatic proxies considered dangerous 

via Adam Bosworth's Weblog SOA relies on people understanding that what matters is the wire format they receive/return, not their code. Automatic proxies in this sense are dangerous and honestly if someone can show me that REST could eliminate the hideous complexity of WSDL and the horror that is SOAP section 5, that alone might make me a convert. Interestingly, if you do believe that programmers should understand the wire format because it is the contract, then one could argue that ASP.NET's use of Forms and the enormous amount of HTML it hides from programmers is a big mistake. But I'm already deep into one digression so I'll leave that thought for others. In either case, there is a basic point here that really resonates with me. If there is always a URL to a resource, then they can much more easily be aggregated, reused, assembled, displayed, and managed by others because they are so easy to share.


Open source and the government 

via Jon Udell

Jonathan Bollers, vice president and chief engineer at Science Applications International Corp. (SAIC), says that SAIC forks open source projects for in-house development "almost without exception." The problem is that although there is often a desire to give back, it's "a tedious process fraught with more heartache than benefits." The bureaucratic hurdles include security considerations, export controls, and a host of other issues that Bollers sums up as "releasability remediation."

He asks a crucial question and proposes an intriguing answer:

"So how can the defense community give back, apart from well-shrouded blogging, discussion and Usenet postings, and seeding academic research? How can we be viewed as good open source citizens by releasing remediated code?

"In many cases, defense-related development could be a boon to those entrepreneurs engaged in IT commercialization. Modular and well-structured code, properly tagged for defense-sensitive algorithms, functions, identifiers, and key words could be effectively re-released as open source.

"The government might consider incentives for those in the IT contracting business, perhaps in the form of additional fees, credits, or preferential rating for future procurements, to remediate via software engineering for reuse processes that specifically target release to open source.

"These two-way-street incentives would not only revitalize thoughts and ideas in the open source community, they may also be the harbinger for the next wave of technical innovation in the post 'new-economy' economy."

Excellent idea! I'd love to see some of my tax dollars directed toward that end.



Thursday, December 11, 2003

Web Services vs REST Debate 

Nothing in REST requires or precludes an equivalent to WSDL. However, nothing in WSDL today models the interaction that is typical of WSDL itself: "at this address is a document of a given type. If you GET it, you will find a strongly typed description of another resource (endpoint)". Anybody with experience in database design will tell you that it makes sense to focus a lot of energy on optimizing reads. The web does this well. Other principles that apply: all persistent resources have identifiers. No implicit state. Posted by Sam Ruby

It's interesting to note that REST instances are virtually all data-driven. The battle for the future of web services, and scaling thereof, is based on the long-debated issue of data-driven systems vs. behavior-driven systems. Posted by Ken MacLeod

Yes, REST is an architecture, not a particular tool, but the fact is that the current crop of tools for HTTP (and thereby, REST, at least for Internet apps) suck, and actively encourage people to write Web applications that don't leverage the Web architecture. . . . P.S. - I do agree that it's behaviour vs. data; it's still difficult for Web services people to think in terms of representations of state. Posted by Mark Nottingham

Hang on, I'm a REST proponent and a SOAP proponent, so be careful when pigeon-holing. As long as you don't put methods in the SOAP body - which I don't do - I'm quite happy. What's that you say, that's document-style SOAP? Well, goodie. Now assign a URI to the thing whose state is represented by that document and we're done; instant Web. . . . Yes, I'm a big SOAP fan. Just not a Web services fan. . . . And here's something to knock your socks off: when used properly, it actually makes HTTP more RESTful (specifically, more self-descriptive)! Yes, you heard me right. Ponder this: if you "transport a purchase order" (the paper kind) to a paper-shredding service, are you asking them to shred the order, or asking them to fulfill it? Posted by Mark Baker

The reason HTTP is involved in REST is because I had to shrink and redesign HTTP/1.0 to match those features that were actually interoperable in 1994, which turned out to be the core of the REST model (it was called the HTTP object model at the time), and that was carried forward into designing the extensions for HTTP/1.1. Thus, the two are only intertwined to the extent that REST is based on the parts of HTTP that worked best. . . . Why is there no WSDL for REST? Because HTML is the WSDL for REST, often supplemented by less understandable choreography mechanisms (e.g., javascript). That usually doesn't sit well with most "real" application designers, but the fact of the matter is that this combination is just as powerful (albeit more ugly) as any other language for informing clients how to interact with services. We could obviously come up with better representation languages (e.g., XML) and better client-side behavior definition languages, but most such efforts were killed by the Java PR machine. Besides, the best services are those for which interaction is an obvious process of getting from the application state you are in to the state where you want to be, and that can be accomplished simply by defining decent data types for the representations. I don't buy the argument that programmers benefit from a Web Services toolkit. Such things do not build applications -- at most they automate the production of security holes. Getting two components to communicate is a trivial process that can be accomplished using any number of toolkits (including the libwww ones). The difficult part is deciding what to communicate, when to communicate it, in what granularity, and how to deal with partial failure conditions when they occur. These are fundamental problems of data transfer and application state. . . . For the same reason, it would be foolish to use REST as the design for implementing a system consisting primarily of control-based messages. Those systems deserve an architectural style that optimizes for small messages with many interactions. Architectures that try to be all things to all applications don't do anything well. . . . It is less complex to describe data than it is to describe all of the application-dependent control interfaces that might be applied to data. It is a design trade-off of simplicity+evolvability versus specificity+efficiency. I claim that the former is better for multi-organizational systems like the Internet, but feel free to disagree. . . . Again, the key thing to understand is that the value of the Web as a whole increases each time a new resource is made available to it. The systems that make use of that network effect gain in value as the Web gains in value, leaving behind those systems that remain isolated in a closed application world. Therefore, an architectural style that places emphasis on the identification and creation of resources, rather than invisible session state, is more appropriate for the Web. Posted by Roy T. Fielding
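The recurring themes in the thread — every persistent resource has an identifier, interaction is a uniform transfer of representations, and no state hides in a session — can be shown with a toy in-memory sketch. This is nobody's actual code; the URI, fields, and `GET`/`PUT` helpers are invented for illustration:

```python
# All persistent resources are identified by URIs:
resources = {
    "/orders/17": {"status": "open", "items": ["widget"]},
}

def GET(uri):
    # Safe, idempotent read of the current representation.
    return dict(resources[uri])

def PUT(uri, representation):
    # Full-state transfer: the client sends the complete new
    # representation; no server-side session remembers anything.
    resources[uri] = dict(representation)
    return GET(uri)

order = GET("/orders/17")
order["status"] = "submitted"        # client edits its copy of the state
updated = PUT("/orders/17", order)   # and transfers it back whole
assert updated["status"] == "submitted"
```

The contrast with a behavior-driven design is that the only verbs are the generic ones; "submit" is not a method on the server but a change of state expressed in the representation itself.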


Wednesday, December 10, 2003

WinFS Synchronization  

via Marc's Voice

I can't possibly blog every session or highlight all the stuff I'm learning here at the PDC - but needless to say my head is spinning.

These folks have come up with their own 'triples' - their own subject-predicate-object scenario. It's part of their WinFS file system (formerly known as Cairo).  They have no idea that this is identical to RDF's approach - and they NEVER mention the S word (Semantic) - but they DO grok that having RELATIONSHIPS baked into the file system is a good thing.  They also have synchronization and compositing baked in as well.

Synchronization adapters are the method they provide for allowing US to create our own sync scenarios.  All of these technologies I've been talking about are shown in real live demos - usually by typing less than 20 lines of XAML or C# code into Visual Studio.  Right in front of our eyes: IM is added (one line), contacts are included (5 lines), documents are synched (10 lines), or even UIs are animated (4 lines).

So this is VERY powerful stuff, being committed to by the world's largest software company - and whether we like it or not - it's at least PART of our future.  WinFS is much more than just an o-o file system. It has Documents (files, media, etc.) - Messages (email, IM, fax, conferencing) - Contacts (people, orgs, groups and households) ALL as first-order objects.

In other words - they're built into the system - at all levels.  EVERY app and service can access shared documents, messages or contacts. Any of these items can have a relationship to another and be kept in sync. It's heavy!

See also example of WinFS Here is an example relationship type I could use to create a graph of Foo items:
<RelationshipType Name="FooToFoo" BaseType="System.Storage.Relationship">
    <Source Name="SourceFoo" Type="Foo"/>
    <Target Name="TargetFoo" Type="Foo"/>
    <Property Name="X" Type="System.Storage.WinFSTypes.Int32"/>
</RelationshipType>
If I have two Foo items, I can relate them as follows:

ItemContext ic = ItemContext.Open();
Foo f1 = ic.FindItemById( id1 );
Foo f2 = ic.FindItemById( id2 );
f1.OutRelationships.Add( new FooToFoo( f2 ) );
ic.SaveChanges();

So, now to answer the questions. WinFS comes with an item type "Folder" and relationship type "FolderMember". The FolderMember relationship requires that the source item type be Folder but allows any target item type. So, Robert, because items are put in folders using relationships and an item can be targeted by more than one relationship, you can put an item in more than one folder. And Shawn, in WinFS foldering isn't really obsolete, just expanded to encompass a much more powerful concept: relationships.
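The observation that WinFS relationships amount to subject-predicate-object triples is easy to demonstrate. This plain-Python sketch (invented names; not the real WinFS API) shows how "foldering" falls out of a generic relationship store — an item targeted by two FolderMember relationships is simply in two folders:

```python
triples = set()

def relate(source, rel_type, target):
    """Store one relationship as a (subject, predicate, object) triple."""
    triples.add((source, rel_type, target))

relate("Folder:Inbox",    "FolderMember", "Item:report.doc")
relate("Folder:Projects", "FolderMember", "Item:report.doc")
relate("Item:report.doc", "FooToFoo",     "Item:notes.doc")

def members(folder):
    """Items filed in a folder via FolderMember relationships."""
    return {t for (s, r, t) in triples
            if s == folder and r == "FolderMember"}

def folders_of(item):
    """All folders an item belongs to -- more than one is fine."""
    return {s for (s, r, t) in triples
            if t == item and r == "FolderMember"}

# One item, two folders, no special foldering machinery needed:
assert folders_of("Item:report.doc") == {"Folder:Inbox", "Folder:Projects"}
```

Replace the string predicates with URIs and this is essentially an RDF graph, which is the point Marc is making.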


XAML, Flash, Royale and Laszlo 

via Marc's Voice

A couple of points about this post:

1) Laszlo is an XML technology - while Royale is Flash technology.

2) Jon's right about XAML - it goes around CSS and HTML - like only Microsoft can.  And it certainly IS in the 2006 timeframe.

But Laszlo is ready to go NOW. They're about to head into their v2.0, while Royale is not even in beta yet.

Laszlo is VERY much like Avalon, enabling developers to build entire UIs with barely any scripted code. But the KEY difference between Laszlo and Royale is that while Laszlo spits out .swf today, it will be able to spit out Avalon tomorrow.

So smart developers can build on top of Laszlo today, utilize the installed base of Flash, but THEN be able to easily migrate their code to Avalon later, while Royale users will be locked into .swf.



Visualisation 

URLs from Miles Thompson:
http://www.ads.tuwien.ac.at/AGD/MANUAL/MANUAL.html
http://www.caida.org/tools/visualization/walrus/
http://www.touchgraph.com/
http://zvtm.sourceforge.net/
http://www.w3.org/2001/11/IsaViz/
http://www.cybergeography.org/
http://graphics.stanford.edu/~munzner/
http://silpion.dyndns.org/TM3D/
http://networkviz.sourceforge.net


Friday, December 05, 2003

Collaborative Editing  

Collaborative Editing with Rendezvous


Thursday, December 04, 2003

Microsoft needs better process support for development 

Microsoft is missing the SDLC boat....big time....
. . . So, they've got the development environment and the platform down pretty well. They are missing the boat BIGTIME though when it comes to rounding out the rest of the development lifecycle. Here are my beefs: . . . only 3 or 4 of the server products have managed APIs, and those are definitely 1st-gen APIs. . . . Source Control - Visual SourceSafe 6.0xyz. Yeah, 'xyz' is the next super-minor build that will be available next month. Ok, kidding aside, SourceSafe needs to be seriously overhauled and/or replaced. I'll bet anyone $100 that it is being replaced, but it is WAY too late. . . . Build Process - Holy cow, Batman! Microsoft only has one answer for the build process, and it was created by a consultant from a third party?! . . . Microsoft, PLEASE share some of your expertise in testing with us. We can all write better software and have better tools if you would open up your bag of goodies for everyone to use. . . . DB Source Control/Versioning Tools - This is incredibly painful to do correctly. MSFT needs to step up here and provide a source control solution for SQL Server . . . Incident/Request tracking . . . Packaging (is this part of automated build??) . . . Patch deployment


Wednesday, December 03, 2003

Defensive Programming vs Design by Contract 

From: DonXML Demsak's All Things Techie UML for .NET Developers Model Constraints & The Object Constraint Language
What is the Object Constraint Language?
- A language for expressing necessary extra information about a model
- A precise and unambiguous language that can be read and understood by developers and customers
- A language that is purely declarative – i.e., it has no side-effects (in other words, it describes what rather than how)
. . .
Defensive programming is the practice of making the supplier of an operation check the pre-conditions before executing the body of the method, and throwing an exception if a pre-condition is broken. Notice that in this scheme, the constraints are reversed (they are the NOT of the OCL constraints, since we are interested in when they're NOT satisfied). For this reason – and others – I'm not too keen on the defensive approach…
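The inversion the excerpt describes is easiest to see side by side: the contract states what must hold, while defensive code tests the negation of each pre-condition and throws. A minimal sketch, with an invented `Account` class and invariants (not from the original article):

```python
class Account:
    def __init__(self, balance=0):
        self.balance = balance

    # Contract style (declarative, as documentation):
    #   pre:  amount > 0 and amount <= self.balance
    #   post: self.balance == old balance - amount
    def withdraw(self, amount):
        # Defensive style: check the NOT of each pre-condition.
        if not amount > 0:
            raise ValueError("pre-condition violated: amount > 0")
        if not amount <= self.balance:
            raise ValueError("pre-condition violated: amount <= balance")
        self.balance -= amount
        return self.balance

acct = Account(100)
assert acct.withdraw(30) == 70
try:
    acct.withdraw(1000)          # breaks amount <= balance
except ValueError as e:
    print("rejected:", e)
```

Writing `if not <pre-condition>` keeps the code textually close to the OCL constraint, which softens (though does not remove) the objection about the constraints being reversed.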


Creator of first commercial Java Servlet engine starts XAML project 

From Jeremy Allaire

A former colleague/collaborator, Paul Colton, has just released a new product called Xamlon, which provides a simple XAML implementation on top of .NET 1.1 and the Windows Forms framework.  For those not familiar with Paul, he founded LiveSoftware, and created JRun, the first commercial Java Servlet engine -- he and his team went on to invent what became JSP and JSP Tag Libraries (what they called Dynamic Taglets), and CF_Anywhere, a CFML processor on top of their Java Tag framework.  Paul left Allaire after we acquired LiveSoftware, and has been playing around with a lot of ideas, but this one seems pretty cool! 

XAML is the new XML-based user interface programming language that will be part of the Windows Longhorn release in 2006.  Paul clearly liked XAML and thought that developers would be interested in developing with it (albeit much smaller/simpler in scale and richness) today.  This will be an interesting project to track.

XAML Application Development for Today's .NET Platforms
XAML is a very powerful system, and Longhorn, Microsoft's next generation Windows, is one of the most exciting things to ever come from Microsoft. XAML has one "catch"—it is currently only part of Avalon, which is the presentation layer of Longhorn. Many capabilities of XAML are tied directly to Longhorn, but on the other hand, many of the concepts could, in theory, be applied to today's .NET Framework on today's Windows platforms.


Monday, December 01, 2003

An approach to achieving rich offline and occasionally-connected behaviors 

Mediator vs. Controller I also brushed past an interesting discovery we've made about the duties of a client-side Controller in rich apps: it often speaks with (or encapsulates) a mediator bridge that links a client-side controller to a server-side controller. I like to refer to this as an Application Mediator because, rather than sending Data Transfer Objects from a server-side domain model over to a thin client, in many cases we're talking about distributing the domain model so that it can partially reside on the client, with asynchronous messaging between the tiers for synchronization of the various concerns; this seems to be the successful approach to achieving rich offline and occasionally-connected behaviors. This is a patterns topic worth exploring more deeply.
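The Application Mediator idea above can be sketched in miniature: part of the domain model lives on the client, edits are queued while offline, and an asynchronous flush reconciles them with the server-side copy. All names here are invented for illustration; a real mediator would also handle conflicts and partial failure:

```python
import queue

server_model = {"note:1": "draft"}   # server-side domain model

class ClientMediator:
    def __init__(self):
        self.local = dict(server_model)   # partial client-side replica
        self.outbox = queue.Queue()       # async channel to the server tier

    def edit(self, key, value):
        self.local[key] = value           # works while offline
        self.outbox.put((key, value))

    def synchronize(self):
        # Run when connectivity returns: drain queued edits to the server.
        while not self.outbox.empty():
            key, value = self.outbox.get()
            server_model[key] = value

m = ClientMediator()
m.edit("note:1", "final")
assert server_model["note:1"] == "draft"   # edit not yet propagated
m.synchronize()
assert server_model["note:1"] == "final"
```

The point of the pattern is visible in the two assertions: the client stays responsive against its local replica, and the tiers converge only when the mediator flushes.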


A rule based schema language 

An Introduction to Schematron The Schematron schema language differs from most other XML schema languages in that it is a rule-based language that uses path expressions instead of grammars. This means that instead of creating a grammar for an XML document, a Schematron schema makes assertions applied to a specific context within the document. If an assertion fails, a diagnostic message supplied by the author of the schema can be displayed. One advantage of a rule-based approach is that in many cases the Schematron rules can easily be created by adapting the wanted constraint as written in plain English.


Topicmap representation and facets 

First, I really don't understand on a pragmatic level the distinction you make between information "connected to", "contained inside", or "about" a topic. I've always figured a topic as a binding point, and my maths background tends to make me consider that there is nothing inside a point, so there is nothing (like the) inside of a topic. All the information relevant to a topic (names, roles, occurrences) is somehow "aggregated from outside", the same way information about a point in geometry is "external", like its belonging to such a line or surface, its coordinates in such a frame, and so on. I guess that your philosophical background should make you agree that a topic's fundamental nature is emptiness :)) Bernard Vatant

I don't think that it is correct to use an occurrence of topic T to express metadata about the topic itself, but that it is perfectly valid to use occurrences to express metadata about the subject that the topic is a proxy for. . . . There are properties that express the subject address and subject indicators of a topic. But the values of those properties are not typed. The [occurrences] property of a topic consists of a sequence of typed values. . . . An occurrence consists of a value (string or locator), a type, and a scope. . . . Occurrences are not used to establish identity, and there is no mechanism in topic maps for saying "this property establishes identity" and "this property does not". Kal Ahmed, Techquila

It just seems to me that the nature of the debate here rests with an underlying thought that an is *not* a . It also seems that, among some of us who created XTM in the first place, that can be a , and that, had we added as an alias for none of this discussion would be happening. Jack Park
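Bernard Vatant's "topic as binding point" view lends itself to a small sketch: the topic is a bare point of identity, and everything about its subject (names, occurrences) is aggregated from outside rather than stored "inside" it. The topic id, types, and locator below are invented for illustration:

```python
# A topic is just a point of identity -- nothing "inside" it:
topics = {"puccini"}

# Everything relevant to the topic is attached externally:
names = {("puccini", "Giacomo Puccini")}
occurrences = {
    # (topic, occurrence type, value-as-locator)
    ("puccini", "biography", "http://example.org/puccini.html"),
}

def info_about(topic):
    """Aggregate, from outside, all information bound to a topic."""
    return {
        "names": {n for (t, n) in names if t == topic},
        "occurrences": {(typ, loc) for (t, typ, loc) in occurrences
                        if t == topic},
    }

assert info_about("puccini")["names"] == {"Giacomo Puccini"}
```

Note that, per Kal Ahmed's point, nothing in this structure uses an occurrence to establish the topic's identity; the bare id in `topics` does that, and occurrences only carry typed information about the subject.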


This page is powered by Blogger. Isn't yours?