[sc34wg3] TMQL - Comments against draft dtd 2007-07-13

Robert Barta rho at devc.at
Tue Oct 30 06:43:45 EDT 2007


On Fri, Oct 19, 2007 at 01:45:34PM +0200, Lars Heuer wrote:
> Hi all,
> 
> As I know, the TMQL have nothing special to do this weekend, so here
> some TMQL comments ;)

Ooops, I only realize now that I have not responded to it. Or I have,
and am just too overworked to remember. :-)

> Topic Maps -- Query Language
> ============================
> 
> Namespaces
> ----------
> - TMQL is silent about a mechanism to define namespaces. It
>   uses QNames but it seems to be impossible to define these.
>   (we have some predefined prefixes, but how does the user define
>   one?)
>   TMQL mentions the environment (section 6.3) but wouldn't it
>   be good if the user can define prefixes for a query ad hoc?
>   (see also my comments for section 6.3.)

Yes, EVERYTHING which amounts to ontological information had been
dropped in Oslo:

  - functions (they define a functional relationship between things
    in the application domain)
  - predicates (they define a boolean function between things in the
    domain)
  - and prefixes. They define 'handles' to whole ontologies.

At that point, everything of the above was defined as a topic map and
you could do magic things with it. But this is gone now.

So the only artefact which reminds us that there is something missing
is the 'Environment Clause'

   http://kill.devc.at/system/files/tmql.html#EnvironmentClause

That was discussed in Leipzig. AFAICR, the more conservative
proponents wanted to get rid of the environment clause completely and
replace it with a single, ugly, primitive, adhoc-ish PREFIX
declaration.

The more beautiful, stylish, architectural-thinking, visionary
proponents of the committee argued that all the ontological stuff
should be left to an ontology language. And how much an implementation
offers in this sector should be left open. Only the place (between the
""", or whatever) should be well-defined because of the scope the
declarations have. Extension-proof.

> 4.3. Item References
> --------------------
> - [A] item-reference ::= *
>   CTM uses the wildcard '*' to create a (more or less) anonymous
>   topic. If CTM uses '*' for that purpose, it is confusing if
>   TMQL uses the same syntax to refer to a specific topic which
>   is defined as 'tm:subject'. Either we should blame the CTM
>   editors or the TMQL editors for that confusing overlap
>   (I can imagine the answer from the TMQL-editors ;)).

Yes, this was discussed in Leipzig and also recently offline within
the editors. There is a list of TM features which should have a
similar syntax throughout CTM and TMQL (and hence TMCL).

> 4.4. Navigation
> ---------------
> 
> - [B] step::= >> instances 
>   'instances' is not part of production [18]. Either this axis should
>   be added or the forward direction should be removed, and therefore
>   only '<<' 'types' should be a valid axis.

The way the shortcuts are defined in TMQL is like this:

    "shortcut"  expands to "longer, canonical form"

This is encoded as a transformation in the context of a non-terminal:

    step  ::=    '>>' instances        ==>    '<<' types 
    ^^^^
    non-terminal ^^^^^^^^^^^^^^               ^^^^^^^^^^
                 shortcut                     long form


So this 'shortcut' introduces an 'instances' axis as the reverse
direction of the 'types' axis.
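
As a rough illustration (not taken from the draft), such an expansion
can be pictured as a lookup in a rewrite table keyed by non-terminal
and shortcut. The Python below is only a sketch: the table layout and
the names SHORTCUTS and expand are assumptions made for this example;
only the '>> instances' rule itself comes from the text above.

```python
# Hypothetical rewrite table: (non-terminal, shortcut) -> canonical form.
# Only the '>> instances' rule is from the mail; the rest is illustrative.
SHORTCUTS = {
    ("step", (">>", "instances")): ("<<", "types"),
}


def expand(non_terminal, tokens):
    """Return the canonical form of `tokens` in the context of
    `non_terminal`, or the tokens unchanged if no shortcut applies."""
    key = (non_terminal, tuple(tokens))
    return SHORTCUTS.get(key, tuple(tokens))


if __name__ == "__main__":
    print(expand("step", [">>", "instances"]))
    print(expand("step", ["<<", "types"]))
```

The point of keying on the non-terminal is that the same token
sequence may be a shortcut in one context and ordinary syntax in
another.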

> - I wonder if we can reuse the 'isa' (is instance of) and 'ako' 
>   (a kind of) keywords somehow instead of introducing 'types'
>   and 'supertypes'.

It was felt in Oslo that these axes are mandated by TMDM. The more
beautiful committee members argued that the axes actually should be
(have been) part of TMDM :-)

> - I wonder if we should move those axes where the 'anchor' is ignored
>   to another production. If the 'anchor' is ignored anyhow, why should
>   we allow to specify it? If an 'anchor' is specified where it is
>   ignored, an error would be more helpful than ignoring it silently,
>   IMO.

Yes. Some axes NEED an anchor. Some CAN have one, but provide a
default. Others do not need one; there it is always IGNORED.

And yes, you can rework the productions so that each different step
states whether it needs an anchor or not. This does NOT help the
presentation IMO and adds 10 more productions.
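
One way to picture the three cases without touching the grammar is a
check that runs after parsing. The following Python is a sketch under
assumptions: the names AnchorPolicy, AXIS_POLICY and check_step are
invented here, and the policy assigned to 'players' and
'characteristics' is a guess for illustration, not something the
draft states in this form.

```python
from enum import Enum


class AnchorPolicy(Enum):
    REQUIRED = "required"  # the axis needs an anchor
    OPTIONAL = "optional"  # the axis can take one, else a default applies
    IGNORED = "ignored"    # an anchor is accepted but has no effect


# Hypothetical assignment, for illustration only.
AXIS_POLICY = {
    "characteristics": AnchorPolicy.OPTIONAL,
    "players": AnchorPolicy.OPTIONAL,
    "types": AnchorPolicy.IGNORED,
}


def check_step(axis, anchor=None, strict=False):
    """Return the anchor that is effective for `axis`.

    With strict=True, an anchor on an anchor-ignoring axis is reported
    as an error instead of being dropped silently.
    """
    policy = AXIS_POLICY[axis]
    if policy is AnchorPolicy.REQUIRED and anchor is None:
        raise ValueError(f"axis {axis!r} requires an anchor")
    if policy is AnchorPolicy.IGNORED and anchor is not None:
        if strict:
            raise ValueError(f"axis {axis!r} ignores its anchor")
        return None  # anchor silently dropped
    return anchor
```

The strict flag corresponds to the suggestion of raising an error
rather than ignoring the anchor; the productions stay as they are
either way.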

> If I understand the production correctly, the following
>   statement::
>         
>         person << types
>   
>   is equivalent to this one::
>         
>         person << types <http://www.something.which/is.ignored.but.allowed>
>   
>   The 'anchor' seems to have only a relevance for the following
>   productions:
>   - players
>   - characteristics

Sounds about right. With the current setup, though, we would have the
freedom to add meaning to an anchor easily.

> - 'locators' / 'indicators'
> 
>   - Any chance, that TMQL adapts the CTM syntax for subject
>     identifiers / subject locators?
>     (TMQL uses '=' and '~' as suffix, CTM does not use ~ for subject
>     identifiers at all and uses = as IRI-prefix for subject locators)
>     Is the '~' necessary at all? Why is not every plain old IRI a subject
>     identifier?

This, too, is currently being discussed offline among the editors.

> - 'reifier'
>   - Wouldn't be one tilde enough? Instead of '~~>' we could use '~>'

I like tildes. :-) The more, the merrier. Seriously (if one can be
serious about syntax), as long as it nicely symbolizes a "zoom"
operation (this is what reification in TMs is about) I would be happy.

>   - Why is the forward shortcut ~~> defined, but not a backward shortcut
>     (<~~)?

That would be possible, and it was already raised as an issue:

   http://www.jtc1sc34.org/repository/0827.pdf

   "Add inverse reification shorthand"

As there was no response and I figured that it is of limited use, I
dropped the idea.

For the record, you can introduce reverse directions for _EVERY_
axis.

> - The 'characteristics' axis:
>   - Is the 'anchor' necessary at all? Isn't it possible to use
>     ``tm:occurrence`` and ``tm:name`` (or to be more exact
>     ``tm:topic-name``) as axis and ``tm:characteristic`` to retrieve
>     both: names and occurrences?

You mean you want to introduce two new axes "name" and "occurrence"
instead of the one "characteristics"? That was also discussed in Oslo.

I tried to get rid of characteristics, but what happened in the text
is that everything that was said for "name" was also true for
"occurrence". The two are just too similar not to be covered as one.

The killer, though, was that the whole formal shortcut mechanism broke
down. I would have had to introduce more formalism to make it work
again. I regarded that cost as prohibitive.

> Maybe it becomes more complicated if
>     the user wants a specific type, but if 'isa' could be reused, the
>     following seems to be possible::
>     
>     a)  john >> tm:occurrence [. isa homepage]
> 
>         retrieves all occurrences of type 'homepage'
>     
>     b)  john >> tm:name [. isa nickname]
>         
>         retrieves all nicknames from the topic 'john'

Yup. The reason I tended to

          john >> characteristics homepage

was that (a) an anchor is already available for this control
information and (b) it is much shorter to write in the first place.
Replacing it with your solution would not minimize the number of
concepts. If we got rid of the 'control anchor' altogether,
there would be a gain.
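
To make the comparison concrete, here is a toy model in Python. It is
a sketch only: the Characteristic class, the sample data for 'john'
and both functions are assumptions made for this example, and 'isa' is
reduced to a plain type comparison without any subtyping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristic:
    kind: str   # "name" or "occurrence"
    ctype: str  # e.g. "homepage", "nickname"
    value: str


# Hypothetical characteristics of the topic 'john'.
JOHN = [
    Characteristic("occurrence", "homepage", "http://example.org/john"),
    Characteristic("name", "nickname", "Johnny"),
    Characteristic("name", "name", "John"),
]


def characteristics(items, anchor=None):
    """Model of 'john >> characteristics [anchor]': the anchor, if
    given, restricts the result to characteristics of that type."""
    return [c for c in items if anchor is None or c.ctype == anchor]


def filter_isa(items, ctype):
    """Model of the postfix filter '[. isa ctype]'."""
    return [c for c in items if c.ctype == ctype]


if __name__ == "__main__":
    short = characteristics(JOHN, "homepage")
    long_form = filter_isa(characteristics(JOHN), "homepage")
    print(short == long_form)
```

In this model both forms select the same items; the difference is only
in how much the user has to write and in which concept (anchor or
filter) carries the type.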

>     c)
>         i.  john >> tm:characteristic [. isa whatever]
>         ii. john >> whatever
>         
>         retrieves all characteristics (names *and* occs) of type 'whatever'

Yup.

>     d)  Retrieving all occurrences by foot:
>         
>         john >> tm:characteristic [. isa tm:occurrence]

Yup.

>     e)  Retrieving all names of type 'nickname' by foot:
>     
>         john >> tm:characteristic [. isa tm:name][. isa nickname]
> 
>         or:
>     
>         john >> tm:characteristic [. isa tm:name & . isa nickname]

Yup. But cough :-)

> - While writing this, I wonder if we need these uniform axes at all.

Ad uniformity: I like that for programming.

>   Why do we need the << >> axis incl. keywords, if we'd introduce some
>   specific "axes": ako, isa, <-, ->, ~>, <~ (the last four axes are
>   already part of TMQL). Well, I may be mistaken here, but we could
>   remove some (lengthy) keywords and introduce a dedicated syntax for
>   them.

That would be a lot of syntax (because of the many node types in TMDM). I
preferred to have a middle ground:

  (a) have one canonical syntax
  (b) and introduce shortcuts ad libitum for those situations which
      may appear often

But as we figured in Oslo, this is actually a TMDM discussion: "How
can I get around the info in a TMDM instance?" To define a 'dedicated
syntax for each and every axis and every direction' ... that may not
look too pretty.

TMQL itself does not actually care about the navigation. Here I also
use TMQL on top of TMRM, and then use TMRM navigation, obviously.

> 4.7. Composite Content
> ----------------------
> - I just want to mention, that I find the '==', '++' and '--' infix 
>   operators confusing / not very intuitive
>   - for intersection ('==') I'd use '&'
>   - for union ('++') I'd use '|'
>   - for difference ('--') I'd use '-'
>   
>   Well, it's just syntax and '&' and '|' are already used for 'and' 
>   and 'or', but ... hmmm ... anyway ... I'll move on

Yes, please. :-)

The reason to use double symbols (==, ++, --) is to make it clear that
the operation is operating on whole tuple sequences, not just on one
value. ++ is NOT a set operation, and -- is NOT a set operation,
either.  And == has EXISTS semantics. So better not use familiar
symbols for something with unfamiliar semantics.
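
To show how this differs from set operations, here is one plausible
reading in Python, with tuple sequences modelled as lists of tuples.
This is a sketch under assumptions, not the normative definition: the
function names and the exact treatment of duplicates are guesses made
for illustration, so check the draft for the real semantics.

```python
def concat(left, right):
    """'++' read as concatenation: duplicates are kept."""
    return list(left) + list(right)


def minus(left, right):
    """'--' read as: drop every tuple of `left` that occurs in `right`;
    duplicates among the remaining tuples are kept."""
    right = list(right)
    return [t for t in left if t not in right]


def exists_in(left, right):
    """'==' read with EXISTS semantics: keep each tuple of `left` for
    which an equal tuple exists in `right`."""
    right = list(right)
    return [t for t in left if t in right]


if __name__ == "__main__":
    a = [("x",), ("x",), ("y",), ("y",)]
    b = [("x",)]
    print(concat(a, b))
    print(minus(a, b))
    print(exists_in(a, b))
```

Under this reading, concat(a, b) has five tuples where a set union
would have two, which is the kind of difference the doubled symbols
are meant to signal.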

And what is intuitive or not is like discussing Near East politics. And
there was a time, not too long ago, when people regarded COBOL as
intuitive. Today it's Java.

> - Has someone verified, that the condition (if .. then .. else ..) is
>   unambiguous (even if conditions are nested)?

It should be, because the keywords act as brackets. Also note that the
'else' is NOT optional. That could have been a point of possible
confusion:

if ....
   then ...
     if ...
     then ...
     else ...  # now what?
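
The bracket argument can be tried out on a toy grammar. The Python
below is a minimal sketch, assuming a simplified expression language
of whitespace-separated words in which every 'if' must be followed by
'then' and 'else'; it is not the TMQL grammar, and parse, parse_expr
and expect are names invented for this example.

```python
def expect(tokens, pos, word):
    """Check that tokens[pos] is `word`; return the next position."""
    if pos >= len(tokens) or tokens[pos] != word:
        raise SyntaxError(f"expected {word!r} at position {pos}")
    return pos + 1


def parse_expr(tokens, pos=0):
    """Parse one expression at tokens[pos]; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] in ("then", "else"):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1  # a plain atom
    cond, pos = parse_expr(tokens, pos + 1)
    pos = expect(tokens, pos, "then")
    then_branch, pos = parse_expr(tokens, pos)
    pos = expect(tokens, pos, "else")  # 'else' is mandatory
    else_branch, pos = parse_expr(tokens, pos)
    return ("if", cond, then_branch, else_branch), pos


def parse(text):
    """Parse a complete expression; reject trailing tokens."""
    tokens = text.split()
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree


if __name__ == "__main__":
    print(parse("if a then if b then c else d else e"))
```

In this toy grammar each 'if' consumes exactly one 'else', so an
'else' always belongs to the innermost open 'if', and an input shaped
like the fragment above (two 'if', one 'else') is rejected as
incomplete instead of being parsed in two different ways.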

> 4.10. Topic Map Content
> -----------------------
> - The topic map content is wrapped inside triple quotes ("""), but
>   CTM itself uses triple quotes. Maybe another syntax should be used
>   to wrap topic map content, otherwise the implementer has to
>   count the number of """ to decide if the topic map content block 
>   is closed or if a CTM string is opened / closed.

Yes, they should use the same mechanisms. Not only for this.

>   Additionally, this is bad for syntax highlighters etc.

Really?

> 4.13. Boolean Expressions
> -------------------------
> - see 4.7. :/

I think the choice for & and | is not too far-fetched. But it is
fu**ing syntax, as larsbot likes to put it. :-)

> 6.3. Environment Clause
> -----------------------
> - The environment clause seems to be a string, why does TMQL not adapt
>   the directives from CTM?

Because they do not make any sense in the context of TMQL? The closest
to being useful is the prefix, but, well ... see above.

>   Maybe TMQL could add some directives for
>   setting the default topic map etc.

Isn't there the FROM clause doing that?

---

Thanks for the close-up reading.

\rho

