[sc34wg3] Almost arbitrary markup in resourceData

19 Nov 2003 16:02:08 -0500

On Wed, 2003-11-19 at 11:41, Mason, James David (MXM) wrote:
> See Below.
> 
> Jim Mason
>  
>         Lars Marius:
>         What does make me worried about <baseNameString> is two
>         things:
>         
>           1) our rationale for allowing XML in <resourceData> is that
>              it's equivalent to <resourceRef>, but <baseNameString>
>              really isn't, and topic names have no [locator] property,
>         
>           2) base names are crucial to all kinds of user interfaces,
>              because they provide labels for the topics, and without
>              those you don't really have much of a UI. We can have
>              resources as names for topics (through variants), but
>              having base names as strings ensures that there's always
>              *something* that can be displayed as a mere string.
>         
>              If we allow markup in here that goes out the door. You
>              may have to strip (or, even worse, render) XML markup to
>              be able to label your topics.
>         
>         I'd be interested to hear what people think of this. Should we
>         change our minds and only do this for <resourceData>?
>         
>         
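
To make the "strip" option concrete, here is a minimal sketch in Python of reducing a marked-up name to a plain label. It assumes the name arrives as a well-formed XML fragment; `plain_label` is a hypothetical helper, not part of XTM or of any existing topic map engine.

```python
import xml.etree.ElementTree as ET


def plain_label(base_name_xml: str) -> str:
    """Return the character data of a marked-up name with all tags dropped.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(base_name_xml)
    # itertext() yields the text of the element and its descendants in
    # document order; joining and re-splitting collapses runs of whitespace.
    return " ".join("".join(root.itertext()).split())


name = "<baseNameString>H<sub>2</sub>O <emphasis>heavy</emphasis></baseNameString>"
print(plain_label(name))
```

The stripped label loses the subscript and the emphasis, which is exactly the information the markup was meant to carry.
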
> It was initially baseNameString that I was most interested in
> corrupting. resourceData is of less importance to me. 
>  
> It may be laziness/ignorance on my part, but baseNameString is what I
> choose to display to my users. I see that name (and indeed all names)
> primarily for human consumption. Variants, resourceData, are of less
> interest to me precisely because baseNameString is what drives my UI.
> It's true that I'm working in an environment where I have a lot of
> control, that I never expect to receive or transmit an arbitrary TM,
> so I don't need the fallback of somewhere having a string that's
> guaranteed to be raw text.
>  
> As I've commented elsewhere in this thread, I don't believe in
> arbitrary interchange. I expect there to be an at least implicit DTD
> for all my data. So there's never really "almost arbitrary markup" for
> me, though the markup may come as a surprise to the topic map engine.
>  
> I never believed in name-based merging because, as a linguist, I'm all
> too aware of the variability and fragility of names. 
>  
> Yes, I need to render XML. For me, the primary problem is that there
> are things I need to display that I can't display without additional
> markup. I sometimes need to display more than one paragraph. I need
> subscripts and superscripts. I need (Oh Horror!) the dreaded
> <emphasis> tag. I need things that will require XSLT processing, such
> as generating labels like "Note:". I don't want the topic map engine to
> mess with that stuff, just pass it through to where the user agent can
> do whatever it takes to render the stuff.
>  
> Maybe I'm pushing topic maps too hard, but the projects I have in my
> shop now generally involve creating portals to collections of
> information, and the users want the information displayed in the
> portal to look like the information it's the gateway to. My impression
> is that I'm not alone in this, that Eric, for one, has similar
> requirements.
>  
>         Lars Marius (and Steve N.):
>         | So, if there's markup in a <baseNameString>, and name-based
>         | merging is switched on, on what basis will name matching be
>         | done?
>         
>         The equivalence rule for topic name items. We haven't defined
>         it yet in the presence of markup (will be part of the XML
>         representation proposal), but I think we'll have to base it on
>         Canonical XML. (From what I gathered from Dan Connolly, that
>         seems to be what the RDF folks will do, and for the same
>         reason I propose it: lack of alternatives.)
>         
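
As a rough illustration of what canonicalization-based name matching could look like, here is a sketch in Python. It is not the equivalence rule itself, which is not yet defined. It assumes Python 3.8 or later, whose `xml.etree.ElementTree.canonicalize` implements C14N 2.0 rather than the original Canonical XML; `same_name` and the sample names are made up for the example.

```python
from xml.etree.ElementTree import canonicalize  # available from Python 3.8


def same_name(a: str, b: str) -> bool:
    """Treat two marked-up names as equal if their canonical forms match.

    Hypothetical helper; stands in for an equivalence rule that the
    thread says has not been defined yet.
    """
    return canonicalize(a) == canonicalize(b)


# Same content, but attribute order and quote style differ.
n1 = '<baseNameString>H<sub class="x" id="s1">2</sub>O</baseNameString>'
n2 = "<baseNameString>H<sub id='s1' class='x'>2</sub>O</baseNameString>"
# Same characters, but no markup at all.
n3 = "<baseNameString>H2O</baseNameString>"

print(same_name(n1, n2))
print(same_name(n1, n3))
```

Canonicalization removes purely lexical differences such as attribute order and quoting, but any real difference in markup or whitespace inside the name still makes two names distinct.
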
> As I said, I never liked name-based merging. I'd much rather have
> merging based on some formalized subject indicator. In one of the maps
> I'm currently working on, name-based merging would be absolutely
> disastrous. I'm mapping our products and their parts. Several of our
> products have parts called "apple", but those parts, though named
> identically, are wildly different things. (Yes, I know, I could
> qualify the apples, and indeed I do scope the names according to the
> parent product. But my TMs are generated by scripts from data that I
> don't control, and I've had to go back and generate scopes for names,
> scopes that aren't in the source data, just to protect myself.)
>  
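
The "apple" problem can be sketched in a few lines of Python. This is a toy model, not how any particular engine merges topics: the topic ids and product names are invented, and scope is simplified to a single string where a real topic map would use a set of topics.

```python
from collections import defaultdict


def merge_by_name(topics, use_scope=True):
    """Return groups of topic ids that name-based merging would collapse.

    `topics` is a list of (topic_id, name, scope) tuples; all of it is
    hypothetical sample data.
    """
    groups = defaultdict(list)
    for topic_id, name, scope in topics:
        # Without scope, the bare name string is the only merge key.
        key = (name, scope) if use_scope else name
        groups[key].append(topic_id)
    # Only groups with more than one member represent an actual merge.
    return [ids for ids in groups.values() if len(ids) > 1]


topics = [
    ("part-17", "apple", "product-A"),
    ("part-42", "apple", "product-B"),
    ("part-17b", "apple", "product-A"),
]

print(merge_by_name(topics, use_scope=False))
print(merge_by_name(topics, use_scope=True))
```

With unscoped names all three "apple" parts collapse into one topic; with the generated scopes only the two parts from the same product do.
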
>         LMG and SRN:
>         | I don't like it when things get more complex.  There's gotta
>         | be a damn good reason.  Jim says he has one, and I take him
>         | at his word, but I'd be happier if he would explain why
>         | <variantName> won't meet his needs,
>         | [...]
>         
>         I'd very much like to hear this too. Jim?
>         
> This is all dreadfully complex anyway. I hate making the topic map
> engine have to do any more work than is necessary, but we can't assume
> that topic maps live in splendid isolation from the data they're
> mapping. Real data is messy. I'm spending most of my time now trying
> to unscramble other folks' data to the point where I can reliably run
> XSLT scripts on it to generate TMs that work. I'm about to increase
> the number of system parts in the TM I mentioned above by about an
> order of magnitude. I'm getting this data from multiple sources, some
> of them older than a number of members of SC34. It's really messy. My
> other project, the one I've talked about at Extreme, is an interface
> to a document-management system. When you start talking about
> documents, things get really messy (after all, that's why most of us
> work in SGML/XML and not in HTML). What more can I say? I can't talk
> about TMs just out in TM land. The map is not the territory, but it
> can't be separated from the territory, either. I'm a publisher, not an
> abstract topologist.
>  
> 
>