|TITLE:||Topic Maps -- Reference Model Use Cases|
|SOURCE:||Mr Patrick Durusau; Dr Steven R. Newcomb|
|PROJECT:||WD 13250-5: Information Technology -- Topic Maps -- Reference Model|
|PROJECT EDITORS:||Mr. Patrick Durusau; Dr. Steven R. Newcomb|
|ACTION:||For review and comment|
|DISTRIBUTION:||SC34 and Liaisons|
Dr. James David Mason
(ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada)
Crane Softwrights Ltd.
Kars, ON K0A-2E0 CANADA
Telephone: +1 613 489-0999
Facsimile: +1 613 489-0995
[parid9000] First draft of RM use cases document - PD
[parid9001] Edited by SRN.
[parid9002] Entirely new draft by PD, including material contributed by AW.
[parid9003] Added new first paragraph, changed association to connection (all cases), added new use cases for non-Unicode texts and soundex merging, incorporated comments from Jan, general tightening of language
[parid9004] Added material by Duane Degler.
[parid9005] Edited by SRN.
[parid9006] Edited some of the language in the introduction, Subject Identity and Legacy Data becomes Subject Identity and Data (with internal edits as well), disclosure of merging rules tightened up. PLD
[parid0002] The following use cases have been developed to guide the discussion of requirements for the Topic Maps Reference Model (TMRM). There has been extensive work on, and discussion of, both topic maps and the TMRM, and this document is written against that background. The casual reader is therefore cautioned that terms of art and usage occur without warning or explanation.
|1||[parid0003] No Abstract Model of Topic Maps|
[parid0004] Currently ISO 13250 provides no abstract model of topic maps. The situation is analogous to the path not taken in the early development of airplanes. Without the underlying model that guided the design of the Wrights' airplane, others could copy their work, making airplanes that, like the Wrights' flyer, would really fly -- but only for a few hundred meters. The development of airplanes for diverse practical purposes required a general model of the dynamics of powered flight -- one that could form a basis on which many problems could have many creative solutions. Similarly, the first interchange syntaxes and processing models for topic maps have guided the construction of topic maps that really work. However, by themselves, these syntaxes and processing models provide an inadequate basis for creating and using diverse solutions to the evolving problems confronted by those who create, manage and use human knowledge.
[parid0005] The existing interchange syntaxes and processing models for topic maps reflect particular approaches to the identification of subjects -- specific techniques for determining when two or more topics represent the same subject. Both ISO 13250 and the proposed revisions of it concede that their interchange syntaxes and processing models can be extended, but they provide no guidance for modeling or meaningfully disclosing those extensions, such that meaningful construction and interchange of such topic maps are possible. In the absence of an abstract model for topic maps, it is not possible for vendors and users to extend current syntaxes and processing models in a reliable and interchangeable way.
[parid0096] The TMRM exposes principles on the basis of which diverse designs for topic maps can be expressed, compared, evaluated, and made to work together. This document describes some use cases in which the TMRM is expected to enable solutions to problems that, in the absence of the TMRM, would be more difficult to solve.
|2||[parid0006] Subject Identity Based on Connections|
[parid0008] In topic maps, the connections between topics represent connections between subjects. George may be connected to Laura (his wife), to the US Government (his employer) and to Osama bin Laden (his nemesis). The question raised by this use case is: "How, in the absence of an abstract model for saying so, can we know whether to merge two topics on the basis of their connections to other topics?"
[parid0010] In this particular use case, the Social Security Administration, an agency of the US government responsible for distributing funds to elderly and disabled US citizens, is interested in investigating fraud in claims for payment. In some cases, fraud is committed by persons who claim multiple payments by pretending that they are multiple persons, each with a different "social security number" (a different presumably-unique identifier assigned by the US government). In order to detect this kind of fraud, the Social Security Administration wishes to enable merging of topics that represent individuals on the basis of their connections to other individuals, geographic locations, treating physicians and the types of claims being made.
[parid0012] Topics that represent individual claimants. Each such topic has a property whose value is the claimant's social security number(s).
[parid0013] Topics that represent types of claims.
[parid0014] Topics that represent treating physicians.
[parid0015] Topics that represent geographic locations.
[parid0016] Connections between the above topics that reflect the information known to the Social Security Administration.
[parid0018] The investigator needs to enable merging of topics that represent individuals, despite the lack of equal locator items (as specified in the proposed Topic Maps - Data Model Section 5.4.6 "Properties"), on the basis of connections to the type of claim, the treating physician, and the geographic location topics of both the person and the physician.
[parid0020] The investigator obtains a merger of topics representing individuals who share a connection to a geographic location, together with a connection to a particular physician and a connection to a type of claim. When such mergers occur, there is some possibility of fraud if the resulting merged topic has more than one social security number.
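The connection-based merging described in this use case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a mechanism defined by ISO 13250, the TMDM, or the TMRM: the `Claimant` record, its field names, and the functions `merge_by_connections` and `possible_fraud` are hypothetical, and each topic is assumed to have exactly one location, physician, and claim type.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claimant:
    """Hypothetical claimant topic; every field name is illustrative."""
    ssn: str
    location: str
    physician: str
    claim_type: str


def merge_by_connections(claimants):
    """Merge claimant topics that share location, physician and claim type.

    Returns a dict mapping each shared set of connections to the set of
    social security numbers carried by the resulting merged topic.
    """
    merged = defaultdict(set)
    for c in claimants:
        key = (c.location, c.physician, c.claim_type)
        merged[key].add(c.ssn)
    return dict(merged)


def possible_fraud(merged):
    """Keep only merged topics that carry more than one SSN."""
    return {key: ssns for key, ssns in merged.items() if len(ssns) > 1}


# Made-up sample data, for illustration only.
claimants = [
    Claimant("000-00-0001", "Springfield", "Dr. A", "disability"),
    Claimant("000-00-0002", "Springfield", "Dr. A", "disability"),
    Claimant("000-00-0003", "Shelbyville", "Dr. B", "retirement"),
]
suspects = possible_fraud(merge_by_connections(claimants))
```

In this sketch the social security number plays no part in deciding identity; it is only inspected after the merge, which is the point of the use case.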
|2.5||[parid0021] Business Case|
[parid0022] The approach allows efficient detection of cases in which there is a possibility of collusion of patients and physicians in fraud. The approach depends on having flexibility in how subject identity is determined. (If the Social Security Administration needed to determine subject identity solely on the basis of social security numbers, the TMDM's approach to subject identification would be adequate.) This use case illustrates that there is utility in applying merging rules other than those provided by the TMDM.
|3||[parid0041] Specifying Properties of Topics|
[parid0042] The interchange syntax-based explication of topic maps in ISO 13250 enunciates certain properties for topics, including topic name, occurrence, and association. Some current proposals, such as the TMDM, recognize that 13250's interchange syntax is not intended to constrain the properties of the non-interchangeable topic objects found in implementations. These proposals provide additional properties, but they nevertheless would provide all topics with only a single, specific fixed set of properties, exclusively reserving unto themselves the privilege of defining the properties of topics.
[parid0043] There is nothing particularly sacred about the properties of topics reflected in any interchange syntax or proposed data model. As has been the case for many years, different industries and users employ different notions of subject identity, even when they are processing exactly the same information. Creators of topic maps should have the ability to declare the properties in terms of which they intend their topics to be understood, and their ability to declare such properties should not be constrained by the topic maps standard. Users of topic maps should be free to use the advice provided by their creators, or to ignore it; users should be able to decide for themselves the properties in terms of which they wish to understand topic maps.
[parid0045] In this particular use case, the US Geological Survey (USGS), another agency of the US government, wishes to construct a topic map in which topics have, in addition to names, location properties whose values are expressed in terms of longitude and latitude. As it happens, the USGS does not wish to create topics to represent individual quanta of geographic space; instead, it prefers to understand latitude and longitude values as points in their respective continua. This attitude has implications for subject identity, and therefore for merging, and the USGS needs to understand and explain those implications to itself and to the users of its topic map products.
[parid0047] The USGS wishes to build a topic map that contains topics whose subjects are geographic locations. For each such topic, the following information will be conveyed:
[parid0048] the name of the location
[parid0049] the variant names of the location
[parid0050] the longitude of the location
[parid0051] the latitude of the location
[parid0064] The USGS intends the topic map to be understood in such a way that, when any two topics have the same longitude (within some tolerance), the same latitude (within some tolerance), and any name or variant name in common, the two topics will be regarded as having the same subject (i.e., they will be merged).
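A merging rule of this kind can be sketched as follows. The sketch is illustrative only: the `Location` record, the default tolerance of 0.001 degrees, and the function names are assumptions, not anything specified by the USGS or by the TMRM. It also ignores longitude wrap-around at 180 degrees.

```python
from dataclasses import dataclass


@dataclass
class Location:
    """Hypothetical location topic; field names are illustrative."""
    names: set   # the name plus any variant names
    lon: float   # longitude in decimal degrees
    lat: float   # latitude in decimal degrees


def same_subject(a, b, tol=0.001):
    """True when both coordinates agree within tol and a name is shared."""
    return (abs(a.lon - b.lon) <= tol
            and abs(a.lat - b.lat) <= tol
            and bool(a.names & b.names))


def merge(topics, tol=0.001):
    """Greedy single pass: fold each topic into the first match found."""
    merged = []
    for t in topics:
        for m in merged:
            if same_subject(m, t, tol):
                m.names |= t.names  # merged topic keeps all names
                break
        else:
            merged.append(Location(set(t.names), t.lon, t.lat))
    return merged


# Made-up sample data, for illustration only.
topics = [
    Location({"Washington Monument"}, -77.03530, 38.88950),
    Location({"Washington Monument", "The Monument"}, -77.03531, 38.88949),
    Location({"Washington Monument"}, -100.0, 40.0),
]
result = merge(topics)
```

Matching within a tolerance is not transitive: A may match B and B may match C while A does not match C. A disclosure of this rule would therefore also need to state how such chains are resolved; the greedy pass above is only one possible choice.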
[parid0056] The ability to integrate information about a given specific set of geographic coordinates is just one of the USGS requirements. Another requirement is to be able to respond to queries about identified locations with respect to any set of coordinates. In the data model that the USGS needs to use for its topic maps, longitude and latitude are as much characteristics of its topics as topic names or information locators may be in some other data model.
[parid0058] The TMRM shows how the USGS can enjoy the benefits of Topic Maps without having to surrender the freedom to construct subject identity properties that accurately reflect its own understandings and attitudes with respect to the subjects within its domain (in this case, geographic locations). Because the TMRM establishes the minimum requirements for usefully disclosing such understandings and attitudes, the USGS can provide the users of its topic maps with the option of understanding them exactly as USGS intends them to be understood.
|3.5||[parid0059] Business Case|
[parid0060] The TMRM allows the benefits of creating and using Topic Maps to be realized by diverse user communities, even when their notions about how subjects should be identified are highly specialized, or are themselves subject to change. It allows the topic maps paradigm to be adapted to the attitudes of its users with respect to their knowledge domains, rather than requiring the users to adapt their thinking about the representation of their knowledge domains to the constraints of topic maps. This can significantly reduce the learning curve for new users who are already familiar with a data model of their own. It also maximizes the freedom of topic map creators/maintainers to adapt to changes that occur within their knowledge domains.
|4||[parid0061] Topic Maps and Diverse Information Resources|
[parid0062] One of the listed purposes of ISO 13250 was to provide integration of diverse information resources (both structured and unstructured) through the use of a topic map. The practical requirements for integrating truly diverse resources may, but very likely will not always, be fully satisfied by the properties of topics (and the merging rules based on those properties) that have been proposed as revisions to ISO 13250.
[parid0063] This section, Providing an Integrated View of European and UK Parliamentary Information, written by Ann Wrightson, is one example of such a use case. This material, Copyright 2003 Ann Wrightson, appears here with her permission.
|4.1||[parid0200] Providing an Integrated View of European and UK Parliamentary Information|
[parid0203] At the time of writing, the European Parliament is evaluating Topic Maps as a medium for recording the existence and organization of a range of information assets, and the UK Parliament has decided to adopt RDF for indexing some of its information assets. This use case takes this situation forward into a plausible future scenario where both these illustrious organizations have followed through on these early directions, and have furthermore made substantial collections of their respective information assets available by remote access. These access interfaces include the following capabilities:
[parid0204] Performing a query on the collection's subject index and other metadata, using an RDF or Topic Map query respectively. These queries may return "flat" values, or may return a more or less substantial helping of RDF or Topic Map respectively.
[parid0205] Retrieving individual resources using parameters such as a document ID.
[parid0065] This use case is a high level description of a user interface that gives an integrated view across these two collections, including search and retrieval functions that do not require the user to interact separately with the two collections of information. The researcher is an independent third party.
[parid0208] Researcher wishes to investigate a matter with relevant sources in both UK and European Parliament collections
[parid0209] Researcher has a tool driven by the TMRM - called below the RM-Nav
[parid0210] Each collection includes metadata, for example Dublin Core.
[parid0211] Access to both collections is available, through an interface supporting querying of subject index & metadata, and retrieval of documents.
[parid0212] A "researcher's friend" ontology is available - say in a third technology, X - that provides cross-references between subject headings used in the two domains. (This is called the X-ref ontology below.)
|Action||Software reaction||TMRM Contribution|
|Researcher starts RM-Nav||Following user authentication, RM-Nav presents a browsing interface, including a navigable network of subject headings.||Queries to both collections retrieve their current sets of subject headings, including structure such as hierarchy. These are combined with the X-ref ontology to yield a single structured collection of subject headings. This is called the combined subject index below.|
|Researcher selects a major subject heading||RM-Nav "zooms in" to the part of the network of subject headings that pertains to this major subject.||Filtering of combined subject index by applying hierarchy and/or proximity measure.|
|Researcher navigates through the network to a specific subject term||Navigation interface, followed by list of documents pertaining to the subject term selected. (Assumes that there are a smallish number of documents.)||Supports the formulation of queries (ending up as RDF Queries and TMQL queries that use suitable local subject terms) to retrieve document IDs, and metadata to populate the list of identified documents.|
|Researcher selects a document to read||The document (information resource) is retrieved, and rendered according to the data format recorded in its metadata. Links are provided to a selection of closely related documents.||A selection of closely related documents is identified by combining and filtering information gathered by queries (to both collections) using suitable local terms derived from the selected document's metadata & the combined subject index.|
|Researcher requests information on the creator of the selected document||Presents summary information from the document metadata, plus a suitable interface to relevant documents (across both collections), e.g., a list if few, a navigation interface if many.||A selection of relevant documents is identified by combining and filtering information gathered by queries (to both collections) using the Creator term from the document's metadata.|
[parid0215] Researcher obtained suitable source document, with citation and background on creator.
|4.1.5||[parid0216] Business case|
[parid0217] Effective use of copious published information.
|5||[parid0101] Subject Identity and Data|
[parid0104] The Widget Corporation wishes to use topic maps to access its current sales activities and to plan its marketing strategies. In order to determine subject identity, it wishes to use values returned from its current database. Due to its long presence in international markets, some of the data in question is stored in a variety of encodings, including Unicode, Shift_JIS, EUC-JP, KS C 5601-1992, and others. Numerical data is stored in a uniform encoding, but the encoding of names of personnel, sales territories and other data used primarily by local offices varies by locale.
[parid0108] Subject identity is determined on the basis of values returned from the database. (Note: subject identity is not determined on the basis of pointers to those values.)
[parid0109] Data is stored in a variety of encodings.
[parid0111] Widget Corporation wishes to specify custom rules for determining subject identity based upon actual data and not based upon pointers to that information. Those rules must allow for the matching of data held in various character encodings.
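One way to match values held in different character encodings is to decode each value to Unicode and compare a normalized form. The sketch below illustrates that idea with Python's standard codecs. It assumes that the encoding of each value is already known (for example, from the locale of the originating office); the function name `identity_key`, the choice of NFKC normalization with case folding, and the use of Python's `euc_kr` codec as a stand-in for KS C 5601 data are all assumptions, not requirements stated in this use case.

```python
import unicodedata


def identity_key(raw: bytes, encoding: str) -> str:
    """Decode a stored value and reduce it to a comparable Unicode key.

    The caller must supply the encoding; detecting it is out of scope.
    """
    text = raw.decode(encoding)
    # NFKC folds compatibility variants (e.g. full-width Latin letters);
    # casefold() removes case differences. Both choices are assumptions.
    return unicodedata.normalize("NFKC", text).casefold().strip()


def same_subject(value_a, value_b):
    """Compare two (bytes, encoding) pairs by their decoded keys."""
    return identity_key(*value_a) == identity_key(*value_b)


# The same territory name as it might be stored by three offices.
tokyo_sjis = ("東京".encode("shift_jis"), "shift_jis")
tokyo_euc = ("東京".encode("euc_jp"), "euc_jp")
tokyo_utf8 = ("東京".encode("utf-8"), "utf-8")
```

The stored byte sequences differ in each case, so a comparison of raw values would fail; only a rule that operates on the decoded data can establish that the three values identify the same subject.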
[parid0113] Subject identity based upon data held by the Widget Corporation allows it to capitalize on its existing data, enhanced by the use of topic maps.
|5.5||[parid0114] Business Case|
[parid0115] Without being restricted to pointers to data, Widget Corporation can make effective use of its existing data to determine subject identity and by implication, the merging rules that apply to subjects that are of interest to it. Reuse of data is an important consideration for Widget Corporation due to its long term investment both in the development and maintenance of that data.
|6||[parid0150] Subject Identity and Soundex Matching|
[parid0152] Although telephone companies have used soundex algorithms for years to assist operators, these algorithms have taken on a new importance in the current war on terrorism. The cancellation of flights to the US based upon faulty soundex matching, which resulted in a five-year-old girl being suspected of being a terrorist, is well known.
[parid0153] Even if soundex matching yields unreliable results, the technique is used because it offers at least some advantages in dealing with the generally intractable problem of public security. It appears here as a use case because it is an example of a technique other than string-matching that is used to establish subject identity.
[parid0154] A major provider of security for an unspecified airport wishes to use topic maps to assist in screening passengers who are embarking on both domestic and international flights. Some of the details that are of interest in establishing subject identity are listed as preconditions.
[parid0157] Names of passengers in various non-Unicode encodings
[parid0158] Soundex results of passenger names
[parid0159] Soundex results of suspected terrorist names
[parid0160] Other information deemed relevant to screening passengers
[parid0162] While screening passengers for eventual boarding, the security provider wishes to use soundex matching of names, together with other undisclosed criteria, in order to establish the identities of passengers scheduled to board a particular flight. These criteria are supported by actual substantive data, and not by pointers to such data.
[parid0164] The security provider can utilize data comparison algorithms, like soundex, as part of the process of determining subject identity for airline security.
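For readers unfamiliar with the technique, the following Python sketch shows one common formulation of the American Soundex code. Several Soundex variants exist, and this use case does not say which one the security provider uses, so the sketch should be read as an example of the family and not as the provider's algorithm. Characters outside A-Z are simply ignored here, which is an assumption; names held in non-Latin scripts would first need transliteration.

```python
# Letter groups that share a Soundex digit (American Soundex).
_GROUPS = ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"]
_CODES = {ch: str(d) for d, letters in enumerate(_GROUPS, start=1)
          for ch in letters}


def soundex(name: str) -> str:
    """Return the four-character Soundex code of name, or '' if no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue              # H and W do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code               # a vowel resets prev to ""
        if len(result) == 4:
            break
    return "".join(result).ljust(4, "0")


def names_match(a: str, b: str) -> bool:
    """Treat two names as candidates for the same subject."""
    return soundex(a) != "" and soundex(a) == soundex(b)
```

Because many distinct names share a code, a match can only nominate candidates for merging. This is consistent with the unreliability noted above and with the provider's wish to combine soundex results with other criteria.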
|6.5||[parid0165] Business Case|
[parid0166] The need to utilize a variety of means to evaluate subject identity, in the very real sense of who is going to board a commercial aircraft, or to enter a secure location, cannot be doubted. Those charged with providing that security should have the means to adapt subject identity, in the topic maps sense, to that task as they see fit.
|7||[parid0023] Disclosure of Merging Rules|
[parid0025] The prior use case on merging of Social Security claim records is certainly allowable under both ISO 13250 and under current proposals for revising ISO 13250. However, neither the current standard nor any proposal (other than the TMRM) provides for the disclosure of such variant merging rules.
[parid0027] In this particular use case, the Social Security Administration, an agency of the US government, wishes to evaluate new topic map software for use in its fraud detection unit. It has no knowledge of any merging rules that were customized as part of its current topic maps application.
[parid0029] The Social Security Administration informs the new vendor that it has the following topics stored in its topic map:
[parid0030] Topics that represent individual claimants.
[parid0031] Topics that represent types of claims.
[parid0032] Topics that represent treating physicians.
[parid0033] Topics that represent geographic locations.
[parid0034] Associations among the various topics.
[parid0066] Further, the new vendor is allowed to observe the operation of the current system, including the inputs and outputs of proposed merger operations.
[parid0036] The Social Security Administration, before choosing a new vendor, wishes to have both assurances and a confirming independent evaluation that the new software will precisely duplicate the current system's functionality with respect to the merging of topics.
[parid0038] The vendor is able to provide a formal specification that claims rigorously (and legally actionably) the ability of the new software to duplicate the behavior of the present system. The proposed new software's conformance to the specification can be independently verified.
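The idea of a disclosed, independently verifiable merging rule can be illustrated with a small Python sketch. Everything in it is hypothetical: the TMRM does not prescribe this representation, and the rule format (a tuple of property names that must all be equal), the topic dictionaries, and the function names are invented for illustration. The point is only that a rule expressed as data can be replayed against the inputs and outputs observed on the current system.

```python
from collections import defaultdict

# Hypothetical disclosed rule: two topics merge when all of the
# listed properties have equal values.
DISCLOSED_RULE = ("location", "physician", "claim_type")


def apply_rule(rule, topics):
    """Return the merge groups (sets of topic ids) that the rule produces."""
    groups = defaultdict(set)
    for topic in topics:
        key = tuple(topic[prop] for prop in rule)
        groups[key].add(topic["id"])
    return {frozenset(ids) for ids in groups.values()}


def conforms(rule, topics, observed_groups):
    """Check the rule against merges observed on the current system."""
    return apply_rule(rule, topics) == {frozenset(g) for g in observed_groups}


# Made-up observation of the current system: inputs and resulting merges.
observed_inputs = [
    {"id": "t1", "location": "X", "physician": "A", "claim_type": "d"},
    {"id": "t2", "location": "X", "physician": "A", "claim_type": "d"},
    {"id": "t3", "location": "Y", "physician": "B", "claim_type": "r"},
]
observed_merges = [{"t1", "t2"}, {"t3"}]
rule_is_confirmed = conforms(DISCLOSED_RULE, observed_inputs, observed_merges)
```

An independent evaluator holding the same disclosure and the same observations could run the same check against the new vendor's software, without access to the internals of either system.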
|7.5||[parid0039] Business Case|
[parid0040] Without disclosure of merging rules and the objects in topic maps affected by such merging rules, vendors will be unable to provide meaningful assurances to clients that their software will duplicate or exceed current capabilities. Customers will be unable to undertake meaningful assessments of the risks involved in changing from one topic maps software vendor to another.
|8||[parid0067] Access to Information from Multiple Sources, Preserving Context|
[parid0068] The following is a use case description drafted by Duane Degler on March 6, 2004. Copyright 2004 Duane Degler, email: email@example.com. This material may be copied and redistributed provided that the copyright notice and author's e-mail address are included on all distributed copies.
|8.1||[parid0069] Access to Information from Multiple Sources, Requiring Context|
[parid0071] There are many cases where information relevant to a user's task is the responsibility of more than one organizational entity - whether that is two or more departments within an organization, or two or more separate organizations. This is particularly true in government, because activities are governed by agreements between government agencies/departments in situations where jurisdictional boundaries are crossed in the completion of a task or activity (see the scenarios at the end of this document for two examples of this). Data exchange agreements will exist for managing the transactional data, but the process of a user entering that data also requires access to content that may not fall under the agreement, as it may not be considered "structured data" in application terms.
[parid0072] In a situation where accessing information is incidental to the user's main task, the software application being used may take on the responsibility for accessing required information (e.g. policy, instructions, or data contributing to task completion) as a background activity.
[parid0074] Data provider, Data entry application, Local information source, Remote information source.
Note: For convenience, "Organization A" denotes the organization responsible for the data entry application and the local information source, and "Organization B" denotes the organization responsible for a remote information source.
[parid0077] Organizations A and B have information sharing agreements spanning the scope of the task supported by the data entry application.
[parid0078] Organizations A and B have a model for describing and mapping the information resources, and expose these maps to the data entry application.
[parid0079] The user (data provider) initiates a request for information at a particular point while working in the data entry application.
|User Action||Role of TM|
|While using the data entry application, user requests assistance and clarification about the data being entered at a particular point in the data entry process.||Protocol for packaging the request in a standard form for exchange between applications, exposing whatever knowledge the data entry application has about what data the user is working on, the nature of the information being sought, and how it processes its own information associations.|
| ||Representing the information sources' (local and remote) maps. Merging or associating with topics disclosed by the data entry application. Disclosing what processing was undertaken to manage the maps involved in the request.|
| ||Merge or derive association inferences from among the various responses. Identify and provide or assist in provision of occurrence references. Support categorization and presentation of information based on topics known to have participated in the request.|
|User sees some policy information presented directly, and/or sees a list of specific references relevant to the initial request.|| |
[parid0082] User is presented with relevant documents and data based on the activity being performed, with little or no need to perform further search or query activity to refine the information received (i.e. access to the remote information source is more specific than just a "home page" or table of contents).
[parid0084] None at this time.
|8.1.7||[parid0085] Related Scenarios|
|8.1.7.1||[parid0086] Scenario: Policy query when providing financial data to a government agency|
[parid0087] Data about an individual's earnings is collected by one government agency. The individual's employer provides this data (playing the Actor role of data provider). The data is transmitted by the agency that captures the information to two other agencies that store and use the data to support their service missions. Policy and guidelines defining the nature and quality of the data provided are established by all three agencies, based on their jurisdictional responsibilities. Each of them is individually responsible for publishing and maintaining policy and guidelines information.
[parid0088] While the user (data provider) is entering information, a question arises about how particular data should be itemized. In order for the data entry application to access the relevant supporting information, it must locate information that is appropriate to the application being used, the task being performed within that application, the particular activity being carried out (at either function or field level), and the conditions particular to the data being entered (classification of employer organization, classification of employee, financial threshold/range, exception conditions).
[parid0089] Goal: Immediate access to only the relevant policy and guideline information directly from within the data entry application.
[parid0090] Expected outcome: Presentation of a list of reference paragraphs/documents to the user, with supporting categorization information to help the user make an informed selection.
|8.1.7.2||[parid0091] Scenario: Asking for information about an organization for security review|
[parid0092] A user (playing the Actor role of data provider) wants to get data and supporting information about security considerations relating to a potential supplier of services, in advance of a meeting that the user will be having with that organization. The security criteria for that organization, and supporting guidelines for the user, are held by more than one government agency/department. In order to frame the request, the user needs to enter some data about him/herself and about the organization in question.
[parid0093] The data entry application submits a request to a network of information providers, requesting both specific data and supporting policy information.
[parid0094] Goal: Access to the data and relevant policy information.
[parid0095] Expected outcome: Presentation of data to the user, along with a list of reference paragraphs/documents, with supporting categorization information to help the user make an informed selection. Information is presented in sets associated with the details of the security data provided. Clear reasons and policies are presented to the user about information that could not be accessed based on security criteria that were not met by that user's particular profile.
|9||[parid0300] Conclusion: A Procrustean Bed of Subject Identity?|
[parid0301] One of the common features of all the use cases described above can be stated as follows: Subject identity, and the properties considered by users to define it, do not always fit into predefined categories or even the notion of discrete property values. Subject identity is often, if not more often than not, a matter of values that lie upon a range of values that a user considers to represent the same subject.
[parid0302] Consider the use case of the USGS, which wishes to regard longitude and latitude data as the subject-defining properties of topics that represent geographic locations. This demonstrates that not all topic characteristics for determining subject identity consist of discrete values. For some subjects, those values may lie anywhere along a user-defined continuum. The handling, if any, of such values in merging operations must also be user-definable.
[parid0304] The characteristics of a subject that define its identity, and the rules for merging topics on the basis of those characteristics, must be declarable by standard means in the Topic Maps standard. To do otherwise belies the claim that:
[parid0100] In the most generic sense, a 'subject' is any thing whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever. (ISO 13250 (2002), 3.18 subject)
[parid0116] A subject is anything whatsoever whose identity can be described by...
[parid0305] The second common feature of these use cases is the need for a means for users to disclose the choices they made in determining the characteristics that govern subject identity, i.e., the rules they established for merging of topics. Since no predefined set of characteristics is sufficient for determining the identity of every subject in every circumstance, it stands to reason that, at least in the general case, topic map information cannot be meaningfully exchanged without disclosing the characteristics and rules that govern identity in a particular topic map instance.
[parid0306] These use cases illustrate the need for a model of topic maps that allows diverse users to define subject identity and merging rules for subjects in their diverse domains and contexts. The model must be flexible enough to allow subject identity to be understood in terms of the inherent properties of the topic, or in terms of the topic's relationships to other topics. The model must provide for disclosure of such design choices that is sufficiently rigorous that, whenever the same topic map information is understood in terms of the same disclosure, it is understood to mean the same thing, and it is interpreted in the same way.
[parid0099] Any single way of interpreting a single interchange syntax, or any single set of topic properties, or any single set of merging rules, can, within its limitations, enable topic map interchange. However, every such thing is necessarily also by itself a procrustean bed for subject identity, and, by itself, is insufficient to serve the stated scope of ISO 13250.