SGML document introducing you to SGML

Please scroll down and read the "Document Instance" section of this page first.

To see what the document instance looks like when passed through a simple reformatting process, which translates the marked up SGML text into a normal web page, click here.

To read another brief essay introducing the idea of SGML and why it is important: ...*Darwin Among the Machines* (Susanne Langer and SGML), click here.

In addition to the billions of HTML pages on the World Wide Web, there are also some native SGML pages on the web (e.g., American Civil War period literary archives at The University of North Carolina). Up until about August, 1998, SoftQuad Corporation freely distributed trial versions of a Netscape SGML viewer plugin, which enabled anyone to view native SGML Internet pages in a Netscape web browser (read the historic document in which Yuri Rubinsky announced this software!).

Then SoftQuad sold its SGML product line to Interleaf, and I don't know if there is now any way to get free software to view native SGML Internet pages. If you can find a copy of Yuri Rubinsky and Murray Maloney's book, SGML on the WEB: Small steps beyond HTML (Prentice Hall, 1997), it includes a compact disk (CD) with Panorama Pro 2.0 software, which not only displays existing SGML pages but also enables you to create and format your own. This software is worth far more than the price of the book, but the book seems at present (October, 1999) to be out of print. The actual content of the book (how to build SGML pages for web publication) is also very important but, unfortunately, it describes a great vision of a future for the Internet which now will never happen. (For a little more information about SoftQuad's divestiture of its SGML products, click here.)

For various reasons, SGML seems to have become what one of the SGML community's own newsletters has called "a dead language, like Latin". But SGML has been reborn as XML (eXtensible Markup Language), with an enthusiasm in the computer industry that SGML itself never achieved. To read some thoughts about XML, including a polemic against XML, and second thoughts I had after attending the GCA XML 98 conference (14-20 Nov 98, Chicago), click here. (To examine some XML pages I have developed, which can be viewed with Microsoft Internet Explorer 5 or newer, click here.)

Best wishes! Thank you.


Document Type Definition (DTD)


<!-- Note: this line is an SGML comment -->
<!-- Note that all SGML markup is human-readable: no unprintables -->

<!DOCTYPE heuristic [ 
<!ELEMENT heuristic    - - (purpose, inote?, realization, docinfo)>
                              <!-- The "top level" element for a -->
                              <!-- document structured according -->
                              <!-- to this DTD is: "heuristic".  -->
                              <!-- The "heuristic" element must  -->
                              <!-- contain exactly one "purpose" -->
                              <!-- element, optionally followed  -->
                              <!-- by an "inote", obligatorily   -->
                              <!-- followed by one "realization" -->
                              <!-- element, followed by one      -->
                              <!-- "docinfo" element, and        -->
                              <!-- nothing else!                 --> 
<!ATTLIST heuristic 
            version  CDATA  #FIXED "1.52" 
            security (unrestricted | internal.use | confidential |
                               need.to.know) #REQUIRED >

<!ELEMENT purpose      - - (#PCDATA) >  
                              <!-- "#PCDATA" means: parsable     -->
                              <!-- character data, i.e., text    -->
                              <!-- in which the "<" character    -->
                              <!-- is treated as a start-of-     -->
                              <!-- tag character, etc.           -->
                                        
<!ELEMENT inote        - - (#PCDATA) >
<!ELEMENT realization  - - (point+) >   <!-- a "realization"     -->
                              <!-- consists of 1 or more "points"-->

<!ELEMENT point        - - (title, text+) >  <!-- a "point"      -->
                              <!-- must contain one "title",     -->
                              <!-- followed by 1 or more "text"  -->
                              <!-- elements                      -->

  <!ELEMENT title      - - (#PCDATA) >    <!-- a "title" element -->
                              <!-- can contain only characters,  -->
                              <!-- with no further embedded      -->
                              <!-- elements                      -->

  <!ELEMENT text       - - (#PCDATA) +(quote | institution | ruler | 
                               list | product | link | country) >
                              <!-- a text element can contain    -->
                              <!-- zero or any number of quote,  -->
                              <!-- institution, ruler, list...   -->
                              <!-- elements, as well as          -->
                              <!-- character text....            -->

<!ELEMENT docinfo      - - (url, author, auth.eaddr, rev.date) >
<!ELEMENT url          - - (server, account, page.id) >
  <!ELEMENT server     - - (#PCDATA) >
  <!ELEMENT account    - - (#PCDATA) >
  <!ELEMENT page.id    - - (#PCDATA) >
<!ELEMENT author       - - (given.name,sur.name,credential*) >
  <!ELEMENT given.name - - (#PCDATA) >
  <!ELEMENT sur.name   - - (#PCDATA) >
  <!ELEMENT credential - - (#PCDATA) >
<!ELEMENT auth.eaddr   - - (#PCDATA) >
<!ELEMENT rev.date     - - (rev.yr, rev.mo, rev.da) >
  <!ELEMENT rev.yr     - - (#PCDATA) >
  <!ELEMENT rev.mo     - - (#PCDATA) >
  <!ELEMENT rev.da     - - (#PCDATA) >

<!ELEMENT quote        - - (#PCDATA) +(quote | institution | 
                               product | ruler | country) >
<!ATTLIST quote 
            type     (copied.text | hearsay | memory | ipse.dixit | 
                               idiom | fictive | text.as.object | 
                               other) #REQUIRED
            source   CDATA  #IMPLIED >

<!ELEMENT link         - - (#PCDATA) >
<!ATTLIST link
            tgt      CDATA  #REQUIRED > 
 
<!ELEMENT institution  - - (#PCDATA) >
<!ATTLIST institution
            type     (corp | school | other) #REQUIRED >
<!ELEMENT product      - - (#PCDATA) >

<!ELEMENT ruler        - - (#PCDATA) >
<!ATTLIST ruler
            type     (king | prince | queen | president | 
                               prime.minister | other) #REQUIRED >
<!ELEMENT country      - - (#PCDATA) >
<!ELEMENT list         - - (item+) >
  <!ELEMENT item       - - (#PCDATA) +(quote | institution | 
                               product | ruler | list) >

<!ELEMENT i            - - (#PCDATA) >
 
<!ENTITY % THINK  "include" >
<!ENTITY % TAGS   "ignore" >
<!ENTITY % NOTAGS "include" >
<!ENTITY   lt     "&#060;" >

<?STYLESPEC "Style1" "heuristic.ssh" >
]>
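
The element declarations above are content models: each one states which child elements may appear inside a parent element, and in what order. As a rough illustration (not a description of how a validating parser such as SP works internally), the Python sketch below hand-translates four of those content models into regular expressions over child-element names. The names CONTENT_MODELS and children_allowed are invented for this example, and the sketch ignores attributes, #PCDATA, and the +(...) inclusion exceptions.

```python
import re

# Content models copied by hand from the DTD above, rewritten as regular
# expressions over a string of child-element names, where every name is
# followed by exactly one space.
CONTENT_MODELS = {
    "heuristic": r"purpose (inote )?realization docinfo ",
    "realization": r"(point )+",
    "point": r"title (text )+",
    "docinfo": r"url author auth\.eaddr rev\.date ",
}


def children_allowed(parent, children):
    """Return True if the list of child names fits the parent's content model."""
    sequence = "".join(name + " " for name in children)
    return re.fullmatch(CONTENT_MODELS[parent], sequence) is not None


# "inote" is optional, so it may be left out ...
print(children_allowed("heuristic", ["purpose", "realization", "docinfo"]))
# ... but the order of the remaining elements is fixed.
print(children_allowed("heuristic", ["purpose", "docinfo", "realization"]))
# A "point" needs a title followed by at least one "text".
print(children_allowed("point", ["title"]))
```

The first call prints True and the other two print False, which matches the comments attached to the "heuristic" and "point" declarations.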

Document Instance

<!-- Note: this line is an SGML comment -->
<heuristic security="unrestricted">

<purpose>This page aims to introduce you to <![%THINK;[SGML]]>
</purpose>  <!-- Note: the slash ('/') is SGML's "end-of-element" -->
            <!-- indicator.  E.g., the /purpose tag ends the      --> 
            <!-- purpose element, and the text between the        --> 
            <!-- two tags is the content of the purpose element   -->

<![%NOTAGS[<inote>When this document is formatted for display 
as an HTML web page, with all the SGML structure-describing tags 
reduced to HTML style-descriptors (with, in some cases, the tagging 
just being discarded and not translated into anything...), it 
looks nicer, but information is lost.</inote>]]>

<realization>
<point>
<title>What is the idea of SGML?</title>
<text>The idea of SGML is to thematize the structure of text, by
adding explicit structure descriptors -- markup tags -- to the text.  
For example, if we wish to communicate something about a product, 
instead of saying something like: "Netscape's Navigator is not 
related to Portugal's Henry the Navigator", we can say: 
<quote type="fictive"><institution type="corp">Netscape</institution>'s 
<product>Navigator</product> is not related 
to <country>Portugal</country>'s <ruler type="prince">Henry the 
Navigator</ruler>.</quote> <![%NOTAGS;[Note that, in the 
preceding sentence, when the structure tags are reduced to simple 
HTML page formatting, the tagged up sentence looks pretty much
the same as the untagged plain text version.]]></text>
<text>Texts, like physical buildings, can either be products of habit 
and custom (<quote type="idiom">vernacular architecture</quote>), or 
be products of conscious, self-accountable creativity and critique 
(<institution type="school">The Bauhaus</institution>, structural 
engineering, etc.). I hypothesize SGML may portend 
a <quote type="memory" source="Henry Adams">change of phase</quote> 
in our relation to language, such as previously was effected by 
alphabetic writing and uniform printed editions.  Among other things, 
SGML is what <quote type="idiom">diagramming sentences</quote> 
in high-school English should have been but wasn't: SGML realizes, 
as effective and proactive social activity, and not merely as 
social-scientists' dissociated theorizing, a notion 
of <quote type="idiom">generative grammar</quote>!    
</text></point>

<point>
<title>What is the value of adding explicit structure descriptors 
       to text?</title>
<text>The rewards of adding explicit structure descriptors to text 
are many, including:
<list>
<item>Stimulating reflection on the logical structure of what one 
  is saying, leading to:
    <list>
    <item>Better style: Making what one is trying to say clearer 
      to the persons who will read and try to understand it</item>
    <item>Clearer thinking: Forcing oneself to become clearer 
      about what one is trying to say</item>
    <item>Discovery: Discovering things about both the content and 
      the form of expression which one had not previously thought 
      of, through the process of articulating what one thinks 
      one wants to say</item>
    </list></item>
<item>Facilitating computer programs to process the text (it's much 
  easier for a computer program to find all the references to rulers 
  of countries if they are all labelled something like: 
  <quote type="text.as.object">&lt;ruler type="..."> ... &lt;/ruler></quote>,
  than if their names simply appear -- like the reference to Henry 
  the non-Netscape Navigator in this sentence -- as undistinguished 
  sub-strings of homogeneous character strings).</item>
</list>
</text></point>

<point>
<title>Once you've "tagged up" your text with these structural 
       descriptors, what can you do with it?</title>
<text>Adding structural descriptors to text is the first step of what, 
in general, is an at least two-step process.  Persons do not generally 
directly read the tagged text<![%TAGS;[ (e.g., what you are here 
reading now...)]]>.  What generally happens is that one also 
writes a <quote type="idiom">style sheet</quote>, which a computer 
program reads, along with the tagged text file: The computer program 
then generates a formatted document, as paper printout, online web 
pages [<link tgt="heuristic.html">see example</link>], spoken text 
for the blind, etc.</text>
<text>One of the benefits of SGML is that, once you have produced 
your tagged text document, which, in a sense, has no format (other 
than being a human- and computer-readable ASCII text file), anyone 
can generate output in any desired form, by simply writing different 
style sheets for appropriate processing programs.  Example: 
In a hospital, insurance providers, oversight groups, doctors 
and nurses, patients themselves, and others perhaps not yet 
foreseen, may wish to examine patient records in all 
sorts of different ways.  If the hospital maintains its patient 
records in SGML, then everybody can put the data into the form 
they want, without the hospital having to do anything except make 
the raw source data available.  If a doctor wishes to see patient 
information on a palm-top computer, the data can be transformed 
by a web browser (or browser plug-in, such 
as <institution type="corp">SoftQuad</institution>'s <product>Panorama 
Viewer</product>), into display pages, even selecting only the 
particular kinds of data of interest to the doctor (e.g., the course
of symptoms, but not billing information).  An insurance provider, 
on the other hand, can download the billing information over a 
high-speed data link, and either process it as-is, or run it through 
a program which loads it into their database management system (DBMS).   
</text></point>

<point>
<title>What is an SGML document like, exactly?</title>
<text>You have been looking at an SGML document all the time you 
have been reading this.  An SGML document consists of two parts:
<list>
<item>A document type definition (DTD).  A DTD defines the
  pattern for a certain kind of document.  It specifies what  
  kind of elements can exist in a document coded according to  
  its logical form, which of those elements can be contained 
  in which other elements, and the order in which different
  elements must occur.  The present document's
  DTD [<link tgt="#DTD">see above</link>] specifies that
  quotes can appear in text elements, but not (e.g.) in titles 
  or in the document's meta-document information (the 
  <quote type="text.as.object">docinfo</quote> element).
  On the other hand, an author element can only occur inside
  a docinfo element, there must be one and only one author
  element, and it must immediately follow a url element 
  in the docinfo block....</item>
<item>The document instance itself, i.e.: the text, tagged to 
  explicitly articulate its structure according to the  
  specified document type definition.</item>
</list>
Note that a given document type definition (DTD) can be used as a 
template for an indefinite number of document instances (to 
facilitate this, the DTD is generally kept in a separate computer 
file, apart from the document instances which use it).</text>
<text>Also note that, if, as is generally done, one edits 
one's document using an SGML-aware text editing program (such 
as <institution type="corp">ArborText</institution>'s 
<product>AdeptEditor</product>), the text editor can make sure the 
document instance conforms to its type, by not permitting entry of 
elements (tags) except where they are allowed.  Conversely, such an 
editor can aid the writing process -- possibly even helping 
overcome writers' blocks -- by telling the writer what kinds of 
items can be entered at any place in the document being edited.
</text></point>

<point>
<title>But isn't SGML a straitjacket, then?</title>
<text>No! SGML is not like a straitjacket (or even like dictionaries 
and grammar books), because you can create your own document type 
definitions (DTDs), and, as you write your documents according to 
a DTD you have created, if you find it doesn't let you do what you 
want the way you want to do it, you can change the DTD to make it 
more suited to your purposes.  A side-effect of this process is that, 
because every change you make must be explicitly declared in your 
DTD, you always have an up-to-date record of your conception of your 
document's structure, without having to make any extra effort to 
write extrinsic documentation.  A well-written SGML document 
largely documents itself!
</text></point>

<point>
<title>What's the difference between SGML, HTML, and XML?</title>
<text>HTML looks like SGML, but HTML tags are mostly style 
descriptors rather than structure descriptors.  Example: the
HTML <quote type="text.as.object">&lt;b></quote> tag says:
make the following text appear in bold-face.  The HTML 
<quote type="text.as.object">&lt;address></quote> element, on 
the other hand, is an example of a structure-describing tag: it 
doesn't prescribe what the text it contains is supposed to look 
like, but rather it articulates what that content means.</text>  
<text>XML is a simplified, <quote type="idiom">dumbed-down</quote> 
variant of SGML. XML tags should generally be structure-describing, 
but SGML is functionally much richer than XML.  Also, XML does not 
require an explicit document type (document structure) specification 
(DTD).  This makes XML easier to code, but it also provides less 
motivation and assistance to think about the structure of one's 
document, since, in an XML document instance, you can make up new 
(or inconsistent) elements as you go along, e.g., calling a 
quotation here a: <quote type="text.as.object">&lt;quote></quote>, 
there a: <quote type="text.as.object">&lt;quotation></quote>,  
elsewhere a: <quote type="text.as.object">&lt;quot></quote>, etc.
</text></point>  

<point>
<title>Conclusion</title>
<text>To paraphrase Martin Luther: 
    <quote type="memory">A mighty fortress is our SGML!</quote>    
</text></point>
</realization>

<docinfo>
<url><server>www.cloud9.net</server><account>bradmcc</account>
    <page.id>WhatIsSGML.html</page.id></url>
<author><given.name>Brad</given.name><sur.name>McCormick</sur.name>
    <credential>Ed.D.</credential></author>
<auth.eaddr>bradmcc@cloud9.net</auth.eaddr>
<rev.date><rev.yr>1999</rev.yr><rev.mo>06</rev.mo>
    <rev.da>01</rev.da></rev.date>
</docinfo>
</heuristic>
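
Two claims made in the document instance above can be tried out in a few lines of Python: that tagged rulers are easy for a program to find, and that one tagged source can be formatted differently by different style sheets. This is only a sketch under stated assumptions. The example sentence happens to be well-formed XML, so Python's standard xml.etree.ElementTree module can read it; a full SGML document (with its DTD, marked sections, and possible tag omission) would need a real SGML parser. The style tables HTML_STYLE and PLAIN_STYLE and the functions find_rulers and render are invented here and are not taken from heuristic.ssh, heuristic.pl, or heuristic.xom.

```python
import xml.etree.ElementTree as ET
from functools import partial
from html import escape

# The example sentence from the first "point", as a single element.
SENTENCE = (
    '<quote type="fictive"><institution type="corp">Netscape</institution>\'s '
    '<product>Navigator</product> is not related to '
    '<country>Portugal</country>\'s <ruler type="prince">Henry the '
    'Navigator</ruler>.</quote>'
)

# Hypothetical "style sheets": tag name -> (text before, text after).
# Tags that are not listed are discarded, but their content is kept.
HTML_STYLE = {"quote": ("<q>", "</q>"), "product": ("<i>", "</i>"),
              "ruler": ("<b>", "</b>")}
PLAIN_STYLE = {"quote": ('"', '"')}


def find_rulers(root):
    """Return (type, name) for every ruler element at or below root."""
    return [(el.get("type"), el.text) for el in root.iter("ruler")]


def render(element, style, escape_text=str):
    """Format an element and its descendants according to a style table."""
    before, after = style.get(element.tag, ("", ""))
    parts = [before, escape_text(element.text or "")]
    for child in element:
        parts.append(render(child, style, escape_text))
        parts.append(escape_text(child.tail or ""))  # text after the child
    parts.append(after)
    return "".join(parts)


root = ET.fromstring(SENTENCE)
rulers = find_rulers(root)
as_html = render(root, HTML_STYLE, partial(escape, quote=False))
as_plain = render(root, PLAIN_STYLE)

print(rulers)
print(as_html)
print(as_plain)
```

A plain substring search for "Navigator" in the untagged output matches both the product and the prince, whereas the element query returns only the ruler.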

Go back to explanation at top of this page.
The fine print: What's really going on here? I wrote a real .sgm file, which contains the text the main body (the DTD and document instance sections...) of this page displays (heuristic.sgm). That file is a valid SGML document instance (it passes a validating parse with James Clark's SP). I run that file through a Perl script (heuristic.pl), to generate the present page. I run the same "source" file [pre-pended with an SGML declaration, sgmlset.txt, to allow element names longer than 8 characters...], through an OmniMark down-translate script (heuristic.xom), to produce the formatted HTML version (heuristic.html). I also have run it through Panorama Publisher, and produced a stylesheet (heuristic.ssh), to enable Panorama -- and conformant -- web viewers to directly display the SGML text (the original: heuristic.sgm), nicely formatted, in a web browser. The process of producing these pages exemplifies what I describe in the text as the ability SGML provides to: write once and format / process everywhere! <![%THINK;[SGML]]>
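
The <![%THINK;[SGML]]> construct that ends the paragraph above, and that appears several times in the document instance, is an SGML marked section. The parameter entities THINK, TAGS, and NOTAGS declared in the DTD supply the keyword: a section whose keyword is "include" is kept, and one whose keyword is "ignore" is dropped. The Python sketch below imitates that behaviour with a regular expression. It is a simplification for illustration only: it assumes marked sections are not nested, it handles only the include and ignore keywords, and the names MARKED_SECTION and resolve_marked_sections are invented here (heuristic.pl may work quite differently).

```python
import re

# Parameter entity values as declared in the DTD on this page.
PARAMETER_ENTITIES = {"THINK": "include", "TAGS": "ignore", "NOTAGS": "include"}

# Matches <![%NAME;[ ... ]]> ; the ";" after the entity name is optional,
# as in the <![%NOTAGS[ ... ]]> section of the document instance.
MARKED_SECTION = re.compile(r"<!\[\s*%(\w+);?\s*\[(.*?)\]\]>", re.DOTALL)


def resolve_marked_sections(text, entities):
    """Keep the body of "include" sections and drop "ignore" sections."""
    def replace(match):
        name, body = match.group(1), match.group(2)
        return body if entities[name].lower() == "include" else ""
    return MARKED_SECTION.sub(replace, text)


kept = resolve_marked_sections(
    "This page aims to introduce you to <![%THINK;[SGML]]>", PARAMETER_ENTITIES)
dropped = resolve_marked_sections(
    "read the tagged text<![%TAGS;[ (e.g., what you are here reading now...)]]>.",
    PARAMETER_ENTITIES)

print(kept)
print(dropped)
```

Changing the value of TAGS to "include" in PARAMETER_ENTITIES would make the parenthetical remark reappear, which is the same switch the DTD offers through its entity declarations.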
Note: There is a computer programming language which enriches our relation to numbers in a way somewhat analogous to the way SGML enriches our relation to language: Kenneth E. Iverson's APL (A Programming Language). Click here for a high-level overview of APL.
If you are interested in eXtensible Markup Language (XML), the new avatar of SGML, and you are using the Microsoft Internet Explorer 5 (or newer) web browser, you can click here to examine some experimental XML pages I am developing.

Go to sample HTML formatted version of SGML on this page.
Go to Panorama Viewer version of this page (requires plug-in!).
 
Go/Return to another intro to SGML: *Darwin Among the Machines* (Susanne Langer and SGML).
Read Laurent Sabarthez' notes about XML (SGML's "replacement").

http://www.cloud9.net/~bradmcc/WhatIsSGML.html
page generated by: heuristic.pl, ver: 17 January 2009 (v06.07)
Copyright © 1998-2006 Brad McCormick, Ed.D.
bradmcc@cloud9.net
01 June 1999 (ver: 1.52)