This page aims to introduce you to SGML

(This document is for UNRESTRICTED distribution.)

When this document is formatted for display as an HTML web page, with all the SGML structure-describing tags reduced to HTML style-descriptors (with, in some cases, the tagging just being discarded and not translated into anything...), it looks nicer, but information is lost.


What is the idea of SGML?

The idea of SGML is to thematize the structure of text, by adding explicit structure descriptors -- markup tags -- to the text. For example, if we wish to communicate something about a product, instead of saying something like: "Netscape's Navigator is not related to Portugal's Henry the Navigator", we can say: "Netscape's Navigator© is not related to Portugal's PRINCE Henry the Navigator." Note that, in the preceding sentence, when the structure tags are reduced to simple HTML page formatting, the tagged up sentence looks pretty much the same as the untagged plain text version.
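To make the point about that sentence concrete, here is a minimal Python sketch. The element names `product` and `title` are hypothetical (the markup actually used for this sentence is not shown on this page), and the regular expressions are only a stand-in for a real SGML parser.

```python
import re

# Hypothetical tagged-up version of the sentence; the element names
# <product> and <title> are assumptions made for illustration.
TAGGED = (
    "Netscape's <product>Navigator</product> is not related to "
    "Portugal's <title>Prince</title> Henry the Navigator."
)

def strip_tags(text: str) -> str:
    """Discard every start and end tag, keeping only the character data."""
    return re.sub(r"</?[A-Za-z][A-Za-z0-9.-]*>", "", text)

def elements(text: str) -> list[tuple[str, str]]:
    """Return (element name, content) pairs for simple, non-nested elements."""
    return re.findall(r"<([A-Za-z][A-Za-z0-9.-]*)>(.*?)</\1>", text)

plain = strip_tags(TAGGED)   # what a reader sees once the tags are discarded
found = elements(TAGGED)     # the structure that the tags made explicit
```

Stripping the tags gives back the ordinary sentence, while `elements` recovers what the tags added: which word names a product and which word is a title.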

Texts, like physical buildings, can either be products of habit and custom ("vernacular architecture"), or be products of conscious, self-accountable creativity and critique (The Bauhaus, structural engineering, etc.). I hypothesize SGML may portend a "change of phase" (--Henry Adams) in our relation to language, such as previously was effected by alphabetic writing and uniform printed editions. Among other things, SGML is what "diagramming sentences" in high-school English should have been but wasn't: SGML realizes, as effective and proactive social activity, and not merely as social-scientists' dissociated theorizing, a notion of "generative grammar"!

What is the value of adding explicit structure descriptors to text?

The rewards of adding explicit structure descriptors to text are many, including:

Once you've "tagged up" your text with these structural descriptors, what can you do with it?

Adding structural descriptors to text is the first step of what is, in general, an at least two-step process. People do not generally read the tagged text directly. What generally happens is that one also writes a "style sheet", which a computer program reads along with the tagged text file. The program then generates a formatted document: paper printout, online web pages, spoken text for the blind, etc.
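The two-step process can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular SGML tool works: the element names (`memo`, `heading`, `para`, `emph`) and both "style sheets" are invented, and the tagged text is written as well-formed XML so that Python's standard library can parse it.

```python
import xml.etree.ElementTree as ET

# A tiny tagged text. It is written as well-formed XML so that Python's
# standard library can parse it; full SGML would need an SGML parser.
SOURCE = "<memo><heading>Status</heading><para>All is <emph>well</emph>.</para></memo>"

# Two hypothetical "style sheets": each maps an element name to the
# text placed before and after that element's content.
HTML_STYLE = {
    "memo": ("", ""),
    "heading": ("<h1>", "</h1>"),
    "para": ("<p>", "</p>"),
    "emph": ("<b>", "</b>"),
}
TEXT_STYLE = {
    "memo": ("", ""),
    "heading": ("== ", " ==\n"),
    "para": ("", "\n"),
    "emph": ("*", "*"),
}

def render(element, style):
    """Format one element and, recursively, everything inside it."""
    before, after = style.get(element.tag, ("", ""))
    parts = [before, element.text or ""]
    for child in element:
        parts.append(render(child, style))
        parts.append(child.tail or "")  # text that follows the child element
    parts.append(after)
    return "".join(parts)

tree = ET.fromstring(SOURCE)
as_html = render(tree, HTML_STYLE)   # one source, rendered as HTML
as_text = render(tree, TEXT_STYLE)   # the same source, rendered as plain text
```

The tagged source is never edited; only the style sheet handed to `render` changes.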

One of the benefits of SGML is that, once you have produced your tagged text document, which, in a sense, has no format (other than being a human- and computer-readable ASCII text file), anyone can generate output in any desired form, by simply writing different style sheets for appropriate processing programs. Example: In a hospital, insurance providers, oversight groups, doctors and nurses, patients themselves, and others perhaps not yet foreseen, may wish to examine patient records in all sorts of different ways. If the hospital maintains its patient records in SGML, then everybody can put the data into the form they want, without the hospital having to do anything except make the raw source data available. If a doctor wishes to see patient information on a palm-top computer, the data can be transformed by a web browser (or browser plug-in, such as SoftQuad's Panorama Viewer©), into display pages, even selecting only the particular kinds of data of interest to the doctor (e.g., the course of symptoms, but not billing information). An insurance provider, on the other hand, can download the billing information over a high-speed data link, and either process it as-is, or run it through a program which loads it into their database management system (DBMS).
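The hospital example might look roughly like this in Python. Everything specific here is an assumption made for illustration: the record layout, the element and attribute names (`patient`, `symptom`, `charge`, `date`, `amount`), and the use of an XML parser in place of a full SGML one.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient record; every element and attribute name is an assumption.
RECORD = """
<patient id="p-001">
  <symptom date="1999-05-01">fever</symptom>
  <charge date="1999-05-01" amount="120.00">office visit</charge>
  <symptom date="1999-05-03">cough</symptom>
  <charge date="1999-05-03" amount="45.50">lab test</charge>
</patient>
"""

def course_of_symptoms(record):
    """The doctor's view: dated symptoms only, no billing data."""
    return [(s.get("date"), s.text) for s in record.iter("symptom")]

def total_billed(record):
    """The insurer's view: the sum of all charges, in cents."""
    return sum(round(float(c.get("amount")) * 100) for c in record.iter("charge"))

record = ET.fromstring(RECORD.strip())
doctor_view = course_of_symptoms(record)
insurer_view = total_billed(record)
```

Both views are computed from the same source record, and neither requires the hospital to prepare a separate copy of its data.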

What is an SGML document like, exactly?

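You have been looking at an SGML document all the time you have been reading this. An SGML document consists of two parts: a document type definition (DTD), which declares which elements may appear in the document and where, and a document instance, which is the tagged text itself.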

Note that a given document type definition (DTD) can be used as a template for an indefinite number of document instances (to facilitate this, the DTD is generally kept in a separate computer file, apart from the document instances which use it).

Also note that, if, as is generally done, one edits one's document using an SGML-aware text editing program (such as ArborText's AdeptEditor©), the text editor can make sure the document instance conforms to its type, by not permitting entry of elements (tags) except where they are allowed. Conversely, such an editor can aid the writing process -- possibly even helping overcome writer's block -- by telling the writer what kinds of items can be entered at any place in the document being edited.
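What such an editor does can be approximated with a small Python sketch. The "DTD" below is hypothetical and greatly simplified: it records only which elements may appear directly inside which others, whereas real SGML content models also express order and repetition. It does not describe how any actual editor is implemented.

```python
# A hypothetical, greatly simplified "DTD": for each element, the set of
# elements allowed directly inside it.
ALLOWED = {
    "letter": {"salutation", "para", "closing"},
    "para": {"emph", "quote"},
    "salutation": set(),
    "closing": set(),
    "emph": set(),
    "quote": {"emph"},
}

def can_insert(open_elements, candidate):
    """True if `candidate` may be opened inside the innermost open element."""
    if not open_elements:
        return candidate == "letter"  # only the document element may start
    return candidate in ALLOWED.get(open_elements[-1], set())

def suggestions(open_elements):
    """What an editor could offer the writer at the current position."""
    if not open_elements:
        return ["letter"]
    return sorted(ALLOWED.get(open_elements[-1], set()))
```

For instance, `can_insert(["letter", "para"], "quote")` is true, `can_insert(["letter"], "emph")` is false, and `suggestions(["letter"])` lists the three elements a writer could begin next.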

But isn't SGML a straitjacket, then?

No! SGML is not like a straitjacket (or even like dictionaries and grammar books), because you can create your own document type definitions (DTDs). If, as you write your documents according to a DTD you have created, you find it doesn't let you do what you want the way you want to do it, you can change the DTD to make it better suited to your purposes. A side effect of this process is that, because every change you make must be explicitly declared in your DTD, you always have an up-to-date record of your conception of your document's structure, without any extra effort to write extrinsic documentation. A well-written SGML document largely documents itself!

What's the difference between SGML, HTML, and XML?

HTML looks like SGML, but HTML tags are mostly style descriptors rather than structure descriptors. Example: the HTML "<b>" tag says: make the following text appear in bold-face. The HTML "<address>" element, on the other hand, is an example of a structure-describing tag: it doesn't prescribe what the text it contains is supposed to look like, but rather it articulates what that content means.

XML is a simplified, "dumbed-down" variant of SGML. XML tags should generally be structure-describing, but SGML is functionally much richer than XML. Also, XML does not require an explicit document type (document structure) specification (DTD). This makes XML easier to code, but it also provides less motivation and assistance to think about the structure of one's document, since, in an XML document instance, you can make up new (or inconsistent) elements as you go along, e.g., calling a quotation here a: "<quote>", there a: "<quotation>", elsewhere a: "<quot>", etc.
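The drift in element names is easy to reproduce. In this Python sketch (the sample document is invented for illustration), a parser that checks only well-formedness accepts all three spellings without complaint, so the inconsistency shows up only if someone goes looking for it.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Well-formed XML with no DTD: nothing stops the author from naming the
# same kind of thing three different ways.
DOC = (
    "<essay>"
    "<quote>To be</quote>"
    "<quotation>or not to be</quotation>"
    "<quot>that is the question</quot>"
    "<quote>Alas</quote>"
    "</essay>"
)

def element_counts(xml_text):
    """Parse without validation and count how often each element name occurs."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())

counts = element_counts(DOC)
```

A DTD that declared only `quote` would let a validating parser reject the other two spellings at the moment they were typed.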

Conclusion

To paraphrase Martin Luther: "A mighty fortress is our SGML!"


http://www.cloud9.net/~bradmcc/WhatIsSGML.html
(Page generated by OmniMark script: heuristic.xom, v1.31, 01 June 1999)
Brad McCormick, Ed.D.
bradmcc@cloud9.net
01 June 1999