DocBook is a flavour of SGML
(Standard Generalized Markup
Language), a syntax for marking up the contents
of documents. HTML is another very common flavour of SGML;
DocBook markup looks very similar to HTML markup, although
the names of the markup tags differ.
Why SGML?: The simple answer is that SGML lets you
create multiple formats of a given document from a single
source. Currently it is used to create HTML, PDF, PS
(PostScript), and plain-text versions of the Wine books.
What do I need?: You need the SGML tools. There are various places where you
can get them. The most generic way of getting them is from their
source as discussed below.
Quick instructions: These are the basic steps to create the Wine books from the SGML source.
Most Linux distributions have everything you need already
bundled up in package form. Unfortunately, each
distribution seems to handle its SGML environment
differently, installing it into different paths, and
naming its packages according to its own whims.
SGML markup contains a number of syntactical elements that
serve different purposes in the markup. We'll run through
the basics here to make sure we're on the same page when
we refer to SGML semantics.
The basic currency of SGML is the
tag. A simple tag consists of a
pair of angle brackets and the name of the tag. For
example, the para tag would appear in
an SGML document as <para>. This start tag indicates
that the immediately following text should be classified
according to the tag. In regular SGML, each opening tag
must have a matching end tag to show where the start tag's
contents end. End tags begin with
"</" markup, e.g.,
</para>.
The combination of a start tag, contents, and an end tag
is called an element. SGML
elements can be nested inside of each other, or contain
only text, or may be a combination of both text and other
elements, although in most cases it is better to limit
your elements to one or the other.
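As a small illustration (the ids and wording are made up for this sketch), the fragment below shows all three cases: an element that contains only other elements, one that contains only text, and one that mixes the two.

```sgml
<!-- sect1 contains only other elements -->
<sect1 id="listing-files">
  <!-- title contains only text -->
  <title>Listing Files</title>
  <!-- para mixes text with a nested command element -->
  <para>
    Type <command>ls</command> to list the files in the
    current directory.
  </para>
</sect1>
```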
The XML (eXtensible Markup
Language) specification, a modern subset of
the SGML specification, adds a so-called empty
tag, for elements that contain no text
content. The entire element is a single tag, ending with
"/>", e.g.,
<xref/>. However, use of this
tag style restricts you to XML DocBook processing, and
your document may no longer compile with SGML-only
processing systems.
Often a processing system will need more information about
an element than you can provide with just tags. SGML
allows you to add extra "hints" in the form
of SGML attributes to pass along
this information. The most common use of attributes in
DocBook is giving specific elements a name, or an ID, so
you can refer to it from elsewhere. This ID can be used
for many things, including file-naming for HTML output,
hyper-linking to specific parts of the document, and even
pulling text from that element (see the <xref> tag).
An SGML attribute appears inside the start tag, between
the < and > brackets. For example, if you wanted to
set the id attribute
of the <book> element to
"mybook", you would create a start tag like
this:
<book id="mybook">
Notice that the contents of the attribute are enclosed in
quote marks. These quotes are optional in SGML, but
mandatory in XML. It's a good habit to use quotes, as it
will make it much easier to migrate your documents to an
XML processing system later on.
You can also specify more than one attribute in a single
tag:
<book id="mybook" status="draft">
Another commonly used type of SGML markup is the
entity. An entity lets you
associate a block of text with a name. You declare the
entity once, at the beginning of your document, and can
invoke it as many times as you like throughout the
document. You can use entities as shorthand, or to make
it easier to maintain certain phrases in a central
location, or even to insert the contents of an entire file
into your document.
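Here is a minimal sketch of both uses. The entity names (gribble, chap1) and the file name chapter-one.sgml are invented for illustration. The declarations go between the square brackets of the doctype declaration at the top of the document, and the file named by the SYSTEM entity is assumed to hold a complete chapter element.

```sgml
<!doctype book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!-- Shorthand: a phrase maintained in one central place -->
<!ENTITY gribble "Common Gribble">
<!-- File inclusion: stands for the whole contents of another file -->
<!ENTITY chap1 SYSTEM "chapter-one.sgml">
]>
<book id="index">
  <bookinfo>
    <title>Notes on the &gribble;</title>
  </bookinfo>
  <!-- The contents of chapter-one.sgml are inserted here -->
  &chap1;
  <chapter id="chapter-two">
    <title>Habitat</title>
    <para>
      The &gribble; is rarely seen outdoors...
    </para>
  </chapter>
</book>
```

Changing the text in the first declaration changes every place where the entity is invoked.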
An entity in your document is always surrounded by the
"&" and ";" characters. One
entity you'll need sooner or later is the one for the
"<" character. Since SGML expects all
tags to begin with a "<", the
"<" is a reserved character. To use it in
your document (as I am doing here), you must insert it
with the &lt; entity. Each time
the SGML processor encounters &lt;,
it will place a literal "<" in the output
document. Similarly, you must use the &gt;
and &amp; entities for the
">" and "&" characters.
The final term you'll need to know when writing simple
DocBook documents is the DTD
(Document Type Definition). The
DTD defines the flavour of SGML a given document is written
in. It lists all the legal tag names, like <book>, <para>, and so on, and declares
how those tags are allowed to be used together. For
example, it doesn't make sense to put a <book> element inside a <para> paragraph element -- only
the reverse makes sense.
The DTD thus defines the legal structure of the document.
It also declares which attributes can be used with which
tags. The SGML processing system can use the DTD to make
sure the document is laid out properly before attempting
to process it. SGML-aware text editors like
Emacs can also use the DTD to
guide you while you write, offering you choices about
which tags you can add in different places in the
document, and beeping at you when you try to add a tag
where it doesn't belong.
Generally, you will declare which DTD you want to use as
the first line of your SGML document. In the case of
DocBook, you will use something like this:
<!doctype book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" []>
<book>
...
</book>
Note that you must specify your toplevel element inside
the doctype declaration. If you were writing an article
rather than a book, you might use this declaration instead:
<!doctype article PUBLIC "-//OASIS//DTD DocBook V3.1//EN" []>
<article>
...
</article>
Once you're comfortable with SGML, creating a DocBook
document is quite simple and straightforward. Even
though DocBook contains over 300 different tags, you can
usually get by with only a small subset of those tags.
Most of them are for inline formatting, rather than for
document structuring. Furthermore, the common tags have
short, intuitive names.
Below is a (completely nonsensical) example to illustrate
how a simple document might be laid out. Notice that all
<chapter> and <sect1> elements have id attributes. This is not
mandatory, but is a good habit to get into, as DocBook is
commonly converted into HTML, with a separate generated
file for each <book>,
<chapter>, and/or <sect1> element. If the given
element has an id
attribute, the processor will typically name the file
accordingly. Thus, the below document might result in
index.html,
chapter-one.html,
blobs.html, and so on.
Also notice the text marked off with "<!--
" and " -->" characters. These
denote SGML comments. SGML processors will completely
ignore anything between these markers, similar to
"/*" and "*/" comments in C
source code.
<!doctype book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" []>
<book id="index">
  <bookinfo>
    <title>A Poet's Guide to Nonsense</title>
  </bookinfo>
  <chapter id="chapter-one">
    <title>Blobs and Gribbles</title>
    <!-- This section contains only one major topic -->
    <sect1 id="blobs">
      <title>The Story Behind Blobs</title>
      <para>
        Blobs are often mistaken for ice cubes and rain
        puddles...
      </para>
    </sect1>
    <!-- This section contains embedded sub-sections -->
    <sect1 id="gribbles">
      <title>Your Friend the Gribble</title>
      <para>
        A Gribble is a cute, unassuming little fellow...
      </para>
      <sect2 id="gribble-temperament">
        <title>Gribble Temperament</title>
        <para>
          When left without food for several days...
        </para>
      </sect2>
      <sect2 id="gribble-appearance">
        <title>Gribble Appearance</title>
        <para>
          Most Gribbles have a shock of white fur running from...
        </para>
      </sect2>
    </sect1>
  </chapter>
  <chapter id="chapter-two">
    <title>Phantasmagoria</title>
    <sect1 id="dretch-pools">
      <title>Dretch Pools</title>
      <para>
        When most poets think of Dretch Pools, they tend to...
      </para>
    </sect1>
  </chapter>
</book>
Once you get used to the syntax of SGML, the next hurdle
in writing DocBook documentation is to learn the many
DocBook-specific tag names, and when to use them. DocBook
was created for technical documentation, and as such, the
tag names and document structure are slanted towards the
needs of such documentation.
To cover its target audience, DocBook declares a wide
variety of specialized tags, including tags for formatting
source code (with somewhat of a C/C++ bias), computer
prompts, GUI application features, keystrokes, and so on.
DocBook also includes tags for universal formatting needs,
like headers, footnotes, tables, and graphics.
We won't cover all of these elements here (over 300
DocBook tags exist!), but we will cover the basics. To
learn more about the other tags, check out the official
DocBook guide, at http://docbook.org. To
see how they are used in practice, download the SGML
source for this manual (the Wine Developer Guide) and
browse through it, comparing it to the generated HTML (or
PostScript or PDF).
There are often many correct ways to mark up a given piece
of text, and you may have to make guesses about which tag
to use. Sometimes you'll have to make compromises.
However, remember that it is possible to further customize
the output of the SGML processors. If you don't like the
way a certain tag looks in HTML, that doesn't mean you
should choose a different tag based on its output formatting.
The processing stylesheets can be altered to fix the
formatting of that same tag everywhere in the document
(not just in the place you're working on). For example,
if you're frustrated that the <systemitem> tag doesn't produce
any formatting by default, you should fix the stylesheets,
not change the valid <systemitem> tag to, for example,
an <emphasis> tag.
Here are the common DocBook elements:
Structural Elements
<book>
The book is the most common toplevel element, and is
probably the one you should use for your document.
<set>
If you want to group more than one book into a
single unit, you can place them all inside a set.
This is useful when you want to bundle up
documentation in alternate ways. We do this with
the Wine documentation, using
<book> to
put each Wine guide into a separate directory (see
documentation/wine-devel.sgml,
etc.).
<chapter>
A <chapter>
element includes a single entire chapter of the
book.
<part>
If the chapters in your book fall into major
categories or groupings (as in the Wine Developer
Guide), you can place each collection of chapters
into a <part>
element.
<sect?>
DocBook has many section elements to divide the
contents of a chapter into smaller chunks. The
encouraged approach is to use the numbered section
tags, <sect1>,
<sect2>, <sect3>, <sect4>, and <sect5> (if necessary).
These tags must be nested in order: you can't place
a <sect3> directly
inside a <sect1>.
You have to nest the <sect3> inside a <sect2>, and so forth.
Documents with these explicit section groupings are
easier for SGML processors to deal with, and lead to
better organized documents. DocBook also supplies a
<section> element
which you can nest inside itself, but its use is
discouraged in favor of the numbered section tags.
<title>
The title of a book, chapter, part, section, etc.
In most of the major structural elements, like
<chapter>,
<part>, and the
various section tags, <title> is mandatory. In
other elements like <book> and <note>, it's optional.
<para>
The basic unit of text is the paragraph, represented
by the <para> tag.
This is probably the tag you'll use most often. In
fact, in a simple document, you can probably get
away with using only <book>, <chapter>, <title>, and <para>.
<article>
For shorter, more targeted documents, like topic
pieces and whitepapers, you can use <article> as your toplevel
element.
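To show how the structural elements fit together, here is a sketch of a book that uses a part and strictly nested numbered sections. All ids and titles are invented for illustration.

```sgml
<book id="index">
  <bookinfo>
    <title>A Poet's Guide to Nonsense</title>
  </bookinfo>
  <part id="creatures">
    <title>Creatures</title>
    <chapter id="gribble-handbook">
      <title>The Gribble Handbook</title>
      <sect1 id="gribble-care">
        <title>Caring for a Gribble</title>
        <sect2 id="gribble-feeding">
          <title>Feeding</title>
          <!-- A sect3 may appear here, inside a sect2,
               but not directly inside the sect1 above -->
          <sect3 id="gribble-snacks">
            <title>Snacks</title>
            <para>
              Gribbles prefer their snacks at room temperature...
            </para>
          </sect3>
        </sect2>
      </sect1>
    </chapter>
  </part>
</book>
```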
Inline Formatting Elements
<filename>
The name of a file. You can optionally set the
class attribute
to Directory,
HeaderFile, or
SymLink to further classify the
filename.
<userinput>
Literal text entered by the user.
<computeroutput>
Literal text output by the computer.
<literal>
A catch-all element for literal computer data. Its
use is somewhat vague; try to use a more specific
tag if possible, like <userinput> or <computeroutput>.
<quote>
An inline quotation. This tag typically inserts
quotation marks for you, so you would write <quote>This is a
quote</quote> rather
than "This is a quote". This usage may be a little
bulkier, but it does allow for automated formatting
of all quoted material in the document. Thus, if
you wanted all quotations to appear in italic, you
could make the change once in your stylesheet,
rather than doing a search and replace throughout
the document. For larger chunks of quoted text, you
can use <blockquote>.
<note>
Insert a side note for the reader. By default, the
SGML processor usually prefixes the content with
"Note:". You can change this text by adding a
<title> element.
Thus, to add a visible comment with its own label (FIXME, say,
or EXAMPLE as below) to the documentation, you might write:
<note>
<title>EXAMPLE</title>
<para>This is an example note...</para>
</note>
The results will look something like this:
EXAMPLE: This is an example note...
<sgmltag>
Used for inserting SGML tags, etc., into an SGML
document without resorting to a lot of entity
quoting, e.g., &lt;. You can change the
appearance of the text with the class attribute. Some
common values of this are
starttag,
endtag,
attribute,
attvalue, and even
sgmlcomment. See this SGML file,
documentation/documentation.sgml,
for examples.
<prompt>
The text used for a computer prompt, for example a
shell prompt, or command-line application prompt.
<replaceable>
Meta-text that should be replaced by the user, not
typed in literally, e.g., in command descriptions
and --help outputs.
<constant>
A programming constant, e.g.,
MAX_PATH.
<symbol>
A symbolic value replaced, for example, by a
pre-processor. This applies primarily to C macros,
but may have other uses. Use the <constant> tag instead of
<symbol> where
appropriate.
<function>
A programming function name.
<parameter>
Programming language parameters you pass with a
function.
<option>
Parameters you pass to a command-line executable.
<varname>
Variable name, typically in a programming language.
<type>
Programming language types, e.g., from a typedef
definition. May have other uses, too.
<structname>
The name of a C-language struct
declaration, e.g., sockaddr.
<structfield>
A field inside a C struct.
<command>
An executable binary, e.g., wine
or ls.
<envar>
An environment variable, e.g., $PATH.
<systemitem>
A generic catch-all for system-related things, like
OS names, computer names, system resources, etc.
<email>
An email address. The SGML processor will typically
add extra formatting characters, and even a
mailto: link for HTML pages.
Usage: <email>user@host.com</email>
<firstterm>
Special emphasis for introducing a new term. Can
also be linked to a <glossary> entry, if
desired.
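As a rough sketch of how several of these inline tags combine in running text (the sentences are only illustrations, not taken from the Wine books):

```sgml
<para>
  To list a directory, type
  <userinput>ls <option>-l</option> <replaceable>directory</replaceable></userinput>
  at the <prompt>$</prompt> prompt. The shell looks for
  <command>ls</command> in the directories named by the
  <envar>$PATH</envar> environment variable, such as
  <filename class="Directory">/bin</filename>.
</para>
<para>
  The <function>strlen</function> function returns a value of
  type <type>size_t</type>; its <parameter>s</parameter>
  argument must point to a NUL-terminated string.
</para>
```

Each tag says what the text is, not how it should look; the stylesheets decide the formatting.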
Item Listing Elements
<itemizedlist>
For bulleted lists, no numbering. You can tweak the
layout with SGML attributes.
<orderedlist>
A numbered list; the SGML processor will insert the
numbers for you. You can suggest numbering styles
with the numeration attribute.
<simplelist>
A very simple list of items, often inlined. Control
the layout with the type attribute.
<variablelist>
A list of terms with definitions or descriptions,
like this very list!
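A short sketch of two of these list types, with made-up content. The numeration value shown asks the processor for lettered rather than numbered items.

```sgml
<orderedlist numeration="loweralpha">
  <listitem>
    <para>Locate a gribble.</para>
  </listitem>
  <listitem>
    <para>Offer it some food.</para>
  </listitem>
</orderedlist>

<variablelist>
  <varlistentry>
    <term>Blob</term>
    <listitem>
      <para>Often mistaken for an ice cube or a rain puddle.</para>
    </listitem>
  </varlistentry>
  <varlistentry>
    <term>Gribble</term>
    <listitem>
      <para>A cute, unassuming little fellow.</para>
    </listitem>
  </varlistentry>
</variablelist>
```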
Block Text Quoting Elements
<programlisting>
Quote a block of source code. Typically highlighted
in the output and set off from normal text.
<screen>
Quote a block of visible computer output, like the
output of a command or chunks of debug logs.
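The sketch below shows both elements with an invented C program and shell session. Note that the "<" and ">" characters in the source code still have to be written as entities, since the SGML processor would otherwise treat them as markup.

```sgml
<programlisting>
#include &lt;stdio.h&gt;

int main(void)
{
    printf("Hello, gribble!\n");
    return 0;
}
</programlisting>

<screen>
<prompt>$</prompt> <userinput>./hello</userinput>
<computeroutput>Hello, gribble!</computeroutput>
</screen>
```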
Hyperlink Elements
<link>
Generic hypertext link, used for pointing to other
sections within the current document. You supply
the visible text for the link, plus the name of the id attribute of the
element that you want to link to. For example:
<link linkend="configuring-wine">the section on configuring wine</link>
...
<sect2 id="configuring-wine">
...
<xref>
In-document hyperlink that can generate its own
text. Similar to the <link> tag, you use the
linkend
attribute to specify which target element you want
to jump to:
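For example, reusing the configuring-wine target from the link entry above (a sketch; the tag has no content and no end tag):

```sgml
<xref linkend="configuring-wine">
...
<sect2 id="configuring-wine">
...
```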
By default, most SGML processors will auto-generate
some generic text for the <xref> link, like
"Section 2.3.1". You can use the
endterm
attribute to grab the visible text content of the
hyperlink from another element:
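A sketch of this usage, where the title text is only a placeholder:

```sgml
<xref linkend="configuring-wine" endterm="config-title">
...
<sect2 id="configuring-wine">
<title id="config-title">Configuring Wine</title>
...
```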
This would create a link to the
configuring-wine element,
displaying the text of the
config-title element for the
hyperlink. Most often, you'll add an id attribute to the
<title> of the
section you're linking to, as above, in which case
the SGML processor will use the target's title text
for the link text.
Alternatively, you can use an xreflabel attribute in
the target element tag to specify the link text:
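For instance (again a sketch, with placeholder label text):

```sgml
<xref linkend="configuring-wine">
...
<sect2 id="configuring-wine" xreflabel="Configuring Wine">
...
```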
Note: <xref> is an
empty element. You don't need a closing tag for
it (this is defined in the DTD). In SGML
documents, you should use the form <xref>, while in XML
documents you should use
<xref/>.
<anchor>
An invisible tag, used for inserting id attributes into a
document to link to arbitrary places (i.e., when
it's not close enough to link to the top of an
element).
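As an illustration of the anchor element, the sketch below marks a spot in the middle of a paragraph and links to it from elsewhere. The id blob-puddles and the wording are invented, and the anchor is written in its SGML form, without a closing tag.

```sgml
<para>
  Blobs are often mistaken for ice cubes.
  <anchor id="blob-puddles">They are also mistaken for rain
  puddles...
</para>
...
<para>
  See <link linkend="blob-puddles">the remark about rain
  puddles</link> for details.
</para>
```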
You can write SGML/DocBook documents in any text editor you
like, although some editors are friendlier for
this task than others.
The most commonly used open source SGML editor is Emacs,
with the PSGML mode, or extension.
Emacs does not supply a GUI or WYSIWYG (What You See Is What
You Get) interface, but it does provide many helpful
shortcuts for creating SGML, as well as automatic
formatting, validity checking, and the ability to create
your own macros to simplify complex, repetitive actions.