Using reStructuredText with ht2html

Author: Ollie Rutherfurd
Contact: oliver@rutherfurd.net
Version: 1.3
Date: 2003-02-26

Abstract

Almost all of the content of this site is written in reStructuredText, then converted to HTML pages with a common set of navigation links and a consistent look and feel, using Docutils and ht2html. This is a guide to how I've done this with these tools.

Attention!

This article refers to a previous version of this site.

Introduction

At the end of August, 2002, I moved from Arlington, VA to New York City, NY. When I moved I lost my DSL connection and though I recently got a nice fast cable connection, I no longer have a static IP at home. As I could no longer count on running my site off my machine at home, I had to move it somewhere else.

My old site, much like the new one, was a small collection of pages. It had some recipes I like, macros I've written for jEdit, a couple of Python modules I've written, a couple of little programs I've written, and some pictures of family and friends. It grew organically, and though portions of it were easy to manage, as a whole it didn't look consistent and was a hassle to update and maintain.

I decided that before moving the pages to a new server, I wanted to re-work how I created the site so that it would:

  1. Have a consistent look and feel
  2. Be easy to modify and maintain

I was in the process of trying to figure out the easiest way to do this when I read an announcement by Barry Warsaw about ht2html. ht2html is the script used to generate www.python.org, www.list.org, and others. ht2html can do almost everything I wanted. It takes care of creating section and navigation links and giving pages a consistent look and feel. The one hangup is that it expects ".ht" files, which are essentially HTML files with a couple of headers and the body content of the page. For me, part of being quick and easy to update and maintain is that I don't want to edit HTML. I'd much rather just write in plain text and have it magically converted to HTML. That's where reStructuredText fits in, and that's what I ended up doing.

I wrote a Writer for Docutils which takes reStructuredText input and creates ".ht" files which can be converted into HTML pages by ht2html. I created the files needed by ht2html to define the structure and look of my site, and lastly I wrote a Python script to convert ".txt" files to ".ht" files and the ".ht" files to HTML pages. The end result is a site that has a consistent look and feel and is easy to update, created from a collection of plain text files.

This is a guide to how I've done this with these tools. I will take you through what files I created for ht2html and some of the stumbling blocks I hit along the way and how I worked through them.

I'll be the first to admit that I'm new to using ht2html. I've just started using it and don't really know the ins and outs of the tools, so if you spot anything that doesn't look correct, could be done a better way, or that I've left out, please let me know!

The Tools

Here are the tools used to create this site:

Python

If you're reading this, it's probably safe to assume that you know what Python is. However, just in case:

Python is a freely available, very-high-level, interpreted language developed by Guido van Rossum. It combines a clear syntax with powerful (but optional) object-oriented semantics. Python is available for almost every computer platform you might find yourself working on, and has strong portability between platforms.

In short, Python rocks.

Both Docutils and ht2html are written in Python, so you must have a Python interpreter installed to use either of them.

See http://www.python.org/ for more information and downloads.

Note: Docutils requires Python 2.1 or greater.

Docutils

Docutils is a set of tools, written in Python, for processing plain text into other formats such as HTML, XML, and TeX.

See http://docutils.sourceforge.net/ for more information and downloads.

ht2html

ht2html is a web page template processor written in Python. Sites created with ht2html include www.python.org and www.list.org.

See http://ht2html.sourceforge.net/ for more information and downloads.

Setup

Python

Download and install Python from http://www.python.org/.

Note

If you're running Windows, you might want to install ActivePython, which is available from ActiveState.

Docutils

Download and install the latest Docutils snapshot from: http://docutils.sourceforge.net/docutils-snapshot.tgz.

Install by unzipping, then from within the created directory run:

python setup.py install

".ht" Writer

Until the ".ht" Writer is part of the Docutils core, it can be downloaded and installed from: http://www.rutherfurd.net/software/rst2ht/.

Install by unzipping, then from within the created directory, run:

python setup.py install

rst2ht.py will be installed into the "scripts" directory under Python's installation directory on Windows and into "/usr/bin" on Linux.

ht2html

Download ht2html from: http://sourceforge.net/project/showfiles.php?group_id=46757. ht2html does not have a "setup.py", so just plunk it somewhere. You can just execute "ht2html.py" from wherever you put it.

Using ht2html

As mentioned in the Introduction, I wanted to use ht2html to create the site, to get a consistent look and feel that was easy to modify and maintain.

Using ht2html requires doing 3 things:

  1. Defining your site's sections and section links.
  2. Creating a Style class to control the creation of your site's pages.
  3. Creating the HTML pages from ".ht" files.

Defining Site Sections and Links

ht2html allows you to define your site's sections and section links in a single file called "links.h". "links.h" should be placed into the root directory of the site, which is the easiest spot from which to run "ht2html.py".

A "links.h" file consists of sections, which are in the format:

<h3>{SECTION NAME}</h3>

where {SECTION NAME} is the name of the section, and section links, which are in the format:

<li><a href="{URL}">{NAME}</a>

where {URL} is the link target and {NAME} is the visible text for the link.

Note, a section may also be a link, as done below.

Here's an example of 2 sections each with 2 section links:

<h3><a href="%(rootdir)sjEdit">jEdit</a></h3>
<li><a href="%(rootdir)sjEdit/macros/index.html">Macros</a>
<li><a href="%(rootdir)sjEdit/plugins/index.html">Plugins</a>
<h3>Python</h3>
<li><a href="%(rootdir)spython/winreg/index.html">winreg</a>
<li><a href="%(rootdir)spython/properties/index.html">properties</a>

In the above example, %(rootdir)s will be replaced with the path to the root of the site when each page is created by "ht2html.py". This way the same "links.h" can be used for all pages in the site.
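
The %(rootdir)s placeholder has the form of Python's dictionary-based string formatting, which the Style class later in this article also uses. Here is a minimal sketch of that substitution in Python 3; the expand() helper is a hypothetical name for illustration and is not part of ht2html:

```python
# One line taken from the links.h example above.
LINK = '<li><a href="%(rootdir)sjEdit/macros/index.html">Macros</a>'


def expand(line, rootdir):
    """Fill in %(rootdir)s using dictionary-based % formatting.

    Hypothetical helper for illustration; ht2html does this work itself.
    """
    return line % {"rootdir": rootdir}


# The same template line works for a page in the site root...
print(expand(LINK, "./"))
# ...and for a page two directories down.
print(expand(LINK, "../../"))
```

With plain % formatting as sketched here, a literal percent sign in a line would have to be written as %%.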

Important

In "links.h", you must wrap your href tag attributes with double-quotes. If you use single-quotes, they will be included as part of the href and your links won't work.
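
Since a single-quoted href is easy to miss, a quick check before building can help. This is a rough sketch in Python 3, assuming a simple regular expression is good enough for a hand-written "links.h"; find_single_quoted_hrefs() is a hypothetical helper, not something ht2html provides:

```python
import re

# Matches an href whose value starts with a single quote, e.g. href='page.html'
SINGLE_QUOTED_HREF = re.compile(r"href\s*=\s*'", re.IGNORECASE)


def find_single_quoted_hrefs(text):
    """Return (line_number, line) pairs for lines that use href='...'."""
    problems = []
    for number, line in enumerate(text.splitlines(), 1):
        if SINGLE_QUOTED_HREF.search(line):
            problems.append((number, line.strip()))
    return problems


# Sample input: the third line uses single quotes and should be reported.
sample = """<h3>Python</h3>
<li><a href="%(rootdir)spython/winreg/index.html">winreg</a>
<li><a href='%(rootdir)spython/properties/index.html'>properties</a>
"""

for number, line in find_single_quoted_hrefs(sample):
    print("line %d uses single quotes: %s" % (number, line))
```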

Here's my complete "links.h" file:

<h3><a href="%(rootdir)sarticles/index.html">Articles</a></h3>
<li><a href="%(rootdir)sarticles/rst-ht2html.html">reStructuredText with ht2html</a>

<h3><a href="%(rootdir)scookbook/index.html">Cookbook</a></h3>

<h3><a href="%(rootdir)sgentoo/index.html">Gentoo</a></h3>
<li><a href="%(rootdir)sgentoo/cfghelper/index.html">cfghelper</a>

<h3><a href="%(rootdir)sjEdit/index.html">jEdit</a></h3>
<li><a href="%(rootdir)sjEdit/jo/index.html">jo</a>
<li><a href="%(rootdir)sjEdit/launcher/index.html">Launcher</a>
<li><a href="%(rootdir)sjEdit/macros/index.html">Macros</a>
<li><a href="%(rootdir)sjEdit/modes/index.html">Modes</a>
<li><a href="%(rootdir)sjEdit/plugins/index.html">Plugins</a>

<h3><a href="%(rootdir)spython/index.html">Python</a></h3>
<li><a href="%(rootdir)spython/properties/index.html">properties</a>
<li><a href="%(rootdir)spython/sendkeys/index.html">SendKeys</a>
<li><a href="%(rootdir)spython/winreg/index.html">winreg</a>

<h3><a href="%(rootdir)ssoftware/index.html">Software</a></h3>
<li><a href="%(rootdir)ssoftware/rst2chm/index.html">rst2chm</a>
<li><a href="%(rootdir)ssoftware/rst2ht/index.html">rst2ht</a>
<li><a href="%(rootdir)ssoftware/xchecker/index.html">xChecker</a>

<h3>Site</h3>
<li><a href="%(rootdir)ssite/search.html">Search</a>
<li><a href="%(rootdir)ssite/sitemap.html">Site Map</a>
<li><a href="%(rootdir)ssite/changes.html">Changes</a>

<h3><a href="%(rootdir)sweblog/index.html">Weblog</a></h3>
<li><a href="%(rootdir)sweblog/archive.html">Archive</a>

Defining Site Look and Feel

To customize how "ht2html.py" will generate each page, you must specify a Style class for it to use. The Style class controls the generation of the HTML page from the ".ht" template, including what colors are used, what stylesheet the page should use, whether the page should have a SideBar or Banner, and other options.

To create my Style class, RutherfurdDotNet, I started by grabbing code from the existing Style classes provided with ht2html and changing things like colors. I then made some modifications to make the pages appear how I wanted them.

I wanted all pages to use a stylesheet and I wanted the stylesheet to be specified in the ".ht" template, so I overrode get_stylesheet() like so:

def get_stylesheet(self):
    default = '%(rootdir)sdefault.css' % self.__d
    return self.__parser.get('stylesheet',default)

Also, instead of using the Banner component for "Site Links", I wanted a trail of breadcrumbs for each page. To do this, I created a BreadCrumbBanner class, which generates the HTML for a breadcrumb trail to the page being generated. In RutherfurdDotNet, I use it instead of Banner:

def get_banner(self):
    return BreadCrumbBanner.get_banner(self)

Note

The name of the Style class must be the same as the name of the module containing the class. For example, my Style class is RutherfurdDotNet and the filename is "RutherfurdDotNet.py".
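
One plausible reason for this rule is that a loader can import the module by name and then look up an attribute with the same name. The Python 3 sketch below shows that pattern only; it is an assumption about the mechanism, not ht2html's actual code, and load_style_class() and DemoStyle are hypothetical names:

```python
import importlib
import sys
import types


def load_style_class(name):
    """Import module `name` and return the attribute of the same name."""
    module = importlib.import_module(name)
    return getattr(module, name)


# Stand-in for a real "DemoStyle.py" on the import path, registered by hand
# so the sketch runs without creating any files.
class DemoStyle:
    pass


demo_module = types.ModuleType("DemoStyle")
demo_module.DemoStyle = DemoStyle
sys.modules["DemoStyle"] = demo_module

style_class = load_style_class("DemoStyle")
print(style_class.__name__)
```

If the class were named differently from its module, the getattr() lookup in this sketch would raise AttributeError.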

Here's an overview of the modifications I made:

BreadCrumbBanner

Rather than having the standard "Site Links" along the top of the page in the "Banner" section, I wanted to have breadcrumbs. So if the user is looking at the "Macros" page in the "jEdit" section, in the "Banner" they will see:

Home >> jEdit >> Macros

where the first 2 will be links to their respective locations.
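
The idea can be sketched in a few lines of Python 3. This is a simplified illustration rather than the BreadCrumbBanner class itself: breadcrumb_trail() and its dir_titles argument are hypothetical, and directory titles are passed in as a dictionary instead of being read from ".ht" files:

```python
def breadcrumb_trail(page_path, page_title, dir_titles, rootdir=""):
    """Build breadcrumb HTML for a page.

    page_path  -- '/'-separated path of the page, relative to the site root
    page_title -- text for the final, unlinked crumb
    dir_titles -- maps a directory path to its display title (assumed input)
    rootdir    -- relative path from the page back to the site root
    """
    parts = page_path.split("/")
    dirs = parts[:-1]
    # A section's own index page should not link to itself.
    if parts[-1] == "index.html" and dirs:
        dirs = dirs[:-1]

    crumbs = ['<a href="%sindex.html">Home</a>' % rootdir]
    for depth in range(1, len(dirs) + 1):
        directory = "/".join(dirs[:depth])
        # Fall back to the directory name when no title is known.
        title = dir_titles.get(directory, dirs[depth - 1])
        crumbs.append('<a href="%s%s/index.html">%s</a>'
                      % (rootdir, directory, title))
    crumbs.append(page_title)
    return " &#187; ".join(crumbs)


trail = breadcrumb_trail("jEdit/macros/index.html", "Macros",
                         {"jEdit": "jEdit"}, rootdir="../../")
print(trail)
```

Here "Home" and "jEdit" come out as links and "Macros", the current page, is plain text.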

Section Links

I wanted to allow "Section" headers in the left Sidebar to be links. For example, the "Cookbook" section header should be a link to the main Cookbook page. To do this, I added the following code to the generator's __init__ method:

for i in range(0, len(p.sidebar)):
    if type(p.sidebar[i]) in StringTypes:
        p.sidebar[i] = p.sidebar[i] % self.__d

Looking at the code for SideBar, it seems that this is the way to tell whether an entry is a section or a link within a section. Here's an example, from "links.h", of a section which is a link:

<h3><a href="%(rootdir)scookbook/index.html">Cookbook</a></h3>
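
The loop above can be tried on its own with a stand-in list. In this Python 3 sketch the shape of the sidebar is an assumption based on the code in this article (strings for section headers, tuples for links, with None standing in for a blank cell), and expand_section_headers() is a hypothetical name. Python 3 has no StringTypes, so isinstance(item, str) plays that role:

```python
def expand_section_headers(sidebar, values):
    """Apply % substitution to string entries only; keep the rest as-is."""
    expanded = []
    for item in sidebar:
        if isinstance(item, str):
            # Section header: may contain %(rootdir)s
            expanded.append(item % values)
        else:
            # Link tuple or blank cell: left untouched
            expanded.append(item)
    return expanded


sidebar = [
    '<a href="%(rootdir)scookbook/index.html">Cookbook</a>',
    ("../python/index.html", "Python"),
    None,
]
result = expand_section_headers(sidebar, {"rootdir": "../"})
print(result[0])
```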

Here's the entire source of "RutherfurdDotNet.py":

"""
ht2html Style class for creating rutherfurd.net.

Ollie Rutherfurd <oliver@rutherfurd.net>

$Id$
"""

import os
import posixpath
import sys
import time
from types import StringTypes
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

from HTParser import HTParser
from LinkFixer import LinkFixer
from Sidebar import Sidebar, BLANKCELL
from Skeleton import Skeleton


class BreadCrumbBanner:

    """
    Banner class which generates a trail of "bread crumbs"
    instead of site  links.

    It's not very efficient, but works well enough for
    my purposes.
    """

    def __init__(self, filename, rootdir):
        self.__filename = filename
        self.__rootdir = rootdir

    def get_banner(self):
        stdout = sys.stdout
        html = StringIO()
        try:
            sys.stdout = html
            self.__do_breakcrumbs()
        finally:
            sys.stdout = stdout
        return html.getvalue()

    def __do_breakcrumbs(self):
        # split filename into path elements
        chunks = self.__filename.split(os.sep)

        # prevent main section pages from including
        # a link to themselves.
        if self.__filename.endswith('index.html'):
            sub = 1
        else:
            sub = 0

        for i in range(1,len(chunks)-sub):

            # get directory name
            dirname = os.sep.join(chunks[:i])
            # check index page in directory
            filename = os.path.join(dirname, 'index.ht')
            # if index page found, use that, otherwise
            # use the name of the directory to display
            if os.path.exists(filename):
                href = filename + 'ml'
                p = HTParser(filename)
                title = p.get('title')
            else:
                href = dirname
                title = dirname.split(os.sep)[-1]

            # HACK: want a different title for homepage
            # (actual title is 'Welcome to Rutherfurd.net')
            if title.find('Welcome') > -1:
                title = 'Home'

            # get href for current link
            href = self.get_link_path(href, self.__rootdir)
            print '<a href="%(href)s">%(title)s</a>' % locals() + ' <b>&#187;</b> ',

        # get current pagename title (removing 'ml' from '.html' in file extension)
        p = HTParser(self.__filename[:-2])
        print p.get('title')

    def get_link_path(self, href, root):
        # running on windows...
        href = href.replace('\\','/')
        path = posixpath.join(root,href)
        return posixpath.normpath(path)


class RutherfurdDotNet(Skeleton, Sidebar, BreadCrumbBanner):

    """
    Style class to convert an ".ht" template to HTML for
    rutherfurd.net.
    """

    AUTHOR = 'Ollie&nbsp;Rutherfurd'
    EMAIL = 'oliver&#64;rutherfurd.net'

    def __init__(self, filename, rootdir, relthis):
        root,ext = os.path.splitext(filename)
        html = root + '.html'
        p = self.__parser = HTParser(filename, self.AUTHOR, self.EMAIL)
        self.__body = None
        self.__linkfixer = LinkFixer(html, rootdir, relthis)

        BreadCrumbBanner.__init__(self, html, rootdir)

        # Calculate the sidebar links, adding a few of our own.
        self.__d = {'rootdir': rootdir}
        p.process_sidebar()
        p.sidebar.append(BLANKCELL)
        # It is important not to have newlines between the img tag and the
        # closing center tag, otherwise layout gets messed up.
        p.sidebar.append(('http://www.jedit.org','''
<center>
<img src="http://www.jedit.org/made-with-jedit-9.png"
alt="Crafted with jEdit" border="0" width="120"
height="40"></center>''' % self.__d))    # substitute '%(rootdir)s'
        p.sidebar.append(BLANKCELL)
        p.sidebar.append(('http://www.python.org/', '''
<center>
    <img alt="[Python Powered]" border="0"
         src="%(rootdir)s/images/PythonPowered.png"></center>
    ''' % self.__d))    # substitute '%(rootdir)s'
        self.__linkfixer.massage(p.sidebar, self.__d)
        Sidebar.__init__(self, p.sidebar)
        p.sidebar.append(BLANKCELL)
        copyright = self.__parser.get('copyright', '1999-%d' %
                                      time.localtime()[0])
        p.sidebar.append((None, '&copy; ' + copyright))
        self.__linkfixer.massage(p.sidebar)
        Sidebar.__init__(self, p.sidebar)
        # kludge!
        for i in range(len(p.sidebar)-1, -1, -1):
            if p.sidebar[i] == 'Email Us':
                p.sidebar[i] = 'Email me'
                break

        # another kludge, allow for section title as a link
        # since I'm including '%(rootdir)s in the link
        # I want that replaced with the value of rootdir
        for i in range(0, len(p.sidebar)):
            if type(p.sidebar[i]) in StringTypes:
                # substitute '%(rootdir)s'
                p.sidebar[i] = p.sidebar[i] % self.__d

    def get_title(self):
        return self.__parser.get('title')

    def get_sidebar(self):
        return Sidebar.get_sidebar(self)

    def get_corner(self):
        rootdir = self.__linkfixer.rootdir()
        return """
<center><a href="%(rootdir)sindex.html">Home</a></center>
    """ % self.__d  # substitute '%(rootdir)s'

    def get_corner_bgcolor(self):
        return '#ecebeb'

    def get_banner(self):
        return BreadCrumbBanner.get_banner(self)

    def get_body(self):
        if self.__body is None:
            self.__body = self.__parser.fp.read()
        return self.__body

    def get_lightshade(self):
        return '#ecebeb'

    def get_mediumshade(self):
        return '#7a8d9f'

    def get_darkshade(self):
        return '#8eb5c8'

    def get_stylesheet(self):
        # default stylesheet, substitute '%(rootdir)s'
        default = '%(rootdir)sdefault.css' % self.__d
        return self.__parser.get('stylesheet',default)

    def get_charset(self):
        return 'utf-8'


# :sidekick.parser=python:
# :indentSize=4:lineSeparator=\n:noTabs=true:tabSize=4:

Creating ".ht" Files

To create ".ht" files for "ht2html.py", I'm using reStructuredText and the ".ht" Writer I wrote for Docutils.

For example, here's the reStructuredText source for the home page of the site:

=========================
Welcome to Rutherfurd.net
=========================

Recent Changes
==============

.. include:: _recent_changes.txt

Complete `history of changes <site/changes.html>`_.

.. :lineSeparator=\n:noTabs=true:tabSize=4:

Here's a link to the created file, index.ht

"index.ht" was created using the following command:

rst2ht.py -g -t -s index.txt index.ht

Building the Site

The site is built using a Python script, makesite.py, which does the following:

  1. Find all ".txt" files within the site directory tree and convert them from reStructuredText to ".ht" templates, using rst2ht.py.

    makesite.py calls rst2ht.py for each ".txt" file found within the site directory tree with the following arguments:

    • -g (let people know the site is created by Docutils)
    • -t (date & time page was generated)
    • -s (provide a link to view reStructuredText source)
    • --stylesheet (path to default.css in the site root directory; this is calculated for each file)
    • --base-section=3: This tells the ".ht" writer to use "H3" as the base section for sections within the generated page.

Note

When run, makesite.py stores checksums of all ".txt" files it finds so that the next time it runs, it only needs to convert files that have changed.

  2. Find all ".ht" files within the site directory tree and convert them from ".ht" files to HTML using "ht2html.py" and "RutherfurdDotNet.py".

    Tip

    The main hangup I had using "ht2html.py" was that it was creating all links relative to the directory of the ".ht" file being processed, not the directory I was running "ht2html.py" from. This wreaked havoc on the links in "links.h" as they are relative to the site root. As a workaround, "makesite.py" calculates rootdir and passes it to "ht2html.py" for each file being converted from ".ht" to HTML.

  3. Delete all ".ht" files created.
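
The rst2ht.py arguments listed above can also be assembled as a list rather than one command string, which avoids quoting trouble when paths contain spaces. This is a sketch in Python 3 under some assumptions: rst2ht_command() is a hypothetical helper, and the script path passed to the interpreter must be one that can actually be found from the working directory:

```python
import sys


def rst2ht_command(txt_file, ht_file, rootdir, base_section=3,
                   script="rst2ht.py", stylesheet="default.css"):
    """Return the rst2ht.py invocation for one file as an argument list."""
    return [
        sys.executable, script,   # run the script with the current Python
        "-g", "-t", "-s",         # generator credit, timestamp, source link
        "--stylesheet=%s%s" % (rootdir, stylesheet),
        "--base-section=%d" % base_section,
        txt_file, ht_file,
    ]


cmd = rst2ht_command("cookbook/index.txt", "cookbook/index.ht", "../")
print(" ".join(cmd[1:]))
# To actually run it: import subprocess; subprocess.call(cmd)
```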

makesite.py

Here's "makesite.py", the script that creates the site:

"""
makesite.py - A script for generating rutherfurd.net
from a collection of reStructuredText files, which are
converted to ".ht" files and then to HTML using ``ht2html.py``.

To make generating the site faster, MD5 digests are
created for all found text files and saved, so that
only files changed between generation times will need
to be converted from ".txt" -> ".ht" -> ".html".

Note that one may pass "--force" to regenerate files
whether they've changed or not.

$Id$
"""

import fnmatch
import getopt
import md5
import os
import sys

try:
    True
except NameError:
    True,False = 1,0

# default section number H{x} used for sections
BASE_SECTION = 3

# site root, relative to this script
ROOTDIR = '.'

# file containing MD5 digests for '.txt' files
DIGEST_FILE = 'digests'

# path to "ht2html.py" script
HT2HTML = 'c:\\bin\\ht2html-2.0\\ht2html.py'

# path to "rst2ht.py" script
RST2HT = 'rst2ht.py'

# python style class to pass to "ht2html"
STYLECLASS = 'RutherfurdDotNet'

# stylesheet to pass to "rst2ht.py"
STYLESHEET='default.css'


def findFiles(rootdir,pattern):
    """
    Returns a list of files within `rootdir` matching
    the glob `pattern`.
    """
    found = []
    def callback((found,pattern), directory, files):
        is_hidden = False
        # check if we're in a directory tree where one
        # of the dirs starts with '.' -- if so, ignore
        # this directory and these files
        for d in directory.split(os.sep):
            if len(d) > 1 and d[0].startswith('.'):
                is_hidden = True
        if is_hidden:
            return
        found.extend([os.path.join(directory,f) \
            for f in files \
            if fnmatch.fnmatch(f,pattern)])
    os.path.walk(rootdir, callback, (found,pattern))
    return found

def getPathToRoot(path,sub=2):
    """
    Returns relative path to `root` given path

    >>> from makesite import getPathToRoot
    >>> getPathToRoot('.\\cookbook\\index.txt')
    '../'
    >>> getPathToRoot('.\\jEdit\\plugins\\editorscheme\\index.txt')
    '../../../'
    >>>
    """
    root = '../' * (len(path.replace('\\','/').split('/')) - sub)
    if not root:
        root = './' # root directory must be relative to current dir
    return root

def loadDigests(rootdir):
    """
    Returns dict of MD5 digests for '.txt' files.

    {filename: hexdigest[,...]}
    """
    digests = {}
    try:
        f = open(os.path.join(rootdir,DIGEST_FILE))
        for line in f.xreadlines():
            # ignore blank lines
            if not line.strip():
                continue
            # format is {hexdigest}\t{filepath}
            digest,name = line.strip().split('\t')
            digests[name] = digest
        f.close()
    except IOError, e:
        pass
    return digests

def saveDigests(rootdir,digests):
    """
    Saves MD5 hex digests for files in `rootdir`.

    File format is::

        {digest}\t{filepath}
        ...
        {digest}\t{filepath}

    """
    f = open(os.path.join(rootdir,DIGEST_FILE),'w')
    _digests = digests.items()
    _digests.sort()
    for name,digest in _digests:
        f.write('%s\t%s\n' % (digest,name))
    f.close()

def getDigest(path):
    """
    Creates md5 hex digest for `path`.
    """
    m = md5.new()
    f = open(path)
    m.update(f.read())
    f.close()
    return m.hexdigest()

def isUpToDate(path,digests):
    """
    Checks whether a file's contents have changed since the last
    time this script was run.  If a digest is not found, `False`
    is returned.

    Adds the current digest to `digests` regardless of whether
    the file is up to date or not.
    """
    digest = getDigest(path)
    if digest != digests.get(path,''):
        digests[path] = digest
        return False
    return True

def main(forceall,clean):

    path = ROOTDIR
    digests = loadDigests(ROOTDIR)

    # reST to .ht (files starting with ``_`` are private)
    # explicit ranges: '[A-z]' also matches '_' when matching is case-sensitive
    txt_files = findFiles(path, '[A-Za-z]*.txt')
    for txt_file in txt_files:
        if not forceall and isUpToDate(txt_file,digests):
            continue
        rootdir = getPathToRoot(txt_file)
        ht_file = os.path.splitext(txt_file)[0] + '.ht'
        cmd = 'c:\\python234\\python.exe %s -g -t -s  --report=3 --stylesheet=%s --base-section=%d %s %s' % \
            (RST2HT, rootdir + STYLESHEET, BASE_SECTION, txt_file, ht_file,)
        print cmd
        os.system(cmd)

    # .ht to HTML (files starting with ``_`` are private)
    # explicit ranges: '[A-z]' also matches '_' when matching is case-sensitive
    ht_files = findFiles(path, '[A-Za-z]*.ht')
    for ht_file in ht_files:
        if not forceall and isUpToDate(ht_file,digests):
            continue
        rootdir = getPathToRoot(ht_file)
        cmd = 'c:\\python234\\python.exe "%s" -r %s -s %s %s' % \
            (HT2HTML, rootdir, STYLECLASS, ht_file,)
        print cmd
        os.system(cmd)

    # remove ".ht" files
    if clean:
        for f in findFiles(path, '*.ht'):
            print "removing %s" % f
            os.unlink(f)

    print 'saving digests...'
    saveDigests(ROOTDIR,digests)
    print 'OK'

def usage():
    print os.path.split(sys.argv[0])[-1] + ' -h|--help -f|--force -c|--clean'

if __name__ == '__main__':
    # by default don't force a rebuild of all pages
    forceall = False
    clean = False
    try:
        opts,args = getopt.getopt(sys.argv[1:], "-hfc", ["help","force","clean"])
    except getopt.GetoptError:
        usage()
        sys.exit(1)
    for o,a in opts:
        if o in ('-h','--help'):
            usage()
            sys.exit()
        if o in ('-f','--force'):
            forceall = True
        elif o in ('-c','--clean'):
            clean = True

    main(forceall,clean)

# :indentSize=4:lineSeparator=\n:noTabs=true:tabSize=4:
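
The digest bookkeeping in "makesite.py" relies on the md5 module, which Python 3 no longer ships; hashlib is its replacement. Here is a minimal sketch of the same change detection in Python 3, assuming files small enough to read in one go. The function names mirror the ones above but are re-written for illustration:

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the MD5 hex digest of a file's bytes."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def is_up_to_date(path, digests):
    """True if the file matches its stored digest; otherwise store the new one."""
    digest = file_digest(path)
    if digests.get(path) == digest:
        return True
    digests[path] = digest
    return False


# Demonstration with a temporary file: new, unchanged, then edited.
with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "index.txt")
    with open(page, "w") as f:
        f.write("first draft\n")
    digests = {}
    first = is_up_to_date(page, digests)    # no digest stored yet
    second = is_up_to_date(page, digests)   # nothing changed
    with open(page, "w") as f:
        f.write("second draft\n")
    third = is_up_to_date(page, digests)    # contents changed

print(first, second, third)
```

MD5 is used here only to notice edits, not for security, so any stable hash would serve the same purpose.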

Downloads

Here are links to download files referenced in this article.

Feedback

I'd appreciate any feedback or comments on this article. Also, I'd be happy to answer any questions readers might have about using these tools.

History

  • 2003-08-1
    • Packaged rst2ht for easier installation & updated doc accordingly.
  • 2003-02-25
    • fixed links.h error: don't need to end links with </li>.