| Author: | Ollie Rutherfurd |
|---|---|
| Contact: | oliver@rutherfurd.net |
| Version: | 1.3 |
| Date: | 2003-02-26 |
Abstract
Almost all of the content of this site is written in reStructuredText then converted to HTML pages with a common set of navigation links and a consistent look and feel, using Docutils and ht2html. This is a guide to how I've done this with these tools.
Attention!
This article refers to a previous version of this site.
At the end of August, 2002, I moved from Arlington, VA to New York City, NY. When I moved I lost my DSL connection and though I recently got a nice fast cable connection, I no longer have a static IP at home. As I could no longer count on running my site off my machine at home, I had to move it somewhere else.
My old site, much like the new one, was a small collection of pages. It had some recipes I like, macros I've written for jEdit, a couple of Python modules I've written, a couple of little programs I've written, and some pictures of family and friends. It grew organically, and though portions of it were easy to manage, as a whole it didn't look consistent and was a hassle to update and maintain.
I decided that before moving the pages to a new server, I wanted to re-work how I created the site to make it consistent in look and feel, and quick and easy to update and maintain.
I was in the process of trying to figure out the easiest way to do this when I read an announcement by Barry Warsaw about ht2html. ht2html is the script used to generate www.python.org, www.list.org, and others. ht2html can do almost everything I wanted. It takes care of creating section and navigation links and giving pages a consistent look and feel. The one hangup is that it expects ".ht" files, which are essentially HTML files with a couple of headers and the body content of the page. For me, part of being quick and easy to update and maintain is that I don't want to edit HTML. I'd much rather just write in plain text and have it magically converted to HTML. That's where reStructuredText fits in, and that's what I ended up doing.
I wrote a Writer for Docutils which takes reStructuredText input and creates ".ht" files which can be converted into HTML pages by ht2html. I created the files needed by ht2html to define the structure and look of my site, and lastly I wrote a Python script to convert ".txt" files to ".ht" files and the ".ht" files to HTML pages. The end result is a site that has a consistent look and feel and is easy to update, created from a collection of plain text files.
This is a guide to how I've done this with these tools. I will take you through what files I created for ht2html and some of the stumbling blocks I hit along the way and how I worked through them.
I'll be the first to admit that I'm new to using ht2html. I've just started using it and don't really know the ins and outs of the tools, so if you spot anything that doesn't look correct, could be done a better way, or that I've left out, please let me know!
Here are the tools used to create this site: Python, Docutils, and ht2html.
If you're reading this, it's probably safe to assume that you know what Python is. However, just in case:
Python is a freely available, very-high-level, interpreted language developed by Guido van Rossum. It combines a clear syntax with powerful (but optional) object-oriented semantics. Python is available for almost every computer platform you might find yourself working on, and has strong portability between platforms.
In short, Python rocks.
Both Docutils and ht2html are written in Python, so you must have a Python interpreter installed to use either of them.
See http://www.python.org/ for more information and downloads.
Note: Docutils requires Python 2.1 or greater.
Docutils is a set of tools, written in Python, for processing plain text into other formats such as HTML, XML, and TeX.
See http://docutils.sourceforge.net/ for more information and downloads.
ht2html is a web page template processor written in Python. Examples of sites created with ht2html include www.python.org and www.list.org.
See http://ht2html.sourceforge.net/ for more information and downloads.
Download and install Python from http://www.python.org/.
Note
If you're running Windows, you might want to install ActivePython, which is available from ActiveState.
Download and install the latest Docutils snapshot from: http://docutils.sourceforge.net/docutils-snapshot.tgz.
Install by unzipping, then from within the created directory run:
python setup.py install
Until the ".ht" Writer is part of the Docutils core, it can be downloaded and installed from: http://www.rutherfurd.net/software/rst2ht/.
Install by unzipping, then from within the created directory, run:
python setup.py install
rst2ht.py will be installed into the "scripts" directory under Python's installation directory on Windows, and into "/usr/bin" on Linux.
Download ht2html from: http://sourceforge.net/project/showfiles.php?group_id=46757. ht2html does not have a "setup.py", so just plunk it somewhere. You can just execute "ht2html.py" from wherever you put it.
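As mentioned in the Introduction, I wanted to use ht2html to create the site, to get a consistent look and feel that was easy to modify and maintain.
Using ht2html requires doing 3 things: defining the site's sections and section links in "links.h", creating a Style class, and creating the ".ht" files to be converted.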
ht2html allows you to define your site's sections and section links in a single file called "links.h". "links.h" should be placed into the root directory of the site, which is the easiest spot from which to run "ht2html.py".
A "links.h" file consists of sections, which are in the format:
<h3>{SECTION NAME}</h3>
where {SECTION NAME} is the name of the section, and section links, which are in the format:
<li><a href="{URL}">{NAME}</a>
where {URL} is the link target and {NAME} is the visible text for the link.
Note, a section may also be a link, as done below.
Here's an example of 2 sections each with 2 section links:
<h3><a href="%(rootdir)sjEdit">jEdit</a></h3>
<li><a href="%(rootdir)sjEdit/macros/index.html">Macros</a>
<li><a href="%(rootdir)sjEdit/plugins/index.html">Plugins</a>
<h3>Python</h3>
<li><a href="%(rootdir)spython/winreg/index.html">winreg</a>
<li><a href="%(rootdir)spython/properties/index.html">properties</a>
In the above example, ``%(rootdir)s`` will be replaced with the path to the root of the site when each page is created by "ht2html.py". This way the same "links.h" can be used for all pages in the site.
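The substitution itself is ordinary Python dictionary-based string formatting. Here is a minimal sketch of the idea, written for Python 3; the `expand_links` helper is hypothetical and is not part of ht2html:

```python
# Two lines as they might appear in "links.h".
LINKS = [
    '<h3><a href="%(rootdir)sjEdit/index.html">jEdit</a></h3>',
    '<li><a href="%(rootdir)sjEdit/macros/index.html">Macros</a>',
]


def expand_links(lines, rootdir):
    """Substitute the path to the site root into each line.

    Hypothetical helper for illustration only; ht2html performs
    the real substitution internally.
    """
    return [line % {'rootdir': rootdir} for line in lines]


if __name__ == '__main__':
    # A page two directories below the site root.
    for line in expand_links(LINKS, '../../'):
        print(line)
```

Because this uses the `%` operator, a literal percent sign in such a line would have to be written as `%%`.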
Important
In "links.h", you must wrap your href tag attributes with double-quotes. If you use single-quotes, they will be included as part of the href and your links won't work.
Here's my complete "links.h" file:
<h3><a href="%(rootdir)sarticles/index.html">Articles</a></h3>
<li><a href="%(rootdir)sarticles/rst-ht2html.html">reStructuredText with ht2html</a>
<h3><a href="%(rootdir)scookbook/index.html">Cookbook</a></h3>
<h3><a href="%(rootdir)sgentoo/index.html">Gentoo</a></h3>
<li><a href="%(rootdir)sgentoo/cfghelper/index.html">cfghelper</a>
<h3><a href="%(rootdir)sjEdit/index.html">jEdit</a></h3>
<li><a href="%(rootdir)sjEdit/jo/index.html">jo</a>
<li><a href="%(rootdir)sjEdit/launcher/index.html">Launcher</a>
<li><a href="%(rootdir)sjEdit/macros/index.html">Macros</a>
<li><a href="%(rootdir)sjEdit/modes/index.html">Modes</a>
<li><a href="%(rootdir)sjEdit/plugins/index.html">Plugins</a>
<h3><a href="%(rootdir)spython/index.html">Python</a></h3>
<li><a href="%(rootdir)spython/properties/index.html">properties</a>
<li><a href="%(rootdir)spython/sendkeys/index.html">SendKeys</a>
<li><a href="%(rootdir)spython/winreg/index.html">winreg</a>
<h3><a href="%(rootdir)ssoftware/index.html">Software</a></h3>
<li><a href="%(rootdir)ssoftware/rst2chm/index.html">rst2chm</a>
<li><a href="%(rootdir)ssoftware/rst2ht/index.html">rst2ht</a>
<li><a href="%(rootdir)ssoftware/xchecker/index.html">xChecker</a>
<h3>Site</h3>
<li><a href="%(rootdir)ssite/search.html">Search</a>
<li><a href="%(rootdir)ssite/sitemap.html">Site Map</a>
<li><a href="%(rootdir)ssite/changes.html">Changes</a>
<h3><a href="%(rootdir)sweblog/index.html">Weblog</a></h3>
<li><a href="%(rootdir)sweblog/archive.html">Archive</a>
To customize how "ht2html.py" will generate each page, you must specify a Style class for it to use. The Style class controls the generation of the HTML page from the ".ht" template, including what colors are used, what stylesheet the page should use, whether the page should have a Sidebar or Banner, and other options.
To create my Style class, RutherfurdDotNet, I started by grabbing code from the existing Style classes provided with ht2html and changing things like colors. I then made some modifications to make the pages appear how I wanted them.
I wanted all pages to use a stylesheet and I wanted the stylesheet to be specified in the ".ht" template, so I overrode get_stylesheet() like so:
def get_stylesheet(self):
    default = '%(rootdir)sdefault.css' % self.__d
    return self.__parser.get('stylesheet',default)
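To make the header lookup with a default more concrete, here is a small Python 3 sketch. It assumes the headers of an ".ht" file are RFC 822 style "Name: value" lines followed by a blank line and the body, and it uses the standard `email` module rather than ht2html's own HTParser. The sample file contents and the `read_ht` helper are hypothetical:

```python
from email import message_from_string

# Hypothetical ".ht" content: a couple of headers, a blank line,
# then the body of the page.
HT_SOURCE = """\
Title: Welcome to Rutherfurd.net
Stylesheet: ../custom.css

<p>Body content of the page.</p>
"""


def read_ht(text, rootdir='./'):
    """Return the title, stylesheet and body of an ".ht"-style text.

    If no "Stylesheet" header is present, fall back to "default.css"
    in the site root, mirroring the get_stylesheet() override above.
    """
    msg = message_from_string(text)
    default = '%(rootdir)sdefault.css' % {'rootdir': rootdir}
    return {
        'title': msg.get('title'),  # header lookup is case-insensitive
        'stylesheet': msg.get('stylesheet', default),
        'body': msg.get_payload(),
    }
```

The point of the default is that most pages need no "Stylesheet" header at all, while an individual page can still override it.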
Also, instead of using the Banner component for "Site Links", I wanted a trail of breadcrumbs for each page. To do this, I created a BreadCrumbBanner class, which generates the HTML for a breadcrumb trail to the page being generated. In RutherfurdDotNet, I use it instead of Banner:
def get_banner(self):
    return BreadCrumbBanner.get_banner(self)
Note
The name of the Style class must be the same as the name of the module containing the class. For example, my Style class is RutherfurdDotNet and the filename is "RutherfurdDotNet.py".
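One plausible reason for this convention is that a loader can then find the class from a single name. The sketch below shows that idea in Python 3; it is an assumption about the general technique, not a copy of what "ht2html.py" actually does, and `load_style_class` is a hypothetical name:

```python
import importlib


def load_style_class(name):
    """Import module `name` and return the class of the same name.

    For example, load_style_class('RutherfurdDotNet') would import
    "RutherfurdDotNet.py" (if it is on the import path) and return
    the RutherfurdDotNet class defined inside it.
    """
    module = importlib.import_module(name)
    return getattr(module, name)
```

With this approach, a mismatch between the file name and the class name shows up as an ImportError or AttributeError.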
Here's an overview of the modifications I made:
Rather than having the standard "Site Links" along the top of the page in the "Banner" section, I wanted to have breadcrumbs. So if the user is looking at the "Macros" page in the "jEdit" section, in the "Banner" they will see:
Home >> jEdit >> Macros
where the first 2 will be links to their respective locations.
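The idea can be sketched in a few lines of Python 3. This is a simplified illustration, not the BreadCrumbBanner class itself: it labels each crumb with the directory name instead of reading the title from that directory's index page, and the function names are hypothetical:

```python
def ancestor_links(page_path):
    """Return (name, href) pairs for the pages above `page_path`.

    `page_path` is relative to the site root and uses '/' separators,
    e.g. 'jEdit/macros/index.html'. An index page does not get a
    link to itself.
    """
    parts = page_path.split('/')
    depth = len(parts) - 1           # directories below the site root
    names = ['Home'] + parts[:-1]    # root first, then each directory
    if parts[-1] == 'index.html':
        names = names[:-1]           # drop the page's own directory
    return [(name, '../' * (depth - level) + 'index.html')
            for level, name in enumerate(names)]


def breadcrumb_html(page_path, page_title, separator=' &raquo; '):
    """Build the breadcrumb trail; only the ancestors are links."""
    links = ['<a href="%s">%s</a>' % (href, name)
             for name, href in ancestor_links(page_path)]
    return separator.join(links + [page_title])


if __name__ == '__main__':
    print(breadcrumb_html('jEdit/macros/index.html', 'Macros'))
```

The hrefs are relative to the page being generated, so the same function works at any depth in the site.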
I wanted to allow "Section" headers in the left Sidebar to be links. For example, the "Cookbook" section header should be a link to the main Cookbook page. To do this, I added the following code to the Style class's __init__ method:
for i in range(0, len(p.sidebar)):
    if type(p.sidebar[i]) in StringTypes:
        p.sidebar[i] = p.sidebar[i] % self.__d
Looking at the code for Sidebar, it seems that this is the way to tell whether an item is a section or a link within a section: section headers are plain strings, while links are tuples. Here's an example, from "links.h", of a section which is a link:
<h3><a href="%(rootdir)scookbook/index.html">Cookbook</a></h3>
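Here is the same string-versus-tuple check as a standalone Python 3 sketch. The sample sidebar list is made up for illustration, and it assumes (based on the code above) that section headers are plain strings and links are `(url, text)` tuples:

```python
def expand_section_titles(sidebar, rootdir):
    """Substitute '%(rootdir)s' in section headers only.

    Section headers are plain strings; link entries are tuples and
    are returned unchanged.
    """
    values = {'rootdir': rootdir}
    return [item % values if isinstance(item, str) else item
            for item in sidebar]


# Hypothetical sidebar contents.
SIDEBAR = [
    '<a href="%(rootdir)scookbook/index.html">Cookbook</a>',  # section header
    ('cookbook/soup.html', 'Soup'),                           # link in section
]

if __name__ == '__main__':
    for item in expand_section_titles(SIDEBAR, '../'):
        print(item)
```

In Python 3, `isinstance(item, str)` takes the place of the `StringTypes` check used in the original Python 2 code.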
Here's the entire source of "RutherfurdDotNet.py":
"""
ht2html Style class for creating rutherfurd.net.

Ollie Rutherfurd <oliver@rutherfurd.net>

$Id$
"""

import os
import posixpath
import sys
import time
from types import StringTypes
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

from HTParser import HTParser
from LinkFixer import LinkFixer
from Sidebar import Sidebar, BLANKCELL
from Skeleton import Skeleton


class BreadCrumbBanner:
    """
    Banner class which generates a trail of "bread crumbs"
    instead of site links.

    It's not very efficient, but works well enough for
    my purposes.
    """

    def __init__(self, filename, rootdir):
        self.__filename = filename
        self.__rootdir = rootdir

    def get_banner(self):
        # capture everything printed by __do_breakcrumbs()
        stdout = sys.stdout
        html = StringIO()
        try:
            sys.stdout = html
            self.__do_breakcrumbs()
        finally:
            sys.stdout = stdout
        return html.getvalue()

    def __do_breakcrumbs(self):
        # split filename into path elements
        chunks = self.__filename.split(os.sep)
        # prevent main section pages from including
        # a link to themselves.
        if self.__filename.endswith('index.html'):
            sub = 1
        else:
            sub = 0
        for i in range(1,len(chunks)-sub):
            # get directory name
            dirname = os.sep.join(chunks[:i])
            # check index page in directory
            filename = os.path.join(dirname, 'index.ht')
            # if index page found, use that, otherwise
            # use the name of the directory to display
            if os.path.exists(filename):
                href = filename + 'ml'
                p = HTParser(filename)
                title = p.get('title')
            else:
                href = dirname
                title = dirname.split(os.sep)[-1]
            # HACK: want a different title for homepage
            # (actual title is 'Welcome to Rutherfurd.net')
            if title.find('Welcome') > -1:
                title = 'Home'
            # get href for current link
            href = self.get_link_path(href, self.__rootdir)
            print '<a href="%(href)s">%(title)s</a>' % locals() + ' <b>»</b> ',
        # get current pagename title (removing 'ml' from '.html' in file extension)
        p = HTParser(self.__filename[:-2])
        print p.get('title')

    def get_link_path(self, href, root):
        # running on windows...
        href = href.replace('\\','/')
        path = posixpath.join(root,href)
        return posixpath.normpath(path)


class RutherfurdDotNet(Skeleton, Sidebar, BreadCrumbBanner):
    """
    Style class to convert an ".ht" template to HTML for
    rutherfurd.net.
    """

    AUTHOR = 'Ollie Rutherfurd'
    EMAIL = 'oliver@rutherfurd.net'

    def __init__(self, filename, rootdir, relthis):
        root,ext = os.path.splitext(filename)
        html = root + '.html'
        p = self.__parser = HTParser(filename, self.AUTHOR, self.EMAIL)
        self.__body = None
        self.__linkfixer = LinkFixer(html, rootdir, relthis)
        BreadCrumbBanner.__init__(self, html, rootdir)

        # Calculate the sidebar links, adding a few of our own.
        self.__d = {'rootdir': rootdir}
        p.process_sidebar()
        p.sidebar.append(BLANKCELL)
        # It is important not to have newlines between the img tag and the
        # end center tags, otherwise layout gets messed up.
        p.sidebar.append(('http://www.jedit.org','''
<center>
<img src="http://www.jedit.org/made-with-jedit-9.png"
alt="Crafted with jEdit" border="0" width="120"
height="40"></center>''' % self.__d))  # substitute '%(rootdir)s'
        p.sidebar.append(BLANKCELL)
        p.sidebar.append(('http://www.python.org/', '''
<center>
<img alt="[Python Powered]" border="0"
src="%(rootdir)s/images/PythonPowered.png"></center>
''' % self.__d))  # substitute '%(rootdir)s'
        self.__linkfixer.massage(p.sidebar, self.__d)
        Sidebar.__init__(self, p.sidebar)
        p.sidebar.append(BLANKCELL)
        copyright = self.__parser.get('copyright', '1999-%d' %
                                      time.localtime()[0])
        p.sidebar.append((None, '© ' + copyright))
        self.__linkfixer.massage(p.sidebar)
        Sidebar.__init__(self, p.sidebar)

        # kludge!
        for i in range(len(p.sidebar)-1, -1, -1):
            if p.sidebar[i] == 'Email Us':
                p.sidebar[i] = 'Email me'
                break

        # another kludge, allow for section title as a link
        # since I'm including '%(rootdir)s in the link
        # I want that replaced with the value of rootdir
        for i in range(0, len(p.sidebar)):
            if type(p.sidebar[i]) in StringTypes:
                # substitute '%(rootdir)s'
                p.sidebar[i] = p.sidebar[i] % self.__d

    def get_title(self):
        return self.__parser.get('title')

    def get_sidebar(self):
        return Sidebar.get_sidebar(self)

    def get_corner(self):
        rootdir = self.__linkfixer.rootdir()
        return """
<center><a href="%(rootdir)sindex.html">Home</a></center>
""" % self.__d  # substitute '%(rootdir)s'

    def get_corner_bgcolor(self):
        return '#ecebeb'

    def get_banner(self):
        return BreadCrumbBanner.get_banner(self)

    def get_body(self):
        if self.__body is None:
            self.__body = self.__parser.fp.read()
        return self.__body

    def get_lightshade(self):
        return '#ecebeb'

    def get_mediumshade(self):
        return '#7a8d9f'

    def get_darkshade(self):
        return '#8eb5c8'

    def get_stylesheet(self):
        # default stylesheet, substitute '%(rootdir)s'
        default = '%(rootdir)sdefault.css' % self.__d
        return self.__parser.get('stylesheet',default)

    def get_charset(self):
        return 'utf-8'

# :sidekick.parser=python:
# :indentSize=4:lineSeparator=\n:noTabs=true:tabSize=4:
To create ".ht" files for "ht2html.py", I'm using reStructuredText and the ".ht" Writer I wrote for Docutils.
For example, here's the reStructuredText source for the home page of the site:
=========================
Welcome to Rutherfurd.net
=========================
Recent Changes
==============
.. include:: _recent_changes.txt
Complete `history of changes <site/changes.html>`_.
.. :lineSeparator=\n:noTabs=true:tabSize=4:
Here's a link to the created file, index.ht
"index.ht" was created using the following command:
rst2ht.py -g -t -s index.txt index.ht
The site is built using a Python script, makesite.py, which does the following:
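Find all ".txt" files within the site directory tree and convert them from reStructuredText to ".ht" templates, using rst2ht.py.
makesite.py calls rst2ht.py for each ".txt" file found within the site directory tree with the following arguments: -g -t -s --report=3 --stylesheet={path to "default.css" relative to the file} --base-section=3, followed by the input ".txt" file and the output ".ht" file.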
Note
When run, makesite.py stores checksums of all ".txt" files it finds so that the next time it runs, it only needs to convert files that have changed.
Find all ".ht" files within the site directory tree and convert them from ".ht" files to HTML using "ht2html.py" and "RutherfurdDotNet.py".
Tip
The main hangup I had using "ht2html.py" was that it was creating all links relative to the directory of the ".ht" file being processed, not the directory I was running "ht2html.py" from. This wreaked havoc on the links in "links.h", as they are relative to the site root. As a workaround, "makesite.py" calculates rootdir and passes it to "ht2html.py" for each file being converted from ".ht" to HTML.
Delete all ".ht" files created.
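The checksum check mentioned in the note above can be sketched as follows. This is a Python 3 illustration with hypothetical function names, and it uses SHA-256 from `hashlib` where the actual script uses MD5; any stable digest serves for change detection:

```python
import hashlib


def file_digest(path):
    """Return a hex digest of the file's contents."""
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()


def is_up_to_date(path, digests):
    """Return True if `path` is unchanged since it was last recorded.

    `digests` maps file paths to hex digests. The current digest is
    stored in it as a side effect, so an unseen file is reported as
    changed the first time and unchanged on the next call.
    """
    digest = file_digest(path)
    if digests.get(path) != digest:
        digests[path] = digest
        return False
    return True
```

Comparing content digests rather than modification times means a file that is re-saved without changes is not converted again.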
Here's "makesite.py", the script that creates the site:
"""
makesite.py - A script for generating rutherfurd.net
from a collection of reStructuredText files, which are
converted to ".ht" files and then to HTML using ``ht2html.py``.

To make generating the site faster, MD5 digests are
created for all found text files and saved, so that
only files changed between generation times will need
to be converted from ".txt" -> ".ht" -> ".html".

Note that one may pass "--force" to regenerate files
whether they've changed or not.

$Id$
"""

import fnmatch
import getopt
import md5
import os
import sys

try:
    True
except NameError:
    True,False = 1,0

# default section number H{x} used for sections
BASE_SECTION = 3
# site root, relative to this script
ROOTDIR = '.'
# file containing MD5 digests for '.txt' files
DIGEST_FILE = 'digests'
# path to "ht2html.py" script
HT2HTML = 'c:\\bin\\ht2html-2.0\\ht2html.py'
# path to "rst2ht.py" script
RST2HT = 'rst2ht.py'
# python style class to pass to "ht2html"
STYLECLASS = 'RutherfurdDotNet'
# stylesheet to pass to "rst2ht.py"
STYLESHEET='default.css'


def findFiles(rootdir,pattern):
    """
    Returns a list of files within `rootdir` matching
    the glob `pattern`.
    """
    found = []
    def callback((found,pattern), directory, files):
        is_hidden = False
        # check if we're in a directory tree where one
        # of the dirs starts with '.' -- if so, ignore
        # this directory and these files
        for d in directory.split(os.sep):
            if len(d) > 1 and d[0].startswith('.'):
                is_hidden = True
        if is_hidden:
            return
        found.extend([os.path.join(directory,f) \
                      for f in files \
                      if fnmatch.fnmatch(f,pattern)])
    os.path.walk(rootdir, callback, (found,pattern))
    return found


def getPathToRoot(path,sub=2):
    """
    Returns relative path to `root` given path

    >>> from makesite import getPathToRoot
    >>> getPathToRoot('.\\cookbook\\index.txt')
    '../'
    >>> getPathToRoot('.\\jEdit\\plugins\\editorscheme\\index.txt')
    '../../../'
    >>>
    """
    root = '../' * (len(path.replace('\\','/').split('/')) - sub)
    if not root:
        root = './' # root directory must be relative to current dir
    return root


def loadDigests(rootdir):
    """
    Returns dict of MD5 digests for '.txt' files.

    {filename: hexdigest[,...]}
    """
    digests = {}
    try:
        f = open(os.path.join(rootdir,DIGEST_FILE))
        for line in f.xreadlines():
            # ignore blank lines
            if not line.strip():
                continue
            # format is {hexdigest}\t{filepath}
            digest,name = line.strip().split('\t')
            digests[name] = digest
        f.close()
    except IOError, e:
        # no digest file yet: treat every file as changed
        pass
    return digests


def saveDigests(rootdir,digests):
    """
    Saves MD5 hex digests for files in `rootdir`.

    File format is::

        {digest}\t{filepath}
        ...
        {digest}\t{filepath}
    """
    f = open(os.path.join(rootdir,DIGEST_FILE),'w')
    _digests = digests.items()
    _digests.sort()
    for name,digest in _digests:
        f.write('%s\t%s\n' % (digest,name))
    f.close()


def getDigest(path):
    """
    Creates md5 hex digest for `path`.
    """
    m = md5.new()
    f = open(path)
    m.update(f.read())
    f.close()
    return m.hexdigest()


def isUpToDate(path,digests):
    """
    Checks whether a file's contents has changed since the last
    time this script was run. If a digest is not found, `False`
    is returned.

    Adds the current digest to `digests` regardless of whether
    the file is up to date or not.
    """
    digest = getDigest(path)
    if digest != digests.get(path,''):
        digests[path] = digest
        return False
    return True


def main(forceall,clean):

    path = ROOTDIR
    digests = loadDigests(ROOTDIR)

    # reST to .ht (files starting with ``_`` are private)
    # note: '[A-z]' would also match '_' where matching is case-sensitive
    txt_files = findFiles(path, '[A-Za-z]*.txt')
    for txt_file in txt_files:
        if not forceall and isUpToDate(txt_file,digests):
            continue
        rootdir = getPathToRoot(txt_file)
        ht_file = os.path.splitext(txt_file)[0] + '.ht'
        cmd = 'c:\\python234\\python.exe %s -g -t -s --report=3 --stylesheet=%s --base-section=%d %s %s' % \
              (RST2HT, rootdir + STYLESHEET, BASE_SECTION, txt_file, ht_file,)
        print cmd
        os.system(cmd)

    # .ht to HTML (files starting with ``_`` are private)
    ht_files = findFiles(path, '[A-Za-z]*.ht')
    for ht_file in ht_files:
        if not forceall and isUpToDate(ht_file,digests):
            continue
        rootdir = getPathToRoot(ht_file)
        cmd = 'c:\\python234\\python.exe "%s" -r %s -s %s %s' % \
              (HT2HTML, rootdir, STYLECLASS, ht_file,)
        print cmd
        os.system(cmd)

    # remove ".ht" files
    if clean:
        for f in findFiles(path, '*.ht'):
            print "removing %s" % f
            os.unlink(f)

    print 'saving digests...'
    saveDigests(ROOTDIR,digests)
    print 'OK'


def usage():
    print os.path.split(sys.argv[0])[-1] + ' -h|--help -f|--force -c|--clean'


if __name__ == '__main__':

    # by default don't force a rebuild of all pages
    forceall = False
    clean = False

    try:
        opts,args = getopt.getopt(sys.argv[1:], "hfc", ["help","force","clean"])
    except getopt.GetoptError:
        usage()
        sys.exit(1)

    for o,a in opts:
        if o in ('-h','--help'):
            usage()
            sys.exit()
        if o in ('-f','--force'):
            forceall = True
        elif o in ('-c','--clean'):
            clean = True

    main(forceall,clean)

# :indentSize=4:lineSeparator=\n:noTabs=true:tabSize=4:
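The script above is Python 2 code, and `os.path.walk` no longer exists in Python 3. For readers on a newer interpreter, here is a rough equivalent of the file search using `os.walk`. It is a sketch rather than a drop-in replacement, and `find_files` is a hypothetical name:

```python
import fnmatch
import os


def find_files(rootdir, pattern):
    """Return a sorted list of files under `rootdir` matching `pattern`.

    Directories whose names start with '.' are skipped, along with
    everything below them.
    """
    found = []
    for directory, dirnames, filenames in os.walk(rootdir):
        # Prune hidden directories in place so os.walk() does not
        # descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith('.')]
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                found.append(os.path.join(directory, name))
    return sorted(found)


if __name__ == '__main__':
    # '[A-Za-z]*.txt' leaves out private files such as
    # '_recent_changes.txt'.
    for path in find_files('.', '[A-Za-z]*.txt'):
        print(path)
```

Pruning `dirnames` in place avoids walking hidden directories at all, instead of walking them and discarding the results.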
Here are links to download files referenced in this article.
I'd appreciate any feedback or comments on this article. Also, I'd be happy to answer any questions readers might have about using these tools.