Fast Static Site with make

[2021-05-25 Tue] on Yann Esposito's blog
A deeper view of my static site builder Makefile

This article digs a bit deeper into how I generate my static website. In a previous article I only gave the rationale and an overview so you could do it yourself. Mainly, it is very fast and portable.

A few goals reached by my current build system are:

  1. Be fast, do as little work as possible. I don't want to rebuild all the HTML pages if I only change one file.
  2. Source file format agnostic. You can use markdown, org-mode or even write HTML directly.
  3. Support gemini
  4. Optimize size: minify HTML, CSS, images
  5. Generate an index page listing the posts
  6. Generate RSS/atom feed (for both gemini and http)

make takes care of the dependency graph, which minimizes the amount of work when a source file changes. For some features, I wrote small, specific shell scripts. For example, to be completely agnostic about the source format of my articles, I generate the RSS feed from a tree of HTML files. Taking advantage of make, I also generate an index cache that transforms those HTML files into XML, which is faster to use when building the different indexes. These transformations are done by very short shell scripts.

Makefile overview

A Makefile is made of rules. The first rule of your Makefile is the default rule. The first rule of my Makefile is called all.

A rule has the following format:

target: file1 file2
    cmd --input file1 file2 \
        --output target

If target does not exist, or if it is older than one of its dependencies, make rebuilds it. Before that, make looks at the dependencies: if any of them needs to be updated, it runs all the rules in the correct order to rebuild them, and finally runs the script to build target. A file needs to be updated if it does not exist, if one of its dependencies needs to be updated, or if one of its dependencies is newer than the file.
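
The decision make takes for a single target can be sketched in a few lines of shell. This is only an illustration of the rule described above, not how make is implemented, and the function name needs_rebuild is made up for this example.

```shell
#!/usr/bin/env bash
# needs_rebuild TARGET DEP...
# Succeeds (exit 0) if TARGET is missing or older than at least one DEP.
# Hypothetical helper: it ignores the recursive part (dependencies that
# must themselves be rebuilt first), which make handles for us.
needs_rebuild() {
    local target="$1"; shift
    # a missing target must always be built
    [ -e "$target" ] || return 0
    local dep
    for dep in "$@"; do
        # -nt is true when $dep has a more recent modification time
        [ "$dep" -nt "$target" ] && return 0
    done
    return 1
}
```

For example, `needs_rebuild _site/index.html src/index.org && echo rebuild` prints rebuild only when the page is missing or older than its source.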

The usual use case of make is building a single binary out of many source files. But for a static website, we need to generate a lot of files from a lot of files. So we construct the rules like this:

all: site

# build a list of files that will need to be built
DST_FILES := ....
# RULES TO GENERATE DST_FILES
ALL += $(DST_FILES)

# another list of files
DST_FILES_2 := ....
# RULES TO GENERATE DST_FILES_2
ALL += $(DST_FILES_2)

site: $(ALL)

In my Makefile I have many similar blocks with the same pattern.

  1. I retrieve a list of source files
  2. I construct the list of destination files (change the directory, the extension)
  3. I declare a rule to construct these destination files
  4. I add the destination files to the ALL variable.

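I have a block for each of the following: raw assets, CSS, HTML generated from org files, indexes, RSS, gemini, and images.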

Assets

The rules to copy assets will be a good first example.

  1. find all assets in the src/ directory
  2. generate the corresponding files in the _site/ directory
  3. make this rule a dependency of the all rule.

SRC_ASSETS := $(shell find src -type f)
DST_ASSETS := $(patsubst src/%,_site/%,$(SRC_ASSETS))
_site/% : src/%
    @mkdir -p "$(dir $@)"
    cp "$<" "$@"
.PHONY: assets
assets: $(DST_ASSETS)
ALL += assets

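OK, this looks terrible. But mainly: SRC_ASSETS contains the result of the find command, DST_ASSETS contains the same files with the src/ prefix replaced by _site/, and the rule _site/% : src/% says that any file in _site/ is generated from the file with the same path in src/.

About the line @mkdir -p "$(dir $@)": the leading @ makes the command silent (make does not echo it), $@ is replaced by the target name, and $(dir $@) is the directory part of the target.

For the line with cp you just need to know that $< represents the first dependency.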
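
For a single file, the recipe boils down to the following shell. This is a sketch of what the rule does once make has filled in its automatic variables; the function name copy_asset is hypothetical, while the src/ and _site/ directories are the ones used in this article.

```shell
#!/usr/bin/env bash
# copy_asset SRC_FILE
# Mirrors what the "_site/% : src/%" rule does for one file.
copy_asset() {
    local src="$1"                 # plays the role of $< (first dependency)
    local dst="_site/${src#src/}"  # plays the role of $@ (the target)
    mkdir -p "$(dirname "$dst")"   # same effect as mkdir -p "$(dir $@)"
    cp "$src" "$dst"
    printf '%s\n' "$dst"           # print the generated path
}
```

Calling `copy_asset src/img/logo.png` from the project root would create _site/img/ if needed and copy the file there. The difference with make is that make only does this when the target is missing or out of date.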

So my Makefile is composed of similar blocks, where I change the first find command to match specific files and use a different build rule. An important point is that a rule must be as specific as possible, because make uses the most specific rule in case of ambiguity (GNU make 3.82 and later pick the pattern rule with the shortest stem). For example, the rule _site/%: src/% matches all files in the src/ directory. But if we want to treat CSS files with another rule we can write:

_site/%.css: src/%.css
    minify "$<" > "$@"

And if the selected file is a css file, this rule will be selected.

Prelude

So to start I have a few predefined useful variables.

all: site
# directory containing the source files
SRC_DIR ?= src
# directory that will contain the site files
DST_DIR ?= _site
# a directory that will contain a cache to speedup indexing
CACHE_DIR ?= .cache

# options to pass to find to prevent matching files in the src/drafts
# directory
NO_DRAFT := -not -path '$(SRC_DIR)/drafts/*'
# option to pass to find to not match  org files
NO_SRC_FILE := ! -name '*.org'

CSS

So here we go, the same simple pattern for CSS files.

# CSS
SRC_CSS_FILES := $(shell find $(SRC_DIR) -type f -name '*.css')
DST_CSS_FILES := $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%,$(SRC_CSS_FILES))
$(DST_DIR)/%.css : $(SRC_DIR)/%.css
    @mkdir -p "$(dir $@)"
    minify "$<" > "$@"
.PHONY: css
css: $(DST_CSS_FILES)
ALL += css

This is very similar to the block for raw assets. The difference is just that instead of using cp we use the minify command.

ORG → HTML

Now this one is more complex, but it still follows the same pattern.

# ORG -> HTML
EXT ?= .org
SRC_PANDOC_FILES ?= $(shell find $(SRC_DIR) -type f -name "*$(EXT)" $(NO_DRAFT))
DST_PANDOC_FILES ?= $(patsubst %$(EXT),%.html, \
                        $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%, \
                            $(SRC_PANDOC_FILES)))
PANDOC_TEMPLATE ?= templates/post.html
MK_HTML := engine/mk-html.sh
PANDOC := $(MK_HTML) $(PANDOC_TEMPLATE)
$(DST_DIR)/%.html: $(SRC_DIR)/%.org $(PANDOC_TEMPLATE) $(MK_HTML)
    @mkdir -p "$(dir $@)"
    $(PANDOC) "$<" "$@.tmp"
    minify --mime text/html "$@.tmp" > "$@"
    @rm "$@.tmp"
.PHONY: html
html: $(DST_PANDOC_FILES)
ALL += html

So to construct DST_PANDOC_FILES this time we also need to change the extension of the file from org to html. We need to provide a template that will be passed to pandoc.

And of course, since we want to regenerate all HTML files when the template changes, we declare the template as a dependency. Importantly, it must not come first, because the recipe uses $<, which is the first dependency and must be the org file.

I also use a short script instead of calling pandoc directly. It makes it easier to handle the table of contents using the metadata in the file. And if someday I want to select the template from the metadata, this will be the right place to do it.

The mk-html.sh is quite straightforward:

#!/usr/bin/env bash
set -eu

# put me at the top level of my project (like Makefile)
cd "$(git rev-parse --show-toplevel)" || exit 1
template="$1"
orgfile="$2"
htmlfile="$3"

# check if there is the #+OPTIONS: toc:t
tocoption=""
if grep -ie '^#+options:' "$orgfile" | grep 'toc:t'>/dev/null; then
    tocoption="--toc"
fi

set -x
pandoc $tocoption \
       --template="$template" \
       --mathml \
       --from org \
       --to html5 \
       --standalone \
       "$orgfile" \
       --output "$htmlfile"

Once generated, I also minify the HTML file. And that's it. The important part is that now, if I change my script, the template, or the source file, make will regenerate the HTML files that depend on it.

Indexes

One of my goals is to be as agnostic as possible regarding the source format. I know that the main destination format will be HTML, so I rely on this format as much as possible. For every generated HTML file I generate a clean XML file (via hxclean), so I am able to extract specific nodes from my HTML files. These XML files constitute my "index". Of course this is not the most optimized index (I could have used sqlite, for example), but it is already quite helpful, as the same index files are used to build both the homepage with the list of articles and the RSS file.

# INDEXES
SRC_POSTS_DIR ?= $(SRC_DIR)/posts
DST_POSTS_DIR ?= $(DST_DIR)/posts
SRC_POSTS_FILES ?= $(shell find $(SRC_POSTS_DIR) -type f -name "*$(EXT)")
RSS_CACHE_DIR ?= $(CACHE_DIR)/rss
DST_XML_FILES ?= $(patsubst %.org,%.xml, \
                        $(patsubst $(SRC_POSTS_DIR)/%,$(RSS_CACHE_DIR)/%, \
                            $(SRC_POSTS_FILES)))
$(RSS_CACHE_DIR)/%.xml: $(DST_POSTS_DIR)/%.html
    @mkdir -p "$(dir $@)"
    hxclean "$<" > "$@"
.PHONY: indexcache
indexcache: $(DST_XML_FILES)
ALL += indexcache

To summarize, this rule generates a corresponding XML file for every file matching _site/posts/*.html (hxclean takes an HTML file and tries its best to make an XML file out of it).

HTML Index

So now we just want to generate the main index.html page at the root of the site. This page should list all articles by date in reverse order.

So the first step is to take advantage of the index cache. For every XML file generated before, I generate the small HTML block I want for that entry. For this I use a script, mk-index-entry.sh. It uses hxselect to retrieve the date and the title from the cached XML file, then generates a small file containing just the date and the link.

Here is the block in the Makefile:

DST_INDEX_FILES ?= $(patsubst %.xml,%.index, $(DST_XML_FILES))
MK_INDEX_ENTRY := ./engine/mk-index-entry.sh
INDEX_CACHE_DIR ?= $(CACHE_DIR)/rss
$(INDEX_CACHE_DIR)/%.index: $(INDEX_CACHE_DIR)/%.xml $(MK_INDEX_ENTRY)
    @mkdir -p $(INDEX_CACHE_DIR)
    $(MK_INDEX_ENTRY) "$<" "$@"

This reads: for every .xml file, generate a .index file with mk-index-entry.sh.

#!/usr/bin/env zsh

# prelude
cd "$(git rev-parse --show-toplevel)" || exit 1
xfic="$1"
dst="$2"
indexdir=".cache/rss"

# HTML Accessors (similar to CSS accessors)
dateaccessor='.yyydate'
# title and keyword shouldn't be changed
titleaccessor='title'
finddate(){ < $1 hxselect -c $dateaccessor | sed 's/\[//g;s/\]//g;s/ .*$//' }
findtitle(){ < $1 hxselect -c $titleaccessor }

autoload -U colors && colors

blogfile="$(echo "$xfic" | sed 's#\.xml$#.html#;s#^'$indexdir'/#posts/#')"
printf "%-30s" "$blogfile"
d=$(finddate "$xfic")
echo -n " [$d]"
title=$(findtitle "$xfic")
printf ": %-55s" "$title"
# overwrite (not append) so that rebuilding an entry does not duplicate it
{ printf "\\n<li>"
  printf "\\n<span class=\"pubDate\">%s</span>" "$d"
  printf "\\n<a href=\"%s\">%s</a>" "${blogfile}" "$title"
  printf "\\n</li>\\n\\n"
} > "${dst}"

echo " [${fg[green]}OK${reset_color}]"

Then I use these intermediate files to generate a single bigger index file.

HTML_INDEX := $(DST_DIR)/index.html
MKINDEX := engine/mk-index.sh
INDEX_TEMPLATE ?= templates/index.html
$(HTML_INDEX): $(DST_INDEX_FILES) $(MKINDEX) $(INDEX_TEMPLATE)
    @mkdir -p $(DST_DIR)
    $(MKINDEX)
.PHONY: index
index: $(HTML_INDEX)
ALL += index

This script is a big one, but it is not that complex. For every index file, I create a copy named DATE-dirname, I sort these copies in reverse order, and I put their content in the middle of an HTML file.

The important part is that the page is only regenerated when one of the index files changes. The first part of the script creates the files with the date in their names, which helps us sort them later.

#!/usr/bin/env zsh

autoload -U colors && colors
cd "$(git rev-parse --show-toplevel)" || exit 1
# Directory
webdir="_site"
indexfile="$webdir/index.html"
indexdir=".cache/rss"
tmpdir=$(mktemp -d)

echo "Publishing"

dateaccessor='.pubDate'
finddate(){ < $1 hxselect -c $dateaccessor }
# generate files with <DATE>-<FILENAME>.index
for fic in $indexdir/**/*.index; do
    d=$(finddate $fic)
    echo "${${fic:h}:t} [$d]"
    cp $fic $tmpdir/$d-${${fic:h}:t}.index
done

Then I use these files to generate a file that will contain the body of the HTML.

# for every post in reverse order
# generate the body (there is some logic to group by year)
previousyear=""
for fic in $(ls $tmpdir/*.index | sort -r); do
    d=$(finddate $fic)
    year=$( echo "$d" | perl -pe 's#(\d{4})-.*#$1#')
    if (( year != previousyear )); then
        if (( previousyear > 0 )); then
            echo "</ul>" >> $tmpdir/index
        fi
        previousyear=$year
        echo "<h3 name=\"${year}\" >${year}</h3><ul>" >> $tmpdir/index
    fi
    cat $fic >> $tmpdir/index
done
echo "</ul>" >> $tmpdir/index
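
The grouping logic can be tried in isolation. Here is a small sketch in bash that reads already sorted lines of the form DATE TITLE on standard input instead of .index files. That input format and the function name group_by_year are assumptions made for the example; they are not part of my scripts.

```shell
#!/usr/bin/env bash
# group_by_year: read "YYYY-MM-DD title" lines (newest first) on stdin
# and print them grouped under one <h3> heading per year.
group_by_year() {
    local previousyear="" d title year
    while read -r d title; do
        year="${d%%-*}"            # keep what precedes the first "-"
        if [ "$year" != "$previousyear" ]; then
            # close the previous list, if any, before opening a new one
            [ -n "$previousyear" ] && echo "</ul>"
            previousyear="$year"
            echo "<h3>${year}</h3><ul>"
        fi
        echo "<li>${d} ${title}</li>"
    done
    # close the last list, unless the input was empty
    [ -n "$previousyear" ] && echo "</ul>"
    return 0
}
```

A quick way to try it: `printf '%s\n' '2020-12-31 C' '2021-05-25 A' '2021-01-02 B' | sort -r | group_by_year`. Because the dates are in ISO format, a plain reverse text sort is enough to get the newest entries first.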

And finally, I render the HTML using a template within a shell script:

title="Y"
description="Most recent articles"
author="Yann Esposito"
body=$(< $tmpdir/index)
date=$(LC_TIME=en_US date +'%Y-%m-%d')

# A neat trick to use pandoc template within a shell script
# the pandoc templates use $x$ format, we replace it by just $x
# to be used with envsubst
template=$(< templates/index.html | \
    sed 's/\$\(header-includes\|table-of-content\)\$//' | \
    sed 's/\$if.*\$//' | \
    perl -pe 's#(\$[^\$]*)\$#$1#g' )
{
    export title
    export author
    export description
    export date
    export body
    echo ${template} | envsubst
} > "$indexfile"

rm -rf $tmpdir
echo "* HTML INDEX [done]"
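
The placeholder rewriting can be isolated in a small function. The following sketch uses sed only and produces ${name} rather than $name, which envsubst also understands. The function name pandoc_to_envsubst is an assumption made for this example, and it only handles plain variables: conditionals are dropped without being evaluated, and placeholders with a hyphen in their name are left untouched.

```shell
#!/usr/bin/env bash
# pandoc_to_envsubst: read a pandoc template on stdin and rewrite
# "$name$" placeholders as "${name}" so that envsubst can fill them.
# Hypothetical helper, not the script used to build the site.
pandoc_to_envsubst() {
    # 1st expression: drop $if(...)$ and $endif$ markers
    # 2nd expression: turn $name$ into ${name}
    sed -E \
        -e 's/\$(if\([^)]*\)|endif)\$//g' \
        -e 's/\$([A-Za-z][A-Za-z0-9_]*)\$/${\1}/g'
}
```

With envsubst (from GNU gettext) installed, the function could be used as `export title="Y"; pandoc_to_envsubst < templates/index.html | envsubst`.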

RSS

So for my RSS generation this is quite similar to the system I use to generate my index file. I just slightly improved the rules.

The makefile blocks look like:

# RSS
DST_RSS_FILES ?= $(patsubst %.xml,%.rss, $(DST_XML_FILES))
MK_RSS_ENTRY := ./engine/mk-rss-entry.sh
$(RSS_CACHE_DIR)/%.rss: $(RSS_CACHE_DIR)/%.xml $(MK_RSS_ENTRY)
    @mkdir -p $(RSS_CACHE_DIR)
    $(MK_RSS_ENTRY) "$<" "$@"

RSS := $(DST_DIR)/rss.xml
MKRSS := engine/mkrss.sh
$(RSS): $(DST_RSS_FILES) $(MKRSS)
    $(MKRSS)

.PHONY: rss
rss: $(RSS)
ALL += rss

Gemini

I wrote a minimal script to transform my org files to gemini files. I also need to generate an index and an atom file for gemini:

# ORG -> GEMINI
EXT := .org
SRC_GMI_FILES ?= $(shell find $(SRC_DIR) -type f -name "*$(EXT)" $(NO_DRAFT))
DST_GMI_FILES ?= $(patsubst %$(EXT),%.gmi, \
                        $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%, \
                            $(SRC_GMI_FILES)))
GMI := engine/org2gemini.sh
$(DST_DIR)/%.gmi: $(SRC_DIR)/%.org $(GMI) engine/org2gemini_step1.sh
    @mkdir -p $(dir $@)
    $(GMI) "$<" "$@"
ALL += $(DST_GMI_FILES)
.PHONY: gmi
gmi: $(DST_GMI_FILES)

# GEMINI INDEX
GMI_INDEX := $(DST_DIR)/index.gmi
MK_GMI_INDEX := engine/mk-gemini-index.sh
$(GMI_INDEX): $(DST_GMI_FILES) $(MK_GMI_INDEX)
    @mkdir -p $(DST_DIR)
    $(MK_GMI_INDEX)
ALL += $(GMI_INDEX)
.PHONY: gmi-index
gmi-index: $(GMI_INDEX)

# GEMINI ATOM
GEM_ATOM := $(DST_DIR)/gem-atom.xml
MK_GEMINI_ATOM := engine/mk-gemini-atom.sh
$(GEM_ATOM): $(DST_GMI_FILES) $(MK_GEMINI_ATOM)
    $(MK_GEMINI_ATOM)
ALL += $(GEM_ATOM)
.PHONY: gmi-atom
gmi-atom: $(GEM_ATOM)

.PHONY: gemini
gemini: $(DST_GMI_FILES) $(GMI_INDEX) $(GEM_ATOM)

Images

For images, I try to convert all of them with imagemagick to compress them.

# Images
SRC_IMG_FILES ?= $(shell find $(SRC_DIR) -type f \( -name "*.jpg" -or -name "*.jpeg" -or -name "*.gif" -or -name "*.png" \))
DST_IMG_FILES ?= $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%, $(SRC_IMG_FILES))

$(DST_DIR)/%.jpg: $(SRC_DIR)/%.jpg
    @mkdir -p $(dir $@)
    convert "$<" -quality 50 -resize 800x800\> "$@"

$(DST_DIR)/%.jpg: $(SRC_DIR)/%.jpeg
    @mkdir -p $(dir $@)
    convert "$<" -quality 50 -resize 800x800\> "$@"

$(DST_DIR)/%.gif: $(SRC_DIR)/%.gif
    @mkdir -p $(dir $@)
    convert "$<" -quality 50 -resize 800x800\> "$@"

$(DST_DIR)/%.png: $(SRC_DIR)/%.png
    @mkdir -p $(dir $@)
    convert "$<" -quality 50 -resize 800x800\> "$@"

.PHONY: img
img: $(DST_IMG_FILES)
ALL += $(DST_IMG_FILES)
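
A note on find operator precedence, since the image list is built from several -name alternatives: in find, the implicit "and" binds tighter than -or (-o), so -type f only applies to all alternatives when they are grouped with parentheses. The sketch below shows the grouped form; the function name list_images is hypothetical.

```shell
#!/usr/bin/env bash
# list_images DIR
# Lists regular files under DIR whose name ends with a known image extension.
# The parentheses make "-type f" apply to every alternative; without them,
# find reads the expression as (-type f AND *.jpg) OR *.jpeg OR *.gif OR *.png
# and would also return a directory named, say, "old.png".
list_images() {
    find "$1" -type f \( -name '*.jpg' -o -name '*.jpeg' \
                         -o -name '*.gif' -o -name '*.png' \) | sort
}
```

Running `list_images src` from the project root lists only regular image files, sorted by path.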

Deploy

A nice bonus is that I also deploy my website using make. Note that the clean rule protects me against temporary bugs in the Makefile: rm -rf only runs when the directory variable is not empty.

# DEPLOY
.PHONY: site
site: $(ALL)

.PHONY: deploy
deploy: $(ALL)
    engine/sync.sh

.PHONY: clean
clean:
    -[ ! -z "$(DST_DIR)" ] && rm -rf $(DST_DIR)/*
    -[ ! -z "$(CACHE_DIR)" ] && rm -rf $(CACHE_DIR)/*
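
The same guard can be written as a shell function, for instance if the cleaning ever moves into a script. This is a sketch under that assumption; the name clean_dir is made up for the example.

```shell
#!/usr/bin/env bash
# clean_dir DIR
# Removes the content of DIR, but refuses to run when DIR is empty or unset,
# so that a broken variable can never expand to "rm -rf /*".
clean_dir() {
    local dir="${1:-}"
    if [ -z "$dir" ]; then
        echo "clean_dir: empty directory name, nothing removed" >&2
        return 1
    fi
    # ${dir:?} is a second safety net: the expansion fails if dir is empty
    rm -rf -- "${dir:?}"/*
}
```

For example, `clean_dir _site` empties the generated site but keeps the directory itself, while `clean_dir ""` prints an error and removes nothing.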