This article digs a bit deeper into my Makefile-based static website
generator. In a previous article I only gave the rationale and an
overview so you could build one yourself. In short, it is very fast and
portable.
A few goals reached by my current build system are:
- Be fast and do as little work as possible. I don't
want to rebuild all the HTML pages if I only change one file.
- Source file format agnostic. You can use markdown, org-mode or even
write HTML directly.
- Support gemini
- Optimize size: minify HTML, CSS, images
- Generate an index page listing the posts
- Generate RSS/atom feed (for both gemini and http)
make takes care of the dependency graph, so that a change in the
sources triggers as little work as possible. For some features, I wrote
small, specific shell scripts. For example, to stay completely agnostic
about the source format of my articles, I generate the RSS feed out of a
tree of HTML files. Taking advantage of make, I first build an index
cache that transforms those HTML files into XML, which is faster to use
when building the different indexes. These transformations are done by
very short shell scripts.
Makefile overview
A Makefile is made of rules. The first rule of your Makefile is the
default rule. The first rule of my Makefile is called all.
A rule has the following format:
target: file1 file2
	cmd --input file1 file2 \
	    --output target
If target does not exist, or if one of its dependencies is newer than
it, make rebuilds it. Before doing so, make looks at the dependencies
themselves: if any of them needs to be updated, it runs the
corresponding rules in the correct order, and finally runs the commands
that build target. A file needs to be updated when it is missing, or
when one of its dependencies needs to be updated or is newer.
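To make the "is newer" check concrete, here is a minimal shell sketch of the decision for a single target. It is a simplification: the real make also walks the dependencies recursively. The function name needs_rebuild and the file names in the usage lines are only illustrative.

```shell
# needs_rebuild TARGET DEP...
# Succeeds (exit 0) when TARGET is missing or older than at least one DEP.
needs_rebuild() {
    target=$1
    shift
    # A missing target always has to be built.
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        # -nt is true when $dep was modified more recently than $target.
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical usage: decide whether one page is stale.
if needs_rebuild _site/index.html src/index.org templates/post.html; then
    echo "rebuild"
fi
```

Only modification times are compared, which is why touching a template is enough to trigger a rebuild of everything that depends on it.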
The usual use case of make is building a single binary out of many
source files. But for a static website, we need to generate a lot of
files from a lot of files. So we construct the rules like this:
all: site
# build a list of files that will need to be built
DST_FILES := ....
# RULES TO GENERATE DST_FILES
ALL += $(DST_FILES)
# another list of files
DST_FILES_2 := ....
# RULES TO GENERATE DST_FILES_2
ALL += $(DST_FILES_2)
site: $(ALL)
In my Makefile I have many similar blocks following the same pattern:
- I retrieve a list of source files
- I construct the list of destination files (change the directory, the
extension)
- I declare a rule to construct these destination files
- I add the destination files to the ALL variable.
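The second step of this pattern is a pure path transformation. As a rough shell equivalent of what the Makefile does with patsubst, here is a small sketch; the function name dst_for is hypothetical, and the src/ and _site/ prefixes are the ones used in this article.

```shell
# dst_for SRC_FILE NEW_EXT
# Mirrors a patsubst from src/% to _site/% followed by an extension change.
dst_for() {
    rel=${1#src/}                             # drop the leading "src/"
    printf '_site/%s.%s\n' "${rel%.*}" "$2"   # swap the extension
}

dst_for src/posts/hello.org html    # prints _site/posts/hello.html
```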
I have a block for:
- raw assets I just want copied
- images I would like to compress for the web
- html files I would like to generate from org-mode files via pandoc
- gmi files I would like to generate from org-mode files
- xml files I use as a cache to build the different index files
- the index.html file containing a list of my posts
- the rss.xml file containing a list of my posts
- the gemini-atom.xml file containing a list of my posts
Assets
The rules to copy assets are a good first example:
- find all assets in the src/ directory
- generate all assets from these files in the _site/ directory
- make this rule a dependency of the all rule.
SRC_ASSETS := $(shell find src -type f)
DST_ASSETS := $(patsubst src/%,_site/%,$(SRC_ASSETS))
_site/% : src/%
	@mkdir -p "$(dir $@)"
	cp "$<" "$@"
.PHONY: assets
assets: $(DST_ASSETS)
ALL += assets
OK, this looks terrible. But mainly:
- SRC_ASSETS contains the result of the find command.
- DST_ASSETS contains the files of SRC_ASSETS, with src/ replaced by
  _site/.
- We create a generic rule: for every file matching the pattern _site/%,
  look for the file src/% and, if it is newer (in our case), execute the
  following commands:
  - create the directory to put _site/% in
  - copy the file
About the line @mkdir -p "$(dir $@)":
- the @ at the start of the command simply means that we make this
  execution silent.
- $@ is replaced by the target string.
- $(dir $@) generates the folder name of $@.
For the line with cp, you just need to know that $< represents the
first dependency.
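If the automatic variables feel cryptic, it may help to see roughly what the two recipe lines amount to once make has substituted them for one file. This is only a sketch: build_one is a hypothetical name, and dirname stands in for $(dir ...), which behaves almost the same (make keeps a trailing slash).

```shell
# build_one TARGET FIRST_DEP
# Roughly what the copy recipe does for a single target.
build_one() {
    target=$1       # plays the role of $@
    first_dep=$2    # plays the role of $<
    mkdir -p "$(dirname "$target")"   # like: mkdir -p "$(dir $@)"
    cp "$first_dep" "$target"         # like: cp "$<" "$@"
}

# Hypothetical usage:
#   build_one _site/css/style.css src/css/style.css
```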
My Makefile is composed of similar blocks, where I replace the first
find command to match specific files and where I use different build
rules. An important point is that the rules must be as specific as
possible, because make uses the most specific rule in case of
ambiguity. For example, the rule _site/%: src/% matches all files in
the src/ dir. But if we want to treat CSS files with another rule, we
can write:
_site/%.css: src/%.css
	minify "$<" > "$@"
If the target is a CSS file, this rule is selected instead of the
generic one.
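What "most specific" means can be sketched in a few lines of shell. As far as I know, GNU make 3.82 and later prefers, among the matching pattern rules, the one whose stem (the part matched by %) is the shortest. The helper names stem and pick_rule are hypothetical, and the sketch ignores everything else make checks, such as whether the prerequisites exist.

```shell
# stem PATTERN TARGET
# Prints the text matched by "%" when TARGET matches PATTERN, fails otherwise.
stem() {
    prefix=${1%%\%*}    # text before the "%"
    suffix=${1#*\%}     # text after the "%"
    case $2 in
        "$prefix"*"$suffix") ;;
        *) return 1 ;;
    esac
    rest=${2#"$prefix"}
    rest=${rest%"$suffix"}
    printf '%s\n' "$rest"
}

# pick_rule TARGET PATTERN...
# Prints the matching pattern with the shortest stem.
pick_rule() {
    target=$1
    shift
    best=
    best_len=
    for pattern in "$@"; do
        s=$(stem "$pattern" "$target") || continue
        if [ -z "$best" ] || [ "${#s}" -lt "$best_len" ]; then
            best=$pattern
            best_len=${#s}
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}

pick_rule _site/css/style.css '_site/%' '_site/%.css'   # prints _site/%.css
pick_rule _site/img/logo.png  '_site/%' '_site/%.css'   # prints _site/%
```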
Prelude
I start with variable declarations:
all: site
# directory containing the source files
SRC_DIR ?= src
# directory that will contain the site files
DST_DIR ?= _site
# a directory that will contain a cache to speedup indexing
CACHE_DIR ?= .cache
# options to pass to find to prevent matching files in the src/drafts
# directory
NO_DRAFT := -not -path '$(SRC_DIR)/drafts/*'
# option to pass to find to not match org files
NO_SRC_FILE := ! -name '*.org'
CSS
Here we go; the same simple pattern for CSS files.
# CSS
SRC_CSS_FILES := $(shell find $(SRC_DIR) -type f -name '*.css')
DST_CSS_FILES := $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%,$(SRC_CSS_FILES))
$(DST_DIR)/%.css : $(SRC_DIR)/%.css
	@mkdir -p "$(dir $@)"
	minify "$<" > "$@"
.PHONY: css
css: $(DST_CSS_FILES)
ALL += css
This is very similar to the block for raw assets. The difference is
just that instead of using cp we use the minify command.
ORG → HTML
Now this one is more complex, but it still follows the same
pattern.
# ORG -> HTML
EXT ?= .org
SRC_PANDOC_FILES ?= $(shell find $(SRC_DIR) -type f -name "*$(EXT)" $(NO_DRAFT))
DST_PANDOC_FILES ?= $(patsubst %$(EXT),%.html, \
$(patsubst $(SRC_DIR)/%,$(DST_DIR)/%, \
$(SRC_PANDOC_FILES)))
PANDOC_TEMPLATE ?= templates/post.html
MK_HTML := engine/mk-html.sh
PANDOC := $(MK_HTML) $(PANDOC_TEMPLATE)
$(DST_DIR)/%.html: $(SRC_DIR)/%.org $(PANDOC_TEMPLATE) $(MK_HTML)
	@mkdir -p "$(dir $@)"
	$(PANDOC) "$<" "$@.tmp"
	minify --mime text/html "$@.tmp" > "$@"
	@rm "$@.tmp"
.PHONY: html
html: $(DST_PANDOC_FILES)
ALL += html
To construct DST_PANDOC_FILES, this time we also need to change the
file extension from org to html. We need to provide a template that
will be passed to pandoc. And of course, since we want to regenerate
all HTML files whenever the template changes, we declare the template
as a dependency. Importantly, it must not come first, because we use
$<, which expands to the first dependency.
I also use a short script instead of calling pandoc directly. It makes
it easier to handle the toc using the metadata in the file. And if
someday I want to put the template in the metadata, this will be the
right place to do that.
The mk-html.sh script is quite straightforward:
#!/usr/bin/env bash
set -eu
# put me at the top level of my project (like Makefile)
cd "$(git rev-parse --show-toplevel)" || exit 1
template="$1"
orgfile="$2"
htmlfile="$3"
# check if there is the #+OPTIONS: toc:t
tocoption=""
if grep -ie '^#+options:' "$orgfile" | grep -q 'toc:t'; then
    tocoption="--toc"
fi
set -x
# $tocoption is left unquoted on purpose: when empty it must expand to nothing
pandoc $tocoption \
    --template="$template" \
    --mathml \
    --from org \
    --to html5 \
    --standalone \
    "$orgfile" \
    --output "$htmlfile"
Once generated, I also minify the HTML file. And that's it. The
important part is that now, if I change my script, the template, or the
source file, make regenerates the files that depend on it.
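The toc detection is the only real logic in that script, so here it is isolated as a small function that can be tried on its own. This is a sketch under the same convention as above (an #+OPTIONS: line containing toc:t); the name toc_flag is hypothetical.

```shell
# toc_flag ORG_FILE
# Prints "--toc" when the file has an "#+OPTIONS:" line containing "toc:t",
# and prints nothing otherwise.
toc_flag() {
    if grep -i '^#+options:' "$1" | grep -q 'toc:t'; then
        printf '%s\n' '--toc'
    fi
}

# Hypothetical usage:
#   pandoc $(toc_flag post.org) --from org --to html5 post.org
```

The match on #+options: is case-insensitive, so both #+OPTIONS: and #+options: are recognized.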
Indexes
We often need indexes to build a website, typically to list the
latest articles or to build the RSS file. For the sake of simplicity, I
decided to build my index as a set of XML files. Of course, this could
be optimized, by using SQLite for example. But it is already
really fast.
For every generated HTML file I generate a clean XML file with
hxclean. Once cleaned, it is easy to access a specific node in these
XML files.
# INDEXES
SRC_POSTS_DIR ?= $(SRC_DIR)/posts
DST_POSTS_DIR ?= $(DST_DIR)/posts
SRC_POSTS_FILES ?= $(shell find $(SRC_POSTS_DIR) -type f -name "*$(EXT)")
RSS_CACHE_DIR ?= $(CACHE_DIR)/rss
DST_XML_FILES ?= $(patsubst %.org,%.xml, \
$(patsubst $(SRC_POSTS_DIR)/%,$(RSS_CACHE_DIR)/%, \
$(SRC_POSTS_FILES)))
$(RSS_CACHE_DIR)/%.xml: $(DST_POSTS_DIR)/%.html
	@mkdir -p "$(dir $@)"
	hxclean "$<" > "$@"
.PHONY: indexcache
indexcache: $(DST_XML_FILES)
ALL += indexcache
This rule generates, for every file matching _site/posts/*.html, a
corresponding xml file (hxclean takes an HTML file and tries its best
to make XML out of it).
HTML Index
Now we just want to generate the main index.html page at the root of
the site. This page should list all articles by date in reverse order.
The first step is to take advantage of the cache index. For every XML
file generated before, I generate the small HTML block I want for that
entry. For this I use a script, mk-index-entry.sh. It uses hxselect to
retrieve the date and the title from the cached XML files, then
generates a small file containing just the date and the link.
Here is the block in the Makefile:
DST_INDEX_FILES ?= $(patsubst %.xml,%.index, $(DST_XML_FILES))
MK_INDEX_ENTRY := ./engine/mk-index-entry.sh
INDEX_CACHE_DIR ?= $(CACHE_DIR)/rss
$(INDEX_CACHE_DIR)/%.index: $(INDEX_CACHE_DIR)/%.xml $(MK_INDEX_ENTRY)
	@mkdir -p $(INDEX_CACHE_DIR)
	$(MK_INDEX_ENTRY) "$<" "$@"
It means: for every .xml file, generate a .index file with
mk-index-entry.sh.
#!/usr/bin/env zsh
# prelude
cd "$(git rev-parse --show-toplevel)" || exit 1
xfic="$1"
dst="$2"
indexdir=".cache/rss"
# HTML Accessors (similar to CSS accessors)
dateaccessor='.yyydate'
# title and keyword shouldn't be changed
titleaccessor='title'
finddate(){ < $1 hxselect -c $dateaccessor | sed 's/\[//g;s/\]//g;s/ .*$//' }
findtitle(){ < $1 hxselect -c $titleaccessor }
autoload -U colors && colors
blogfile="$(echo "$xfic"|sed 's#.xml$#.html#;s#^'$indexdir'/#posts/#')"
printf "%-30s" $blogfile
d=$(finddate $xfic)
echo -n " [$d]"
# note: formatdate and findkeywords are not defined in this excerpt
rssdate=$(formatdate $d)
title=$(findtitle $xfic)
keywords=( $(findkeywords $xfic) )
printf ": %-55s" "$title ($keywords)"
# overwrite (rather than append) so a rebuild does not duplicate the entry
{ printf "\\n<li>"
  printf "\\n<span class=\"pubDate\">%s</span>" "$d"
  printf "\\n<a href=\"%s\">%s</a>" "${blogfile}" "$title"
  printf "\\n</li>\\n\\n"
} > ${dst}
echo " [${fg[green]}OK${reset_color}]"
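The two sed expressions in this script are easier to trust once you see them on sample input. Below is a small sketch that isolates them. The function names clean_date and post_url are hypothetical, the sample date and path are made up, and I wrote the bracket removal as a single bracket expression, which has the same effect as the two substitutions above.

```shell
# clean_date: strip the org-mode brackets and keep only the date part.
clean_date() {
    sed 's/[][]//g;s/ .*$//'
}

# post_url CACHE_FILE: map a cached XML file back to the path of the post.
post_url() {
    printf '%s\n' "$1" | sed 's#\.xml$#.html#;s#^\.cache/rss/#posts/#'
}

echo '[2021-05-01 Sat 10:00]' | clean_date    # prints 2021-05-01
post_url .cache/rss/my-post/index.xml         # prints posts/my-post/index.html
```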
Then I use these intermediate files to generate a single bigger index
file.
HTML_INDEX := $(DST_DIR)/index.html
MKINDEX := engine/mk-index.sh
INDEX_TEMPLATE ?= templates/index.html
$(HTML_INDEX): $(DST_INDEX_FILES) $(MKINDEX) $(INDEX_TEMPLATE)
	@mkdir -p $(DST_DIR)
	$(MKINDEX)
.PHONY: index
index: $(HTML_INDEX)
ALL += index
This script is a big one, but it is not that complex. For every file,
I generate a new file named DATE-dirname. I sort them in reverse order
and put their content in the middle of an HTML file.
Important note: this file is updated only if the index changes.
The first part of the script reads the creation date from each file's
metadata and creates a copy whose name contains that date; this will
be helpful later.
#!/usr/bin/env zsh
autoload -U colors && colors
cd "$(git rev-parse --show-toplevel)" || exit 1
# Directory
webdir="_site"
indexfile="$webdir/index.html"
indexdir=".cache/rss"
tmpdir=$(mktemp -d)
echo "Publishing"
dateaccessor='.pubDate'
finddate(){ < $1 hxselect -c $dateaccessor }
# generate files with <DATE>-<FILENAME>.index
for fic in $indexdir/**/*.index; do
    d=$(finddate $fic)
    echo "${${fic:h}:t} [$d]"
    cp $fic $tmpdir/$d-${${fic:h}:t}.index
done
Then I use these files to generate a file that will contain the body
of the HTML.
# for every post in reverse order
# generate the body (there is some logic to group by year)
previousyear=""
for fic in $(ls $tmpdir/*.index | sort -r); do
    d=$(finddate $fic)
    year=$( echo "$d" | perl -pe 's#(\d{4})-.*#$1#')
    if (( year != previousyear )); then
        if (( previousyear > 0 )); then
            echo "</ul>" >> $tmpdir/index
        fi
        previousyear=$year
        echo "<h3 name=\"${year}\" >${year}</h3><ul>" >> $tmpdir/index
    fi
    cat $fic >> $tmpdir/index
done
echo "</ul>" >> $tmpdir/index
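The grouping logic can be tried without any of the surrounding machinery. Here is a minimal sketch that reads lines of the form "YYYY-MM-DD title" from standard input, already sorted newest first, and emits one heading and one list per year. The input format and the function name group_by_year are assumptions made for the example; the real script reads the entries from the .index files instead.

```shell
# group_by_year: read "YYYY-MM-DD title" lines (newest first) on stdin and
# print one <h3> heading plus one <ul> list per year.
group_by_year() {
    previous_year=
    while read -r date title; do
        year=${date%%-*}
        if [ "$year" != "$previous_year" ]; then
            # close the list of the previous year, if there was one
            if [ -n "$previous_year" ]; then
                printf '%s\n' '</ul>'
            fi
            printf '%s\n' "<h3>$year</h3><ul>"
            previous_year=$year
        fi
        printf '%s\n' "<li>$date $title</li>"
    done
    # close the last list, unless the input was empty
    if [ -n "$previous_year" ]; then
        printf '%s\n' '</ul>'
    fi
}

printf '%s\n' '2021-05-01 B' '2021-01-10 A' '2020-12-24 Z' | group_by_year
```

With this input, the two 2021 entries end up under one heading and the 2020 entry under another.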
And finally, I render the HTML using a template within a shell
script:
title="Y"
description="Most recent articles"
author="Yann Esposito"
body=$(< $tmpdir/index)
date=$(LC_TIME=en_US date +'%Y-%m-%d')
# A neat trick to use pandoc template within a shell script
# the pandoc templates use $x$ format, we replace it by just $x
# to be used with envsubst
template=$(< templates/index.html | \
    sed 's/\$\(header-includes\|table-of-content\)\$//' | \
    sed 's/\$if.*\$//' | \
    perl -pe 's#(\$[^\$]*)\$#$1#g' )
{
    export title
    export author
    export description
    export date
    export body
    echo ${template} | envsubst
} > "$indexfile"
rm -rf $tmpdir
echo "* HTML INDEX [done]"
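The placeholder conversion is the part of this trick that is worth testing in isolation. Here is a small sketch using only sed; the function name pandoc_to_envsubst is hypothetical. Unlike the perl line above, it emits the braced form ${x}, which envsubst also understands and which stays unambiguous when a placeholder is directly followed by a letter.

```shell
# pandoc_to_envsubst: turn pandoc's $name$ placeholders into ${name} on stdin.
# Only simple names (letters, digits, underscore) are converted.
pandoc_to_envsubst() {
    sed 's/\$\([A-Za-z_][A-Za-z0-9_]*\)\$/${\1}/g'
}

printf '%s\n' '<h1>$title$</h1>' | pandoc_to_envsubst   # prints <h1>${title}</h1>
```

Piping the result through envsubst, with title exported in the environment, then yields the final HTML.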
RSS
My RSS generation is similar to the system I use to generate the
index file. I just slightly improved the rules.
The Makefile blocks look like:
# RSS
DST_RSS_FILES ?= $(patsubst %.xml,%.rss, $(DST_XML_FILES))
MK_RSS_ENTRY := ./engine/mk-rss-entry.sh
$(RSS_CACHE_DIR)/%.rss: $(RSS_CACHE_DIR)/%.xml $(MK_RSS_ENTRY)
	@mkdir -p $(RSS_CACHE_DIR)
	$(MK_RSS_ENTRY) "$<" "$@"
RSS := $(DST_DIR)/rss.xml
MKRSS := engine/mkrss.sh
$(RSS): $(DST_RSS_FILES) $(MKRSS)
	$(MKRSS)
.PHONY: rss
rss: $(RSS)
ALL += rss
Gemini
I wrote a minimal script to transform my org files to gemini files. I
also need to generate an index and an atom file for gemini:
# ORG -> GEMINI
EXT := .org
SRC_GMI_FILES ?= $(shell find $(SRC_DIR) -type f -name "*$(EXT)" $(NO_DRAFT))
DST_GMI_FILES ?= $(patsubst %$(EXT),%.gmi, \
$(patsubst $(SRC_DIR)/%,$(DST_DIR)/%, \
$(SRC_GMI_FILES)))
GMI := engine/org2gemini.sh
$(DST_DIR)/%.gmi: $(SRC_DIR)/%.org $(GMI) engine/org2gemini_step1.sh
	@mkdir -p $(dir $@)
	$(GMI) "$<" "$@"
ALL += $(DST_GMI_FILES)
.PHONY: gmi
gmi: $(DST_GMI_FILES)
# GEMINI INDEX
GMI_INDEX := $(DST_DIR)/index.gmi
MK_GMI_INDEX := engine/mk-gemini-index.sh
$(GMI_INDEX): $(DST_GMI_FILES) $(MK_GMI_INDEX)
	@mkdir -p $(DST_DIR)
	$(MK_GMI_INDEX)
ALL += $(GMI_INDEX)
.PHONY: gmi-index
gmi-index: $(GMI_INDEX)
# GEMINI ATOM
GEM_ATOM := $(DST_DIR)/gem-atom.xml
MK_GEMINI_ATOM := engine/mk-gemini-atom.sh
$(GEM_ATOM): $(DST_GMI_FILES) $(MK_GEMINI_ATOM)
	$(MK_GEMINI_ATOM)
ALL += $(GEM_ATOM)
.PHONY: gmi-atom
gmi-atom: $(GEM_ATOM)
.PHONY: gemini
gemini: $(DST_GMI_FILES) $(GMI_INDEX) $(GEM_ATOM)
Images
For images, I try to compress them all with imagemagick.
# Images
SRC_IMG_FILES ?= $(shell find $(SRC_DIR) -type f \( -name "*.jpg" -or -name "*.jpeg" -or -name "*.gif" -or -name "*.png" \))
DST_IMG_FILES ?= $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%, $(SRC_IMG_FILES))
$(DST_DIR)/%.jpg: $(SRC_DIR)/%.jpg
	@mkdir -p $(dir $@)
	convert "$<" -quality 50 -resize 800x800\> "$@"
$(DST_DIR)/%.jpg: $(SRC_DIR)/%.jpeg
	@mkdir -p $(dir $@)
	convert "$<" -quality 50 -resize 800x800\> "$@"
$(DST_DIR)/%.gif: $(SRC_DIR)/%.gif
	@mkdir -p $(dir $@)
	convert "$<" -quality 50 -resize 800x800\> "$@"
$(DST_DIR)/%.png: $(SRC_DIR)/%.png
	@mkdir -p $(dir $@)
	convert "$<" -quality 50 -resize 800x800\> "$@"
.PHONY: img
img: $(DST_IMG_FILES)
ALL += $(DST_IMG_FILES)
Deploy
A nice bonus is that I also deploy my website using make.
# DEPLOY
.PHONY: site
site: $(ALL)
.PHONY: deploy
deploy: $(ALL)
	engine/sync.sh
.PHONY: clean
clean:
	-[ ! -z "$(DST_DIR)" ] && rm -rf $(DST_DIR)/*
	-[ ! -z "$(CACHE_DIR)" ] && rm -rf $(CACHE_DIR)/*