Lab #11 (May 4 – May 8)
Before class
- Learn about Pandoc.
Topic
- Build systems.
Exercises
make
subdirectory.
Before diving into build systems, we will play a little bit with Pandoc. Pandoc is a universal document converter that can convert between various formats, including HTML, Markdown, Docbook, LaTeX, Word, LibreOffice or PDF.
Please install it on your machine first (the package name is pandoc).
Start by running
pandoc index.md
As you can see, the output is the Markdown file converted to HTML, though without the HTML header.
If you add --standalone, it generates a full HTML page. Let’s try it (both invocations have the same end result).
pandoc --standalone index.md >index.html
pandoc --standalone -o index.html index.md
Try opening index.html in your web browser too.
As mentioned, Pandoc can create OpenDocument files too (the format used mostly by the OpenOffice/LibreOffice suites). And it is just as easy.
pandoc -o index.odt index.md
We omitted --standalone here because Pandoc always produces a complete document when writing ODT.
Install OpenOffice/LibreOffice to check what the output looks like.
Convert rules.md to HTML and ODT.
As a side note, did you know that LibreOffice can be used from the command line too? For example, we can ask LibreOffice to convert a document to PDF via the following command.
soffice --headless --convert-to pdf rules.odt
The --headless option prevents opening any GUI and --convert-to is self-explanatory.
So, with three commands we are able to create an HTML page and a PDF from a single source. Quite a neat trick if you need to submit printed documentation and you only have HTML, for example.
Let’s get back to Pandoc. Without any other options, it uses its own default template for the final HTML. But we can change this template too. Yes, this is similar to the SSG example you already know but we are attacking the problem from a different angle here. Note that both approaches can be combined.
Open template.html.
It looks very much like the Jinja one you have seen in Task 04 but instead of {{ there are things in dollars.
If you have not yet tackled Task 04, the template is normal HTML with placeholders enclosed in dollar signs.
So when the template is expanded (or rendered), the parts between the dollar signs are replaced with actual content.
Let’s try it with Pandoc.
pandoc --template template.html rules.md >out/rules.html
Check what the output looks like. Notice that we create the result in a separate directory to avoid cluttering the current one.
Convert index.md into out/ as well.
Copy main.css to out/ too.
One more side-step :-)
The web server that is running on port 8080 on unixadmin.ms.mff.cuni.cz supports personal web pages too.
All you need to do is to create a public_html directory in your home directory and then you can access it via the /~LOGIN URL.
To try it, copy the generated files from out to $HOME/public_html on unixadmin.ms.mff.cuni.cz.
Note that you can easily use Midnight Commander for that (use the Shell link menu; if you have set up the linux-intro alias in your .ssh/config, it works here as well).
Also create the tunnel via
ssh -L 8090:localhost:8080 -N LOGIN@unixadmin.ms.mff.cuni.cz
(but you already know this command, right?).
Open http://localhost:8090/~LOGIN in your web browser.
It will show a Forbidden page.
Why?
Because the webpage is now served over HTTP (wrapped inside SSH but that is irrelevant now) by the web server.
This web server is running under the user apache and this user cannot access your files. That is fine: you do not want your files to be readable by everybody.
But we can allow the web server to access the public_html directory.
So we can simply run
setfacl -m u:apache:x public_html
to fix this (recall what the x bit means on directories and why it is sufficient here).
But this still will not work as our $HOME is not accessible by Apache either (the operating system checks the permissions for each directory on the absolute path).
Run the setfacl on your $HOME too.
Refresh your browser and you should see your index.html displayed.
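To see which directories are involved in that per-directory permission check, a small helper can list every directory on a path. This is only a sketch: the function name list_path_dirs and the path /home/LOGIN/public_html are made up for illustration, and your real home directory may live elsewhere.

```shell
# list_path_dirs PATH: print every directory on PATH, from the top-most
# one down to PATH itself, one per line (hypothetical helper).
list_path_dirs() {
    path=$1
    result=
    while [ -n "$path" ] && [ "$path" != "/" ] && [ "$path" != "." ]; do
        # Prepend, so that the top-most directory ends up first.
        result="$path
$result"
        path=$(dirname "$path")
    done
    printf '%s' "$result"
}

# Example path only; substitute your own home directory.
list_path_dirs /home/LOGIN/public_html
```

Piping the output into a loop that runs ls -ld or getfacl on each line shows the permissions the web server has to pass on its way to your files.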
You probably noticed that there is a link to teams.html but no teams.md.
That is because we generate that page from a CSV file, teams.csv.
Look into bin/make_teams_page.sh and generate out/teams.html by yourself.
To generate the whole website (let’s call our 3 small pages a website) we need to execute several commands.
As these commands need to be run after any change to the input files (either *.md or teams.csv), let’s put them into a script to simplify things for us.
By yourself, create a script that generates the whole website into the out directory.
The script is nice but it overwrites all files even if there was no change. In our small example, it is no big deal. You have a fast computer, after all.
But in a bigger project where we, for example, compile thousands of files (e.g. look at the source tree of the Linux kernel, Firefox or LibreOffice), it is a big deal.
If an input file was not changed (e.g. we modified only rules.md), we do not need to regenerate its output (e.g. we do not need to re-create index.html).
Let’s extend our script a little bit.
Instead of
pandoc --template template.html index.md >out/index.html
we use (see man test if you have never seen -nt)
[ "index.md" -nt "out/index.html" ] \
&& pandoc --template template.html index.md >out/index.html
We can do that for every command to speed up web generation.
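To give an idea of what that would involve once a page has more than one input, here is a sketch of a helper that performs the -nt check for several dependencies. The name needs_rebuild is an assumption made for this illustration; it is not part of the lab files.

```shell
# needs_rebuild TARGET DEP...: succeed (exit 0) when TARGET is missing
# or when at least one DEP is newer than TARGET (hypothetical helper).
needs_rebuild() {
    target=$1
    shift
    # A missing target always has to be built.
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        # -nt is not in POSIX but bash, dash and most shells support it.
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Usage: regenerate the page only when one of its inputs has changed.
if needs_rebuild out/index.html index.md template.html; then
    echo "out/index.html is out of date"
fi
```

In a real script the echo would be replaced by the pandoc call, and the same if block would be repeated for every generated file.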
But.
That is a lot of work. And the time saved would probably all be wasted on rewriting our script. Not to mention that the result looks horrible. And is expensive to maintain.
Luckily, there is a better way.
Open Makefile now.
This file is a control file for a build system named make that does exactly what we tried to imitate in the previous example.
It contains so-called dependencies and actions to execute when a target is out of date (i.e. when a dependency is newer than the target).
We will start with the following fragment:
out/rules.html: rules.md template.html
	pandoc --template template.html rules.md >out/rules.html
Important: indentation in Makefiles has to be done with tabs, so make sure your editor does not expand tabs to spaces.
It is also a common issue when copying fragments from a web browser.
(Usually, your editor will recognize that Makefile is a special filename and switch to a tabs-only policy by itself.)
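If make complains about a Makefile you pasted from somewhere, a quick heuristic is to look for lines that start with a space. The sketch below uses a made-up helper name, find_space_indented, and a throwaway demo file; note that it may also report harmless lines, such as space-indented continuation lines of a variable definition.

```shell
# find_space_indented FILE: print, with line numbers, every line of FILE
# that starts with a space (hypothetical helper). In a Makefile these
# are often recipe lines whose tab was converted to spaces.
find_space_indented() {
    grep -n '^ ' "$1"
}

# Demo on a temporary file: line 2 uses a tab, line 4 uses four spaces.
demo=$(mktemp)
printf 'good: dep\n\tcommand with tab\nbad: dep\n    command with spaces\n' >"$demo"
find_space_indented "$demo"
rm -f "$demo"
```

On the demo file, only the fourth line is reported.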
The fragment has three parts.
Before the colon is the name of the target.
That is usually a filename and describes what we want to build.
Here it is out/rules.html.
The second part is after the colon till the end of the line.
It lists dependencies.
make looks at the dependencies and if they are newer than the target, it means that the target is out of date and needs to be rebuilt.
The third part consists of the following lines, which have to be indented by a tab and contain the commands that have to be executed for the target to be built.
Here, it is the call to pandoc.
Together, we can read it as a rule that describes when it is needed to build a target and how.
The rest of the Makefile is similar.
There are rules for other files and also several special rules.
The special rules are all, clean and .PHONY.
all is a traditional name for the very first rule in the file.
Note that it lists all generated files as its dependencies.
The first rule is also called the default rule and is executed by default. As you have probably guessed, by default we want to build everything (more precisely: update everything that needs to be updated).
clean is a special rule that has no dependencies but instead has only commands that remove everything in out.
It is a useful service-style rule for removing generated files (e.g. to start with a fresh state, save disk space etc.).
As make expects that a target name is a filename, we need to tell it that all and clean are actually not filenames (i.e. we are not creating a file named all as one could expect) via the special target .PHONY.
This weird approach is basically a design flaw of make, which was originally created as a one-shot utility and somehow survived for more than 40 years.
Note that despite its age, make is still used even in new projects and is also often used as a backend. That is, you have something smarter that generates a Makefile and lets make do the actual work.
So far so good. But we have not yet seen how to run a Makefile.
That is actually simple: install the make package first and run
make
Depending on your other changes, perhaps nothing was done or some commands were executed.
Let’s run
make clean
to clean all files in out/.
As you can see, make prints each command before executing it.
Run make again.
Now make a change to index.md and run make again.
What is the difference?
On your own, add rules for creating out/teams.html.
Do not forget to add removal of teams.md to the clean target.
Add a target upload that copies the generated web to your public_html directory.
That is something where Midnight Commander is not the right choice but we can use scp (or sftp).
scp is like cp (i.e. it copies files) but the s was taken from SSH ;-).
Thus a simple
scp out/index.html LOGIN@unixadmin.ms.mff.cuni.cz:public_html/
copies out/index.html to your $HOME/public_html.
With a proper alias in .ssh/config, you can even use the following shorter form:
scp out/index.html linux-intro:public_html/
With -r it copies a directory recursively.
Add a link to rules.pdf to the rules.md page and generate rules.pdf automatically in the Makefile too.
The Makefile starts to have too much repeated code.
But make can help you with that too.
Let’s remove all the rules for generating out/*.html from *.md and replace them with:
out/%.html: %.md template.html
	pandoc --template template.html -o $@ $<
That is a pattern rule that captures the idea that HTML is generated from Markdown. Here, the percent sign represents so called stem – the variable part of the pattern.
In the command part, we use make variables (they start with a dollar as in shell) $@ and $<.
$@ is the actual target and $< is the first dependency.
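If the stem and the two automatic variables feel abstract, the following shell sketch mimics how the pattern rule is filled in for concrete files. It is only an illustration: expand_rule is a made-up name, it merely prints the command instead of running it, and it starts from the source file whereas make starts from the target it was asked to build.

```shell
# expand_rule SOURCE.md...: for each Markdown source, print the command
# that the pattern rule  out/%.html: %.md template.html  would run.
expand_rule() {
    for first_dep in "$@"; do           # what make calls $<
        stem=${first_dep%.md}           # the part matched by %
        target="out/$stem.html"         # what make calls $@
        echo "pandoc --template template.html -o $target $first_dep"
    done
}

expand_rule index.md rules.md
```

For index.md the stem is index, so the printed command writes to out/index.html and reads index.md.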
Run make clean && make to verify that even with pattern rules, the web is still generated.
Also generate contact.html. Do not forget to add it to the menu.
Run make after the changes. What was rebuilt?
Generating the teams web page is nice but we pollute the current directory with a temporary file.
Let’s do a small change and use tmp/ for that:
...
out/%.html: tmp/%.md template.html
	pandoc --template template.html -o $@ $<
tmp/teams.md: teams.csv bin/make_teams_page.sh
	bin/make_teams_page.sh <$< >$@
...
clean:
	rm -f out/* tmp/*
Add a tasks page with only a list of PDFs for download.
Add it to the menu too.
The last thing we will do with make is to improve the readability of the Makefile a little bit with variables (actually, they cannot be changed and constants would be a better name).
Let’s start with a simple change:
PAGES = \
out/contact.html \
out/index.html \
out/rules.html \
out/teams.html
all: $(PAGES) out/main.css ...
...
Not much but we at least keep the list of web pages together and the all: line is a bit shorter.
Note that \ at the end of a line denotes that the line continues.
Why do we have each page on a separate line?
We will finish the simplification with another bit that is often useful when dealing with more complex path names.
Note that there are several variants of make: so far, our Makefile is fully standard compliant.
The last addition will work in GNU make only (but that is the default on Linux so there should not be any problem).
We will change the Makefile as follows:
PAGES = \
contact \
index \
rules \
teams
PAGES_TMP := $(addsuffix .html, $(PAGES))
PAGES_HTML := $(addprefix out/, $(PAGES_TMP))
We keep only the basename of each page and we compute the output path. Note that := is used in the computation and that $(addsuffix ...) and $(addprefix ...) are function calls.
Arguments are separated by commas, and the functions operate as if $(PAGES) was an array.
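For comparison, here is roughly what the two function calls compute, written as a shell sketch. The helper name add_affixes is made up for this illustration and combines both steps in one pass; GNU make itself needs no such helper.

```shell
PAGES="contact index rules teams"

# add_affixes PREFIX SUFFIX WORD...: print each WORD with PREFIX prepended
# and SUFFIX appended, separated by single spaces (hypothetical helper).
add_affixes() {
    prefix=$1
    suffix=$2
    shift 2
    joined=
    for word in "$@"; do
        # ${joined:+ } inserts a separating space before all but the first word.
        joined="$joined${joined:+ }$prefix$word$suffix"
    done
    echo "$joined"
}

# $PAGES is intentionally unquoted so that the shell splits it into words.
PAGES_HTML=$(add_affixes out/ .html $PAGES)
echo "$PAGES_HTML"
```

Each word of the list is processed on its own, which is exactly the array-like behaviour described above.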