This lab focuses on build systems – tools that streamline the process from source code to a publishable artefact. This includes creating an executable binary from a set of C sources, generating HTML from Markdown sources, or creating thumbnails in different resolutions from the same photo.
Sidenote: programs xargs and find (and parallel)
The following two programs often come in handy, but we were unable to squeeze them into the previous labs about shell scripting. Hence they appear here, as the build-system topic is quite short and simple.
xargs
xargs in its simplest form reads standard input and converts it to program arguments for a user-specified program.
Assume we have the following files in a directory:
2022-04-10.txt 2022-04-16.txt 2022-04-22.txt 2022-04-28.txt
2022-04-11.txt 2022-04-17.txt 2022-04-23.txt 2022-04-29.txt
2022-04-12.txt 2022-04-18.txt 2022-04-24.txt 2022-04-30.txt
2022-04-13.txt 2022-04-19.txt 2022-04-25.txt
2022-04-14.txt 2022-04-20.txt 2022-04-26.txt
2022-04-15.txt 2022-04-21.txt 2022-04-27.txt
The following script removes files that are older than 20 days:
cutoff_date="$( date -d "20 days ago" '+%Y%m%d' )"
for filename in 2022-[01][0-9]-[0-3][0-9].txt; do
    date_num="$( basename "$filename" .txt | tr -d '-' )"
    if [ "$date_num" -lt "$cutoff_date" ]; then
        echo rm "$filename"
    fi
done
This means that the program rm would be called several times, always removing just one file. Note that we use echo rm there to not actually remove the files but to demonstrate the operation.
The overhead of starting a new process could become a serious bottleneck for larger scripts (think about thousands of files, for example). It would be much better if we called rm just once, giving it the whole list of files to remove (i.e., as multiple arguments).
xargs is the solution here. Let’s modify the program a little bit:
cutoff_date="$( date -d "20 days ago" '+%Y%m%d' )"
for filename in 2022-[01][0-9]-[0-3][0-9].txt; do
    date_num="$( basename "$filename" .txt | tr -d '-' )"
    if [ "$date_num" -lt "$cutoff_date" ]; then
        echo "$filename"
    fi
done | xargs echo rm
Instead of removing each file right away, we just print its name and pipe the whole loop into xargs, whose own (non-option) arguments specify the program to be launched. Instead of many lines with rm ... we will see just one long line with a single invocation of rm.
Of course, tricky filenames can still cause issues, as xargs assumes that arguments are delimited by whitespace. (Note that in the example above we were safe, as the filenames were reasonable.) That can be changed with --delimiter.
If you are piping input to xargs from your own program, consider delimiting the items with the zero byte (i.e., the C string terminator, \0). That is the safest option, as this character cannot appear anywhere inside any argument. And tell xargs about it via -0 or --null.
Note that xargs is smart enough to realize when the command line would be too long and splits it automatically (see the manual for details).
It is also good to remember that xargs can execute the command in parallel (i.e., split the stdin into multiple chunks and call the program multiple times with different chunks) via -P. If your shell scripts are getting slow but you have plenty of CPU power, this may speed things up quite a lot.
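A hedged sketch of that (the *.log pattern and the batch size are assumptions, not part of the example above): compress many files with up to four gzip processes running at once.

# at most 10 files per gzip invocation, up to 4 invocations in parallel
printf '%s\0' *.log | xargs -0 -n 10 -P 4 gzip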
parallel
This program can be used to execute multiple commands in parallel, hence speeding up the execution. parallel behaves almost exactly as xargs but runs the individual jobs (commands) in parallel.
Please refer to parallel_tutorial(1) (yes, that is a man page) and to parallel(1) for more details.
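For illustration only (the gzip job and the *.log pattern are again assumptions), a similar compression job could look like this with parallel:

# --null/-0 splits the input on \0; {} is replaced by each input item
printf '%s\0' *.log | parallel -0 gzip {}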
find
While ls(1) and wild-card expansion are powerful, sometimes we need to select files using more sophisticated criteria. This is where the find(1) program comes in useful.
Without any arguments, it lists all files in the current directory, including files in nested directories. Do not run it on the root directory (/) unless you know what you are doing (and definitely not on the shared linux.ms.mff.cuni.cz machine).
With the -name parameter you can limit the search to files matching a given wildcard pattern. The following command finds all alpha.txt files in the current directory and in any subdirectory (regardless of depth).
find -name alpha.txt
Why would the following command for finding all *.txt files not work?
find -name *.txt
find has many options – we will not duplicate its manpage here but mention those that are worth remembering.
-delete immediately deletes the found files. Very useful and very dangerous.
-exec runs a given program on every found file. You have to use {} to specify the found filename and terminate the command with ; (since ; terminates commands in the shell too, you will need to escape it).
find -name '*.md' -exec wc -l {} \;
Note that for each found file, a new invocation of wc happens. This can be altered by changing the command terminator (\;) to +: with \; the command runs once per file, while with + the filenames are batched into as few invocations as possible (similar to xargs). See the difference between the invocations of the following two commands:
find -name '*.md' -exec echo {} \;
find -name '*.md' -exec echo {} +
Caveats
By default, find prints one filename per line. However, a filename can even contain the newline character (!) and thus the following idiom is not 100% safe.
find -options-for-find | while read filename; do
    do_some_complicated_things_with "$filename"
done
If you want to be really safe, use -print0 and IFS= read -r -d $'\0' filename, as that uses the only safe delimiter – \0 (recall what you have heard about C strings – and how they are terminated – in your Arduino course). Alternatively, you can pipe the output of find ... -print0 to xargs --null.
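Put together, a minimal sketch of the safe variant could look as follows (the '*.txt' pattern is an assumption and do_some_complicated_things_with is a placeholder, as above):

# -print0 emits \0-terminated names; read -d $'\0' splits on the same delimiter
find . -name '*.txt' -print0 | while IFS= read -r -d $'\0' filename; do
    do_some_complicated_things_with "$filename"
done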
However, if you are working with your own files or the pattern is safe, the simple read loop shown earlier is fine (just do not forget that directories are files too and they can contain \n in their names as well).
Note that the shell allows you to export a function and call back into it from inside xargs.
#!/bin/bash
my_function() {
    echo ""
    echo "\$0 = $0"
    echo "\$@ =" "$@"
}
export -f my_function
find . -print0 | xargs -0 -n 1 bash -c 'my_function "$@"' arg_zero arg_one
make
Move into the 13/make directory first, please. The files in this directory are virtually the same, but there is one extra file: Makefile.
Notice that Makefile is written with a capital M to be easily distinguishable (ls in a non-localized setup sorts uppercase letters first).
This file is a control file for a build system called make that does exactly what we tried to imitate in the previous example. It contains a sequence of rules for building files.
We will get to the exact syntax of the rules soon, but let us play with them first. Execute the following command:
make
You will see the following output (if you have executed some of the commands manually, the output may differ):
pandoc --template template.html index.md >index.html
pandoc --template template.html rules.md >rules.html
make prints the commands it executes and runs them. It has built the website for us: notice that the HTML files were generated.
Execute make again.
make: Nothing to be done for 'all'.
As you can see, make was smart enough to recognize that since no file was changed, there is no need to run anything.
Update index.md (touch index.md would work too) and run make again.
Notice how index.html was rebuilt while rules.html remained untouched.
pandoc --template template.html index.md >index.html
This is called an incremental build (we build only what was needed instead of building everything from scratch).
As we mentioned above, this is not very interesting in our tiny example. However, once there are thousands of input files, the difference is enormous.
It is also possible to execute make index.html to ask for rebuilding just index.html. Again, the build is incremental.
If you wish to force a rebuild, execute make with -B. Often, this is called an unconditional build.
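For reference, the two invocations mentioned above:

make index.html    # incremental build of a single target
make -B            # unconditionally rebuild the default target and everything it depends on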
In other words, make allows us to capture the simple individual commands needed for a project build (no matter if we are compiling and linking C programs or generating a web site) into a coherent script. It takes care of dependencies and executes only commands which are really needed.
Makefile explained
Makefile is a control file for the build system named make. In essence, it is a domain-specific language to simplify setting up the script with the [ ".." -nt ".." ] constructs we mentioned above.
Important:
Unlike typical programming languages, make makes a difference between tabs and spaces. All indentation in the Makefile must be done using tabs. You have to make sure that your editor does not expand tabs to spaces. It is also a common issue when copying fragments from a web browser. (Usually, your editor will recognize that Makefile is a special file name and switch to a tabs-only policy by itself.) If you use spaces instead, you will typically get an error like Makefile:LINE_NUMBER: *** missing separator. Stop.
The Makefile contains a sequence of rules. A rule looks like this:
rules.html: rules.md template.html
	pandoc --template template.html rules.md >rules.html
The name before the colon is the target of the rule. That is usually a file name that we want to build. Here, it is rules.html.
The rest of the first line is the list of dependencies – files from which the target is built. In our example, the dependencies are rules.md and template.html.
The third part consists of the following lines, which have to be indented with a tab. They contain the commands that have to be executed for the target to be built. Here, it is the call to pandoc.
make runs the commands if the target is out of date. That is, either the target file is missing, or one or more dependencies are newer than the target.
The rest of the Makefile is similar. There are rules for other files and also several special rules.
Special rules
The special rules are all, clean, and .PHONY. They do not specify files to be built, but rather special actions.
all is a traditional name for the very first rule in the file. It is called a default rule and it is built if you run make with no arguments. It usually has no commands and it depends on all files which should be built by default.
clean is a special rule that has only commands, but no dependencies. Its purpose is to remove all generated files if you want to clean up your work space. Typically, clean removes all files that are not versioned (i.e., not under Git control). This can be considered a misuse of make, but one with a long tradition.
From the point of view of make, the targets all and clean are still treated as file names. If you create a file called clean, the special rule will stop working, because the target will be considered up to date (it exists and no dependency is newer).
To avoid this trap, you should explicitly tell make that the target is not a file. This is done by listing it as a dependency of the special target .PHONY (note the leading dot).
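A minimal sketch of how these special rules might look in our example Makefile (the exact list of files is an assumption):

.PHONY: all clean

all: index.html rules.html

clean:
	rm -f index.html rules.html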
Generally, you can see that make has plenty of idiosyncrasies. It is often so with programs which started as a simple tool and underwent 40 years of incremental development, slowly accruing features. Still, it is one of the most frequently used build systems. Also, it often serves as a back-end for more advanced tools – they generate a Makefile from a more friendly specification and let make do the actual work.
Exercise
One. On your own, extend the Makefile to call the generating script (the script is described in the before-class text). Do not forget to update the all and clean rules.
Two. Notice that there is an empty out/ subdirectory (it contains only .gitignore that specifies that all files in this directory shall be ignored by Git and thus not shown by git status). Update the Makefile to generate the files into this directory.
The reasons are obvious:
- The generated files will not clutter your working directory (you do not want to commit them anyway).
- When syncing to a webserver, we can specify the whole directory to be copied (instead of specifying individual files).
Three. Add a phony target upload that will copy the generated files to a machine in Rotunda. Create (manually) a directory ~/WWW there. Its content will be available as http://www.ms.mff.cuni.cz/~LOGIN/. Note that you will need to add the proper permissions for the AFS filesystem using the fs setacl command.
Four. Add generation of a PDF from rules.md (using LibreOffice). Note that soffice supports a --outdir parameter. Think about the following:
- Where to place the intermediate ODT file?
- Shall there be a special rule for the generation of the ODT file or shall it be done with a single rule with two commands?
Improving the maintainability of the Makefile
The Makefile starts to contain too much repeated code. But make can help you with that too. Let’s remove all the rules for generating out/*.html from *.md and replace them with:
out/%.html: %.md template.html
	pandoc --template template.html -o $@ $<
That is a pattern rule which captures the idea that HTML is generated from Markdown. Here, the percent sign represents the so-called stem – the variable (i.e., changing) part of the pattern.
In the command part, we use make variables (they start with a dollar sign as in the shell): $@ and $<. $@ is the actual target and $< is the first dependency.
Run make clean && make to verify that even with pattern rules, the web is still generated.
Apart from pattern rules, make also understands variables. They can improve readability as you can separate the configuration from the commands. For example:
PAGES = \
    out/index.html \
    out/rules.html \
    out/teams.html

all: $(PAGES) ...
...
Note that unlike in the shell, variables are expanded by the $(VAR) construct.
Non-portable extensions
make is a very old tool that exists in many different implementations. The features mentioned so far should work with any version of make. (At least a reasonably recent one. Old makes did not have .PHONY or pattern rules.)
The last addition will work in GNU make only (but that is the default on Linux, so there should not be any problem).
We will change the Makefile as follows:
PAGES = \
    index \
    rules \
    teams

PAGES_TMP=$(addsuffix .html, $(PAGES))
PAGES_HTML=$(addprefix out/, $(PAGES_TMP))
We keep only the basename of each page and we compute the output path. $(addsuffix ...) and $(addprefix ...) are calls to built-in functions. Formally, all function arguments are strings, but in this case, comma-separated names are treated as a list.
Note that we added PAGES_TMP only to improve readability when using this feature for the first time. Normally, you would have only PAGES_HTML, assigned directly like this:
PAGES_HTML=$(addprefix out/, $(addsuffix .html, $(PAGES)))
This will prove even more useful when we want to generate a PDF for each page, too. We can add a pattern rule and build the list of PDFs using $(addsuffix .pdf, $(PAGES)).
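A hedged sketch of that extension (the intermediate ODT step and the exact soffice flags are assumptions; adjust them to whatever you chose in the exercise above):

PAGES_PDF = $(addprefix out/, $(addsuffix .pdf, $(PAGES)))

all: $(PAGES_HTML) $(PAGES_PDF)

# Convert Markdown to ODT with pandoc, then to PDF with LibreOffice.
# $* is the stem (e.g. rules), $< is the first dependency (e.g. rules.md).
out/%.pdf: %.md
	pandoc -o out/$*.odt $<
	soffice --headless --convert-to pdf --outdir out out/$*.odt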
Graded tasks (deadline: May 22)
13/Makefile (100 points)
UPDATE #1: the variable is supposed to be named COURSES (sorry for the typo).
UPDATE #2: we expect you to use GNU Make and its extensions (such as addprefix).
UPDATE #3: you may find the automatic variable $* useful when generating the course webpages.
Convert the web page generation in 13/graded into a Makefile-driven project. The whole web generation is driven by the bin/build.sh script. Convert it to a Makefile to ensure that pages are not recreated needlessly. Feel free to create helper shell scripts where it makes sense.
Your Makefile must also support the clean target to remove all generated files (the public_html directory can be committed with .gitignore and does not need to be created automatically).
You must ensure that modification of one of the pages does not trigger a rebuild of all of them. However, when metadata are modified for any of the pages, the menu is rebuilt (and hence the whole website needs to be rebuilt).
Copy the src/ folder as-is to your project (e.g. you will have the file 13/src/NSWI177.meta in your repository).
To allow testing, specify the list of courses in a variable inside the Makefile.
With that in place, it will be possible to rebuild the website with a subset of courses with a simple make "COURSES=NSWI177" (just use the variable and, if you stick with the simplest solution, it will probably work out of the box without extra work).
The list of courses is expected to be specified manually inside your Makefile. That is, feel free to start your Makefile with the following (note that the list can be generated with a bit of ls ... | sed ... so that you do not have to copy all the codes manually):
COURSES = \
    NAIL062 \
    NDBI025 \
    NDMI002 \
    ...
Even if you are an experienced make user, use this form to simplify testing (or make sure your solution works with our tests).
Learning outcomes
Conceptual knowledge
Conceptual knowledge is about understanding the meaning of given terms and being able to put them into context. Therefore, you should be able to …
- name a few steps that are required to turn sources into distributable software
- explain why making these steps reproducible (i.e. codifying them in some way) is useful
- explain why it makes sense to have a special programming language for such codification
- explain basic concepts of such programming languages
Practical skills
Practical skills are usually about the usage of given programs to solve various tasks. Therefore, you should be able to …
- use Pandoc to convert between various text formats
- use xargs
- use find
- build make-based projects with default options
- create a Makefile that builds web pages from Markdown sources
- create a Makefile that uses wildcard rules
- improve the readability of a Makefile by using variables (optional)
- create a trivial template for Pandoc for customized conversion (optional)
- use basic GNU Make extensions to simplify Makefiles (optional)
- use parallel