This lab focuses on build systems, i.e., tools that simplify the process of getting from source code to something that users can install. This includes creating binaries from C sources, generating HTML from Markdown, or creating thumbnails of the same photo in different resolutions.
Sidenote: programs xargs and find (and parallel)
The following two programs can often come in handy, but we were unable to squeeze them into the previous labs about shell scripting. Hence they come here, as the build-system topic is quite short and simple.
xargs
xargs in its simplest form reads standard input and converts it into command-line arguments for a user-specified program.
Assume we have the following files in a directory:
2022-04-10.txt 2022-04-16.txt 2022-04-22.txt 2022-04-28.txt
2022-04-11.txt 2022-04-17.txt 2022-04-23.txt 2022-04-29.txt
2022-04-12.txt 2022-04-18.txt 2022-04-24.txt 2022-04-30.txt
2022-04-13.txt 2022-04-19.txt 2022-04-25.txt
2022-04-14.txt 2022-04-20.txt 2022-04-26.txt
2022-04-15.txt 2022-04-21.txt 2022-04-27.txt
The following script removes files that are older than 20 days (judging by the date in their names):
cutoff_date="$( date -d "20 days ago" '+%Y%m%d' )"
for filename in 2022-[01][0-9]-[0-3][0-9].txt; do
    # 2022-04-10.txt -> 20220410, comparable as a number with the cutoff
    date_num="$( basename "$filename" .txt | tr -d '-' )"
    if [ "$date_num" -lt "$cutoff_date" ]; then
        echo rm "$filename"
    fi
done
This means that the program rm would be called several times, each time removing just one file. Note that we have echo rm there so that the files are not actually removed; it only demonstrates the operation. The overhead of starting a new process could become a serious bottleneck for larger scripts (think about thousands of files, for example).
It would be much better to call rm just once, giving it the whole list of files to remove (i.e., as multiple arguments). xargs is the solution here. Let us modify the program a little bit:
cutoff_date="$( date -d "20 days ago" '+%Y%m%d' )"
for filename in 2022-[01][0-9]-[0-3][0-9].txt; do
    date_num="$( basename "$filename" .txt | tr -d '-' )"
    if [ "$date_num" -lt "$cutoff_date" ]; then
        # Only print the name; xargs collects all names for one rm call
        echo "$filename"
    fi
done | xargs echo rm
Instead of removing the file right away, we just print its name and pipe the whole loop to xargs, whose own arguments specify the command to be launched (here, echo rm). Instead of many lines with rm ..., we will see just one long line with a single invocation of rm.
Of course, tricky filenames can still cause issues, as xargs assumes that arguments are delimited by whitespace. (Note that we were safe in the example above, as the filenames were reasonable.) The delimiter can be changed with --delimiter.
If you are piping input to xargs from your own program, consider delimiting the items with a zero byte (i.e., the C string terminator, \0). That is the safest option, as this character cannot appear anywhere inside any argument. Then tell xargs about it via -0 or --null.
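A small demonstration of the difference, as a sketch assuming GNU xargs. The helpers split_default and split_null are hypothetical names, and the file names are only strings (no files are created); each argument that xargs passes on is printed in angle brackets so that its boundaries are visible.

```shell
#!/bin/bash
# Default mode: xargs splits its input on whitespace, so the single
# name "my notes.txt" reaches printf as two separate arguments.
split_default() {
    printf '%s\n' "$@" | xargs printf '<%s>\n'
}

# NUL-delimited mode: printf '%s\0' ends every item with a zero byte and
# xargs -0 splits only on that byte, so spaces inside a name survive.
split_null() {
    printf '%s\0' "$@" | xargs -0 printf '<%s>\n'
}

split_default 'my notes.txt' 'todo.txt'   # prints <my>, <notes.txt>, <todo.txt>
split_null 'my notes.txt' 'todo.txt'      # prints <my notes.txt>, <todo.txt>
```

The producer side (printf '%s\0') and the consumer side (xargs -0) must agree on the delimiter; using -0 on newline-separated input would pass everything as one argument.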
Note that xargs is smart enough to realize when the command line would be too long and automatically splits it into several invocations (see the manual for details).
It is also good to remember that xargs can execute the command in parallel (i.e., split the standard input into multiple chunks and call the program multiple times with different chunks, running several of these invocations at once) via -P. If your shell scripts are getting slow but you have plenty of CPU power, this may speed things up considerably.
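A minimal sketch of -P, assuming GNU xargs and bash. The function slow_job is a hypothetical stand-in for real work; it has to be exported because xargs starts new bash processes that would not see it otherwise.

```shell
#!/bin/bash
# Hypothetical slow job: waits one second, then reports its argument.
slow_job() {
    sleep 1
    echo "done $1"
}
# The processes started by xargs are new bash instances, so the
# function has to be exported to be visible there.
export -f slow_job

# -n 1: every invocation gets exactly one argument.
# -P 4: up to four invocations may run at the same time.
# The four jobs therefore finish in about one second instead of four;
# the order of the output lines is not guaranteed.
printf '%s\n' a b c d | xargs -n 1 -P 4 bash -c 'slow_job "$1"' _
```

The _ after the script becomes $0 of the inner bash, so the argument appended by xargs lands in $1.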
parallel
This program can be used to execute multiple commands in parallel, hence speeding up the execution. parallel behaves much like xargs but runs the individual jobs (commands) in parallel. Please refer to parallel_tutorial(1) (yes, that is a man page) and to parallel(1) for more details.
find
While ls(1) and wildcard expansion are powerful, sometimes we need to select files using more sophisticated criteria. That is where the find(1) program comes in handy. Without any arguments, it lists all files in the current directory, including files in nested directories. Do not run it on the root directory (/) unless you know what you are doing (and definitely not on the shared linux.ms.mff.cuni.cz machine).
With the -name parameter, you can limit the search to files matching a given wildcard pattern. The following command finds all alpha.txt files in the current directory and in any subdirectory (regardless of depth).
find -name alpha.txt
Why would the following command for finding all *.txt files not work?
find -name *.txt
find has many options; we will not duplicate its manpage here but mention those that are worth remembering.
-delete immediately deletes the found files. Very useful and very dangerous.
-exec runs a given program on every found file. You have to use {} to specify the found filename and terminate the command with ; (since ; terminates commands in the shell too, you will need to escape it).
find -name '*.md' -exec wc -l {} \;
Note that for each found file, a new invocation of wc happens. This can be altered by changing the command terminator (\;) to +, which passes as many found files as possible to a single invocation. See the difference between the following two commands:
find -name '*.md' -exec echo {} \;
find -name '*.md' -exec echo {} +
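To see the difference without reading the output carefully, the following sketch counts the invocations instead. It assumes GNU find and a POSIX sh for the inner command, and it creates three throw-away .md files in a temporary directory; the function count_invocations is hypothetical.

```shell
#!/bin/bash
# Counts how many times find starts the command, for a given terminator.
count_invocations() {
    local terminator="$1" dir
    dir="$(mktemp -d)"
    touch "$dir/a.md" "$dir/b.md" "$dir/c.md"
    # Each run of the inner sh prints exactly one line, no matter how many
    # file names it received, so counting lines counts invocations.
    if [ "$terminator" = "+" ]; then
        find "$dir" -name '*.md' -exec sh -c 'echo "called with $# file(s)"' sh {} +
    else
        find "$dir" -name '*.md' -exec sh -c 'echo "called with $# file(s)"' sh {} \;
    fi | wc -l
    rm -r "$dir"
}

count_invocations ';'   # prints 3 (one invocation per file)
count_invocations '+'   # prints 1 (one invocation with all three files)
```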
Caveats
By default, find prints one filename per line. However, a filename can even contain the newline character (!) and thus the following idiom is not 100% safe.
find -options-for-find | while read filename; do
    do_some_complicated_things_with "$filename"
done
If you want to be really safe, use -print0 and IFS= read -r -d $'\0' filename, as that uses the only safe delimiter, \0 (recall what you have heard about C strings, and how they are terminated, in your Arduino course). Alternatively, you can pipe the output of find -print0 to xargs --null.
However, if you are working with your own files or the pattern is safe, the above loop is fine (just do not forget that directories are files too and they can contain \n in their names as well).
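A sketch of the safe variant, assuming bash (for read -d and process substitution). The function count_files_safely is hypothetical and only counts files, but its loop header is the part worth reusing. Note that -d '' is the same as -d $'\0', because bash cannot store a zero byte in a string anyway.

```shell
#!/bin/bash
# Counts regular files below a directory, handling any file name
# (including names with newlines) by using NUL as the delimiter.
count_files_safely() {
    local count=0 filename
    while IFS= read -r -d '' filename; do
        count=$((count + 1))
    done < <(find "$1" -type f -print0)
    echo "$count"
}

demo_dir="$(mktemp -d)"
touch "$demo_dir/plain.txt" "$demo_dir/$(printf 'two\nlines.txt')"
count_files_safely "$demo_dir"   # prints 2, whereas find | wc -l would report 3
rm -r "$demo_dir"
```

Feeding the loop through < <(...) instead of a pipe keeps it in the current shell, so the value of count is still available after the loop.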
Note that bash allows you to export a function and call it from inside xargs (through a new bash process started by xargs).
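#!/bin/bash
# Prints its $0 and all its arguments.
my_function() {
    echo ""
    echo "\$0 = $0"
    echo "\$@ =" "$@"
}
# Make the function visible to child bash processes.
export -f my_function
# xargs starts a new bash for each file (-n 1): arg_zero becomes $0 of that
# bash, while arg_one and the found file name become "$@".
find . -print0 | xargs -0 -n 1 bash -c 'my_function "$@"' arg_zero arg_one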
make
Move into the 13/make directory first, please. The files in this directory are virtually the same, but there is one extra file: Makefile. Notice that Makefile is written with a capital M to be easily distinguishable (ls in a non-localized setup sorts uppercase letters first).
This file is a control file for a build system called make that does exactly what we tried to imitate in the previous example. It contains a sequence of rules for building files.
We will get to the exact syntax of the rules soon, but let us play with them first. Execute the following command:
make
You will see the following output (if you have executed some of the commands manually, the output may differ):
pandoc --template template.html index.md >index.html
pandoc --template template.html rules.md >rules.html
make prints the commands it executes and runs them. It has built the website for us: notice that the HTML files were generated.
Execute make again.
make: Nothing to be done for 'all'.
As you can see, make was smart enough to recognize that since no file was changed, there is no need to run anything.
Update index.md (touch index.md would work too) and run make again. Notice how index.html was rebuilt while rules.html remained untouched.
pandoc --template template.html index.md >index.html
This is called an incremental build (we build only what was needed instead of building everything from scratch).
As we mentioned above, this is not very interesting in our tiny example. However, once there are thousands of input files, the difference is enormous.
It is also possible to execute make index.html to ask for rebuilding just index.html. Again, the build is incremental.
If you wish to force a rebuild, execute make with -B. This is often called an unconditional build.
In other words, make allows us to capture the simple individual commands needed for a project build (no matter if we are compiling and linking C programs or generating a website) in a coherent script. It takes care of dependencies and executes only the commands that are really needed.
Makefile explained
Makefile is a control file for the build system named make. In essence, it is a domain-specific language that simplifies writing the script with the [ ".." -nt ".." ] constructs we mentioned above.
Important: Unlike typical programming languages, make distinguishes between tabs and spaces. The commands of every rule in the Makefile must be indented with a tab. You have to make sure that your editor does not expand tabs to spaces. This is also a common issue when copying fragments from a web browser. (Usually, your editor will recognize that Makefile is a special file name and switch to a tabs-only policy by itself.) If you use spaces instead, you will typically get an error like Makefile:LINE_NUMBER: *** missing separator. Stop.
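If you are not sure what your editor actually saved, GNU cat -A (from coreutils) makes the whitespace visible: a tab is shown as ^I and every line end as $. The sketch below feeds it two made-up rule fragments; running cat -A Makefile works the same way on a real file.

```shell
#!/bin/bash
# Recipe indented by a real tab (printf turns \t into a tab):
printf 'all:\n\techo ok\n' | cat -A
# prints:
# all:$
# ^Iecho ok$

# Recipe indented by four spaces (this is what make rejects):
printf 'all:\n    echo ok\n' | cat -A
# prints:
# all:$
#     echo ok$
```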
The Makefile contains a sequence of rules. A rule looks like this:
rules.html: rules.md template.html
	pandoc --template template.html rules.md >rules.html
The name before the colon is the target of the rule. That is usually a file name that we want to build. Here, it is rules.html.
The rest of the first line is the list of dependencies, i.e., files from which the target is built. In our example, the dependencies are rules.md and template.html.
The third part consists of the following lines, which have to be indented by a tab. They contain the commands that have to be executed for the target to be built. Here, it is the call to pandoc.
make runs the commands if the target is out of date. That is, either the target file is missing, or one or more dependencies are newer than the target.
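The decision described above can be sketched in shell with the -nt test. The helper needs_rebuild is hypothetical and simplified: real make also processes the dependencies recursively and reports an error when a dependency is missing and there is no rule to build it, whereas this version simply ignores a missing dependency.

```shell
#!/bin/bash
# Hypothetical helper mimicking make's decision for a single rule.
# Usage: needs_rebuild TARGET [DEPENDENCY...]
# Succeeds (exit status 0) when TARGET has to be rebuilt, i.e. when it
# does not exist or at least one dependency is newer than it.
needs_rebuild() {
    local target="$1" dep
    shift
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        # -nt is false when dep does not exist, so a missing dependency
        # is silently ignored here (make would stop with an error).
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# The rules.html rule from the text; echo only prints the command,
# in the same spirit as echo rm earlier.
if needs_rebuild rules.html rules.md template.html; then
    echo pandoc --template template.html rules.md '>rules.html'
fi
```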
The rest of the Makefile is similar. There are rules for other files and also several special rules.
Special rules
The special rules are all, clean, and .PHONY. They do not specify files to be built, but rather special actions.
all is a traditional name for the very first rule in the file. It is called the default rule and it is built if you run make with no arguments. It usually has no commands and it depends on all files that should be built by default.
clean is a special rule that has only commands, but no dependencies. Its purpose is to remove all generated files if you want to clean up your workspace. Typically, clean removes all files that are not versioned (i.e., not under Git control). This can be considered a misuse of make, but one with a long tradition.
From the point of view of make, the targets all and clean are still treated as file names. If you create a file called clean, the special rule will stop working, because the target will be considered up to date (it exists and no dependency is newer).
To avoid this trap, you should explicitly tell make that the target is not a file. This is done by listing it as a dependency of the special target .PHONY (note the leading dot).
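The trap can be reproduced in a scratch directory. The sketch below assumes GNU make is installed; try_clean is a hypothetical helper that writes a one-rule Makefile (the recipe line gets a real tab from printf '\t'), creates a file literally named clean, and runs make clean with or without the .PHONY declaration.

```shell
#!/bin/bash
# Hypothetical helper: first argument "yes" adds ".PHONY: clean".
try_clean() {
    local phony="$1" dir
    dir="$(mktemp -d)"
    {
        if [ "$phony" = yes ]; then
            echo '.PHONY: clean'
        fi
        echo 'clean:'
        printf '\t@echo removing generated files\n'
    } >"$dir/Makefile"
    # A file with the same name as the target.
    touch "$dir/clean"
    make --no-print-directory -C "$dir" clean
    rm -r "$dir"
}

try_clean no    # make only reports that 'clean' is up to date
try_clean yes   # prints: removing generated files
```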
Generally, you can see that make has plenty of idiosyncrasies. That is often the case with programs that started as a simple tool and underwent 40 years of incremental development, slowly accruing features. Still, it is one of the most frequently used build systems. It also often serves as a back end for more advanced tools: they generate a Makefile from a more friendly specification and let make do the actual work.
Exercises
One. On your own, extend the Makefile to call the generating script (the script is described in the before-class text). Do not forget to update the all and clean rules.
Two. Notice that there is an empty out/ subdirectory (it contains only a .gitignore that specifies that all files in this directory shall be ignored by Git and thus not shown by git status). Update the Makefile to generate files into this directory.
The reasons are obvious:
- The generated files will not clutter your working directory (you do not want to commit them anyway).
- When syncing to a webserver, we can specify the whole directory to be copied (instead of specifying individual files).
Three. Add a phony target upload that will copy the generated files to a machine in Rotunda. Create (manually) a directory ~/WWW there. Its content will be available as http://www.ms.mff.cuni.cz/~LOGIN/. Note that you will need to set the proper permissions for the AFS filesystem using the fs setacl command.
Four. Add generation of a PDF from rules.md (using LibreOffice). Note that soffice supports an --outdir parameter.
Think about the following:
- Where to place the intermediate ODT file?
- Shall there be a special rule for the generation of the ODT file or shall it be done with a single rule with two commands?
Improving the maintainability of the Makefile
The Makefile starts to have too much repeated code. But make can help you with that too.
Let us remove all the rules for generating out/*.html from *.md and replace them with:
out/%.html: %.md template.html
	pandoc --template template.html -o $@ $<
That is a pattern rule that captures the idea that HTML is generated from Markdown. Here, the percent sign represents the so-called stem, i.e., the variable (changing) part of the pattern.
In the command part, we use the make variables $@ and $< (they start with a dollar, as in the shell). $@ is the actual target and $< is the first dependency.
Run make clean && make to verify that even with pattern rules, the website is still generated.
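What make does with such a rule can be mimicked in a few lines of shell. In this sketch (expand_pattern_rule is a hypothetical name), the stem is obtained by stripping the fixed prefix out/ and suffix .html from the requested target, and the stem then yields the first dependency. Real make additionally checks that the .md file exists and that the target is out of date before running anything.

```shell
#!/bin/bash
# Rough shell analogue of applying the pattern rule
#   out/%.html: %.md template.html
# to requested targets and substituting $@ and $< into the command.
expand_pattern_rule() {
    local target stem
    for target in "$@"; do
        stem="${target#out/}"     # remove the fixed prefix "out/"
        stem="${stem%.html}"      # remove the fixed suffix ".html"
        # $@ is the target, $< the first dependency, i.e. "$stem.md"
        echo "pandoc --template template.html -o $target $stem.md"
    done
}

expand_pattern_rule out/index.html out/rules.html
```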
Apart from pattern rules, make also understands variables. They can improve readability, as you can separate configuration from commands. For example:
PAGES = \
out/index.html \
out/rules.html \
out/teams.html
all: $(PAGES) ...
...
Note that, unlike in the shell, variables are expanded by the $(VAR) construct. (A plain $VAR would be read as the single-letter variable $V followed by the text AR.)
Non-portable extensions
make is a very old tool that exists in many different implementations. The features mentioned so far should work with any version of make (at least a reasonably recent one; old versions of make did not have .PHONY or pattern rules).
The last addition will work in GNU make only (but that is the default on Linux, so there should not be any problem).
We will change the Makefile as follows:
PAGES = \
index \
rules \
teams
PAGES_TMP=$(addsuffix .html, $(PAGES))
PAGES_HTML=$(addprefix out/, $(PAGES_TMP))
We keep only the basename of each page and compute the output path. $(addsuffix ...) and $(addprefix ...) are calls to built-in functions. Formally, all function arguments are strings separated by commas, but in this case, the second argument is treated as a whitespace-separated list of names.
Note that we added PAGES_TMP only to improve readability when using this feature for the first time. Normally, you would assign PAGES_HTML directly:
PAGES_HTML=$(addprefix out/, $(addsuffix .html, $(PAGES)))
This will prove even more useful when we want to generate a PDF for each page, too. We can add a pattern rule and build the list of PDFs using $(addsuffix .pdf, $(PAGES)).
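For comparison, the same list manipulation can be sketched in shell. The functions below borrow the names of the GNU make built-ins but are hypothetical shell helpers, not part of any tool; like make, they treat their input as a whitespace-separated list of words.

```shell
#!/bin/bash
# Shell analogue of $(addsuffix SUFFIX, WORDS): one result per line.
addsuffix() {
    local suffix="$1" word
    shift
    for word in "$@"; do printf '%s%s\n' "$word" "$suffix"; done
}
# Shell analogue of $(addprefix PREFIX, WORDS).
addprefix() {
    local prefix="$1" word
    shift
    for word in "$@"; do printf '%s%s\n' "$prefix" "$word"; done
}

PAGES="index rules teams"
# Unquoted $PAGES and the unquoted inner $(...) are split on whitespace,
# just like make splits its word lists.
PAGES_HTML="$(addprefix out/ $(addsuffix .html $PAGES))"
echo $PAGES_HTML   # out/index.html out/rules.html out/teams.html
```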
Graded tasks (deadline: May 22)
13/Makefile (100 points)
UPDATE #1: the variable should be named COURSES (sorry for the typo).
UPDATE #2: we expect you to use GNU Make including its extensions (such as addprefix).
UPDATE #3: you may find the so-called automatic variable $* useful in the rule that generates a course page.
Convert the website generation in 13/graded into a project driven by a Makefile.
The whole website build is controlled by the script bin/build.sh. Convert it to a Makefile so that pages are not rebuilt needlessly.
Feel free to create helper shell scripts for the parts where it makes sense.
Your Makefile must also support a clean target that removes the generated files (the public_html directory can be committed with a .gitignore and you do not need to handle its creation).
You must ensure that a change to one page does not cause all pages to be regenerated. However, when the metadata file of any page is changed, the menu is regenerated (and thus the whole website is rebuilt).
Copy the src/ directory as-is into your project (i.e., we will have, for example, the file 13/src/NSWI177.meta in your repository).
To allow us to test the task properly, store the list of courses in a variable in the Makefile. It will then be possible to build the website for just a subset of the courses using make "COURSES=NSWI177" (simply use this variable, and the most straightforward solution should work right away).
We expect the list of courses to be specified manually inside your Makefile. That is, it is fine to start your Makefile with the following (note that the list can be created with the help of ls ... | sed ..., so you do not have to copy the codes by hand):
COURSES = \
NAIL062 \
NDBI025 \
NDMI002 \
...
Even if you are an experienced make user, use the assignment above so that your solution can be tested more easily (or make sure that your solution really passes the tests).
Learning outcomes
Conceptual knowledge
Conceptual knowledge means that you understand the meaning and context of the topic and are able to put it into a larger context. So, you are able to …
- list some steps needed to create distributable software from source code
- explain why the repeatability of these steps (i.e., their codifiability) is useful
- explain why it makes sense to have a special language for this purpose
- explain the basic concepts of such a programming language
Practical skills
Practical skills are usually about using the given programs to solve various tasks. So, you can …
- use Pandoc to convert between various text formats
- use xargs
- use find
- build projects that use make
- create a Makefile that builds web pages from Markdown sources
- create a Makefile that uses rules with wildcards
- improve the readability of a Makefile by using variables (optional)
- create a simple template for Pandoc (optional)
- use basic GNU Make extensions to simplify Makefiles (optional)
- use parallel