In this lab we will see how to simplify building complex software and how to effectively do search (and replace) in textual data.
This lab also contains a mini homework for two points.
Reading network configuration
Before diving into the main topics, we will take a small detour to a practical skill that comes in very handy: how to view the network configuration of your machine from the command line.
We have already seen nmcli, but there are other tools. Among them is ip (from the iproute2 package), which can be used to configure networking as well (though rather on servers than on workstations, where NetworkManager is usually the default).
For the following text we will assume your machine is connected to the Internet (this includes your virtualized installation of Linux).
The basic command for setting and reading network configuration is ip. Probably the most useful invocation for us at the moment is ip addr.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether 54:e1:ad:9f:db:36 brd ff:ff:ff:ff:ff:ff
3: wlp58s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 44:03:2c:7f:0f:76 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.105/24 brd 192.168.0.255 scope global dynamic noprefixroute wlp58s0
valid_lft 6209sec preferred_lft 6209sec
inet6 fe80::9ba5:fc4b:96e1:f281/64 scope link noprefixroute
valid_lft forever preferred_lft forever
8: vboxnet0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 0a:00:27:00:00:00 brd ff:ff:ff:ff:ff:ff
It lists four interfaces (lo, enp0s31f6, wlp58s0 and vboxnet0) that are available on the machine. Your list will differ, as will the naming. The name signifies the interface type.
- lo is the loopback device that will always be present. With the loopback device, you can test network applications even without “real” connectivity.
- enp0s31f6 (often also eth*) is a wired Ethernet adapter.
- wlp58s0 is a wireless adapter.
- vboxnet0 is a virtual network card used by VirtualBox when you create a virtual subnet for your virtual machines (you will probably not have this one there).

If you are connected via VPN, you might also see a tun0 interface.
The state of the interface (up and running or not) is on the same line as the adapter name. The link/ line denotes the MAC address of the adapter. Lines with inet specify the IP address assigned to this interface, including the network.
In this example, lo has 127.0.0.1/8 (obviously), enp0s31f6 is without an address (state DOWN), and wlp58s0 has address 192.168.0.105/24 (i.e., 192.168.0.105 with netmask 255.255.255.0).
Your addresses will be slightly different, but typically you will also see a private address (behind a NAT), as you are probably connecting to your ISP through a router.
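If you only need a quick overview, the iproute2 tools also offer a condensed output mode (the exact output will, of course, differ on your machine):

ip -brief addr    # one line per interface: name, state, addresses
ip route          # the routing table, including the default gateway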
Regular expressions (a.k.a. regexes)
We already mentioned that systems from the Unix family are built on top of text files. The utilities we have seen so far offered basic operations, but none of them was really powerful. Use of regular expressions will change that.
We will not cover the theoretical details – see the course on Automata and grammars for that. We will view regular expressions as simple tools for matching patterns in text.
For example, we might be interested in:
- lines starting with date and containing HTTP code 404,
- files containing our login,
- or a line preceding a line with a valid filename.
The most basic tool for matching files against regular expressions is called grep. If you run grep regex file, it prints all lines of file which match the given regex (with -F, the pattern is considered a fixed string, not a regular expression).
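For example, searching for a version string behaves differently with and without -F (versions.txt is just a hypothetical input file; the dot is a regex metacharacter, as we will see below):

grep '3.14' versions.txt     # the dot matches any character: finds 3.14 but also 3514 or 3x14
grep -F '3.14' versions.txt  # matches the literal string 3.14 only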
Regex syntax
In its simplest form, a regex searches for the given string (usually in a case-sensitive manner).
system
This matches all substrings system in the text. In grep, this means that all lines containing system will be printed.
If we want to search for lines starting with this word, we need to add the ^ anchor.
^system
If the line is supposed to end with a pattern, we need to use the $ anchor.
Note that it is safer to use single quotes in the shell to prevent any variable
expansion.
system$
Moreover, we can find all lines starting with either r, s, or t using the [...] list.
^[rst]
This looks like a wildcard, but regexes are more powerful and the syntax differs a bit.
For actual searching, we obviously need to pass this regular expression to grep like this (here we search in /etc/passwd):
grep '^[rst]' /etc/passwd
Let us find all three-digit numbers:
[0-9][0-9][0-9]
We can also find lines not starting with any letter between r and z. (The first ^ is an anchor, while the second one negates the set in [].)
^[^r-z]
The quantifier * denotes that the previous part of the regex can appear multiple times or not at all. For example, this finds all lines which consist of digits only (and matches empty lines too!):
^[0-9]*$
Note that this does not require that all digits are the same.
A dot . matches any single character (except newline). So the following regex matches lines starting with super and ending with ious:
^super.*ious$
When we want to apply the * to a more complex subexpression, we can surround it with (...). The following regex matches bana, banana, bananana, and so on:
ba(na)*na
If we use + instead of *, at least one occurrence is required. So this matches all decimal numbers:
[0-9]+
The vertical bar ("|
" a.k.a. the pipe) can separate alternatives. For example,
we can match lines composed of Meow
and Quork
:
^(Meow|Quork)*$
The [abc] construct is therefore just an abbreviation for (a|b|c).
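We can quickly check such a regex by piping a few sample lines to grep (the -E switch is explained below; it lets us write | and (...) without backslashes):

printf '%s\n' MeowQuorkMeow Meowww | grep -E '^(Meow|Quork)*$'
# prints only MeowQuorkMeow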
Another useful shortcut is the {N} quantifier: it specifies that the preceding regex is to be repeated N times. We can also use {N,M} for a range.
For example, we can match lines which contain 4 to 10 lower-case letters enclosed
in quotation marks:
^"[a-z]{4,10}"$
Finally, the backslash character changes whether the next character is considered special. The \. matches a literal dot, \* a literal asterisk. Beware that many regex dialects (including grep without further options) require +, (, |, and { to be escaped to be recognized as regex operators. (You can run grep -E or egrep to activate extended regular expressions, in which all special characters are recognized as operators without backslashes.)
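The two dialects can be compared side by side; both of the following commands print the input line:

echo bananana | grep 'ba\(na\)*na'     # basic regex: grouping needs backslashes
echo bananana | grep -E 'ba(na)*na'    # extended regex: operators work unescaped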
grep also returns a zero exit code when a match was found and a non-zero one otherwise. Therefore, it can be used in shell conditions like this:
if ! echo "$input" | grep 'regex'; then
echo "Input is not in correct format." >&2
...
fi
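If we only care about the exit code, the -q switch silences the output. A minimal sketch (requiring at least one digit, using only constructs shown above):

if ! echo "$input" | grep -q '^[0-9][0-9]*$'; then
    echo "Input is not a number." >&2
    exit 1
fi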
Text substitution
The full power of regular expressions is unleashed when we use them
to substitute patterns.
We will show this on sed
(a stream editor) which can perform regular
expression-based text transformations.
In its simplest form, sed replaces one word with another.
The command reads: substitute (s), then a single-character delimiter, followed by the text to be replaced (the left-hand side of the substitution), again the same delimiter, then the replacement (the right-hand side), and one final occurrence of the delimiter. (The delimiter is typically :, /, or #, but generally it can be any character that is not used without escaping in the rest of the command.)
sed 's:magna:angam:' lorem.txt
Note that this replaces only the first occurrence on each line. Adding a g modifier (for global) at the end of the command causes it to replace all occurrences:
sed 's:magna:angam:g' lorem.txt
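The difference is easy to see on a line with several occurrences:

echo 'magna magna magna' | sed 's:magna:angam:'    # angam magna magna
echo 'magna magna magna' | sed 's:magna:angam:g'   # angam angam angam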
The text to be replaced can be any regular expression, for example:
sed 's:[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]:DATE-REDACTED-OUT:g' lorem.txt
The right-hand side can refer to the text matched by the left-hand side. We can use & for the whole left-hand side or \N for the N-th group (...) in the left-hand side.
The following example transforms the date into the Czech form (DD. MM. YYYY). We have to escape the ( and ) characters to make them act as grouping operators instead of literal ( and ).
sed 's:\([0-9][0-9][0-9][0-9]\)-\([0-9][0-9]\)-\([0-9][0-9]\):\3. \2. \1:g'
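Both kinds of back-references can be demonstrated on made-up input (GNU sed also accepts -E for extended regexes, which removes the backslashes before the groups):

echo 'Updated on 2024-03-15.' | sed -E 's:([0-9]{4})-([0-9]{2})-([0-9]{2}):\3. \2. \1:g'
# prints: Updated on 15. 03. 2024.
echo 'error 404' | sed 's:[0-9][0-9]*:code &:'
# prints: error code 404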
Running example for the rest of the lab
We will return again to our website generation example and use it as a running example for the rest of this lab.
We will again use the simpler version that looked like this:
#!/bin/bash
set -ueo pipefail
pandoc --template template.html index.md >index.html
pandoc --template template.html rules.md >rules.html
./table.py <score.csv | pandoc --template template.html --metadata title="Score" - >score.html
Notice that for index and rules, there are Markdown files to generate HTML from. Page score is generated from a CSV data file.
Setup
Please create a fork of the web repository so that you can try the examples yourself (we will reuse this repository in one of the next labs, so do not remove it yet).
Motivation for using build systems
In our running example, the whole website is built in several steps where HTML pages are generated from different sources. That is actually very similar to how software is built from sources (consider sources in the C language that are compiled and linked together).
While the above steps do not build an executable from sources (as is the typical case for software development), they represent a typical scenario.
Building software usually consists of many steps that can include actions as different as:
- compiling source files to some intermediate format
- linking the final executable
- creating bitmap graphics in different resolutions from a single vector image
- generating source-code documentation
- preparing localization files with translation
- creating a self-extracting archive
- deploying the software on a web server
- publishing an artefact in a package repository
- …
Almost all of them are simple by themselves. What is complex is their orchestration. That is, how to run them in the correct order and with the right options (parameters).
For example, before an installer can be prepared, all other files have to be ready. Localization files often depend on precompilation of some sources but have to be prepared before the final executable is linked. And so on.
Even for small-size projects, the number of steps can be quite high, yet they are – in a sense – unimportant: you do not want to remember them, you want to build the whole thing!
Note that your IDE can often help you with all of this – with a single click. But not everybody uses the same IDE and you may not even have a graphical interface at all.
Furthermore, you typically want to run the build as part of each commit – the GitLab pipelines we use for tests are a typical example: they execute without a GUI, yet we want to build the software (and test it too). Codifying the process in a build script simplifies this for virtually everyone.
Our build.sh script mentioned above is actually pretty nice. It is easy to understand, contains no complex logic, and a new member of the team does not need to investigate all the tiny details and can just run the single build.sh script.
The script is nice but it overwrites all files even if there was no change. In our small example, it is no big deal (you have a fast computer, after all).
But in a bigger project where we, for example, compile thousands of files (e.g., look at the source tree of the Linux kernel, Firefox, or LibreOffice), it matters. If an input file has not changed (e.g., we modified only rules.md), we do not need to regenerate the other files (e.g., we do not need to re-create index.html).
Let’s extend our script a bit.
...
should_generate() {
    local barename="$1"

    # File does not exist ... we should generate it
    if ! [ -e "${barename}.html" ]; then
        return 0
    fi

    # Markdown is newer than HTML ... we should regenerate it
    if [ "${barename}.md" -nt "${barename}.html" ]; then
        return 0
    else
        return 1
    fi
}
...
should_generate index && pandoc --template template.html index.md >index.html
should_generate rules && pandoc --template template.html rules.md >rules.html
...
We can do that for every command to speed-up the web generation.
But.
That is a lot of work. And the time saved would probably all be wasted on rewriting our script. Not to mention the fact that the result looks horrible. And it is rather expensive to maintain.
Also, we often need to build just a part of the project: e.g., regenerate the documentation only (without publishing the result, for example). Although extending the script in the following way is possible, it certainly is not viable for large projects.
if [ -z "${1:-}" ]; then
... # build here
elif [ "${1:-}" = "clean" ]; then
rm -f index.html rules.html score.html
elif [ "${1:-}" = "publish" ]; then
cp index.html rules.html score.html /var/www/web-page/
else
...
Luckily, there is a better way.
There are special tools, usually called build systems, that have a single purpose: to orchestrate the build process. They provide the user with a high-level language for capturing the above-mentioned steps for building software.
In this lab, we will focus on make. make is a relatively old build system, but it is still widely used. It is also one of the simplest tools available: you need to specify most of the things manually, but that is great for learning. You will have full control over the process and you will see what is happening behind the scenes.
make
Move into the root directory of (the local clone of your fork of) the web example repository first, please.
The files in this directory are virtually the same as in our shell script above, but there is one extra file: Makefile. Notice that Makefile is written with a capital M to be easily distinguishable (ls in a non-localized setup sorts uppercase letters first).
This file is a control file for a build system called make that does exactly what we tried to imitate in the previous example. It contains a sequence of rules for building files.
We will get to the exact syntax of the rules soon, but let us play with them first. Execute the following command:
make
You will see the following output (if you have executed some of the commands manually, the output may differ):
pandoc --template template.html index.md >index.html
pandoc --template template.html rules.md >rules.html
make prints the commands as it executes them. It has built the website for us: notice that the HTML files were generated.
For now, we do not generate the version.inc.html
file at all.
Execute make
again.
make: Nothing to be done for 'all'.
As you can see, make
was smart enough to recognize that since
no file was changed, there is no need to run anything.
Update index.md
(touch index.md
would work too) and run make
again.
Notice how index.html
was rebuilt while rules.html
remained
untouched.
pandoc --template template.html index.md >index.html
This is called an incremental build (we build only what was needed instead of building everything from scratch).
As we mentioned above: this is not much interesting in our tiny example. However, once there are thousands of input files, the difference is enormous.
It is also possible to execute make index.html
to ask for rebuilding
just index.html
. Again, the build is incremental.
If you wish to force a rebuild, execute make
with -B
.
Often, this is called an unconditional build.
Normally, make rebuilds only the things that need rebuilding and, more interestingly, it takes care of dependencies. For example, if scores.html is generated from scores.md that is built from scores.csv, we only need to specify how to build scores.md from scores.csv and how to create scores.html from scores.md, and make will ensure the proper ordering. A sketch of such rules is shown below.
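A minimal sketch of such a two-rule chain (the file names follow the prose above; we assume table.py converts the CSV to Markdown, as in our build.sh):

scores.html: scores.md template.html
	pandoc --template template.html scores.md >scores.html

scores.md: scores.csv table.py
	./table.py <scores.csv >scores.md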
Makefile explained
Makefile is a control file for the build system named make. In essence, it is a domain-specific language to simplify setting up the script with the should_generate constructs we mentioned above.
Note that a Makefile has a rather strict format: the commands inside a rule have to be indented with a tab character. (Usually, your editor will recognize that Makefile is a special file name and switch to a tabs-only policy by itself.) If you use spaces instead, you will typically get an error like Makefile:LINE_NUMBER: *** missing separator. Stop.
The Makefile contains a sequence of rules. A rule looks like this:
index.html: index.md template.html
pandoc --template template.html index.md >index.html
The name before the colon is the target of the rule.
That is usually a file name that we want to build.
Here, it is index.html
.
The rest of the first line is the list of dependencies – files from
which the target is built.
In our example, the dependencies are index.md
and template.html
.
In other words: when these files (index.md
and template.html
) are modified
we need to rebuild index.html
.
The third part consists of the following lines, which have to be indented with a tab.
They contain the commands that have to be executed for the target to be built.
Here, it is the call to pandoc
.
make
runs the commands if the target is out of date. That is, either the
target file is missing, or one or more dependencies are newer than the target.
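To see which commands make would run without actually executing them, use the dry-run mode:

make -n    # a.k.a. --dry-run: only print the commands that would be executed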
The rest of the Makefile
is similar.
There are rules for other files and also several special rules.
Special rules
The special rules are all
, clean
, and .PHONY
.
They do not specify files to be built, but rather special actions.
all
is a traditional name for the very first rule in the file.
It is called a default rule and it is built if you run make
with
no arguments. It usually has no commands and it depends on all files
which should be built by default.
clean
is a special rule that has only commands, but no dependencies.
Its purpose is to remove all generated files if you want to clean up
your work space.
Typically, clean removes all files that are not versioned (i.e., not under Git control).
This can be considered misuse of make
, but one with a long tradition.
From the point of view of make
, the targets all
and clean
are
still treated as file names. If you create a file called clean
, the
special rule will stop working, because the target will be considered
up to date (it exists and no dependency is newer).
To avoid this trap, you should explicitly tell make
that the target is not
a file. This is done by listing it as a dependency of the special target
.PHONY
(note the leading dot).
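In our example, marking the special targets as phony could look like this (a minimal sketch; the file list mirrors our website):

.PHONY: all clean

all: index.html rules.html score.html

clean:
	rm -f index.html rules.html score.html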
Generally, you can see that make has plenty of idiosyncrasies.
It is often so with programs which started as a simple tool and underwent
40 years of incremental development, slowly accruing features. Still,
it is one of the most frequently used build systems. Also, it often serves
as a back-end for more advanced tools – they generate a Makefile
from a more friendly specification and let make
do the actual work.
Exercise
Improving the maintainability of the Makefile
The Makefile starts to have too much repeated code. But make can help you with that too.
Let’s remove all the rules for generating out/*.html
from *.md
and replace them with:
out/%.html: %.md template.html
pandoc --template template.html -o $@ $<
That is a pattern rule that captures the idea that HTML is generated from Markdown. The percent sign in the dependency and target specifications represents the so-called stem – the variable (i.e., changing) part of the pattern.
In the command part, we use make variables. make variables start with a dollar sign as in the shell, but they are not the same. $@ is the actual target and $< is the first dependency.
Run make clean && make
to verify that even with pattern rules,
the web is still generated.
Apart from pattern rules, make
also understands (user) variables.
They can improve readability as you can separate configuration from
commands. For example:
PAGES = \
out/index.html \
out/rules.html \
out/score.html
all: $(PAGES) ...
...
Note that unlike in the shell, variables are expanded by the $(VAR)
construct. (Except for the special variables such as $<
.)
Non-portable extensions
make
is a very old tool that exists in many different implementations.
The features mentioned so far should work with any version of make
.
(At least a reasonably recent one. Old make
s did not have .PHONY
or pattern rules.)
The last addition will work in GNU make only (but that is the default on Linux, so there should not be any problem).
We will change the Makefile
as follows:
PAGES = \
index \
rules \
score
PAGES_TMP=$(addsuffix .html, $(PAGES))
PAGES_HTML=$(addprefix out/, $(PAGES_TMP))
We keep only the basename of each page and we compute the output
path. $(addsuffix ...)
and $(addprefix ...)
are calls to built-in
functions. Formally, all function arguments are strings, but in this case,
comma-separated names are treated as a list.
Note that we added PAGES_TMP only to improve readability when using this feature for the first time. Normally, you would assign PAGES_HTML directly like this:
PAGES_HTML=$(addprefix out/, $(addsuffix .html, $(PAGES)))
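To check what the functions computed, GNU make can print a value while parsing the Makefile via its $(info ...) function (again a GNU extension); place a line like this anywhere in the Makefile:

$(info PAGES_HTML is $(PAGES_HTML))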
This will prove even more useful when we want to generate a PDF for each page, too.
We can add a pattern rule and build the list of PDFs using $(addsuffix .pdf, $(PAGES)), as sketched below.
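A hedged sketch of that extension (assuming pandoc on your machine can produce PDFs, which requires a PDF engine such as LaTeX to be installed):

PAGES_PDF = $(addprefix out/, $(addsuffix .pdf, $(PAGES)))

out/%.pdf: %.md
	pandoc -o $@ $<

all: $(PAGES_HTML) $(PAGES_PDF)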
Tasks to check your understanding
We expect you will solve the following tasks before attending the labs so that we can discuss your solutions during the lab.
Learning outcomes and after class checklist
This section offers a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …
- name several steps that are often required to create distributable software (e.g. a package or an installer) from source code and other basic artifacts
- explain why a software build should be a reproducible process
- explain how it is possible to capture a software build
- explain concepts of languages that are used for capturing the steps needed for a software build (distribution)
- explain what a regular expression (regex) is
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …
- build a make-based project with default settings
- create a Makefile that drives the build of a simple project
- use wildcard rules in a Makefile
- optional: use variables in a Makefile
- optional: use basic GNU extensions to simplify complex Makefiles
- create and use simple regular expressions to filter text with grep
- perform pattern substitution using sed