
We will extend our knowledge about shell scripting in this lab. We will introduce control flow constructs and other bits to make our shell scripts more powerful.

Again, we will use a running example for learning the new constructs.

The second on-site test will also cover constructs shown in this lab. The task for the on-site test will be much smaller than our running example, but you can expect that you will need to use some of the constructs shown in this lab.

While this is probably the longest lab based on the number of words, the amount of new concepts is relatively small. So, please, do not be scared by the length of the scrollbar :-).

We will be explicitly using Bash in many of the examples because we are using local variables in most functions. All the samples will work even when your shell does not support this construct: simply drop the local keyword. While we do not create any clashing variable names, we believe that local nicely emphasizes the intent of the variable.

Preflight checklist

  • You can use shell variables and command substitution.
  • You remember the purpose of a (program) exit code and you know what the value 0 signifies.
  • You remember what Pandoc was used for.

Running example

We will return to our example with web generation again. The sources are again in our examples repository but as you can see in 09/web, there are many more pages now and we have split the files into multiple subdirectories.

There is src with input files in Markdown, there is static with the CSS file and possibly other files that would be copied as-is to the web server, and there is also templates with Pandoc templates for our pages.

We will now build a decent shell script that can build this website and also copy it to a web server so that it is publicly available.

We acknowledge that there are special tools for exactly that. They are called static site generators (or just SSGs) and there is a huge number of them available. But this task offers the right playground to show what shell is capable of :-).

We will start with the trivial builder that is basically a copy of the one from one of the previous labs.

We highly recommend that you copy the fragments from our repository to your repository (feel free to use the submission one) and commit each version. Use git from the command line and use proper commit messages.

And if you create a separate issue for each part that you later close from a commit message, you will also practice good software engineering skills.

Configuration loading (. and source)

We have already seen that we can modify the behavior of our scripts through parameters (recall the $1, $2, … variables) or by setting variables before starting them (recall our script from lab 07 and a call in the form html_dir=out_www ./build.sh).

But what if we wanted more such settings? Can we store them in a file?

Actually, that is a pretty common scenario. So let us store the configuration of html_dir into config.rc (the .rc extension is quite common and might refer to runtime configuration).

html_dir=out_www

If we now add one of the following lines to our build.sh script, it will behave as if the contents of config.rc were part of the main script. The variable will be set and can be used in the rest of the script.

# Both lines are equal, only one of them would be used in reality
. config.rc
source config.rc

The truth is that we give the user much more than a plain configuration file: our config.rc will be executed as a shell script. If there were commands in config.rc, they would be executed. If that is not okay, we can always resort to cut and grep to read the configuration, but for shell scripts it is usually fine.
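
As a minimal (and admittedly fragile) sketch of that safer approach, the following reads the value without executing anything; it assumes a simple VAR=value line without quoting or escaping:

html_dir="$( grep '^html_dir=' config.rc | cut -d '=' -f 2- )"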

This is actually how your shell is configured. Recall that we have updated ~/.bashrc with the EDITOR variable. This file is also sourced when Bash is started and it should be clear by now why it often contains the following snippet:

if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

If the given file exists (we will get to the proper syntax later in this lab), we source it, i.e. import its content here. Therefore we import the global Bash configuration stored in /etc directory.

The source behaves as if the content of the included file was really pasted instead of the source line. Bash does not have any fancy namespace support or similar.

The . and source special commands can also be used to load library code. For example, you might have the following function in a logging.sh file and then you can load it into other scripts, without needing to define the msg function again and again.

msg() {
    echo "$( date '+%Y-%m-%d %H:%M:%S |' )" "$@" >&2
}
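
As a minimal sketch of such reuse (assuming logging.sh sits in the current directory), another script can then load and call msg like this:

#!/bin/bash

# Load the msg function (path relative to the working directory)
. ./logging.sh

msg "Build started"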

Files that are expected to be included via source usually do not have a shebang and are not executable. That is mostly to emphasize the fact that they are not standalone executables but rather “libraries”.

The same also applies to Python modules: you will usually see a shebang in the main program (and the x bit set) while the actual modules (that you import) are often shebang-less and rw- only.

Advancing the running example

We will rework our example to a versatile solution where the user will provide a site configuration that our script will read.

We will create the following ssg.rc inside our directory with the webpage.

# My site configuration

site_title="My site"

build_page "index.md"
build_page "rules.md"
build_page "alpha.md"

And we will modify our main script to look like this.

#!/bin/bash

set -ueo pipefail

msg() {
    echo "$( date '+%Y-%m-%d %H:%M:%S | SSG |' )" "$@" >&2
}

get_version() {
    git rev-parse --short HEAD 2>/dev/null || echo unknown
}

build_page() {
    local input_file="$1"
    local output_file="public/$( basename "$input_file" ".md" ).html"
    msg "Generating $input_file => $output_file"
    pandoc \
        --template templates/main.html \
        --metadata site_title="$site_title" \
        --metadata page_version="$( get_version )" \
        "src/$input_file" >"$output_file"
}

site_title="$( whoami )'s site"

mkdir -p public

source ssg.rc

cp -R static/* public/

What have we created? Our configuration file ssg.rc actually contains a trivial domain-specific language (DSL) that drives the website generation. The main script provides the build_page function that the configuration file calls.

Inside this function we compute the output filename (try what basename input.md .md does!) and run Pandoc.

Actually, it is a very straightforward piece of code, but we managed to split the configuration and the actual generation into separate files and create a reusable tool. Compare how much work this would be in a different language. Just imagine how much work it would be to parse the configuration file…

Before moving on make sure you understand how the above code works. You should be able to answer the following questions:

  1. Why is ssg.rc not executable (and why does it not need to be)?
  2. Why is $site_title variable set before sourcing ssg.rc?
  3. What would happen if we sourced ssg.rc before the call to mkdir -p public?

Control flow in shell scripts

Before diving into control flow in shell scripts, let us mention that multiple commands can be separated by ; (the semicolon). While in shell scripts it is preferable to write one command per line, interactive users often find it easier to have multiple commands on one line (even if only to allow faster history browsing with the up arrow).
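
For example, an interactive user might type this pair of (made-up) commands on a single line:

mkdir -p backup; cp *.txt backup/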

We will see semicolons at various places in the control flow structures, serving as a separator.

We will introduce the control flow statements rather briefly: their purpose is the same as in other languages. But pay attention to how they are controlled: what looks like a condition is actually often a program execution.

for loops

For loops in the shell always iterate over a set of values provided when the loop starts.

The general format is as follows:

for VARIABLE in VAL1 VAL2 VAL3; do
    body of the loop
done

Typical uses include iterating over a list of files, often generated by expanding wildcards.

Let us see an example that counts the number of digits in all *.txt files:

for i in *.txt; do
    echo -n "$i: "
    tr -c -d '0-9' <"$i" | wc -c
done

Notice that the for statement is given the variable name i without a $. We also see that variable expansion can be used in redirection of stdin (or stdout).

When writing this in the shell, the prompt would change to plain > (probably, depending on your configuration) to signal that you are expected to enter the rest of the loop.

Squeezing the whole loop into one line is also possible (but useful only for fire-and-forget type of scripts):

for i in *.txt; do echo -n "$i: "; tr -c -d '0-9' <"$i" | wc -c; done

When we want to iterate over values with spaces, we need to quote them. Wildcard expansion is safe in this respect and works regardless of spaces in the filenames.

for i in one "two three"; do
    echo "$i";
done

What does the following code print assuming there is no *.txxt file in the current directory?

for i in *.txxt; do
    echo "$i"
done

Hint.

Answer.

if and else

The if condition in the shell is a bit trickier.

The essential thing to remember is that the condition is always a command to be executed and its outcome (i.e., the exit code) determines the result.

So the condition is actually never in the traditional format of a equals b as it is always the exit code that controls the flow.

The general syntax of if-then-else is this:

if command_to_control_the_condition; then
    success
elif another_command_for_else_if_branch; then
    another_success
else
    the_else_branch_commands
fi

Note that if has to be terminated by fi and that elif and else branches are optional.

Simple conditions can be evaluated using the test command that we already know. See man test to inspect what things can be tested.

Let us see how to use if with test to check whether we are inside a Git project:

if test -d .git; then
    echo "We are in the root of a Git project"
fi

In fact, there exists a more elegant syntax: [ (left bracket) is a synonym for test which does the same, except that it requires that the last argument is ]. Using this syntax, our example can look as follows:

if [ -d .git ]; then
    echo "We are in the root of a Git project"
fi

Still, [ is just a regular command whose exit code determines what if shall do.

By the way, look into /usr/bin to see that the application file is really named [. Note that Bash also implements [ as a builtin, so it is a little bit faster than executing an external program.

You can also encounter the following snippet:

if [[ -d .git ]]; then
    echo "We are in the root of a Git project"
fi

This [[ ... ]] is a different construct, closely related to the $(( ... )) syntax for arithmetic expressions. The condition is evaluated by Bash itself. This syntax is a little bit more powerful, but it is a Bash extension, so it is unlikely to work in other shells.
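
For illustration, [[ ... ]] can, for example, match wildcard patterns directly ($filename here is a hypothetical variable):

if [[ "$filename" == *.md ]]; then
    echo "$filename looks like a Markdown file"
fi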

We will be using the traditional variant with [ only.

while loops

While loops have the following form:

while command_to_control_the_loop; do
    commands_to_be_executed
done

Again, the condition is true if the command_to_control_the_loop returns with exit code 0.

The following example finds the first available name for a log file. Note that this code is not immune to races when executed concurrently. That is, it assumes it can be run multiple times, but never in multiple processes at the same time.

counter=1
while [ -f "/var/log/myprog/main.$counter.log" ]; do
    counter=$(( counter + 1 ))
done
logfile="/var/log/myprog/main.$counter.log"
echo "Will log into $logfile" >&2

To make the program race-resistant (i.e., safe against concurrent execution), we would need to use mkdir, which fails when the directory already exists (i.e., it is atomic enough to distinguish whether we succeeded or whether we are just stealing someone else’s file).

Note that the following snippet uses the exclamation mark ! to invert the program outcome.

counter=1
while ! mkdir "/var/log/myprog/log.$counter"; do
    counter=$(( counter + 1 ))
done
logfile="/var/log/myprog/log.$counter/main.log"
echo "Will log into $logfile" >&2

break and continue

As in other languages, the break command is available to terminate the currently executing loop. You can use continue as usual, too.
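
As a small sketch combining both (the 1 MiB threshold is a made-up value), the following loop skips directories with continue and terminates at the first sufficiently big file with break:

for i in *; do
    if [ -d "$i" ]; then
        # Not interested in directories: try the next entry
        continue
    fi
    if [ "$( wc -c <"$i" )" -gt 1048576 ]; then
        echo "First big file: $i"
        # No need to look any further
        break
    fi
done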

Switch (a.k.a. case ... esac)

When we need to branch our program based on a variable value, the shell offers the case construct. It is somewhat similar to the switch construct in other languages, but it has a bit of shell specifics mixed in.

The overall syntax is the following:

case value_to_branch_on in
    option1) commands_for_option_one ;;
    option2) commands_for_option_two ;;
    *) the_default_branch ;;
esac

Notice that like with if, we terminate with the same keyword reversed and that there are two semicolons ;; to terminate the commands for a particular option.

The options can contain wildcards and | to make the matching a bit more flexible.

A simple example can look like this:

case "$EDITOR" in
    mc|mcedit) echo 'Midnight Commander rocks' ;;
    joe) echo 'Small but powerful' ;;
    emacs|vi*) echo 'Wow :-)' ;;
    *) echo "Someone really uses $EDITOR?" ;;
esac

Advancing the running example

Armed with the knowledge about control flow available, we can make our script for site generation even better.

We will remove the burden of specifying the list of files manually and find the files ourselves. Therefore, the user configuration file would be completely optional.

We will change our script as follows.

build_page() {
    local input_file="$1"
    local output_file="public/$( basename "$input_file" ".md" ).html"
    msg "Generating $input_file => $output_file"
    pandoc \
        --template templates/main.html \
        --metadata site_title="$site_title" \
        --metadata page_version="$( get_version )" \
        "$input_file" >"$output_file"
}

...

if [ -f ssg.rc ]; then
    source ssg.rc
fi

for page in src/*.md; do
    if ! [ -f "$page" ]; then
        continue
    fi

    build_page "$page"
done

We have modified build_page to not prepend src when running pandoc and we iterate over the Markdown files by ourselves.

The ! reverses the meaning of the exit code, i.e., it behaves as a boolean not.

Why do we test for -f inside the loop? Answer.

And yes: we have modified the script quite a lot. That is normal. You will often have just a vague idea of what you need. You build from simple scenarios, extending them on the way as needed.

Redirection of bigger shell portions

The whole control structure (e.g., for, if, or while with all the commands inside) behaves as a single command. So you can apply redirection to the whole structure.

To illustrate this, we can transform the message to upper case like this.

if test -d .git; then
    echo "We are in a root of a Git project"
else
    echo "This is not a root of a Git project"
fi | tr 'a-z' 'A-Z'
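
Redirection to a file works the same way; this (made-up) snippet collects the line counts of all *.txt files into a single file:

for i in *.txt; do
    wc -l "$i"
done >line-counts.txt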

Script parameters and getopts

Recall that when a shell script receives parameters, we can access them via the special variables $1, $2, $3, … There is also $@ for accessing all parameters (recall that $@ must be quoted to work properly; the explanation is beyond the scope of this course).

The special variable $# contains the number of arguments on the command-line and $0 refers to the actual script name.
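
The following self-contained sketch prints all of these (save it as args.sh, a name we made up, and run it with a few arguments):

#!/bin/sh

echo "Script $0 received $# arguments."
for arg in "$@"; do
    echo "argument: $arg"
done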

getopts

When our script needs one argument, accessing $1 directly is fine. When you want to recognize options, parsing of arguments becomes more complicated. Shell offers a getopts command that is able to handle command-line parsing for you.

While getopts is the standard way to handle switches, it is not a very user-friendly one. On the other hand, using getopts is not something you are supposed to remember: it is exactly the piece of code that you will copy from script to script and just update it when needed.

Unfortunately, getopts is unable to handle long options (e.g., --version). There is a non-standard extension called getopt (without the S at the end) that supports long options too but it is not available in all environments.

Let us show how getopts can be used in a simple script: it accepts a list of files and converts them via Pandoc into HTML (printed to standard output). It also supports -V to print its version and -o to specify an alternate output file (instead of stdout).

getopts does not allow mixing switches/options (e.g., -o) and normal arguments. When executing the script as ./example.sh *.txt -o out.html, the parsing stops at the first file, and -o and out.html are treated as normal filenames.

The specification of the getopts switches is simple. We list the switch names; those that require an argument are followed by a colon. The last argument is a variable name (without the dollar sign!) where the option will be stored.

getopts "Vho:" opt

The command returns 0 (as its exit code) as long as there are switches to be processed. Once it is finished, the $OPTIND variable tells us the index of the first argument that was not processed (hence the shift $(( OPTIND - 1 )) below).

#!/bin/sh

usage() {
    echo "Usage: $1 [-V] [-o filename] [-h] input [files]"
    echo " -V          Print program version and terminate."
    echo " -o filename Store output into filename instead to stdout."
    echo " -h          Print this help and exit."
}

output_file="/dev/stdout"
print_version=false

while getopts "Vho:" opt; do
    case "$opt" in
        h)
            usage "$0"
            exit 0
            ;;
        V)
            print_version=true
            ;;
        o)
            output_file="$OPTARG"
            ;;
        *)
            usage "$0" >&2;
            exit 1
            ;;
    esac
done
shift $(( OPTIND - 1))

if $print_version; then
    echo "My script, version 0.0.1"
    exit 0
fi

cat "$@" | pandoc -t html >"$output_file"

Several parts of the script deserve explanation.

true and false are not boolean values, but they can be used as such. Recall how we have used them in lab 07 (there really are /bin/true and /bin/false).

exit immediately terminates a shell script. The optional parameter denotes the exit code of the script.

shift is a special command that shifts the variables $1, $2, … by one. After shift, $3 becomes $2, $2 becomes $1 and $1 is lost. "$@" is modified accordingly. With a parameter, it shifts multiple times.
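
A quick demonstration: set -- sets $1, $2, … manually, so you can try shift safely in an interactive shell.

set -- one two three
echo "$1"    # prints: one
shift
echo "$1"    # prints: two
shift 2
echo "$#"    # prints: 0 (the remaining two arguments were consumed)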

We then pass all the arguments to cat. Note that this will work even when no parameters are passed and the script will read standard input. Why? Answer.

Advancing the running example

We will modify our script to accept -w so that the program keeps watching src/*.md for modifications and regenerates the web on each such change.

If you are not using the disk from us, you will need to install the inotifywait program, which is usually part of a package named inotify-tools.

We first include the change to use getopts and then we add the support for -w.

#!/bin/bash

set -ueo pipefail

usage() {
    echo "Usage: ..."
}

msg() {
    echo "$( date '+%Y-%m-%d %H:%M:%S | SSG |' )" "$@" >&2
}

get_version() {
    git rev-parse --short HEAD 2>/dev/null || echo unknown
}

build_page() {
    local input_file="$1"
    local output_file="public/$( basename "$input_file" ".md" ).html"
    $LOGGER "Generating $input_file => $output_file"
    pandoc \
        --template templates/main.html \
        --metadata site_title="$site_title" \
        --metadata page_version="$( get_version )" \
        "$input_file" >"$output_file"
}

generate_web() {
    for page in src/*.md; do
        if ! [ -f "$page" ]; then
            continue
        fi
        build_page "$page"
    done

    cp -R static/* public/
}

LOGGER=:
watch_for_changes=false

while getopts "hvw" opt; do
    case "$opt" in
        h)
            usage "$0"
            exit 0
            ;;
        v)
            LOGGER=msg
            ;;
        w)
            watch_for_changes=true
            ;;
        *)
            usage "$0" >&2;
            exit 1
            ;;
    esac
done
shift $(( OPTIND - 1))

site_title="$( whoami )'s site"

mkdir -p public

if [ -f ssg.rc ]; then
    source ssg.rc
fi

generate_web

To actually support -w we will use inotifywait, which is a special program that receives a list of files and terminates when one of the files is modified. Therefore, the script will effectively do nothing until a file is modified, as inotifywait will “block” its execution.

We will add the following to our script to run indefinitely, watching for changes and rebuilding the web automatically. Hit Ctrl-C to actually terminate the execution when started with -w.

...

if [ -f ssg.rc ]; then
    source ssg.rc
fi

generate_web

if $watch_for_changes; then
    while true; do
        $LOGGER "Waiting for file change..."
        inotifywait -e modify src/* src static static/*
        generate_web
    done
fi

getopts is perhaps not very user friendly but it is a good enough tool for most scripts.

Start adding the minimal skeleton we have shown into all of your scripts. The advantage of running ./your-script.sh -h to see a short help is well worth the effort.

The read command

So far our scripts either did not need standard input at all or they passed it on to other programs as a whole.

But it is possible to also read standard input line by line in shell if you need to process lines separately.

When a shell script needs to read from stdin into a variable, there is the read built-in command:

read FIRST_LINE <input.txt
echo "$FIRST_LINE"

Typically, read is used in a while loop to iterate over the whole input. read is also able to split the line into fields on whitespace and assign each field to a different variable.

Considering we have input in the following format, the loop below computes the average of the numbers.

/dev/sdb 1008
/dev/sdb 1676
/dev/sdc 1505
/dev/sdc 4115
/dev/sdd 999

count=0
total=0
while read device duration; do
    count=$(( count + 1 ))
    total=$(( total + duration ))
done
echo "Average is about $(( total / count ))."

As you can guess from the above snippet, read returns 0 as long as it is able to read into the variables. Reaching the end of the file is announced by a non-zero exit code (return value).

read can sometimes be too smart about certain inputs. For example, it interprets backslashes. You can use read -r to suppress this behavior.

Other notable parameters are -t or -p: use help read to see their description.
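
To see the backslash behavior for yourself, compare the following two pipelines (the input string is arbitrary):

printf '%s\n' 'back\slash' | while read line; do echo "$line"; done
# prints: backslash

printf '%s\n' 'back\slash' | while read -r line; do echo "$line"; done
# prints: back\slash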

If we want to read from a specific file (assuming its filename is stored in variable $input), we can also redirect input to the whole loop and write the script like this:

while read device duration; do
    count=$(( count + 1 ))
    total=$(( total + duration ))
done <"$input"

That is actually quite common use for the while read pattern.

Check you understand how read works

Assume we have the following text file data.txt.

ONE
TWO

We also have the following script reader.sh:

#!/bin/sh

set -ueo pipefail

read -r data_one <data.txt
read -r data_two <data.txt
read -r stdin_one
read -r stdin_two

echo "data_one=${data_one}"
echo "data_two=${data_two}"
echo "stdin_one=${stdin_one}"
echo "stdin_two=${stdin_two}"

Select all true statements about output of the following invocation.

./reader.sh <data.txt

Bigger exercise I

We will use the implementation later on in our running example (but not yet).

Imagine we have input data with match results in the following format (team, goals scored, colon, goals scored by the other team, other team).

alpha 2 : 0 bravo
bravo 0 : 1 charlie
alpha 5 : 4 charlie

Write a shell script that prints a table with summarized results.

Assign 3 points for a victory and 1 point for a tie. Your program does not need to handle the situation when two teams have the same number of points.

Solution

We start with a function that receives two arguments – the goals scored by each side – and prints the number of points assigned.

get_points() {
    local goals_mine="$1"
    local goals_opponent="$2"
    if [ "$goals_mine" -eq "$goals_opponent" ]; then
        echo 1
    elif [ "$goals_mine" -gt "$goals_opponent" ]; then
        echo 3
    else
        echo 0
    fi
}

Another function then computes the points for each match.

preprocess_scores() {
    local team_one team_two
    local goals_one goals_two

    while read -r team_one goals_one colon goals_two team_two; do
        if [ "$colon" != ":" ]; then
            echo "WARNING: ignoring invalid line $team_one $goals_one $colon $goals_two $team_two" >&2
            continue
        fi
        echo "$team_one" "$( get_points "$goals_one" "$goals_two" )"
        echo "$team_two" "$( get_points "$goals_two" "$goals_one" )"
    done
}

These two functions together transform the input into the following:

alpha 3
bravo 0
bravo 0
charlie 3
alpha 3
charlie 0

On this, we can call our well-known group_sum.py script or write it in shell ourselves. For the shell implementation, we will expect the data to be already sorted by key, to keep the code simple.

sum_by_sorted_keys() {
    local key value
    local prev_key=""
    local sum=0

    while read -r key value; do
        if [ "$key" != "$prev_key" ]; then
            if [ -n "$prev_key" ]; then
                echo "$prev_key $sum"
            fi
            prev_key="$key"
            sum=0
        fi
        sum=$(( sum + value ))
    done
    if [ -n "$prev_key" ]; then
        echo "$prev_key $sum"
    fi
}

Why do we need to expect the data to be sorted? Can’t we just sort them ourselves? Would the following modification (only this one line changed) work?

    # replacing "while read -r key value; do"
    sort | while read -r key value; do

Answer.

What change inside this function would work then? Answer.

Together these functions provide the building blocks to solve the whole puzzle:

preprocess_scores | sum_by_keys | sort -n -k 2 -r | column -t

It is a matter of opinion if this task would be better solved in a different programming language. It all depends on the context and on other requirements.

Shell usually excels in situations where we need to combine data from multiple files that are in some textual (preferably line-oriented) format.

The advantage of shell is in its interactivity. Even the functions can be defined interactively (i.e. not stored in any file first) and one can easily build the final pipeline incrementally, checking the output after adding each step.

Sidenote: how web pages are published

We will now perform a small detour into the area of (the history of) website publishing. Publishing a website today generally means either renting webspace where you can upload your HTML (or PHP) files, or renting a configured instance of your web application, such as WordPress.

Traditionally, you often also received webspace as part of your Unix account on some shared machine. The setup was usually done in such a way that whatever appeared in your $HOME/public_html was available under example.com/~LOGIN.

You might have encountered such pages, typically for university pages of individual professors.

With the advance of virtualization (and the cloud), it became easier not to give users access as real users but to insert another layer where the user can manipulate only certain files without having shell access at all.

Web pages on lab machines

Our lab machines (e.g., the u-pl* ones) also offer this basic functionality.

SSH into one of these (recall the list from lab 05) and create a directory ~/WWW.

Create a simple HTML file inside WWW (skip this if you have already uploaded some files before).

echo '<html><head><title>Hello, World!</title></head><body><h1>Hello, World!</h1></body></html>' >index.html

Its content will be available at http://www.ms.mff.cuni.cz/~LOGIN/.

Note that you will need to add the proper permissions for the AFS filesystem using the fs setacl command.

fs setacl ~/WWW www rl
fs setacl ~/. www l

SCP & rsync

In order to copy files between two Linux machines, we can use scp. Internally, it establishes an SSH connection and copies the files over it.

The syntax is very simple and follows the semantics of a plain cp:

scp local_source_file.txt user@remote_machine:remote_destination_file.txt
scp user@remote_machine:remote_source_file.txt local_destination_file.txt

SCP issues

For those who care about security (everyone, right?) we should note that the SCP protocol has some security vulnerabilities in several (but rather specific) scenarios. These can be used to attack your local computer while connecting to a malicious server.

SCP is actually a very old protocol, which is showing its age. Better replacements include SFTP (beware that this is different from FTPS – FTP over SSL/TLS) and Rsync.

More information on this topic can be found on LWN.net and in this StackOverflow thread.

Rsync

A much more powerful tool for copying files is rsync. Similarly to scp, it runs over an SSH connection, but it has to be installed on both sides (usually that is not a problem).

It can copy whole directory trees, handle symlinks, access rights, and other file attributes. It can also detect that some of the files are already present at the other side (either exactly or approximately) and transfer just the differences.

The syntax of a simple copy follows cp and scp, too:

rsync local_source_file.txt user@remote_machine:remote_destination_file.txt
rsync local_source_file.txt user@remote_machine:remote_destination_directory/
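
For whole directory trees, the archive switch -a is typically used, often together with -v for verbose output; the paths in this sketch are made up:

rsync -av public/ user@remote_machine:WWW/my-site/

Note that the trailing slash on public/ matters: it means “the contents of public”, while without it rsync would create a public directory inside WWW/my-site/ on the remote side.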

Advancing the running example

Use the rsync manual page and extend the running example to include -u that will upload the generated files to a remote server specified in $rsync_target.

You can use LOGIN@u-pl1.ms.mff.cuni.cz:WWW/nswi177 as a reasonable target for uploading to the Rotunda machines.

Solution.

Bigger exercise II

We have created the script to compute the scoring table. It would be nice to generate it during web generation.

Extend our running example with the following functionality: each *.bin file in src/ will be treated as a script that is executed and whose output is stored in an HTML file with the same (base) name.

Try this on your own first before looking at our solution.

Recall that the file extension is not important and .bin is generic enough to hide any (interpreted) programming language (as long as the script has a proper shebang). As a matter of fact, it will work for compiled (C, Rust, and similar) programs too.

The following script is touching the borderline between where a shell script is good enough and where using a more sophisticated language (in terms of the offered data structures and types) might be better.

Some parts are clearly those where shell excels – working with many files, calling external programs etc. In some areas the solution is somewhat fragile.

Sometimes the best approach is to sketch a quick prototype in shell, like we have done, to actually learn what functions we really need.

Solution

The change is relatively simple. We have also renamed build_page to build_markdown_page for better clarity.

build_dynamic_page() {
    local input_file="$1"
    local output_file="public/$( basename "$input_file" ".bin" ).html"
    $LOGGER "Generating $input_file => $output_file"
    "$input_file" >"$output_file"
}

generate_web() {
    local page
    for page in src/*.md; do
        if ! [ -f "$page" ]; then
            continue
        fi
        build_markdown_page "$page"
    done

    local script
    for script in src/*.bin; do
        if ! [ -f "$script" -a -x "$script" ]; then
            continue
        fi
        build_dynamic_page "$script"
    done

    cp -R static/* public/
}

And we can extend our table generation script into the following.

...

as_markdown_table() {
    echo
    echo '| Team | Points |'
    echo '| ---- | -----: |'
    while read -r team score; do
        echo '|' "$team" '|' "$score" '|'
    done
    echo
}

. ssg.rc

(
    echo '---'
    echo 'title: Scoring table'
    echo '---'

    echo '# Scoring table'

    preprocess_scores <scores.txt | sum_by_keys | sort -n -k 2 -r | as_markdown_table
)  | pandoc \
        --template templates/main.html \
        --metadata site_title="$site_title" \
        --metadata page_version="$( git rev-parse --short HEAD 2>/dev/null || echo unknown )"

Improving the script further

Does it look good?

Hardly. There is the repeated fragment of calling Pandoc. Our site generator is not perfect.

Let us improve it.

As a second version, extend it so that it distinguishes *.md.bin and *.html.bin scripts and those with .html.bin extension are expected to generate HTML directly while .md.bin will generate Markdown that we will process ourselves.

Solution

...

pandoc_as_filter() {
    pandoc \
        --template templates/main.html \
        --metadata site_title="$site_title" \
        --metadata page_version="$( get_version )" \
        "$@"
}

build_markdown_page() {
    local input_file="$1"
    local output_file="public/$( basename "$input_file" ".md" ).html"
    $LOGGER "Generating $input_file => $output_file"
    pandoc_as_filter "$input_file" >"$output_file"
}

build_dynamic_html_page() {
    local input_file="$1"
    local output_file="public/$( basename "$input_file" ".html.bin" ).html"
    $LOGGER "Generating $input_file => $output_file"
    "$input_file" >"$output_file"
}

build_dynamic_markdown_page() {
    local input_file="$1"
    local output_file="public/$( basename "$input_file" ".md.bin" ).html"
    $LOGGER "Generating $input_file => $output_file"
    "$input_file" | pandoc_as_filter >"$output_file"
}

generate_web() {
    local page
    for page in src/*.md; do
        if ! [ -f "$page" ]; then
            continue
        fi
        build_markdown_page "$page"
    done

    local script
    for script in src/*.md.bin; do
        if ! [ -f "$script" -a -x "$script" ]; then
            continue
        fi
        build_dynamic_markdown_page "$script"
    done
    for script in src/*.html.bin; do
        if ! [ -f "$script" -a -x "$script" ]; then
            continue
        fi
        build_dynamic_html_page "$script"
    done

    cp -R static/* public/
}

...

And our table generation script table.md.bin can be significantly simplified.

...

echo '---'
echo 'title: Scoring table'
echo '---'

echo '# Scoring table'

preprocess_scores <scores.txt | sum_by_keys | sort -n -k 2 -r | as_markdown_table

Last improvement

As a last exercise, extend our script to support building reusable scripts. Before we run the *.bin scripts, we should extend $PATH with the bin/ directory in our SSG directory.

Why do we want to do that? At this moment, the path to the scoring table is hard-coded inside the script and the script is not usable for multiple tables (imagine there are two groups running). If we had the script in $PATH, we could store the scores as a file with the following shebang and thus reuse the script for multiple tables.

#!/usr/bin/env score_table.sh
alpha 2 : 0 bravo
bravo 0 : 1 charlie
alpha 5 : 4 charlie

Certainly this is bordering on the abuse of shebangs, as we are turning a data file into a script, but there might be other use cases than our primitive SSG where such an extension would make sense.

Here, take it as an exercise to refresh your memory about env, shebangs and $PATH.

Solution

The changes are actually trivial.

build_dynamic_html_page() {
    ...
    env PATH="$PATH:$PWD/bin" "$input_file" >"$output_file"
}

build_dynamic_markdown_page() {
    ...
    env PATH="$PATH:$PWD/bin" "$input_file" | pandoc_as_filter >"$output_file"
}

And the bin/score_table.sh would be modified on a single line too.

grep -v '#' "$1" | preprocess_scores | sum_by_keys | sort -n -k 2 -r | as_markdown_table

We drop all lines containing #, which certainly drops the shebang, and we do not expect a team name to contain a hash sign (later on, we will see regular expressions that would allow more precise filtering, but this is fine for now).

Tasks to check your understanding

We expect you will solve the following tasks before attending the labs so that we can discuss your solutions during the lab.

The program magick from ImageMagick can convert images between formats using magick source.png target.jpg (it works with almost any file extensions).

Note that older versions used the command convert that is now deprecated.

Convert all PNG images (with extension .png) in the current directory to JPEG (extension .jpg).

Solution.

Extend the previous example to not overwrite existing files.

Solution.

Last example with ImageMagick. With -resize 800x600 it can resize the image to fit in the given envelope.

Create a tool that creates thumbnails from files provided on the command-line, transforming file name from dir/filename.ext to dir/filename.thumb.ext.

Solution.

Extend our scoring table script from the running example to correctly handle situations when the number of points is the same and we need to break the tie by the number of goals scored (which is quite a common rule in many tournaments).

Therefore, from the following data we would like to compute a slightly different table.

alpha 2 : 0 bravo
bravo 7 : 1 charlie
alpha 1 : 4 charlie

| Team | Points | Goals |
| ---- | -----: | ----: |
| bravo | 3 | 7 |
| charlie | 3 | 5 |
| alpha | 3 | 3 |

Solution.

Write a shell script for drawing a labeled barplot. The user would provide data in the following format:

12  First label
120 Second label
1 Third label

The script will print a graph like this:

First label (12)   | #
Second label (120) | #######
Third label (1)    |

The script will accept the input filename as the first argument and will adjust the width of the output to the current screen width. It will also align the labels as can be seen in the plot above.

You can safely assume that the input file will always exist and that it will be possible to read it multiple times. No other arguments need to be recognized.

Hints

Screen width is stored in the variable $COLUMNS. Default to 80 if the variable is not set. (You can assume it will be either empty (not set) or contain a valid number).

The plot should be scaled to fill the whole width of the screen (i.e. scaled up or down).

You can squeeze all consecutive spaces into one (even in labels); the first and second columns are separated by space(s).

See what wc -L does.

Note that the first tests use labels of the same length to simplify writing the first versions of the script.

Consider using printf for printing the aligned labels.

The following ensures that bc computes with fractional numbers but the result is displayed as an integer (which is useful for further shell computations).

echo 'scale=0; (5 * 2.45) / 1' | bc -l

Examples

2 Alpha
4 Bravo
# COLUMNS=20
Alpha (2) | ####
Bravo (4) | ########

2 Alpha
4 Bravo
16 Delta
# COLUMNS=37
Alpha (2)  | ###
Bravo (4)  | ######
Delta (16) | ########################

This example can be checked via GitLab automated tests. Store your solution as 09/barplot.sh and commit it (push it) to GitLab.

Create a script for listing file sizes.

The script should partially mimic the behaviour of ls: without arguments it lists information about files in the current directory; when some arguments are provided, they are treated as a list of files to print details about.

Example run can look like this:

./09/dir.sh /dev/random 09/dir.sh 09
/dev/random  <special>
09/dir.sh          312
09               <dir>

The second column will display file size for normal files, <dir> for directories and <special> for any other file. File size can be read through the stat(1) utility.

Nonexistent files should be announced as FILENAME: no such file or directory. to stderr.

You can safely assume that you will have access to all files provided on the command-line.

You will probably find the column utility useful, especially the following invocation:

column --table --table-noheadings --table-columns FILENAME,SIZE --table-right SIZE

You can assume that the filenames will be reasonable (e.g., without spaces). To simplify things, we will not check that the exit code is different when some of the files were not found.

This example can be checked via GitLab automated tests. Store your solution as 09/dir.sh and commit it (push it) to GitLab.

Recall the ping tool we discussed in lab 05.

Your task is to create a tool that accepts the following arguments and prints host status based on ping (of course, you need to use ping in a mode where it sends a single request only and times out quickly).

  • -d that accepts string used to delimit the output columns, defaults to space
  • -v to print the output of ping to standard error (by default the output of ping is not printed at all)
  • -w to specify a different timeout than the default of one second

Normal parameters are DNS names or IP addresses to contact via ping and print their status.

The tool’s exit code denotes the number of DOWN machines (you can safely assume that there will never be more than 126 parameters and you do not have to handle whether the exit code is a signed or unsigned byte, etc.).

We expect you will use getopts to handle the command-line options.

The following examples show invocations with different parameters and the expected output.

Default execution

09/ping.sh seznam.cz google.com google.comx
seznam.cz UP
google.com UP
google.comx DOWN

Use of -d and -v

09/ping.sh seznam.cz -d : google.com -v

Note that the output mixes stdout and stderr.

PING seznam.cz (77.75.77.222) 56(84) bytes of data.
64 bytes from www.seznam.cz (77.75.77.222): icmp_seq=1 ttl=56 time=4.46 ms

--- seznam.cz ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 4.460/4.460/4.460/0.000 ms
seznam.cz:UP
PING google.com (142.251.36.78) 56(84) bytes of data.
64 bytes from prg03s10-in-f14.1e100.net (142.251.36.78): icmp_seq=1 ttl=114 time=3.64 ms

--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 3.642/3.642/3.642/0.000 ms
google.com:UP

This example can be checked via GitLab automated tests. Store your solution as 09/ping.sh and commit it (push it) to GitLab.

Learning outcomes and after class checklist

This section offers a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).

Conceptual knowledge

Conceptual knowledge is about understanding the meaning of given terms and putting them into context. Therefore, you should be able to …

  • explain how program exit code is used to drive control flow in shell scripts

  • explain what commands are executed and how a shell construct such as if true; then echo "true"; fi is evaluated

  • explain what considerations are important when deciding between using shell and Python

Practical skills

Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …

  • use control flow in shell scripts (for, while, if, case)

  • use read command

  • use getopts for parsing command-line arguments

  • use . and source to load functions from different files

  • use scp to copy individual files to (or from) a remote machine

  • optional: use rsync to synchronize whole directories

This page changelog

  • 2025-04-08: Update AFS permission command.