- Reading network configuration
- Running example
- Script modularization and configuration loading (`.` and `source`)
- Control flow in shell scripts
- Script parameters and getopt
- The `read` command
- Bigger exercise I
- Sidenote: how web pages are published
- SCP & rsync
- Bigger exercise II
- Source code linting with ShellCheck
- More examples
- Before-class tasks (deadline: start of your lab, week April 3 - April 7)
- Post-class tasks (deadline: April 23)
- Learning outcomes
We will extend our knowledge of shell scripting in this lab: we will introduce control flow constructs and other bits that make shell scripts more powerful. Again, a running example will guide us through the new constructs, but we will also learn about tools that can detect bugs in our scripts without even running them, and we will have a look at some networking tools too.
Reading network configuration
Before diving into the main topic, we will take a small detour to something practical that often comes in useful: how to view the network configuration of your machine from the command line.
For the following text we will assume your machine is connected to the Internet (this includes your virtualized installation of Linux).
The basic command for setting and reading network configuration is `ip`. Probably the most useful invocation for us at the moment is `ip addr`.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
link/ether 54:e1:ad:9f:db:36 brd ff:ff:ff:ff:ff:ff
3: wlp58s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 44:03:2c:7f:0f:76 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.105/24 brd 192.168.0.255 scope global dynamic noprefixroute wlp58s0
valid_lft 6209sec preferred_lft 6209sec
inet6 fe80::9ba5:fc4b:96e1:f281/64 scope link noprefixroute
valid_lft forever preferred_lft forever
8: vboxnet0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 0a:00:27:00:00:00 brd ff:ff:ff:ff:ff:ff
It lists four interfaces (`lo`, `enp0s31f6`, `wlp58s0` and `vboxnet0`) that are available on the machine.
Your list will differ, as will the naming.
The name signifies the interface type.

- `lo` is the loopback device that is always present. With the loopback device, you can test network applications even without “real” connectivity.
- `enp0s31f6` (often also `eth*`) is a wired Ethernet adapter.
- `wlp58s0` is a wireless adapter.
- `vboxnet0` is a virtual network card used by VirtualBox when you create a virtual subnet for your virtual machines (you will probably not have this one).

If you are connected via VPN, you might also see a `tun0` interface.
The state of the interface (up and running or not) is on the same line as the adapter name.
The `link/` entry denotes the MAC address of the adapter. Lines with `inet` specify the IP address assigned to the interface, including the network.
In this example, `lo` has `127.0.0.1/8` (obviously), `enp0s31f6` is without an address (state `DOWN`), and `wlp58s0` has address `192.168.0.105/24` (i.e., `192.168.0.105` with netmask `255.255.255.0`).
Your addresses will be slightly different; typically you will see a private address (behind a NAT), as you are probably connecting to your ISP through a router.
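If you ever need the address inside a script, a short pipeline over the `ip` output is usually enough. A minimal sketch (the interface name is just the one from the listing above; substitute your own):

ip -4 addr show wlp58s0 | awk '/inet /{ print $2 }'
# prints: 192.168.0.105/24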
Running example
We will return to our example with web generation again. The sources are again in our examples repository, but as you can see in `08/web`, there are many more pages now and we have split the files into multiple subdirectories.
There is `src` with input files in Markdown, `static` with the CSS file (and possibly other files that would be copied as-is to the web server), and `templates` with Pandoc templates for our pages.
We will now build a decent shell script that will be able to build this website and also copy it to a web server so that it is publicly available.
We acknowledge that there are specialized tools for exactly this. They are called static site generators (SSGs) and a huge number of them is available. But their task offers the right playground to show what shell is capable of :-).
We will start with a trivial builder that is basically a copy of the one from one of the previous labs. We highly recommend that you copy the fragments from our repository to your repository (feel free to use the submission one) and commit each version. Use `git` from the command line and write proper commit messages. And if you create a separate issue for each part that you later close from a commit message, you will also practice good software engineering skills.
Script modularization and configuration loading (`.` and `source`)
So far, our scripts were always self-contained in a single file. No surprise: they were all quite short. But sometimes it makes sense to split the code into multiple files to allow sharing code across multiple scripts.
That is fine if the shared code is a standalone script. Imagine we create a file called `msg.sh` with the following content:
#!/bin/bash
echo "$( date '+%Y-%m-%d %H:%M:%S |' )" "$@" >&2
Then in other scripts we can call into `msg.sh` to print logging messages.
...
./msg.sh "Starting computation..."
...
./msg.sh "Computation done."
That might work well, but such short code would probably be better as a function, and we might want to have multiple functions inside a single file. Especially for one-liners, having a separate file for each “function” is somewhat impractical.
We will thus store the function in a separate file `logging.sh`.
msg() {
echo "$( date '+%Y-%m-%d %H:%M:%S |' )" "$@" >&2
}
To use the functions in a different script, we need to instruct the shell to include our file using the `source` or `.` (yes, a standalone dot) construct.
# Both lines are equal, only one of them would be used in reality
. logging.sh
source logging.sh
...
msg "Starting computation"
Why would simply calling the shell script not work? (Hint: the script would run in a separate shell process, so any functions or variables it defines would not be visible to the caller.)
There are other uses of `source`. We have seen that we can use it to load shared code, but it is also very often used to load configuration.
Imagine we want to allow the user to define a directory where to store result files.
results=/home/intro/results/
Certainly we can try to parse this file using `cut` and `grep`, but it is actually much easier to simply load it using `source` and access `$results` directly.
The truth is that we give the user much more than a plain configuration file, but we do not need to think about a specific format, and advanced users can include more shell magic, while beginners would see it as a file in `var=value` format.
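A minimal sketch of this pattern (the file name `script.rc` is illustrative): the script sets a default first and lets the configuration file override it.

results=/tmp/results            # default used when the config does not override it
if [ -f script.rc ]; then
    . script.rc
fi
echo "Results will be stored in $results"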
This is actually how your shell is configured. Recall that we have updated `~/.bashrc` with the `EDITOR` variable. This file is also sourced when Bash is started, and it should be clear by now why it often contains the following snippet:
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
If the given file exists (we will get to the proper syntax later in this lab), we source it, i.e. import its content here. Thus we import the global Bash configuration stored in the `/etc` directory.
`source` behaves as if the content of the included file was really pasted in place of the `source` line. Bash does not have any fancy namespace support or similar.
Usually, files that are expected to be included via `source` do not have a shebang and are not executable. That is mostly to emphasize the fact that they are not standalone executables but rather “libraries”. The same also applies to Python modules: you will usually see a shebang in the main program (and the `x` bit set) while actual modules (those you `import`) are often shebang-less and `rw-` only.
Advancing the running example
We will rework our example to a versatile solution where the user will provide a site configuration that our script will read.
We will create the following `ssg.rc` inside our directory with the webpage.
# My site configuration
site_title="My site"
build_page "index.md"
build_page "rules.md"
build_page "alpha.md"
And we will modify our main script to look like this.
#!/bin/bash
set -ueo pipefail
msg() {
echo "$( date '+%Y-%m-%d %H:%M:%S | SSG |' )" "$@" >&2
}
get_version() {
git rev-parse --short HEAD 2>/dev/null || echo unknown
}
build_page() {
local input_file="$1"
local output_file="public/$( basename "$input_file" ".md" ).html"
msg "Generating $input_file => $output_file"
pandoc \
--template templates/main.html \
--metadata site_title="$site_title" \
--metadata page_version="$( get_version )" \
"src/$input_file" >"$output_file"
}
site_title="$( whoami )'s site"
mkdir -p public
source ssg.rc
cp -R static/* public/
What have we created? Our configuration file `ssg.rc` actually contains a trivial domain-specific language (DSL) that drives the website generation. Our main script provides the `build_page` function that is called from the configuration file.
Inside this function we compute the output filename (try what `basename input.md .md` does!) and run Pandoc.
Actually, it is a very straightforward piece of code, but we managed to split the configuration and the actual generation into separate files and create a reusable tool. Compare how much work this would be in a different language. Just imagine how much work it would be to parse the configuration file…
Control flow in shell scripts
Before diving into control flow in shell scripts, let us mention that multiple commands can be separated by `;` (the semicolon). While in shell scripts it is preferable to write one command per line, interactive users often find it easier to have multiple commands on one line (even if only to allow faster history browsing with the up arrow). We will see semicolons at various places in the control flow structures, serving as a separator.
`for` loops
For loops in the shell always iterate over a set of values provided when the loop starts.
The general format is as follows:
for VARIABLE in VAL1 VAL2 VAL3; do
body of the loop
done
Typical uses include iterating over a list of files, often generated by expanding wildcards.
Let us see an example that counts the number of digits in all `*.txt` files:
for i in *.txt; do
echo -n "$i: "
tr -c -d '0-9' <"$i" | wc -c
done
Notice that the `for` statement is given the variable name `i` without a `$`.
We also see that variable expansion can be used in redirection of stdin (or stdout).
When writing this in the shell, the prompt would change to a plain `>` (probably, depending on your configuration) to signal that you are expected to enter the rest of the loop.
Squeezing the whole loop into one line is also possible (but useful only for fire-and-forget type of scripts):
for i in *.txt; do echo -n "$i: "; tr -c -d '0-9' <"$i" | wc -c; done
When we want to iterate over values with spaces, we need to quote them. Wildcard expansion is safe in this respect and works regardless of spaces in the filenames.
for i in one "two three"; do
echo "$i";
done
`if` and `else`
The `if` condition in the shell is a bit trickier. The condition is never written in the traditional form of *a equals b*: it is always a command, and its exit code controls the flow.
The general syntax of if-then-else is this:
if command_to_control_the_condition; then
success
elif another_command_for_else_if_branch; then
another_success
else
the_else_branch_commands
fi
Note that `if` has to be terminated by `fi` and that the `elif` and `else` branches are optional.
Simple conditions can be evaluated using the `test` command that we already know. See `man test` to inspect what things can be tested.
Let us see how to use `if` with `test` to check whether we are inside a Git project:
if test -d .git; then
echo "We are in the root of a Git project"
fi
In fact, there exists a more elegant syntax: `[` (left bracket) is a synonym for `test` which does the same, except that it requires that the last argument is `]`.
Using this syntax, our example can look as follows:
if [ -d .git ]; then
echo "We are in the root of a Git project"
fi
Still, `[` is just a regular command whose exit code determines what `if` shall do.
`while` loops
While loops have the following form:
while command_to_control_the_loop; do
commands_to_be_executed
done
Again, the condition is true as long as the `command_to_control_the_loop` returns with exit code 0.
The following example finds the first available name for a log file. Note that this code is not immune to races when executed concurrently. That is, it assumes it can be run multiple times, but never in more processes at the same time.
counter=1
while [ -f "/var/log/myprog/main.$counter.log" ]; do
counter=$(( counter + 1 ))
done
logfile="/var/log/myprog/main.$counter.log"
echo "Will log into $logfile" >&2
To make the program race-resistant (i.e., safe against concurrent execution), we would need to use `mkdir`, which fails when the directory already exists (i.e., it is atomic enough to distinguish whether we were successful and are not just stealing someone else’s file). Note that the example uses the exclamation mark `!` to invert the command outcome.
counter=1
while ! mkdir "/var/log/myprog/log.$counter"; do
counter=$(( counter + 1 ))
done
logfile="/var/log/myprog/log.$counter/main.log"
echo "Will log into $logfile" >&2
`break` and `continue`
As in other languages, the `break` command is available to terminate the currently executing loop. You can use `continue` as usual, too.
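As a small illustration, the following loop skips non-regular entries and stops at the first file bigger than 1 MiB (the threshold is arbitrary):

for i in *; do
    if ! [ -f "$i" ]; then
        continue
    fi
    if [ "$( stat --format=%s "$i" )" -gt 1048576 ]; then
        echo "First big file: $i"
        break
    fi
done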
Switch (a.k.a. `case ... esac`)
When we need to branch our program based on a variable value, shell offers the `case` construct. It is somewhat similar to the `switch` construct in other languages, but it has a bit of shell specifics mixed in.
The overall syntax is the following:
case value_to_branch_on in
option1) commands_for_option_one ;;
option2) commands_for_option_two ;;
*) the_default_branch ;;
esac
Notice that, as with `if`, we terminate with the same keyword reversed and that there are two semicolons `;;` to terminate the commands for a particular option.
The options can contain wildcards and `|` to make the matching a bit more flexible.
A simple example can look like this:
case "$EDITOR" in
mc|mcedit) echo 'Midnight Commander rocks' ;;
joe) echo 'Small but powerful' ;;
emacs|vi*) echo 'Wow :-)' ;;
*) echo "Someone really uses $EDITOR?" ;;
esac
Advancing the running example
Armed with the knowledge about control flow available, we can make our script for site generation even better.
We will remove the burden of specifying the list of files manually and find the files ourselves. The user configuration file will thus be completely optional.
We will change our script as follows.
build_page() {
local input_file="$1"
local output_file="public/$( basename "$input_file" ".md" ).html"
msg "Generating $input_file => $output_file"
pandoc \
--template templates/main.html \
--metadata site_title="$site_title" \
--metadata page_version="$( get_version )" \
"$input_file" >"$output_file"
}
...
if [ -f ssg.rc ]; then
source ssg.rc
fi
for page in src/*.md; do
if ! [ -f "$page" ]; then
continue
fi
build_page "$page"
done
We have modified `build_page` to not prepend `src` when running `pandoc`, and we iterate over the Markdown files ourselves.
The `!` inverts the meaning of the exit code, i.e. it behaves as a boolean *not*.
Why do we test for `-f` inside the loop? (Hint: if no file matches `src/*.md`, the wildcard remains unexpanded and the loop would run once with the literal string `src/*.md`.)
Redirection of bigger shell portions
The whole control structure (e.g., `for`, `if`, or `while` with all the commands inside) behaves as a single command, so you can apply redirection to the whole structure. To illustrate this, we can transform the messages to upper case like this:
if test -d .git; then
echo "We are in a root of a Git project"
else
echo "This is not a root of a Git project"
fi | tr 'a-z' 'A-Z'
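Output redirection works the same way; for example, the following collects the whole loop’s output into a single file (the filename is illustrative):

for i in *.txt; do
    wc -l "$i"
done >line_counts.txt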
Script parameters and getopt
Recall that when a shell script receives parameters, we can access them via the special variables `$1`, `$2`, `$3`, … There is also `$@` for accessing all parameters (recall that `$@` must be quoted to work properly; the explanation is beyond the scope of this course).
The special variable `$#` contains the number of arguments on the command line, and `$0` refers to the actual script name.
getopt
When our script needs one argument, accessing `$1` directly is fine. When you want to recognize options, parsing of arguments becomes more complicated. There is a `getopt` command that is able to handle command-line parsing for you.
We will not describe all the details of this command. Instead, we show an example that you can modify to your own needs.
Parsing command-line options is unfortunately not very standardized across different flavours of Unix. The approach shown here works well on any recent Linux and provides a good user experience, but it is not portable to other similar systems. There is also `getopts` (yes, the difference is only the extra `s` at the end) that is much more portable but much more limited in its features.
The main arguments controlling `getopt` behavior are `-o` and `-l`, which describe the switches of our program.
Let us assume that we want to handle the option `--verbose` to make our script a bit more descriptive and `--output` to specify an alternate output file. We would also like to handle short versions of these options: `-o` and `-v`. With `--version`, we want to print the version of our script. And we should not forget about `--help` too. Non-option arguments will be interpreted as names of input files.
The specification of the `getopt` switches is simple:
getopt -o "vho:" -l "verbose,version,help,output:"
Single-letter switches are specified after `-o`, long options after `-l`, and a colon `:` after an option denotes that it expects an argument.
After that, we add `--` followed by the actual parameters. Let us try:
getopt -o "vho:" -l "verbose,version,help,output:" -- --help input1.txt --output=file.txt
# prints: --help --output 'file.txt' -- 'input1.txt'
getopt -o "vho:" -l "verbose,version,help,output:" -- --help --verbose -o out.txt input2.txt
# prints: --help --verbose -o 'out.txt' -- 'input2.txt'
...
As you can see, `getopt` is able to parse the input and convert the parameters to a unified form, moving the non-option arguments to the end of the list.
The following “magical” line (you do not need to understand it to use it) resets `$1`, `$2`, etc. to contain the values as parsed by `getopt`.
eval set -- "$( getopt -o "vho:" -l "verbose,version,help,output:" -- "$@" )"
The actual processing is then quite straightforward:
#!/bin/bash
set -ueo pipefail
opts_short="vho:"
opts_long="verbose,version,help,output:"
# Check for bad usage first (notice the ||)
getopt -Q -o "$opts_short" -l "$opts_long" -- "$@" || exit 1
# Actually parse them (we are here only if they are correct)
eval set -- "$( getopt -o "$opts_short" -l "$opts_long" -- "$@" )"
be_quiet=true
output_file=/dev/stdout
while [ $# -gt 0 ]; do
case "$1" in
-h|--help)
echo "Usage: $0 ..."
exit 0
;;
-o|--output)
output_file="$2"
shift
;;
-v|--verbose)
be_quiet=false
;;
--version)
echo "$0 version 1.0.0"
exit 0
;;
--)
shift
break
;;
*)
echo "Unknown option $1" >&2
exit 1
;;
esac
shift
done
$be_quiet || echo "Starting the script"
for inp in "$@"; do
$be_quiet || echo "Processing $inp into $output_file ..."
done
Several parts of the script deserve explanation.
`true` and `false` are not boolean values, but they can be used as such. Recall how we have used them in lab 06 (there really are `/bin/true` and `/bin/false`).
`exit` immediately terminates a shell script. The optional parameter denotes the exit code of the script.
`shift` is a special command that shifts the variables `$1`, `$2`, … by one. After `shift`, `$3` becomes `$2`, `$2` becomes `$1`, and the original `$1` is lost. `"$@"` is modified accordingly.
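A tiny demonstration (assuming a script invoked as `./demo.sh a b c`; the name is illustrative):

echo "$1 ($# args left)"    # prints: a (3 args left)
shift
echo "$1 ($# args left)"    # prints: b (2 args left)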
Thus, the whole loop processes all options until encountering `--`, which separates options from the other arguments. The user is not required to provide the `--` option: `getopt` inserts it when unifying the parameters (check the output above).
The `for` loop then iterates over the remaining arguments.
Advancing the running example
We will modify our script to accept `-w` or `--watch` so that the program keeps watching `src/*.md` for modifications and regenerates the web on each such change.
We forgot to include the required `inotifywait` program in our Linux image. Execute the following command (it will ask for your password first) to install this program into your Fedora.
sudo dnf install -y inotify-tools
We first include the change to use `getopt` and then we add the support for `--watch`.
#!/bin/bash
set -ueo pipefail
msg() {
echo "$( date '+%Y-%m-%d %H:%M:%S | SSG |' )" "$@" >&2
}
get_version() {
git rev-parse --short HEAD 2>/dev/null || echo unknown
}
build_page() {
local input_file="$1"
local output_file="public/$( basename "$input_file" ".md" ).html"
$LOGGER "Generating $input_file => $output_file"
pandoc \
--template templates/main.html \
--metadata site_title="$site_title" \
--metadata page_version="$( get_version )" \
"$input_file" >"$output_file"
}
generate_web() {
for page in src/*.md; do
if ! [ -f "$page" ]; then
continue
fi
build_page "$page"
done
cp -R static/* public/
}
opts_short="vwh"
opts_long="verbose,version,help,watch"
getopt -Q -o "$opts_short" -l "$opts_long" -- "$@" || exit 1
eval set -- "$( getopt -o "$opts_short" -l "$opts_long" -- "$@" )"
LOGGER=:
watch_for_changes=false
while [ $# -gt 0 ]; do
case "$1" in
-h|--help)
echo "Usage: $0 ..."
exit 0
;;
-v|--verbose)
LOGGER=msg
;;
-w|--watch)
watch_for_changes=true
;;
--)
;;
*)
echo "Unknown option $1" >&2
exit 1
;;
esac
shift
done
site_title="$( whoami )'s site"
mkdir -p public
if [ -f ssg.rc ]; then
source ssg.rc
fi
generate_web
To actually support `--watch` we will use `inotifywait`, a special program that receives a list of files and terminates when one of the files is modified. Therefore, the script will effectively do nothing until a file is modified, as `inotifywait` will “block” its execution.
We will add the following to our script to run indefinitely, watching for changes and rebuilding the web automatically. Hit `Ctrl-C` to terminate the execution when started with `--watch`.
...
if [ -f ssg.rc ]; then
source ssg.rc
fi
generate_web
if $watch_for_changes; then
while true; do
$LOGGER "Waiting for file change..."
inotifywait -e modify src/* src static static/*
generate_web
done
fi
The `read` command
So far, our scripts either did not need standard input at all or they passed it completely to other programs.
But it is possible to also read standard input line by line in shell if you need to process lines separately.
When a shell script needs to read from stdin into a variable, there is the `read` built-in command:
read FIRST_LINE <input.txt
echo "$FIRST_LINE"
Typically, `read` is used in a `while` loop to iterate over the whole input.
`read` is also able to split the line into fields on whitespace and assign each field to a different variable.
Considering we have an input of this format, the following loop computes the average of the numbers.
/dev/sdb 1008
/dev/sdb 1676
/dev/sdc 1505
/dev/sdc 4115
/dev/sdd 999
count=0
total=0
while read device duration; do
count=$(( count + 1 ))
total=$(( total + duration ))
done
echo "Average is about $(( total / count ))."
As you can guess from the above snippet, `read` returns 0 as long as it is able to read into the variables. Reaching the end of the file is announced by a non-zero exit code.
`read` can sometimes be too smart about certain inputs. For example, it interprets backslashes. You can use `read -r` to suppress this behavior.
Other notable parameters are `-t` and `-p`: see `help read` for their description.
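For example, `-p` prints a prompt before reading (a small sketch):

read -r -p "Your name: " name
echo "Hello, $name"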
If we want to read from a specific file (assuming its filename is stored in the variable `$input`), we can also redirect input to the whole loop and write the script like this:
while read device duration; do
count=$(( count + 1 ))
total=$(( total + duration ))
done <"$input"
That is actually quite a common use of the `while read` pattern.
Check that you understand how `read` works
Assume we have the following text file `data.txt`.
ONE
TWO
We also have the following script `reader.sh`:
#!/bin/bash
set -ueo pipefail
read -r data_one <data.txt
read -r data_two <data.txt
read -r stdin_one
read -r stdin_two
echo "data_one=${data_one}"
echo "data_two=${data_two}"
echo "stdin_one=${stdin_one}"
echo "stdin_two=${stdin_two}"
Select all true statements about output of the following invocation.
./reader.sh <data.txt
Bigger exercise I
Imagine we have input data with match results in the following format (team, goals shot, colon, goals shot by the other team, other team).
alpha 2 : 0 bravo
bravo 0 : 1 charlie
alpha 5 : 4 charlie
Write a shell script that prints a table with summarized results. Assign 3 points for a victory, 1 point for a tie. Your program does not need to handle the situation when two teams have the same number of points.
Solution
We start with a function that receives two arguments – goals shot by each side – and prints the number of points assigned.
get_points() {
local goals_mine="$1"
local goals_opponent="$2"
if [ "$goals_mine" -eq "$goals_opponent" ]; then
echo 1
elif [ "$goals_mine" -gt "$goals_opponent" ]; then
echo 3
else
echo 0
fi
}
Another function then computes the points for each match.
preprocess_scores() {
local team_one team_two
local goals_one goals_two
while read -r team_one goals_one colon goals_two team_two; do
if [ "$colon" != ":" ]; then
echo "WARNING: ignoring invalid line $team_one $goals_one $colon $goals_two $team_two" >&2
continue
fi
echo "$team_one" "$( get_points "$goals_one" "$goals_two" )"
echo "$team_two" "$( get_points "$goals_two" "$goals_one" )"
done
}
These two functions together transform the input into the following:
alpha 3
bravo 0
bravo 0
charlie 3
alpha 3
charlie 0
On this, we can call our well-known `group_sum.py` script or write it in shell ourselves. For the shell implementation, we will expect that the data are already sorted by key to simplify the code.
sum_by_sorted_keys() {
local key value
local prev_key=""
local sum=0
while read -r key value; do
if [ "$key" != "$prev_key" ]; then
if [ -n "$prev_key" ]; then
echo "$prev_key $sum"
fi
prev_key="$key"
sum=0
fi
sum=$(( sum + value ))
done
if [ -n "$prev_key" ]; then
echo "$prev_key $sum"
fi
}
Why do we need to expect data sorted? Can’t we just sort them ourselves? Would the following modification (only this one line changed) work?
# replacing "while read -r key value; do"
sort | while read -r key value; do
Answer: not quite. The `while` loop at the end of a pipeline runs in a subshell, so its updates to `$prev_key` and `$sum` are lost when the loop ends, and the final `echo` after `done` would never print the last group.
What change inside this function would work then? One option is to feed the loop through process substitution, i.e. `done < <(sort)`, which keeps the loop in the current shell. The function then no longer needs pre-sorted input; the pipelines below assume this fixed variant under the name `sum_by_keys`.
Together these functions provide the building blocks to solve the whole puzzle:
preprocess_scores | sum_by_keys | sort -n -k 2 -r | column -t
It is a matter of opinion if this task would be better solved in a different programming language. It all depends on the context and on other requirements.
Shell usually excels in situations where we need to combine data from multiple files that are in some textual (preferably line-oriented) format.
The advantage of shell is in its interactivity. Even the functions can be defined interactively (i.e. not stored in any file first) and one can easily build the final pipeline incrementally, checking the output after adding each step.
Sidenote: how web pages are published
We will now perform a small detour to the area of (the history of) website publishing. Publishing a website today generally means renting webspace where you can either upload your HTML (or PHP) files, or even renting a configured instance of a web application, such as WordPress.
Traditionally, you often also received webspace as part of your Unix account on some shared machine. The setup was usually done in such a way that whatever appeared in your `$HOME/public_html` was available under the page `example.com/~LOGIN`.
You might have encountered such pages, typically for university pages of individual professors.
With the advance of virtualization (and cloud) it became easier to not give users access as real users but insert another layer where user can manipulate only certain files without having shell access at all.
Web pages on lab machines
Our lab machines (e.g. the `u-pl*` ones) also offer this basic functionality. SSH into one of these (recall the list from 05) and create a directory `~/WWW`.
Create a simple HTML file in `~/WWW` (skip if you already uploaded some files before).
echo '<html><head><title>Hello, World!</title></head><body><h1>Hello, World!</h1></body></html>' >~/WWW/index.html
Its content will be available as http://www.ms.mff.cuni.cz/~LOGIN/.
Note that you will need to add the proper permissions for the AFS filesystem using the `fs setacl` command.
fs setacl ~/WWW www read
fs setacl ~/. www l
SCP & rsync
In order to copy files between two Linux machines, we can use `scp`. Internally, it establishes an SSH connection and copies the files over it.
The syntax is very simple and follows the semantics of a plain `cp`:
scp local_source_file.txt user@remote_machine:remote_destination_file.txt
scp user@remote_machine:remote_source_file.txt local_destination_file.txt
Rsync
A much more powerful tool for copying files is `rsync`. Similarly to `scp`, it runs over an SSH connection, but it has to be installed on both sides (usually that is not a problem).
The syntax of a simple copy follows `cp` and `scp`, too:
rsync local_source_file.txt user@remote_machine:remote_destination_file.txt
rsync local_source_file.txt user@remote_machine:remote_destination_directory/
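It is also commonly used to copy whole directories. For example, the following mirrors a local directory into a remote one (`-a` preserves attributes and copies recursively, `-v` lists the transferred files; the paths are illustrative):

rsync -av public/ user@remote_machine:WWW/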
Advancing the running example
Use the `rsync` manual page and extend the running example with `--upload`, which will upload the generated files to a remote server specified in `$rsync_target`.
You can use `LOGIN@u-plNNNN.ms.mff.cuni.cz:WWW/nswi177` as a reasonable target for uploading to the Rotunda machines.
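If you want to compare your attempt afterwards, one possible shape of the `--upload` handling is sketched below (the function name and flag choice are our assumptions, not the only correct solution):

upload_web() {
    # Fail early when ssg.rc did not define the target.
    if [ -z "${rsync_target:-}" ]; then
        echo "rsync_target is not set (check your ssg.rc)" >&2
        exit 1
    fi
    # Mirror the generated site to the remote machine.
    rsync -av public/ "$rsync_target/"
}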
Bigger exercise II
We have created the script to compute the scoring table. It would be nice to generate it during web generation.
Extend our running example with the following functionality.
Each `*.bin` file in `src/` will be treated as a script that is executed and whose output is stored in an HTML file of the same name.
Try this on your own first before looking at our solution.
Recall that the file extension is not important and `.bin` is generic enough to hide any (interpreted) programming language (as long as the script has a proper shebang). As a matter of fact, it will work for compiled (C, Rust, and similar) programs too.
Solution
The change is relatively simple. We have also renamed `build_page` to `build_markdown_page` for better clarity.
build_dynamic_page() {
local input_file="$1"
local output_file="public/$( basename "$input_file" ".bin" ).html"
$LOGGER "Generating $input_file => $output_file"
"$input_file" >"$output_file"
}
generate_web() {
local page
for page in src/*.md; do
if ! [ -f "$page" ]; then
continue
fi
build_markdown_page "$page"
done
local script
for script in src/*.bin; do
if ! [ -f "$script" -a -x "$script" ]; then
continue
fi
build_dynamic_page "$script"
done
cp -R static/* public/
}
And we can extend our table generation script into the following.
...
as_markdown_table() {
echo
echo '| Team | Points |'
echo '| ---- | -----: |'
while read -r team score; do
echo '|' "$team" '|' "$score" '|'
done
echo
}
. ssg.rc
(
echo '---'
echo 'title: Scoring table'
echo '---'
echo '# Scoring table'
preprocess_scores <scores.txt | sum_by_keys | sort -n -k 2 -r | as_markdown_table
) | pandoc \
--template templates/main.html \
--metadata site_title="$site_title" \
--metadata page_version="$( git rev-parse --short HEAD 2>/dev/null || echo unknown )"
Improving the script further
Does it look good?
Hardly: there is a repeated fragment calling Pandoc, so our site generator is not perfect.
Let us improve it.
As a second version, extend it so that it distinguishes `*.md.bin` and `*.html.bin` scripts: those with the `.html.bin` extension are expected to generate HTML directly, while `.md.bin` ones generate Markdown that we will process ourselves.
Solution
...
pandoc_as_filter() {
pandoc \
--template templates/main.html \
--metadata site_title="$site_title" \
--metadata page_version="$( get_version )" \
"$@"
}
build_markdown_page() {
local input_file="$1"
local output_file="public/$( basename "$input_file" ".md" ).html"
$LOGGER "Generating $input_file => $output_file"
pandoc_as_filter "$input_file" >"$output_file"
}
build_dynamic_html_page() {
local input_file="$1"
local output_file="public/$( basename "$input_file" ".html.bin" ).html"
$LOGGER "Generating $input_file => $output_file"
"$input_file" >"$output_file"
}
build_dynamic_markdown_page() {
local input_file="$1"
local output_file="public/$( basename "$input_file" ".md.bin" ).html"
$LOGGER "Generating $input_file => $output_file"
"$input_file" | pandoc_as_filter >"$output_file"
}
generate_web() {
local page
for page in src/*.md; do
if ! [ -f "$page" ]; then
continue
fi
build_markdown_page "$page"
done
local script
for script in src/*.md.bin; do
if ! [ -f "$script" -a -x "$script" ]; then
continue
fi
build_dynamic_markdown_page "$script"
done
for script in src/*.html.bin; do
if ! [ -f "$script" -a -x "$script" ]; then
continue
fi
build_dynamic_html_page "$script"
done
cp -R static/* public/
}
...
And our table generation script `table.md.bin` can be significantly simplified.
...
echo '---'
echo 'title: Scoring table'
echo '---'
echo '# Scoring table'
preprocess_scores <scores.txt | sum_by_keys | sort -n -k 2 -r | as_markdown_table
Last improvement
As a last exercise, extend our script to support building reusable scripts.
Before we run the `*.bin` scripts, we should extend `$PATH` with the `bin/` directory in our SSG directory.
Why do we want to do that? At this moment, the path to the scoring table is hard-coded inside the script, and the script is not usable for multiple tables (imagine there are two groups running). If we had the script in `$PATH`, we could store the scores as a script with the following shebang and thus reuse the script for multiple tables.
#!/usr/bin/env score_table.sh
alpha 2 : 0 bravo
bravo 0 : 1 charlie
alpha 5 : 4 charlie
Certainly this is bordering on abuse of the shebang, as we are turning a data file into a script, but there might be other use cases than our primitive SSG where such an extension would make sense.
Here, take it as an exercise to refresh your memory about `env`, shebangs, and `$PATH`.
Solution
The changes are actually trivial.
build_dynamic_html_page() {
...
env PATH="$PATH:$PWD/bin" "$input_file" >"$output_file"
}
build_dynamic_markdown_page() {
...
env PATH="$PATH:$PWD/bin" "$input_file" | pandoc_as_filter >"$output_file"
}
And the `bin/score_table.sh` would need only a single-line change too.
grep -v '#' "$1" | preprocess_scores | sum_by_keys | sort -n -k 2 -r | as_markdown_table
We drop all lines containing `#`, which certainly drops the shebang, and we do not expect a team name to contain a hash sign (later on, we will see regular expressions that would allow more precise filtering, but this is fine for now).
Source code linting with ShellCheck
You have already written quite a lot of shell scripts. It is thus time to introduce you to ShellCheck.
ShellCheck is a tool that checks your shell scripts for common issues. These issues are neither syntax errors nor logical errors. The issues raised by ShellCheck are patterns that are well known to cause unexpected behavior, degrade performance, or may even be hiding some nasty surprises.
One such example could be if your script contains the following snippet.
cat input.txt | cut -d: -f 3
Do you know what could possibly be wrong?
Technically, this code is correct and by itself does not contain any bug. However, the first `cat` is redundant as it prints only one file: the code can be reduced to the following form without any change in functionality:
cut -d: -f 3 <input.txt
As you can see, this is essentially harmless. But it might mean that you wanted to `cat` multiple files, or that the `cat` is a left-over from a previous version. Thus, ShellCheck will warn you.
Another issue where ShellCheck helps is the following code:
dir_name=results/
if [ -d $dri_name ]; then
echo "$dir_name already exists."
fi
Here ShellCheck will actually detect the typo, as `dri_name` was not assigned before.
Another trap awaits in the following code:
if [ -d ]; then
echo "$dir_name already exists."
fi
Of course, this is completely wrong. But guess what: `test` (or `[`) will accept this and evaluate it as true. This looks crazy, but `test` with exactly one argument actually checks whether that argument is non-empty.
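You can verify this yourself:

[ -d ] && echo yes      # prints yes: "-d" here is just a non-empty string
[ "" ] || echo empty    # prints empty: the single argument is an empty string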
We have `test -n` these days, but once we did not, and we must keep backward compatibility. See this page for details.
So this is a correct piece of shell code, but it likely does not do what you wanted it to. Here comes ShellCheck to help you.
ShellCheck is able to warn you about hundreds of possible issues, as can be seen on this page. Get into the habit of running it on your shell scripts regularly.
In our practice, ShellCheck seldom gives false positives, but it saved us many times.
Some of the graded tasks that you submit will be checked by ShellCheck, too (and we might penalize your solutions if your scripts are not ShellCheck-error-free).
Running ShellCheck
Running ShellCheck is really easy.
shellcheck ssg.sh
If you want to see style hints as well, add `-o all`, or use `-i` for more selective checks.
shellcheck -o all ssg.sh
Exercise
Go back to your submitted shell scripts and run ShellCheck on them. Fix all the errors found, or justify why leaving them in is alright.
Other languages
Similar tools exist for other languages.
Pylint is such a tool for Python: it can detect plenty of issues and is also highly customizable.
As an exercise, find such tooling for your own language and start using it regularly. Many tools also contain IDE extensions for better user experience.
The important takeaway
Start using ShellCheck, Pylint, or any other tool for your favorite language.
It will not detect logical errors (at least not all of them), but it will surely detect so-called code smells: places in your code that often lead to errors, undefined behaviour, or similar issues.
This is doubly important if you are new to some language: the chances are that you misunderstood some feature rather than the tool being wrong.
More examples
Before-class tasks (deadline: start of your lab, week April 3 - April 7)
The following tasks must be solved and submitted before attending your lab. If you have lab on Wednesday at 10:40, the files must be pushed to your repository (project) at GitLab on Wednesday at 10:39 latest.
For the virtual lab, the deadline is Tuesday 9:00 AM every week (regardless of vacation days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most of the tasks there are automated tests that can help you check completeness of your solution (see here how to interpret their results).
`08/barplot.sh` (60 points, group `shell`)
Write a shell script for drawing a labeled barplot. The user would provide data in the following format:
12 First label
120 Second label
1 Third label
The script will print a graph like this:
First label (12) | #
Second label (120) | #######
Third label (1) |
The script will accept input filename as the first argument and will adjust the width of the output to the current screen width. It will also align the labels as can be seen in the plot above.
You can safely assume that the input file will always exist and that it will be possible to read it multiple times. No other arguments need to be recognized.
Hints
Screen width is stored in the variable `$COLUMNS`. Default to 80 if the variable is not set. (You can assume it will be either empty (not set) or contain a valid number.)
The plot should be scaled to fill the whole width of the screen (i.e. scaled up or down).
You can squeeze all consecutive spaces into one (even for labels); the first and second columns are separated by space(s).
See what `wc -L` does.
Note that the first tests use labels of the same length to simplify writing the first versions of the script.
Consider using `printf` for printing the aligned labels.
The following ensures that `bc` computes with fractional numbers but the result is displayed as an integer (which is useful for further shell computations).
echo 'scale=0; (5 * 2.45) / 1' | bc -l
Examples
2 Alpha
4 Bravo
# COLUMNS=20
Alpha (2) | ####
Bravo (4) | ########
2 Alpha
4 Bravo
16 Delta
# COLUMNS=37
Alpha (2) | ###
Bravo (4) | ######
Delta (16) | ########################
`08/remote.txt` (40 points, group `net`)
This task is somewhat similar to `03/local.txt`: you will simply store a specific string into `08/remote.txt`. This string is again printed for you by the tests when you execute them on the remote machine `linux.ms.mff.cuni.cz`.
Post-class tasks (deadline: April 23)
We expect you will solve the following tasks after attending the labs and hearing feedback on your before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most of the tasks there are automated tests that can help you check completeness of your solution (see here how to interpret their results).
`08/dir.sh` (40 points, group `shell`)
Create a script for listing file sizes.
The script partially mimics the behaviour of `ls`: without arguments, it lists information about the files in the current directory; when arguments are provided, they are treated as a list of files to print details about.
Example run can look like this:
./08/dir.sh /dev/random 08/dir.sh 08
/dev/random <special>
08/dir.sh 312
08 <dir>
The second column will display the file size for normal files, `<dir>` for directories, and `<special>` for any other file.
The file size can be read through the `stat(1)` utility.
Nonexistent files should be announced with `FILENAME: no such file or directory.` on stderr.
You can safely assume that you will have access to all files provided on the command-line.
You will probably find the `column` utility useful, especially the following invocation:
column --table --table-noheadings --table-columns FILENAME,SIZE --table-right SIZE
You can assume that the filenames will be reasonable (e.g. without spaces). To simplify things, we will not check that the exit code differs when some of the files were not found.
`08/ping.sh` (60 points, group `net`)
`ping` is a tool that sends ICMP packets and is often used as a basic test that a remote machine is up. The truth is that a machine may decide to filter ICMP requests and not respond to them at all (hence appearing to be down) and, vice versa, a machine responding to `ping` might have all other services down.
But it is still a useful tool to check and debug basic connectivity issues.
Try running `ping d3s.mff.cuni.cz` to see its output. The tool sends the packets forever; terminate it with `Ctrl-C`.
Your task is to create a tool that accepts the following arguments and prints host status based on `ping` (of course, you need to use `ping` in a mode where it sends a single request only and times out quickly).
- `-d` or `--delimiter` that accepts a string used to delimit the output columns; defaults to a space
- `-v` or `--verbose` which prints the output of `ping` to standard error output (by default, the output of `ping` is not printed at all)
- `-w` to specify a different timeout than the default of one second
Normal parameters are DNS names or IP addresses to contact via `ping` and print their status.
The tool’s exit code denotes the number of `DOWN` machines (you can safely assume that there will never be more than 126 parameters, and you do not have to handle whether the exit code is a signed or unsigned byte, etc.).
We expect you to use `getopt` to handle the command-line options.
Following examples show invocation with different parameters and expected output.
Default execution
08/ping.sh seznam.cz google.com google.comx
seznam.cz UP
google.com UP
google.comx DOWN
Use of `-d` and `--verbose`
08/ping.sh seznam.cz -d : google.com --verbose
Note that the output mixes stdout and stderr.
PING seznam.cz (77.75.77.222) 56(84) bytes of data.
64 bytes from www.seznam.cz (77.75.77.222): icmp_seq=1 ttl=56 time=4.46 ms
--- seznam.cz ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 4.460/4.460/4.460/0.000 ms
seznam.cz:UP
PING google.com (142.251.36.78) 56(84) bytes of data.
64 bytes from prg03s10-in-f14.1e100.net (142.251.36.78): icmp_seq=1 ttl=114 time=3.64 ms
--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 3.642/3.642/3.642/0.000 ms
google.com:UP
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …
- explain what a linter and a style checker are
- explain what kinds of issues can be detected by style checkers
- explain concurrency issues that can occur when using temporary files
- explain how a program’s exit code is used to drive control flow in shell scripts
- explain what commands are executed and how a shell construct such as `if true; then echo "true"; fi` is evaluated
- explain what considerations are important when deciding between the use of shell vs Python
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …
- use temporary files securely in shell scripts
- use control flow in shell scripts (`for`, `while`, `if`, `case`)
- use the `read` command
- use `getopt` for parsing command-line arguments
- use `.` and `source` to load functions from different files
- use and interpret results of ShellCheck
- use `scp` to copy individual files to (or from) a remote machine
- optional: use `rsync` to synchronize whole directories