Do not forget that the Before class reading is mandatory and there is a quiz that you are supposed to complete before coming to the labs.
Testing in Shell with BATS
In this section we will briefly describe BATS – the testing system that we use for automated tests that are run on every push to GitLab.
Generally, automated tests are the only reasonable way to ensure your software is not slowly rotting and decaying. Good tests will capture regressions, ensure bugs are not reappearing and often serve as documentation of the expected behavior.
The motto write tests first may often seem exaggerated and difficult, but it contains a lot of truth (several reasons are listed for example in this article).
BATS is a system written in shell that targets shell scripts or any programs with a command-line interface. If you are familiar with other testing frameworks (e.g. Python Nose), you will probably find BATS very similar and easy to use.
Generally, every test case is one shell function and BATS offers several helper functions to structure your tests.
Let us look at the example from BATS homepage:
#!/usr/bin/env bats
@test "addition using bc" {
    result="$(echo 2+2 | bc)"
    [ "$result" -eq 4 ]
}
The @test "addition using bc" is a test definition. Internally, BATS translates this into a function (indeed, you can imagine it as running a simple sed script over the input and piping it to sh) and the body is normal shell code.
BATS uses set -e to terminate the code whenever any program terminates with a non-zero exit code. Hence, if [ terminates with non-zero, the test fails.
Apart from this, there is nothing more about it in its basic form. Even with this basic knowledge, you can start using BATS to test your CLI programs.
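To see why a failing [ is enough to fail a test, here is a minimal sketch of the mechanism in plain shell, without BATS. It only illustrates the effect of set -e and is not how BATS is actually implemented; the variable names are made up for this example.

```shell
# Run two "test bodies" in child shells that use `set -e`.
# A child shell stops at the first command that exits with non-zero.

passing_status=0
sh -c 'set -e; result=$((2 + 2)); [ "$result" -eq 4 ]; echo "reached the end"' \
    || passing_status=$?

failing_status=0
sh -c 'set -e; result=$((2 + 2)); [ "$result" -eq 5 ]; echo "never printed"' \
    || failing_status=$?

echo "passing body exited with $passing_status"
echo "failing body exited with $failing_status"
```

The second body never reaches its echo because the failed comparison terminates the child shell first.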
Executing the tests is simple: make the file executable and run it. You can choose from several outputs and with -f you can filter which tests to run. Look at bats --help or here for more details.
Commented example
Let’s write a test for our factor.py program. We will use the version that reads the number from argv.
#!/usr/bin/env bats
@test "Factorize 7" {
    run ./factor.py 7
    [ "$output" = "7" ]
}
@test "Factorize 17" {
    run ./factor.py 17
    [ "$output" = "17" ]
}
We use a special BATS command run to execute our program; it also captures its stdout into a variable named $output. And then we simply verify the correctness.
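Besides $output, run also stores the exit code of the program in $status. The following is a rough plain-shell sketch of that idea. The function my_run is a hypothetical stand-in written for this illustration; the real run does more than this.

```shell
# Hypothetical stand-in for `run`: remember the exit code and the output.
my_run() {
    status=0
    # Capture stdout and stderr together; keep going even on failure.
    output="$("$@" 2>&1)" || status=$?
}

# A made-up command that succeeds and prints two lines.
my_run sh -c 'echo "2"; echo "3"'
ok_status="$status"
ok_output="$output"

# A made-up command that fails with exit code 2.
my_run sh -c 'echo "not a number" >&2; exit 2'
err_status="$status"
err_output="$output"

echo "first:  status=$ok_status output=$ok_output"
echo "second: status=$err_status output=$err_output"
```

Because the failure is recorded instead of propagated, the test body can continue and inspect both values.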
Let’s add another test case:
@test "Factorize 8" {
    run ./factor.py 8
    [ "$output" = "2 2 2" ]
}
This will fail, but the error message is not very helpful.
(in test file factor.bats, line 15)
`[ "$output" = "2 2 2" ]' failed
This is because BATS is a very thin framework that basically checks only the exit codes and not much more.
But we can improve that.
#!/usr/bin/env bats
check_it() {
    run ./factor.py "$1"
    [ "$output" = "$2" ]
}
@test "Factorize 7" {
    check_it 7 7
}
@test "Factorize 17" {
    check_it 17 17
}
@test "Factorize 8" {
    check_it 8 "2 2 2"
}
The error message is not much better but the test is much more readable this way.
Let’s improve the check_it function a bit more.
check_it() {
    run ./factor.py "$1"
    if [ "$output" = "$2" ]; then
        return 0
    fi
    echo >&2
    echo "-- Actual output --" >&2
    echo "$output" >&2
    echo "-- Expected output --" >&2
    echo "$2" >&2
    return 1
}
Let’s run the test again:
(from function `check_it' in file factor.bats, line 13,
in test file factor.bats, line 25)
`check_it 8 "2 2 2"' failed
-- Actual output --
2
2
2
-- Expected output --
2 2 2
So basically our test was wrong all the time :-).
But this is actually usable for debugging our program.
We simply need to change our test a bit:
@test "Factorize 8" {
    check_it 8 "2
2
2"
}
Yes, shell strings can span multiple lines just fine.
Adding more test cases is now a piece of cake. After this trivial update, our test suite will actually start making sense. And it will be useful to us.
Better assertions
BATS offers extensions for writing more readable tests.
Thus, instead of calling test directly, we can use assert_equal, which produces a nicer message.
assert_equal "$actual" "expected-value"
NSWI177 tests
Our tests are packed with the assert extension plus several of our own.
All of them are part of the repository that is downloaded by run_tests.sh in your repositories. Feel free to execute the *.bats file directly if you want to run only certain tests locally (i.e., not on GitLab).
grep and sed
We have already mentioned these commands. The first one prints lines matching a given regular expression, the other one is able to change the lines according to the provided regular expression and its replacement.
Warning: both commands use a slightly different regex syntax.
Always check with the man page if you are not sure.
Generally, the biggest differences across tools/languages are in handling of special characters for repetition or grouping ((), {}).
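As a small illustration of such differences, the sketch below compares basic regular expressions (the default in grep and sed) with extended ones (the -E option). The sample input is made up for this example.

```shell
input='caab
ca{2}b'

# Basic regex (default in grep and sed): repetition needs backslashes.
bre_repeat="$(printf '%s\n' "$input" | grep 'a\{2\}')"
# Without backslashes the braces are ordinary characters.
bre_literal="$(printf '%s\n' "$input" | grep 'a{2}')"
# Extended regex (-E): bare braces mean repetition.
ere_repeat="$(printf '%s\n' "$input" | grep -E 'a{2}')"
# sed -E follows the same extended rules, here combined with grouping.
sed_group="$(echo 'abab' | sed -E 's/(ab){2}/X/')"

echo "BRE a\\{2\\} matched: $bre_repeat"
echo "BRE a{2} matched:   $bre_literal"
echo "ERE a{2} matched:   $ere_repeat"
echo "sed -E result:      $sed_group"
```

The same pattern text therefore selects different lines depending on which syntax the tool expects.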
Exercises
- Find all lines in /etc/passwd that contain the digit 9.
- Accounts with /sbin/nologin in /etc/passwd are generally system accounts not used by a human user. Print the list of these accounts.
- Find all lines in /etc/passwd that start with any of the letters A, B, C or D (case-insensitive).
- Find all lines which contain an even number of characters.
- Find all e-mail addresses. Assume that a valid e-mail address has a format <s1>@<s2>.<s3>, where each sequence <sN> is a non-empty string of characters from English alphabet and sequences <s1> and <s2> may also contain digits or a dot (.).
- Print all lines containing a word (in English alphabet) which begins with capital letter and all other letters are lowercase. Test that the word TeX will not be matched.
- Remove all trailing spaces and tabulators.
- Put every word (non-empty sequence of characters of the English alphabet) in parentheses.
- Replace “Name Surname” by “Surname, N.”.
- Delete all empty lines.
- Reformat input to contain each sentence on a separate line. Assume that each sentence begins with a capital English letter and ends with ., !, or ?; there may be any number of spaces between sentences.
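To give an idea of how such one-liners can be tried out safely, here is a sketch of possible approaches to three of the exercises. These are not the reference solutions, and the sample data is made up so that the commands do not depend on your /etc/passwd.

```shell
# Made-up sample: one line has trailing whitespace, one line is empty.
sample="$(printf 'alice:x:1000\nBob:x:1001\n\ndaemon:x:2 \t\nzed:x:1002\n')"

# Remove trailing spaces and tabulators.
stripped="$(printf '%s\n' "$sample" | sed 's/[[:space:]]*$//')"

# Delete all empty lines.
no_empty="$(printf '%s\n' "$stripped" | sed '/^$/d')"

# Keep lines starting with A, B, C or D (case-insensitive).
a_to_d="$(printf '%s\n' "$no_empty" | grep -i '^[a-d]')"

printf '%s\n' "$a_to_d"
```

Working on a small known input first makes it much easier to see whether the expression does what you intended.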
Bigger script example
We will describe the following script in a bit more detail to explain typical idioms you can encounter. We will also build the script incrementally to give you an idea how to approach building bigger scripts.
But we provide the complete script as well so that you can check that you have built it from the fragments correctly.
Task description
Write a script that prints basic system information (hardware platform, kernel version, number of CPUs, and RAM size). The user should be able to choose different output formats.
Solution description
The core of our script is simple.
echo "Hardware platform: $( uname -m )"
echo "Kernel version: $( uname -r )"
echo "CPU count: $( nproc )"
echo "RAM size: $( sed -n 's#^MemTotal:[ ]*\([0-9]*\) kB#\1#p' </proc/meminfo )"
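The sed call in the last line deserves a closer look: -n suppresses the default printing and the p flag prints only the line where the substitution succeeded. The sketch below runs the same expression on a made-up excerpt instead of the real /proc/meminfo.

```shell
# Made-up excerpt in the format of /proc/meminfo.
sample='MemTotal:       16314384 kB
MemFree:         1234567 kB'

# Only the MemTotal line matches; the captured digits replace the whole line.
ram_kb="$(printf '%s\n' "$sample" | sed -n 's#^MemTotal:[ ]*\([0-9]*\) kB#\1#p')"

echo "RAM size: $ram_kb"
```

With that in place, the four echo lines above print one labelled value per line.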
This output is useful for a human reader but not for machine processing. So let’s add a version that prints the output as assignments to shell variables that can be used later, i.e., in the following format.
PLATFORM="x86_64"
KERNEL_VERSION="5.10.16-arch1-1"
Of course, duplicating the script to contain the following is not a nice solution.
if [ "$format" = "shell" ]; then
    echo "PLATFORM=$( uname -m )"
    ...
else
    echo "Hardware platform: $( uname -m )"
    ...
fi
But it is possible to convert between these two formats. Let’s convert our script like this:
if [ "$format" = "shell" ]; then
    column_no=1
else
    column_no=2
fi
(
    echo "PLATFORM:Hardware platform:$( uname -m )"
    echo "KERNEL_VERSION:Kernel version:$( uname -r )"
    echo "CPU_COUNT:CPU count:$( nproc )"
    echo "RAM_TOTAL:RAM size:$( sed -n 's#^MemTotal:[ ]*\([0-9]*\) kB#\1#p' </proc/meminfo )"
) | cut '-d:' -f $column_no,3-
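To make the column selection concrete, here is a small sketch of what cut does with one such line. The values x86_64 and the NOTE line are made up for this illustration.

```shell
line='PLATFORM:Hardware platform:x86_64'

# Field 1 is the variable name, field 2 the human label, field 3 the value.
shell_variant="$(echo "$line" | cut -d: -f 1,3-)"
human_variant="$(echo "$line" | cut -d: -f 2,3-)"

# The open-ended range 3- keeps values that contain a colon themselves.
with_colon="$(echo 'NOTE:Some note:a:b' | cut -d: -f 1,3-)"

echo "$shell_variant"
echo "$human_variant"
echo "$with_colon"
```

Note that the shell variant still prints PLATFORM:x86_64 rather than PLATFORM="x86_64".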
Not perfect but we are getting there. Let’s hide the conversion into a separate shell function.
format_normal() {
    cut '-d:' -f 2,3
}
format_shell() {
    cut '-d:' -f 1,3 | sed 's#:\(.*\)#="\1"#'
}
Then the script would contain the following pipeline:
(
...
echo "RAM_TOTAL:RAM size:$( sed -n 's#^MemTotal:[ ]*\([0-9]*\) kB#\1#p' </proc/meminfo )"
) | "format_${format}"
In a sense, we have used polymorphism in our script, as the $format variable is technically a replacement of a virtual method table.
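The dispatch by function name can be tried in isolation. The sketch below repeats the two formatters and feeds them one made-up input line; the check for an unknown format at the end is our own addition and is only one possible way to guard against typos.

```shell
format_normal() {
    cut -d: -f 2,3
}
format_shell() {
    cut -d: -f 1,3 | sed 's#:\(.*\)#="\1"#'
}

sample='PLATFORM:Hardware platform:x86_64'   # made-up sample value

# The function to call is chosen by the content of $format.
format=normal
normal_out="$(echo "$sample" | "format_${format}")"
format=shell
shell_out="$(echo "$sample" | "format_${format}")"

# Assumed guard: refuse a format for which no function exists.
format=yaml
unknown_msg=""
if ! command -v "format_${format}" >/dev/null 2>&1; then
    unknown_msg="Unknown format: ${format}"
fi

echo "$normal_out"
echo "$shell_out"
echo "$unknown_msg"
```

Adding a new output format then means adding one more function, without touching the pipeline itself.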
Adding JSON is a bit more complicated, but still doable. Note that we down-case the variable names for nicer output. The final sed is used to remove the trailing comma (JSON is a very strict format).
format_json() {
    local varname
    local varvalue
    echo "{"
    cut '-d:' -f 1,3 | sed 's#:# #' | while read -r varname varvalue; do
        echo -n "$varname" | tr 'A-Z' 'a-z' | sed 's#.*# "&": #'
        echo "\"$varvalue\"",
    done | sed '$s#,$##'
    echo "}"
}
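To check what the formatter produces, we can run it on fixed input. The sketch below uses a variant of the function that builds each line with printf instead of echo -n (our substitution, as echo -n is not equally supported by all shells); the two input lines are made-up sample values.

```shell
# Variant of format_json using printf; otherwise the same pipeline.
format_json() {
    local varname varvalue
    echo "{"
    cut -d: -f 1,3 | sed 's#:# #' | while read -r varname varvalue; do
        printf ' "%s": "%s",\n' "$(echo "$varname" | tr 'A-Z' 'a-z')" "$varvalue"
    done | sed '$s#,$##'   # drop the comma on the last line only
    echo "}"
}

json="$(printf '%s\n' \
    'PLATFORM:Hardware platform:x86_64' \
    'CPU_COUNT:CPU count:8' | format_json)"

echo "$json"
```

Note that values are not escaped here, so a value containing a double quote would produce invalid JSON.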
We can certainly use getopt to allow the user to select the output format but we will opt for using a configuration file or setting an environment variable. Then, the default format can be specified in "$HOME/.nswi177/sysinfo.rc" or the script can be launched with:
FORMATTER=json ./sysinfo.sh
Many programs offer you all three options where the script first loads the settings from a configuration file, optionally overrides them with an environment variable, and getopt can override these.
The loading in the script then looks like this (we switched to capitals to emphasize that the variable comes from the user and thus will be exported).
if [ -r "$HOME/.nswi177/sysinfo.rc" ]; then
    . "$HOME/.nswi177/sysinfo.rc"
fi
if [ -z "${FORMATTER:-}" ]; then
    FORMATTER="${DEFAULT_FORMATTER:-normal}"
fi
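The precedence of the three sources can be verified with a small experiment. The sketch below wraps the loading logic in a hypothetical helper load_formatter that takes the configuration file path as an argument, so that we can use a temporary file instead of touching $HOME.

```shell
# Hypothetical helper wrapping the loading logic; $1 is the config file path.
load_formatter() {
    if [ -r "$1" ]; then
        . "$1"
    fi
    if [ -z "${FORMATTER:-}" ]; then
        FORMATTER="${DEFAULT_FORMATTER:-normal}"
    fi
    echo "$FORMATTER"
}

rc_file="$(mktemp)"
echo 'DEFAULT_FORMATTER=json' >"$rc_file"

# Each case runs in a subshell so the settings do not leak between them.
from_config="$(unset FORMATTER; load_formatter "$rc_file")"
from_env="$(FORMATTER=shell; load_formatter "$rc_file")"
no_config="$(unset FORMATTER DEFAULT_FORMATTER; load_formatter "$rc_file.missing")"

rm -f "$rc_file"
echo "config only: $from_config"
echo "environment: $from_env"
echo "nothing set: $no_config"
```

The environment variable wins over the configuration file, and the built-in default is used only when neither is present.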
Graded tasks (deadline: Apr 17)
IMPORTANT NOTE #1: the tasks below use intentionally simplified assumptions and target well-formatted input. If behaviour is not defined by the text, it is defined by the tests. Many cases are intentionally not defined and not tested – use common sense to define the behaviour in these cases.
IMPORTANT NOTE #2: do not forget to check your implementation by ShellCheck.
08/timeconv.sh (20 points)
Write a shell script that converts time in AM/PM format to 24-hour format.
The script reads stdin and prints the result to stdout. No arguments will be given and no arguments are expected to be recognized.
The script will find all occurrences of hh:mmAM or hh:mmPM and replace them with their 24-hour format equivalent.
Example input/output may look like this:
The event starts at 03:25PM and is expected to end at 06:17PM.
Registration will be opened from 09:00AM until 06:00 PM.
The event starts at 15:25 and is expected to end at 18:17.
Registration will be opened from 09:00 until 06:00 PM.
We expect that you will use separate expressions for individual PM hours as converting 03 to 15, 04 to 16 etc. directly in sed is not very straightforward.
But feel free to generate parts of the script if you like. Hint:
echo "49 50 51 52 53 54" | sed -e "$( for i in 50 51 52; do echo "s:$i:$(( i - 50 )):g"; done )"
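To understand the hint, it helps to look at the generated sed script before it is used. The sketch below stores it in a variable first; it is the same command as above, only split into two steps.

```shell
# Step 1: generate the sed script, one substitution per line.
script="$(for i in 50 51 52; do echo "s:$i:$(( i - 50 )):g"; done)"
echo "Generated script:"
echo "$script"

# Step 2: pass the generated script to sed.
result="$(echo "49 50 51 52 53 54" | sed -e "$script")"
echo "Result: $result"
```

The loop produces three independent substitutions, which sed then applies one after another to every input line.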
08/ip.sh (20 points)
Download here an excerpt of Apache access log. Basically, it is a list of files a web server was asked for (e.g. user typed their URL or clicked a link). This log file contains successful requests but also entries where the request was not satisfied, i.e. the file was not present (a.k.a. HTTP 404).
Some of the entries are genuine typos but some of them actually reveal that bots were trying to break into a WordPress installation (that was never present on the server anyway).
Each line contains IP address of the originator of the request, date, requested URL (together with method), error code, response size and user agent (browser identification).
Your script should read such a file on stdin and print the IP address of the machine that tried to access non-existent pages (look for 404) the most. No arguments will be given and no arguments are expected to be recognized.
Note that the tests operate on small fragments of the actual log file to simplify debugging. The link mentioned above serves as a demonstration of what you can actually encounter.
In a real-world setup, you would use a specialized tool for processing such logs in a more automated and structured way. However, grep and sed are perfect fits for a hobby server or if you need to operate in an isolated environment.
Note that we have randomly modified the IP addresses to preserve anonymity.
By the way, for the full log, the most offending (anonymized) IP address is 62.150.128.144.
08/normalize.sh (20 points)
Write a script that normalizes a given path.
The script will accept a single argument: the path to normalize. You can safely assume that the argument will always be provided.
The script will normalize the provided path in the following way:
- references to current directory ./ will be removed as they are redundant
- references to parent directory will be removed in such a way as not to change the actual meaning of the path (possibly repeatedly)
- the script will not convert relative to absolute path or vice versa
- the script will not check whether the file actually exists
The following examples illustrate the expected behaviour.
/etc/passwd ⇒ /etc/passwd
a/b/././c/d ⇒ a/b/c/d
/a/b/../c ⇒ /a/c
/usr/../etc/ ⇒ /etc/
You can assume that components of the path will not contain new-lines or other special characters such as :, ", ' or any kind of escape sequences.
Hint: sed ':x; s/abb/ba/; tx' causes s/abb/ba/ to be called repeatedly as long as a substitution is performed (:x defines a label while tx is a conditional jump to that label if the previous substitution changed the input). Try with echo 'abbbb' | sed ... .
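The effect of the loop from the hint is easiest to see when compared with a single substitution. The sketch below runs both on the suggested input.

```shell
# One substitution only: the first "abb" becomes "ba".
once="$(echo 'abbbb' | sed 's/abb/ba/')"

# With the label and the conditional jump the substitution is repeated
# until it no longer changes anything: abbbb -> babb -> bba.
looped="$(echo 'abbbb' | sed ':x; s/abb/ba/; tx')"

echo "once:   $once"
echo "looped: $looped"
```

Each pass may create a new match that was not present before, which is exactly why a single pass is not sufficient.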
08/markdown.sh (40 points)
Write a simple Markdown convertor to HTML.
We again intentionally simplify the syntax a lot: a full-fledged parser would generally work better here but the point of this task is to exercise your knowledge of basic regular expressions.
The convertor must support the following styles:
- Text with _emphasis of several words_. will be rendered as Text with <em>emphasis of several words</em>.
- Text with *strong emphasis*. will be rendered as Text with <strong>strong emphasis</strong>.
- Any >, < or & must be converted to HTML entities.
- Links in the form of [http://...|link text] will be converted to <a href="http://...">link text</a>.
  - URL will always start with http:// or https://
  - Characters <, >, & and " must be escaped inside the URL, i.e. they must be converted to their respective HTML entities.
The convertor shall ignore other common Markdown features such as paragraph detection or (ordered or unordered) list formatting.
We do not require and we will not test nesting of any of the above mentioned markups. Therefore, it is not necessary to handle situations such as some _emphasis *inside* another_ one or _special > characters_ etc.
You can also safely assume that formatting marks never span multiple lines. There can be several of them on one line, but they will not overlap.
The script reads stdin and prints the result to stdout. No arguments will be given and no arguments are expected to be recognized.
Learning outcomes
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …
- explain what a regular expression is
- explain why linters and style checkers should be used for source code checks
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …
- create and use simple regular expressions to filter text with grep
- use sed to perform text substitution
- use . and source
- use and interpret results of ShellCheck
- use and interpret results of Pylint
- execute BATS-based tests
- read BATS tests
- create simple BATS tests (optional)