Lab #7 | NSWI177 | D3S

Please, see latest news in issue #332 (from June 24).

Preflight checklist
Pandoc
Running example
Using && and || (logical program composition)
Shell variables
Command substitution (a.k.a. capturing stdout into a variable)
Functions in shell
Subshells and variable scoping
Arithmetic in the shell
Source code linting with ShellCheck
Tasks to check your understanding
Learning outcomes and after class checklist
This page changelog

The goal of this lab is to expand our knowledge of shell scripting. We will introduce variables and command substitution and see how to perform basic arithmetic in the shell. We will also learn how to detect issues in our scripts without even running them.

We will build this lab around a single example that we will incrementally develop, so that you learn the basic concepts on a practical example (obviously, there are specific tools that could be used instead, but we hope that this is better than a completely artificial example).

Our example will be built around building a small website from Markdown sources using Pandoc. We will describe Pandoc first and then describe our running example.

Preflight checklist

You always start your shell scripts with the right shebang (and executable bit).
You can read basic HTML.
You remember the purpose of a (program) exit code and you know what the value 0 signifies.

Pandoc

Pandoc is a universal document converter that can convert between various formats, including HTML, Markdown, Docbook, LaTeX, Word, LibreOffice, or PDF.

Ensure that your installation of Pandoc is reasonably up-to-date (i.e., at least version 3.0.1 that was released about two years ago).

Basic usage

Please, clone our example repository (or git pull it if you still have the clone around).

Move to the 07/pandoc subdirectory.

cat example.md
pandoc example.md

As you can see, the output is a conversion of the Markdown file into HTML, though without an HTML header.

Markdown can be combined with HTML directly (useful if you want a more complicated HTML code: Pandoc will copy it as-is).

<p>This is an example page in Markdown.</p>
<p>Paragraphs as well as <strong>formatting</strong> are supported.</p>
<p>Inline <code>code</code> as well.</p>
<p class="alert alert-primary">
Third paragraph with <em>HTML code</em>.
</p>

If you add --standalone, it generates a full HTML page. Let’s try it (both invocations will have the same end result):

pandoc --standalone example.md >example.html
pandoc --standalone -o example.html example.md

Try opening example.html in your web browser, too.

As mentioned, Pandoc can create OpenDocument, too (the format used mostly in the OpenOffice/LibreOffice suite).

pandoc -o example.odt example.md

Note that we have omitted the --standalone here as it is not needed for anything other than HTML output. Check what the generated document looks like in LibreOffice/OpenOffice, or import it into an online office suite.

You should not commit example.odt into your repository as it can be generated. That is a general rule for any file that can be created automatically.

Side note about LibreOffice

Did you know that LibreOffice can be used from the command line, too? For example, we can ask LibreOffice to convert a document to PDF via the following command:

soffice --headless --convert-to pdf example.odt

The --headless prevents opening any GUI and --convert-to should be self-explanatory.

Combined with Pandoc, three commands are enough to create an HTML page and PDF output from a single source.

Pandoc templates

By default, Pandoc uses its own default template for the final HTML. But we can change this template, too.

Look inside template.html. When the template is expanded (or rendered), the parts between dollar signs are replaced with the actual content.

Let’s try it with Pandoc.

pandoc --template template.html example.md >example.html

Check what the output looks like. Notice how $body$ and $title$ were replaced.

Further uses of Pandoc

Pandoc can be used even in more sophisticated ways, but the basic usage (including templates) is enough for our running example.

Pandoc supports conversion to and from LaTeX and plenty of other formats (try with --list-output-formats and --list-input-formats).

It can also be used as a universal Markdown parser with -t json (the Python call is not strictly needed as it only reformats the output).

echo 'Hello, **world**!' | pandoc -t json | python3 -m json.tool

Running example

Please, move to the 07/web subdirectory to see what files we have (still in your local clone of the examples repository).

Our example is a trivial website where the user edits Markdown files and we use Pandoc and a custom template to produce the final HTML. At this moment the final stage of the example is to produce HTML files that would be later copied to a web server.

If you look at the files, there are some Markdown sources and build.sh that creates the web.

Run it to see what the final result looks like.

We will now talk more about shell scripting and use our build.sh script to demonstrate how we can improve it.

Using `&&` and `||` (logical program composition)

Recall what a program exit (return) code is before continuing with this section.

Execute the following commands:

ls / && echo "ls okay"
ls /nonexistent-filename || echo "ls failed"

This is an example of how exit codes can be used in practice. We can chain commands to be executed only when the previous one failed or terminated with zero exit code.

Understanding the following is essential, because together with pipes and standard I/O redirection, it forms the basic building blocks of shell scripts.

First of all, we will introduce a syntax for conditional chaining of program calls.

If we want to execute one command only if the previous one succeeded, we separate them with && (i.e., it is a logical and). On the other hand, if we want to execute the second command only if the first one fails (in other words, execute the first or the second), we separate them with ||.

The example with ls is quite artificial as ls is quite noisy when an error occurs. However, there is also a program called test that is silent and can be used to compare numbers or check file properties. For example, test -d ~/Desktop checks that ~/Desktop is a directory. If you run it, nothing will be printed. However, in combination with && or ||, we can check its result.

test -d .git && echo "We are in a root of a Git project"
test -f README.md || echo "README.md missing"

This could be used as a very primitive branching in our scripts. In one of the next labs, we will introduce proper conditional statements, such as if and while.

Despite its silence, test is actually a very powerful command: it does not print anything but can be used to control other programs.

It is possible to chain several commands this way. Both && and || are left-associative and have the same priority.

Compare the following commands and how they behave when in a directory where the file README.md is or is not present:

test -f README.md || echo "README.md missing" && echo "We have README.md"
test -f README.md && echo "We have README.md" || echo "README.md missing"
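Because both operators have the same priority and group from the left, a chain such as a || b && c is evaluated as (a || b) && c. The following minimal sketch replaces test -f README.md with true and false (two standard commands that always succeed and always fail, respectively) so that the outcome does not depend on the files in your directory.

```shell
# "true" always succeeds (plays the role of an existing README.md),
# "false" always fails (plays the role of a missing README.md).

echo "First form, file present:"
true || echo "  README.md missing" && echo "  We have README.md"

echo "First form, file missing:"
false || echo "  README.md missing" && echo "  We have README.md"

echo "Second form, file present:"
true && echo "  We have README.md" || echo "  README.md missing"

echo "Second form, file missing:"
false && echo "  We have README.md" || echo "  README.md missing"
```

Only the second case of the first form misbehaves: it prints both messages, because the echo that reports the missing file succeeds and && then runs the last command as well.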

Extending the running example

You probably noticed that we get the last commit id (that is what git rev-parse --short HEAD does) and use it to create a footer for the web page (using the -A switch of Pandoc).

That works as long as we are part of a Git repository. Copy the whole web directory outside a Git repository and run build.sh again.

fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

We got an awful message and the web was not rebuilt.

If we change the line to the following, we ensure that the script can be executed outside of a Git project.

git rev-parse --short HEAD >>version.inc.html 2>/dev/null || echo "unknown" >>version.inc.html

Perhaps it is not perfect but at least the web can still be generated.

Shell variables

Variables in the shell are often called environment variables as they are (unlike variables of any other language) visible in other programs, too.

In this sense shell variables play two important roles. There are normal variables for shell scripts (i.e., variables with the same meaning as in other programming languages), but they can also be used to configure other programs.

We have already set the variable EDITOR that is used by Git (and other programs) to determine which editor to launch. That is, the variable controls behaviour of non-script programs.

Variables are assigned by the following construct:

MY_VARIABLE="value"

Note that there can be no spaces around = as otherwise shell would consider that as calling program MY_VARIABLE with arguments = and value.

The value is usually enclosed in quotes, but you can omit them if the value contains no spaces or other special characters. Generally, it is safer to always quote the value unless it looks like a C-style identifier.

To retrieve the value of the variable, prefix its name with the dollar sign $. Occurrences of $VARIABLE are expanded to the value of the variable. This is similar to how ~ is expanded to your home directory or wildcards are expanded to the actual file names. We will discuss expansion in more detail later.

Therefore we can print the value of a variable by the following:

echo "Variable MY_VARIABLE is $MY_VARIABLE."
# Prints Variable MY_VARIABLE is value.

Note that environment variables (i.e., those that are intended to be visible to other applications) are usually named in upper case. For purely shell variables (i.e., variables in your scripts that are not interesting to other applications) you may prefer lower case names. In both cases, the convention is usually for snake_case.

Unlike in other languages, shell variables are always strings. The shell has rudimentary support for arithmetic with integers encoded as strings of digits.

Bash also supports dictionaries and arrays. While they can be extremely useful, their usage often marks the boundary where using a higher-level language might make more sense with respect to maintainability of the code. We will not cover them in this course at all.

Extending the running example

Currently our files are generated to the same directory as our source files. That makes copying the HTML files to a web server error-prone as we might forget some file or copy source that is not really needed.

Let us change the code to copy the files to a separate directory. We will create public/ directory for that and modify the main part of our script to the following:

pandoc --template template.html -A version.inc.html index.md >public/index.html
pandoc --template template.html -A version.inc.html rules.md >public/rules.html

We should also add the following at the end of the script so that public contains all the required files.

cp main.css public/

All is good. Except the path is hard-coded in several places in the script. That might complicate maintenance later on.

But we can easily use a variable here to store the path and allow the user to change the target directory by modifying the path in one place.

html_dir="public"

...

pandoc --template template.html -A version.inc.html index.md >"$html_dir/index.html"
pandoc --template template.html -A version.inc.html rules.md >"$html_dir/rules.html"
cp main.css "$html_dir/"

It may seem like extra work with no real benefit. But remember that programs are perhaps written once but read many times, and any piece of information that better describes the intent of the code helps the reader.

Reading environment variables in Python (and `export`)

If we want to read a shell variable in Python, we can use os.getenv(). Note that this function has an optional argument (apart from the variable name) for a default value. Always specify the default or explicitly check for None – there is no guarantee that the variable has been set.

Note that you can also use os.environ.

By default, the shell does not make all variables available to Python (or any other application, for that matter). Only so-called exported variables are visible outside the shell. To make your variable visible, simply use one of the following (the first call assumes VAR was already set):

export VAR
export VAR="value"

It is also possible to export a variable only for a specific command using this shortcut:

VAR=value command_to_run ...

The variable is changed only for the duration of the command and returns to the original state afterwards.
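The difference between plain, exported and one-off variables can be observed directly from the shell. The sketch below uses sh -c to start a child process that prints what it sees; the variable name GREETING is only an illustration and is assumed not to be set in your environment already.

```shell
# GREETING is a plain shell variable: child processes do not see it.
GREETING="hello"
sh -c 'echo "child sees: [$GREETING]"'

# After export, every child process gets a copy of the variable.
export GREETING
sh -c 'echo "child sees: [$GREETING]"'

# One-off assignment: only this single command sees the changed value.
GREETING="ahoj" sh -c 'echo "child sees: [$GREETING]"'

# The value in our shell is untouched.
echo "shell still has: [$GREETING]"
```

The first call prints empty brackets, the second prints hello, the third prints ahoj, and the last line shows that the shell itself still holds hello.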

Extending the running example

To demonstrate this on our running example, we will use an environment variable to modify the table caption that is generated by table.py.

    caption = os.getenv('TABLE_CAPTION', 'Points')
    print(f"""
<table>
  <caption>{caption}</caption>
  <thead>
    <tr>
      <th>Team</th>
      <th>Points</th>
    </tr>
  </thead>
  <tbody>""")

And then we can change it in the build.sh script:

TABLE_CAPTION="Scoring table" ./table.py <score.csv | pandoc

Special variables and `set` and `env`

If you want to see the list of all exported variables, you can use env that prints their names together with their values.

For the list of all variables, you can execute set (again, as with cd, it is a shell built-in).

Note that some built-ins do not have their own man page but are instead described in man bash – in the manual page of the shell we are using.

There are several variables worth knowing that are usually present in any shell on any Linux installation:

$HOME refers to your home directory. This is what ~ (the tilde) expands to.
$PWD contains your current working directory.
$USER contains the name of the current user (e.g., intro).
$RANDOM contains a random number, different in each expansion (try echo $RANDOM $RANDOM $RANDOM).

`$PATH`

We already mentioned the $PATH variable. Now, it is the right time to explain it in detail.

There are two basic ways how to specify a command to the shell. It can be given as a (relative or absolute) path (e.g., ./script.sh or 01/dayname.py or /bin/bash), or as a bare name without slashes (e.g., ls).

In the first case, the shell just takes the path (relative to the working directory if needed) and executes the particular file. Of course, the file has to have its executable bit set.

In the second case, the shell looks for the program in all directories specified in the environment variable $PATH. If there are multiple matches, the first one is used. If there is none, the shell announces a failure.

The directories in $PATH are separated by a colon (:) and typically, $PATH would contain at least /usr/local/bin, /usr/bin, and /bin. Find out what your $PATH looks like (simply echo it on your terminal).

The concept of a search path exists in other operating systems, too. Unfortunately, they often use different separators (such as ;) because using colon may not be easily possible.

On such systems, however, programs are often installed into directories that are not listed in the search path, and thus you typically cannot run them from the command line easily.

Extra pro hint for Windows users: if you use Chocolatey, the programs will be in the $PATH and installing new software via choco will make the experience at least a bit less painful :-).

It is possible to add . (the current directory) to the $PATH. This would enable executing your script as just script.sh instead of ./script.sh. However, do not do that (even if it is a modus operandi on other systems). This thread explains several reasons why it is a bad idea.

In short: If you put it at the beginning of $PATH, you will likely execute random files in the current directory which just happen to be named like a standard command (this is a security problem!). If you put it at the end, you will likely execute standard commands you did not even know exist (e.g., test is a shell builtin).

However, it is very useful to create a subdirectory of your home directory (typically ~/bin), add it to the $PATH, and put all your useful scripts there.
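A minimal sketch of this setup follows. To keep your real home directory untouched, it uses a scratch directory demo-bin in the current directory as a stand-in for ~/bin; both demo-bin and the script name hello-demo are made up for this illustration.

```shell
# Hypothetical scratch directory that stands in for ~/bin.
mkdir -p demo-bin

# Create a tiny executable script called hello-demo inside it.
echo '#!/bin/sh' >demo-bin/hello-demo
echo 'echo "Hello from hello-demo"' >>demo-bin/hello-demo
chmod +x demo-bin/hello-demo

# Prepend the directory (as an absolute path) to $PATH.
PATH="$PWD/demo-bin:$PATH"

# The bare name is now resolved through $PATH.
hello-demo

# command -v prints which file the shell would run.
command -v hello-demo
```

The change lasts only for the current shell. For a permanent setup, the same kind of assignment (with "$HOME/bin") is usually placed in a shell startup file such as ~/.bashrc.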

`$PATH` and the shebang (why we need `env`)

The shebang requires the interpreter to be given as an absolute path. Sometimes, this can be inconvenient.

For this reason, Python scripts often use the /usr/bin/env python3 shebang. Here, env is a command that launches the program specified as the first argument (i.e., python3), looking for it in the $PATH.

Note that the script filename is appended as another argument, so everything works as one could expect.

This is something we have not mentioned earlier – the shebang can have one optional argument (but only one). It is added between the name of the interpreter and the name of the script.

Therefore, the env-style shebang causes the env program to run with parameters python3, path-to-the-script.py, and all other arguments. The env then finds python3 in $PATH, launches it and passes path-to-the-script.py as the first argument.

Note that this is the same env command we have used to print environment variables. Without any arguments, it prints the variables. With arguments, it runs the command.

Unix has a long history. Back in the 1970s, the primary purpose of env was to work with the environment. This included running a program within a modified environment, because the shell did not know about VAR=value command yet. Decades later, it was discovered that the side-effect of finding the program in the $PATH is much more useful :-).

We will see in a few weeks why it makes sense to search for Python in the $PATH instead of using /usr/bin/python3 directly.

The short version is that with env, you can modify the $PATH variable by some clever tricks and easily switch between different Python versions without any need to modify your code.

Script parameters

In Python, we access script parameters via sys.argv. In shell the situation is a bit more complicated and unfortunately it is one of the places where the design of the language/environment is somewhat lacking.

Shell uses special variables $1, $2, …, $9 to refer to individual arguments of the script; $0 contains the script name (there is no $10 variable and accessing these parameters is even more tricky; details later).

We will later see how we can parse arguments in the usual format of -d, -f ...; for now we will use the $1, $2, … variables directly.

Shell also offers a special variable "$@" that can be used to pass all current parameters to another program. We have explicitly used the quotes here as without them the argument passing can break for arguments with spaces.

As a typical example of using "$@" we will create a simple wrapper for Pandoc that adds some common options but allows the user to customize the call further.

#!/bin/bash

pandoc --self-contained --base-header-level=2 --strip-comments "$@"

Effectively, our call below would be translated like this.

./pandoc_wrapper.sh --standalone --template main.html input.md
# pandoc --self-contained --base-header-level=2 --strip-comments --standalone --template main.html input.md

Recall that if the user calls this script as ./pandoc_wrapper.sh <input.md things will work. The standard input is transparently sent into Pandoc. (Technically, the standard input is redirected for the wrapper script by shell and Pandoc inside the script inherits the file descriptor.)
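The effect of the quotes around "$@" is easy to try out. In the sketch below, set -- replaces the current positional parameters, which simulates a script called with two arguments (the file names are made up). The printf command repeats its format for every argument it receives, so each argument is printed in its own pair of brackets.

```shell
# set -- replaces the positional parameters, simulating a script
# that was called with two arguments (one of them contains a space).
set -- "my notes.md" "index.md"

echo 'With "$@":'
printf '  [%s]\n' "$@"

echo 'With unquoted $@ (do not do this):'
printf '  [%s]\n' $@
```

With the quotes, printf receives exactly the two original arguments. Without them, the first argument is split at the space and three arguments arrive.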

Uninitialized values and similar caveats

If you try to use a variable that was not initialized, shell will pretend it contains an empty string. While this can be useful, it can be also a source of nasty surprises.

As we mentioned earlier, you should always start your shell scripts with set -u to warn you about such situations.

However, you sometimes need to read from a potentially uninitialized variable to check if it was initialized. For example, we might want to read $EDITOR to get the user’s preferred editor, but provide a sane default if the variable is not set. This is easily done using the ${VAR:-default_value} notation. If VAR is set and not empty, its value is used; otherwise default_value is used. This does not trigger the warning produced by set -u.

So we can write:

"${EDITOR:-mcedit}" file-to-edit.txt

Frequently, it is better to handle the defaults at the beginning of a script using this idiom:

EDITOR="${EDITOR:-mcedit}"

Later in the script, we may call the editor using just:

"$EDITOR" file-to-edit.txt

Note that it is also possible to write ${EDITOR} to explicitly delimit the variable name. This is useful if you want to print a variable immediately followed by a letter:

file_prefix=nswi177-
echo "Will store into ${file_prefix}log.txt"
echo "Will store into $file_prefixlog.txt"

In the second case, the variable $file_prefixlog would be expanded.
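Returning to the ${VAR:-default_value} notation, the following sketch shows how it behaves together with set -u. The variable preferred_editor is hypothetical and used only for the demonstration; the parentheses around the first echo run it in a separate shell so that the expected failure does not terminate the whole example.

```shell
set -u

# preferred_editor is a hypothetical variable; make sure it is unset.
unset preferred_editor

# A plain expansion of an unset variable is an error under set -u.
# The parentheses keep the failure from terminating this script.
( echo "Editor: $preferred_editor" ) 2>/dev/null || echo "Plain expansion failed."

# The :- form falls back to the default and raises no error.
echo "Editor: ${preferred_editor:-mcedit}"

# An empty value is treated the same way as an unset one.
preferred_editor=""
echo "Editor: ${preferred_editor:-mcedit}"

# A non-empty value wins over the default.
preferred_editor="nano"
echo "Editor: ${preferred_editor:-mcedit}"
```

The last three lines of output name mcedit, mcedit and nano as the editor, in this order.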

Extending the running example

We will now extend our running example with several echos so that the script can print what it is doing.

This is a trivial piece of code that checks if the first argument is --verbose and if so, it sets the variable verbose to true.

#!/bin/bash

verbose=false
test "${1:-none}" = "--verbose" && verbose=true

...

The ${1:-none} syntax is either expanded to $1 (if it’s set) or to none (if not). The test then tests for equality of "$1" and "--verbose" (whose truth value depends on $1) or equality of "none" and "--verbose" (which is always false).

Such an approach would not work very well if we wanted to add more switches, but it is good enough for us now.

And now we can add the logging messages.

...

$verbose && echo "Reading current version..." >&2
echo "<p>Version:" >version.inc.html
git rev-parse --short HEAD >>version.inc.html 2>/dev/null || echo "unknown" >>version.inc.html
echo "</p>" >>version.inc.html

$verbose && echo "Generating HTML ..." >&2
pandoc --template template.html -A version.inc.html index.md >"$html_dir/index.html"
pandoc --template template.html -A version.inc.html rules.md >"$html_dir/rules.html"

...

How does the code above work? Hint: $verbose expands to either true or false, and both are ordinary commands with a fixed exit code.

Expansion of variables (and other such constructs)

We saw that the shell performs various types of expansion. It expands variables, wildcards, tildes, arithmetic expressions (see below), and many other things.

It is essential to understand how these expansions interact with each other. Instead of describing the formal process (which is quite complicated), we will show several examples to demonstrate typical situations.

We will call args.py from the previous labs to demonstrate what happens. (Of course you need to call it from the right directory.)

First, parameters are prepared (split) after variable expansion:

VAR="value with spaces"
args.py "$VAR"
args.py $VAR

Prepare files named one.sh and with space.sh for the following example:

VAR="*.sh"
args.py "$VAR"
args.py $VAR
args.py "\$VAR"
args.py '$VAR'

Run the above again but remove one.sh after assigning to VAR.

Tilde expansion (your home directory) is a bit more tricky:

VAR=~
echo "$VAR" '$VAR' $VAR
VAR="~"
echo "$VAR" '$VAR' $VAR

The take-away is that variable expansion is tricky. But it is always very easy to try it practically instead of remembering all the gotchas.

As a matter of fact, if you keep in mind that spaces and wildcards require special attention, you will be fine :-).
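If args.py is not at hand, the same observations can be made with printf, which repeats its format for every argument it receives. The sketch below is only an illustration: it creates a scratch directory named expansion-demo (a made-up name) so that the wildcard matches exactly two known files.

```shell
# Work in a scratch directory so that the wildcard matches known files only.
mkdir -p expansion-demo
cd expansion-demo
touch one.sh "with space.sh"

VAR="*.sh"

echo "Quoted (one argument, the literal pattern):"
printf '  [%s]\n' "$VAR"

echo "Unquoted (the wildcard is expanded to file names):"
printf '  [%s]\n' $VAR

echo "Single quotes (no expansion at all):"
printf '  [%s]\n' '$VAR'
```

Notice that the unquoted variant produces two arguments, one.sh and with space.sh: the file name containing a space is not split any further because wildcard expansion happens after the splitting.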

Extending the running example

We will do only a small change. We will replace the assignment to $html_dir with the following code.

html_dir="${html_dir:-public}"

What has changed? The script now keeps the value of html_dir if it was already set (for example, through the environment) and uses public only as a fallback.

We can now change the behaviour of the program by two means. The user can add --verbose or modify the variable html_dir. That is definitely not very user friendly. We should allow our script to be executed with --html=DIR to specify the output directory. We will get back to this in one of the later labs.

At this moment, take it as an illustration of what options are available. The use of html_dir="${html_dir:-public}" is a very cheap way to add customizability of the script that can be sufficient in many situations.

Command substitution (a.k.a. capturing stdout into a variable)

Often, we need to store output from a command into a variable. This also includes storing content of a file (or part of it) in a variable.

A prominent example is the use of the mktemp(1) command. It solves the problem with secure creation of temporary files (remember that creating a fixed-name temporary file in /tmp or elsewhere is dangerous). The mktemp command creates a uniquely-named file (or a directory) and prints its name to stdout. Obviously, to use the file in further commands, we need to store its name in a variable.

Shell offers the following syntax for the so-called command substitution:

my_temp="$( mktemp -d )"

The command mktemp -d is run and its output is stored into the variable $my_temp.

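Where is stderr stored? Nowhere: command substitution captures only stdout, so stderr goes wherever the stderr of the script goes (typically to the terminal).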

How would you capture stderr then?

For example like this:

my_temp="$( mktemp -d )"
stdout="$( the_command 2>"$my_temp/err.txt" )"
stderr="$( cat "$my_temp/err.txt" )"

...
# At the end of the script
rm -rf "$my_temp"

Command substitution is also often used in logging or when transforming filenames (use man pages to learn what basename and dirname do):

echo "I am running on $( uname -m ) architecture."

input_filename="/some/path/to/a/file.sh"
backup="$( dirname "$input_filename" )/$( basename "$input_filename" ).bak"
other_backup="$( dirname "$input_filename" )/$( basename "$input_filename" .sh ).bak.sh"
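To see what the individual pieces evaluate to, we can store them in separate variables first. This is only a sketch: the names dir_part, name_part and stem_part are made up, and the path does not have to exist because dirname and basename work purely on the text of the path.

```shell
input_filename="/some/path/to/a/file.sh"

dir_part="$( dirname "$input_filename" )"          # /some/path/to/a
name_part="$( basename "$input_filename" )"        # file.sh
stem_part="$( basename "$input_filename" .sh )"    # file

echo "backup:       $dir_part/$name_part.bak"
echo "other backup: $dir_part/$stem_part.bak.sh"
```

The two lines print /some/path/to/a/file.sh.bak and /some/path/to/a/file.bak.sh, respectively.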

Extending the running example

We will use command substitution to simplify version information generation.

echo "<p>Version: $( git rev-parse --short HEAD 2>/dev/null || echo unknown )</p>" >version.inc.html

The change is rather small but it makes the generation of the version.inc.html a bit more compact. We will improve readability of this piece of code with functions in the next section.

Functions in shell

Recall from your programming classes that functions have one main purpose.

Functions allow the developer to introduce a higher level of abstraction by naming a certain block of code, thus better capturing the intent of a larger piece of code.

Functions also reduce code duplications (i.e., the DRY principle: don’t repeat yourself) but that is mostly a side effect of creating new abstractions.

Functions in shell are rather primitive in their definition as there is never any formal list of arguments or return type specification.

function_name() {
    commands
}

A function has the same interface as a full-fledged shell script. Arguments are passed as $1, $2, …. The result of the function is an integer with the same semantics as the exit code. Thus, the () is there just to mark that this is a function; it is not a list of arguments.

Please consult the following section on variable scoping for details about which variables are visible inside a function.
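As a small illustration of this interface, consider the following two hypothetical functions (greet and is_empty are names made up for this sketch). The first one reads its argument from $1, the second one reports its result only through the exit code, so it can be combined with && and ||.

```shell
# Prints a greeting for the name given as the first argument.
greet() {
    echo "Hello, $1!"
}

# Succeeds (exit code 0) when the first argument is an empty string.
is_empty() {
    test -z "$1"
}

greet "world"

is_empty "" && echo "The first string is empty."
is_empty "text" || echo "The second string is not empty."
```

All three calls print their message: the function is called just like any other command, and its exit code drives the conditional chaining.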

Extending the running example

We will add several new functions to our example to make it a bit more useful.

We will start with the logging (use the man pages to learn what date does):

log_message() {
    echo "$( date '+build.sh | %Y-%m-%d %H:%M:%S |' )" "$@" >&2
}

Run the inner call to date by itself to see what it does (the key is that + at the beginning which informs date that we want to use a custom format).

And now we will replace the logging calls like this.

logger=":"
test "${1:-none}" = "--verbose" && logger=log_message

$logger "Reading current version..."
...
$logger "Generating HTML ..."

There are two tricks here. We have replaced true/false with direct calls to our function. Hence we do not need to have the conditional execution with && at all.

The second trick is the use of colon :. That is basically a special builtin that does nothing. But it still behaves as a command. So by setting logger to : or to log_message, we execute one of the following:

: "Reading current version"
log_message "Reading current version"

The second one calls the logger, the first one does nothing.

Voilà, our logging is complete.

On your own, wrap the version generation into a reasonable function.

On your own, wrap the calls of Pandoc to a suitable function.

Function return value

Calling return terminates function execution; the optional parameter of return is the exit code.

If you use exit within a function, it terminates the whole script.

The following is an example that checks whether a given file has the right Bash shebang.

is_shell_script() {
    test "$( head -n 1 "$1" 2>/dev/null )" = '#!/bin/bash' && return 0
    return 1
}

Because the exit code of the last program is also the exit code of the whole function, we can simplify the code to the following.

is_shell_script() {
    test "$( head -n 1 "$1" 2>/dev/null )" = '#!/bin/bash'
}

And such function can be used to control program flow:

is_shell_script "input.sh" || echo "Warning: shebang missing from input.sh" >&2

Note how good naming simplifies reading of the script above.

The same effect would be obtained by using the following code directly, but using a function allows us to capture the intent.

test "$( head -n 1 "input.sh" 2>/dev/null)" = '#!/bin/bash' || echo "Warning: shebang missing from input.sh" >&2

Local variables in functions

It is also a good idea to give a name to the function argument instead of referring to it by $1. You can assign it to a variable, but it is preferred to mark the variable as local (see details below):

is_shell_script() {
    local filename="$1"
    test "$( head -n 1 "$filename" 2>/dev/null )" = '#!/bin/bash'
}

The code is virtually the same. But by assigning $1 to a properly named variable we increase the readability: the reader immediately sees that the first argument is a filename.

Unfortunately, local is a non-standard extension (see discussion here why it was rejected). However, it is available in most of today's implementations, including Bash, ZSH or Dash (interesting notes can be found here).

Command precedence

You might notice that aliases, functions, built-ins, and regular commands are all called the same way. Therefore, the shell has a fixed order of precedence: aliases are checked first, then functions, then built-ins, and finally regular commands from $PATH. Regarding that, the built-ins command and builtin might be useful (e.g., for functions of the same name).
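The following sketch shows the precedence in practice: a function named like an existing command shadows it, and command lets us reach the original. The variable names are made up for the illustration. In Bash, builtin pwd would work similarly but considers shell built-ins only.

```shell
# A function named like a standard command shadows that command.
pwd() {
    echo "pwd function called"
}

from_function="$( pwd )"          # runs our function
from_command="$( command pwd )"   # skips functions, runs the real pwd

# Remove the function again so that pwd behaves normally afterwards.
unset -f pwd

echo "plain call:   $from_function"
echo "with command: $from_command"
```

The first line shows the text printed by our function, the second one shows the actual working directory.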

Take away

Despite many differences from functions in other programming languages, shell functions still represent the best way to structure your scripts.

A properly named function creates an abstraction and captures the intent of the script while also hiding implementation details.

Subshells and variable scoping

This section explains a few rules and facts about the scoping of variables and why some constructs might not work as expected.

Shell variables are global by default. All variables are visible in all functions, modification done inside a function is visible in the rest of the script, and so on.

It is often convenient to declare variables within functions as local, which limits the scope of the variable to the function.

More precisely, the variable is visible in the function and all functions called from it. You can imagine that the previous value of the variable is saved when you execute the local and restored upon return from the function. This is unlike what most programming languages do.

When you run another program (including shell scripts and Python programs), it gets a copy of all exported variables. When the program modifies the variables, the changes stay inside the program, not affecting the original shell in any way. (This is similar to how working directory changes behave.)

However, when you use a pipe, it is equivalent to launching a new shell: variables set inside the pipeline are not propagated to the outer code. (The only exception is that the pipeline gets even non-exported variables.)

Enclosing part of our script in ( .. ) creates a so-called subshell which behaves as if another script was launched. Again, variables modified inside this subshell are not visible to the outer shell (and also changes of working directory are not visible outside of it).

Read and run the following code to understand the mentioned issues.

variable="one"

change_globally() {
    echo "change_globally():"
    echo "  variable=$variable"
    variable="two"
    echo "  variable=$variable"
}

change_locally() {
    echo "change_locally():"
    echo "  variable=$variable"
    local variable="three"
    echo "  variable=$variable"
}

echo "variable=$variable"
change_globally
echo "variable=$variable"
change_locally
echo "variable=$variable"

(
    variable="four"
    echo "variable=$variable"
)

echo "variable=$variable"

echo | variable="five"
echo "variable=$variable"

Arithmetic in the shell

The shell is capable of basic arithmetic operations. It is good enough for computing simple sums, counting the numbers of processed files etc. If you want to solve differential equations, please choose a different programming language :-).

Simple calculations are done inside a special $(( )) environment:

counter=1
counter=$(( counter + 1 ))

Note that variables shall not be prefixed with a $ inside this environment. As a matter of fact, in most cases things will work even with $ (e.g., $(( $counter + 1 ))) but it is not a good habit to get into.
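A slightly larger sketch with made-up values (the variable names are illustrative). Note that the arithmetic is integer-only, so division drops the fractional part.

```shell
#!/bin/bash

files=7
per_page=3

# Integer division rounded up: how many pages do we need for all files?
pages=$(( (files + per_page - 1) / per_page ))
# Remainder after division.
rest=$(( files % per_page ))

echo "pages=$pages rest=$rest"    # prints: pages=3 rest=1
echo "$(( 7 / 2 ))"               # prints: 3
```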

Extending the running example

As a last change to our running example we will measure how long the execution was.

For that we will use date because with +%s it prints the number of seconds since the start of the Epoch.

As a matter of fact, all Unix systems internally measure time by counting seconds from 1st January 1970 (the start of the Epoch) and all displayed dates are computed from this value.

Therefore, the following three lines placed around the whole script give us the number of seconds spent running it (at the moment, the script should not take more than 1 second to complete, but we might have more pages or more data eventually).
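To see the relation between the counter and a displayed date, you can try the following sketch. It assumes GNU date (the usual one on Linux), where -d @N interprets N as seconds since the Epoch. Other implementations of date may use a different option.

```shell
#!/bin/bash

# Assumption: GNU date; -u prints the time in UTC.
date -u -d @0 '+%Y-%m-%d %H:%M:%S'        # prints: 1970-01-01 00:00:00
date -u -d @86400 '+%Y-%m-%d %H:%M:%S'    # prints: 1970-01-02 00:00:00

now="$( date +%s )"
echo "Seconds since the Epoch: $now"
```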

#!/bin/bash

wallclock_start="$( date +%s )"

...

wallclock_end="$( date +%s )"

$logger "Took $(( wallclock_end - wallclock_start )) seconds to generate."

Source code linting with ShellCheck

You have already written quite a lot of shell scripts. It is thus time to introduce you to ShellCheck.

ShellCheck is a tool that checks your shell scripts for common issues. These issues are neither syntax errors nor logical errors. The issues raised by ShellCheck are patterns that are well known to cause unexpected behavior, degrade performance, or even hide some nasty surprises.

One such example could be if your script contains the following snippet.

cat input.txt | cut -d: -f 3

Do you know what could possibly be wrong?

Technically, this code is correct and by itself does not contain any bug. However, the first cat is redundant as it prints one file only: the code can be reduced to the following form without change of functionality:

cut -d: -f 3 <input.txt

As you can see, this is essentially harmless.

But it might mean that you wanted to cat multiple files or that cat is a left-over from a previous version. Thus ShellCheck will warn you.

Another issue where ShellCheck helps is the following code:

dir_name=results/
test -d $dri_name && echo "$dir_name already exists."

Here ShellCheck will actually detect the typo as dri_name was not assigned before.
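The typo is more dangerous than it looks. The sketch below uses a directory name that we assume does not exist. Because the misspelled variable is empty and unquoted, the command becomes just test -d, and test with a single argument only checks that the argument is a non-empty string.

```shell
#!/bin/bash

# Assumption: no directory of this name exists.
dir_name="no-such-directory-$$/"

# Typo: $dri_name expands to nothing, the command is just `test -d`,
# which succeeds, so the message is printed although nothing exists.
test -d $dri_name && echo "typo: $dir_name already exists."

# Fixed and quoted: the test fails and nothing is printed.
test -d "$dir_name" && echo "fixed: $dir_name already exists."
```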

Another trap awaits in the following code:

printf "dir_name=$dir_name\n"

This works fine unless $dir_name contains a percent sign (or a backslash), because printf interprets its first argument as a format string. That will probably never happen.

So this is a correct piece of shell code, but it might break on a special value. Here comes ShellCheck to help you.
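A sketch of the usual fix, with a deliberately nasty value: keep the format string constant and pass the value as a separate argument.

```shell
#!/bin/bash

dir_name="100%_done/"

# Risky: the value becomes part of the format string, so % is interpreted.
# printf "dir_name=$dir_name\n"

# Safe: constant format string, the value is passed as an argument.
printf 'dir_name=%s\n' "$dir_name"    # prints: dir_name=100%_done/
```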

ShellCheck is able to warn you about hundreds of possible issues as can be seen on this page. Get into the habit of running it on your shell scripts regularly.

In our practice, ShellCheck seldom gives false positives, but it saved us many times.

We expect you will use ShellCheck regularly and also use it to check your solution of the exam shell task.

Running ShellCheck

Running ShellCheck is really easy.

shellcheck ssg.sh

If you also want to see style hints, add -o all or use -i for more selective checks.

shellcheck -o all ssg.sh

Exercise

Go back to your submitted shell scripts and run ShellCheck on them. Fix all the errors found or reason if leaving them in is alright.
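If you decide that a reported issue is acceptable, you can record that decision directly in the script with a ShellCheck directive comment placed before the command. The sketch below silences SC2002 (the useless cat warning) for a single command. The reason given in the comment is only an illustration.

```shell
#!/bin/bash

# We keep the cat on purpose: more input files will be added here later.
# shellcheck disable=SC2002
cat /etc/passwd | cut -d: -f 1
```

The directive is an ordinary comment for the shell, so the script behaves exactly as before.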

Other languages

Similar tools exist for other languages.

Pylint is such a tool for Python: it can detect plenty of issues and is also highly customizable.

As an exercise, find such tooling for your own language and start using it regularly. Many tools also contain IDE extensions for better user experience.

The important take away

Start using ShellCheck, Pylint or any other tool for your favorite language.

It will not detect logical errors (at least not all of them), but it will surely detect so-called code smells: places in your code that often lead to errors, undefined behaviour, or similar issues.

This is doubly important if you are new to some language: the chances are that you misunderstood some feature rather than the tool being wrong.

Tasks to check your understanding

We expect you will solve the following tasks before attending the labs so that we can discuss your solutions during the lab.

Return to the examples from Lab 04 and decide where adding a function to the implementation would improve the readability of the script.

Print information about the last commit. When the script is executed in a directory that is not part of any Git project, it shall print only Not inside a Git repository. Hint. Solution.

The command getent passwd USERNAME prints the information about user account USERNAME (e.g., intro) on your machine. Write a command that prints information about user intro or a message This is not NSWI177 disk if the user does not exist. Solution.

The script will print to stdout contents of a file HEADER (in the working directory).

However, if a file .NO_HEADER exists in the current directory, nothing will be printed (even if HEADER exists).

If neither of the files exists, the program should print Error: HEADER not found. on standard error and terminate with exit status 1.

Otherwise, the script will terminate with success.

Use only && and || to control program flow, do not use if even if you happen to know these constructs in shell. It is okay to get information about file existence several times in the script, we will not modify the files while your script is running.

This example can be checked via GitLab automated tests. Store your solution as 07/override.sh and commit it (push it) to GitLab.

The script will print modification date (%Y) of a file given to it as its first argument.

The modification date should be printed in YYYY-MM-DD format. If the file does not exist (or there is some other issue in reading the modification time), the program should terminate with a non-zero exit code.

Hint: stat, date.

This example can be checked via GitLab automated tests. Store your solution as 07/mod_date.sh and commit it (push it) to GitLab.

Create a shell script for performing simple backups.

The script takes a single filename as an argument and creates its copy in the directory given by an environment variable BACKUP_DIR (or ~/backup if not set) with names in the form YYYY-MM-DD_hh-mm-ss_~ABSOLUTE~PATH~TO~FILE.

The timestamp will refer to the current date. Replace / with ~ (tilde) in the absolute path of the original file (realpath may be useful.)

The script will print the path with the backup file to stdout.

Example use:

export BACKUP_DIR=~/my_backup
cd /home/intro/my_dir
../path/to/07/backup.sh a.zip

Example output:

/home/intro/my_backup/2025-03-08_10-01-23_~home~intro~my_dir~a.zip

You may use the script for fast temporary backups of your current work, and clear the backup directory from time to time.

Note that we expect use of cp -R so that the script will work even for directories.

Automated tests always set $BACKUP_DIR to prevent polluting your home directory. We expect you will thoroughly test the script yourself for invocations where backup to $HOME happens.

The automated tests will convert the date to a special string before checking their presence. If you see a mismatch of filenames containing DATE-NORMALIZED in the filename, you probably created the date format incorrectly.

This example can be checked via GitLab automated tests. Store your solution as 07/backup.sh and commit it (push it) to GitLab.

Create a shell function to speed up the generation of the web from our running example.

At this moment, we generate the pages every time the script is executed.

Your task is to add a function should_generate that takes one argument, the filename of the source file (i.e., the .md file), and returns (i.e., sets its exit code to) 0 if we need to generate the .html file or 1 if there is no need to generate the file.

Whether we need to generate the file or not is determined simply by checking whether the .html file exists and whether the .md file is newer than the .html file (hence Markdown was modified after the last HTML generation and we should rebuild). Both of these checks are offered by the test(1) command.

We assume that the main part of the program would be modified to the following:

should_generate index.md && run_pandoc index.md >"index.html"
should_generate rules.md && run_pandoc rules.md >"rules.html"

Your task is to store only the function should_generate into 07/should_generate.sh. Do not insert anything else there: our tests will provide the rest of the script.

Of course, for your testing, define this function inside build.sh and then copy it to should_generate.sh when you are done.

To simplify the assignment, we assume that both .md and .html files are in the same directory and you can safely assume that you will always receive only base filename, i.e. no need to handle subdir/index.md checking for subdir/index.html.

Hint: basename index.md .md.

This example can be checked via GitLab automated tests. Store your solution as 07/should_generate.sh and commit it (push it) to GitLab.

Learning outcomes and after class checklist

This section offers a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).

Conceptual knowledge

Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …

explain what is an environment variable
explain how variable scoping works in shell
explain the difference between a normal and exported shell variable
explain how $PATH variable is used in shell
explain how changing $PATH affects program execution
explain how shell expansion and splitting into command-line arguments is performed
explain concurrency issues that can occur when using temporary files
explain what is a linter and style checker
explain what kind of issues can be detected by style checkers
optional: explain why current directory is usually not part of $PATH variable

Practical skills

Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …

use Pandoc to convert between various text formats
set (assign) and read environment variables
compute mathematical expressions directly in shell using $(( )) construct
use command substitution ($( ))
use composition operands && and || in shell scripts
create and use shell functions
use subshell to group multiple commands
use and interpret results of ShellCheck
use temporary files securely in shell scripts
optional: read environment variables in Python
optional: create custom templates for Pandoc

This page changelog

2025-04-07: Update ShellCheck examples.
