Shell scripting (5) | Labs | NSWI177

Information below is not for the current semester. The current semester can be found here.

This is material for both labs 05 and 06. Each of the two labs has its own before-class quiz, but the graded (scripting) tasks are merged into a single set worth twice the usual number of points.

In this lab we will almost complete our tour of shell scripting: you will learn about conditions and loops in the shell, and about variables. And much more.

It is important to note that all of the topics below are usable both in scripts (i.e., non-interactively) and when using the terminal interactively. In particular, for loops over lists of files are often used directly on the command line without enclosing them in a full-fledged script.

Do not forget that the Before class reading is mandatory and there is a quiz that you are supposed to complete before coming to the labs.

This reading is before the fifth lab.

Before class reading

Shell variables

Variables in shell are often called environment variables as they are (unlike variables of any other language) visible in other programs too. We have already set variable EDITOR that is used by Git to determine which editor to launch.

Variables are assigned by the following construct:

MY_VARIABLE="value"

Note that there can be no spaces around = as otherwise shell would consider that as calling program MY_VARIABLE with arguments = and value.

The value is usually enclosed in quotes, you can omit them if the value contains no spaces or other special characters. Generally, it is safer to always quote the value.

To retrieve the value of the variable, prefix its name with the dollar sign $. Occurrences of $VARIABLE are expanded to the value of the variable. This is similar to how ~ is expanded to your home directory or wildcards are expanded to the actual file names. We will discuss expansion in more detail during this lab.

Therefore we can print the value of a variable by the following:

echo "Variable MY_VARIABLE is $MY_VARIABLE."
# Prints Variable MY_VARIABLE is value.

Note that environment variables (i.e., those that are intended to be visible to other applications) are usually named in upper case. For purely shell variables (i.e., variables in your scripts that are not interesting to other applications) you may prefer lower case names. In both cases, the convention is usually for snake_case.

Unlike in other languages, shell variables are always strings. The shell has rudimentary support for arithmetic with integers encoded as strings of digits.

Reading environment variables in Python (and `export`)

If we want to read a shell variable in Python, we can use os.getenv(). Note that this function has an optional argument (apart from the variable name) for a default value. Always specify the default or explicitly check for None – there is no guarantee that the variable has been set.

Note that you can also use os.environ.

By default, the shell does not make all variables available to Python (or any other application, for that matter). Only so-called exported variables are visible outside the shell. To make your variable visible, simply use one of the following (the first call assumes VAR was already set):

export VAR
export VAR="value"

It is also possible to export a variable only for a specific command using this shortcut:

VAR=value command_to_run ...

The variable is changed only for the duration of the command and returns to the original state afterwards.
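
The difference between exported and non-exported variables can be tried directly in the shell. The following is only a minimal sketch: the variable names LOCAL_ONLY, SHARED and ONCE are made up for this illustration, and sh -c merely stands in for any other program that is started from the shell.

```shell
# Hypothetical variable names, used only for this demonstration.
LOCAL_ONLY="not exported"
export SHARED="exported"

# sh -c starts a child shell; the single quotes make sure that the child
# (not the current shell) expands the variables.
seen_local="$( sh -c 'echo "[$LOCAL_ONLY]"' )"
seen_shared="$( sh -c 'echo "[$SHARED]"' )"
echo "Child sees LOCAL_ONLY=$seen_local and SHARED=$seen_shared"

# A variable given as a prefix is visible to that one command only.
seen_once="$( ONCE="just once" sh -c 'echo "[$ONCE]"' )"
seen_after="$( sh -c 'echo "[$ONCE]"' )"
echo "With prefix: $seen_once, afterwards: $seen_after"
```

The empty brackets show where the child process did not receive the variable at all.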

Arithmetic in the shell

The shell is capable of basic arithmetic operations. It is good enough for computing simple sums, counting the numbers of processed files etc. If you want to solve differential equations, please choose a different programming language.

Simple calculations are done inside a special $(( )) environment:

counter=1
counter=$(( counter + 1 ))

Note that variables shall not be prefixed with a $ inside this environment. As a matter of fact, in most cases things will work even with $ (e.g. $(( $counter + 1 ))) but it is not a good habit to get into.
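
To get a feeling for what the arithmetic environment can do, here is a small sketch with made-up variable names (files_done, files_failed and so on are not used elsewhere in this text). It also shows that division works on integers only.

```shell
# Hypothetical counters, e.g. collected while processing files.
files_done=3
files_failed=2

total=$(( files_done + files_failed ))
percent_ok=$(( 100 * files_done / total ))
echo "Processed $total files, $percent_ok % without errors."

# Integer division drops the fractional part; % gives the remainder.
half=$(( 7 / 2 ))
rest=$(( 7 % 2 ))
echo "7 / 2 = $half, remainder $rest"
```

Multiplying before dividing (100 * files_done / total) keeps the precision that an integer-only computation would otherwise lose.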

Special variables and `set` and `env`

If you want to see the list of all exported variables, you can use env that prints their names together with their values.

For the list of all variables, you can execute set.

You may be wondering how it is possible that the set command sees variables which are not exported. The truth is that some commands (such as set or cd) are handled directly by the shell itself instead of calling an external program. These commands are called shell built-ins.

Note that some built-ins do not have their own man page but are instead described in man bash – in the manual page of the shell we are using.

There are several variables worth knowing that are usually present in any shell on any Linux installation:

$HOME refers to your home directory. This is what ~ (the tilde) expands to.
$PWD contains your current working directory.
$USER contains the name of the current user (e.g. intro).
$RANDOM contains a random number, different in each expansion (try echo $RANDOM $RANDOM $RANDOM).

$PATH

We already mentioned the $PATH variable. Now, it is the right time to explain it in detail.

There are two basic ways how to specify a command to the shell. It can be given as a (relative or absolute) path (e.g., ./script.sh or 01/factor.py or /bin/bash), or as a bare name without slashes (e.g., ls).

In the first case, the shell just takes the path (relative to the working directory) and executes the particular file. Of course, the file has to have its executable bit set.

In the second case, the shell looks for the program in all directories specified in the environment variable $PATH. If there are multiple matches, the first one is used. If there is none, the shell announces a failure.

The directories in $PATH are separated by a colon : and typically, $PATH would contain at least /usr/local/bin, /usr/bin, and /bin. Find out what your $PATH looks like (simply echo it in your terminal).

The concept of a search path exists in other operating systems, too. Unfortunately, they often use different separators (such as ;) because using colon may not be easily possible.

On such systems, however, programs are often not installed into the directories listed in the search path, and thus you typically cannot run them from the command line easily. Extra pro hint for Windows users: if you use Chocolatey, the programs will be in the $PATH and installing new software via choco will make the experience at least a bit less painful :-).

It is possible to add . (the current directory) to the $PATH. This would enable executing your script as just script.sh instead of ./script.sh. However, do not do that (even if it is a modus operandi on other systems). This thread explains several reasons why it is a bad idea.

In short: If you put it at the beginning of $PATH, you will likely execute random files in the current directory which just happen to be named like a standard command (this is a security problem!). If you put it at the end, standard commands that you did not even know existed will be executed instead of your own scripts of the same name (e.g., test is a shell builtin).

However, it is very useful to create a subdirectory of your home directory (typically ~/bin), add it to the $PATH, and put all your useful scripts there.
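
A minimal sketch of both ideas follows: asking the shell where a bare name would be found, and extending the search path. It assumes that ~/bin is the directory you chose; a directory that does not exist yet is simply skipped during the search. The change below lasts only for the current shell. To make it permanent, such a line is usually placed in a shell start-up file (for Bash typically ~/.bashrc, but check your own configuration).

```shell
# Ask the shell where it finds a bare command name.
# (command -v is a standard built-in; it prints the path that would be used.)
ls_path="$( command -v ls )"
echo "ls is found at: $ls_path"

# Append ~/bin to the search path of the current shell and its children.
# Appending keeps the system directories first, so standard commands still win.
export PATH="$PATH:$HOME/bin"

# Print one directory per line to check the result.
echo "$PATH" | tr ':' '\n'
```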

`$PATH` and the shebang (why we need `env`)

The shebang requires the interpreter to be given as an absolute path. Sometimes, this can be inconvenient.

For this reason, Python scripts often use the /usr/bin/env python3 shebang. Here, env is a command that launches the program specified as the first argument (i.e., python3), looking for it in the $PATH.

Note that the script filename is appended as another argument, so everything works as one could expect.

This is something we have not mentioned earlier – the shebang can have one optional argument (but only one). It is added between the name of the interpreter and the name of the script.

Therefore, the env-style shebang causes the env program to run with parameters python3, path-to-the-script.py, and all other arguments. The env then finds python3 in $PATH, launches it and passes path-to-the-script.py as the first argument.

Note that this is the same env command we have used to print environment variables. Without any arguments, it prints the variables. With arguments, it runs the command.

Unix has a long history. Back in the 1970s, the primary purpose of env was to work with the environment. This included running a program within a modified environment, because the shell did not know about VAR=value command yet. Decades later, it was discovered that the side-effect of finding the program in the $PATH is much more useful :-).

We will see in a few weeks why it makes sense to search for Python in the $PATH instead of using /usr/bin/python3 directly.

The short version is that with env, you can modify the $PATH variable by some clever tricks and easily switch between different Python versions without any need to modify your code.

Control flow in shell scripts

Let us discuss control flow structures in the shell: conditions and loops.

Before diving into that, let us mention that multiple commands can be separated by ; (the semicolon). While in the shell scripts it is preferable to write one command per line, interactive users often find it easier to have multiple commands on one line (even if only to allow faster history browsing with the up arrow).

We will see semicolons at various places in the control flow structures, serving as a separator.

`for` loops

For loops in the shell always iterate over a set of values provided when the loop starts.

The general format is as follows:

for VARIABLE in VAL1 VAL2 VAL3; do
    body of the loop
done

Typical uses include iterating over a list of files, often generated by expanding wildcards.

Let us see an example that counts the number of digits in all *.txt files:

for i in *.txt; do
    echo -n "$i: "
    tr -c -d '0-9' <"$i" | wc -c
done

Notice that the for statement is given the variable name i without a $. We also see that variable expansion can be used in redirection of stdin (or stdout).

When writing this in the shell, the prompt would change to plain > (probably, depending on your configuration) to signal that you are expected to enter the rest of the loop.

Squeezing the whole loop into one line is also possible (but useful only for fire-and-forget type of scripts):

for i in *.txt; do echo -n "$i: "; tr -c -d '0-9' <"$i" | wc -c; done

`if` and `else`

The if condition in the shell is a bit more tricky. The essential thing to remember is that the condition is always a command to be executed and its outcome (i.e., the exit code) determines the result. So the condition is actually never in the traditional format of a equals b as it is always the exit code that controls the flow.

The general syntax of the condition is this:

if command_to_control_the_condition; then
    success
elif another_command_for_else_if_branch; then
    another_success
else
    the_else_branch_commands
fi

Note that if has to be terminated by fi and that elif and else branches are optional.

Simple conditions can be evaluated using the test command. For example, test -d NAME returns exit code 0 if a directory called NAME exists; otherwise it returns 1. It can test many other things, for example compare strings or numbers – see man test for more.

Let us see how to use if with test to check whether we are inside a Git project:

if test -d .git; then
    echo "We are in the root of a Git project"
fi

In fact, there exists a more elegant syntax: [ (left bracket) is a synonym for test which does the same, except that it requires that the last argument is ]. Using this syntax, our example can look as follows:

if [ -d .git ]; then
    echo "We are in the root of a Git project"
fi

Still, [ is just a regular command whose exit code determines what if shall do.
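
Apart from checking files, test (or [) is most often used to compare strings and numbers. The following sketch shows the usual operators; the variables and their values are made up for the illustration.

```shell
name="intro"     # hypothetical values, for illustration only
count=7
empty=""

# Strings are compared with = and !=.
if [ "$name" = "intro" ]; then
    string_check="equal"
else
    string_check="different"
fi

# Integers are compared with -eq, -ne, -lt, -le, -gt and -ge.
if [ "$count" -gt 5 ]; then
    number_check="greater than 5"
else
    number_check="at most 5"
fi

# -z succeeds for an empty string, -n for a non-empty one.
if [ -z "$empty" ]; then
    empty_check="empty"
else
    empty_check="non-empty"
fi

echo "$string_check / $number_check / $empty_check"
```

Note the quotes around the variables: without them, an empty value would disappear from the argument list and [ would complain about a missing operand.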

By the way, look into /usr/bin to see that the application file is really named [. But Bash (our shell) also implements [ as a builtin, so it is a little bit faster than executing an external program.

You can also encounter the following snippet:

if [[ -d .git ]]; then
    echo "We are in the root of a Git project"
fi

This [[ ... ]] is a different construct, closely related to the $(( ... )) syntax for arithmetic expressions. The condition is evaluated by Bash itself. This syntax is a little bit more powerful, but it is limited to recent versions of Bash, so it is unlikely to work in other shells.

We will be using the traditional variant with [ only.

`while` loops

While loops have the following form:

while command_to_control_the_loop; do
    commands_to_be_executed
done

Again, the condition is true if the command_to_control_the_loop returns with exit code 0.

The following example finds the first available name for a log file. Note that this code is not immune against races when executed concurrently. That is, it assumes it can be run multiple times, but never in more processes at the same time.

counter=1
while [ -f "/var/log/myprog/main.$counter.log" ]; do
    counter=$(( counter + 1 ))
done
logfile="/var/log/myprog/main.$counter.log"
echo "Will log into $logfile" >&2

To make the program race-resistant (i.e., against concurrent execution), we would need to use mkdir that fails when the directory already exists (i.e., it is atomic enough to distinguish if we were successful and are not just stealing someone else’s file).

Note that the loop below uses the exclamation mark ! to invert the outcome (exit code) of mkdir.

counter=1
while ! mkdir "/var/log/myprog/log.$counter"; do
    counter=$(( counter + 1 ))
done
logfile="/var/log/myprog/log.$counter/main.log"
echo "Will log into $logfile" >&2

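Note that there is also an until loop in the shell. It has the same form as while (until command; do ...; done), but the body is repeated as long as the command fails. If you need such a loop, please consult the manual for details.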
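
For completeness, a minimal sketch of the until loop in the shell: it is written as until command; do ...; done and it repeats the body as long as the command fails. The counter is just an illustration.

```shell
counter=1
# The body runs as long as the condition command FAILS (non-zero exit code).
until [ "$counter" -gt 3 ]; do
    echo "Iteration $counter"
    counter=$(( counter + 1 ))
done
echo "Loop ended with counter=$counter"
```

The same loop could be written as while ! [ "$counter" -gt 3 ]; which variant reads better is mostly a matter of taste.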

`break` and `continue`

As in other languages, the break command is available to terminate the currently executing loop. You can use continue as usual, too.
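
A short sketch of both commands inside a for loop follows. The numbers and the variable checked are arbitrary and serve only to make the effect visible.

```shell
checked=""
for n in 1 2 3 4 5 6; do
    if [ "$n" -eq 2 ]; then
        continue    # skip the rest of the body for this value
    fi
    if [ "$n" -eq 5 ]; then
        break       # leave the loop completely
    fi
    checked="$checked $n"
done
echo "Processed:$checked"
```

The value 2 is skipped, and the loop never gets to 5 and 6.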

Switch (a.k.a. `case ... esac`)

When we need to branch our program based on a variable value, the shell offers the case construct. It is somewhat similar to the switch construct in other languages, but it has a few shell specifics mixed in.

The overall syntax is the following:

case value_to_branch_on in
    option1) commands_for_option_one ;;
    option2) commands_for_option_two ;;
    *) the_default_branch ;;
esac

Notice that like with if, we terminate with the same keyword reversed and that there are two semicolons ;; to terminate the commands for a particular option.

A simple example can look like this:

case "$EDITOR" in
    mcedit) echo 'Midnight Commander rocks' ;;
    joe) echo 'Small but powerful' ;;
    vim|emacs) echo 'Wow :-)' ;;
    *) echo "Someone really uses $EDITOR?" ;;
esac
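
One of the shell specifics is that the options are patterns using the same wildcards as file names, not only fixed strings. The following sketch classifies a few made-up file names (they do not need to exist, they are treated as plain strings here).

```shell
summary=""
for name in notes.txt README.md factor.py .gitignore Makefile; do
    # The first matching pattern wins, so the order of branches matters.
    case "$name" in
        *.txt|*.md) kind="text" ;;
        *.py)       kind="Python" ;;
        .*)         kind="hidden" ;;
        *)          kind="unknown" ;;
    esac
    echo "$name: $kind"
    summary="$summary $kind"
done
```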

Before class quiz

The quiz file is available in the 05 folder of this GitLab project.

Copy the right language mutation into your project as 05/before.md (i.e., you will need to rename the file).

The questions and answers are part of that file, fill in the answers in between the **[A1]** and **[/A1]** markers.

The before-05 pipeline on GitLab will test that your answers are in the correct format. It does not check for actual correctness (for obvious reasons).

Submit your before-class quiz before start of lab 05.

More about variables

You have seen that for simple cases, the following is sufficient to create and use a variable in your scripts.

output_file="out.txt"
echo "Writing to $output_file." >&2
head -n 1 /etc/passwd >"$output_file"

Uninitialized values and similar caveats

If you try to use a variable that was not initialized, shell will pretend it contains an empty string. While this can be useful, it can be also a source of nasty surprises.

As we mentioned earlier, you should always start your shell scripts with set -u to warn you about such situations.

However, you sometimes need to read from a potentially uninitialized variable to check if it was initialized. For example, we might want to read $EDITOR to get the user’s preferred editor, but provide a sane default if the variable is not set. This is easily done using the ${VAR:-default_value} notation. If VAR was set, its value is used, otherwise default_value is used. This does not trigger the warning produced by set -u.

So we can write:

"${EDITOR:-mcedit}" file-to-edit.txt

Frequently, it is better to handle the defaults at the beginning of a script using this idiom:

EDITOR="${EDITOR:-mcedit}"

Later in the script, we may call the editor using just:

"$EDITOR" file-to-edit.txt

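Note that it is also possible to write ${EDITOR} to explicitly delimit the variable name. This is useful if you want to print a variable immediately followed by a letter:

file_prefix=nswi177-
# Correct: the braces mark where the variable name ends.
echo "Will store into ${file_prefix}log.txt"
# Wrong: the shell looks for a variable named file_prefixlog.
echo "Will store into $file_prefixlog.txt"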

Expansion of variables (and other such constructs)

We saw that the shell performs various types of expansion. It expands variables, wildcards, tildes, arithmetic expressions, and many other things.

It is essential to understand how these expansions interact with each other. Instead of describing the formal process (which is quite complicated), we will show several examples to demonstrate typical situations.

We will call args.py from the previous labs to demonstrate what happens. (Of course you need to call it from the right directory.)

First, parameters are prepared (split) after variable expansion:

VAR="value with spaces"
args.py "$VAR"
args.py $VAR

Prepare files named one.sh and with space.sh for the following example:

VAR="*.sh"
args.py "$VAR"
args.py $VAR
args.py "\$VAR"
args.py '$VAR'

Run the above again but remove one.sh after assigning to VAR.

Tilde expansion (your home directory) is a bit more tricky:

VAR=~
echo "$VAR" '$VAR' $VAR
VAR="~"
echo "$VAR" '$VAR' $VAR

The important take-away is that variable expansion is tricky. But it is always very easy to try it practically instead of remembering all the gotchas. As a matter of fact, if you keep in mind that spaces and wildcards require special attention, you will be fine :-).
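
If args.py is not at hand, a small shell function can stand in for it. The sketch below is such a stand-in (the name show_args is made up and it is not the script from the previous labs). It prints the number of arguments followed by each argument in brackets, which makes the effect of quoting on splitting visible.

```shell
# Hypothetical stand-in for args.py: prints the argument count
# followed by every argument in brackets.
show_args() {
    printf '%d:' "$#"
    for arg in "$@"; do
        printf ' [%s]' "$arg"
    done
    printf '\n'
}

VAR="value with spaces"
quoted="$( show_args "$VAR" )"      # one argument, spaces preserved
unquoted="$( show_args $VAR )"      # split into three arguments
single="$( show_args '$VAR' )"      # no expansion at all

echo "$quoted"
echo "$unquoted"
echo "$single"
```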

Command substitution (a.k.a. capturing stdout into a variable)

Often, we need to store output from a command into a variable. This also includes storing content of a file (or part of it) in a variable.

A prominent example is the use of the mktemp(1) command. It solves the problem with secure creation of temporary files (remember that creating a fixed-name temporary file in /tmp is dangerous). The mktemp command creates a uniquely-named file (or a directory) and prints its name to stdout. Obviously, to use the file in further commands, we need to store its name in a variable.

Shell offers the following syntax for the so-called command substitution:

my_temp="$( mktemp -d )"

The command mktemp -d is run and its output is stored into the variable $my_temp.
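
A typical life cycle of such a temporary directory could look like the following sketch. The file name data.txt and its content are only illustrative; the point is that the script creates everything under the unique directory and removes it at the end.

```shell
# Create a private, uniquely named directory and remember its name.
my_temp="$( mktemp -d )"
echo "Working in $my_temp" >&2

# Use files inside it; the file name data.txt is only illustrative.
echo "some data" >"$my_temp/data.txt"
line_count="$( wc -l <"$my_temp/data.txt" )"
echo "Stored $(( line_count )) line(s)."

# Remove the directory once it is no longer needed.
rm -r "$my_temp"
```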

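Where is stderr stored? It is not stored anywhere: only stdout is captured, while the stderr of the command stays connected to the stderr of the script (so it is typically printed on the terminal).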

How would you capture stderr then?

For example like this:

my_temp="$( mktemp -d )"
stdout="$( the_command 2>"$my_temp/err.txt" )"
stderr="$( cat "$my_temp/err.txt" )"

Command substitution is also often used in logging or when transforming filenames (use man pages to learn what date, basename, and dirname do):

echo "I am running on $( uname -m ) architecture."

input_filename="/some/path/to/a/file.sh"
backup="$( dirname "$input_filename" )/$( basename "$input_filename" ).bak"
other_backup="$( dirname "$input_filename" )/$( basename "$input_filename" .sh ).bak.sh"
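
To see what the individual pieces of the last two assignments evaluate to, the following sketch stores them in separate variables first. The helper variables dir, base and stem are introduced only for this illustration.

```shell
input_filename="/some/path/to/a/file.sh"

dir="$( dirname "$input_filename" )"          # directory part
base="$( basename "$input_filename" )"        # last path component
stem="$( basename "$input_filename" .sh )"    # same, with the suffix stripped

echo "dirname:        $dir"
echo "basename:       $base"
echo "without suffix: $stem"
echo "backup:         $dir/$base.bak"
echo "other_backup:   $dir/$stem.bak.sh"
```

Neither dirname nor basename checks whether the file exists; they only manipulate the string.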

Redirection of bigger shell portions

The whole control structure (e.g., for, if, or while with all the commands inside) behaves as a single command. So you can apply redirection (or a pipe) to the whole structure. For example:

if test -d .git; then
    echo "We are in a root of a Git project"
else
    echo "This is not a root of a Git project"
fi | tr 'a-z' 'A-Z'
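
The same works for redirection into a file. In the following sketch (the variable report and the loop content are made up), the file is opened only once for the whole loop, so there is no need to append with >> inside the body.

```shell
report="$( mktemp )"

# Everything printed inside the loop ends up in the file.
for n in 1 2 3; do
    echo "line $n"
done >"$report"

line_count="$( wc -l <"$report" )"
echo "The report has $(( line_count )) lines."
rm "$report"
```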

The `read` command

When a shell script needs to read from stdin into a variable, there is the read built-in command:

read FIRST_LINE <input.txt
echo "$FIRST_LINE"

Typically, read is used in a while loop to iterate over the whole input. read is also able to split the line into fields on whitespace and assign each field to a different variable.

Considering we have an input of this format, the following loop computes the average of the numbers.

/dev/sdb 1008
/dev/sdb 1676
/dev/sdc 1505
/dev/sdc 4115
/dev/sdd 999

count=0
total=0
while read device duration; do
    count=$(( count + 1 ))
    total=$(( total + duration ))
done
echo "Average is about $(( total / count ))."

As you can guess from the above snippet, read returns 0 as long as it is able to read into the variables. Reaching the end of the file is announced by a non-zero exit code.
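
The loop above reads from the stdin of the script. To try it without typing the data, the input can be redirected from a file for the whole loop. The sketch below first stores the sample data in a temporary file (the variable name input is an assumption of this example) and uses read -r so that the lines are read literally.

```shell
# Prepare the sample input in a temporary file.
input="$( mktemp )"
printf '%s\n' "/dev/sdb 1008" "/dev/sdb 1676" "/dev/sdc 1505" \
    "/dev/sdc 4115" "/dev/sdd 999" >"$input"

count=0
total=0
# The redirection after done applies to every read in the loop.
while read -r device duration; do
    count=$(( count + 1 ))
    total=$(( total + duration ))
done <"$input"
rm "$input"

echo "Read $count lines, total $total, average about $(( total / count ))."
```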

read can be sometimes too smart about certain inputs. For example, it interprets backslashes. You can use read -r to suppress this behavior.

Other notable parameters are -t or -p: use read --help to see their description.

Script parameters and getopt

When a shell script receives parameters, we can access them via special variables $1, $2, $3, …

Check with the following script:

echo "$#"
echo "${0}"
echo "${1:-parameter one not set}"
echo "${2:-parameter two not set}"
echo "${3:-parameter three not set}"
echo "${4:-parameter four not set}"
echo "${5:-parameter five not set}"

and run as

./script.sh
./script.sh one
./script.sh one two
./script.sh one two three
./script.sh one two three four
./script.sh one two three four five
./script.sh one two three four five six

If you want to access all parameters, there is a special variable $@ for that. Try adding args.py "$@" to the script above and re-execute.

Note that $@ must be quoted to work properly (the explanation is beyond the scope of this course). The special variable $# contains the number of arguments on the command-line and $0 refers to the actual script name (like sys.argv[0]).
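
Without going into the full explanation, the effect of the quotes can at least be observed. The sketch below uses made-up helper functions (count_args, forward_quoted and forward_unquoted) that pass their arguments on and report how many arrived.

```shell
# Hypothetical helpers for this demonstration only.
count_args() {
    echo "$#"
}

forward_quoted() {
    count_args "$@"     # every argument is passed on unchanged
}

forward_unquoted() {
    count_args $@       # arguments are split again on whitespace
}

quoted="$( forward_quoted "one two" three )"
unquoted="$( forward_unquoted "one two" three )"
echo "Quoted: $quoted arguments, unquoted: $unquoted arguments."
```

With the quotes, the argument one two stays in one piece; without them it falls apart into two.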

`getopt`

When our script needs one argument, accessing $1 directly is fine. When you want to recognize options, the parsing of arguments becomes more complicated. Shell offers a getopt command that is able to handle command-line parsing for you.

We will not describe all the details of this command. Instead, we show an example that you can modify to your own needs.

The main arguments controlling the behavior of getopt are -o and -l, which contain the description of the switches for our program.

Let us assume that we would want to handle options --verbose to make our script a bit more descriptive and --output to specify an alternate output file. We would also like to handle short versions of these options: -o and -v. With --version, we want to print the version of our script. And we should not forget about --help too. Non-option arguments will be interpreted as names of input files.

The specification of the getopt switches is simple:

getopt -o "vho:" -l "verbose,version,help,output:"

Single-letter switches are specified after -o, long options after -l, and a colon : after the option denotes that it expects an argument.

After that, we add -- followed by the actual parameters. Let us try:

getopt -o "vho:" -l "verbose,version,help,output:" -- --help input1.txt --output=file.txt
getopt -o "vho:" -l "verbose,version,help,output:" -- --help --verbose -o out.txt input2.txt
...

As you can see, getopt is able to parse the input and convert the parameters to a unified form, moving the non-option arguments to the end of the list.

The following “magical” line (you do not need to understand it to use it) resets $1, $2 etc. to contain the values as parsed by getopt.

eval set -- "$( getopt -o "vho:" -l "verbose,version,help,output:" -- "$@" )"

The actual processing is then quite straightforward:

#!/bin/bash

set -ueo pipefail

opts_short="vho:"
opts_long="verbose,version,help,output:"

# Check for bad usage first (notice the ||)
getopt -Q -o "$opts_short" -l "$opts_long" -- "$@" || exit 1

# Actually parse them (we are here only if they are correct)
eval set -- "$( getopt -o "$opts_short" -l "$opts_long" -- "$@" )"

be_quiet=true
output_file=/dev/stdout

while [ $# -gt 0 ]; do
    case "$1" in
        -h|--help)
            echo "Usage: $0 ..."
            exit 0
            ;;
        --version)
            # The version string is only a placeholder.
            echo "$0 version 0.1"
            exit 0
            ;;
        -o|--output)
            output_file="$2"
            shift
            ;;
        -v|--verbose)
            be_quiet=false
            ;;
        --)
            shift
            break
            ;;
        *)
            echo "Unknown option $1" >&2
            exit 1
            ;;
    esac
    shift
done

$be_quiet || echo "Starting the script"

for inp in "$@"; do
    $be_quiet || echo "Processing $inp into $output_file ..."
done

Several parts of the script deserve explanation.

true and false are not boolean values, but they can be used as such. Actually, they are very simple programs that simply terminate with the proper exit code (0 and 1, respectively). Note how we use them to drive the logging. (Incidentally, $be_verbose && echo "Message" would not work. Do you see why?)

exit immediately terminates a shell script. The optional parameter denotes the exit code of the script.

shift is a special command that shifts the variables $1, $2, … by one. After shift, $3 becomes $2, $2 becomes $1 and $1 is lost. "$@" is modified accordingly. Thus, the whole loop processes all options until encountering -- that separates options from other arguments. The for loop therefore iterates over the other arguments.
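
The effect of shift can be observed in isolation with the following sketch. It uses set -- to replace the positional parameters with three made-up values (the same built-in appears in the getopt line of the script) and then consumes them one by one.

```shell
# Replace $1, $2, ... with illustrative values.
set -- alpha beta gamma

log=""
while [ $# -gt 0 ]; do
    # Record the current first parameter and how many are left.
    log="$log$1($#) "
    shift
done
echo "$log"
echo "Parameters left: $#"
```

Each iteration sees the next value in $1 while $# decreases until nothing is left.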

Functions in shell

You can also define functions in the shell:

function_name() {
    commands
}

A function has the same interface as a full-fledged shell script. Arguments are passed as $1, $2, …. The result of the function is an integer with the same semantics as the exit code. Thus, the () is there just to mark that this is a function; it is not a list of arguments.

Please consult the following section on variable scoping for details about which variables are visible inside a function.

A simple logging function could look like this:

msg() {
    echo "$( date '+%Y-%m-%d %H:%M:%S |' )" "$@" >&2
}

It prints the current date followed by the actual message, all to stderr.

As another example consider the following function:

get_load() {
    cut -d ' ' -f "$1" </proc/loadavg
}

load_curr="$( get_load 1 )"
load_prev="$( get_load 2 )"

Note how the function’s stdout is captured into a variable.

Calling return terminates the function execution; the optional parameter of return is the exit code. (If you use exit within a function, it terminates the whole script.)

is_shell_script() {
    case "$( head -n 1 "$1" 2>/dev/null )" in
        \#!/bin/sh|\#!/bin/bash)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}

Such a function can be used in if like this:

if is_shell_script "$1"; then
    echo "$1 is a shell script"
fi

Note how good naming simplifies reading of the final program. It is also a good idea to give a name to the function argument instead of referring to it by $1. You can assign it to a variable, but it is preferred to mark the variable as local (see the following section):

is_shell_script() {
    local file="$1"
    case "$( head -n 1 "$file" 2>/dev/null )" in
        \#!/bin/sh|\#!/bin/bash)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}

You might notice that aliases, functions, built-ins, and regular commands are all called the same way. Therefore, the shell has a fixed order of precedence: Aliases are checked first, then functions, then builtins, and finally regular commands from $PATH. In this context, the builtins named command and builtin might be useful, as they bypass functions (e.g., inside a function that wraps a command of the same name).

Despite many differences from functions in other programming languages, shell functions still represent the best way to structure your scripts. A properly named function creates an abstraction and captures the intent of the script while also hiding implementation details.

Subshells and variable scoping

This section explains a few rules and facts about the scoping of variables and why some constructs may not work as expected.

Shell variables are global by default. All variables are visible in all functions, modification done inside a function is visible in the rest of the script, and so on.

It is often convenient to declare variables within functions as local, which limits the scope of the variable to the function. (More precisely, the variable is visible in the function and all functions called from it. You can imagine that the previous value of the variable is saved when you execute the local and restored upon return from the function. This is unlike what most programming languages do.)
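
That a local variable is visible also in the functions called from the one that declared it can be surprising. The following sketch demonstrates it; the function names inner and outer and the variable value are made up for this example.

```shell
value="global"

inner() {
    echo "inner sees: $value"
}

outer() {
    # Visible in outer() and in everything outer() calls, including inner().
    local value="set in outer"
    inner
}

via_outer="$( outer )"
direct="$( inner )"
echo "$via_outer"
echo "$direct"
```

The function inner never declares anything, yet what it sees depends on who called it.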

When you run another program (including shell scripts and Python programs), it gets a copy of all exported variables. When the program modifies the variables, the changes stay inside the program, not affecting the original shell in any way. (This is similar to how working directory changes behave.)

However, when you use a pipe, it is equivalent to launching a new shell: variables set inside the pipeline are not propagated to the outer code. (The only exception is that the pipeline gets even non-exported variables.)

Enclosing part of our script in ( .. ) creates a so-called subshell which behaves as if another script was launched. Again, variables modified inside this subshell are not visible to the outer shell.

Read and run the following code to understand the mentioned issues.

global_var="one"

change_global() {
    echo "change_global():"
    echo "  global_var=$global_var"
    global_var="two"
    echo "  global_var=$global_var"
}

change_local() {
    echo "change_local():"
    echo "  global_var=$global_var"
    local global_var="three"
    echo "  global_var=$global_var"
}

echo "global_var=$global_var"
change_global
echo "global_var=$global_var"
change_local
echo "global_var=$global_var"

(
    global_var="four"
    echo "global_var=$global_var"
)

echo "global_var=$global_var"

echo "loop:"
(
    echo "five"
    echo "six"
) | while read value; do
    global_var="$value"
    echo "  global_var=$global_var"
done
echo "global_var=$global_var"

Exercises

Mass image conversion

The program convert from ImageMagick can convert images between formats using convert source.png target.jpg (with almost any file extensions). Convert all PNG images (with extension .png) in the current directory to JPEG (extension .jpg).

Answer.

By the way, ImageMagick can do plenty of operations. One that is worth remembering is resizing:

convert DSC0000.jpg -resize 800x600 thumbs/DSC0000.jpg

Standard input or arguments?

Write fact.sh with a function that computes factorial of a given number. Create two versions:

Read input from stdin.
Read input from the first argument ($1). Answer.

Which version was easier to write? Which makes more sense?

Ad-hoc processing of CSV files

Write a script csv_sum.sh that reads a CSV file from stdin. Sum all the numbers in the column that is specified as the only argument. Do not forget to exit with a non-zero code and proper error message if no argument is provided.

Consider the following file named file.csv:

family_name,first_name,age,points,email
Doe,Joe,22,1,joe_doe@some_mail.com
Fog,Willy,38,8,ab@some_mail.com
Zdepa,Pepa,10,1,pepa@some_mail.com

The output of the command ./csv_sum.sh points <file.csv should be 10. Answer.

Bar plots (the shell style)

Write bar_plot.sh which prints a horizontal bar plot. Input numbers indicate the bar size. Decide what input option is more viable for you. Example:

$ ./bar_plot.sh 7 1 5
7: #######
1: #
5: #####

If the largest value is greater than 60, rescale the whole plot.

Answer.

The `tree.py` task in shell

Write the implementation of the tree.py task in shell.

Answer.

Graded tasks …

… for this lab are actually shared with the following one and will be published with the next lab.

Learning outcomes

Conceptual knowledge

Conceptual knowledge is about understanding the meaning of given terms and putting them into context. Therefore, you should be able to …

explain how shell expansion and splitting into command-line argument array is performed
explain cutting points for when to use Python and when to use Bash
explain what is a environment variable
explain difference between a non-exported and exported shell (environment) variable
explain concurrency issues with temporary files
explain how program exit codes drives the flow of shell scripts
explain how if true; then ... fi is evaluated
explain meaning of $PATH environment variable and how it affects scripts with shebang
explain how variable scoping works in shell

Practical skills ⚓

Practical skills are usually about using given programs to solve various tasks. Therefore, you should be able to …

set and read environment variables in shell
compute mathematical expressions directly in shell
use command substitution
use temporary files securely in shell scripts
use control flow (for, while, if, case) in shell scripts
use the read command
use getopt for parsing command line arguments
create and use functions
read environment variables in Python (optional)

for img in *.png; do
    target="$(basename -s .png "$img").jpg"
    convert "$img" "$target"
done

If you do not care for double extensions, the body of the for could be as simple as convert "$img" "$img.jpg".
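
As an alternative to basename, the extension can also be replaced with shell parameter expansion alone. This is a sketch rather than the reference solution, and the function name png_to_jpg_name is made up for illustration. Unlike basename, this variant keeps the directory part of the path.

```shell
#!/bin/bash

# Derive the JPEG file name from a PNG file name (hypothetical helper).
png_to_jpg_name() {
    # ${1%.png} removes a trailing ".png" from the first argument
    echo "${1%.png}.jpg"
}

png_to_jpg_name "photo.png"       # prints photo.jpg
png_to_jpg_name "dir/photo.png"   # prints dir/photo.jpg
```

The body of the loop could then read convert "$img" "${img%.png}.jpg".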

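#!/bin/bash

set -ueo pipefail

# Version 1: read the number from standard input
fact_from_stdin() {
    local N result

    read -r N
    result=1
    while [ "$N" -gt 1 ]; do
        result=$(( result * N ))
        N=$(( N - 1 ))
    done
    echo "$result"
}

# Version 2: take the number from the first function argument
fact_from_arg() {
    local N result
    N="$1"
    result=1
    while [ "$N" -gt 1 ]; do
        result=$(( result * N ))
        N=$(( N - 1 ))
    done
    echo "$result"
}

# Run both versions: the first reads stdin, the second uses $1 (0 if missing)
fact_from_stdin
fact_from_arg "${1:-0}"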
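
The two versions above share the whole computation and differ only in where the number comes from. One possible way to avoid the duplication is sketched below. The names factorial and fact_main are assumptions made for this example and are not part of the task.

```shell
#!/bin/bash
set -ueo pipefail

# Print the factorial of $1 (values below 2 yield 1).
factorial() {
    local n="$1" result=1
    while [ "$n" -gt 1 ]; do
        result=$(( result * n ))
        n=$(( n - 1 ))
    done
    echo "$result"
}

# Use the first argument when it is given, otherwise read one line from stdin.
fact_main() {
    local n
    if [ "$#" -ge 1 ]; then
        n="$1"
    else
        read -r n
    fi
    factorial "$n"
}

fact_main 5          # prints 120
echo 4 | fact_main   # prints 24
```

Keeping the computation in one function means that only the small wrapper needs to know about the input source.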

Notice how executing read for the first time skips the headers and other reads actually read individual data rows.

#!/bin/bash
set -ueo pipefail

if [ $# -ne 1 ]; then
    echo "Usage: $0 column-name" >&2
    exit 1
fi

read -r headers

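column=0
found=false
for header in $( echo "$headers" | tr ',' ' ' ); do
    column=$(( column + 1 ))
    if [ "$header" = "$1" ]; then
        found=true
        break
    fi
done

if ! $found; then
    echo "Unknown column: $1" >&2
    exit 2
fi

# The remaining lines on stdin are data rows: pick the column and sum it
cut '-d,' -f "$column" | paste -s -d + | bc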
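
The note about read skipping the headers can be made more visible by reading the data rows in a loop as well. The sketch below does the summing in the shell itself instead of using paste and bc. The function name csv_sum and the exit code 2 for an unknown column are assumptions made for this example, and the code expects every value in the selected column to be an integer.

```shell
#!/bin/bash
set -ueo pipefail

# Sum the column named $1 in CSV data read from stdin (hypothetical helper).
csv_sum() {
    local wanted="$1" headers header line value column=0 found=false sum=0

    # The first read consumes only the header line
    read -r headers
    for header in $( echo "$headers" | tr ',' ' ' ); do
        column=$(( column + 1 ))
        if [ "$header" = "$wanted" ]; then
            found=true
            break
        fi
    done
    if ! $found; then
        echo "Unknown column: $wanted" >&2
        return 2
    fi

    # Every following read returns one data row
    while read -r line; do
        value="$( echo "$line" | cut -d, -f "$column" )"
        sum=$(( sum + value ))
    done
    echo "$sum"
}

printf '%s\n' 'family_name,first_name,age,points' 'Doe,Joe,22,1' \
    'Fog,Willy,38,8' 'Zdepa,Pepa,10,1' | csv_sum points   # prints 10
```

Calling cut once per row is slower than a single pipeline over the whole input, so this form is mainly useful for seeing which read consumes which line.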

#!/bin/bash

set -ueo pipefail

max_from_arg() {
    local number max

    max=""
    for number in "$@"; do
        [ -z "$max" ] && max="$number"
        if [ "$number" -gt "$max" ]; then
            max="$number";
        fi
    done
    echo "$max"
}

repeat_string() {
    local i

    for i in $( seq "$1" ); do
        echo -n "$2"
    done
}

if [ "$#" -eq 0 ]; then
    exit
fi

screen_width=60
longest_bar=$( max_from_arg "$@" )
if [ "$longest_bar" -gt "$screen_width" ]; then
    # keep the fraction as text: evaluating it now would truncate it to 0
    scale="$screen_width / $longest_bar"
else
    scale=1
fi

for value in "$@"; do
    # $scale must not be quoted (it may contain an expression)
    size="$(( value * $scale ))"
    printf "%4d: %s\n" "$value" "$( repeat_string "$size" "#" )"
done
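
The reason for keeping the scale as an unevaluated expression is that shell arithmetic works with integers only, so the order of operations matters. A minimal sketch of the same rescaling follows. The function name scale_bar and its three parameters are assumptions made for this example.

```shell
#!/bin/bash

# Scale a value to a bar of at most $3 characters (hypothetical helper).
# $1 = value, $2 = largest value in the data, $3 = available width
scale_bar() {
    local value="$1" max="$2" width="$3"
    if [ "$max" -le "$width" ]; then
        echo "$value"
    else
        # Multiply first: (value * width) / max keeps the precision,
        # whereas value * (width / max) would truncate width / max to 0.
        echo $(( value * width / max ))
    fi
}

scale_bar 120 120 60   # prints 60
scale_bar 30 120 60    # prints 15
scale_bar 7 7 60       # prints 7
```

In the solution above, value * $scale expands to value * 60 / 120, which the shell evaluates from left to right in exactly this order.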

#!/bin/bash

set -ueo pipefail

usage() {
    echo "Usage: $2 [options]"
    echo "..."
    exit "$1"
}

list_one_directory() {
    (
        local my_dir="$1"
        local indent="$2"
        local dirs_only="$3"
        local filename

        cd "$my_dir"
        for filename in *; do
            if [ -d "$filename" ]; then
                echo "${indent}${filename}"
                list_one_directory "$filename" "${indent}    " "$dirs_only"
            elif ! $dirs_only; then
                if [ -f "$filename" ]; then
                    echo "${indent}${filename}"
                fi
            fi
        done
    )
}

opts_short="hd"
opts_long="help,directories-only"

# Check for bad usage first
getopt -Q -o "$opts_short" -l "$opts_long" -- "$@" || exit 1
# Actually parse them
eval set -- "$( getopt -o "$opts_short" -l "$opts_long" -- "$@" )"

# Startup configuration
start_dir="."
directories_only=false

while [ $# -gt 0 ]; do
    case "$1" in
        -h|--help)
            usage 0 "$0"
            ;;
        -d|--directories-only)
            directories_only=true
            ;;
        --)
            shift
            break
            ;;
        *)
            usage 1 "$0"
            ;;
    esac
    shift
done

if [ $# -eq 1 ]; then
    start_dir="$1"
elif [ $# -gt 1 ]; then
    usage 1 "$0"
fi

list_one_directory "$start_dir" "" "$directories_only"
