This is material for both labs 05 and 06. There will be separate before-class quizzes for labs 05 and 06, but the graded (scripting) tasks will be merged into a single set worth twice the usual number of points.
In this lab we will almost complete our tour of shell scripting: you will learn about conditions and loops in the shell, and about variables. And much more.
It is important to note that all of the topics below are usable both in scripts (i.e., non-interactively) and when using the terminal interactively. In particular, for loops over a list of files are often used directly on the command-line without enclosing them in a full-fledged script.
Do not forget that the Before class reading is mandatory and there is a quiz that you are supposed to complete before coming to the labs.
This reading is before the fifth lab.
More about variables
You have seen that for simple cases, the following is sufficient to create and use a variable in your scripts.
output_file="out.txt"
echo "Writing to $output_file." >&2
head -n 1 /etc/passwd >"$output_file"
Uninitialized values and similar caveats
If you try to use a variable that was not initialized, the shell will pretend it contains an empty string. While this can be useful, it can also be a source of nasty surprises.
As we mentioned earlier, you should always start your shell scripts with
set -u
to warn you about such situations.
However, you sometimes need to read from a potentially uninitialized variable to check if it was initialized. For example, we might want to read $EDITOR to get the user’s preferred editor, but provide a sane default if the variable is not set.
This is easily done using the ${VAR:-default_value} notation. If VAR was set (and is not empty), its value is used, otherwise default_value is used. This does not trigger the warning produced by set -u.
So we can write:
"${EDITOR:-mcedit}" file-to-edit.txt
Frequently, it is better to handle the defaults at the beginning of a script using this idiom:
EDITOR="${EDITOR:-mcedit}"
Later in the script, we may call the editor using just:
"$EDITOR" file-to-edit.txt
Note that it is also possible to write ${EDITOR} to explicitly delimit the variable name. This is useful if you want to print a variable immediately followed by a letter:
file_prefix=nswi177-
echo "Will store into ${file_prefix}log.txt"
echo "Will store into $file_prefixlog.txt"
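The following is a minimal sketch of the default-value notation under set -u. The variable names (report_dir, empty_value) are hypothetical and chosen only for this illustration. It also shows that the variant without the colon, ${VAR-default}, falls back only when the variable is unset, not when it is merely empty.

```shell
set -u

# Hypothetical variable names, used only for this illustration.
unset -v report_dir
empty_value=""

# Unset variable: the default is used and set -u does not complain.
from_unset="${report_dir:-/tmp/reports}"
# Empty variable: ':-' treats empty the same as unset.
from_empty="${empty_value:-fallback}"
# Without the colon, an empty (but set) variable keeps its empty value.
only_if_unset="${empty_value-fallback}"

# Braces delimit the variable name when a letter follows immediately.
file_prefix="nswi177-"
with_braces="${file_prefix}log.txt"

echo "$from_unset $from_empty [$only_if_unset] $with_braces"
```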
Expansion of variables (and other such constructs)
We saw that the shell performs various types of expansion. It expands variables, wildcards, tildes, arithmetic expressions, and many other things.
It is essential to understand how these expansions interact with each other. Instead of describing the formal process (which is quite complicated), we will show several examples to demonstrate typical situations.
We will call args.py from the previous labs to demonstrate what happens. (Of course you need to call it from the right directory.)
First, parameters are prepared (split) after variable expansion:
VAR="value with spaces"
args.py "$VAR"
args.py $VAR
Prepare files named one.sh and with space.sh for the following example:
VAR="*.sh"
args.py "$VAR"
args.py $VAR
args.py "\$VAR"
args.py '$VAR'
Run the above again but remove one.sh after assigning to VAR.
Tilde expansion (your home directory) is a bit more tricky:
VAR=~
echo "$VAR" '$VAR' $VAR
VAR="~"
echo "$VAR" '$VAR' $VAR
The important take-away is that variable expansion is tricky. But it is always very easy to try it practically instead of remembering all the gotchas. As a matter of fact, if you keep in mind that spaces and wildcards require special attention, you will be fine :-).
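If args.py is not at hand, the effect of quoting on argument splitting can be observed with a tiny helper. This is only a sketch: count_args is a hypothetical function that prints how many arguments it received, and the default IFS (space, tab, newline) is assumed.

```shell
# Hypothetical helper: prints the number of arguments it was given.
count_args() {
    echo "$#"
}

VAR="value with spaces"

# Quoted: the expanded value stays one argument.
n_quoted="$( count_args "$VAR" )"
# Unquoted: the expanded value is split on white space.
n_unquoted="$( count_args $VAR )"
# Single quotes: no expansion at all, the text is kept literally.
literal='$VAR'

echo "quoted=$n_quoted unquoted=$n_unquoted literal=$literal"
```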
Command substitution (a.k.a. capturing stdout into a variable)
Often, we need to store output from a command into a variable. This also includes storing content of a file (or part of it) in a variable.
A prominent example is the use of the mktemp(1) command. It solves the problem with secure creation of temporary files (remember that creating a fixed-name temporary file in /tmp is dangerous). The mktemp command creates a uniquely-named file (or a directory) and prints its name to stdout. Obviously, to use the file in further commands, we need to store its name in a variable.
Shell offers the following syntax for the so-called command substitution:
my_temp="$( mktemp -d )"
The command mktemp -d is run and its output is stored into the variable $my_temp.
Where is stderr stored?
How would you capture stderr then?
For example like this:
my_temp="$( mktemp -d )"
stdout="$( the_command 2>"$my_temp/err.txt" )"
stderr="$( cat "$my_temp/err.txt" )"
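A runnable variant of the snippet above could look as follows. It is only a sketch: noisy is a hypothetical stand-in for the_command that writes one line to each stream, and the temporary directory is removed at the end.

```shell
# Hypothetical command that writes to both stdout and stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

my_temp="$( mktemp -d )"
# stdout is captured by the substitution, stderr goes to a file.
captured_out="$( noisy 2>"$my_temp/err.txt" )"
captured_err="$( cat "$my_temp/err.txt" )"
# Clean up the temporary directory when it is no longer needed.
rm -rf "$my_temp"

echo "stdout: $captured_out"
echo "stderr: $captured_err"
```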
Command substitution is also often used in logging or when transforming filenames (use man pages to learn what date, basename, and dirname do):
echo "I am running on $( uname -m ) architecture."
input_filename="/some/path/to/a/file.sh"
backup="$( dirname "$input_filename" )/$( basename "$input_filename" ).bak"
other_backup="$( dirname "$input_filename" )/$( basename "$input_filename" .sh ).bak.sh"
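To see what the individual pieces evaluate to, the substitutions above can be broken into steps. The intermediate variable names (dir, base, stem) are introduced here only for illustration.

```shell
input_filename="/some/path/to/a/file.sh"

# dirname keeps the directory part, basename keeps the last component.
dir="$( dirname "$input_filename" )"
base="$( basename "$input_filename" )"
# With a second argument, basename also strips that suffix.
stem="$( basename "$input_filename" .sh )"

backup="$dir/$base.bak"
other_backup="$dir/$stem.bak.sh"

echo "$backup"
echo "$other_backup"
```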
Redirection of bigger shell portions
The whole control structure (e.g., for, if, or while with all the commands inside) behaves as a single command. So you can apply redirection (or a pipe) to the whole structure.
For example:
if test -d .git; then
    echo "We are in a root of a Git project"
else
    echo "This is not a root of a Git project"
fi | tr 'a-z' 'A-Z'
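The same works for loops. As a small sketch, the output of a whole for loop is piped through sort and redirected to a file in a temporary directory; the word list and file name are made up for the example.

```shell
my_temp="$( mktemp -d )"

# The pipe and the redirection apply to the output of the entire loop.
for word in pear apple fig; do
    echo "$word"
done | sort >"$my_temp/sorted.txt"

sorted_first="$( head -n 1 "$my_temp/sorted.txt" )"
line_count="$( wc -l <"$my_temp/sorted.txt" )"
rm -rf "$my_temp"

echo "first=$sorted_first"
```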
The read command
When a shell script needs to read from stdin into a variable, there is the read built-in command:
read FIRST_LINE <input.txt
echo "$FIRST_LINE"
Typically, read is used in a while loop to iterate over the whole input.
read is also able to split the line into fields on white space and assign each field to a different variable.
Given input in the following format on stdin, the loop below computes the average of the numbers in the second column.
/dev/sdb 1008
/dev/sdb 1676
/dev/sdc 1505
/dev/sdc 4115
/dev/sdd 999
count=0
total=0
while read device duration; do
    count=$(( count + 1 ))
    total=$(( total + duration ))
done
echo "Average is about $(( total / count ))."
As you can guess from the above snippet, read returns 0 as long as it is able to read into the variables. Reaching the end of the file is announced by a non-zero exit code.
read can be sometimes too smart about certain inputs. For example, it interprets backslashes. You can use read -r to suppress this behavior.
Other notable parameters are -t or -p: use read --help to see their description.
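The averaging loop can be tried without a separate input file by feeding the sample data through a here-document. This is a sketch using the sample values shown earlier; read -r is used so that backslashes would not be interpreted.

```shell
count=0
total=0

# The here-document is redirected into the whole while loop.
while read -r device duration; do
    count=$(( count + 1 ))
    total=$(( total + duration ))
done <<'EOF'
/dev/sdb 1008
/dev/sdb 1676
/dev/sdc 1505
/dev/sdc 4115
/dev/sdd 999
EOF

# Shell arithmetic is integer-only, so the result is rounded down.
average=$(( total / count ))
echo "Average is about $average."
```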
Script parameters and getopt
When a shell script receives parameters, we can access them via special variables $1, $2, $3, …
Check with the following script:
echo "$#"
echo "${0}"
echo "${1:-parameter one not set}"
echo "${2:-parameter two not set}"
echo "${3:-parameter three not set}"
echo "${4:-parameter four not set}"
echo "${5:-parameter five not set}"
and run as
./script.sh
./script.sh one
./script.sh one two
./script.sh one two three
./script.sh one two three four
./script.sh one two three four five
./script.sh one two three four five six
If you want to access all parameters, there is a special variable $@ for that. Try adding args.py "$@" to the script above and re-execute.
Note that $@ must be quoted to work properly (the explanation is beyond the scope of this course).
The special variable $# contains the number of arguments on the command-line and $0 refers to the actual script name (like sys.argv[0]).
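The difference between quoted and unquoted $@ can be observed without writing a separate script. In this sketch, set -- is used to assign the positional parameters by hand, and count_args is a hypothetical helper that prints the number of arguments it received.

```shell
# Hypothetical helper: prints the number of arguments it was given.
count_args() {
    echo "$#"
}

# Simulate running a script with three parameters, one containing a space.
set -- one "two words" three

n_total="$#"
# Quoted: every parameter is passed on unchanged.
n_quoted="$( count_args "$@" )"
# Unquoted: "two words" is split into two separate arguments.
n_unquoted="$( count_args $@ )"

echo "total=$n_total quoted=$n_quoted unquoted=$n_unquoted"
```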
getopt
When our script needs one argument, accessing $1 directly is fine. When you want to recognize options, parsing of arguments becomes more complicated.
Shell offers a getopt command that is able to handle command-line parsing for you.
We will not describe all the details of this command. Instead, we show an example that you can modify to your own needs.
The main arguments controlling getopt behavior are -o and -l, which contain the description of the switches for our program.
Let us assume that we want to handle the options --verbose to make our script a bit more descriptive and --output to specify an alternate output file. We would also like to handle short versions of these options: -o and -v. With --version, we want to print the version of our script. And we should not forget about --help either. Non-option arguments will be interpreted as names of input files.
The specification of the getopt switches is simple:
getopt -o "vho:" -l "verbose,version,help,output:"
Single-letter switches are specified after -o, long options after -l, and a colon : after the option denotes that it expects an argument.
After that, we add -- followed by the actual parameters. Let us try:
getopt -o "vho:" -l "verbose,version,help,output:" -- --help input1.txt --output=file.txt
getopt -o "vho:" -l "verbose,version,help,output:" -- --help --verbose -o out.txt input2.txt
...
As you can see, getopt is able to parse the input and convert the parameters to a unified form, moving the non-option arguments to the end of the list.
The following “magical” line (you do not need to understand it to use it) resets $1, $2 etc. to contain the values as parsed by getopt.
eval set -- "$( getopt -o "vho:" -l "verbose,version,help,output:" -- "$@" )"
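To see the effect of that line in isolation, the sketch below sets the positional parameters by hand and then lets getopt normalize them. It assumes the enhanced getopt from util-linux (other getopt implementations may not support long options); the variable names are chosen only for this illustration.

```shell
# Simulate the script being called with these parameters.
set -- --help input1.txt --output=file.txt

# Replace $1, $2, ... with the normalized list produced by getopt.
eval set -- "$( getopt -o "vho:" -l "verbose,version,help,output:" -- "$@" )"

normalized_count="$#"
first="$1"
second="$2"
third="$3"
fourth="$4"
fifth="$5"

echo "$first | $second | $third | $fourth | $fifth"
```

Note that --output=file.txt was split into the option and its value, and that the input file was moved behind the -- separator.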
The actual processing is then quite straightforward:
#!/bin/bash
set -ueo pipefail
opts_short="vho:"
opts_long="verbose,version,help,output:"
# Check for bad usage first (notice the ||)
getopt -Q -o "$opts_short" -l "$opts_long" -- "$@" || exit 1
# Actually parse them (we are here only if they are correct)
eval set -- "$( getopt -o "$opts_short" -l "$opts_long" -- "$@" )"
be_quiet=true
output_file=/dev/stdout
while [ $# -gt 0 ]; do
    case "$1" in
        -h|--help)
            echo "Usage: $0 ..."
            exit 0
            ;;
        --version)
            # Placeholder version string, adjust for your script
            echo "$0 version 0.1"
            exit 0
            ;;
        -o|--output)
            # The option argument is in $2; the extra shift skips it
            output_file="$2"
            shift
            ;;
        -v|--verbose)
            be_quiet=false
            ;;
        --)
            # End of options, the rest are input files
            shift
            break
            ;;
        *)
            echo "Unknown option $1" >&2
            exit 1
            ;;
    esac
    shift
done
$be_quiet || echo "Starting the script"
for inp in "$@"; do
    $be_quiet || echo "Processing $inp into $output_file ..."
done
Several parts of the script deserve explanation.
true and false are not boolean values, but they can be used as such. Actually, they are very simple programs that simply terminate with the proper exit code (0 and 1, respectively). Note how we use them to drive the logging. (Incidentally, $be_verbose && echo "Message" would not work. Do you see why?)
exit immediately terminates a shell script. The optional parameter denotes the exit code of the script.
shift is a special command that shifts the variables $1, $2, … by one. After shift, $3 becomes $2, $2 becomes $1 and $1 is lost. "$@" is modified accordingly. Thus, the whole loop processes all options until encountering -- that separates options from other arguments. The for loop therefore iterates over the other arguments.
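The behavior of shift can be checked on its own. In this sketch the positional parameters are set by hand with set -- and the variable names are chosen only for illustration.

```shell
# Simulate parameters: an option, its value, and one input file.
set -- -o out.txt input.txt

option="$1"
value="$2"
count_before="$#"

# Drop the option and its value, one parameter at a time.
shift
shift

count_after="$#"
first_remaining="$1"

echo "$option=$value, remaining ($count_after): $first_remaining"
```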
Functions in shell
You can also define functions in the shell:
function_name() {
    commands
}
A function has the same interface as a full-fledged shell script. Arguments are passed as $1, $2, …. The result of the function is an integer with the same semantics as the exit code. Thus, the () is there just to mark that this is a function; it is not a list of arguments.
Please consult the following section on variable scoping for details about which variables are visible inside a function.
A simple logging function could look like this:
msg() {
    echo "$( date '+%Y-%m-%d %H:%M:%S |' )" "$@" >&2
}
It prints the current date followed by the actual message, all to stderr.
As another example consider the following function:
get_load() {
    cut -d ' ' -f "$1" </proc/loadavg
}
load_curr="$( get_load 1 )"
load_prev="$( get_load 2 )"
Note how the function’s stdout is captured into a variable.
Calling return terminates function execution, the optional parameter of return is the exit code. (If you use exit within a function, it terminates the whole script.)
is_shell_script() {
    case "$( head -n 1 "$1" 2>/dev/null )" in
        \#!/bin/sh|\#!/bin/bash)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}
Such a function can be used in if like this:
if is_shell_script "$1"; then
    echo "$1 is a shell script"
fi
Note how good naming simplifies reading of the final program. It is also a good idea to give a name to the function argument instead of referring to it by $1. You can assign it to a variable, but it is preferred to mark the variable as local (see the following section):
is_shell_script() {
    local file="$1"
    case "$( head -n 1 "$file" 2>/dev/null )" in
        \#!/bin/sh|\#!/bin/bash)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}
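The function can be tried on a few throw-away files. The sketch below repeats the definition so that it is self-contained; the file names inside the temporary directory are made up for the example, and the third one intentionally does not exist.

```shell
is_shell_script() {
    local file="$1"
    case "$( head -n 1 "$file" 2>/dev/null )" in
        \#!/bin/sh|\#!/bin/bash)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}

my_temp="$( mktemp -d )"
printf '#!/bin/bash\necho hi\n' >"$my_temp/script.sh"
printf 'just some text\n' >"$my_temp/notes.txt"

# Collect the names of files recognized as shell scripts.
found=""
for candidate in "$my_temp/script.sh" "$my_temp/notes.txt" "$my_temp/missing"; do
    if is_shell_script "$candidate"; then
        found="$found $( basename "$candidate" )"
    fi
done
rm -rf "$my_temp"

echo "Shell scripts:$found"
```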
You might notice that aliases, functions, built-ins, and regular commands are all called the same way. Therefore, the shell has a fixed order of precedence: aliases are checked first, then functions, then built-ins, and finally regular commands from $PATH. In this context, the built-ins command and builtin, which bypass functions, might be useful (e.g., inside a function that has the same name as the command it wraps).
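As a small sketch of this precedence, the function below shadows echo and uses command to reach the original one. The "[log]" prefix is an arbitrary choice for the example; without command, the function would call itself endlessly.

```shell
# A function named like an existing command takes precedence over it.
echo() {
    # 'command' skips function lookup, so this runs the real echo.
    command echo "[log]" "$@"
}

wrapped="$( echo hello )"
bypassed="$( command echo hello )"

# Remove the wrapper so that later code sees the normal echo again.
unset -f echo

echo "$wrapped / $bypassed"
```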
Despite many differences from functions in other programming languages, shell functions still represent the best way to structure your scripts. A properly named function creates an abstraction and captures the intent of the script while also hiding implementation details.
Subshells and variable scoping
This section explains a few rules and facts about the scoping of variables and why some constructs may not work as expected.
Shell variables are global by default. All variables are visible in all functions, modification done inside a function is visible in the rest of the script, and so on.
It is often convenient to declare variables within functions as local, which limits the scope of the variable to the function. (More precisely, the variable is visible in the function and all functions called from it. You can imagine that the previous value of the variable is saved when you execute the local and restored upon return from the function. This is unlike what most programming languages do.)
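A minimal sketch of this behavior follows, assuming a shell that supports local (such as Bash). The function and variable names are hypothetical. Note that show_name sees the local variable of its caller, and that the global value is back once outer returns.

```shell
show_name() {
    echo "$name"
}

outer() {
    # Visible here and in every function called from here.
    local name="set in outer"
    show_name
}

name="global"

seen_from_outer="$( outer )"
seen_from_top="$( show_name )"

# Calling outer directly does not change the global value either.
outer >/dev/null
after_call="$name"

echo "$seen_from_outer / $seen_from_top / $after_call"
```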
When you run another program (including shell scripts and Python programs), it gets a copy of all exported variables. When the program modifies the variables, the changes stay inside the program, not affecting the original shell in any way. (This is similar to how working directory changes behave.)
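The difference between exported and non-exported variables can be checked by launching a child shell. This is a sketch with hypothetical variable names (lab_plain, lab_exported); the child prints "unset" for a variable it did not receive.

```shell
# Hypothetical variable names, used only for this illustration.
lab_plain="not exported"
export lab_exported="exported"

# Single quotes: the expansion happens in the child shell, not here.
child_sees_plain="$( sh -c 'echo "${lab_plain:-unset}"' )"
child_sees_exported="$( sh -c 'echo "${lab_exported:-unset}"' )"

# Changes made by the child do not propagate back to us.
sh -c 'lab_exported="changed in child"'
after_child="$lab_exported"

echo "$child_sees_plain / $child_sees_exported / $after_child"
```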
However, when you use a pipe, it is equivalent to launching a new shell: variables set inside the pipeline are not propagated to the outer code. (The only exception is that the pipeline gets even non-exported variables.)
Enclosing part of our script in ( .. ) creates a so-called subshell which behaves as if another script was launched. Again, variables modified inside this subshell are not visible to the outer shell.
Read and run the following code to understand the mentioned issues.
global_var="one"
change_global() {
    echo "change_global():"
    echo " global_var=$global_var"
    global_var="two"
    echo " global_var=$global_var"
}
change_local() {
    echo "change_local():"
    echo " global_var=$global_var"
    local global_var="three"
    echo " global_var=$global_var"
}
echo "global_var=$global_var"
change_global
echo "global_var=$global_var"
change_local
echo "global_var=$global_var"
(
    global_var="four"
    echo "global_var=$global_var"
)
echo "global_var=$global_var"
echo "loop:"
(
    echo "five"
    echo "six"
) | while read value; do
    global_var="$value"
    echo " global_var=$global_var"
done
echo "global_var=$global_var"
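A practical consequence is that a counter updated in a loop at the end of a pipeline is lost, while the same loop fed by a redirection keeps it. The sketch below assumes Bash with default settings (some other shells run the last part of a pipeline in the current shell and would behave differently).

```shell
# Loop at the end of a pipeline: it runs in a subshell.
count=0
printf 'five\nsix\n' | while read -r value; do
    count=$(( count + 1 ))
done
count_after_pipe="$count"

# Same loop with redirected input: it runs in the current shell.
count=0
while read -r value; do
    count=$(( count + 1 ))
done <<'EOF'
five
six
EOF
count_after_redirect="$count"

echo "pipe: $count_after_pipe, redirect: $count_after_redirect"
```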
Exercises
Mass image conversion
The program convert from ImageMagick can convert images between formats using convert source.png target.jpg (with almost any file extensions). Convert all PNG images (with extension .png) in the current directory to JPEG (extension .jpg).
By the way, ImageMagick can do plenty of other operations; one that is worth remembering is resizing:
convert DSC0000.jpg -resize 800x600 thumbs/DSC0000.jpg
Standard input or arguments?
Write fact.sh with a function that computes the factorial of a given number. Create two versions:
- Read input from stdin.
- Read input from the first argument ($1).
Which version was easier to write? Which makes more sense?
Ad-hoc processing of CSV files
Write a script csv_sum.sh that reads a CSV file from stdin. Sum all the numbers in the column that is specified as the only argument. Do not forget to exit with a non-zero code and a proper error message if no argument is provided.
Consider the following file named file.csv.
family_name,first_name,age,points,email
Doe,Joe,22,1,joe_doe@some_mail.com
Fog,Willy,38,8,ab@some_mail.com
Zdepa,Pepa,10,1,pepa@some_mail.com
The output of the command ./csv_sum.sh points <file.csv should be 10.
Bar plots (the shell style)
Write bar_plot.sh which prints a horizontal bar plot. Input numbers indicate the bar size. Decide which way of passing the input (stdin or arguments) is more viable for you. Example:
$ ./bar_plot.sh 7 1 5
7: #######
1: #
5: #####
If the largest value is greater than 60, rescale the whole plot.
The tree.py task in shell
Write the implementation of the tree.py task in shell.
Graded tasks …
… for this lab are actually shared with the following one and will be published with the next lab.
Learning outcomes
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …
- explain how shell expansion and splitting into command-line argument array is performed
- explain cutting points for when to use Python and when to use Bash
- explain what an environment variable is
- explain the difference between a non-exported and an exported shell (environment) variable
- explain concurrency issues with temporary files
- explain how program exit codes drive the flow of shell scripts
- explain how if true; then ... fi is evaluated
- explain the meaning of the $PATH environment variable and how it affects scripts with shebang
- explain how variable scoping works in shell
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …
- set and read environment variables in shell
- compute mathematical expressions directly in shell
- use command substitution
- use temporary files securely in shell scripts
- use control flow (for, while, if, case) in shell scripts
- use the read command
- use getopt for parsing command line arguments
- create and use functions
- read environment variables in Python (optional)