The goal of this lab is to define and thoroughly understand the concepts
of standard input, output, and standard error output.
This will allow us to understand program I/O redirection and the composition
of different programs via pipes.
We will also customize our shell
environment a little by investigating command aliases and the .bashrc
file.
I/O redirection in practice
Prepare files one.txt
and two.txt
containing the words ONE
, TWO
respectively using echo
and stdout redirection.
Merge (concatenate) these two files into merged.txt
.
Appending to the end of a file
The shell also offers an option to append the output to an existing file
using the >>
operator.
Thus, the following command would add UNO
as another line into one.txt
.
echo UNO >>one.txt
If the file does not exist, it will be created.
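The difference between the two operators can be tried out safely in a scratch directory. The following is only a small sketch; the `demo_dir` variable and the file `new.txt` are names made up for this illustration.

```shell
# Work in a throw-away directory so that no existing files are touched.
# (demo_dir and new.txt are hypothetical names used only in this sketch.)
demo_dir="$(mktemp -d)"

echo ONE >"$demo_dir/one.txt"     # > creates the file or truncates an existing one
echo UNO >>"$demo_dir/one.txt"    # >> appends a new line at the end
echo DOS >>"$demo_dir/new.txt"    # >> creates the file if it does not exist yet

cat "$demo_dir/one.txt" "$demo_dir/new.txt"
```

The listing should show ONE, UNO and DOS on three separate lines.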
For the following example, we will need the program tac
that reverses the order of
individual lines but otherwise works like cat
. Try this first.
tac one.txt two.txt
If you have executed the commands above, you should see the following:
UNO
ONE
TWO
Try the following and explain what happens (and why) if you execute
tac one.txt two.txt >two.txt
Input redirection
Copy the rev
program from above and run it like this:
./rev.py <one.txt
./rev.py one.txt
./rev.py one.txt two.txt
./rev.py one.txt <two.txt
Has it behaved as you expected?
Trace which paths (i.e. through which lines) the program has taken with the above invocations.
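The same distinction can be observed with standard utilities. The sketch below uses `wc -l` as a stand-in for our own program (this choice, and the `demo_dir` variable, are assumptions made for the illustration): when the file name is an argument, the program opens the file itself; with `<`, the shell opens the file and the program only sees an anonymous stream.

```shell
# demo_dir is a hypothetical scratch directory for this sketch.
demo_dir="$(mktemp -d)"
printf 'ONE\nUNO\n' >"$demo_dir/one.txt"

# File name passed as an argument: wc opens the file and can print its name.
wc -l "$demo_dir/one.txt"

# File connected to stdin by the shell: wc never learns the file name.
wc -l <"$demo_dir/one.txt"
```

The first invocation prints the line count together with the file name, the second prints the count only.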
Redirecting standard error output
To redirect the standard error output, you can use >
again, but this time preceded
by the number 2
(that denotes the stderr file descriptor).
Hence, our cat
example can be transformed to the following form where err.txt
would contain the error message and nothing would be printed on the screen.
cat one.txt nonexistent.txt two.txt >merged.txt 2>err.txt
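To see where each stream ends up, the command can be run in a scratch directory and both files inspected afterwards. This is a minimal sketch; `demo_dir` is a made-up variable name.

```shell
# demo_dir is a hypothetical scratch directory for this sketch.
demo_dir="$(mktemp -d)"
echo ONE >"$demo_dir/one.txt"
echo TWO >"$demo_dir/two.txt"

# cat ends with a non-zero exit code because of the missing file;
# "|| true" merely ignores that exit code here.
cat "$demo_dir/one.txt" "$demo_dir/nonexistent.txt" "$demo_dir/two.txt" \
    >"$demo_dir/merged.txt" 2>"$demo_dir/err.txt" || true

echo "--- merged.txt (stdout) ---"
cat "$demo_dir/merged.txt"
echo "--- err.txt (stderr) ---"
cat "$demo_dir/err.txt"
```

The file merged.txt should contain ONE and TWO, while err.txt should hold the complaint about the missing file.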
Notable special files
We already mentioned several important files under /dev/
.
With output redirection, we can actually use some of them right away.
Run cat one.txt
and redirect the output to /dev/full
and then
to /dev/null
.
What happened?
Especially /dev/null
is a very useful file as it can be used in any
situation when we are not interested in the output of a program.
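A typical use might look like the following sketch: the output is thrown away and only the exit code is inspected (assuming you already know the `$?` variable that holds the exit code of the last command).

```shell
# Discard the listing; only the exit code tells us whether ls succeeded.
ls -l / >/dev/null
echo "Exit code of ls: $?"

# Discard only the error message of a failing command.
ls /nonexistent-filename 2>/dev/null
echo "Exit code of the failing ls: $?"
```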
For many programs you can specify the use of stdin explicitly
by using -
(dash) as the input filename.
Another option is to use /dev/stdin
explicitly: with this name,
we can make the example with rev
work:
./rev.py /dev/stdin one.txt <two.txt
Then Python opens the file /dev/stdin
as a file and the operating system
(together with the shell) actually connects it with two.txt
.
/dev/stdout
can be used if we want to specify standard output explicitly
(this is mostly useful for programs coming from other environments where
the emphasis is not on using stdout that much).
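Both ways of naming stdin explicitly can be tried with plain `cat`, which understands the dash convention. The following is a small sketch using a made-up scratch directory `demo_dir`.

```shell
# demo_dir is a hypothetical scratch directory for this sketch.
demo_dir="$(mktemp -d)"
echo ONE >"$demo_dir/one.txt"
echo TWO >"$demo_dir/two.txt"

# "-" stands for stdin: cat prints one.txt and then whatever arrives on stdin.
cat "$demo_dir/one.txt" - <"$demo_dir/two.txt"

# /dev/stdin is an ordinary path, so it works even for programs
# that do not know the "-" convention.
cat /dev/stdin "$demo_dir/one.txt" <"$demo_dir/two.txt"
```

The first command prints ONE followed by TWO, the second prints TWO followed by ONE.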
Generic redirection
The shell allows us to redirect outputs quite freely using file descriptor numbers before and after the greater-than sign.
For example, >&2
specifies that the standard output is redirected to the standard
error output.
That may sound weird but consider the following mini-script.
Here, wget
is used to fetch a file from the given URL.
echo "Downloading tarball for lab 02..." >&2
wget https://d3s.mff.cuni.cz/f/teaching/nswi177/202122/labs/nswi177-lab02.tar.gz 2>/dev/null
We actually want to hide the progress messages of wget
and print ours instead.
Take this as an illustration of the concept as wget
can be silenced via
command-line arguments (--quiet
) as well.
Sometimes, we want to redirect stdout and stderr to one single file.
In these situations, a simple >output.txt 2>output.txt
would not work
and we have to use >output.txt 2>&1
or &>output.txt
(to redirect
both at once).
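The portable form can be checked with a command that writes to both streams. This is only a sketch with a made-up scratch directory `demo_dir`; the naive variant fails because the file is opened twice independently, so the two streams overwrite each other.

```shell
# demo_dir is a hypothetical scratch directory for this sketch.
demo_dir="$(mktemp -d)"
echo ONE >"$demo_dir/one.txt"

# cat writes ONE to stdout and an error about the missing file to stderr;
# 2>&1 makes stderr share the already redirected stdout.
cat "$demo_dir/one.txt" "$demo_dir/nonexistent.txt" >"$demo_dir/output.txt" 2>&1 || true

cat "$demo_dir/output.txt"
```

Afterwards output.txt holds both the line ONE and the error message.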
However, what about 2>&1 >output.txt
, can we use it as well?
Try it yourself!
Hint.
Pipes (data streaming composition)
We finally move to the area where Linux excels: program composition. In essence, the whole idea behind the Unix family of operating systems is to allow easy composition of various small programs.
Mostly, the programs that are composed together are filters and they operate on text inputs. These programs do not make any assumptions on the text format and are very generic. Special tools (that are nevertheless part of Linux software repositories) are needed if the input is more structured, such as XML or JSON.
The advantage is that composing the programs is very easy and it is very easy to compose them incrementally too (i.e., add another filter only when the output from the previous ones looks reasonable). This kind of incremental composition is more difficult in general-purpose programming languages where printing data requires extra commands (here it is printed to the stdout without any extra work).
The disadvantage is that complex compositions can become difficult to read. It is up to the developer to decide when it is time to switch to a better language and process the data there. A typical division of labour is that shell scripts are used to preprocess the data: they are best when you need to combine data from multiple files (such as hundreds of various reports, etc.) or when the data needs to be converted to a reasonable format (e.g. non-structured logs from your web server into a CSV loadable into your favorite spreadsheet software or R). Computing statistics and similar tasks are best left to specialized tools.
Needless to add, Linux offers plenty of tools for statistical computations and plot drawing that can be controlled from the CLI. Mastering these tools is, unfortunately, out of scope for this course.
Motivation example
As a somewhat artificial example, we will consider the following CSV that can be downloaded from here.
These are actual data representing how long it took to copy the USB disk image to the USB drives in the library. The first column represents the device, the second the duration of the copying in seconds.
As a matter of fact, the first column also indirectly represents the port of the USB hub (this is more by accident but it stems from the way we organized the copying). As a sidenote: it is interesting to see that some ports that are supposed to be the same are actually systematically slower.
disk,duration
/dev/sdb,1008
/dev/sdb,1676
/dev/sdc,1505
/dev/sdc,4115
...
We want to know what was the longest duration of the copying: in other words, the maximum of column two.
Well, we could use spreadsheet software for that, but we prefer to stay in the terminal. Among other reasons, we want a solution which is easily repeatable with other input files.
Recall that you have already seen the cut
command that is able to extract specific
columns from a file.
There is also the command sort
that sorts lines.
Thus our little script could look like this:
#!/bin/bash
cut -d, -f 2 <disk-speeds-data.csv >/tmp/disk_numbers.txt
sort </tmp/disk_numbers.txt
Prepare this script and run it.
The output is far from perfect: sort
has sorted the lines alphabetically, not by
numeric values.
However, after a quick glance at man sort
we add -n
(a.k.a. --numeric-sort
)
and re-execute the script.
This time, the last line of the output shows the maximum duration of 5769 seconds. Of course, all the other lines are useless, but we will fix that in a minute.
Let us focus on the temporary file first.
There are two issues with it:
First of all, it requires disk space for another copy of the (possibly huge) data.
A bit more subtle but much more dangerous problem is that the path to the
temporary file is fixed.
Imagine what happens if you execute the script in two terminals concurrently.
Do not be fooled by the feeling that the script is so short that the probability of
concurrent execution is negligible.
It is a trap that is waiting to spring.
We will talk about proper use of mktemp(1)
later, but in this example no temporary
file is needed at all. We can write:
cut -d, -f 2 <disk-speeds-data.csv | sort
The |
symbol stands for a pipe, which connects the standard output of cut
to the standard input of sort
. The pipe passes data between the two processes
without writing them to the disk at all. (Technically, the data are passed using
memory buffers, but that is a technical detail.)
The result is the same, but we escaped the pitfalls of using temporary files
and the result is actually even more readable. You can even move the first <
before cut
, so that the script can be read left-to-right like “take
disk-speeds-data.csv
, extract the second column, and then sort it”:
<disk-speeds-data.csv cut -d, -f 2 | sort
In essence, the family of Unix systems is built on top of the ability to create pipelines, which chain a sequence of programs using pipes. Each program in the pipeline denotes a type of transformation. These transformations are composed together to produce the final result.
Finally, let us recall that we wanted to print only the biggest number.
We can use the tail
utility which prints only the last few lines of a file:
by default 10, but you can ask for just one by adding -n 1
.
As pipelines are not limited to two programs, we can simply write:
cut '-d,' -f 2 | sort -n | tail -n 1
Note that we have removed the path to the input file from the script. Now, the user is supposed to run it like:
get-slowest.sh <disk-speeds-data.csv
This actually makes the script more flexible: it is easy to test such a script with different inputs and the script can be again used as a part of a bigger pipeline.
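The whole pipeline can be tried without downloading anything. The sketch below recreates only the four rows shown above in a made-up file `sample.csv` (so the result differs from the full data set) and feeds it through the same filters.

```shell
# demo_dir and sample.csv are hypothetical names used only in this sketch.
demo_dir="$(mktemp -d)"
printf '%s\n' \
    'disk,duration' \
    '/dev/sdb,1008' \
    '/dev/sdb,1676' \
    '/dev/sdc,1505' \
    '/dev/sdc,4115' >"$demo_dir/sample.csv"

# Extract the second column, sort numerically, keep only the last (largest) line.
<"$demo_dir/sample.csv" cut -d, -f 2 | sort -n | tail -n 1
```

For this sample the pipeline prints 4115. The header line does no harm here: in a numeric sort, the non-numeric word duration counts as zero and therefore ends up first.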
Using &&
and ||
(logical program composition)
Execute the following commands:
ls / && echo "ls okay"
ls /nonexistent-filename || echo "ls failed"
This is an example of how return codes can be used in practice. We can chain commands so that the next one is executed only when the previous one succeeded (terminated with a zero exit code) or only when it failed.
Understanding the following is essential, because together with pipes and standard I/O redirection, it forms the basic building blocks of shell scripts.
First of all, we will introduce a syntax for conditional chaining of program calls.
If we want to execute one command only if the previous one succeeded, we
separate them with &&
(i.e., it is a logical and).
On the other hand, if we want to execute the second command only if the
first one fails (in other words, execute the first or the second), we
separate them with ||
.
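Both operators can be explored with the standard programs `true` and `false`, which do nothing except succeed or fail. A minimal sketch:

```shell
# "true" always succeeds (exit code 0), "false" always fails (non-zero exit code).
true  && echo "printed: true succeeded"
false && echo "never printed: false failed"
false || echo "printed: false failed"
true  || echo "never printed: true succeeded"
```

Only the first and the third message appear.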
The example with ls
is quite artificial as ls
is quite noisy when
an error occurs.
However, there is also a program called test
that is silent and can be used
to compare numbers or check file properties.
For example, test -d ~/Desktop
checks that ~/Desktop
is a directory.
If you run it, nothing will be printed.
However, in company with &&
or ||
, we can check its result.
test -d .git && echo "We are in a root of a Git project"
test -f README.md || echo "README.md missing"
This could be used as very primitive branching in our scripts.
In the next lab, we will introduce proper conditional statements, such as if
and while
.
Note that test
is actually a very powerful command – it does not print
anything but can be used to control other programs.
It is possible to chain several commands; &&
and ||
are left-associative and
they have the same priority.
Compare the following commands and how they behave when in a directory
where the file README.md
is or is not present:
test -f README.md || echo "README.md missing" && echo "We have README.md"
test -f README.md && echo "We have README.md" || echo "README.md missing"
Failing fast
There is a caveat regarding pipes and success of commands: the success of a
pipeline is determined by its last command.
Thus, sort /nonexistent | head
is
a successful command. To make a failure of any command fail the (whole) pipeline, you
need to run set -o pipefail
in your script (or shell) before the pipeline.
Compare the behavior of the following two snippets.
sort /nonexistent | head && echo "All is well"
set -o pipefail
sort /nonexistent | head && echo "All is well"
In most cases, you want the second behavior.
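The difference is easiest to see by printing the exit code of the pipeline itself. In the sketch below, each variant runs in a separate `bash -c` process (a choice made for this illustration) so that the setting does not leak into your current shell.

```shell
# Without pipefail, only the exit code of head (the last command) counts.
bash -c 'sort /nonexistent 2>/dev/null | head; echo "without pipefail: $?"'

# With pipefail, the failure of sort makes the whole pipeline fail.
bash -c 'set -o pipefail; sort /nonexistent 2>/dev/null | head; echo "with pipefail: $?"'
```

The first line reports 0, the second reports the non-zero exit code of sort.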
Actually, you typically want the whole script to terminate if there is
an unexpected failure, that is, a failure that was not tested by
the &&
or ||
operator (or one of the conditional statements we meet
in the next lab). It is similar to an uncaught exception in Python.
For example, the following compound command is successful even though one of its components failed:
cat /nonexistent || echo "Oh well"
To enable terminate-on-failure, you need to call set -e
. In case of failure,
the shell will stop executing the script and exit with the same exit code as
the failed command.
Furthermore, you usually want to terminate the script when an uninitialized variable is
used: that is enabled by set -u
. (We will talk about variables later.)
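The effect of both options can be observed on tiny one-line scripts. The sketch below passes them to `bash -c` so that your current shell is not affected; the variable name `undefined_variable` is made up and intentionally never set.

```shell
# Without set -e the script carries on after the failing command.
bash -c 'false; echo "still running"'

# With set -e the first unchecked failure terminates the script.
bash -c 'set -e; false; echo "never printed"' || echo "script stopped, exit code $?"

# With set -u, using an unset variable is an error instead of an empty string.
bash -c 'set -u; echo "value: $undefined_variable"' 2>/dev/null || echo "unset variable detected"
```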
Therefore, typically, you want to start your script with the following trio:
set -o pipefail
set -e
set -u
Many commands allow short options (such as -l
or -h
you know from ls
)
to be merged like this (note that -o pipefail
has to be last):
set -ueo pipefail
Get into the habit of starting each of your scripts with this command.
Actually, from now on, the GitLab pipeline will check that this command is a part of your scripts.
Shell customization
We already mentioned that you should customize your terminal emulator to be comfortable to use. After all, you will spend at least this semester with it and it should be fun to use.
In this lab, we will show some other ways to make your shell more comfortable to use.
Command aliases
You probably noticed that you execute some commands with the same options
a lot.
One such example could be ls -l -h
that prints a detailed file listing, using
human-readable sizes.
Or perhaps ls -F
to append a slash to the directories.
And probably ls --color
too.
The shell lets you create so-called aliases, which add new commands without creating full-fledged scripts somewhere.
Try executing the following commands to see how a new command l
could be
defined.
alias l='ls -l -h'
l
We can even override the original command; the shell will ensure that the rewriting is not recursive.
alias ls='ls -F --color=auto'
Note that these two aliases together also ensure that l
will display
filenames in colors.
Note that there must be no spaces around the equals sign.
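Existing aliases can also be inspected and removed again, which is handy when experimenting. A short sketch using the standard `alias` and `unalias` builtins:

```shell
alias l='ls -l -h'   # define (or redefine) the alias
alias l              # print the current definition of the alias l
unalias l            # remove the alias again
```

Running `alias` without any arguments lists all aliases defined in the current shell. Prefixing a command with a backslash (for example `\ls`) runs the original command and bypasses the alias.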
Some typical aliases that you will probably want to try are the following
ones.
Use a manual page if you are unsure what the alias does.
Note that curl
is used to retrieve contents from a URL and wttr.in
is really
a URL.
By the way, try that command even if you do not plan to use this alias :-).
alias ls='ls -F --color=auto'
alias ll='ls -l'
alias l='ls -l -h'
alias cp='cp -i'
alias mv='mv -i'
alias rm='rm -i'
alias man='man -a'
alias weather='curl wttr.in'
~/.bashrc
Aliases above are nice, but you probably do not want to define them each time
you launch the shell.
However, most shells in Linux have some kind of file that they execute before
they enter interactive mode.
Typically, the file resides directly in your home directory and it is named after
the shell, ending with rc
(you can remember it as runtime configuration).
For Bash that we are using now (if you are using a different shell, you
probably already know where to find its configuration files), that file is
called ~/.bashrc
.
You have already used it when setting EDITOR
for Git, but you can also add
aliases there.
Depending on your distribution, you may already see some aliases or some
other commands there.
Add aliases you like there, save the file and launch a new terminal. Check that the aliases work.
The .bashrc
file behaves as a shell script and you are not limited to
having only aliases there.
It can contain virtually any command that you want to execute in every
terminal that you launch.
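An addition to the file might look like the following fragment. It is only an illustration: the selection of aliases is arbitrary and the editor `nano` is an assumption, so substitute whatever you actually use.

```shell
# Hypothetical excerpt of ~/.bashrc; keep only what you actually use.

# Aliases from this lab.
alias ls='ls -F --color=auto'
alias ll='ls -l'

# Any other command works too, e.g. setting the default editor
# (nano is just an example value).
export EDITOR=nano
```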
More examples
The following examples can be solved either by executing multiple commands or by piping basic shell commands together. To help you find the right program, you can use manual pages. You can also use our manual as a starting point.
Create a directory a
and inside it create a text file --help
containing Lorem Ipsum
.
Print the content of this file and then delete it.
Create a directory called b
and inside it create files called
alpha.txt
and *
.
Then delete the file called *
and check what happened to the file alpha.txt
.
Print the content of the file /etc/passwd
with its lines sorted.
The command getent passwd USERNAME
prints the information about user
account USERNAME
(e.g., intro
) on your machine.
Write a command that prints information about user intro
or a message
This is not NSWI177 disk
if the user does not exist.
Print the first and third column of the file /etc/group
.
Count the lines of the file /etc/services
.
Print the last two lines of the files /etc/passwd
and /etc/group
using
a single command.
Recall the file disk-speeds-data.csv
with the disk copying durations.
Compute the sum of all durations.
Consider the following file format.
Alpha 8 4 5 0
Bravo 12 5 3 2
Charlie 1 0 11 4
Append the sum of its numbers to each row. You do not need to keep the original alignment (i.e., feel free to squeeze the spaces).
Print information about the last commit. When the script is executed in
a directory that is not part of any Git project, the script shall print
only Not inside a Git repository
.
Print the contents of /etc/passwd
and /etc/group
separated by
text Ha ha ha
(i.e., contents of /etc/passwd
,
line with Ha ha ha
and contents of /etc/group
).
Graded tasks (deadline: Mar 20)
Do not forget a proper shebang and the executable bit.
IMPORTANT: all the following tasks must be solved using only
pipes and &&
or ||
command composition.
Use standard shell utilities and do not use shell if
s or while
s
even if you know them (the purpose of these tasks is to exercise your
knowledge of Linux filters).
04/override.sh
(30 points)
The script will print to stdout the contents of a file HEADER
(in the working directory).
However, if a file .NO_HEADER
exists in the current directory, nothing
will be printed (even if HEADER
exists).
If neither of the files exists, the program should print
Error: HEADER not found.
on standard error and terminate with exit status 1.
Otherwise, the script will terminate with success.
UPDATE: You can check for the existence of the files multiple times, you may safely assume that they will not change while your script is running. We have found a small bug in the test suite, please recheck that your solution still passes the tests.
04/second_highest_uid.sh
(30 points)
Write a script that reads passwd
-like formatted text on standard input
and prints the second highest numerical user ID.
Look up the passwd
entry in the fifth section of the manual pages to understand
the format used.
For testing, feel free to feed it your /etc/passwd
. Our tests will create
artificial data for testing your solution more thoroughly.
You can safely assume that user IDs are unique and there will always be at least two entries in the file.
04/row_sum.sh
(40 points)
Assume that you have a matrix written in a “fancy” notation. You can rely on the format being fixed (with regard to spacing, 3 digits maximum, position of the pipe symbol, etc.), but the number of columns or rows can differ.
Write a script that prints the sum of each row.
We expect that for the following matrix we would get this output.
| 106 179 |
| 188 50 |
| 5 125 |
285
238
130
The script will read input from stdin. There is no limit on the number of columns or rows, but you can rely on the fixed format as explained above.
Learning outcomes
Conceptual knowledge
Conceptual knowledge is about understanding the meaning of given terms and putting them into context. Therefore, you should be able to …
- explain what is standard output and input
- explain why standard input/output redirection is not (directly) observable from inside the program
- explain why standard error output is different from standard output
- explain how execution of cat foo.txt and cat <foo.txt differs
- explain how multiple programs using stdio can be merged together
- explain what is program exit code and how it can be used
- explain differences and typical uses for the main five interfaces a CLI program can use: program arguments, stdin, stdout, stderr and program exit code
- explain what a file descriptor is (from the application developers’ perspective, not from the OS/kernel side) (optional)
Practical skills
Practical skills are usually about using given programs to solve various tasks. Therefore, you should be able to …
- redirect standard output and input of CLI programs
- use special file /dev/null
- use standard output/input in Python
- use pipe to write simple composite programs
- use composition operators && and || in shell scripts
- use basic text filtering utilities such as cut, …
- change program exit code for Python scripts
- customize shell with aliases (optional)
- customize shell configuration via .bashrc and .profile scripts (optional)