The goal of this lab is to define and understand in depth what standard input (stdin), standard output (stdout), and standard error output (stderr) are. That will allow us to understand how input and output redirection (I/O redirection) works and how programs can be connected using pipes (the literal Czech equivalent of “pipe” is “roura”, but that term is almost never used in practice). We will also customize the behavior of our shell: we will explore how aliases and .bashrc work.
Redirection in practice
Prepare files one.txt and two.txt containing the words ONE and TWO, respectively, using echo and stdout redirection.
Answer.
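One possible solution (a minimal sketch):
echo ONE >one.txt
echo TWO >two.txt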
Merge (concatenate) these two files into merged.txt.
Answer.
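For example, cat together with output redirection can do the job (a sketch):
cat one.txt two.txt >merged.txt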
Appending to the end of a file
The shell also offers an option to append the output to an existing file using the >> operator. Thus, the following command would add UNO as another line into one.txt.
echo UNO >>one.txt
If the file does not exist, it will be created.
For the following example, we will need the program tac, which reverses the order of individual lines but otherwise works like cat. Try this first.
tac one.txt two.txt
If you have executed the commands above, you should see the following:
UNO
ONE
TWO
Try the following and explain what happens (and why) if you execute
tac one.txt two.txt >two.txt
Answer.
Input redirection
Copy the rev program from above and run it like this:
./rev.py <one.txt
./rev.py one.txt
./rev.py one.txt two.txt
./rev.py one.txt <two.txt
Has it behaved as you expected?
Trace which paths (i.e. through which lines) the program has taken with the above invocations.
Redirecting standard error output
To redirect the standard error output, you can use > again, but this time preceded by the number 2 (which denotes the stderr file descriptor). Hence, our cat example can be transformed into the following form, where err.txt would contain the error message and nothing would be printed on the screen.
cat one.txt nonexistent.txt two.txt >merged.txt 2>err.txt
Important special files
We already mentioned several important files under /dev/. With output redirection, we can actually use some of them right away.
Run cat one.txt and redirect the output to /dev/full and then to /dev/null. What happened?
Especially /dev/null is a very useful file, as it can be used in any situation where we are not interested in the output of a program.
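For example, when we only care whether a command succeeds and not about what it prints, we can discard its output (a small sketch):
ls /etc >/dev/null
ls /etc >/dev/null 2>/dev/null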
For many programs, you can specify the use of stdin explicitly by using - (dash) as the input filename.
Another option is to use /dev/stdin explicitly: with this name, we can make the example with rev work:
./rev.py /dev/stdin one.txt <two.txt
Python then opens /dev/stdin as a regular file, and the operating system (together with the shell) actually connects it with two.txt.
/dev/stdout can be used if we want to specify the standard output explicitly (this is mostly useful for programs coming from other environments where the emphasis is not on using stdout that much).
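For example, dd normally expects an output file given via of=, but we can point it at /dev/stdout and keep it in a pipeline (a sketch; dd prints its statistics to stderr, hence the redirection):
dd if=one.txt of=/dev/stdout 2>/dev/null | tac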
General redirection
Shell allows us to redirect outputs quite freely using file descriptor numbers before and after the greater-than sign.
For example, >&2 specifies that the standard output is redirected to the standard error output.
That may sound weird but consider the following mini-script.
Here, wget is used to fetch a file from the given URL.
echo "Downloading tarball for lab 02..." >&2
wget https://d3s.mff.cuni.cz/f/teaching/nswi177/202122/labs/nswi177-lab02.tar.gz 2>/dev/null
We actually want to hide the progress messages of wget and print ours instead.
Take this as an illustration of the concept, as wget can be silenced via command-line arguments (--quiet) as well.
Sometimes, we want to redirect stdout and stderr to one single file.
In these situations, a simple >output.txt 2>output.txt would not work and we have to use >output.txt 2>&1 or &>output.txt (to redirect both at once).
However, what about 2>&1 >output.txt? Can we use it as well?
Try it yourself!
Hint.
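A small experiment you can try (a sketch: compare what ends up in out.txt and what appears on the terminal in each case):
cat one.txt nonexistent.txt >out.txt 2>&1
cat one.txt nonexistent.txt 2>&1 >out.txt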
Pipes (data streaming composition)
We finally move to the area where Linux excels: program composition. In essence, the whole idea behind the Unix family of operating systems is to allow easy composition of various small programs together.
Mostly, the programs that are composed together are filters, and they operate on text inputs. These programs do not make any assumptions about the text format and are very generic. Special tools (that are nevertheless part of Linux software repositories) are needed if the input is more structured, such as XML or JSON.
The advantage is that composing the programs is very easy and it is very easy to compose them incrementally too (i.e., add another filter only when the output from the previous ones looks reasonable). This kind of incremental composition is more difficult in normal languages where printing data requires extra commands (here it is printed to the stdout without any extra work).
The disadvantage is that complex compositions can become difficult to read. It is up to the developer to decide when it is time to switch to a better language and process the data there. A typical division of labour is that shell scripts are used to preprocess the data: they are best when you need to combine data from multiple files (such as hundreds of various reports, etc.) or when the data needs to be converted to a reasonable format (e.g. non-structured logs from your web server into a CSV loadable into your favorite spreadsheet software or R). Computing statistics and similar tasks are best left to specialized tools.
Needless to say, Linux offers plenty of tools for statistical computations or plot-drawing utilities that can be controlled from the CLI. Mastering these tools is, unfortunately, out of scope for this course.
Motivating example
As a somewhat artificial example, we will consider the following CSV that can be downloaded from here.
These are actual data representing how long it took to copy the USB disk image to the USB drives in the library. The first column represents the device, the second the duration of the copying.
As a matter of fact, the first column also indirectly represents the port of the USB hub (this is more by accident, but it stems from the way we organized the copying). As a side note: it is interesting to see that some ports that are supposed to be the same are actually systematically slower.
disk,duration
/dev/sdb,1008
/dev/sdb,1676
/dev/sdc,1505
/dev/sdc,4115
...
We want to know what was the longest duration of the copying: in other words, the maximum of column two.
Well, we could use spreadsheet software for that, but we prefer to stay in the terminal. Among other reasons, we want a solution which is easily repeatable with other input files.
Recall that you have already seen the cut command that is able to extract specific columns from a file. There is also the command sort that sorts lines.
Thus our little script could look like this:
#!/bin/bash
cut -d, -f 2 <disk-speeds-data.csv >/tmp/disk_numbers.txt
sort </tmp/disk_numbers.txt
Prepare this script and run it.
The output is far from perfect: sort has sorted the lines alphabetically, not by numeric values. However, a quick glance at man sort later, we add -n (a.k.a. --numeric-sort) and re-execute the script.
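The updated script could then look like this (still using the temporary file; a sketch):
#!/bin/bash

cut -d, -f 2 <disk-speeds-data.csv >/tmp/disk_numbers.txt
sort -n </tmp/disk_numbers.txt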
This time, the last line of the output shows the maximum duration of 5769 seconds. Of course, all the other lines are useless, but we will fix that in a minute.
Let us focus on the temporary file first. There are two issues with it:
First of all, it requires disk space for another copy of the (possibly huge) data. A more subtle but much more dangerous problem is that the path to the temporary file is fixed. Imagine what happens if you execute the script in two terminals concurrently. Do not be fooled by the feeling that the script is so short that the probability of concurrent execution is negligible. It is a trap that is waiting to spring. We will talk about the proper use of mktemp(1) later, but in this example no temporary file is needed at all. We can write:
cut -d, -f 2 <disk-speeds-data.csv | sort
The | symbol stands for a pipe, which connects the standard output of cut to the standard input of sort. The pipe passes data between the two processes without writing them to the disk at all. (Technically, the data are passed using memory buffers, but that is a technical detail.)
The result is the same, but we escaped the pitfalls of using temporary files and the result is actually even more readable. You can even move the first < before cut, so that the script can be read left-to-right like “take disk-speeds-data.csv, extract the second column, and then sort it”:
<disk-speeds-data.csv cut -d, -f 2 | sort
In essence, the family of Unix systems is built on top of the ability to create pipelines, which chain a sequence of programs using pipes. Each program in the pipeline denotes a type of transformation. These transformations are composed together to produce the final result.
Finally, let us recall that we wanted to print only the biggest number. We can use the tail utility, which prints only the last few lines of a file: by default 10, but you can ask for just one by adding -n 1. As pipelines are not limited to two programs, we can simply write:
cut '-d,' -f 2 | sort -n | tail -n 1
Note that we have removed the path to the input file from the script. Now, the user is supposed to run it like:
get-slowest.sh <disk-speeds-data.csv
This actually makes the script more flexible: it is easy to test such a script with different inputs and the script can be again used as a part of a bigger pipeline.
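The whole get-slowest.sh script could therefore look like this (a sketch):
#!/bin/bash

cut -d, -f 2 | sort -n | tail -n 1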
Using && and || (logical program composition)
Execute the following commands:
ls / && echo "ls okay"
ls /nonexistent-filename || echo "ls failed"
This is an example of how return codes can be used in practice. We can chain commands so that the next one is executed only when the previous one failed or, conversely, only when it terminated with a zero exit code.
Understanding the following is essential, because together with pipes and standard I/O redirection, it forms the basic building blocks of shell scripts.
First of all, we will introduce a syntax for conditional chaining of program calls.
If we want to execute one command only if the previous one succeeded, we separate them with && (i.e., it is a logical and). On the other hand, if we want to execute the second command only if the first one fails (in other words, execute the first or the second), we separate them with ||.
The example with ls is quite artificial, as ls is quite noisy when an error occurs. However, there is also a program called test that is silent and can be used to compare numbers or check file properties. For example, test -d ~/Desktop checks that ~/Desktop is a directory. If you run it, nothing will be printed. However, in combination with && or ||, we can check its result.
test -d .git && echo "We are in a root of a Git project"
test -f README.md || echo "README.md missing"
This could be used as a very primitive branching in our scripts. In the next lab, we will introduce proper conditional statements, such as if and while.
Note that test is actually a very powerful command: it does not print anything but can be used to control other programs.
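A few more examples of what test can check (a sketch; see man test for the full list of supported checks):
test 10 -lt 20 && echo "10 is less than 20"
test -x rev.py || echo "rev.py is not executable"
test -s err.txt && echo "err.txt is not empty"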
It is possible to chain more commands: && and || are left-associative and they have the same priority.
Compare the following commands and how they behave in a directory where the file README.md is or is not present:
test -f README.md || echo "README.md missing" && echo "We have README.md"
test -f README.md && echo "We have README.md" || echo "README.md missing"
Failing fast
There is a caveat regarding pipes and the success of commands: the success of a pipeline is determined by its last command. Thus, sort /nonexistent | head is a successful command. To make a failure of any command fail the (whole) pipeline, you need to run set -o pipefail in your script (or shell) before the pipeline.
Compare the behavior of the following two snippets.
sort /nonexistent | head && echo "All is well"
set -o pipefail
sort /nonexistent | head && echo "All is well"
In most cases, you want the second behavior.
Actually, you typically want the whole script to terminate if there is an unexpected failure. This means a failure that was not tested by the && or || operator (or by one of the conditional statements we will meet in the next lab), just like an uncaught exception in Python.
For example, the following compound command is successful even though one of its components failed:
cat /nonexistent || echo "Oh well"
To enable terminate-on-failure, you need to call set -e. In case of failure, the shell will stop executing the script and exit with the same exit code as the failed command.
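A minimal illustration (a sketch):
#!/bin/bash
set -e
cat /nonexistent
echo "This line is never executed"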
Furthermore, you usually want to terminate the script when an uninitialized variable is used: that is enabled by set -u. (We will talk about variables later.)
Therefore, typically, you want to start your script with the following trio:
set -o pipefail
set -e
set -u
Many commands allow short options (such as -l or -h you know from ls) to be merged like this (note that -o pipefail has to be last):
set -ueo pipefail
Get into a habit where each of your scripts starts with this command.
Actually, from now on, the GitLab pipeline will check that this command is a part of your scripts.
Shell customization
We already mentioned that you should customize your terminal emulator to be comfortable to use. After all, you will spend at least this semester with it and it should be fun to use.
In this lab, we will show some other options how to make your shell more comfortable to use.
Command aliases
You probably noticed that you execute some commands with the same options a lot. One such example could be ls -l -h that prints a detailed file listing, using human-readable sizes. Or perhaps ls -F to append a slash to the directories. And probably ls --color too.
The shell offers so-called aliases, with which you can easily add new commands without creating full-fledged scripts somewhere.
Try executing the following commands to see how a new command l could be defined.
alias l='ls -l -h'
l
We can even override the original command; the shell will ensure that the rewriting is not recursive.
alias ls='ls -F --color=auto'
Note that these two aliases together also ensure that l will display filenames in colors.
There are no spaces around the equal sign.
Some typical aliases that you will probably want to try are the following ones. Use a manual page if you are unsure what the alias does. Note that curl is used to retrieve contents from a URL and wttr.in is really a URL. By the way, try that command even if you do not plan to use this alias :-).
alias ls='ls -F --color=auto'
alias ll='ls -l'
alias l='ls -l -h'
alias cp='cp -i'
alias mv='mv -i'
alias rm='rm -i'
alias man='man -a'
alias weather='curl wttr.in'
~/.bashrc
The aliases above are nice, but you probably do not want to define them each time you launch the shell. However, most shells in Linux have some kind of file that they execute before they enter interactive mode. Typically, the file resides directly in your home directory and it is named after the shell, ending with rc (you can remember it as runtime configuration).
For Bash, which we are using now (if you are using a different shell, you probably already know where to find its configuration files), that file is called ~/.bashrc.
You have already used it when setting EDITOR for Git, but you can also add aliases there. Depending on your distribution, you may already see some aliases or some other commands there.
Add aliases you like there, save the file and launch a new terminal. Check that the aliases work.
The .bashrc file behaves as a shell script and you are not limited to having only aliases there. It can contain virtually any commands that you want to execute in every terminal that you launch.
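For example, a fragment of ~/.bashrc could look like this (a sketch; the particular editor and aliases are only illustrative):
# Preferred editor for Git and other tools
export EDITOR=mcedit
# Shorthands for everyday commands
alias ls='ls -F --color=auto'
alias ll='ls -l'
alias weather='curl wttr.in'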
More examples
The following examples can be solved either by executing multiple commands or by piping basic shell commands together. To help you find the right program, you can use manual pages. You can also use our manual as a starting point.
Create a directory a and inside it create a text file --help containing Lorem Ipsum. Print the content of this file and then delete it.
Solution.
Create a directory called b and inside it create files called alpha.txt and *. Then delete the file called * and check what happened to the file alpha.txt.
Solution.
Print the content of the file /etc/passwd with its lines sorted.
Solution.
The command getent passwd USERNAME prints the information about the user account USERNAME (e.g., intro) on your machine. Write a command that prints information about user intro or the message This is not NSWI177 disk if the user does not exist.
Solution.
Print the first and third column of the file /etc/group.
Solution.
Count the lines of the file /etc/services.
Solution.
Print the last two lines of the files /etc/passwd and /etc/group using a single command.
Solution.
Recall the file disk-speeds-data.csv with the disk copying durations.
Compute the sum of all durations.
Solution.
Assume a file in the following format.
Alpha 8 4 5 0
Bravo 12 5 3 2
Charlie 1 0 11 4
Append to each row the sum of its line. You do not need to keep the original alignment (i.e., feel free to squeeze the spaces). Hint. Solution.
Print information about the last commit. When the script is executed in a directory that is not part of any Git project, the script shall print only Not inside a Git repository.
Hint. Solution.
Print the contents of /etc/passwd and /etc/group separated by the text Ha ha ha (i.e., the contents of /etc/passwd, a line with Ha ha ha, and the contents of /etc/group).
Solution.
Graded tasks (deadline: March 20)
Do not forget to set the executable bit correctly and to include the shebang.
IMPORTANT: all of these tasks must be solved using only pipes and && / || composition. Use standard shell programs; do not use shell if or while constructs (the goal of these tasks is to test your knowledge of Linux filters).
04/override.sh (30 points)
The script prints to stdout the contents of the file HEADER (in the working directory).
However, if the directory contains the file .NO_HEADER, nothing is printed (even if HEADER exists).
If neither of the files exists, the program prints Error: HEADER not found. to the standard error output and terminates with exit code 1.
Otherwise, the script terminates successfully.
UPDATE: You may check for the existence of the files multiple times, and you may assume that the files do not change while your script is running. We also found a minor bug in our tests; please double-check that your solution still passes.
04/second_highest_uid.sh (30 points)
Write a script that reads from its standard input a file formatted like passwd and prints the second highest numerical user ID.
The format of the file is described in the fifth section of the passwd manual pages.
For testing, you can feed your /etc/passwd to the script. Our tests will use artificially created data to test your solution thoroughly.
You may assume that the IDs are unique and that the file will always contain at least two entries.
04/row_sum.sh (40 points)
Assume a matrix written in this “pretty” format. You can rely on the format being fixed (with respect to the spaces, numbers having at most three digits, and the pipe symbol), but the number of rows and columns may differ.
Write a script that sums the numbers in each row.
For the following matrix, we expect this output.
| 106 179 |
| 188 50 |
| 5 125 |
285
238
130
The script reads its input from stdin; the number of columns and rows is not limited in any way (apart from the overall format).
Learning outcomes
Conceptual knowledge
Conceptual knowledge means that you understand the meaning and context of the given topic and are able to place the topics into a bigger picture. So, you are able to …
- explain what standard output and standard input are
- explain why redirecting the standard input/output is not (directly) visible inside the program
- explain why the standard error output is separate from the standard output
- explain the difference between cat foo.txt and cat <foo.txt
- explain how multiple programs using stdio can be composed together
- explain what a program exit code is and how it can be used
- explain the differences and typical uses of the five main interfaces available to a CLI program: arguments, stdin, stdout, stderr, and the exit code
- explain what a file descriptor is (from the point of view of the application, not of the OS/kernel) (optional)
Practical skills
Practical skills usually concern the use of particular programs to solve various tasks. So, you are able to …
- redirect the standard output and input of CLI programs
- use the special file /dev/null
- use standard input and output in Python
- use pipes to compose programs
- compose programs using && and || in shell scripts
- use basic filters such as cut
- set the exit code of a Python script
- customize the shell using aliases (optional)
- customize the shell configuration via the .bashrc and .profile scripts (optional)