Input, outputs and pipes (4) | Labs | NSWI177

Information below is not for the current semester. The current semester can be found here.

Labs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.

Before class reading
I/O redirection in practice
Pipes (data streaming composition)
Using && and || (logical program composition)
Shell customization
More examples
Graded tasks (deadline: Mar 20)
Learning outcomes

The goal of this lab is to define and thoroughly understand the concepts of standard input, output, and standard error output. This would allow us to understand program I/O redirection and composition of different programs via pipes. We will also customize our shell environment a little by investigating command aliases and the .bashrc file.

Before class reading

This lab defines a lot of concepts that might be new to you. Please, do not skip the more theoretical parts as we will build on top of them in the following labs.

Standard input and outputs

Standard output

Standard output (often shortened to stdout) is the default output that you can use by calling print("Hello") if you are in Python, for example. Stdout is used by the basic output routines in almost every programming language.

Generally, this output has the same API as if you were writing to a file. Be it print in Python, System.out.print in Java or printf in C (where the limitations of the language necessitate the existence of a pair of printf and fprintf).

This output is usually prepared by the language runtime together with the shell and the operating system (the technical details are not that important for this course anyway). Practically, the standard output is printed to the terminal or its equivalent (and when the application is launched graphically, stdout is typically lost).

Note that in Python you can access it explicitly via sys.stdout that acts as an opened file handle (i.e., result of open).

Standard input

Similarly to stdout, almost all languages have access to stdin that represents the default input. By default, this input comes from the keyboard, although usually through the terminal (i.e., stdin is not used in graphical applications for reading keyboard input).

Note that the function input() that you may have used in your Python programs is an upgrade on top of stdin because it offers basic editing functions. Plain standard input does not support any form of editing (though typically you could use backspace to erase characters at the end of the line).

If you want to access the standard input in Python, you need to use sys.stdin explicitly. As one could expect, it uses a file API, hence it is possible to read a line from it calling .readline() on it or iterate through all lines.

In fact, the iteration of the following form is a quite common pattern for many Linux utilities (they are usually written in C but the pattern remains the same).

for line in sys.stdin:
    ...

Note that the above pattern actually works for any opened text file in Python and is the preferred way to read a textual file.

Standard I/O redirection

As a technical detail, we mentioned earlier that standard input and outputs are prepared (partially) by the operating system. This also means that it can be changed (i.e., initialized differently) without changing the program. And the program may not even “know” about it.

This is called redirection and it allows the user to specify that the standard output would not go to the screen (terminal), but rather to a file. From the point of view of the program, the API is still the same.

This redirection has to be done before the program is started and it has to be done by the caller. For us, it means we have to do it in the shell.

It is very simple: at the end of the command we can specify > output.txt and everything that would be normally printed on a screen goes to output.txt.

One big warning before you start experimenting (we will play with it during the labs, of course): the output redirection is a low-level operation and has no form of undo. Therefore, if the file you redirect to already exists, it will be overwritten without any prompting. And without any easy option to restore the original file content (and for small files, the restoration is technically impossible for most file systems used in Linux). As a precaution, get into a habit to hit <Tab> after you specify the filename. If the file does not exist, the cursor will not move. If the file already exists, the tab completion routine will insert a space.

As the simplest example, the following two commands will create files one.txt and two.txt with the words ONE and TWO inside (including the new line character at the end).

echo ONE > one.txt
echo TWO >two.txt

Note that the shell is quite flexible in the use of spaces and both options are valid (i.e., one.txt does not have a space as the first character in the filename).

From implementation point of view, echo received a single argument, the part with > filename is not passed to the program at all (i.e., do not expect to find > filename in your sys.argv).

If you know Python’s popen or a similar call, they also offer the option to specify which file to use for stdout if you want to do a redirection in your program (but only for a new program launched, not inside a running program).

If you recall Lab 02, we mentioned that the program cat is used to concatenate files. With the knowledge of output redirection, it suddenly starts to make more sense as the (merged) output can be easily stored in a file.

cat one.txt two.txt >merged.txt

There are more kinds of redirection. For example, you can use < to redirect stdin. Then, instead of reading input typed by the user on the keyboard, the program reads the input from a file.

Note that programs using Pythonic input() do not work that well with redirected input. Practically, input() is suitable for interactive programs only. You might want to use sys.stdin.readline() or for line in sys.stdin instead.

Filters

Many utilities in Linux work as so-called filters. They accept the input from stdin and print their output to stdout.

One such example is cut that can be used to print only certain columns from the input. For example, running it as cut -d: -f 1 above /etc/passwd will display a list of accounts (usernames) on the current machine.

Try to explain the difference between the following two calls:

cut -d: -f 1 </etc/passwd
cut -d: -f 1 /etc/passwd

The above behaviour is quite common for most filters: you can specify the input file explicitly, but when missing, the program reads from the stdin.

To return to the question above: the difference is that in the first case (with input redirection), the input file is opened by the shell and opened file is passed to cut. Problems in opening the file are reported by shell and cut might not be launched at all. In the second case, the file is opened by cut (i.e., cut executes the open() call and also needs to handle the error).

In your own filters, you should also follow this approach: the amount of source code you need to write is negligible, but it gives the user flexibility in use. The following snippet demonstrates how you can easily add the same behaviour to a program called rev.py that simply reverses the ordering of characters on each line (for brevity, we ignore file opening errors).

#!/usr/bin/env python3

import sys

def reverse(inp):
    for line in inp:
        print(line.rstrip('\n')[::-1])

def main():
    if len(sys.argv) == 1:
        reverse(sys.stdin)
    else:
        for filename in sys.argv[1:]:
            with open(filename, "r") as inp:
                reverse(inp)

if __name__ == '__main__':
    main()

Note that rev is also a standard Linux program.

Standard error output

While it often makes sense to redirect the output, you often want to see error messages still on the screen.

Imagine files one.txt and two.txt exist while nonexistent.txt is not in the directory. We will now execute the following command.

cat one.txt nonexistent.txt two.txt >merged.txt

Obviously, cat prints an error message when the file does not exist. However, with redirection the error message would be hidden.

That is not very practical: you definitely do not want to have the error message inside merged.txt and you want to know about the error.

Therefore, every Linux program also has a standard error output (often just stderr) that also goes to the screen but is logically different from stdout and is not subject to > redirection.

In Python, it is available as sys.stderr and it is (as sys.stdout) an opened file.

Extending our rev.py program, we would use stderr in the following way:

def main():
    if len(sys.argv) == 1:
        reverse(sys.stdin)
    else:
        for filename in sys.argv[1:]:
            try:
                with open(filename, "r") as inp:
                    reverse(inp)
            except FileNotFoundError as e:
                print("{}: No such file".format(filename), file=sys.stderr)

Standard error output is also often used for printing debugging information or for logging in general. The reasons should be obvious: the output always goes to the screen, thus it is a good way to inform the user about the progress (for example) and it does not change the real output of the program (as stdout is logically a different file).

Under the hood

Technically, opened files have so-called file descriptors that are used when an application communicates with the operating system (recall that file operations have to be done by the operating system). The file descriptor is an integer that serves as an index in a table of opened files that is kept for each process (i.e., running instance of a program).

This number – the file descriptor — is then passed to system calls which operate on the opened file. For example, write gets two arguments: an opened file descriptor and a byte buffer to write (in our examples, we will pass the string directly for simplicity). Therefore, when your application calls print("Message", file=some_file), eventually your program would call the operating system as write(3, "Message\n") where 3 denotes the file descriptor for the opened file represented by the some_file handle.

While the above may look like a technical detail, it will help you understand why the standard error redirection looks the way it does, or why file operations in most programming languages require opening the file first before writing to it (i.e., why write_to_file(filename, contents) is never a primitive operation).

In any unix-style environment, the file descriptors 0, 1, and 2 are always used for standard input, standard output, and standard error output, respectively. That is, the call print("Message") in Python eventually ends up in calling write(1, "Message\n") and a call to print("Error", file=sys.stderr) calls write(2, "Error\n").

This also explains why the output redirection is even possible – the caller (e.g., the shell) can simply open the file descriptor 1 to point to a different file as a normal application does not touch it at all.

The fact that stdout and stderr are logically different streams (files) also explains the word probably in one of the examples above. Even though they both end in the same physical device (the terminal), they may use a different configuration: typically, the standard output is buffered, i.e., output of your application goes to the screen only when there is enough of it while the standard error is not buffered – it is printed immediately. The reason is probably obvious – error messages should be visible as soon as possible, while normal output might be delayed to improve performance.

Note that the buffering policy can be more sophisticated, but the essential take away is that any output to the stderr is displayed immediately while stdout might be delayed.

Program return (exit) code

So far, the programs we have used announced errors as messages. That is quite useful for interactive programs as the user wants to know what went wrong.

However, for non-interactive use, checking for error messages is actually very error-prone. Error messages change, the users can have their system localized etc. etc. Therefore, Linux offers a different way of checking whether a program terminated correctly or not.

Whether a program terminates successfully or with a failure, is signalled by its so-called return (or exit) code. This code is an integer and unlike in other programming languages, zero denotes success and any non-zero value denotes an error.

Why do you think that the authors decided that zero (that is traditionally reserved for false) means success and nonzero (traditionally converted to true) means failure? Hint: in how many ways can a program succeed?

Unless specified otherwise, when your program terminates normally (i.e., main reaches the end and no exception is raised), the exit code is zero.

If you want to change this behaviour, you need to specify this exit code as a parameter to the exit function. In Python, it is sys.exit.

For C programs, the main function can either return void or int. With int, the return value from main() is actually the exit code.

You should always use the int variant and always specify a proper exit code.

As an example, the following is a modification of the rev above, this time with proper error checking and error code.

def main():
    exit_code = 0
    if len(sys.argv) == 1:
        reverse(sys.stdin)
    else:
        for filename in sys.argv[1:]:
            try:
                with open(filename, "r") as inp:
                    reverse(inp)
            except FileNotFoundError as e:
                print("{}: No such file".format(filename), file=sys.stderr)
                exit_code = 1
    sys.exit(exit_code)

We will later see that shell control flow (e.g. conditions and loops) is actually controlled by program exit codes.

Before class quiz

The quiz file is available in the 04 folder of this GitLab project.

Copy the right language mutation into your project as 04/before.md (i.e., you will need to rename the file).

The questions and answers are part of that file, fill in the answers in between the **[A1]** and **[/A1]** markers.

The before-04 pipeline on GitLab will test that your answers are in the correct format. It does not check for actual correctness (for obvious reasons).

Submit your before-class quiz before start of the next lab.

After class tasks

They will be published next Monday.

I/O redirection in practice

Prepare files one.txt and two.txt containing the words ONE, TWO respectively using echo and stdout redirection. Answer.

Merge (concatenate) these two files into merged.txt. Answer.

Appending to the end of a file

The shell also offers an option to append the output to an existing file using the >> operator. Thus, the following command would add UNO as another line into one.txt.

echo UNO >>one.txt

If the file does not exist, it will be created.

For the following example, we will need the program tac that reverses the order of individual lines but otherwise works like cat. Try this first.

tac one.txt two.txt

If you have executed the commands above, you should see the following:

UNO
ONE
TWO

Try the following and explain what happens (and why) if you execute

tac one.txt two.txt >two.txt

Answer.

Input redirection

Copy the rev program from above and run it like this:

Hint.

./rev.py <one.txt
./rev.py one.txt
./rev.py one.txt two.txt
./rev.py one.txt <two.txt

Has it behaved as you expected?

Trace which paths (i.e. through which lines) the program has taken with the above invocations.

Redirecting standard error output

To redirect the standard error output, you can use > again, but this time preceded by the number 2 (that denotes the stderr file descriptor).

Hence, our cat example can be transformed to the following form where err.txt would contain the error message and nothing would be printed on the screen.

cat one.txt nonexistent.txt two.txt >merged.txt 2>err.txt

Notable special files

We already mentioned several important files under /dev/. With output redirection, we can actually use some of them right away.

Run cat one.txt and redirect the output to /dev/full and then to /dev/null. What happened?

Especially /dev/null is a very useful file as it can be used in any situation when we are not interested in the output of a program.

For many programs you can specify the use of stdin explicitly by using - (dash) as the input filename.

Another option is to use /dev/stdin explicitly: with this name, we can make the example with rev work:

./rev.py /dev/stdin one.txt <two.txt

Then Python opens the file /dev/stdin as a file and operating system (together with shell) actually connects it with two.txt.

/dev/stdout can be used if we want to specify standard output explicitly (this is mostly useful for programs coming from other environments where the emphasis is not on using stdout that much).

Generic redirection

Shell allows us to redirect outputs quite freely using file descriptor numbers before and after the greater-than sign.

For example, >&2 specifies that the standard output is redirected to a standard error output. That may sound weird but consider the following mini-script.

Here, wget used to fetch file from given URL.

echo "Downloading tarball for lab 02..." >&2
wget https://d3s.mff.cuni.cz/f/teaching/nswi177/202122/labs/nswi177-lab02.tar.gz 2>/dev/null

We actually want to hide the progress messages of wget and print ours instead.

Take this as an illustration of the concept as wget can be silenced via command-line arguments (--quiet) as well.

Sometimes, we want to redirect stdout and stderr to one single file. In these situations simple >output.txt 2>output.txt would not work and we have to use >output.txt 2>&1 or &>output.txt (to redirect both at once). However, what about 2>&1 >output.txt, can we use it as well? Try it yourself! Hint.

Pipes (data streaming composition)

We finally move to the area where Linux excels: program composition. In essence, the whole idea behind Unix-family of operating systems is to allow easy composition of various small programs together.

Mostly, the programs that are composed together are filters and they operate on text inputs. These programs do not make any assumptions on the text format and are very generic. Special tools (that are nevertheless part of Linux software repositories) are needed if the input is more structured, such as XML or JSON.

The advantage is that composing the programs is very easy and it is very easy to compose them incrementally too (i.e., add another filter only when the output from the previous ones looks reasonable). This kind of incremental composition is more difficult in normal languages where printing data requires extra commands (here it is printed to the stdout without any extra work).

The disadvantage is that complex compositions can become difficult to read. It is up to the developer to decide when it is time to switch to a better language and process the data there. A typical division of labour is that shell scripts are used to preprocess the data: they are best when you need to combine data from multiple files (such as hundreds of various reports, etc.) or when the data needs to be converted to a reasonable format (e.g. non-structured logs from your web server into a CSV loadable into your favorite spreadsheet software or R). Computing statistics and similar tasks are best left to specialized tools.

Needless to add, Linux offers plenty of tools for statistical computations or plot drawing utilities that can be controlled in CLI. Mastering of these tools is, unfortunately, out of topic for this course.

Motivation example

As a somewhat artificial example, we will consider the following CSV that can be downloaded from here.

These are actual data representing how long it took to copy the USB disk image to the USB drives in the library. The first column represents the device, the second duration of the copying.

As a matter of fact, the first column also indirectly represents port of the USB hub (this is more by accident but it stems from the way we organized the copying). As a sidenote: it is interesting to see that some ports that are supposed to be the same are actually systematically slower.

disk,duration
/dev/sdb,1008
/dev/sdb,1676
/dev/sdc,1505
/dev/sdc,4115
...

We want to know what was the longest duration of the copying: in other words, the maximum of column two.

Well, we could use spreadsheet software for that, but we prefer to stay in the terminal. Among other reasons, we want a solution which is easily repeatable with other input files.

Recall that you have already seen the cut command that is able to extract specific columns from a file. There is also the command sort that sorts lines.

Thus our little script could look like this:

#!/bin/bash

cut -d, -f 2 <disk-speeds-data.csv >/tmp/disk_numbers.txt
sort </tmp/disk_numbers.txt

Prepare this script and run it.

The output is far from perfect: sort has sorted the lines alphabetically, not by numeric values. However, a quick glance at man sort later, we add -n (a.k.a. --numeric-sort) and re-execute the script.

This time, the last line of the output shows the maximum duration of 5769 seconds. Of course, all the other lines are useless, but we will fix that in a minute.

Let us focus on the temporary file first. There are two issues with it: First of all, it requires disk space for another copy of the (possibly huge) data. A bit more subtle but much more dangerous problem is that the path to the temporary file is fixed. Imagine what happens if you execute the script in two terminals concurrently. Do not be fooled by the feeling that the script so short that the probability of concurrent execution is negligible. It is a trap that is waiting to spring. We will talk about proper use of mktemp(1) later, but in this example no temporary file is needed at all. We can write:

cut -d, -f 2 <disk-speeds-data.csv | sort

The | symbol stands for a pipe, which connects the standard output of cut to the standard input of sort. The pipe passes data between the two processes without writing them to the disk at all. (Technically, the data are passed using memory buffers, but that is a technical detail.)

The result is the same, but we escaped the pitfalls of using temporary files and the result is actually even more readable. You can even move the first < before cut, so that the script can be read left-to-right like “take disk-speeds-data.csv, extract the second column, and then sort it”:

<disk-speeds-data.csv cut -d, -f 2 | sort

In essence, the family of unix systems is built on top of the ability of creating pipelines, which chain a sequence of programs using pipes. Each program in the pipeline denotes a type of transformation. These transformations are composed together to produce the final result.

Finally, let us recall that we wanted to print only the biggest number. We can use the tail utility which prints only the last few lines of a file: by default 10, but you can ask for just one by adding -n 1. As pipelines are not limited to two programs, we can simply write:

cut '-d,' -f 2 | sort -n | tail -n 1

Note that we have removed the path to the input file from the script. Now, the user is supposed to run it like:

get-slowest.sh <disk-speeds-data.csv

This actually makes the script more flexible: it is easy to test such a script with different inputs and the script can be again used as a part of a bigger pipeline.

Using `&&` and `||` (logical program composition)

Execute the following commands:

ls / && echo "ls okay"
ls /nonexistent-filename || echo "ls failed"

This is an example of how return codes can be used in practice. We can chain commands to be executed only when the previous one failed or terminated with zero exit code.

Understanding the following is essential, because together with pipes and standard I/O redirection, it forms the basic building blocks of shell scripts.

First of all, we will introduce a syntax for conditional chaining of program calls.

If we want to execute one command only if the previous one succeeded, we separate them with && (i.e., it is a logical and) On the other hand, if we want to execute the second command only if the first one fails (in other words, execute the first or the second), we separate them with ||.

The example with ls is quite artificial as ls is quite noisy when an error occurs. However, there is also a program called test that is silent and can be used to compare numbers or check file properties. For example, test -d ~/Desktop checks that ~/Desktop is a directory. If you run it, nothing will be printed. However, in company with && or ||, we can check its result.

test -d .git && echo "We are in a root of a Git project"
test -f README.md || echo "README.md missing"

This could be used as a very primitive branching in our scripts. In the next lab, we will introduce proper conditional statements, such as if and while.

Note that test is actually a very powerful command – it does not print anything but can be used to control other programs.

It is possible to chain commands, && and || are left-associative and they have the same priority.

Compare the following commands and how they behave when in a directory where the file README.md is or is not present:

test -f README.md || echo "README.md missing" && echo "We have README.md"
test -f README.md && echo "We have README.md" || echo "README.md missing"

Failing fast

There is a caveat regarding pipes and success of commands: the success of a pipeline is determined by its last command. Thus, sort /nonexistent | head is a successful command. To make a failure of any command fail the (whole) pipeline, you need to run set -o pipefail in your script (or shell) before the pipeline.

Compare the behavior of the following two snippets.

sort /nonexistent | head && echo "All is well"

set -o pipefail
sort /nonexistent | head && echo "All is well"

In most cases, you want the second behavior.

Actually, you typically want the whole script to terminate if there is an unexpected failure. This means a failure, which was not tested by the && or || operator (or one of the conditional statements we meet in the next lab). Like an uncaught exception in Python.

For example, the following compound command is successful even though one of its components failed:

cat /nonexistent || echo "Oh well"

To enable terminate-on-failure, you need to call set -e. In case of failure, the shell will stop executing the script and exit with the same exit code as the failed command.

Furthermore, you usually want to terminate the script when an uninitiailized variable is used: that is enabled by set -u. (We will talk about variables later.)

Therefore, typically, you want to start your script with the following trio:

set -o pipefail
set -e
set -u

Many commands allow short options (such as -l or -h you know from ls) to be merged like this (note that -o pipefail has to be last):

set -ueo pipefail

Get into a habit where each of your scripts starts with this command.

Actually, from now on, the GitLab pipeline will check that this command is a part of your scripts.

Pitfalls of pipes (a.k.a. SIGPIPE)

set -ueo pipefail can cause unwanted and quite unexpected behaviour.

Following script terminates with a hard-to-explain error (note that the final hexdump is only to ensure we do not print garbage from /dev/urandom directly on the terminal).

set -ueo pipefail
cat /dev/urandom | head -n 1 | hexdump || echo "Pipe failed?"

Despite the fact that everything looks fine.

The reason comes from the head command. head has a very smart implementation that terminates after first -n lines were printed. Reasonable right? But that means that the first cat is suddenly writing to a pipe that no one reads. It is like writing to a file that was already closed. That generates an exception (well, kind of) and cat terminates with an error. Because of set -o pipefail, the whole pipeline fails.

The truth is that distinguishing whether the closed pipe is a valid situation that shall be handled gracefully or if it indicates an issue is impossible. Therefore cat terminates with an error (after all, someone just closed its output without letting it know first) and thus the shell has to mark the whole pipeline as failed.

Solving this is not always easy and several options are available. Each has its pros and cons.

When you know why this can occur, adding || true marks the pipeline as fine.

Shell customization

We already mentioned that you should customize your terminal emulator to be comfortable to use. After all, you will spend at least this semester with it and it should be fun to use.

In this lab, we will show some other options how to make your shell more comfortable to use.

Command aliases

You probably noticed that you execute some commands with the same options a lot. One such example could be ls -l -h that prints a detailed file listing, using human-readable sizes. Or perhaps ls -F to append a slash to the directories. And probably ls --color too.

Shell offers to create so-called aliases where you can easily add new commands without creating full-fledged scripts somewhere.

Try executing the following commands to see how a new command l could be defined.

alias l='ls -l -h`
l

We can even override the original command, the shell will ensure that rewriting is not a recursive.

alias ls='ls -F --color=auto'

Note that these two aliases together also ensure that l will display filenames in colors.

There are no spaces around the equal sign.

Some typical aliases that you will probably want to try are the following ones. Use a manual page if you are unsure what the alias does. Note that curl is used to retrieve contents from a URL and wttr.in is really a URL. By the way, try that command even if you do not plan to use this alias :-).

alias ls='ls -F --color=auto'
alias ll='ls -l'
alias l='ls -l -h'

alias cp='cp -i'
alias mv='mv -i'
alias rm='rm -i'

alias man='man -a'

alias weather='curl wttr.in'

`~/.bashrc`

Aliases above are nice, but you probably do not want to define them each time you launch the shell. However, most shells in Linux have some kind of file that they execute before they enter interactive mode. Typically, the file resides directly in your home directory and it is named after the shell, ending with rc (you can remember it as runtime configuration).

For Bash that we are using now (if you are using a different shell, you probably already know where to find its configuration files), that file is called ~/.bashrc.

You have already used it when setting EDITOR for Git, but you can also add aliases there. Depending on your distribution, you may already see some aliases or some other commands there.

Add aliases you like there, save the file and launch a new terminal. Check that the aliases work.

The .bashrc file behaves as a shell script and you are not limited to have only aliases there. Virtually any commands can be there that you want to execute in every terminal that you launch.

More examples

The following examples can be solved either by executing multiple commands or by piping basic shell commands together. To help you find the right program, you can use manual pages. You can also use our manual as a starting point.

Create a directory a and inside it create a text file --help containing Lorem Ipsum. Print the content of this file and then delete it. Solution.

Create a directory called b and inside it create files called alpha.txt and *. Then delete the file called * and watch out what happened to the file alpha.txt. Solution.

Print the content of the file /etc/passwd sorted by the rows. Solution.

The command getent passwd USERNAME prints the information about user account USERNAME (e.g., intro) on your machine. Write a command that prints information about user intro or a message This is not NSWI177 disk if the user does not exist. Solution.

Print the first and third column of the file /etc/group. Solution.

Count the lines of the file /etc/services. Solution.

Print last two lines of the files /etc/passwd and /etc/group using a single command. Solution.

Recall the file disk-speeds-data.csv with the disk copying durations. Compute the sum of all durations. Solution.

Consider the following file format.

Alpha     8  4  5  0
Bravo    12  5  3  2
Charlie   1  0 11  4

Append to each row sum of its line. You do not need to keep the original alignment (i.e., feel free to squeeze the spaces). Hint. Solution.

Print information about the last commit, when the script is executed in a directory that is not part of any Git project, the script shall print only Not inside a Git repository. Hint. Solution.

Print the contents of /etc/passwd and /etc/group separated by text Ha ha ha (i.e., contents of /etc/passwd, line with Ha ha ha and contents of /etc/group). Solution.

Graded tasks (deadline: Mar 20)

Do not forget a proper shebang and the executable bit.

IMPORTANT: all the following tasks must be solved using only pipes and && or || command composition. Use standard shell utilities and do not use shell ifs or whiles even if you know them (the purpose of these tasks is to exercise your knowledge of Linux filters).

`04/override.sh` (30 points)

The script will print to stdout contents of a file HEADER (in the working directory).

However, if a file .NO_HEADER exists in the current directory, nothing will be printed (even if HEADER exists).

If neither of the files exists, the program should print Error: HEADER not found. on standard error and terminate with exit status 1.

Otherwise, the script will terminate with success.

UPDATE: You can check for the existence of the files multiple times, you may safely assume that they will not change while your script is running. We have found a small bug in the test suite, please recheck that your solution still passes the tests.

`04/second_highest_uid.sh` (30 points)

Write a script that reads passwd-like formatted text on standard input and prints second highest numerical user ID.

Look up passwd entry in the fifth section of the manual pages to understand the format used.

For testing, feel free to feed it your /etc/passwd. Our tests will create artificial data for testing your solution more thoroughly.

You can safely assume that user IDs are unique and there will be always at least two entries in the file.

`04/row_sum.sh` (40 points)

Assume that you have a a matrix writen in a “fancy” notation. You can rely that the format is fixed (with regard to spacing, 3 digits maximum, position of pipe symbol etc.) but the number of columns or rows can differ.

Write a script that prints sum of each row.

We expect that for the following matrix we would get this output.

| 106 179 |
| 188  50 |
|   5 125 |

285
238
130

The script will read input from stdin, there is no limit on the amount of columns or rows but you can rely on the fixed format as explained above.

Learning outcomes

Conceptual knowledge

Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …

explain what is standard output and input
explain why standard input/output redirection is not (directly) observable from inside the program
explain why standard error output is different from standard output
explain how execution of cat foo.txt and cat <foo.txt differs
explain how multiple programs using stdio can be merged together
explain what is program exit code and how it can be used
explain differences and typical uses for the main five interfaces a CLI program can use: program arguments, stdin, stdout, stderr and program exit code
explain what a file descriptor is (from the application developers’ perspective, not from the OS/kernel side) (optional)

Practical skills

Practical skills is usually about usage of given programs to solve various tasks. Therefore, you should be able to …

redirect standard output and input of CLI programs
use special file /dev/null
use standard output/input in Python
use pipe to write simple composite programs
use composition operands && and || in shell scripts
use basic text filtering utilities such as cut, …
change program exit code for Python scripts
customize shell with aliases (optional)
customize shell configuration via .bashrc and .profile scripts (optional)

Note that this solution requires reading the input twice, hence we assume that the input is in score.txt.

<score.txt tr -s ' ' | cut -d ' ' -f 2- | tr ' ' '+' | bc | paste score.txt - | tr '\t' ' '

#!/usr/bin/env python3

import sys

def reverse(inp):
    for line in inp:
        print(line.rstrip('\n')[::-1])

def main():
    exit_code = 0
    if len(sys.argv) == 1:
        reverse(sys.stdin)
    else:
        for filename in sys.argv[1:]:
            try:
                with open(filename, "r") as inp:
                    reverse(inp)
            except FileNotFoundError as e:
                print("{}: No such file".format(filename), file=sys.stderr)
                exit_code = 1
    sys.exit(exit_code)

if __name__ == '__main__':
    main()