- Running example
- Standard input and outputs
- Filters
- Pipes (data streaming composition)
- Writing your own filters
- Standard error output
- Under the hood (about file descriptors)
- Advanced I/O redirection
- Program return (exit) code
- Shell customization
- More examples
- Before-class tasks (deadline: start of your lab, week March 6 - March 10)
- Post-class tasks (deadline: March 26)
- Learning outcomes
- This page changelog
The goal of this lab is to define and thoroughly understand the concepts
of standard input, standard output, and standard error output.
This will allow us to understand program I/O redirection and the composition
of different programs via pipes.
We will also customize our shell
environment a little by investigating command aliases and the .bashrc
file.
Running example
We will build this lab around a single example that we will incrementally develop, so that you learn the basic concepts on a practical example (obviously, there are specific tools that could be used instead, but we hope that this is better than a completely artificial example).
Data for our example can be downloaded (i.e., git cloned) from this repository, where they reside in the 04/ subdirectory.
They simulate simplified logs from a web server, where the web server records which files (URLs) were accessed at which time.
Practically, each file represents traffic for one day in a simplified CSV format.
Fields are separated by a comma, there is no header, and for each record we remember the date, the client’s IP address, the URL that was requested, and the amount of transferred bytes.
Our task is to write a program that prints a brief summary of the data:
- Print the 3 most accessed URLs.
- Print the 3 days with the highest volume of traffic (i.e., the sum of transferred bytes).
- Print the total amount of data transferred.
Before we build the solution we need to lay some groundwork.
Standard input and outputs
We will start the lab with a few definitions of concepts that you probably already know (but maybe not under exactly these names).
Standard output
Standard output (often shortened to stdout) is the default output that you can
use by calling print("Hello") in Python, for example.
Stdout is used by the basic output routines in almost every programming language.
Generally, this output has the same API as if you were writing to a file,
be it print in Python, System.out.print in Java, or printf in C
(where the limitations of the language necessitate the existence of the pair
printf and fprintf).
This output is usually prepared by the language runtime together with the shell and the operating system (the technical details are not that important for this course anyway). Practically, the standard output is printed to the terminal or its equivalent (and when the application is launched graphically, stdout is typically lost).
Note that in Python you can access it explicitly via sys.stdout, which acts as an opened file handle (i.e., the result of open()).
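As a quick check (a tiny demonstration of ours, not part of the lab materials), the following two commands print exactly the same thing:
python3 -c 'print("Hello")'
python3 -c 'import sys; sys.stdout.write("Hello\n")'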
Standard input
Similarly to stdout, almost all languages have access to stdin that represents the default input. By default, this input comes from the keyboard, although usually through the terminal (i.e., stdin is not used in graphical applications for reading keyboard input).
Note that the function input()
that you may have used in your Python
programs is an upgrade on top of stdin because it offers basic editing
functions.
Plain standard input does not support any form of editing
(though typically you could use backspace to erase characters at the
end of the line).
If you want to access the standard input in Python, you need to use sys.stdin
explicitly.
As one could expect, it uses a file API; hence it is possible to read a line
by calling .readline() on it, or to iterate through all lines.
In fact, the iteration of the following form is a quite common pattern for many Linux utilities (they are usually written in C but the pattern remains the same).
for line in sys.stdin:
...
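To see the pattern in action, here is a tiny filter of our own (purely an illustration) that echoes every input line uppercased; you can run it straight from the shell and feed it lines from the keyboard:
python3 -c '
import sys

# read stdin line by line, print each line uppercased
for line in sys.stdin:
    print(line.rstrip("\n").upper())
'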
Many of the utilities actually read from stdin by default.
For example, cut -d : -f 1 prints only the first column of data of each
line (and expects the columns to be delimited by :).
Run it and type the following on the keyboard, terminating each line with <Enter>.
cut -d : -f 1
one:two
alpha:bravo
uno:dos
You should see the first column echoed underneath your input.
What to do when you are done? Typing exit
will not help here but <Ctrl>-D
works.
Standard I/O redirection
As a technical detail, we mentioned earlier that the standard input and output are prepared (partially) by the operating system. This also means that it can be changed (i.e., initialized differently) without changing the program. And the program may not even “know” about it.
This is called redirection and it allows the user to specify that the standard output would not go to the screen (terminal), but rather to a file. From the point of view of the program, the API is still the same.
This redirection has to be done before the program is started and it has to be done by the caller. For us, it means we have to do it in the shell.
It is very simple: at the end of the command we can specify > output.txt and everything that would normally be printed on the screen goes to output.txt instead.
Before you start experimenting: the output redirection is a low-level operation and has no form of undo. Therefore, if the file you redirect to already exists, it will be overwritten without questions. And without any easy option to restore the original file content (and for small files, the restoration is technically impossible for most file systems used in Linux).
As a precaution, get into the habit of hitting <Tab> after you type the filename.
If the file does not exist, the cursor will not move.
If the file already exists, the tab completion routine will insert a space.
As the simplest example, the following two commands will create files one.txt and two.txt with the words ONE and TWO inside (including the newline character at the end).
echo ONE > one.txt
echo TWO >two.txt
Note that the shell is quite flexible in the use of spaces and both options are valid
(i.e., one.txt does not have a space as the first character of the filename).
From an implementation point of view, echo received a single argument; the part
with > filename is not passed to the program at all
(i.e., do not expect to find > filename in your sys.argv).
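You can convince yourself with a one-liner (our own illustration; the file name out.txt is arbitrary):
python3 -c 'import sys; print(sys.argv)' hello >out.txt
cat out.txt    # shows ['-c', 'hello'] -- no trace of the redirection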
If you recall Lab 02, we mentioned that
the program cat
is used to concatenate files.
With the knowledge of output redirection, it suddenly starts to make more sense as
the (merged) output can be easily stored in a file.
cat one.txt two.txt >merged.txt
Appending in output redirection
The shell also offers an option to append the output to an existing file
using the >>
operator.
Thus, the following command would add UNO as another line into one.txt.
echo UNO >>one.txt
If the file does not exist, it will be created.
For the following example, we will need the program tac that reverses the order of
individual lines but otherwise works like cat
(note that tac is cat backwards, what a cool name). Try this first.
tac one.txt two.txt
If you have executed the commands above, you should see the following:
UNO
ONE
TWO
Try the following and explain what happens (and why) if you execute
tac one.txt two.txt >two.txt
Answer.
Input redirection
Similarly, the shell offers <
for redirecting stdin.
Then, instead of reading input typed by the user on the keyboard, the program
reads the input from a file.
Note that programs using Pythonic input()
do not work that well with
redirected input.
Practically, input()
is suitable for interactive programs only.
You might want to use sys.stdin.readline()
or for line in sys.stdin
instead.
When input is redirected, we do not need to issue <Ctrl>-D
to close
the input as the input is closed automatically when reaching the end of the file.
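For instance, the earlier cut experiment can read from a file instead of the keyboard (a small demo of ours; the file name columns.txt is arbitrary):
printf 'one:two\nalpha:bravo\n' >columns.txt
cut -d : -f 1 <columns.txt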
Standard input and output: check you understand the basics
Filters
Many utilities in Linux work as so-called filters. They accept the input from stdin and print their output to stdout.
One such example is cut
that can be used to print only certain columns
from the input.
For example, running it as cut -d : -f 1
with /etc/passwd
as its input
will display a list of accounts (usernames) on the current machine.
Try to explain the difference between the following two calls:
cut -d : -f 1 </etc/passwd
cut -d : -f 1 /etc/passwd
The above behavior is quite common for most filters: you can specify the input file explicitly, but when it is missing, the program reads from the stdin.
To return to the question above: the difference is that in the first case
(with input redirection), the input file is opened by the shell and the
opened file is passed to cut.
Problems in opening the file are reported by the shell, and cut might not be
launched at all.
In the second case, the file is opened by cut itself (i.e., cut executes the
open() call and also needs to handle errors).
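You can observe the difference with a file that does not exist; watch who reports the error:
cut -d : -f 1 </nonexistent    # the error comes from the shell, cut is never started
cut -d : -f 1 /nonexistent     # the error message comes from cut itself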
Advancing the running example
Armed with this knowledge, we can actually solve the first part of our running example. Recall that we have files that logged traffic each day and we want to find URLs that are most common in all the files together.
That means we need to join all files together, keep only the URL and find the three most frequent lines.
And we can do that. Recall that cat can be used to concatenate files and
cut can be used to keep only certain columns. We will deal with finding the most
frequent URL in a while.
So, how about this?
#!/bin/bash
cat logs/20[0-9][0-9]-[01][0-9]-[0-3][0-9].csv >_logs_merged.csv
cut -d , -f 5 <_logs_merged.csv
We have used a quite explicit wildcard to ensure we do not print some
random CSVs even though cat logs/*.csv
could work as well.
Consider how much time this would take to write in Python.
The script has one big flaw (we will solve it soon but it needs to be mentioned anyway).
The script writes to a file called _logs_merged.csv
. We have prefixed
the filename with underscore to mark it as somewhat special but still:
what if the user created such file manually?
We would overwrite that file, no questions asked, and with no option to recover it.
Never do that in your scripts.
Pipes (data streaming composition)
We finally move to the area where Linux excels: program composition. In essence, the whole idea behind Unix-family of operating systems is to allow easy composition of various small programs together.
Mostly, the programs that are composed together are filters and they operate on text inputs. These programs do not make any assumptions on the text format and are very generic. Special tools (that are nevertheless part of Linux software repositories) are needed if the input is more structured, such as XML or JSON.
The advantage is that composing the programs is very easy and it is very easy to compose them incrementally too (i.e., add another filter only when the output from the previous ones looks reasonable). This kind of incremental composition is more difficult in normal languages where printing data requires extra commands (here it is printed to the stdout without any extra work).
The disadvantage is that complex compositions can become difficult to read. It is up to the developer to decide when it is time to switch to a better language and process the data there. A typical division of labour is that shell scripts are used to preprocess the data: they are best when you need to combine data from multiple files (such as hundreds of various reports, etc.) or when the data needs to be converted to a reasonable format (e.g. non-structured logs from your web server into a CSV loadable into your favorite spreadsheet software or R). Computing statistics and similar tasks are best left to specialized tools.
Needless to add, Linux offers plenty of tools for statistical computations and plot drawing that can be controlled from the CLI. Mastering these tools is, unfortunately, out of scope for this course.
Let us return to the running example again.
We already mentioned that the temporary file we used is bad because we might have overwritten someone else's data.
But it also requires disk space for another copy of the (possibly huge) data.
A bit more subtle but much more dangerous problem is that the path to the
temporary file is fixed.
Imagine what happens if you execute the script in two terminals concurrently.
Do not be fooled by the feeling that the script is so short that the probability of
concurrent execution is negligible.
It is a trap waiting to spring.
We will talk about proper use of mktemp(1)
later, but in this example no temporary
file is needed at all.
We learned about program composition, right? And we can use it here.
cat logs/20[0-9][0-9]-[01][0-9]-[0-3][0-9].csv | cut -d , -f 5
The |
symbol stands for a pipe, which connects the standard output of cat
to the standard input of cut
. The pipe passes data between the two processes
without writing them to the disk at all. (Technically, the data are passed using
memory buffers, but that is a technical detail.)
The result is the same, but we escaped the pitfalls of using temporary files and the result is actually even more readable.
In essence, the family of Unix systems is built on top of the ability to create pipelines, which chain sequences of programs using pipes. Each program in the pipeline denotes a type of transformation. These transformations are composed together to produce the final result.
Advancing the running example a bit more
We wanted to print the three most visited URLs first.
Using the pipe above we can print all the URLs in a single list.
To find the most often visited ones, we will use a typical trick: we
first sort the lines alphabetically and then use the program uniq with -c
to count unique lines (in effect counting how many times each URL was visited).
We then sort this output numerically and print the last 3 lines (i.e., the 3 most visited URLs).
Hence our program will evolve like this (lines starting with #
are obviously
comments).
# Get all URLs
cat logs/20[0-9][0-9]-[01][0-9]-[0-3][0-9].csv | cut -d , -f 5
# We will make the wildcard shorter to save space
cat logs/*.csv | cut -d , -f 5
# Sort URLs, have same URLs on adjoining lines
cat logs/*.csv | cut -d , -f 5 | sort
# Count number of occurrences (uniq does not sort the file)
cat logs/*.csv | cut -d , -f 5 | sort | uniq -c
# Sort output of uniq numerically
cat logs/*.csv | cut -d , -f 5 | sort | uniq -c | sort -n
# Print the last 3 lines only
cat logs/*.csv | cut -d , -f 5 | sort | uniq -c | sort -n | tail -n 3
Do not be scared. We advanced in small steps on each line. Run the individual commands yourself and watch how the output is transformed.
Exercise
Print the total amount of transferred bytes using the logs from our running example (i.e., the last part of the task).
Hint: you will need cat, cut, paste and bc.
The first part should be easy: we are interested only in the last column.
cat logs/*.csv | cut -d , -f 4
To sum the lines of numbers we will use paste, which is able to merge lines
from multiple files or join lines of a single file.
We will give it + as the separator to create one huge expression
SIZE1+SIZE2+SIZE3+...
cat logs/*.csv | cut -d , -f 4 | paste -s -d +
Finally, we will use bc
to sum the lines.
cat logs/*.csv | cut -d , -f 4 | paste -s -d + | bc
bc alone is a quite powerful calculator that can be used interactively,
too (recall that <Ctrl>-D will terminate the input in interactive mode).
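For a quick taste (a toy example of ours), you can also feed bc a whole expression through a pipe:
echo '(100 + 250) * 4' | bc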
More examples are provided at the end of this lab.
Quick check of filters
Writing your own filters
Let us finish another part of the running example. We want to compute traffic for each day and print days with the most traffic.
Knowing how we composed things so far, we lack only the middle part of the pipeline: summing the sizes for each day.
There is no ready-made solution for this (advanced users might consider
installing termsql
) but we will
create our own in Python and plug it into our pipeline.
We will try to make it simple yet versatile enough.
Recall that we want to group the traffic by dates; hence our program should be able to do the following transformation.
# Input
day1 1
day1 2
day2 4
day1 3
day2 1
# Output
day1 6
day2 5
Here is our version of the program. Notice that we have (for now) ignored error handling, but we allowed the program to be used as a filter in the middle of a pipeline (i.e., it reads from stdin when no arguments are provided) while staying easily usable with multiple files.
In your own filters, you should also follow this approach: the amount of source code you need to write is negligible, but it gives the user flexibility in use.
#!/usr/bin/env python3

import sys

def sum_file(inp, results):
    for line in inp:
        (key, number) = line.split(maxsplit=1)
        results[key] = results.get(key, 0) + int(number)

def main():
    sums = {}
    if len(sys.argv) == 1:
        sum_file(sys.stdin, sums)
    else:
        for filename in sys.argv[1:]:
            with open(filename, "r") as inp:
                sum_file(inp, sums)
    for key, sum in sums.items():
        print(f"{key} {sum}")

if __name__ == "__main__":
    main()
With such a program in place, we can extend our web statistics script in the following manner.
cat logs/*.csv | cut -d , -f 1,4 | tr ',' ' ' | ./group_sum.py
Use man
to find out what tr
does.
On your own, extend the solution to print only the top 3 days
(sort can also order lines by a specific column instead of by the whole line).
Answer.
Standard error output
While it often makes sense to redirect the output, you usually still want to see error messages on the screen.
Imagine files one.txt
and two.txt
exist while nonexistent.txt
is
not in the directory.
We will now execute the following command.
No, do not imagine it. Create the files one.txt
and two.txt
to contain words ONE
and
TWO
yourself on the command line.
Hint.
Answer.
cat one.txt nonexistent.txt two.txt >merged.txt
Obviously, cat
prints an error message when the file does not exist.
However, if the error message were printed to stdout, it would be redirected
to merged.txt
together with the actual output. This would not be practical.
Therefore, every Linux program also has a standard error output
(often just stderr) that also goes to the screen but is logically
different from stdout and is not subject to >
redirection.
In Python, it is available as sys.stderr
and it is (as sys.stdout
)
an opened file.
We can extend our implementation to handle I/O errors like this:
try:
    with open(filename, "r") as inp:
        sum_file(inp, sums)
except IOError as e:
    print(f"Error reading file {filename}: {e}", file=sys.stderr)
Under the hood (about file descriptors)
The following text provides an overview of file descriptors, the abstractions used by the OS and applications when working with opened files. Understanding this concept is not essential for this course, but it is a general principle that (to some extent) is present in most operating systems and applications (or programming languages).
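As a quick peek under the hood (a Linux-specific illustration of ours), you can list the file descriptors of a running process through the /proc filesystem:
ls -l /proc/self/fd
# 0, 1 and 2 are stdin, stdout and stderr;
# the extra descriptor belongs to ls itself (it is reading the fd directory)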
Advanced I/O redirection
Ensure you have the group_sum.py
script available.
Prepare files one.txt
and two.txt
:
echo ONE 1 > one.txt
echo ONE 1 > two.txt
echo TWO 2 >> two.txt
Now execute the following commands.
./group_sum.py <one.txt
./group_sum.py one.txt
./group_sum.py one.txt two.txt
./group_sum.py one.txt <two.txt
Has it behaved as you expected?
Trace which paths (i.e. through which lines) the program has taken with the above invocations.
Redirecting standard error output
To redirect the standard error output, you can use > again, but this time preceded
by the number 2 (which denotes the stderr file descriptor).
Hence, our cat example can be transformed into the following form, where err.txt
will contain the error message and nothing will be printed on the screen.
cat one.txt nonexistent.txt two.txt >merged.txt 2>err.txt
Redirecting into and inside a script
Consider the following mini-script (first-column.sh
) that extracts and
sorts the first column (for colon-delimited data such as in /etc/passwd
).
#!/bin/bash
cut -d : -f 1 | sort
Then the user can use the script like this, and cut's standard input will
be properly wired to the shell's standard input or to the pipe.
cat /etc/passwd | ./first-column.sh
./first-column.sh </etc/passwd
head /etc/passwd | ./first-column.sh | tail -n 3
The above example is somewhat artificial, but it demonstrates the important principle that stdin is naturally available even inside scripts when redirected from the “outside”.
Generic redirection
Shell allows us to redirect outputs quite freely using file descriptor numbers before and after the greater-than sign.
For example, >&2 specifies that the standard output is redirected to the standard
error output.
That may sound weird, but consider the following mini-script, where wget is used
to fetch a file from a given URL.
echo "Downloading tarball for lab 02..." >&2
wget https://d3s.mff.cuni.cz/f/teaching/nswi177/202122/labs/nswi177-lab02.tar.gz 2>/dev/null
We actually want to hide the progress messages of wget
and print ours instead.
Take this as an illustration of the concept as wget
can be silenced via
command-line arguments (--quiet
) as well.
Sometimes, we want to redirect stdout and stderr into one single file.
In these situations, a simple >output.txt 2>output.txt would not work;
we have to use >output.txt 2>&1 or &>output.txt (which redirects
both at once).
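Continuing our cat example (assuming nonexistent.txt is still missing), the combined redirection looks like this:
cat one.txt nonexistent.txt two.txt >all.txt 2>&1
cat all.txt    # contains both the regular output and the error message
               # (their ordering may differ due to buffering)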
However, what about 2>&1 >output.txt
, can we use it as well?
Try it yourself!
Hint.
Notable special files
We already mentioned that virtually everything in Linux is a file.
Many special files representing devices reside in the /dev/ directory.
Some of them are very useful for output redirection.
Some of them are very useful for output redirection.
Run cat one.txt
and redirect the output to /dev/full
and then
to /dev/null
.
What happened?
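That is, try these two commands (one.txt is the file from earlier):
cat one.txt >/dev/full
cat one.txt >/dev/null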
Especially /dev/null
is a very useful file as it can be used in any
situation when we are not interested in the output of a program.
For many programs you can specify the use of stdin explicitly
by using -
(dash) as the input filename.
Another option is to use /dev/stdin
explicitly: with this name,
we can make the example with group_sum.py
work:
./group_sum.py /dev/stdin one.txt <two.txt
Then Python opens /dev/stdin as a regular file, and the operating system
(together with the shell) actually connects it with two.txt.
/dev/stdout
can be used if we want to specify standard output explicitly
(this is mostly useful for programs coming from other environments where
the emphasis is not on using stdout that much).
Program return (exit) code
So far, the programs we have used announced errors as messages. That is quite useful for interactive programs as the user wants to know what went wrong.
However, for non-interactive use, checking for error messages is actually very error-prone. Error messages change, the users can have their system localized etc. etc. Therefore, Linux offers a different way of checking whether a program terminated correctly or not.
Whether a program terminates successfully or with a failure is signalled by its so-called return (or exit) code. This code is an integer and, unlike in other programming languages, zero denotes success and any non-zero value denotes an error.
Why do you think that the authors decided that zero (that is traditionally reserved for false) means success and nonzero (traditionally converted to true) means failure? Hint: in how many ways can a program succeed?
Unless specified otherwise, when your program terminates normally
(i.e., main
reaches the end and no exception is raised), the exit code is
zero.
If you want to change this behavior, you need to pass the desired exit code
as a parameter to the exit function.
In Python, it is sys.exit.
As an example, the following is a modification of the group_sum.py
above,
this time with proper exit code handling.
def main():
    sums = {}
    exit_code = 0
    if len(sys.argv) == 1:
        sum_file(sys.stdin, sums)
    else:
        for filename in sys.argv[1:]:
            try:
                with open(filename, "r") as inp:
                    sum_file(inp, sums)
            except IOError as e:
                print(f"Error reading file {filename}: {e}", file=sys.stderr)
                exit_code = 1
    for key, sum in sums.items():
        print(f"{key} {sum}")
    sys.exit(exit_code)
We will later see that shell control flow (e.g., conditions and loops) is actually controlled by program exit codes.
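In an interactive shell, you can peek at the exit code of the last command via the special variable $? (a quick check, assuming you updated group_sum.py with the exit-code handling above; variables are covered in later labs):
./group_sum.py one.txt
echo $?    # 0 on success
./group_sum.py nonexistent.txt
echo $?    # 1, as set by sys.exit(exit_code)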
Failing fast
So far, we have assumed that our shell scripts never fail; we have not prepared them for any kind of failure.
We will eventually see how exit codes can be tested and used to control our shell scripts more, but for now we want to stop whenever any failure occurs.
That is actually quite sane behavior: you typically want the whole program to terminate if there is an unexpected failure (rather than continuing with inconsistent data). Like an uncaught exception in Python.
To enable terminate-on-failure, you need to call set -e
. In case of failure,
the shell will stop executing the script and exit with the same exit code as
the failed command.
Furthermore, you usually want to terminate the script when an uninitialized variable is
used: that is enabled by set -u
.
We will talk about variables later but -e
and -u
are usually set together.
And there is also a caveat regarding pipes and success of commands: the success of a
pipeline is determined by its last command.
Thus, sort /nonexistent | head
is
a successful command. To make a failure of any command fail the (whole) pipeline, you
need to run set -o pipefail
in your script (or shell) before the pipeline.
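You can observe the difference yourself (a small experiment; $? holds the exit code of the last command):
sort /nonexistent | head
echo $?    # 0: only the exit code of head counts
set -o pipefail
sort /nonexistent | head
echo $?    # non-zero: the failure of sort now propagates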
Therefore, typically, you want to start your script with the following trio:
set -o pipefail
set -e
set -u
Many commands allow short options (such as -l
or -h
you know from ls
)
to be merged like this (note that -o pipefail
has to be last):
set -ueo pipefail
Get into a habit where each of your scripts starts with this command.
Actually, from now on, the GitLab pipeline will check that this command is a part of your scripts.
Pitfalls of pipes (a.k.a. SIGPIPE)
Exit code: check you understand the basics
Shell customization
We already mentioned that you should customize your terminal emulator to make it comfortable to use. After all, you will spend at least this semester with it and it should be fun to use.
In this lab, we will show some other options how to make your shell more comfortable to use.
Command aliases
You probably noticed that you execute some commands with the same options
a lot.
One such example could be ls -l -h
that prints a detailed file listing, using
human-readable sizes.
Or perhaps ls -F
to append a slash to the directories.
And probably ls --color
, too.
The shell offers so-called aliases, with which you can easily add new commands without creating full-fledged scripts somewhere.
Try executing the following commands to see how a new command l
could be
defined.
alias l='ls -l -h'
l
We can even override the original command; the shell will ensure that the rewriting is not recursive.
alias ls='ls -F --color=auto'
Note that these two aliases together also ensure that l
will display
filenames in colors.
Note that there are no spaces around the equal sign.
Some typical aliases that you will probably want to try are the following
ones.
Use a manual page if you are unsure what the alias does.
Note that curl
is used to retrieve contents from a URL and wttr.in
is really
a URL.
By the way, try that command even if you do not plan to use this alias :-).
alias ls='ls -F --color=auto'
alias ll='ls -l'
alias l='ls -l -h'
alias cp='cp -i'
alias mv='mv -i'
alias rm='rm -i'
alias man='man -a'
alias weather='curl wttr.in'
~/.bashrc
The aliases above are nice, but you probably do not want to define them each time
you launch the shell.
However, most shells in Linux have some kind of file that they execute before
they enter interactive mode.
Typically, the file resides directly in your home directory and is named after
the shell, ending with rc (you can remember it as runtime configuration).
For Bash, which we are using now (if you are using a different shell, you
probably already know where to find its configuration files), that file is
called ~/.bashrc.
You have already used it when setting EDITOR
for Git, but you can also add
aliases there.
Depending on your distribution, you may already see some aliases or some
other commands there.
Add aliases you like there, save the file and launch a new terminal. Check that the aliases work.
The .bashrc file behaves as a shell script and you are not limited to
having only aliases there.
Virtually any command that you want to execute in every terminal you launch
can go there.
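A minimal fragment of ~/.bashrc might then look like this (a sketch; the particular editor and aliases are just examples):
# ~/.bashrc (fragment)
export EDITOR=nano             # editor used by Git and other tools
alias ls='ls -F --color=auto'
alias ll='ls -l'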
Changing your prompt ($PS1)
You can also modify what your prompt looks like. The default is usually reasonable, but some people prefer more information in there. If you are one of them, here are the details (take it as an overview, as prompt customization is a topic for a whole book).
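For a first taste (a minimal example; \u, \w and \$ are standard Bash prompt escapes), try:
PS1='\u@\w\$ '    # user name, working directory, then $ (or # for root)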
More examples
The following examples can be solved either by executing multiple commands or by piping basic shell commands together. To help you find the right program, you can use manual pages. You can also use our manual as a starting point.
Note that none of the solutions requires anything more than a few pipelines.
For advanced users: you definitely do not need if, while, or read, or even Perl or AWK.
Before-class tasks (deadline: start of your lab, week March 6 - March 10)
The following tasks must be solved and submitted before attending your lab. If your lab is on Wednesday at 10:40, the files must be pushed to your repository (project) at GitLab on Wednesday at 10:39 at the latest.
For the virtual lab, the deadline is Tuesday 9:00 AM every week (regardless of vacation days).
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most of the tasks there are automated tests that can help you check completeness of your solution (see here how to interpret their results).
04/line_count.sh (30 points, group shell)
Count the total number of lines of all text files (i.e., *.txt) in the current directory.
The script will output only a single number.
You can assume that there will always be at least one such file present.
04/users.sh (40 points, group admin)
Print the real names of users that have system anywhere in their record
(i.e., the word system appears anywhere on the line).
The list of users is stored in /etc/passwd or can be obtained via getent passwd.
Your script shall assume that the list of users comes on its standard input;
hence, test it as getent passwd | 04/users.sh.
04/fastest.sh (30 points, group shell)
Assume the following input format (durations are integers) containing program execution durations together with their authors.
name1,duration_in_seconds_1
name2,duration_in_seconds_2
Print the author of the fastest solution (you can safely assume that the durations are distinct).
Post-class tasks (deadline: March 26)
We expect you will solve the following tasks after attending the labs and hearing feedback on your before-class solutions.
All tasks (unless explicitly noted otherwise) must be submitted to your submission repository. For most of the tasks there are automated tests that can help you check completeness of your solution (see here how to interpret their results).
04/row_sum.sh (50 points, group shell)
Assume that you have a matrix written in a “fancy” notation. You can rely on the format being fixed (with regard to spacing, 3 digits maximum, position of the pipe symbol, etc.), but the number of columns or rows can differ.
Write a script that prints the sum of each row.
For the following matrix we expect this output:
| 106 179 |
| 188  50 |
|   5 125 |
285
238
130
The script will read its input from stdin; there is no limit on the number of columns or rows, but you can rely on the fixed format as explained above.
04/day_of_week.py (50 points, group devel)
Write a Python filter that converts dates to days of the week.
The program will convert dates in the first column only (using whitespace for splitting); invalid dates will be ignored (and the line kept as-is). The rest of the line will be copied to the output.
For this input:
2023-02-20 Rest of the line
Some other line
2023-02-21 Line contents
we expect this output:
Monday Rest of the line
Some other line
Tuesday Line contents
The program must be launchable as:
04/day_of_week.py <input.txt
04/day_of_week.py input.txt
cat one.txt two.txt | 04/day_of_week.py
If the file cannot be opened, the program will print an error message to stderr (the exact wording is defined by the tests) and terminate with exit code 1.
You can expect that the program will not be invoked as 04/day_of_week.py one.txt two.txt.
We expect you will use functions from the datetime module.
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …
- explain what standard input and output are
- explain why standard input or output redirection is not (directly) observable from within the program
- explain why there are two output streams: stdout and stderr
- explain how the execution of cat foo.txt and cat <foo.txt differs
- explain how standard inputs/outputs of several programs can be chained together
- explain what a program exit code is
- explain differences and typical uses of the main five interfaces of a command-line program: command-line arguments, stdin, stdout, stderr, and exit code
- optional: explain what a file descriptor is (from the perspective of a userland developer)
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …
- redirect standard input and standard (error) output of a program in the shell
- set the exit code of a Python script
- use the special file /dev/null
- use standard input and output in Python
- use the pipe | to chain multiple programs together
- use basic text filtering tools: cut, sort, …
- use grep -F to filter lines matching a provided pattern
- optional: customize the shell with aliases
- optional: store custom shell configuration in .bashrc (or .profile) scripts
- optional: customize the prompt with the PS1 variable
This page changelog
- 2023-02-25: Move task 04/users.sh to the admin group.
- 2023-03-03: Emphasize how stdin can be redirected into a script.