A script in the Linux environment is any program that is interpreted when run (i.e., the program is distributed as source code). In this sense, there are shell scripts (the language being the shell as you saw it last time) as well as Python, Ruby, or PHP scripts.
The advantage of so-called scripting languages is that they require only a text editor for development and that they are easily portable. The disadvantage is that you need to install the interpreter first. Fortunately, Linux typically comes with many interpreters preinstalled, so starting with a scripting language is very easy.
We will build this lab around a shell script that we will incrementally develop, so that you learn the basic concepts on a practical example (obviously, there are specific tools that could be used instead, but we hope that this is better than a completely artificial example).
Preflight checklist
- You have selected a nice terminal emulator for yourself, both on the school machine and in your private Linux installation.
- You have selected a nice TUI text editor that you know how to control. Ensure it is available both in the lab and on your machine, and that it is configured correctly.
- You can use `mc` or `ranger` for basic file operations.
- You can use `cd`, `pwd`, `ls`, `cat` (and `hexdump`) to navigate through the file system and inspect files.
Running example
Data for our example can be downloaded from this project (they are inside the `03` directory).
Feel free to grab the whole repository as a tarball/ZIP file (links are under the blue Code button) and unpack it in `mc`.
They simulate simplified logs from a web server, where the web server records which files (URLs) were accessed at which time.
Practically, each file represents traffic for one day in a simplified CSV format.
Fields are separated by a comma, there is no header, and for each record we remember the date, the client’s IP address, the URL that was requested, and the amount of transferred bytes.
Our task is to write a program that prints a brief summary of the data:
- Print the 3 most accessed URLs.
- Print the total amount of data transferred.
- Print the 3 days with the highest volume of traffic (i.e., the sum of transferred bytes).
But before we build the solution we need to lay some groundwork. And because there will be a lot of that, we will finish the third subtask during the next lab.
Shell scripts
To write a shell script, we simply write the commands into a file (instead of typing them in a terminal).
Therefore, a simple script that prints some information about your system could be as simple as the following.
cat /proc/cpuinfo
cat /proc/meminfo
If you store this into a file `first.sh`, then you can execute it with the following command.
bash first.sh
Notice that we have executed `bash`, as that is the shell program (interpreter) that we are using, followed by the name of the input file.
It will `cat` those two files (note that we could have executed a single `cat` with two arguments as well).
Recall that your `01/dayname.py` script can be executed with the following command (again, we run the right interpreter).
python3 dayname.py
Shebang and executable bit
Running scripts by specifying the interpreter to use (i.e., the command to run the script file with) is not very elegant. There is an easier way: we mark the file as executable and Linux handles the rest.
Actually, when we execute the `cat` command or `mc`, there is a file (usually in the `/bin` or `/usr/bin` directory) that is named `cat` or `mc` and that is marked as executable.
(For now, imagine the special executable mark as a special file attribute.)
Notice that there is no file extension.
However, marking the file as executable is only the first half of the solution.
Imagine that we create the following content and store it into a file `hello.py` marked as executable.
print("Hello")
And then we want to run it.
But wait! How will the system know which interpreter to use? For binary executables (e.g., originally from C sources), it is easy, as the binary is (almost) directly machine code. But here we need an interpreter first.
In Linux, the interpreter is specified via so-called shebang or hashbang.
As a matter of fact, you have already encountered it several times:
When the first line of the script starts with `#!` (hence the name: hash and bang), Linux expects a path to the interpreter after it, and it will run this interpreter and ask it to execute the script.
For shell scripts, we will be using `#!/bin/bash`; for Python, we need to use `#!/usr/bin/env python3`.
We will explain the `env` part later on; for now, please just remember to use this version.
Always use `#!/usr/bin/env python3` for your Python scripts. `#!/usr/bin/env python` or `#!/usr/bin/python3` are wrong and can cause various surprises.
Note that most scripting languages use `#` to denote a comment, which means that no extra handling is needed to skip the first line (as the shebang really is not needed by the interpreter itself).
You may also encounter `#!/bin/sh` for shell scripts.
For most scripts it actually does not matter: simple constructs work the same, but `/bin/bash` offers some nice extensions.
We will be using `/bin/bash` in this course, as the extensions are rather useful.
You may need to use `/bin/sh` if you are working on older systems or if you need your script to be portable to different flavours of Unix systems.
To complicate things a bit more, on some systems `/bin/sh` is the same file as `/bin/bash` (which is really a superset of `sh`).
The bottom line is: unless you know what you are doing, stick with the `#!/bin/bash` shebang for now.
Now back to the original question: how is the script executed?
The system takes the command from the shebang, appends the actual filename of the script as a parameter, and runs that.
When the user specifies more arguments (such as `--version`), they are appended as well.
For example, if `hexdump` were actually a shell script, it would start with the following:
#!/bin/bash
...
code-to-loop-over-bytes-and-print-them-goes-here
...
Executing `hexdump -C file.gif` would then actually execute the following command:
/bin/bash hexdump -C file.gif
Notice that the only magic thing behind shebang and executable files is that the system assembles a longer command line.
The user does not need to care about the implementation language.
Let us try it practically.
We know about the shebang, so we will update our example and also mark the file as an executable one.
Store the following into `first.sh`.
#!/bin/bash
cat /proc/cpuinfo
cat /proc/meminfo
To mark it as executable, we run the following command. For now, please remember it as magic that must be done; more details on why it looks like this will come later.
chmod +x first.sh
`chmod` will not work on file systems that are not Unix/Linux-friendly.
That unfortunately includes even NTFS.
Now we can easily execute the script with the following command:
./first.sh
The obvious question is: why is the redundant `./` needed instead of just calling `first.sh`?
After all, `./` refers to the current directory (recall the previous lab), so it refers to the same file!
When you type a command (e.g., `cat`) without any path (i.e., only a bare filename of the program), the shell looks into the so-called `$PATH` to actually find the file with the program (usually, `$PATH` contains the directory `/usr/bin`, where most of the executable binaries are stored).
Unlike in some other operating systems, the shell does not look into the working directory when the program cannot be found in `$PATH`.
To run a program in the current directory, we need to specify its path (when any explicit path is provided, the shell ignores `$PATH` and simply looks for the file).
Luckily, it does not have to be an absolute path; a relative one is sufficient. Hence the magic spell of `./`.
If you move to another directory, you can execute the script by providing a relative path too, such as `../first.sh`.
Run `ls` in the directory now.
You should see `first.sh` printed in green.
If not, try `ls --color`, or check that you have run `chmod` correctly.
If you do not have a colorful terminal (unusual, but still possible), you can use `ls -F` to distinguish file types: directories will have a slash appended, and executable files will have an asterisk next to their filename.
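For illustration, in a directory containing our script and a `logs` subdirectory (a made-up layout), the output of `ls -F` might look like this:

$ ls -F
first.sh*  logs/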
Mini-exercise
Changing working directory
Let us modify our first script a little bit.
cd /proc
cat cpuinfo
cat meminfo
Run the script again.
Despite the fact that the script changed its directory to `/proc`, when it terminates, we are still in the original directory.
Try inserting `pwd` into the script to verify that the script really is inside `/proc`.
This also means that `cd` cannot be a normal binary: if it were a normal program (e.g., in Python), any directory change inside it would be lost after its termination.
Hence, `cd` is a so-called builtin that is implemented inside the shell itself.
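A quick way to convince yourself (a minimal sketch; the directory shown is just a made-up example):

$ pwd
/home/intro/labs
$ ./first.sh
...output of the script...
$ pwd
/home/intro/labs

The script changes to `/proc` internally, yet both `pwd` invocations in your shell print the same directory.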
Debugging the scripts
If you want to see what is happening, run the script as `bash -x first.sh`.
Try it now.
For longer scripts, it is better to print your own messages, as `-x` tends to become too verbose.
To print a message to the terminal, you can use the `echo` command.
With a few exceptions (more about these later), all arguments are simply echoed to the terminal.
Create a script `echos.sh` with the following content and explain the differences:
#!/bin/bash
echo alpha bravo charlie
echo alpha    bravo        charlie
echo "alpha bravo" charlie
Advancing our running example
We will now start working on our running example to prepare it.
For starters, create a version that simply echoes the list of files we will work with.
Assume that the program will read files from the `logs` subdirectory.
Do not forget to make your script executable and add the right shebang.
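A minimal sketch of what such a script might look like (assuming the log files use the `.csv` extension; we will refine the wildcard later in this lab):

#!/bin/bash

echo "Will look into the following files:" logs/*.csv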
Command-line arguments
Command-line arguments (such as `-l` for `ls` or `-C` for `hexdump`) are the usual way to control the behaviour of CLI tools in Linux.
For us, as developers, it is important to learn how to work with them inside
our programs.
We will talk about using these arguments in shell scripts later on; today we will handle them in Python.
Accessing these arguments in Python is very easy.
We need to add `import sys` to our program, and then we can access the arguments in the `sys.argv` list.
Therefore, the following program prints its arguments.
#!/usr/bin/env python3

import sys

def main():
    for arg in sys.argv:
        print("'{}'".format(arg))

if __name__ == '__main__':
    main()
When we execute it (of course, we first `chmod +x` it), we will see the following (lines prefixed with `$` denote the commands; the rest is their output).
$ ./args.py
'./args.py'
$ ./args.py one two
'./args.py'
'one'
'two'
$ ./args.py "one two"
'./args.py'
'one two'
Note that the zeroth index is occupied by the command itself (we will not use it now, but it can be used for some clever tricks), and notice how the second and third commands differ when seen from inside Python.
It should not be surprising, though: recall the previous lab and the handling of filenames with spaces in them.
Run the above command and give it a wildcard as a parameter.
Assuming you already have some shell scripts with the `.sh` extension, look at the behavior of the following invocations.
./args.py *.py
./args.py *.sh
./args.py *.shhhhhh
Recall the previous lab if you are unsure what has happened.
Standard input and outputs
You probably know the following concepts already but maybe not under exactly these names, hence we will try to refresh your knowledge about them.
Standard output
Standard output (often shortened to stdout) is the default output that you use, for example, by calling `print("Hello")` in Python.
Stdout is used by the basic output routines of almost every programming language.
Quick check: how do you print to stdout in shell?
Generally, this output has the same API as if you were writing to a file.
Be it `print` in Python, `System.out.print` in Java, or `printf` in C (where the limitations of the language necessitate the existence of the pair `printf` and `fprintf`).
This output is usually prepared by the language runtime together with the shell and the operating system (the technical details are not that important for this course anyway). Practically, the standard output is printed to the terminal or its equivalent (and when the application is launched graphically, stdout is typically lost).
Note that in Python you can access stdout explicitly via `sys.stdout`, which acts as an opened file handle (i.e., like the result of `open`).
Standard input
Similarly to stdout, almost all languages have access to stdin that represents the default input. By default, this input comes from the keyboard, although usually through the terminal (i.e., stdin is not used in graphical applications for reading keyboard input).
Note that the function `input()` that you may have used in your Python programs is an upgrade on top of stdin, as it offers basic editing functions.
Plain standard input does not support any form of editing (though typically you can use backspace to erase characters at the end of the line).
If you want to access the standard input in Python, you need to use `sys.stdin` explicitly.
As one would expect, it uses the file API; hence it is possible to read a line from it by calling `.readline()` on it, or to iterate through all lines.
In fact, iteration of the following form is quite a common pattern in many Linux utilities (they are usually written in C, but the pattern remains the same).
import sys

for line in sys.stdin:
    ...
Many of the utilities actually read from stdin by default.
For example, `cut -d : -f 1` prints only the first column of each input line (and expects the columns to be delimited by `:`).
Run it and type the following on the keyboard, terminating each line with `<Enter>`.
cut -d : -f 1
one:two
alpha:bravo
uno:dos
You should see the first column echoed underneath your input.
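The session might look like this, with your typed lines interleaved with `cut`'s output (illustrative; the program answers after each `<Enter>`):

one:two
one
alpha:bravo
alpha
uno:dos
uno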
What should you do when you are done? Typing `exit` will not help here, but `<Ctrl>-D` works.
`<Ctrl>-D` on an empty line closes the standard input. The program `cut` will realize that there is no more input to process and will gracefully terminate.
Note that this is something else than `<Ctrl>-C`, which forcefully kills the running process.
From the user's perspective, the two look similar in the context of the utility `cut`, but the behavior is totally different, with an important semantic difference (observable when using other tools).
Standard I/O redirection
As a technical detail, we mentioned earlier that the standard input and output are prepared (partially) by the operating system. This also means that it can be changed (i.e., initialized differently) without changing the program. And the program may not even “know” about it.
This is called redirection and it allows the user to specify that the standard output would not go to the screen (terminal), but rather to a file. From the point of view of the program, the API is still the same.
This redirection has to be done before the program is started and it has to be done by the caller. For us, it means we have to do it in the shell.
It is very simple: at the end of the command, we can specify `> output.txt`, and everything that would normally be printed on the screen goes to `output.txt` instead.
Before you start experimenting: the output redirection is a low-level operation and has no form of undo. Therefore, if the file you redirect to already exists, it will be overwritten without questions. And without any easy option to restore the original file content (and for small files, the restoration is technically impossible for most file systems used in Linux).
As a precaution, get into the habit of hitting `<Tab>` after you specify the filename.
If the file does not exist, the cursor will not move.
If the file already exists, the tab-completion routine will insert a space.
As the simplest example, the following two commands will create files `one.txt` and `two.txt` with the words `ONE` and `TWO` inside (including the newline character at the end).
echo ONE > one.txt
echo TWO >two.txt
Note that the shell is quite flexible about spaces, and both forms are valid (i.e., `one.txt` does not have a space as the first character of the filename).
From an implementation point of view, `echo` received a single argument; the `> filename` part is not passed to the program at all (i.e., do not expect to find `> filename` in your `sys.argv`).
Note that when you start programs from your own code via `popen` or a similar call, such calls also offer the option to specify which file to use for stdout if you want to do a redirection in your program (but only for the newly launched program, not inside a running one).
If you recall Lab 02, we mentioned that the program `cat` is used to concatenate files.
With the knowledge of output redirection, it suddenly starts to make more sense, as the (merged) output can be easily stored in a file.
cat one.txt two.txt >merged.txt
Appending in output redirection
The shell also offers an option to append the output to an existing file, using the `>>` operator.
Thus, the following command adds `UNO` as another line to `one.txt`.
echo UNO >>one.txt
If the file does not exist, it will be created.
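You can check the result with `cat`:

$ cat one.txt
ONE
UNO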
For the following example, we will need the program `tac`, which reverses the order of individual lines but otherwise works like `cat` (note that `tac` is `cat` spelled backwards, what a cool name). Try this first.
tac one.txt two.txt
If you have executed the commands above, you should see the following:
UNO
ONE
TWO
Try the following and explain what happens (and why) if you execute
tac one.txt two.txt >two.txt
Input redirection
Similarly, the shell offers `<` for redirecting stdin.
Then, instead of reading input typed by the user on the keyboard, the program
reads the input from a file.
Note that programs using the Pythonic `input()` do not work that well with redirected input.
Practically, `input()` is suitable for interactive programs only; you might want to use `sys.stdin.readline()` or `for line in sys.stdin` instead.
When input is redirected, we do not need to issue `<Ctrl>-D` to close the input, as it is closed automatically upon reaching the end of the file.
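For example, the interactive `cut` invocation from above can read its input from a file instead of the keyboard (assuming you stored the `one:two`-style lines into a file named `input.txt`, a name made up for this illustration):

cut -d : -f 1 <input.txt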
Filters
Many utilities in Linux work as so-called filters. They accept the input from stdin and print their output to stdout.
One such example is `cut`, which can be used to print only certain columns of the input.
For example, running `cut -d : -f 1` with `/etc/passwd` as its input will display the list of accounts (usernames) on the current machine.
Try to run the following two commands (and notice the difference).
cut -d : -f 1 </etc/passwd
cut -d : -f 1 /etc/passwd
The above behavior is quite common for most filters: you can specify the input file explicitly, but when it is missing, the program reads from the stdin.
What is the difference between the two invocations above? They will print the same result, after all.
In the first case (with input redirection), the input file is opened by the shell, and the opened file is passed to `cut`.
Problems with opening the file are reported by the shell, and `cut` might not be launched at all.
In the second case, the file is opened by `cut` itself (i.e., `cut` executes the `open()` call and also needs to handle any errors).
Advancing the running example
Armed with this knowledge, we can actually solve the first part of our running example. Recall that we have files that logged traffic each day and we want to find URLs that are most common in all the files together.
That means we need to join all files together, keep only the URL and find the three most frequent lines.
And we can do that. Recall that `cat` can be used to concatenate files and `cut` can be used to keep only certain columns.
We will get to finding the most frequent URL in a while.
So, how about this?
#!/bin/bash
echo "Will look into the following files:" logs/20[0-9][0-9]-[01][0-9]-[0-3][0-9].csv
cat logs/20[0-9][0-9]-[01][0-9]-[0-3][0-9].csv >_logs_merged.csv
cut -d , -f 5 <_logs_merged.csv
We have used a quite explicit wildcard to ensure we do not pick up some random CSVs, even though `cat logs/*.csv` would work as well.
Consider how much time this would take to write in Python.
The script has one big flaw (we will solve it soon but it needs to be mentioned anyway).
The script writes to a file called `_logs_merged.csv`. We have prefixed the filename with an underscore to mark it as somewhat special, but still: what if the user had created such a file manually?
We would overwrite that file, no questions asked, with no option to recover it.
Never do that in your scripts.
You may also encounter a variant where `cut` is called as `cut -d, -f3`.
Most programs are smart enough to recognize both variants, but it is important to remember that this is something that must be handled by each program.
That is, the program must be able to work both with `sys.argv[1] == '-d,'` and with `(sys.argv[1] == '-d') and (sys.argv[2] == ',')`.
Pipes (data streaming composition)
We finally move to the area where Linux excels: program composition. In essence, the whole idea behind Unix-family of operating systems is to allow easy composition of various small programs together.
Mostly, the programs that are composed together are filters and they operate on text inputs. These programs do not make any assumptions on the text format and are very generic. Special tools (that are nevertheless part of Linux software repositories) are needed if the input is more structured, such as XML or JSON.
The advantage is that composing the programs is very easy and it is very easy to compose them incrementally too (i.e., add another filter only when the output from the previous ones looks reasonable). This kind of incremental composition is more difficult in normal languages where printing data requires extra commands (here it is printed to the stdout without any extra work).
The disadvantage is that complex compositions can become difficult to read. It is up to the developer to decide when it is time to switch to a better language and process the data there. A typical division of labour is that shell scripts are used to preprocess the data: they are best when you need to combine data from multiple files (such as hundreds of various reports, etc.) or when the data needs to be converted to a reasonable format (e.g. non-structured logs from your web server into a CSV loadable into your favorite spreadsheet software or R). Computing statistics and similar tasks are best left to specialized tools.
Let us return to the running example again.
We already mentioned that the temporary file we used is bad because we might have overwritten someone else's data.
But it also requires disk space for another copy of the (possibly huge) data.
A bit more subtle but much more dangerous problem is that the path to the
temporary file is fixed.
Imagine what happens if you execute the script in two terminals concurrently.
Do not be fooled by the feeling that the script is so short that the probability of concurrent execution is negligible.
It is a trap that is waiting to spring.
We will talk about the proper use of `mktemp(1)` later, but in this example, no temporary file is needed at all.
We learned about program composition, right? And we can use it here.
cat logs/20[0-9][0-9]-[01][0-9]-[0-3][0-9].csv | cut -d , -f 5
The `|` symbol stands for a pipe, which connects the standard output of `cat` to the standard input of `cut`. The pipe passes data between the two processes without writing them to the disk at all. (The data are passed using memory buffers, but that is a technical detail.)
The result is the same, but we escaped the pitfalls of using temporary files and the result is actually even more readable.
The pipe `|` connects the standard output of the left-side program with the standard input of the right-side program, and the shell/OS ensure that the data flow between the two programs.
The programs typically do not know that they are part of a pipe: stdout and stdin are prepared transparently by the system and the programs (or their developers) do not need to care about this.
For cases when the first command also reads from standard input, another syntax is available. For example, the following prints a sorted list of local user accounts (usernames).
cut -d : -f 1 </etc/passwd | sort
We can even move the first `<` before `cut`, so that the pipeline can be read left-to-right like "take `/etc/passwd`, extract the first column, and then sort it":
</etc/passwd cut -d : -f 1 | sort
In essence, the Unix family of operating systems is built on top of the ability to create pipelines, which chain a sequence of programs using pipes. Each program in the pipeline denotes a type of transformation. These transformations are composed together to produce the final result.
Advancing the running example a bit more
We wanted to print the three most visited URLs first.
Using the pipe above, we can print all the URLs in a single list.
To find the most often visited ones, we will use a typical trick: first sort the lines alphabetically, and then use the program `uniq` with `-c` to count unique lines (in effect counting how many times each URL was visited).
We then sort this output by the counts and print the first 3 lines only.
In a Pythonic solution, you would probably create a dictionary with the URL as the key and the counter (how many times the URL was accessed) as the value, and then print the keys with the highest values. An ugly solution one might hack together to make things work could look like this (it expects that all files are already concatenated):
import sys

urls = {}
for line in map(lambda x: x.rstrip().split(',')[4], sys.stdin):
    urls[line] = urls.get(line, 0) + 1

how_many = 3
for url, count in sorted(urls.items(), key=lambda item: item[1], reverse=True):
    print("{:7} {}".format(count, url))
    how_many = how_many - 1
    if how_many <= 0:
        break
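Even this hack still leans on the shell for the concatenation; assuming you stored it as `top_urls.py` (a name made up for this example), you would run it as follows.

cat logs/*.csv | python3 top_urls.py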
In shell, our program will evolve like this (lines starting with `#` are obviously comments).
# Get all URLs
cat logs/20[0-9][0-9]-[01][0-9]-[0-3][0-9].csv | cut -d , -f 5
# We will make the wildcard shorter to save space
cat logs/*.csv | cut -d , -f 5
# Sort URLs, have same URLs on adjoining lines
cat logs/*.csv | cut -d , -f 5 | sort
# Count number of occurrences (uniq does not sort the file)
cat logs/*.csv | cut -d , -f 5 | sort | uniq -c
# Sort output of uniq numerically (and in reverse)
cat logs/*.csv | cut -d , -f 5 | sort | uniq -c | sort -n -r
# Print the first three lines only
cat logs/*.csv | cut -d , -f 5 | sort | uniq -c | sort -n -r | head -n 3
Do not be scared. We advanced by little steps on each line. Run the individual commands yourself and watch how the output is transformed.
Note how the shell solution is easier to debug (once you know the language): you build it little by little, while the Python script requires extra prints (that you then need to remove), and the solution is much more tightly knotted than the shell one.
Exercise
Print the total amount of transferred bytes using the logs from our running example (i.e., the last part of the task).
Hint: you will need `cat`, `cut`, `paste`, and `bc`.
The first part should be easy: we are interested only in the last column.
cat logs/*.csv | cut -d , -f 4
To sum the lines of numbers, we will use `paste`, which is able to merge lines from multiple files or to join all lines of one input into a single line.
We will give it the separator `+` to create one huge expression `SIZE1+SIZE2+SIZE3+...`.
cat logs/*.csv | cut -d , -f 4 | paste -s -d +
Finally, we will use `bc` to compute the sum.
cat logs/*.csv | cut -d , -f 4 | paste -s -d + | bc
`bc` alone is quite a powerful calculator that can be used interactively too (recall that `<Ctrl>-D` terminates the input in interactive mode).
More examples are provided at the end of this lab.
You now know basically everything about pipes. The rest of the magic is the knowledge of available filters (and a few corner cases).
It is like the API in Python: the more of it you know, the easier it is to build new programs.
Redirecting into and inside a script
Consider the following mini-script (`first-column.sh`) that extracts and sorts the first column (for colon-delimited data such as in `/etc/passwd`).
Notice that there is no input file specified.
#!/bin/bash
cut -d : -f 1 | sort
The user can then use the script as follows, and the standard input of `cut` will be properly wired to the script's standard input or to the pipe.
cat /etc/passwd | ./first-column.sh
./first-column.sh </etc/passwd
head /etc/passwd | ./first-column.sh | tail -n 3
More examples
The following examples can be solved either by executing multiple commands or by piping basic shell commands together. To help you find the right program, you can use manual pages. You can also use our manual as a starting point.
Note that none of the solutions requires anything more than a few pipelines.
For advanced users: you definitely do not need `if`, `while`, or `read`, nor Perl or AWK.
The first batch of examples also contains our solution so that you can compare it with yours.
The second batch does not contain solutions but automated tests are available.
Examples with complete solutions
Examples with automated tests
Learning outcomes
Learning outcomes provide a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …
- explain what a script is in a Linux environment
- explain what a shebang (hashbang) is and how it influences script execution
- understand the difference between a script with and without the executable bit set
- explain what a working directory is
- explain why the working directory is private to each running program
- explain how parameters (arguments) are passed to a script with a shebang
- explain what standard input and output are
- explain why standard input or output redirection is not (directly) observable from within the program
- explain how the execution of `cat foo.txt` and `cat <foo.txt` differs
- explain how standard inputs/outputs of several programs can be chained together
- optional: explain why `cd` cannot be a normal executable file like `/usr/bin/ls`
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …
- create a Linux script with a correct shebang
- set the executable bit on a script using the `chmod` utility
- access command-line arguments in a Python program
- redirect standard input and standard output of a program in shell
- use standard input and output in Python
- use the pipe `|` to chain multiple programs together
- use basic text filtering tools: `cut`, `sort`, …
- use `grep -F` to filter lines matching a provided pattern