
This lab spans several smaller topics. They are only loosely connected with each other, so you can read the big sections in virtually any order you prefer.

This is probably the most theory-heavy lab: there are not that many things to actually try and practice but a lot of background-style information that might put other knowledge in better perspective.

This lab also contains a mini homework for two points.

Preflight checklist

  • You remember about shell wildcards.
  • You know that C strings are terminated with zero byte.

Unix-style access rights

So far we have (almost) ignored the fact that there are different user accounts on any Linux machine. And that users cannot access all files on the machine. In this section we will explain the basics of Unix-style access rights and how to interpret them.

After all, you can log in to a shared machine and you should be able to understand what you can access and what you cannot.

Recall what we said about /etc/passwd earlier – it contains the list of user accounts on that particular machine (technically, it is not the only source of user records, but it is a good enough approximation for now).

Every running application, i.e., a process, is owned by one of the users from /etc/passwd (again, we simplify things a little bit). We also say that the process is running under a specific user.

And every file in the filesystem (including both real files such as ~/.bashrc and virtual ones such as /dev/sda or /proc/uptime) has some owner.

When a process tries to read or modify a file, the operating system decides whether the operation is permitted. This decision is based on the owner of the file, the owner of the process, and permissions defined for the file. If the operation is forbidden, the input/output function in your program raises an exception (e.g., in Python), or returns an error code (in C).

Since a model based solely on owners would be too inflexible, there are also groups of users (defined in /etc/group). Every user is a member of one or more groups, one of them is called the primary group. These are associated with every process of the user. Files have both an owning user and an owning group.

Files are assigned three sets of permissions: one for the owner of the file, one for users in the owning group, and one for all other users. The exact algorithm for deciding which set will be used is this:

  1. If the user running the process is the same as the owner of the file, owner access rights are used (sometimes also referred to as user access rights).
  2. If the user running the process is in the group that was set on the file, group access rights are used.
  3. Otherwise, the system checks against other access rights.
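
The precedence matters: the first matching set is used even when a later set would grant more. A quick sketch (run it as a regular user, since root bypasses these checks entirely):

```shell
# Give the owner nothing, the group read, and everybody else full rights.
f="$(mktemp)"
chmod u=,g=r,o=rwx "$f"
stat -c '%A' "$f"                # ----r--rwx
# As the owner you match the FIRST set only, so reading is denied
# (for a regular user; root is not subject to these checks):
cat "$f" 2>/dev/null || echo "owner is denied"
rm -f "$f"
```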

Every set of permissions contains three rights: read (r), write (w), execute (x):

  • Read and write operations on a file are obvious.

  • The execute right is checked when the user tries to run the file as a program (recall that without chmod +x, the error message was in the sense of Permission denied: this is the reason).

    • Note that a script which is readable, but not executable, can still be run by launching the appropriate interpreter manually.
    • Note that when a program is run, the new process will inherit the owner and groups of its parent (e.g., of the shell that launched it). Ownership of the executable file itself is not relevant once the program was started. For example, /usr/bin/mc is owned by root, yet it can be launched by any user and the running process is then owned by the user who launched it.
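
Both notes above can be tried out directly (a small sketch using a throw-away directory):

```shell
# A readable but non-executable script cannot be exec'd directly,
# but handing it to the interpreter still works.
d="$(mktemp -d)"
printf 'echo hello from the script\n' > "$d/hello.sh"
chmod 644 "$d/hello.sh"          # rw-r--r--: readable, not executable
sh "$d/hello.sh"                 # the interpreter merely READS the file
"$d/hello.sh" 2>/dev/null \
    || echo "direct execution denied"
rm -rf "$d"
```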

The same permissions also apply to directories. Their meaning is a bit different, though:

  • Read right allows the user to list directory entries (files, symlinks, sub-directories, etc.).
  • Write right allows the user to create, remove, and rename entries inside that directory. Note that removing write permission from a file inside a writable directory is pointless as it does not prevent the user from overwriting the file completely with a new one.
  • Execute right on a directory allows the user to open the entries. (If a directory has x, but not r, you can use the files inside it if you know their names; however, you cannot list them. On the contrary, if a directory has r, but not x, you can only view the entries, but not use them.)
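
The x-without-r case can be tried on a scratch directory (again, as a regular user; root bypasses the read check):

```shell
d="$(mktemp -d)"
echo secret > "$d/known-name"
chmod u=x,g=,o= "$d"     # the owner keeps only x on the directory
cat "$d/known-name"      # works: traversal only needs x
ls "$d" 2>/dev/null || echo "listing denied"
chmod u=rwx "$d"         # restore permissions so we can clean up
rm -rf "$d"
```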

Permissions of a file or directory can be changed only by its owner, regardless of the current permissions. That is, the owner can deny access to themselves by removing all access rights, but can always restore them later.
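
In other words, permissions are checked on access, but chmod itself only requires ownership:

```shell
f="$(mktemp)"
chmod 000 "$f"           # deny everyone, including ourselves
stat -c '%a' "$f"        # 0
chmod 600 "$f"           # ...yet the owner can always undo it
stat -c '%a' "$f"        # 600
rm -f "$f"
```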

root account

Apart from accounts for normal users, there is always an account for the so-called superuser – more often simply called root – that has administrator privileges and is permitted to do anything with any file in the system. The permission checks described above simply do not apply to root-owned processes.

Viewing and changing the permissions

Looking at the shortcuts of rwx for individual permissions, you may find them familiar:

drwxr-xr-x 1 intro intro 48 Feb 23 16:00 02/
drwxr-xr-x 1 intro intro 60 Feb 23 16:00 03/
-rw-r--r-- 1 intro intro 51 Feb 23 16:00 README.md

The very first column actually contains the type of the entry (d for a directory, - for a plain file, etc.) and three triplets of permissions. The first triplet refers to the owner of the file, the middle one to the group, and the last one to the rest of the world (other). The third and fourth columns contain the owner and the group of the file.

Typically, your personal files in your home directory will have you as the owner together with a group with the same name. That is a default configuration that prevents other users from seeing your files.

Do check that it is true for all directories under /home on the shared machine.

But also note that most of the files in your home directory are actually world-readable (i.e., anyone can read them).

That is actually quite fine because if you check permissions for your ~, you will see that it is typically drwx------. Only the owner can modify and cd to it. Since no one can actually change to your directory, no one will be able to read your files (technically, reading a file involves traversing the whole directory and checking access rights on the whole path).

To change the permissions, you can use the chmod program. It has the general format of

chmod WHO[+=-]PERMISSION file1 file2 ...

WHO can be empty (for all three of user, group and others) or specification of u, g or o. And PERMISSION can be r, w or x.

It is possible to combine these, e.g. g-r,u=rw removes read permission for group and sets read-write for the owner (i.e. the file will not be executable by the owner regardless of a previous state of the executable bit).
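
You can verify this behaviour on a scratch file; stat -c '%A' prints the same mode string that ls -l shows:

```shell
f="$(mktemp)"
chmod u=rwx,g=rx,o= "$f"
stat -c '%A' "$f"        # -rwxr-x---
chmod g-r,u=rw "$f"      # the combined example from above
stat -c '%A' "$f"        # -rw---x--- (owner lost the x bit)
rm -f "$f"
```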

Sticky and other bits

If you execute the following command, you will see slightly different output than you would probably expect.

ls -ld /usr/bin/passwd /tmp
drwxrwxrwt 23 root root   640 Mar  3 08:15 /tmp/
-rwsr-xr-x  1 root root 51464 Jan 27 14:47 /usr/bin/passwd*

You should have noticed that /tmp has t in place of an executable bit and passwd has s there.

Those are special variants of the executable bit. The t (sticky) bit on a directory specifies that a user can remove only their own files from it. The reason is obvious – other users should not be able to remove your files inside /tmp. This is something that is otherwise impossible to express with traditional (basic) permissions.
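
Numerically, the sticky bit is an extra leading octal digit (1); you can set it on your own directory and observe the t in the listing:

```shell
d="$(mktemp -d)"
chmod 1777 "$d"          # 1 = sticky bit, 777 = rwx for everybody
stat -c '%A' "$d"        # drwxrwxrwt
rm -rf "$d"
```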

The s bit (set-uid) is a bit more tricky. It specifies that no matter who runs the executable, the process will be running under the user owning the file (i.e., root for this file).

While it may look useless, it is a simple way to allow running certain programs with elevated (higher) permissions. passwd is a typical example. It allows the user to change their password. However, this password is stored in a file that is not readable by any user on the system except root (for obvious reasons). Giving the s bit to the executable means that the process would be running under root and would be able to modify the user database (i.e., /etc/passwd and /etc/shadow that contains the actual passwords).

Since changing the permissions can be done only by the owner of the file, there is no danger that a malicious user would add the s bit to other executables.

There are other nuances regarding Unix permissions and their setting, refer to chmod(1) for details.

Beyond traditional Unix permissions: POSIX ACL

The permission model described above is a rare example of a Unix concept that is considered too inflexible for today's needs. However, it is also a typical example of a simple yet well usable security model.

Many programs copied this model and you can encounter it in other places too. It is definitely something worth remembering and understanding.

The inflexibility of the system comes from the fact that allowing a set of users to access a particular file means creating a special group for these users. These groups are defined in /etc/group and changing them requires administrator privileges.

With an increasing number of users, the number of potentially needed groups grows combinatorially. On the other hand, for most situations the basic Unix permissions are sufficient.

To tackle this problem, Linux also offers so-called POSIX access control lists (ACLs) that make it possible to attach an arbitrary list of users and their permissions to any file.

getfacl and setfacl are the utilities that control these rights, but since they are rarely needed in practice, we will leave them at the level of reading the corresponding manpages and acl(5).

Access rights checks

Let’s return a little bit to the access rights.

Change the permissions of one of your scripts to --x. Try to execute it. What happens? Answer.

Remove writable bit for a file and write to it using stdout redirection. What happens?

Access rights quiz

Assume the following output of ls -l (script.sh is really a shell script), and that user bob is in group wonderland while user lewis is not.

-rwxr-xr-- 1 alice wonderland 1234 Feb 20 11:11 script.sh

Select all true statements.


Processes (and signals)

Files are the passive elements of the system. The active parts are the running programs that actually modify data. Let us have a look at what is actually running on our machine.

When you start a program (i.e., an executable file), it becomes a process. The executable file and a running process share the code – it is the same in both. However, the process also contains the stack (e.g., for local variables), heap, current directory, list of opened files etc. etc. – all this is usually considered a context of the process. Often, the phrases running program and process are used interchangeably.

To view the list of running processes on our machine, we can use htop. Similar to Midnight Commander, function keys perform the most important actions and the help is visible in the bottom bar. You can also configure htop to display information about your system, such as the amount of free memory or CPU usage.

For non-interactive use we can execute ps -e (or ps -axufw for a more detailed list).

For illustration, here is an example of ps output (with the --forest option used to also depict the parent/child relation).

However, run ps -ef --forest on the shared machine to also view running processes of your colleagues.

Listing of processes is not protected in any way from other users. Every user on a particular machine can see what other users are running (including command-line arguments).

Never pass passwords as command-line arguments: always pass them through files (with proper permissions) or interactively on stdin.

UID          PID    PPID  C STIME TTY          TIME CMD
root           2       0  0 Feb22 ?        00:00:00 [kthreadd]
root           3       2  0 Feb22 ?        00:00:00  \_ [rcu_gp]
root           4       2  0 Feb22 ?        00:00:00  \_ [rcu_par_gp]
root           6       2  0 Feb22 ?        00:00:00  \_ [kworker/0:0H-events_highpri]
root           8       2  0 Feb22 ?        00:00:00  \_ [mm_percpu_wq]
root          10       2  0 Feb22 ?        00:00:00  \_ [rcu_tasks_kthre]
root          11       2  0 Feb22 ?        00:00:00  \_ [rcu_tasks_rude_]
root           1       0  0 Feb22 ?        00:00:09 /sbin/init
root         275       1  0 Feb22 ?        00:00:16 /usr/lib/systemd/systemd-journald
root         289       1  0 Feb22 ?        00:00:02 /usr/lib/systemd/systemd-udevd
root         558       1  0 Feb22 ?        00:00:00 /usr/bin/xdm -nodaemon -config /etc/X11/...
root         561     558 10 Feb22 tty2     22:42:35  \_ /usr/lib/Xorg :0 -nolisten tcp -auth /var/lib/xdm/...
root         597     558  0 Feb22 ?        00:00:00  \_ -:0
intro        621     597  0 Feb22 ?        00:00:40      \_ xfce4-session
intro        830     621  0 Feb22 ?        00:05:54          \_ xfce4-panel --display :0.0 --sm-client-id ...
intro       1870     830  4 Feb22 ?        09:32:37              \_ /usr/lib/firefox/firefox
intro       1966    1870  0 Feb22 ?        00:00:01              |   \_ /usr/lib/firefox/firefox -contentproc ...
intro       4432     830  0 Feb22 ?        01:14:50              \_ xfce4-terminal
intro       4458    4432  0 Feb22 pts/0    00:00:11                  \_ bash
intro     648552    4458  0 09:54 pts/0    00:00:00                  |   \_ ps -ef --forest
intro      15655    4432  0 Feb22 pts/4    00:00:00                  \_ bash
intro     639421  549293  0 Mar02 pts/8    00:02:00                      \_ man ps
...

First of all, each process has a process ID, often abbreviated as just PID. The PID is a number assigned by the kernel and used by many utilities for process management. PID 1 is used by the first process in the system, which is always running. (PID 0 is reserved as a special value – see fork(2) if you are interested in the details.) Other processes are assigned their PIDs incrementally (more or less) and PIDs are eventually reused.

As an important technical detail: process with PID 1 is the first process and thus it controls the rest of the system (because it indirectly spawns every other program).

If you run /bin/sh as PID 1, you will have only a shell on your system and nothing else. That might be extremely useful for some special maintenance (look for init= in the kernel parameters documentation).

Typically, PID 1 will be something like /lib/systemd/systemd that controls other services (recall lab 08).

Note that all this information is actually available in /proc/PID/ and that is where ps reads its information from.
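
You can peek at the entry of your own shell; $$ expands to the PID of the current shell (these are standard Linux /proc entries):

```shell
echo "my PID: $$"
cat "/proc/$$/comm"                     # the process name (e.g., bash)
# The command line is stored as zero-byte terminated C strings:
tr '\0' ' ' < "/proc/$$/cmdline"; echo
```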

Execute ps -ef --forest again to view all processes on your machine. Because of your graphical interface, the list will probably be quite long.

Practically, a small server offering web pages, calendar and SSH access can have about 80 processes, for a desktop running Xfce with browser and few other applications, the number will rise to almost 300 (this really depends a lot on the configuration but it is a ballpark estimate). About 50–60 of these are actually internal kernel threads. In other words, a web/calendar server needs about 20 “real” processes, a desktop about 200 of them :-).

Quick check on processes

Select all true statements.

Signals

Linux systems use the concept of signals to communicate asynchronously with a running program (process). The word asynchronously means that the signal can be sent (and delivered) to the process regardless of its state. Compare this with communication via standard input (for example), where the program controls when it will read from it (by calling appropriate I/O read function).

However, signals do not provide a very rich communication channel: the only information available (apart from the fact that the signal was sent) is the signal number. Most signal numbers are defined by the kernel, which also handles some signals by itself. Otherwise, signals can be received by the application and acted upon. If the application does not handle the signal, it is processed in the default way. For some (most) signals, the default is terminating the application; other signals are ignored by default.

This default is reflected in the name of the utility used for sending signals: kill (see below).

By default, the kill utility sends signal 15 (also called TERM or SIGTERM) that instructs the application to terminate. An application may decide to catch this signal, flush its data to the disk etc., and then terminate. But it can do virtually anything and it may even ignore the signal completely. Apart from TERM, we can instruct kill to send the KILL signal (number 9), which is handled by the kernel itself. It immediately and forcefully terminates the application. The application may try to register to receive notification of the KILL signal, but the kernel will never deliver it.

Many other signals are sent to the process in reaction to a specific event. For example, the signal PIPE is sent when a process tries to write to a pipe whose reading end was already closed – the “Broken pipe” message you already saw is printed by the shell if the command was terminated by this signal. Terminating a program by pressing Ctrl-C in the terminal actually sends the INT (interrupt, number 2) signal to it. If you are curious about the other signals, see signal(7).
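
You can trigger a harmless PIPE signal yourself: yes writes "y" forever, but once head exits, the next write hits a closed pipe and the signal terminates yes:

```shell
yes | head -n 2          # prints two lines; 'yes' is then killed by PIPE
```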

For example, when the system is shutting down, it sends TERM to all its processes. This gives them a chance to terminate cleanly. Processes which are still alive after some time are killed forcefully with KILL.

Use of kill, pkill and pgrep

The pgrep command can be used to find processes matching a given name.

Open two extra terminals and run sleep 600 in one and sleep 800 in the second one. The sleep program simply waits the given number of seconds before terminating.

In a third terminal, run the following commands to understand how the searching for the processes is done.

pgrep sleep
pgrep 600
pgrep -f 600

What have you learnt? Answer.

When we know the PID, we can use the kill utility to actually terminate the program. Try running kill PID with PID of one of the sleeps and look what happened in the terminal with sleep.

You should see something like this:

Terminated (SIGTERM).

This message informs us that the command was terminated by the TERM signal.
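
The same experiment can be scripted. A shell reports a command terminated by signal number N with the exit status 128 + N, so a process killed by TERM yields 143:

```shell
sleep 300 &
pid=$!
kill "$pid"              # sends TERM (signal 15) by default
rc=0
wait "$pid" || rc=$?
echo "$rc"               # 143 = 128 + 15
```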

Similarly, you can use pkill to kill processes by name (but be careful as with great power comes great responsibility). Consult the manual pages for more details.

There is also killall command that behaves similarly. On some Unix systems (e.g., Solaris), this command has completely different semantics and is used to shut down the whole machine.

Reacting to signals in shell

Reaction to signals in shell is done through a trap command.

Note that a typical action for a signal handler in a shell script is clean-up of temporary files.

#!/bin/bash

set -ueo pipefail

on_interrupt() {
    echo "Interrupted, terminating ..." >&2
    exit 17
}

on_exit() {
    echo "Cleaning up..." >&2
    rm -f "$my_temp"
}

my_temp="$( mktemp )"

trap on_interrupt INT TERM
trap on_exit EXIT

echo "Running with PID $$"

counter=1
while [ "$counter" -lt 10 ]; do
    date "+%Y-%m-%d %H:%M:%S | Waiting for Ctrl-C (loop $counter) ..."
    echo "$counter" >"$my_temp"
    sleep 1
    counter=$(( counter + 1 ))
done

The command trap receives as the first argument the command to execute on the signal. Other arguments list the signals to react to. Note that a special signal EXIT means normal script termination. Hence, we do not need to call on_exit after the loop terminates.

We use exit 17 to report termination through the Ctrl-C handler (the value is arbitrary by itself).

Feel free to check the return value with echo $? after the command terminates. The special variable $? contains the exit code of the last command.

If your shell script starts with set -e, you will rarely need $? as any non-zero value will cause script termination.

However, the following construct prevents the termination and allows you to branch your code based on the exit value if needed.

set -e

...
# Prevent termination through set -e
rc=0
some_command_with_interesting_exit_code || rc=$?
if [ $rc -eq 0 ]; then
    ...
elif [ $rc -eq 1 ]; then
    ...
else
    ...
fi
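
A concrete instance of this pattern: grep exits with 0 on a match, 1 on no match, and 2 on an error, which maps naturally onto the three branches:

```shell
set -e

rc=0
echo "hello world" | grep -q "mars" || rc=$?
if [ "$rc" -eq 0 ]; then
    echo "found"
elif [ "$rc" -eq 1 ]; then
    echo "not found"
else
    echo "grep failed"
fi
```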

Using - (a dash) instead of the handler resets the respective signal handler to its default.

Your shell scripts should always include a signal handler to clean up temporary files.

Note the use of $$ which prints the current PID.

Run the above script, note its PID and run the following in a new terminal.

kill THE_PID_PRINTED_BY_THE_ABOVE_SCRIPT

The script was terminated and the clean-up routine was called. Compare with the situation when you comment out the trap command.

Run the script again but pass -9 to kill to specify that you want to send signal nine (i.e., KILL).

What happened? Answer.

While signals are a rudimentary mechanism, which passes binary events with no additional data, they are the primary way of process control in Linux.

If you need a richer communication channel, you can use D-Bus instead.

Reasonable reaction to basic signals is a must for server-style applications (e.g., a web server should react to TERM by completing outstanding requests without accepting new connections, and terminating afterwards). In shell scripts, it is considered good manners to always clean up temporary files.
See the signal library for signal handling in Python.

Deficiencies in signal design and implementation

Signals are the rudimentary mechanism for interprocess communication on Unix systems. Unfortunately their design has several flaws that complicate their safe usage.

We will not dive into details but you should bear in mind that signal handling can be tricky in situations where you cannot afford to lose any signal or when signals can come quickly one after another. And there is a whole can of worms when using signals in multithreaded programs.

On the other hand, for simple shell scripts where we want to clean up on forceful termination, the pattern we have shown above is sufficient. It guards our script when the user hits Ctrl-C because they realized that it is working on the wrong data or something similar.

But note that it contains a bug for the case when the user hits Ctrl-C very early during script execution.

my_temp="$( mktemp )"
# User hits Ctrl-C here
trap on_interrupt INT TERM
trap on_exit EXIT

The temporary file was already created but the handler was not yet registered, and thus the file will not be removed. But changing the order complicates the signal handler, as we would need to test that $my_temp was already initialized.

But the fact that signals can be tricky does not mean that we should abandon the basic means of ensuring that our scripts clean-up after themselves even when they are forcefully terminated.

In other programming languages the clean-up is somewhat simpler because it is possible to create a temporary file that is always automatically removed once a process terminates.

It relies on a neat trick where we open a file (create it) and immediately remove it. As long as we keep the file descriptor open (i.e., the result of the Pythonic open), the system keeps the file contents intact. But the name is already gone from the directory, and closing the file then removes the data completely.

Because shell is based on running multiple processes, the above trick does not work for shell scripts.

Quick check about signals

Select all correct statements about signals and processes.

Files and storage management

Before proceeding, recall that files reside on file systems, which are structures on actual block devices (typically, disks).

Working with file systems and block devices is necessary when installing a new system, rescuing from a broken device, or simply checking available free space.

You are already familiar with normal files and directories. But there are other types of files that you can find on a Linux system.

Linux allows you to create a symbolic link to another file. This special file does not contain any content by itself and merely points to another file.

An interesting feature of a symbolic link is that it is transparent to the standard file I/O API. If you call Pythonic open on a symbolic link, it will transparently open the file the symbolic link points to. That is the intended behavior.

The purpose of symbolic links is to allow different perspectives on the same files without need for any copying and synchronization.

For example, a movie player is able to play only files in directory Videos. However, you actually have the movies elsewhere because they are on a shared hard drive. With the use of a symbolic link, you can make Videos a symbolic link to the actual storage and make the player happy. (For the record, we do not know about any movie player with such behaviour, but there are plenty of other programs where such magic can make them work in a complex environment they were not originally designed for.)

Note that a symbolic link is different from what you may know as a Desktop shortcut or similar. Such shortcuts are actually normal files in which you can specify which icon to use and which also contain information about the actual file. Symbolic links operate on a lower level.

To create a symbolic link, run ln -s. For example, the following will create a symlink to /etc/passwd named users.txt. Note that running cat users.txt appears to open users.txt even though the kernel actually supplies the contents of /etc/passwd.

ln -s /etc/passwd users.txt
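
A self-contained variant in a scratch directory; ls -l shows the link target after an arrow, and readlink prints just the target:

```shell
d="$(mktemp -d)"
ln -s /etc/passwd "$d/users.txt"
ls -l "$d/users.txt"        # ... users.txt -> /etc/passwd
readlink "$d/users.txt"     # /etc/passwd
head -n 1 "$d/users.txt"    # transparently reads from /etc/passwd
rm -rf "$d"
```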

The strace program prints information about executed system calls – i.e., the moments when an application requests a service from the operating system.

The following will run cat users.txt but will print the system calls to stderr. Intercepting those related to open shows that the application does not know whether it is opening a normal file or a symbolic link (see the last line of the output below).

strace cat users.txt 2>&1 | grep 'open.*"'
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "users.txt", O_RDONLY) = 3

Apart from symbolic links, it is also possible to create so-called hard links. While symbolic links are a special kind of file, hard links are one level lower. A hard link means that a certain filename points to the same data region (structure) on a disk as a different filename (unlike a symbolic link, which internally contains one level of indirection).

Hard links are covered in the Linux administration course.
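
Still, a minimal illustration fits here: after ln (without -s), two names refer to the same data, and the link count reported by stat rises to 2:

```shell
d="$(mktemp -d)"
echo data > "$d/a"
ln "$d/a" "$d/b"         # hard link: no new data, just a second name
stat -c '%h' "$d/a"      # 2 (the link count)
cat "$d/b"               # data
rm -rf "$d"
```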

Special files

There are also other special files that represent physical devices or files that serve as a spy-hole into the state of the system.

The reason is that it is much simpler for the developer that way. You do not need special utilities to work with a disk, you do not need a special program to read the amount of free memory. You simply read the contents of a well-known file and you have the data.

It is also much easier to test such programs because you can easily give them mock files by changing the file paths – a change that is unlikely to introduce a serious bug into the program.

Linux usually exposes the files that reveal the state of the system in a textual format. For example, the file /proc/meminfo can look like this:

MemTotal:        7899128 kB
MemFree:          643052 kB
MemAvailable:    1441284 kB
Buffers:          140256 kB
Cached:          1868300 kB
SwapCached:            0 kB
Active:           509472 kB
Inactive:        5342572 kB
Active(anon):       5136 kB
Inactive(anon):  5015996 kB
Active(file):     504336 kB
Inactive(file):   326576 kB
...

This file is nowhere on the disk but when you open this path, Linux creates the contents on the fly.

Notice how the information is structured: it is a textual file, so reading it requires no special tools and the content is easily understood by a human. On the other hand, the structure is quite rigid: each line is a single record, keys and values are separated by a colon. Easy for machine parsing as well.
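
Thanks to this layout, extracting a single value needs no special tooling:

```shell
# Print the value of MemAvailable (second and third column of its line).
awk '$1 == "MemAvailable:" { print $2, $3 }' /proc/meminfo
```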

File system hierarchy

We will now briefly list some of the key files you can find on virtually any Linux machine.

Do not be afraid to actually display contents of the files we mention here. hexdump -C is really a great tool.

/boot contains the bootloader for loading the operating system. You would rarely touch this directory once the system is installed.

/dev is a very special directory where hardware devices have their file counterparts. You will probably see there a file sda or nvme0 that represents your hard drive (or SSD). Unless you are running as the superuser, you will not have access to these files, but if you hexdumped them, you would see the bytes exactly as they are stored on the actual drive.

And writing to such files would overwrite the data on your drive!
The fact is that disk utilities in Linux accept paths to the disk drives they will operate on. Thus it is very easy to give them a regular file and pretend that it is a disk to be formatted. That can be used to create disk images or for file recovery. And it greatly simplifies the testing of such tools because you do not need a real disk for testing.

It is important to note that these files are not physical files on your disk (after all, it would mean having a disk inside a disk). When you read from them, the kernel recognizes that and returns the right data.

This directory also contains several special but very useful files for software development.

/dev/urandom returns random bytes indefinitely. It is probably internally used inside your favorite programming language to implement its random() function. Try to run hexdump on this file (and recall that <Ctrl>-C will terminate the program once you are tired of the randomness).

/dev/null is your local black hole: it discards everything written to it.

There is also /dev/full that emulates a full disk or /dev/zero that supplies an infinite stream of zero bytes.
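
A few quick experiments with these files (od is the POSIX relative of hexdump):

```shell
head -c 4 /dev/urandom | od -An -tx1    # four random bytes in hex
head -c 4 /dev/zero | od -An -tx1       # 00 00 00 00
echo "discard me" > /dev/null           # vanishes without a trace
echo "no room" > /dev/full 2>/dev/null \
    || echo "write failed: the disk is (pretend) full"
```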

/etc/ contains system-wide configuration. Typically, most programs in UNIX systems are configured via text files. The reasoning is that an administrator needs to learn only one tool – a good text editor – for system management. The advantage is that most configuration files have support for comments and it is possible to comment even on the configuration. For an example of such a configuration file, you can have a look at /etc/systemd/system.conf to get the feeling.

Perhaps the most important file is /etc/passwd that contains a list of user accounts. Note that it is a plain text file where each row represents one record and individual attributes are simply separated by a colon :. Very simple to read, very simple to edit, and very simple to understand. In other words, the KISS principle in practice.
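
The layout makes ad-hoc queries trivial; for example, printing the account with UID 0 (the third field) together with its login shell (the last field):

```shell
# Fields: name:password:UID:GID:comment:home:shell
awk -F: '$3 == 0 { print $1, $7; exit }' /etc/passwd
```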

/home contains home directories for normal user accounts (i.e., accounts for real – human – users).

/lib and /usr contain dynamic libraries, applications, and system-wide data files.

/var is for variable data. If you were to install a database or a web server on your machine, its files would be stored here.

/tmp is a generic location for temporary files. This directory is automatically cleaned at each reboot, so do not use it for permanent storage. Many systems also automatically remove files which were not modified in the last few days.

/proc is a virtual file system that allows controlling and reading of kernel (operating system) settings. For example, the file /proc/meminfo contains quite detailed information about RAM usage.

Again, /proc/* are not normal files, but virtual ones. Until you read them, their contents do not exist physically anywhere.

When you open /proc/meminfo, the kernel will read its internal data structures, prepare its content (in-memory only), and give it to you. It is not that this file would be physically written every 5 seconds or so to contain the most up-to-date information.

Mounts and mount-points

Each file system (that we want to access) is accessible as a directory somewhere (compare this with drive letters on other systems, for example).

When we can access /dev/sda3 under /home, we say that /dev/sda3 is mounted under /home. /home is then called the mount point; /dev/sda3 is often called a volume.

Most devices are mounted automatically during boot. This includes / (root), where the system is installed, as well as /home, where your data reside. File systems under /dev or /proc are actually special file systems that are mounted to these locations. Hence, the file /proc/uptime does not physically exist (i.e., there is no disk block with its content anywhere on your hard drive) at all.

The file systems that are mounted during boot are listed in /etc/fstab. You will rarely need to change this file on your laptop: it was created for you during installation. Note that each entry contains volume identification (such as the path to the partition), the mount point, and some extra options.
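For illustration, entries in /etc/fstab might look like this (the UUID and device paths below are made-up examples – your installer generated the real values):

```
# <volume>       <mount point>  <type>  <options>  <dump>  <pass>
UUID=1234-ABCD   /boot          vfat    defaults   0       2
/dev/sda3        /              ext4    defaults   0       1
```

Lines starting with # are comments, in line with the text-file configuration style described above.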

When you plug in a removable USB drive, your desktop environment will typically mount it automatically. Mounting it manually is also possible using the mount utility. However, mount has to be run under root to work (this thread explains several reasons why mounting a volume could be a security risk). Therefore, you need to play with this on your own installation where you can become root. It will not work on any of the shared machines.

Technical note: the above text may seem contradictory, as mount requires root privileges, yet your desktop environment (DE) may mount the drive automatically without asking for any password. Internally, your DE does not call mount, but talks to daemons called Udisks and Polkit which run with root privileges. The daemons together verify that the mounted device is actually a removable one and that the user is a local one (i.e., it will not work over SSH). If these conditions are satisfied, the disk is mounted for the given user. By the way, you can talk to Udisks from the shell using udisksctl.

To test the manual mounting, plug in your USB device and unmount it in your GUI if it was mounted automatically (note that such devices are usually mounted somewhere under /media).

Your USB will probably be available as /dev/sdb1 or /dev/sda1, depending on what kind of disk you have (consult the following section about lsblk to view the list of drives).

Mounting disks manually

sudo mkdir /mnt/flash
sudo mount /dev/sdb1 /mnt/flash

Your data should be visible under /mnt/flash.

To unmount, run the following command:

sudo umount /mnt/flash

Note that running mount without any arguments prints a list of currently active mounts. For this, root privileges are not required.

Disk space usage utilities

The basic utility for checking available disk space is df (disk free).

df

Filesystem     1K-blocks    Used Available Use% Mounted on
devtmpfs         8174828       0   8174828   0% /dev
tmpfs            8193016       0   8193016   0% /dev/shm
tmpfs            3277208    1060   3276148   1% /run
/dev/sda3      494006272 7202800 484986880   2% /
tmpfs            8193020       4   8193016   1% /tmp
/dev/sda1        1038336  243188    795148  24% /boot

In the default execution (above), it uses one-kilobyte blocks. For a more readable output, run it with -BM or -BG (megabytes and gigabytes) or with -h to let it select the most suitable unit.
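For example, to see the usage of the root file system in human-readable units:

```shell
# -h lets df choose suitable units (K, M, G) for each file system
df -h /
```

The output has the same columns as above, only with sizes such as 470G instead of raw block counts.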

We will return to the topic of storage management in the last lab too.


File archiving and compression

A somewhat related topic to the above is how Linux handles file archival and compression.

Archiving on Linux systems typically refers to merging multiple files into one (for easier transfer) and compression of this file (to save space). Sometimes, only the first step (i.e., merging) is considered archiving.

While these two actions are usually performed together, Linux keeps the distinction as it allows combination of the right tools and formats for each part of the job. Note that on other systems where the ZIP file is the preferred format, these actions are blended into one.

The most widely used program for archiving is tar. Originally, its primary purpose was archiving on tapes, hence the name: tape archiver. It is always run with an option specifying the mode of operation:

  • -c to create a new archive from existing files,
  • -x to extract files from the archive,
  • -t to print the table of files inside the archive.

The name of the archive is given via the -f option; if no name is specified, the archive is read from standard input or written to standard output.

As usual, the -v option increases verbosity. For example, tar -cv prints names of files added to the archive, tar -cvv also prints file attributes (like ls -l). (Everything is printed to stderr, so that stdout can still be used for the archive.) Plain tar -t prints only file names, tar -tv also prints file attributes.

An uncompressed archive can be created this way:

tar -cf archive.tar dir_to_archive/

A compressed archive can be created by piping the output of tar to gzip:

tar -c dir_to_archive/ | gzip >archive.tar.gz

As this is very frequent, tar supports a -z switch, which automatically calls gzip, so that you can write:

tar -czf archive.tar.gz dir_to_archive/

tar has further switches for other (de)compression programs: bzip2, xz, etc. Most importantly, the -a switch chooses the (de)compression program according to the name of the archive file.

If you want to compress a single file, plain gzip without tar is often used. Some tools or APIs can even process gzip-compressed files transparently.
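For example (using a throw-away file in a temporary directory; note that by default gzip replaces the original file with the compressed one):

```shell
cd "$(mktemp -d)"
echo 'hello world' >notes.txt
gzip notes.txt            # replaces notes.txt with notes.txt.gz
gzip -d notes.txt.gz      # decompresses: restores notes.txt
cat notes.txt             # prints: hello world
```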

To unpack an archive, you can again pipe gzip -d (decompress) to tar, or use -z as follows:

tar -xzf archive.tar.gz

Like many other file-system related programs, tar will overwrite existing files without any warning.
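The whole cycle – create, list, extract – can be tried out safely on throw-away files in a temporary directory:

```shell
cd "$(mktemp -d)"
mkdir project
echo 'alpha' >project/a.txt
tar -czf project.tar.gz project/    # create the compressed archive
tar -tzf project.tar.gz             # list its contents
mkdir extracted && cd extracted
tar -xzf ../project.tar.gz          # extract into the current directory
cat project/a.txt                   # prints: alpha
```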

We recommend installing atool as a generic wrapper around tar, gzip, unzip and plenty of other utilities to simplify working with archives. For example:

apack archive.tar.gz dir_to_archive/
aunpack archive.tar.gz

Note that atool will not overwrite existing files by default (which is another very good reason for using it).

It is a good practice to always archive a single directory. That way, the user who unpacks your archive will not end up with your files scattered across the current directory, but neatly placed in a single new directory.

To view the list of files inside an archive, you can execute als.

find

While ls(1) and wild-card expansion are powerful, sometimes we need to select files using more sophisticated criteria. This is where the find(1) program comes in handy.

Without any arguments, it lists all files in the current directory, including files in nested directories.

Do not run it on the root directory (/) unless you know what you are doing (and definitely not on the shared linux.ms.mff.cuni.cz machine).

With the -name parameter, you can limit the search to files matching a given wildcard pattern.

The following command finds all alpha.txt files in the current directory and in any subdirectory (regardless of depth).

find -name alpha.txt

Why would the following command for finding all *.txt files not work?

find -name *.txt

Hint. Answer.

find has many options – we will not duplicate its manpage here but mention those that are worth remembering.

-delete immediately deletes the found files. Very useful and very dangerous.
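For example, the following removes all files ending with .bak (the file names below are only illustrative; always test your pattern with plain find first):

```shell
cd "$(mktemp -d)"                 # demo in a throw-away directory
touch keep.txt a.bak b.bak
find . -name '*.bak' -delete      # removes a.bak and b.bak
ls                                # only keep.txt remains
```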

-exec runs a given program on every found file. You have to use {} to specify the found filename and terminate the command with ; (since ; terminates commands in shell too, you will need to escape it).

find -name '*.md' -exec wc -l {} \;

Note that a new invocation of wc happens for each found file. This can be altered by changing the command terminator (\;) to +. See the difference between the invocations of the following two commands:

find -name '*.md' -exec echo {} \;
find -name '*.md' -exec echo {} +

Caveats

By default, find prints one filename per line. However, a filename can even contain the newline character (!), and thus the following idiom is not 100% safe.

find -options-for-find | while read filename; do
    do_some_complicated_things_with "$filename"
done

If you want to be really safe, use -print0 and IFS= read -r -d $'\0' filename as that would use the only safe delimiter – \0. Alternatively, you can pipe the output of find -print0 to xargs --null.

However, if you are working with your own files or the pattern is safe, the above loop is fine (just do not forget that directories are files too, and their names can also contain \n).
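Putting this together, a loop that is safe even for file names containing spaces or newlines could look like this in bash (read -d '' makes the NUL byte the delimiter):

```shell
find . -name '*.md' -print0 | while IFS= read -r -d '' filename; do
    printf 'processing: %s\n' "$filename"
done
```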

The shell also allows you to export a function and call back to it from inside xargs. The invocation pattern looks awful, but it is a safe approach if you want to execute a complex operation on top of the found files.

my_callback_function() {
    echo ""
    echo "\$0 = $0"
    echo "\$@ =" "$@"
}
export -f my_callback_function

find . -print0 | xargs -0 -n 1 bash -c 'my_callback_function "$@"' arg_zero arg_one

Obviously, name your function properly based on what it does. Our name has one advantage – it is clearly visible in all three places where the identifier is used.

Recall that you can define functions directly in the shell, so the above can actually be created interactively without storing it as a script.

Tasks to check your understanding

We expect you to solve the following tasks before attending the labs so that we can discuss your solutions during the lab.

Graded mini-homework

Run the program nswi177-signals on linux.ms.mff.cuni.cz.

You will need to send specific signals in a given order to this program to complete this task.

The program will guide you: it will print which signals you are supposed to send.

The deadline for this task is May 11.

Please, copy the last line of output (there will be two numbers) to 11/signal.txt into your GitLab repository student-LOGIN.

This task is not fully checked by the automated tests.

There are multiple options available; separate your answer with spaces or commas, e.g. **[A1]** 1,2 **[/A1]**.

Assume that we have a file `test.txt` for which `ls -l` prints the following:

    -rw-r----- 1 bjorn ursidae 13 Mar 21 14:54 test.txt

Which of the following users will be able to read the contents of the file?

 1. `bjorn` in group `ursidae`
 2. `bjorn` in groups `carnivora` and `mammalia`
 3. `iorek` in group `ursidae`
 4. `iorek` in groups `carnivora` and `mammalia`
 5. `root` (the superuser)
 6. everybody

**[A1]** ... **[/A1]**

Consider that the file from the previous example is stored within
the directory `/data` with the following permissions as printed by `ls -l`:


    drwxrwx-wx 3 bjorn ursidae 4096 Mar 21 14:53 data


Which of the following users will be able to delete the file?

 1. `bjorn` in group `ursidae`
 2. `bjorn` in groups `carnivora` and `mammalia`
 3. `iorek` in group `ursidae`
 4. `iorek` in groups `carnivora` and `mammalia`
 5. `root` (the superuser)
 6. everybody

You can assume that the root directory `/` is readable and executable
by everybody.

**[A2]** ... **[/A2]**

Continuing with the previous questions, which commands can be used to make
the file `test.txt` readable and writeable only to the owner and nobody else?

 1. `chmod u=rw test.txt`
 2. `chmod =rw test.txt`
 3. `chmod g= test.txt`
 4. `chmod o= test.txt`
 5. `chmod g=,o= test.txt`
 6. `chmod g-r test.txt`
 7. `chmod g-rwx test.txt`

**[A3]** ... **[/A3]**

This example can be checked via GitLab automated tests. Store your solution as 11/rights.md and commit it (push it) to GitLab.

Move into your submission repository (student-LOGIN).

Check all your shell scripts with Shellcheck and your Python scripts with Pylint.

Solution.

Compress all .csv files that are greater than 1M using GZ (gzip).

Do not overwrite existing files without prompting.

Hint.

Solution.
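One possible sketch of a solution (note that gzip refuses to silently overwrite an existing .gz file: it asks when run on a terminal, unless forced with -f):

```shell
cd "$(mktemp -d)"                                   # demo in a throw-away directory
dd if=/dev/zero of=big.csv bs=1024 count=2048 status=none
echo 'a,b' >small.csv
# compress every *.csv larger than 1 MiB found under the current directory
find . -name '*.csv' -size +1M -exec gzip {} \;
ls                                                  # big.csv.gz and small.csv
```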

Recall our static site generator from Lab 09. In it, we had the following loop to generate HTML files from Markdown sources.

generate_web() {
    local page
    for page in src/*.md; do
        if ! [ -f "$page" ]; then
            continue
        fi
        build_markdown_page "$page"
    done
}

Update the implementation to support nested subdirectories as well.

Solution.
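A sketch of one possible approach uses find with the safe -print0 loop shown earlier; build_markdown_page is the function from Lab 09, and the stub below only stands in for it:

```shell
# throw-away demo tree
cd "$(mktemp -d)"
mkdir -p src/chapter
touch src/index.md src/chapter/intro.md

# stub standing in for the real function from Lab 09
build_markdown_page() {
    printf 'would build: %s\n' "$1"
}

generate_web() {
    # -print0 together with read -d '' makes the loop safe for any file name
    find src -type f -name '*.md' -print0 | while IFS= read -r -d '' page; do
        build_markdown_page "$page"
    done
}

generate_web
```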

Learning outcomes and after class checklist

This section offers a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).

Conceptual knowledge

Conceptual knowledge is about understanding the meaning of given terms and putting them into a broader context. Therefore, you should be able to …

  • explain basic access rights in Unix operating systems

  • explain what the individual access rights r, w and x mean for normal files and what they mean for directories

  • explain what a process signal is

  • explain the difference between normal files, directories, symbolic links, device files and system-state files (e.g. from the /proc filesystem)

  • list fundamental top-level directories on a typical Linux installation and describe their function

  • explain in general terms how the directory tree is formed by mounting individual file systems

  • explain why Linux maintains the separation of archiving and compression programs (e.g. tar and gzip)

  • explain what a set-uid bit is

  • explain what a process is and how it differs from an executable file

  • explain the difference between ownership of a file and ownership of a running process

  • optional: provide a high-level overview of POSIX ACLs

Practical skills

Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …

  • view and change basic access permissions of a file

  • use ps to view the list of existing processes (including the -e, -f and --forest switches)

  • use pgrep to find specific processes

  • send a signal to a running process

  • use htop to interactively monitor existing processes

  • mount disks using the mount command (both physical disks as well as images)

  • get summary information about disk usage with the df command

  • use either tar or atool to work with standard Linux archives

  • use find with basic predicates (-name, -type) and actions (-exec, -delete)

This page changelog

  • 2025-04-09: One more example with find.

  • 2025-04-09: Fix learning outcomes.