This lab spans several smaller topics. They are only loosely connected with each other, so you can read the main sections in virtually any order you prefer.
This is probably the most theory-heavy lab: there are not that many things to try out in practice, but a lot of background information that may put other knowledge into better perspective.
This lab also contains a mini homework for two points.
Preflight checklist
- You remember how shell wildcards work.
- You know that C strings are terminated with zero byte.
Unix-style access rights
So far we have (almost) ignored the fact that there are different user accounts on any Linux machine. And that users cannot access all files on the machine. In this section we will explain the basics of Unix-style access rights and how to interpret them.
After all, you can log in to a shared machine and you should be able to understand what you can access and what you cannot.
Recall what we said about /etc/passwd
earlier – it contains the list of user accounts
on that particular machine
(technically, it is not the only source of user records, but it is a good
enough approximation for now).
Every running application, i.e., a process, is owned by one of the users
from /etc/passwd
(again, we simplify things a little bit).
We also say that the process is running under a specific user.
And every file in the filesystem (including both real files such as ~/.bashrc
and virtual ones such as /dev/sda
or /proc/uptime
) has some owner.
When a process tries to read or modify a file, the operating system decides whether the operation is permitted. This decision is based on the owner of the file, the owner of the process, and permissions defined for the file. If the operation is forbidden, the input/output function in your program raises an exception (e.g., in Python), or returns an error code (in C).
Since a model based solely on owners would be too inflexible, there are also
groups of users (defined in /etc/group
). Every user is a member of one or more groups; one of them is called the primary group. These are associated with every process of the user. Files have both an owning user and an owning group.
Files are assigned three sets of permissions: one for the owner of the file, one for users in the owning group, and one for all other users. The exact algorithm for deciding which set will be used is this:
- If the user running the process is the same as the owner of the file, owner access rights are used (sometimes also referred to as user access rights).
- If the user running the process is in a group that was set on the file, group access rights are used.
- Otherwise, the system checks against other access rights.
Every set of permissions contains three rights: read (r), write (w), and execute (x):
- Read and write operations on a file are obvious.
- The execute right is checked when the user tries to run the file as a program (recall that without chmod +x, the error message was in the sense of Permission denied: this is the reason).
- Note that a script which is readable, but not executable, can still be run by launching the appropriate interpreter manually (see the example after this list).
- Note that when a program is run, the new process will inherit the owner and groups of its parent (e.g., of the shell that launched it). Ownership of the executable file itself is not relevant once the program has started.
For example, /usr/bin/mc is owned by root, yet it can be launched by any user and the running process is then owned by the user who launched it.
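As promised above: a readable script without the executable bit can still be run through its interpreter. A minimal demonstration (the filename primes.sh is illustrative):

./primes.sh        # fails with Permission denied (no executable bit)
bash primes.sh     # works: bash only needs read access to the file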
The same permissions also apply to directories. Their meaning is a bit different, though:
- Read right allows the user to list directory entries (files, symlinks, sub-directories, etc.).
- Write right allows the user to create, remove, and rename entries inside that directory. Note that removing write permission from a file inside a writable directory is pointless as it does not prevent the user from overwriting the file completely with a new one.
- Execute right on a directory allows the user to open the entries.
(If a directory has x, but not r, you can use the files inside it if you know their names; however, you cannot list them. On the contrary, if a directory has r, but not x, you can only view the entries, but not use them.)
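You can try this out yourself. A minimal experiment (the directory and file names are arbitrary):

mkdir quux
echo hello >quux/file.txt
chmod u=x quux           # execute only: entries are usable, but not listable
cat quux/file.txt        # works, because we know the name
ls quux                  # fails with Permission denied
chmod u=r quux           # read only: listing works, entries are not usable
ls quux                  # shows file.txt (possibly with warnings)
cat quux/file.txt        # fails with Permission denied
chmod u=rwx quux         # restore full access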
Permissions of a file or directory can be changed only by its owner, regardless of the current permissions. That is, the owner can deny access to themselves by removing all access rights, but can always restore them later.
root account
Apart from accounts for normal users, there is always an account for a so-called superuser – more often simply called root – that has administrator privileges and is permitted to do anything with any file in the system. The permissions checks described above simply do not apply to root-owned processes.
Viewing and changing the permissions
Looking at the shortcuts of rwx for individual permissions, you may find them familiar:
drwxr-xr-x 1 intro intro 48 Feb 23 16:00 02/
drwxr-xr-x 1 intro intro 60 Feb 23 16:00 03/
-rw-r--r-- 1 intro intro 51 Feb 23 16:00 README.md
The very first column actually contains the type of the entry (d for directory, - for a plain file, etc.) and three triplets of permissions. The first triplet refers to the owner of the file, the middle one to the group, and the last one to the rest of the world (other).
The third and fourth columns contain the owner and the group of the file.
Typically, your personal files in your home directory will have you as the owner together with a group with the same name. That is a default configuration that prevents other users from seeing your files.
Do check that it is true for all directories under /home
on the shared
machine.
But also note that most of the files in your home directory are actually world-readable (i.e., anyone can read them).
That is actually quite fine because if you check permissions for your ~
,
you will see that it is typically drwx------
.
Only the owner can modify and cd
to it.
Since no one can actually change to your directory, no one will be able to read your files (technically, reading a file involves traversing the whole path and checking access rights on every directory along it).
To change the permissions, you can use the chmod program.
It has the general format of
chmod WHO[+=-]PERMISSION file1 file2 ...
WHO can be empty (for all three of user, group and others) or a specification of u, g or o. And PERMISSION can be r, w or x.
It is possible to combine these, e.g. g-r,u=rw removes the read permission for the group and sets read-write for the owner (i.e. the file will not be executable by the owner, regardless of the previous state of the executable bit).
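A few illustrative invocations (the filenames are made up):

chmod +x script.sh        # add the executable bit (subject to umask, typically for everyone)
chmod u+w,g-w notes.txt   # add write for the owner, remove it for the group
chmod o-rwx secret.txt    # others lose all access
chmod u=rw,go=r data.csv  # set the permissions exactly, regardless of the previous state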
Access rights checks
Let’s return a little bit to the access rights.
Change the permissions of some of your scripts to --x.
Try to execute them.
What happens?
Remove the write bit from a file and write to it using stdout redirection. What happens?
Access rights quiz
Assuming the following output of ls -l
(script.sh
is really a shell script)
and assuming user bob
is in group wonderland
while user lewis
is not.
-rwxr-xr-- 1 alice wonderland 1234 Feb 20 11:11 script.sh
Select all true statements.
Processes (and signals)
Files are the passive elements of the system. The active parts are running programs that actually modify data. Let us have a look at what is actually running on our machine.
When you start a program (i.e., an executable file), it becomes a process. The executable file and a running process share the code – it is the same in both. However, the process also contains the stack (e.g., for local variables), heap, current directory, list of opened files, etc. – all this is usually considered the context of the process. Often, the phrases running program and process are used interchangeably.
To view the list of running processes on our machine, we can use htop, which shows their basic properties. As in Midnight Commander, function keys perform the most important actions and the help is visible in the bottom bar.
You can also configure htop
to display information about your system like amount of free memory
or CPU usage.
For non-interactive use we can execute ps -e
(or ps -axufw
for a more detailed list).
For illustration, here is an example of ps output (with the --forest option used to also depict the parent/child relations).
However, run ps -ef --forest
on the shared machine to also view running
processes of your colleagues.
UID PID PPID C STIME TTY TIME CMD
root 2 0 0 Feb22 ? 00:00:00 [kthreadd]
root 3 2 0 Feb22 ? 00:00:00 \_ [rcu_gp]
root 4 2 0 Feb22 ? 00:00:00 \_ [rcu_par_gp]
root 6 2 0 Feb22 ? 00:00:00 \_ [kworker/0:0H-events_highpri]
root 8 2 0 Feb22 ? 00:00:00 \_ [mm_percpu_wq]
root 10 2 0 Feb22 ? 00:00:00 \_ [rcu_tasks_kthre]
root 11 2 0 Feb22 ? 00:00:00 \_ [rcu_tasks_rude_]
root 1 0 0 Feb22 ? 00:00:09 /sbin/init
root 275 1 0 Feb22 ? 00:00:16 /usr/lib/systemd/systemd-journald
root 289 1 0 Feb22 ? 00:00:02 /usr/lib/systemd/systemd-udevd
root 558 1 0 Feb22 ? 00:00:00 /usr/bin/xdm -nodaemon -config /etc/X11/...
root 561 558 10 Feb22 tty2 22:42:35 \_ /usr/lib/Xorg :0 -nolisten tcp -auth /var/lib/xdm/...
root 597 558 0 Feb22 ? 00:00:00 \_ -:0
intro 621 597 0 Feb22 ? 00:00:40 \_ xfce4-session
intro 830 621 0 Feb22 ? 00:05:54 \_ xfce4-panel --display :0.0 --sm-client-id ...
intro 1870 830 4 Feb22 ? 09:32:37 \_ /usr/lib/firefox/firefox
intro 1966 1870 0 Feb22 ? 00:00:01 | \_ /usr/lib/firefox/firefox -contentproc ...
intro 4432 830 0 Feb22 ? 01:14:50 \_ xfce4-terminal
intro 4458 4432 0 Feb22 pts/0 00:00:11 \_ bash
intro 648552 4458 0 09:54 pts/0 00:00:00 | \_ ps -ef --forest
intro 15655 4432 0 Feb22 pts/4 00:00:00 \_ bash
intro 639421 549293 0 Mar02 pts/8 00:02:00 \_ man ps
...
First of all, each process has a process ID, often just PID. The PID is a number assigned by the kernel and used by many utilities for process management.
PID 1 is used by the first process in the system, which is always running.
(PID 0 is reserved as a special value – see fork(2)
if you are interested in details.)
Other processes are assigned their PIDs incrementally (more or less) and PIDs are eventually
reused.
Note that all this information is actually available in /proc/PID/ and that is where ps reads its information from.
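You can peek into this directory yourself. For example, for your current shell (the special variable $$ expands to the PID of the shell):

ls /proc/$$/                          # entries describing the shell process
tr '\0' ' ' </proc/$$/cmdline; echo   # its command line (zero-byte separated)
ls -l /proc/$$/cwd                    # a symlink to its current working directory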
Execute ps -ef --forest again to view all processes on your machine.
Because of your graphical interface, the list will probably be quite long.
Practically, a small server offering web pages, a calendar, and SSH access can have about 80 processes; for a desktop running Xfce with a browser and a few other applications, the number rises to almost 300 (this depends a lot on the configuration, but it is a ballpark estimate). About 50–60 of these are actually internal kernel threads. In other words, a web/calendar server needs about 20 “real” processes, a desktop about 200 of them :-).
Quick check on processes
Signals
Linux systems use the concept of signals to communicate asynchronously with a running program (process). The word asynchronously means that the signal can be sent (and delivered) to the process regardless of its state. Compare this with communication via standard input (for example), where the program controls when it will read from it (by calling appropriate I/O read function).
However, signals do not provide a very rich communication channel: the only information available (apart from the fact that the signal was sent) is the signal number. Most signal numbers are defined by the kernel, which also handles some signals by itself. Otherwise, signals can be received by the application and acted upon. If the application does not handle the signal, it is processed in the default way. For some (most) signals, the default is terminating the application; other signals are ignored by default.
This is actually expressed in the fact that the utility used to send signals
is called kill
(see below).
By default, the kill
utility sends signal 15 (also called TERM
or SIGTERM
) that
instructs the application to terminate.
An application may decide to catch this signal, flush its data to the disk etc.,
and then terminate.
But it can do virtually anything and it may even ignore the signal completely.
Apart from TERM, we can instruct kill to send the KILL signal (number 9), which is handled by the kernel itself. It immediately and forcefully terminates the application. The application may try to register to receive notification for the KILL signal, but the kernel will never deliver it.
Many other signals are sent to the process in reaction to a specific event. For example, the signal PIPE is sent when a process tries to write to a pipe whose reading end was already closed – the “Broken pipe” message you already saw is printed by the shell if the command was terminated by this signal.
Terminating a program by pressing Ctrl-C
in the terminal actually sends the INT
(interrupt, number 2) signal to it.
If you are curious about the other signals, see signal(7)
.
Use of kill, pkill and pgrep
The pgrep
command can be used to find processes matching a given name.
Open two extra terminals and run sleep 600
in one and sleep 800
in the
second one.
The sleep program simply waits the given number of seconds before terminating. In a third terminal, run the following commands to understand how the search for processes works.
pgrep sleep
pgrep 600
pgrep -f 600
What have you learnt?
When we know the PID, we can use the kill
utility to actually terminate the
program.
Try running kill PID with the PID of one of the sleeps and watch what happens in the terminal with sleep.
You should see something like this:
Terminated (SIGTERM).
This message informs us that the command was terminated by a signal.
Similarly, you can use pkill
to kill processes by name
(but be careful as with great power comes great responsibility).
Consult the manual pages for more details.
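The whole workflow can look like this (the PID 12345 is obviously illustrative):

pgrep -f 'sleep 600'    # find the PID
kill 12345              # send TERM (the default)
kill -9 12345           # send KILL if the process refuses to terminate
pkill -f 'sleep 600'    # or skip the PID lookup altogether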
Reacting to signals in shell
Reacting to signals in the shell is done through the trap command. Note that a typical action for a signal handler in a shell script is the clean-up of temporary files.
#!/bin/sh
set -ueo pipefail
on_interrupt() {
echo "Interrupted, terminating ..." >&2
exit 17
}
on_exit() {
echo "Cleaning up..." >&2
rm -f "$my_temp"
}
my_temp="$( mktemp )"
trap on_interrupt INT TERM
trap on_exit EXIT
echo "Running with PID $$"
counter=1
while [ "$counter" -lt 10 ]; do
date "+%Y-%m-%d %H:%M:%S | Waiting for Ctrl-C (loop $counter) ..."
echo "$counter" >"$MY_TEMP"
sleep 1
counter=$(( counter + 1 ))
done
The command trap
receives as the first argument the command to execute on
the signal. Other arguments list the signals to react to.
Note that a special signal EXIT
means normal script termination.
Hence, we do not need to call on_exit
after the loop terminates.
We use exit 17
to report termination through the Ctrl-C handler (the value
is arbitrary by itself).
Feel free to check the return value with echo $?
after the command
terminates.
The special variable $?
contains the exit code of the last command.
Using - (dash) instead of the handler resets the respective handler to its default.
Note the use of $$ which expands to the PID of the current shell.
Run the above script, note its PID and run the following in a new terminal.
kill THE_PID_PRINTED_BY_THE_ABOVE_SCRIPT
The script was terminated and the clean-up routine was called.
Compare with the situation when you comment out the trap command.
Run the script again but pass -9
to kill
to specify that you want to
send signal nine (i.e., KILL
).
What happened?
While signals are a rudimentary mechanism that passes binary events with no additional data, they are the primary way of process control in Linux.
If you need a richer communication channel, you can use D-Bus instead.
Deficiencies in signal design and implementation
Signals are a rudimentary mechanism for interprocess communication on Unix systems. Unfortunately, their design has several flaws that complicate their safe usage.
We will not dive into details but you should bear in mind that signal handling can be tricky in situations where you cannot afford to lose any signal or when signals can come quickly one after another. And there is a whole can of worms when using signals in multithreaded programs.
On the other hand, for simple shell scripts where we want to clean up on forceful termination, the pattern we have shown above is sufficient. It guards our script when the user hits Ctrl-C because they realized that it is working on the wrong data or something similar.
But note that it contains a bug for the case when the user hits Ctrl-C
very
early during script execution.
my_temp="$( mktemp )"
# User hits Ctrl-C here
trap on_interrupt INT TERM
trap on_exit EXIT
The temporary file was already created but the handler was not yet registered
and thus the file will not be removed.
But changing the order complicates the signal handler, as we need to test that $my_temp was already initialized – see the sketch below.
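One possible fix is sketched here (this is our sketch, not the only correct solution): register the handlers first and make the clean-up defensive.

my_temp=""
on_exit() {
    # $my_temp may still be empty if we were interrupted very early
    if [ -n "$my_temp" ]; then
        rm -f "$my_temp"
    fi
}
trap on_interrupt INT TERM    # on_interrupt defined as before
trap on_exit EXIT
my_temp="$( mktemp )"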
Quick check about signals
Files and storage management
Before proceeding, recall that files reside on file systems, which are structures on the actual block devices (typically, disks).
Working with file systems and block devices is necessary when installing a new system, rescuing from a broken device, or simply checking available free space.
You are already familiar with normal files and directories. But there are other types of files that you can find on a Linux system.
Symbolic links
Linux allows you to create a symbolic link to another file. This special file does not contain any content by itself and merely points to another file.
An interesting feature of a symbolic link is that it is transparent to
the standard file I/O API. If you call Pythonic open
on a symbolic link, it
will transparently open the file the symbolic link points to. That is the
intended behavior.
The purpose of symbolic links is to allow different perspectives on the same files without need for any copying and synchronization.
For example, imagine a movie player that is able to play only files in the directory Videos. However, you actually keep the movies elsewhere, because they are on a shared hard drive. With the use of a symbolic link, you can make Videos point to the actual storage and make the player happy.
(For the record, we do not know about any movie player with such behaviour,
but there are plenty of other programs where such magic can make them work
in a complex environment they were not originally designed for.)
Note that a symbolic link is something different from what you may know as a desktop shortcut or similar. Such shortcuts are actually normal files where you can specify which icon to use and which also contain information about the actual file. Symbolic links operate on a lower level.
To create a symbolic link, run ln -s
. For example, running the following
will create a symlink to /etc/passwd
named users.txt
. Note that running
cat users.txt
will open users.txt
even though the kernel will supply the
contents of /etc/passwd
.
ln -s /etc/passwd users.txt
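You can check where a symlink points with ls -l (the entry type is l and the target is shown after the arrow) or with readlink:

ls -l users.txt      # ... users.txt -> /etc/passwd
readlink users.txt   # prints /etc/passwd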
Special files
There are also other special files that represent physical devices or files that serve as a spy-hole into the state of the system.
The reason is that it is much simpler for the developer that way. You do not need special utilities to work with a disk, you do not need a special program to read the amount of free memory. You simply read the contents of a well-known file and you have the data.
It is also much easier to test such programs because you can easily give them mock files by changing the file paths – a change that is unlikely to introduce a serious bug into the program.
Usually, Linux offers the files that reveal the state of the system in a textual format.
For example, the file /proc/meminfo
can look like this:
MemTotal: 7899128 kB
MemFree: 643052 kB
MemAvailable: 1441284 kB
Buffers: 140256 kB
Cached: 1868300 kB
SwapCached: 0 kB
Active: 509472 kB
Inactive: 5342572 kB
Active(anon): 5136 kB
Inactive(anon): 5015996 kB
Active(file): 504336 kB
Inactive(file): 326576 kB
...
This file is nowhere on the disk but when you open this path, Linux creates the contents on the fly.
Notice how the information is structured: it is a textual file, so reading it requires no special tools and the content is easily understood by a human. On the other hand, the structure is quite rigid: each line is a single record, keys and values are separated by a colon. Easy for machine parsing as well.
File system hierarchy
We will now briefly list some of the key files you can find on virtually any Linux machine.
Do not be afraid to actually display the contents of the files we mention here. hexdump -C is really a great tool.
/boot
contains the bootloader for loading the operating system.
You would rarely touch this directory once the system is installed.
/dev
is a very special directory where hardware devices have their
file counterparts.
You will probably see there a file sda or nvme0 that represents your hard drive (or SSD). Unless you are running as the superuser, you will not have access to these files, but if you hexdumped them, you would see the bytes as they are on the actual hard drive.
It is important to note that these files are not physical files on your disk (after all, it would mean having a disk inside a disk). When you read from them, the kernel recognizes that and returns the right data.
This directory also contains several special but very useful files for software development.
/dev/urandom returns random bytes indefinitely. It is probably used internally by your favorite programming language to implement its random() function.
Try to run hexdump on this file (and recall that Ctrl-C will terminate the program once you are tired of the randomness).
/dev/null
is your local black hole: it discards everything written to it.
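Both are handy in everyday scripting. For example:

head -c 16 /dev/urandom | hexdump -C   # take just 16 random bytes
ls /nonexistent 2>/dev/null            # discard the error message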
/etc/
contains system-wide configuration.
Typically, most programs in UNIX systems are configured via text files.
The reasoning is that an administrator needs to learn only one tool – a good
text editor – for system management.
The advantage is that most configuration files support comments, so even the configuration itself can be annotated. For an example of such a configuration file, you can have a look at /etc/systemd/system.conf to get a feeling for it.
Perhaps the most important file is /etc/passwd
that contains a list of user
accounts.
Note that it is a plain text file where each row represents one record and
individual attributes are simply separated by a colon :
.
Very simple to read, very simple to edit, and very simple to understand.
In other words, the KISS principle in practice.
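Thanks to the rigid structure, standard tools handle the file easily. For example:

cut -d: -f1 /etc/passwd      # print the login names only
grep "^$USER:" /etc/passwd   # show the record describing your account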
/home
contains home directories for normal user accounts (i.e., accounts
for real – human – users).
/lib
and /usr
contain dynamic libraries, applications, and system-wide
data files.
/var is for variable data. If you installed a database or a web server on your machine, its files would be stored here.
/tmp
is a generic location for temporary files.
This directory is automatically cleaned at each reboot, so do not use it for permanent
storage. Many systems also automatically remove files which were not modified in the
last few days.
/proc
is a virtual file system that allows controlling and reading of
kernel (operating system) settings.
For example, the file /proc/meminfo
contains quite detailed information about
RAM usage.
Again, /proc/*
are not normal files, but virtual ones.
Until you read them, their contents do not exist physically anywhere.
Mounts and mount-points
Each file system (that we want to access) is accessible as a directory somewhere (compare this with drive letters in other systems, for example).
When we can access /dev/sda3 under /home, we say that /dev/sda3 is mounted under /home; /home is then called the mount point and /dev/sda3 is often called a volume.
Most devices are mounted automatically during boot. This includes / (root) where the system is, as well as /home where your data reside.
File systems under /dev
or /proc
are actually special file systems that are
mounted to these locations.
Hence, the file /proc/uptime
does not physically exist (i.e., there is no
disk block with its content anywhere on your hard drive) at all.
The file systems that are mounted during boot are listed in /etc/fstab. You will rarely need to change this file on your laptop, as it was created for you during installation. Note that it contains the volume identification (such as the path to the partition), the mount point, and some extra options.
When you plug in a removable USB drive, your desktop environment will typically mount it automatically. Mounting it manually is also possible using the mount utility. However, mount has to be run under root to work (this thread explains several reasons why mounting a volume could be a security risk). Therefore, you need to play with this on your own installation where you can become root. It will not work on any of the shared machines.
Mounting disks manually
sudo mkdir /mnt/flash
sudo mount /dev/sdb1 /mnt/flash
Your data should be visible under /mnt/flash.
To unmount, run the following command:
sudo umount /mnt/flash
Note that running mount
without any arguments prints a list of currently active mounts.
For this, root privileges are not required.
Disk space usage utilities
The basic utility for checking available disk space is df (disk free). An example of its output:
Filesystem 1K-blocks Used Available Use% Mounted on
devtmpfs 8174828 0 8174828 0% /dev
tmpfs 8193016 0 8193016 0% /dev/shm
tmpfs 3277208 1060 3276148 1% /run
/dev/sda3 494006272 7202800 484986880 2% /
tmpfs 8193020 4 8193016 1% /tmp
/dev/sda1 1038336 243188 795148 24% /boot
In the default execution (above), it uses one-kilobyte blocks. For a more readable output, run it with -BM or -BG (megabytes and gigabytes) or with -h to let it select the most suitable unit.
We will return to the topic of storage management in the last lab too.
Check you understand it all
File archiving and compression
A somewhat related topic to the above is how Linux handles file archival and compression.
Archiving on Linux systems typically refers to merging multiple files into one (for easier transfer) and compression of this file (to save space). Sometimes, only the first step (i.e., merging) is considered archiving.
While these two actions are usually performed together, Linux keeps the distinction as it allows combination of the right tools and formats for each part of the job. Note that on other systems where the ZIP file is the preferred format, these actions are blended into one.
The most widely used program for archiving is tar
.
Originally, its primary purpose was archiving on tapes, hence the name: tape archiver.
It is always run with an option specifying the mode of operation:
- -c to create a new archive from existing files,
- -x to extract files from the archive,
- -t to print the table of files inside the archive.
The name of the archive is given via the -f
option; if no name is specified,
the archive is read from standard input or written to standard output.
As usual, the -v option increases verbosity. For example, tar -cv prints the names of files added to the archive, tar -cvv prints also file attributes (like ls -l).
(Everything is printed to stderr, so that stdout can be still used for the archive.)
Plain tar -t
prints only file names, tar -tv
prints also file attributes.
An uncompressed archive can be created this way:
tar -cf archive.tar dir_to_archive/
A compressed archive can be created by piping the output of tar
to gzip
:
tar -c dir_to_archive/ | gzip >archive.tar.gz
As this is very frequent, tar
supports a -z
switch, which automatically calls
gzip
, so that you can write:
tar -czf archive.tar.gz dir_to_archive/
tar has further switches for other (de)compression programs: bzip2, xz, etc.
Most importantly, the -a
switch chooses the (de)compression program according
to the name of the archive file.
If you want to compress a single file, plain gzip
without tar
is often used.
Some tools or APIs can even process gzip-compressed files transparently.
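A minimal round-trip with plain gzip can look like this (note that gzip replaces the original file with the compressed one; the filename is illustrative):

gzip big.log             # creates big.log.gz and removes big.log
zcat big.log.gz | head   # peek inside without decompressing on disk
gzip -d big.log.gz       # restore big.log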
To unpack an archive, you can again pipe gzip -d
(decompress) to tar
,
or use -z
as follows:
tar -xzf archive.tar.gz
We recommend installing atool as a generic wrapper around tar, gzip, unzip and plenty of other utilities to simplify working with archives.
For example:
apack archive.tar.gz dir_to_archive/
aunpack archive.tar.gz
Note that atool
will not overwrite existing files by default
(which is another very good reason for using it).
To view the list of files inside an archive, you can execute als
.
find
While ls(1) and wildcard expansion are powerful, sometimes we need to select files using more sophisticated criteria. This is where the find(1) program comes in useful.
Without any arguments, it lists all files in the current directory, including files in nested directories. With the -name parameter you can limit the search to files matching a given wildcard pattern. The following command finds all alpha.txt files in the current directory and in any subdirectory (regardless of depth).
find -name alpha.txt
Why would the following command for finding all *.txt files not work?
find -name *.txt
find
has many options – we will not duplicate its manpage here but mention
those that are worth remembering.
-delete
immediately deletes the found files.
Very useful and very dangerous.
-exec
runs a given program on every found file.
You have to use {}
to specify the found filename and terminate the command
with ;
(since ;
terminates commands in shell too, you will need to escape it).
find -name '*.md' -exec wc -l {} \;
Note that for each found file, a new invocation of wc happens. This can be altered by changing the command terminator (\;) to +. See the difference between the following two commands:
find -name '*.md' -exec echo {} \;
find -name '*.md' -exec echo {} +
Caveats
By default, find prints one filename per line. However, a filename can even contain the newline character (!) and thus the following idiom is not 100% safe.
find -options-for-find | while read filename; do
do_some_complicated_things_with "$filename"
done
If you want to be really safe, use -print0
and IFS= read -r -d $'\0' filename
as that would use the only safe delimiter – \0
.
Alternatively, you can pipe the output of find -print0 to xargs --null.
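For example, the following removes all *.tmp files safely, no matter what characters their names contain (the pattern is illustrative):

find . -name '*.tmp' -print0 | xargs --null rm -f --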
However, if you are working with your own files or the pattern is safe, the above loop is fine (just do not forget that directories are files too and their names can also contain \n).
Shell also allows you to export a function and call back to it from
inside xargs
.
The invocation pattern looks awful but it is a safe approach if you want
to execute a complex operation on top of found files.
my_callback_function() {
echo ""
echo "\$0 = $0"
echo "\$@ =" "$@"
}
export -f my_callback_function
find . -print0 | xargs -0 -n 1 bash -c 'my_callback_function "$@"' arg_zero arg_one
Recall that you can define functions directly in shell and the above can be actually created interactively without storing it as a script.
Tasks to check your understanding
We expect you will solve the following tasks before attending the labs so that we can discuss your solutions during the lab.
Learning outcomes and after class checklist
This section offers a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …
- explain basic access rights in Unix operating systems
- explain what the individual access rights r, w and x mean for normal files and what for directories
- explain what a process signal is
- explain the difference between normal files, directories, symbolic links, device files and system-state files (e.g. from the /proc filesystem)
- list fundamental top-level directories on a typical Linux installation and describe their function
- explain in general terms how the directory tree is formed by mounting individual (file) subsystems
- explain why Linux maintains the separation of archiving and compression programs (e.g. tar and gzip)
- explain what a set-uid bit is
- explain what a process is and how it differs from an executable file
- explain the difference between ownership of a file and of a running process
- optional: provide a high-level overview of POSIX ACLs
Practical skills
Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …
- view and change basic access permissions of a file
- use ps to view the list of existing processes (including the -e, -f and --forest switches)
- use pgrep to find specific processes
- send a signal to a running process
- use htop to interactively monitor existing processes
- mount disks using the mount command (both physical disks as well as images)
- get summary information about disk usage with the df command
- use either tar or atool to work with standard Linux archives
- use find with basic predicates (-name, -type) and actions (-exec, -delete)
This page changelog
- 2025-04-09: One more example with find.
- 2025-04-09: Fix learning outcomes.