Lab #13 | NSWI177 Labs | D3S

Preflight checklist
Containers
Why we have all the systemctl, dnf, podman, pip, …
Printing and scanning in Linux
Tasks to check your understanding
Learning outcomes

The main topic for this lab is the use of containers, a kind of lightweight virtual machine that is very useful for testing and development. We will build on this topic in the last lab, where we will set up continuous integration on top of it.

The extra topic about printing and scanning can be safely skipped if you do not have enough time for it.

Preflight checklist

You know that software is installed via package managers on Linux.

Containers

If you are familiar with containers then note that in the introductory part of the text we intentionally ignore the difference between an image and a container. We believe it is a bit easier for the first steps.

Containers are another approach for isolation. We have already seen project sandboxing and many of you have tried running a virtualized Linux installation.

Containers are somewhere in between. They offer an isolated environment that generally behaves like a fully virtualized machine. From the implementation point of view, they are closer to a virtual environment, as processes inside a container are visible from the host system. We can imagine them as if we gave the container one directory (containing all the usual subdirectories such as /dev, /proc or /home) to run in without an option to escape.

Because of the above, containers can run only the applications written for the same operating system (unlike a full-fledged virtual machine).

Because of their separation from the host system, containers are extremely useful in many scenarios. Note that using a fully virtualized machine (e.g., VirtualBox or QEMU) is an option too, but containers are light-weight and thus have a smaller overhead (e.g., faster start-up time).

The separation from the host system is very high: without extra configuration, the container cannot access the host’s file systems and cannot listen on the host’s ports for incoming connections. But it can initiate outgoing connections (e.g., to fetch packages that are to be installed). A container can also be limited in the amount of RAM it can use. By default, container processes are scheduled as normal processes (e.g., they have the same priority) but it is also possible to limit their CPU usage (e.g., throttle them as low-priority jobs).

A typical example is the need to run an isolated server that you need for development. You can imagine a database server or a web server here. You can certainly install such a server system-wide (recall lab 09) but that provides neither the isolation nor the ease of removal. Recall how it works with virtual environments: removing a single directory cleans up the whole environment.

Similarly, removing a container is a simple and fast operation and you can start with a fresh one in a matter of seconds.

Using a container also has the advantage that you can specify exactly what the container should look like: what processes it spawns, on which ports it listens etc. Such a specification can be easily codified (like with requirements.txt) and thus easily reproduced on a different machine.

Container images are also often used when you need to ship a complex application which requires several services to execute correctly. Instead of providing a detailed manual or a VirtualBox image, you provide a ready-to-be-run container. The user then launches the whole container and internally, the container takes care of the rest, exposing the final service. For example, the whole GitLab server can be downloaded and hosted as a container.

Docker and Podman

In this lab, we will show the basics of Linux containers based on Docker and Podman. Both implementations are virtually the same. Their main commands (docker and podman) support exactly the same arguments and have the same semantics in most cases.

The main difference is that Docker is a bit older (though still actively developed) and was intended for system-wide containers (e.g., when you wanted to run a self-hosted instance of GitLab). Podman is a bit younger and it uses newer features of the Linux kernel which allow it to execute containers without superuser privileges (that is actually still quite a new feature of Linux). Also, Podman integrates better with the rest of the system.

In this sense, Podman is the perfect choice for a developer. You need a database server? Use Podman to get the right container and start it. Your database is clean and ready to be used. Without a need for superuser – root – privileges (this is often called rootless mode).

On the other hand, if you run an older version of Linux or the container requires some Docker-specific features, Docker might be a better choice.

Terminology …

There are two main concepts related to this lab: an image and a container. They are somewhat similar to a class and an object (instance), or an executable and a running process.

The image is like a hard disk for the isolated environment. It contains all the necessary files, including executables as well as data files.

To run it, we create a container. The container starts with the same state as the image, but it contains the running processes that might be modifying its state. Unless explicitly stated otherwise, the changes done by the container are not propagated to the image: instead, the container starts with a copy of the image (files) and modifies the copy.

Processes inside the container are isolated from the outside (the host) and the container does not see processes of the host.

On the other hand, processes in the container are visible in the host system. Root directory of the container corresponds to a subdirectory of the host. User IDs in the container are translated to a range of user IDs of the host. The same applies to group IDs.

Docker/Podman containers usually run processes inside the container with the privileges of the container’s root user, which appears as a normal user (usually with a very high UID) in the host system.

Image stacking

New container images are often derived from existing ones. For example, there are base images with a bare system and from these, special-purpose images are derived. This simplifies configuration, because we can start with a working base system instead of building everything from scratch.

To save space, the derived images contain only differences against their base. The differences are merged (overlaid) with the base image when the container is created.

This improves performance and saves not only disk space, but also memory (when you run multiple containers with the same base, only a single instance of base’s files is cached). Also, when you are downloading a new image and you already have the base, only the differences are downloaded.

This mechanism is similar to what Git does. It behaves as if every commit has its own copy of the project’s directory tree. Internally, it records differences between directory trees and their files.

Distributions and Alpine

The images can be built on top of different distributions. For this reason, containers are an easy way to test your program in multiple distributions without having to set up a triple (or higher) boot or to manage multiple virtual machines.

You will notice that many containers are built on top of a distribution called Alpine Linux. It is a minimalistic distribution designed for size and simplicity: it takes only about 6 MB and does not use any complex configuration.

Alpine uses Apk (Alpine package manager) for its own packages. For example, the following command installs curl (which is not installed by default):

apk add curl

Setting up Docker/Podman

Install Docker or Podman.

To determine which one to use, run the following command.

grep cgroup /proc/filesystems

If you can see only the following line, then your kernel has not loaded cgroups v2, which are required for Podman.

nodev   cgroup

However, if you can see the following, you have cgroups v2 enabled and you should use Podman.

nodev   cgroup
nodev   cgroup2

Then proceed with the installation. Note that new versions of Fedora already switched to cgroup v2 and Podman is the only option to use. Hence, install with sudo dnf install podman.
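The check above can also be wrapped into a small helper. The following is only a minimal sketch: the function name suggest_container_tool is ours, and it encodes nothing more than the rule of thumb from this section (cgroup2 listed means Podman, otherwise Docker).

```shell
# Suggest Podman or Docker based on the filesystems known to the kernel.
# The optional argument (default: /proc/filesystems) exists only so that
# the function can be tried on a saved copy of the file.
suggest_container_tool() {
    fs_list=${1:-/proc/filesystems}
    # cgroup2 is listed only when the kernel offers cgroups v2.
    if grep -q cgroup2 "$fs_list"; then
        echo podman
    else
        echo docker
    fi
}
```

Calling suggest_container_tool without arguments inspects the running kernel.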

All the following examples in this lab will use podman. If your distribution does not support Podman, replace podman with sudo docker.

Podman: setup of `/etc/subuid` and `/etc/subgid`

As we explained above, Podman needs a range of free user and group IDs on the host to map the container’s UIDs and GIDs to.

The superuser can assign blocks of UIDs/GIDs to regular users, which can be used for this purpose. These are called sub-UIDs/sub-GIDs and their assignment is recorded in /etc/subuid and /etc/subgid.

First of all, please check if your /etc/subuid already contains something like intro:100000:65536. If it does, you already have everything set up and you can skip the rest of this section.

Otherwise, make sure that the files exist and create new assignments in them using usermod:

sudo touch /etc/subuid /etc/subgid
sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 YOUR_LOGIN
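To see what is currently assigned, the files can be queried directly, since each entry has the form LOGIN:START:COUNT. The helper below is a minimal sketch; the name subid_ranges and the optional file argument are our own additions for illustration.

```shell
# Print the sub-ID ranges assigned to a user, one "START COUNT" pair per line.
# Entries in /etc/subuid and /etc/subgid have the form LOGIN:START:COUNT.
# The optional second argument selects the file (default: /etc/subuid).
subid_ranges() {
    login=$1
    file=${2:-/etc/subuid}
    awk -F: -v login="$login" '$1 == login { print $2, $3 }' "$file"
}
```

For example, subid_ranges "$(id -un)" prints nothing when your login has no entry yet, which means the usermod step is still needed. Pass /etc/subgid as the second argument to check group IDs.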

A system (package) upgrade may break Podman for various reasons. If this happens to you, try running podman system migrate, which is able to fix most of the errors related to the transition to a newer version.

Docker: starting the service

For Docker, you need to ensure that docker is up and running. Typically, the following commands would be sufficient:

sudo package-manager-of-your-distribution install docker
sudo systemctl enable docker
sudo systemctl start docker

Basic health check

Execute podman info to get basic information about your system. You will see something like this:

host:
  arch: amd64
  ...
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    ...
  ...
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  ...
  os: linux
...
store:
  graphRoot: $HOME/.local/share/containers/storage
  ...
  runRoot: /run/user/1000/containers
  volumePath: $HOME/.local/share/containers/storage/volumes
version:
  APIVersion: 3.0.0
  ...

When debugging issues with Podman, always paste this information (unedited) into the Issue description (obviously, as a text inside ```, not as a screenshot!).
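The idMappings part of the output can be read as a translation table from IDs inside the container to IDs on the host. The sketch below reproduces the arithmetic for the sample uidmap shown above; the variable UID_MAP and the function name are illustrative, and the three-column layout (container ID, host ID, size) mirrors the one used by /proc/PID/uid_map.

```shell
# Sample map taken from the podman info output above:
# one "CONTAINER_ID HOST_ID SIZE" triple per line.
UID_MAP='0 1000 1
1 100000 65536'

# Print the host UID that a given container UID is mapped to.
# Prints nothing if the UID falls outside every mapped range.
container_to_host_uid() {
    target=$1
    echo "$UID_MAP" | while read -r cid hid size; do
        if [ "$target" -ge "$cid" ] && [ "$target" -lt $((cid + size)) ]; then
            echo $((hid + target - cid))
        fi
    done
}
```

With this map, the container's root (UID 0) is the host user 1000, while container UID 1 becomes host UID 100000 and container UID 100 becomes host UID 100099.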

To check that you can execute containers, try the following command:

podman run --rm docker.io/library/alpine:latest cat /etc/os-release

If you see something like the following, you have everything set up. Otherwise feel free to open an Issue on the Forum and we will try to help you (do not forget to state which distribution you are using).

Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob 4abcf2066143 done   |
Copying config 05455a0888 done   |
Writing manifest to image destination
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.19.1
PRETTY_NAME="Alpine Linux v3.19"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

The first half of output is related to the download of the image. Only the second half of the output corresponds to the output of the command. Feel free to run the above command one more time (since the image is already downloaded) to get the following:

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.19.1
PRETTY_NAME="Alpine Linux v3.19"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

Podman is partially available in IMPAKT labs and the installation (albeit with some limitations) should be good enough for our purposes.

But it is much more comfortable to use your own machine.

If you run podman on linux.ms.mff.cuni.cz, always remove unused images. While the system has enough space for experimenting, the images can easily fill up the whole disk. Use podman images and podman rmi IMAGE_ID to remove them once you do not need them (see below for further details).

Prepare for the labs

Before starting further experiments with Podman, ensure you have an up-to-date copy of the examples repository.

We will be using the subdirectory 13/.

If you are running the examples in IMPAKT/Rotunda, clone the repository into /tmp as -v will not work for files on an AFS volume.

Running the first container

The first execution will be a bit more complex to give you a taste of what is possible. We will explain the details in the following sections.

The following assumes you are inside the directory 13 in the examples repository. It will launch an Nginx web server.

podman run --rm --publish 8080:80/tcp -v ./web:/usr/share/nginx/html:ro docker.io/library/nginx:1.26

You will see similar output to the following.

Trying to pull docker.io/library/nginx:1.26...
Getting image source signatures
Copying blob 97573c8fbb99 done   |
Copying blob 13808c22b207 done   |
Copying blob 1201812f08b3 done   |
Copying blob 60f70a034a14 done   |
Copying blob 86824adef66e done   |
Copying blob 7f96f3100df0 done   |
Copying blob 4749f4ee1e34 done   |
Copying config e8d7a2a5de done   |
Writing manifest to image destination
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2024/04/24 12:23:15 [notice] 1#1: using the "epoll" event method
2024/04/24 12:23:15 [notice] 1#1: nginx/1.26.0
2024/04/24 12:23:15 [notice] 1#1: built by gcc 12.2.0 (Debian 12.2.0-14)
2024/04/24 12:23:15 [notice] 1#1: OS: Linux 6.6.7-arch1-1
2024/04/24 12:23:15 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 524288:524288
2024/04/24 12:23:15 [notice] 1#1: start worker processes
2024/04/24 12:23:15 [notice] 1#1: start worker process 24
2024/04/24 12:23:15 [notice] 1#1: start worker process 25
2024/04/24 12:23:15 [notice] 1#1: start worker process 26
2024/04/24 12:23:15 [notice] 1#1: start worker process 27
2024/04/24 12:23:15 [notice] 1#1: start worker process 28
2024/04/24 12:23:15 [notice] 1#1: start worker process 29
2024/04/24 12:23:15 [notice] 1#1: start worker process 30
2024/04/24 12:23:15 [notice] 1#1: start worker process 31

Open http://localhost:8080/ in your browser. You should see an NSWI177 Test Page.

If you see 403 Forbidden instead, append ,Z to the -v. Thus, the command would contain -v ./web:/usr/share/nginx/html:ro,Z. This is needed (and generally a good practice) when you are running on a machine with SELinux enabled in enforcing mode (default installation of Fedora but not on the USB disks from us).

When running on linux.ms.mff.cuni.cz you will need to specify a unique port number (only one application can listen on a given port).

Virtually any number is fine as long as it is greater than 1024 and does not collide with anything else.

If you know SSH port forwarding (we will talk about it in the next lab), you can set it up to view the results in your (graphical) browser, but for basic testing even curl is enough.
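Whether a port collides with something else can be checked before starting the container. The following is a minimal sketch that assumes the ss tool (from iproute2) is installed and that its listing keeps the local ADDRESS:PORT pair in the fourth column; the function name port_is_free is ours.

```shell
# Succeed if PORT is greater than 1024 and is not among the listening
# TCP sockets described on standard input.
# Standard input is expected to be the output of `ss -ltn`, where the
# fourth column holds the local ADDRESS:PORT pair.
port_is_free() {
    port=$1
    [ "$port" -gt 1024 ] && [ "$port" -le 65535 ] || return 1
    # awk exits with 0 when the port was found, so we negate its result.
    ! awk -v port="$port" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}
```

A possible use is ss -ltn | port_is_free 8081 && echo "8081 looks free", after which 8081 can be used in place of 8080 in the --publish argument.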

Terminate the execution by killing Podman with Ctrl-C.

Note that the running Nginx webserver was printing its log – i.e., the list of accessed pages – to stdout.

Now open the page web/index.html in your browser. Again, you should see an NSWI177 Test Page, but the URL will point to your local filesystem (i.e., file:///home/.../examples/13/web/index.html).

The above example illustrated three important features that are available with containers:

The web server in the container does not need any configuration or system-wide installation.
The container can listen on ports of the host system and forward network communication inside the container.
The container can access host’s files and use them.

All very good features for development, testing as well as distribution of your software.

Pulling and inspecting the images

The first thing that needs to be done when starting a container is to get its image. While Podman is able to pull the image as a part of the run subcommand, it is sometimes useful to fetch it as a separate step.

The command podman images prints a list of images that are present on your system. The output may look like this.

REPOSITORY                        TAG                  IMAGE ID      CREATED        SIZE
docker.io/library/alpine          latest               9ed4aefc74f6  2 weeks ago    7.34 MB
docker.io/library/nginx           1.20.0               7ab27dbbfbdf  6 days ago     137 MB
docker.io/library/fedora          34                   8d788d646766  2 weeks ago    187 MB
...

The repository refers to the on-line repository we fetched the image from. The tag is basically a version string. The image ID is a unique identification of the image; it is generally derived from a cryptographic hash of the image contents. The remaining columns are self-descriptive.
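Because the listing is a plain whitespace-separated table, it is easy to process in scripts. The sketch below extracts the image IDs for one repository from such a listing; the function name image_ids_for is ours, and it assumes the column order shown above (repository, tag, image ID).

```shell
# Print the IMAGE ID of every image from the given repository.
# Standard input is expected to be the output of `podman images`:
# a header line followed by one line per image.
image_ids_for() {
    awk -v repo="$1" 'NR > 1 && $1 == repo { print $3 }'
}
```

For example, podman images | image_ids_for docker.io/library/nginx prints one ID per pulled Nginx image; each of them can then be passed to podman rmi.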

When you execute podman pull IMAGE:TAG, Podman will fetch the image without starting any container. If you use latest as a tag, the latest available version will be fetched.

Pull docker.io/library/python:3-alpine and check that it has appeared in podman images afterwards.

Shorter image names

If you paste the following content into /etc/containers/registries.conf.d/unqualified.conf, you will not need to type docker.io/ in front of every image name. It is called an unqualified search and it is tried first for every image name.

unqualified-search-registries = ["docker.io"]

Companies can have their own repositories and you may set up multiple repositories here if you wish to try more of them when a fully-qualified name is not provided.

Image repository

If you wonder where the images are coming from, have a look at https://hub.docker.com/. Anyone can upload their images there for others to use.

Similarly to Python package index, you may find malicious images here. At least, the containers are running isolated, so the chances of misbehaviour are limited a little bit (compared to pip install that you execute in the context of a normal user).

Images from the library group are official images endorsed by Docker itself and hence are relatively trustworthy.

Running containers

After the image is pulled, we can create a container from it.

We will start with an Alpine image because it is very small and thus very fast.

podman run --interactive --tty alpine:latest /bin/sh

If all went fine, you should see an interactive prompt / # and inspecting /etc/os-release should show you the following text (version numbers may differ):

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.19.1
PRETTY_NAME="Alpine Linux v3.19"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

The run subcommand starts a container from a specified image. With --interactive and --tty (that are often combined into single -it) we specify that we want to attach a terminal to the container as we would use it interactively. The last part of the command is the program to run.

Inside the container, we can execute any commands we wish. We are securely contained and the changes will not affect the host system.

Install curl and check that you have functional network access.

Open a second terminal so that we can inspect how the container looks from the outside.

Inside the container, execute sleep 111 and in the other terminal (that is running in the host) execute ps -ef --forest. You should see lines like the following:

student    1477313       1  0 16:29 ?        00:00:00 /usr/bin/conmon ...
student    1477316 1477313  0 16:29 pts/0    00:00:00  \_ /bin/sh
student    1477370 1477316  0 16:33 pts/0    00:00:00      \_ sleep 111

This confirms that the processes inside a container are visible from the outside.

Run ps -ef inside a container (or look into /proc there). What do you see? Is there something surprising?

Also execute podman ps. It prints a list of running containers.

CONTAINER ID  IMAGE                            COMMAND  CREATED        STATUS            PORTS   NAMES
643b5e7cea06  docker.io/library/alpine:latest  /bin/sh  4 minutes ago  Up 4 minutes ago          practical_bohr

The container ID is again a unique identification; the other columns are self-descriptive. Note that since we have not specified a name, Podman assigned a random one.

If you terminate the session inside the container (exit or Ctrl-D), you will return to the host terminal.

Execute podman ps again. It is empty: the container is not running. If you add --all, you will see that the STATUS has changed.

Exited (130) 1 second ago

Note that if we executed podman run ... again, we would start a new container. Try it now.

We will describe the container life cycle later on. If you wish to remove the container now, execute podman rm NAME. As NAME, use the randomly assigned name or the CONTAINER ID.

Single shot runs

You can pass any command to podman run to be executed. If you know that you would be removing the container immediately afterwards, you can add --rm to tell Podman to remove it automatically once it finishes execution.

podman run --rm alpine:latest cat /etc/os-release

If you want to pass a more complicated command, it is better to pass it via sh -c. Change the above command to first cd to etc and then call cat os-release. Why does the following not work: podman run --rm alpine:latest cd /etc && cat os-release?

Managing container life cycle

The containers are actually rather similar to services that we have talked about in Lab 09.

Starting a container

After we have terminated the interactive session, the container exited. We can call podman start CONTAINER to start it again.

Each container has a so-called entry point that is executed when the container is started. For a service-style container (e.g., with a web server), the service would be started again.

For our Alpine example, the entry point is /bin/sh (shell), so nothing interesting will happen.

Check that the container is running with podman ps.

Attaching to a running container

When the container is running, we can attach to it. podman attach basically connects the stdout of the entry point to your terminal. With our Alpine container, we can run commands inside the container again.

We can also call podman exec -it CONTAINER CMD that connects to the running container in a new terminal (like a new tab). For us, running the following would work (replace with your container name).

podman exec -it practical_bohr /bin/sh

Run ps -ef inside the container again. Which processes do you see?

Terminating the exec-ed shell returns us back to the host. Terminating the attach-ed shell terminates the whole container.

Containers in background (with names)

For service-style containers (e.g. nginx that provides the webserver), we often want to run them in daemon mode – in background.

That is possible with a --detach option to the run command.

We will also give it the name webserver so that we can easily refer to it.

podman run --detach --name webserver --publish 8080:80/tcp -v ./web:/usr/share/nginx/html:ro nginx:1.20.0

We will explain the -v and --publish later on.

This command starts the container and terminates. The webserver is running in the background. Check that you can again access http://localhost:8080/ in your browser.

You can stop such a container with podman stop webserver. Kind of similar to systemctl stop .... Not a coincidence.

Check that after stopping the webserver, http://localhost:8080/ no longer works.

Starting the container again is possible with podman start webserver.

`start` and `stop` and stdout

Note that both start and stop print the name of the container that was started (stopped) on stdout. That is useful in scripts; for interactive use we can simply ignore the output.
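In a script, this output can be captured with command substitution. The following is a minimal sketch; the function name restart_container and the format of its report are our own choice.

```shell
# Restart a container and report what Podman printed for each step.
# Both `podman stop` and `podman start` print the container they acted on;
# if either of them fails, the function stops and returns a failure.
restart_container() {
    name=$1
    stopped=$(podman stop "$name") || return 1
    started=$(podman start "$name") || return 1
    echo "stopped=$stopped started=$started"
}
```

It would be used as restart_container webserver once the webserver container from the previous section exists.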

Clean-up actions

When we are done with a container, we can remove it (but first, we need to stop it).

Executing the following command would remove webserver container completely.

podman rm webserver

You can also remove pulled images using the rmi subcommand.

For example, to remove nginx:1.20.0, you can execute the following command.

podman rmi nginx:1.20.0

Note that Podman will refuse to remove an image if it is used by an existing container. Recall that the images are stacked and hence Podman cannot remove the underlying layers.
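The order of the clean-up steps matters: stop the container, remove it, and only then remove the image. A minimal sketch that chains the three subcommands (the function name is ours) could look like this:

```shell
# Stop and remove a container, then remove the image it was created from.
# Each step runs only if the previous one succeeded.
remove_container_and_image() {
    container=$1
    image=$2
    podman stop "$container" &&
        podman rm "$container" &&
        podman rmi "$image"
}
```

For the example above, the call would be remove_container_and_image webserver nginx:1.20.0. If another container still uses the image, the last step is expected to fail and the image stays in place.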

Limiting the isolation

By default, a container is an isolated world. If you want to access it from the outside, you have to exec into it (for terminal-style work) or publish its services to the outside.

Port forwarding (a.k.a. port publishing)

For server-style containers (e.g., the Nginx one we used above), that means exposing some of their ports to the host computer. That is done with the --publish argument, where you specify which port on the host (e.g., 8080) shall be forwarded into the container, and to which port and protocol there (e.g., 80 and tcp).

Therefore, the argument --publish 8080:80/tcp means that we expect that the container itself offers a service on its port 80 and we want to make this (container’s) port available as 8080.
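To make the structure of the argument explicit, the sketch below splits a HOST_PORT:CONTAINER_PORT/PROTOCOL value into its parts using plain shell parameter expansion. The function name parse_publish is ours and it handles only this simple form (no host IP address, no port ranges); a missing protocol is treated as tcp.

```shell
# Split a --publish value of the form HOST_PORT:CONTAINER_PORT[/PROTOCOL].
parse_publish() {
    spec=$1
    host_port=${spec%%:*}        # everything before the first colon
    rest=${spec#*:}              # everything after the first colon
    container_port=${rest%%/*}   # everything before the slash
    case $rest in
        */*) protocol=${rest#*/} ;;
        *)   protocol=tcp ;;
    esac
    echo "host=$host_port container=$container_port protocol=$protocol"
}
```

Calling parse_publish 8080:80/tcp prints host=8080 container=80 protocol=tcp.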

If you know what SSH port forwarding is (more about that in the next labs), this is a rather similar concept.

We can start the nginx container without --publish, but it does not make much sense. Why?

Volume mounts

Another way to break the container isolation is to bind a certain directory into the container. There are several ways to do that; we will show the --volume (or -v) parameter.

Podman installation in IMPAKT/Rotunda requires that the mounted directory resides outside AFS. /tmp is a good choice.

The --volume parameter takes three (again colon-separated) arguments: the source directory on the host, the mount point inside the container, and the options.

Our example ./web:/usr/share/nginx/html:ro thus specified that local (host) directory web shall be visible under /usr/share/nginx/html inside the container in read-only mode. It is very similar to normal mounts you already know.

If you specify rw instead of ro, you can modify the host files inside the container.

Volume mounting is useful for any service-style container. A typical example is a database server. You start the container and you give it a mounted volume. To this volume (directory), it will store the actual database (the data files). Thus, when the container terminates, your data are actually persistent as they were stored outside of the container.

This has a huge advantage for testing service updates. You stop the container, make a backup of the data directory and start a new container (with a newer version) on the top of the same data directory. If everything works fine, you are good to go. Otherwise, you can stop the new container, restore from the backup and return to the old version.

Very simple and effective.
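The update procedure described above can be sketched as a short function. Everything specific here is an assumption for illustration: the function name, the .bak suffix for the backup, and the idea that the caller knows where the service expects its data inside the container (the MOUNT_POINT argument). The data directory should be given as an absolute path.

```shell
# Replace a service container with one based on a newer image,
# keeping a backup copy of its data directory.
# Arguments: OLD_CONTAINER NEW_CONTAINER NEW_IMAGE DATA_DIR MOUNT_POINT
upgrade_with_backup() {
    old=$1 new=$2 image=$3 data_dir=$4 mount_point=$5
    # Refuse to overwrite an earlier backup.
    if [ -e "$data_dir.bak" ]; then
        echo "backup $data_dir.bak already exists" >&2
        return 1
    fi
    podman stop "$old" || return 1
    # Copy the data only after the old container has stopped writing to it.
    cp -a "$data_dir" "$data_dir.bak" || return 1
    podman run --detach --name "$new" -v "$data_dir:$mount_point:rw" "$image"
}
```

If the new version misbehaves, stop the new container, restore the directory from the .bak copy and start the old container again with podman start.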

Exercise

Install the fscat command system-wide inside a container.

We recommend using python:3.12-alpine.

Note that you will not need to set up any virtual environment in this case: the whole machine (container) is yours. You can install things system-wide.

Why we have all the `systemctl`, `dnf`, `podman`, `pip`, …

At this moment there might be certain confusion why there are so many concepts around that are basically dealing with the same issues.

We have package managers to install software (dnf install). But some software we can install also through language-specific managers (pip install).
Web server can be started via systemctl start or via creating a container.
We have virtual environments for software development but we have also containers and full-fledged virtual machines.
…

The truth is that some concepts and tools are consequences of historical development while others tackle some of the issues from different angles.

Feel free to return to this text at some later stage, e.g., after digesting the topic of containers a bit.

We will try to briefly explain why knowing about them all makes sense. Please prefix each sentence with most of, usually or another similar qualifier if you think that our generalization is too wide :-).

First of all, it is important to distinguish the needs of an end user (who can even be a webserver admin) from the needs of a developer.

An end user typically wants easy installation and is happy with running one version. They expect that upgrading is a seamless process.
Developers may have multiple versions of the same software installed, and might even be running two different versions at the same time. Upgrading (e.g., in the sense of upgrading the required libraries) is a fragile process where plenty of things need to be tested.
And we can also toss in testers, who need to ensure that the software works on a wide range of hardware and software platforms.

All these groups are actually using the same software (piece of code), but their requirements are different.

Package managers thus deliver a well-tested version of the program to the user machine that is ready to be launched. Using a package manager means that there is a central authority that prevents installation of conflicting files and simplifies mass upgrades.

Developers do not want to install the program system-wide but they want to execute it as if it was installed system-wide (because that is how the program will be used in the end). Virtual environment provides a clean environment that is good enough for emulating a clean install into the same system as the developer uses. It is also very lightweight as the files remain on the same filesystem and thus configuration of other tools – such as IDE – is straightforward.

But this isolation (at the level of a virtual environment sandbox) is rather thin as it does not isolate from other installed applications (recall that $PATH is extended, not replaced inside a virtual environment) or from network configuration. Containers provide a higher level of isolation but at an extra cost. Access to files is possible but needs extra setup, the container usually does not contain any other software to simplify routine tasks etc.

On the other hand, the container provides a cleaner environment for testing because the user can be in full control of what is being installed.

Containers as well as virtual environments can happily coexist, even in multiple copies which again simplifies testing and development.

Altogether, the mentioned tools fulfil different roles in the software development cycle. Theoretically, we can use containers for virtually everything, as their isolation (at least partially) removes the need for package managers as well as virtual environments, but ease of use plays a major role here.

Always select the right tool for the job. The needs of administrators, users, developers and testers are different. And that is why there are so many tools around that are seemingly solving the same task. All of them have their use.

Printing and scanning in Linux

Below is a quick overview of Linux support for printing and scanning. Today, most devices are connected over network and – at least their basic functions – will work out of the box on Linux.

Extra packages (these you might need to download from the vendor pages) are usually needed only for extra features.

Printing with CUPS

Printing in Linux is handled by the CUPS subsystem, which works out of the box with virtually every printer supporting IPP (Internet Printing Protocol) and also supports many legacy printers.

Running sudo dnf install cups installs the basic subsystem; extra drivers might be needed for specific models. OpenPrinting.org contains a searchable database to determine which (if any) drivers are needed. For example, for most HP printers, you would need to install the hplip package.

You typically want CUPS up and running on your system all the time, hence you need to enable it:

sudo systemctl enable --now cups

CUPS has a nice web interface that you can use to configure your printers. For many modern network-connected printers, even that is often unnecessary as they will be auto-discovered correctly.

If you have started CUPS already, try visiting http://localhost:631/. Under the Administration tab, you can add new printers. Selecting the right model helps CUPS decide which options to show in the printing dialog and enables proper functioning of grayscale printing and similar features.

Scanning images and documents with SANE

Scanner support on Linux is handled with SANE (Scanner Access Now Easy). As with printing, most scanners will be autodetected and if you already know GIMP, it has SANE support.

Add it with sudo dnf install xsane-gimp.

The actual scanning is done from the File -> Create -> XSane dialog, where you select your device and the scanning properties (e.g., resolution or colors) and then start the scan.

Tasks to check your understanding

We expect you will solve the following tasks before attending the labs so that we can discuss your solutions during the lab.

Install the following Python package into a container (recall that we can use pip directly for installation).

Recall why using a virtual environment does not make sense here.

After installation, check that you can run the newly installed program nswi177-lab13.

We recommend using either the fedora:37 image or Alpine for the installation. You may need to install python3 first, though.

Solution.

Start the Apache web server on top of the 13/web directory. Use this httpd image. Verify that you are really using the Apache web server.

Solution.

The purpose of this task is to demonstrate how containers can be easily used for checking that your project is in a good state. Even if you use virtual environments etc., it is important to verify that your project can be installed into a clean environment.

Your task is to write commands into 13/test-in-alpine.txt that clone the fscat repository and run its tests.

Then run your script using the command line below and check that the tests were executed (the tests are executed via pytest -v tests/).

podman run --rm alpine:3.17 /bin/sh -c "$( cat 13/test-in-alpine.txt )"

Please ensure that you do not redirect the output of the BATS tests and that you run the Python tests with -v so that you can see the following in the output (... is a placeholder for other messages, though).

1..3
ok 1 Works with a tarball
ok 2 Failure on bad filesystem path
ok 3 Failure on bad filename path
...
tests/test_fscat.py::test_cat_from_tar PASSED                            [ 25%]
tests/test_fscat.py::test_raises_on_invalid_filesystem_path PASSED       [ 50%]
tests/test_fscat.py::test_raises_on_invalid_filename_path PASSED         [ 75%]
tests/test_fscat.py::test_raises_when_filename_is_directory PASSED       [100%]

Your script must use set -e to exit on command failure so that failing tests are detected.
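The effect of set -e can be observed without any container. In the sketch below, a local sh stands in for the shell inside the container and false stands in for a failing test; both stand-ins are assumptions made only for this demonstration.

```shell
# Two throw-away scripts run by a local sh; 'false' stands in for a failing test.
without_e='false; echo "still running"'
with_e='set -e; false; echo "still running"'

# Same calling convention as the podman command line: sh -c "<script text>".
status_without=0
out_without=$(sh -c "$without_e") || status_without=$?

status_with=0
out_with=$(sh -c "$with_e") || status_with=$?

echo "without set -e: exit=$status_without output='$out_without'"
echo "with set -e:    exit=$status_with output='$out_with'"
```

Without set -e the script carries on after the failure and reports success through the exit code of its last command; with set -e it stops at the first failing command and the non-zero exit code is passed on to the caller.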

It is perfectly fine to use set -x to trace the execution (we highly recommend using that switch as the first command in the 13/test-in-alpine.txt file).

We highly recommend that you solve this task interactively first, then use the history command to view the commands you have executed, and build the final script from them.

As a footnote: once you complete this task, note how containers make it easy to test your project across different Linux flavors. Substituting apk with dnf and alpine with fedora allows you to test it on Fedora with virtually no extra work.
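The substitution can be sketched as a tiny helper that prints the installation line for a given image. The function install_line is hypothetical (it is not part of the lab materials), and package names are not always the same across distributions, so treat this as an illustration only.

```shell
# Hypothetical helper: print the package installation command for an image.
# $1 is the image name (e.g., alpine:3.17), $2 is the package to install.
install_line() {
    case "$1" in
        alpine*) echo "apk add $2" ;;
        fedora*) echo "dnf install -y $2" ;;
        *) echo "unsupported image: $1" >&2; return 1 ;;
    esac
}

for image in alpine:3.17 fedora:37; do
    echo "$image: $( install_line "$image" git )"
done
```

The rest of the test script (cloning the repository and running the tests) would stay the same for both images.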

Solution.

Learning outcomes

Learning outcomes provide a condensed view of fundamental concepts and skills that you should be able to explain and/or use after each lesson. They also represent the bare minimum required for understanding subsequent labs (and other courses as well).

Conceptual knowledge

Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …

explain what a container is
compare a container with a virtual machine and a process
explain in what situations container isolation can be leveraged
explain the container life-cycle
explain why using virtual environments (or other types of sandboxing) inside a container is typically not needed
explain the difference between a running container and a container image

Practical skills

Practical skills are usually about usage of given programs to solve various tasks. Therefore, you should be able to …

start an interactive Podman container
start a service-style Podman container
expose container ports
mount a volume into a container
clean unused containers and images
optional: use printing and scanning in Linux

set -e

apk add git python3 py3-pip bats

git clone https://gitlab.mff.cuni.cz/teaching/nswi177/2024/common/fscat.git

cd fscat

python3 -m pip install --break-system-packages  .
python3 -m pip install --break-system-packages -r requirements_test.txt

bats tests/base.bats
python3 -m pytest -v -W ignore::DeprecationWarning

PID   USER     TIME  COMMAND
    1 root      0:00 /bin/sh
   25 root      0:00 ps -ef

/proc would be rather empty (or emptier than usual) as there are only two processes visible in the container.

Also note that the process IDs start from 1 (another thing that the kernel maps into the containers) and PID 1 is /bin/sh. Recall that PID 1 is the first process of the machine and its termination means machine shutdown.

podman run --rm alpine:latest /bin/sh -c 'cd /etc && cat os-release'

The command podman run --rm alpine:latest cd /etc && cat os-release will not work because the && cat os-release is executed by the host shell, i.e., not inside the container (inside the container, we only change current directory and terminate).
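The same splitting can be observed without Podman. In this sketch a local sh stands in for the container (an assumption made only so that the example runs anywhere) and pwd replaces cat os-release.

```shell
# A local sh stands in for the container; no Podman is needed to see
# which shell interprets the && operator.

# Quoted: the whole 'cd / && pwd' text is one argument, so both commands
# run in the inner shell and pwd prints the new directory.
quoted=$(sh -c 'cd / && pwd')

# Quotes end before &&: the inner shell only changes its own directory and
# exits; pwd is then run by the outer shell in its unchanged directory.
split=$(sh -c 'cd /' && pwd)

echo "inside the inner shell:  $quoted"
echo "run by the outer shell:  $split"
```

The first line shows / while the second one shows the directory you started in, as the change of directory was lost together with the inner shell.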

podman pull python:3.12-alpine

We run the container (consider removing --rm if you want to experiment with the same container more).

podman run -it --rm python:3.12-alpine /bin/sh

Inside the container, we first install the required package (see hint) and then build and install the utility.

apk add git
python3 -m pip install git+http://gitlab.mff.cuni.cz/teaching/nswi177/2024/common/fscat.git

The command was installed system-wide into /usr/local/bin together with required Python modules.

fscat ... # Works

We will use the Alpine Linux based image for this exercise.

podman pull httpd:alpine

podman run --rm -it  -v ./web/:/usr/local/apache2/htdocs/:ro  -p 8080:80/tcp httpd:alpine

And in a second terminal, run:

curl --silent -v http://localhost:8080 >/dev/null
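In the verbose output, the web server usually identifies itself in the Server response header. As a sketch, the header can be filtered out with a small helper; the function server_of is hypothetical and the sample response below is made up only to show the filter working without a running server.

```shell
# Read HTTP response headers on stdin and print the value of the Server header.
server_of() {
    tr -d '\r' | sed -n 's/^[Ss]erver:[[:space:]]*//p'
}

# Made-up sample response, only to demonstrate the filter offline.
sample=$(printf 'HTTP/1.1 200 OK\r\nServer: Apache/2.4 (Unix)\r\nContent-Type: text/html\r\n')
detected=$(printf '%s\n' "$sample" | server_of)
echo "$detected"
```

With the container running, the helper could be fed with real headers via curl --silent --head http://localhost:8080 | server_of.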
