Table of contents
- Light introduction to containers
- Setting up
- Running the first container
- Pulling and inspecting the images
- Running containers
- Managing container lifecycle
- Clean-up actions
- Limiting the isolation
- Exercise
- GitLab CI
- Graded tasks
- Changelog
Today's lab will focus on containers – very light-weight virtual machines. In the end, we will use this knowledge to set up a GitLab pipeline that executes code – such as tests – for each commit and keeps our code healthy (in the green).
Light introduction to containers
Note that in the introductory part of the text we intentionally ignore the difference between an image and a container. We believe it is a bit easier for the first steps.
Containers are another approach to isolation. We have already seen project sandboxing and many of you have tried running a virtualized Linux installation.
Containers are somewhere in between.
They offer an isolated environment that generally behaves like a fully virtualized machine.
From the implementation point of view, they are closer to a virtual environment, as processes inside a container are visible from the host system.
We can imagine them as if we gave the container one directory (containing all the usual subdirectories such as /dev, /proc or /home) to run in, without an option to escape.
Because of the above, containers can run only the applications written for the same operating system (unlike a full-fledged virtual machine).
Because of their separation from the host system, containers are extremely useful in many scenarios. Note that using a fully virtualized machine (e.g. VirtualBox or QEMU) is an option too, but containers are light-weight and thus have a smaller overhead (e.g., faster start-up time).
A typical example is the need to run an isolated server for development. You can imagine a database server or a web server here. You can certainly install such a server system-wide (recall lab 11), but that does not provide the isolation and the ease of removal. Recall how nice it was with a virtual environment that removing one directory cleaned up the whole workspace for you.
Similarly, removing a container is a simple and fast operation and you can start with a fresh one in a matter of seconds.
Using a container also has the advantage that you can specify exactly how the container should look: what processes it spawns, on which ports it listens, etc.
Such a specification can be easily codified (similarly to requirements.txt) and thus easily reproduced on a different machine.
Container images are also often used when you need to ship a complex application that requires several services to execute correctly. Instead of providing a detailed manual or a VirtualBox image, you provide a ready-to-be-run container. The user then launches the whole container and internally, the container takes care of the rest, exposing the final service. For example, the whole GitLab can be downloaded and used as a container.
Docker and Podman
In this lab we will show basic use of Linux containers based on Docker and Podman.
These solutions are virtually the same.
Their main commands (docker and podman) support exactly the same arguments and have the same semantics in most cases.
The main difference is that Docker is a bit older (though still actively developed) and was intended for system-wide containers (e.g., when you wanted to run a self-hosted instance of GitLab). Podman is a bit younger and uses newer features of the Linux kernel that allow it to execute containers without superuser privileges (which is actually still quite a recent feature of Linux).
In this sense, Podman is the perfect choice for a developer. You need a database server? Use Podman to get the right container and start it. Your database is clean and ready to be used. Without a need for superuser – root – privileges (this is also often called rootless mode).
On the other hand, if you run an older version of Linux or the container requires some Docker-specific features, Docker might be a better choice.
Terminology …
There are two main concepts related to this lab. An image and a container. They are somewhat similar to a class and an object (instance).
The image is like a hard disk for the isolated environment. It contains all the necessary files, including executables as well as data files.
To run it, we create a container. The container starts with the same state as the image but contains the running processes that might be modifying its state. Unless explicitly stated otherwise, the changes done by the container are not propagated to the image: instead, the container starts with a copy of the image (files) and modifies the copy.
Processes inside the container are isolated from the outside (the host) and the container does not see the processes of the host.
On the other hand, processes in the container are visible in the host system.
As a security measure, the kernel uses artificial user ids for the processes in the container.
Therefore, a superuser (root) in a container looks like a normal user (usually with a very high uid) to the host system.
In other words, the kernel provides a mapping between a container user and an outside (host's) user.
As a matter of fact, in most cases, when you execute anything inside the container, you execute it as the local (i.e., in-container) superuser.
It is up to the (host) user to configure this mapping: only the user is able to tell whether some user ID (uid) is free or not. Typically, each (host system) user is assigned a range of free subordinate uids (subuids) that can be used inside the containers.
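For illustration (the user name and the numbers below are only an example), a single line in /etc/subuid such as the following grants the host user intro a block of 65536 extra uids starting at 100000.
intro:100000:65536
With such a mapping, root (uid 0) inside the container is the user intro itself, container uid 1 becomes host uid 100000, uid 2 becomes 100001, and so on (compare with the idMappings section of podman info below).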
Sidenote: image stacking
A usual practice for the images is to derive new images from existing ones. For example, there are base images with a bare system and from these, special-purpose ones are derived.
That simplifies the configuration, because the image can start with a functional base system instead of starting from scratch.
To save space, the derived images contain only the differences from the base ones. Only when the container is created is the actual file content merged (overlayed). That is mostly for performance reasons and also the reason why downloading some images can be disproportionately faster: if they add only a few new files and you already have the base image downloaded, you only download a small difference instead of the whole image.
It is somewhat similar to Git commits: each Git commit describes a change from the previous one. A derived image likewise contains only the changes – deltas – relative to the image below it in the stack.
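Once you have Podman set up and an image pulled (see below), you can peek at this stacking yourself: the history subcommand lists the individual layers of an image (the nginx image is used here only as an example).
podman history docker.io/library/nginx:1.20.0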
Setting up
Before starting with Podman, ensure you have an up-to-date copy of the examples repository.
We will be using the subdirectory 13/.
Install Docker or Podman.
To determine which one, the following command will help you decide.
grep cgroup /proc/filesystems
If you can see only the following line, then your kernel does not have cgroups v2 enabled, which is required for Podman.
nodev cgroup
However, if you can see the following, you have cgroups v2 enabled and you should use Podman.
nodev cgroup
nodev cgroup2
Then proceed with the installation.
Note that new versions of Fedora have already switched to cgroups v2 and Podman is the only option there.
Hence, install it with sudo dnf install podman.
All the following examples in this lab will use podman.
If your distribution does not support Podman, replace podman with sudo docker.
Podman: /etc/subuid preparation
As we explained above, Podman needs a range of free user IDs to map processes from the container to real user IDs on the host.
The commands below first ensure that the required file exists (usually not needed but it will not hurt anything) and then add a range of free uids for your username.
Update: first check the contents of /etc/subuid and /etc/subgid.
If they already contain something like intro:100000:65536, then the following commands are not needed (i.e., things are already set up).
sudo touch /etc/subuid /etc/subgid
sudo usermod --add-subuids 100000-165536 --add-subgids 100000-165536 YOUR_LOGIN
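To verify that the mapping was added, you can print the relevant lines (YOUR_LOGIN is again your user name); you should see an entry similar to YOUR_LOGIN:100000:65536 in both files.
grep YOUR_LOGIN /etc/subuid /etc/subgid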
Update: You will need to re-login (i.e., logout from the GUI completely and login again) for the above change to take effect (reboot shall not be needed).
Update: System (packages) upgrade may break Podman for various reasons.
If that happens to you, you may try to run podman system migrate
as that is able
to fix most of the errors related to transition to a newer version.
Docker: starting the service
For Docker, we need to ensure that the Docker daemon (dockerd) is up and running.
Typically, the following commands would be sufficient.
sudo package-manager-of-your-distribution install docker
sudo systemctl enable docker
sudo systemctl start docker
Basic health check
Execute podman info to get basic information about your system.
You will see something like this.
host:
  arch: amd64
  ...
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    ...
  ...
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  ...
  os: linux
  ...
store:
  graphRoot: $HOME/.local/share/containers/storage
  ...
  runRoot: /run/user/1000/containers
  volumePath: $HOME/.local/share/containers/storage/volumes
version:
  APIVersion: 3.0.0
  ...
When debugging issues with Podman, always paste this information (unedited) into the Issue description (obviously, as text inside ```, not as a screenshot!).
Running the first container
The first execution will be a bit more complex to give you a taste of what is possible. We will explain the details in the following sections.
The following assumes you are inside the directory 13
in the
examples repository.
It will launch an Nginx web server.
podman run --rm --publish 8080:80/tcp -v ./web:/usr/share/nginx/html:ro docker.io/library/nginx:1.20.0
You will see similar output to the following.
Trying to pull docker.io/library/nginx:1.20.0...
Getting image source signatures
Copying blob 525e372d6dee done
Copying blob 69692152171a done
Copying blob b141b026b9ce done
Copying blob 8d70dc384fb3 done
Copying blob 965615a5cec8 done
Copying blob 6e60219fdb98 done
Copying config 7ab27dbbfb done
Writing manifest to image destination
Storing signatures
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/05/18 13:15:55 [notice] 1#1: using the "epoll" event method
2021/05/18 13:15:55 [notice] 1#1: nginx/1.20.0
2021/05/18 13:15:55 [notice] 1#1: built by gcc 8.3.0 (Debian 8.3.0-6)
2021/05/18 13:15:55 [notice] 1#1: OS: Linux 5.10.16-arch1-1
2021/05/18 13:15:55 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 524288:524288
2021/05/18 13:15:55 [notice] 1#1: start worker processes
2021/05/18 13:15:55 [notice] 1#1: start worker process 26
2021/05/18 13:15:55 [notice] 1#1: start worker process 27
2021/05/18 13:15:55 [notice] 1#1: start worker process 28
2021/05/18 13:15:55 [notice] 1#1: start worker process 29
Open http://localhost:8080/ in your browser. You should see a NSWI177 Test Page in the browser.
Update: if you see 403 Forbidden instead, append ,Z
to the -v
.
Thus, the command would contain -v ./web:/usr/share/nginx/html:ro,Z
.
This is needed (and generally a good practice) when you are running on a machine
with SELinux enabled in enforcing mode (default installation of Fedora but
not on the USB disks from us).
Terminate the execution by killing Podman with Ctrl-C
.
Note that the running Nginx webserver was printing its log – i.e. list of accessed pages – to stdout.
Also open the page web/index.html
in your browser.
Again, you shall see a NSWI177 Test Page, but the URL would point to your local
filesystem (i.e., file:///home/.../examples/13/web/index.html
).
The above example illustrated three important features that are available with containers.
- The web server in the container did not need any configuration or system-wide installation.
- A local (host) port can be published so that network communication is forwarded into the container.
- The container can access local files and use them.
All of these are very good features for development, testing, as well as distribution of your software.
Pulling and inspecting the images
The first thing that needs to be done when starting a container is to get
its image.
While Podman is able to pull the image as part of the run
subcommand,
it is sometimes useful to fetch it as a separate step.
The command podman images
prints a list of images that are present on your system.
The output may look like this.
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/nginx 1.20.0 7ab27dbbfbdf 6 days ago 137 MB
docker.io/library/fedora 34 8d788d646766 2 weeks ago 187 MB
...
The repository refers to the on-line repository we fetched the image from; the tag is basically a version string. The image ID is a unique identification of the image; it is generally derived from a checksum of the image itself. The remaining columns are self-descriptive.
The tag (delimited by a colon) is often a specific version string, or latest to denote the latest available version.
When you execute podman pull IMAGE
, Podman will fetch the image without starting
any container.
Pull docker.io/library/alpine:latest
and check that it has appeared in podman images
afterward.
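In other words, something like the following two commands should do the job (the first one downloads the image, the second one lists what is stored locally).
podman pull docker.io/library/alpine:latest
podman images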
Shorter image names
If you paste the following content into /etc/containers/registries.conf.d/unqualified.conf
,
you will not need to type docker.io/
in front of every image name.
It is called an unqualified search and it is tried first for every image name.
unqualified-search-registries = ["docker.io"]
Companies can have their own registries and you may set up multiple registries here if you wish to try more of them when a fully-qualified name is not provided.
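After this change, a short (unqualified) name is enough and Podman will try to resolve it against docker.io, e.g. as follows.
podman pull alpine:latest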
Image repository
If you wonder where the images are coming from, have a look at https://hub.docker.com/. Anyone can upload their images there for others to use.
Similarly to the Python package index, you may find malicious images here.
At least the containers run isolated, so the chances of misbehaviour are somewhat limited (compared to pip install, which you execute in the context of a normal user).
Images from the library
group are official images endorsed by Docker itself
and hence are relatively trustworthy.
Distributions and Alpine
The images can be built on top of different distributions. In this sense, containers are an easy way to test your program in multiple distributions without having to set up a triple (or higher) boot or having to manage multiple virtual machines.
You will notice that many containers are built on top of a distribution called Alpine Linux. That is a very small distribution that is often used as a baseline image for many other images. The reason is mostly its size and simplicity – the base image is about 6 MB and the distribution does not use any complex configuration.
Alpine uses Apk (the Alpine package manager) for its own packages. For example, the following command installs cURL (which is not installed by default).
apk add curl
You will see how to run an Alpine container in the next section.
Running containers
After the image is pulled, we can create a container from it.
We will start with an Alpine image because it is very small and thus very fast.
podman run --interactive --tty alpine:latest /bin/sh
If all went fine, you should see an interactive prompt / #
and
inspecting /etc/os-release
should show you the following
text (version numbers may differ).
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.13.5
PRETTY_NAME="Alpine Linux v3.13"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
The run subcommand starts a container from a specified image.
With --interactive and --tty (which are often combined into a single -it) we specify that we want to attach a terminal to the container, as we will use it interactively.
The last part of the command is the program to run.
Inside the container, we can execute any commands we wish. We are securely contained and the changes will not affect the host system.
Install curl
and check that you have functional network
access.
Solution.
Open a second terminal so that we can inspect how the container looks from the outside.
Inside the container, execute sleep 111
and in the other
terminal (that is running in the host) execute ps -ef --forest
.
You shall see lines like the following:
student 1477313 1 0 16:29 ? 00:00:00 /usr/bin/conmon ...
student 1477316 1477313 0 16:29 pts/0 00:00:00 \_ /bin/sh
student 1477370 1477316 0 16:33 pts/0 00:00:00 \_ sleep 111
This confirms that the processes inside a container are visible from the outside.
Run ps -ef
inside a container (or look into /proc
there).
What do you see?
Solution.
Also execute podman ps.
It prints a list of running containers.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
643b5e7cea06 docker.io/library/alpine:latest /bin/sh 4 minutes ago Up 4 minutes ago practical_bohr
The container ID is again a unique identification; the rest of the columns are self-descriptive. Note that since we have not specified a name, Podman has assigned a random one.
If you terminate the session inside the container (exit
or Ctrl-D
),
you will return to the host terminal.
Execute podman ps
again.
It is empty: the container is not running.
If you add --all
, you will see that the STATUS
has changed.
Exited (130) 1 second ago
Note that if we executed podman run ... again, we would start a new container.
Try it now.
We will describe the container lifecycle later on; if you wish to remove the container now, execute podman rm NAME.
Instead of NAME, you can use the randomly assigned name or the CONTAINER ID.
Single shot runs
You can pass any command to podman run
to be executed.
If you know that you would be removing the container immediately afterwards,
you can add --rm
to tell Podman to remove it automatically once it finishes execution.
podman run --rm alpine:latest cat /etc/os-release
If you want to pass a more complicated command, it is better to do so via sh -c.
Change the above command to first cd to etc and then call cat os-release.
Why does the following not work: podman run --rm alpine:latest cd /etc && cat os-release?
Solution.
Managing container lifecycle
Starting a container
After we have terminated the interactive session, the container exited.
We can call podman start CONTAINER
to start it again.
Each container has a so-called entry point that is executed when the container is started. For a service-style container (e.g., with a web server), the service would be started again.
For our Alpine example, the entrypoint is /bin/sh
(shell), so
nothing interesting will happen.
Check that the container is running with podman ps
.
Attaching to a running container
When the container is running, we can attach to it.
podman attach
basically connects the stdout of the entrypoint
to your terminal.
With our Alpine container, we can again run commands inside the container.
We can also call podman exec -it CONTAINER CMD
that connects
to the running container in a new terminal (like a new tab).
For us, running the following would work (replace with your container name).
podman exec -it practical_bohr /bin/sh
Run ps -ef inside the container again.
Which processes do you see?
Solution.
Terminating the exec-ed shell returns us to the host.
Terminating the attach-ed shell terminates the whole container.
Containers in background (with names)
For service-style containers (e.g. nginx that provides the webserver), we often want to run them in daemon mode – in the background.
That is possible with the --detach parameter of the run command.
We will also give it the name webserver so we can easily refer to it.
podman run --detach --name webserver --publish 8080:80/tcp -v ./web:/usr/share/nginx/html:ro nginx:1.20.0
We will explain the -v
and --publish
later on.
This command starts the container and terminates. The webserver is running in the background. Check that you can again access http://localhost:8080/ in your browser.
You can stop such a container with podman stop webserver.
Kind of similar to systemctl stop ....
Not a coincidence.
Check that after stopping the webserver, http://localhost:8080/ no longer works.
Starting the container again is possible with podman start webserver
.
start and stop and stdout
Note that both start and stop print the name of the container that was started (stopped) on stdout.
That is useful when executed in scripts; for interactive use, we can simply ignore the output.
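As a small sketch only (reusing the webserver container from above), a script could capture the printed name like this.
stopped_name="$(podman stop webserver)"
echo "Stopped container: ${stopped_name}"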
Clean-up actions
When we are done with a container, we can remove it (but first, we need to stop it).
Executing the following command would remove the webserver container completely.
podman rm webserver
You can also remove pull-ed images using the rmi subcommand.
For example, to remove nginx:1.20.0, you can execute the following command.
podman rmi nginx:1.20.0
Note that Podman will refuse to remove an image if it is used by an existing container. Recall that the images are stacked and hence Podman cannot remove the underlying layers.
Limiting the isolation
By default, a container is an isolated world.
If you want to access it from the outside, you have to exec into it (for terminal-style work) or publish its services to the outside.
Port forward (a.k.a. port publishing)
For server-style containers (e.g. the Nginx one we used above), that means exposing some of its ports to the host computer.
That is done with the --publish argument, where you specify which port on the host (e.g., 8080) shall be forwarded into the container: to which port and which protocol (e.g., 80 and tcp).
Therefore, the argument --publish 8080:80/tcp means that we expect that the container itself offers a service on its port 80 and we want to make this (container's) port available as port 8080 on the host.
It is similar to SSH port forwarding with -L.
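If you prefer the command line to a browser, you can also check a published port from the host with curl (assuming the webserver container from the example above is running).
curl http://localhost:8080/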
We can start the nginx
container without --publish
but it does not
make much sense.
Volume mounts
Another way to break the container isolation is to bind a certain directory into the container.
There are several options how to do that; we will show the --volume (or -v) parameter.
It takes three (again colon-separated) arguments: the source directory on the host, the mapping inside the container, and options.
Our example ./web:/usr/share/nginx/html:ro thus specified that the local (host) directory web shall be visible under /usr/share/nginx/html inside the container, in read-only mode.
It is very similar to the normal mounts you already know.
If you specify rw
instead of ro
, you can modify the files inside the
container.
Volume mounting is useful for any service-style container. A typical example is a database server. You start the container and you give it a mounted volume. To this volume (directory), it will store the actual database (the data files). Thus, when the container terminates, your data are actually persistent as they were stored outside of the container.
This has a huge advantage for testing service updates. You stop the container, make a backup of the data directory and start a new container (with a newer version) on top of the same data directory. If everything works fine, you are good to go. Otherwise, you can stop the new container and restore from backup and return to the old one.
Very simple and effective.
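As a minimal sketch of this persistence (the data directory and file name below are made up just for this example), the alpine image can write a file into a mounted volume and the file survives the container.
mkdir -p data
podman run --rm -v ./data:/data:rw,Z alpine:latest sh -c 'date > /data/created.txt'
cat data/created.txt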
Exercise
Start an Apache web server on top of the 13/web directory.
Use this httpd image.
Verify that you are really using the Apache web server.
Solution.
Bind the examples repository into the container and install the timestamp2iso command.
We recommend using python:3.9-alpine.
Note that you will not need to set up any virtual environment in this case: the whole machine (container) is yours.
You can install things system-wide.
Hint.
Solution.
GitLab CI
You have already used continuous-integration features of GitLab. GitLab pipeline is one of them.
If you have never heard the term continuous integration, here it is in a nutshell.
To ensure that the software you build is in a healthy state, you should run tests on it often and fix broken things as soon as possible (because the cost of bug fixes rises dramatically with each day they remain undiscovered).
The answer to this is that the developer shall run tests on each commit.
Since that is difficult to enforce, it is better to do this automatically.
CI in its simplest form refers to a state where automated tests (e.g. BATS-based or using Python Nose) are executed automatically after each push to the origin/master branch, e.g. to GitLab.
In this text you will see how to set up GitLab CI to your needs.
The important thing to know is that GitLab CI can run on top of container (Docker) images.
Hence, to set up a GitLab pipeline, you choose an image to run on and the commands to execute.
GitLab will then start the container for you and run your script in it.
Depending on the outcome of the whole script (i.e., its exit code), it will mark the pipeline as passing or failing.
.gitlab-ci.yml
The configuration of the GitLab CI is stored inside the file .gitlab-ci.yml, which has to be placed in the root directory of the project.
Your submission repository contains one of the simplest setups possible.
# Automated tests
nswi177-tests:
  image: mffd3s/nswi177-base:latest
  script:
    - ./tools/run_tests.sh
It specifies a pipeline job nswi177-tests (you will see this name in the web UI) that is executed using the mffd3s/nswi177-base image and runs a single script: ./tools/run_tests.sh.
Note that GitLab will mount the Git repository into the container first and change the current directory to it.
Emulate the run locally. Hint. Solution.
Note that the command you created for running the script locally on top of the given image is virtually identical to the one executed by GitLab. GitLab does some extra caching and other performance-related tweaks, but conceptually, there is nothing more. And your code is tested in a reproducible way in a clean container (that is, in a sense, indistinguishable from a full virtual machine).
Exercise
Add your own pipeline to GitLab that checks that you never use /usr/bin/python in a shebang.
Hint.
Solution.
Other bits
Notice how easy using a GitLab pipeline is. You find the right image, specify your script and GitLab takes care of the rest.
From now on, every project you create on GitLab should have a pipeline that runs the tests (this includes ShellCheck, Pylint etc.). Set it up NOW for your assignments in other courses. Set it up for your Individual Software Project (NPRG045) next year. Use the chance to have your code regularly tested. It will save you time in the long run.
If you are unsure about which image to choose, official images are a good start. The script can have several steps where you install missing dependencies before running your program.
Recall that you do not need to create a virtual environment: the whole machine is yours (and would be removed afterwards), so you can install things globally. Therefore, a typical pipeline setup for a Python project would look like this (compare to the exercise above):
pipeline-name:
  image: python:3.9
  script:
    - cd subdir/with/project/if/needed
    - pip install -r requirements.txt
    - ./setup.py build
    - ./setup.py install
    - use-of-the-installed-program
As you have probably guessed, GitLab will merge the individual items into a single script, executed with set -e.
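In other words, the job above behaves roughly as if GitLab executed a shell script like the following inside the container (a simplified sketch; GitLab adds more setup around it).
set -e
cd subdir/with/project/if/needed
pip install -r requirements.txt
./setup.py build
./setup.py install
use-of-the-installed-program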
Graded tasks
13/shellcheck.sh (and .gitlab-ci.yml) (40 points)
Write a script that runs ShellCheck for all scripts in your repository.
Update your .gitlab-ci.yml to execute this script with each commit (push).
The pipeline shall fail if any of your scripts contains ShellCheck issues.
Name the pipeline shellcheck so we can easily find it.
Feel free to reuse code from the check_is_shellchecked function in our tests for your implementation.
Also consider reusing parts of the script for checking for bad shebangs from the exercise above.
13/command.txt (20 points)
Image docker.io/mffd3s/nswi177-labs-2021-command:latest contains a command nswi177-task-command.
Run this command with your GitLab username and paste its output into 13/command.txt.
13/volume.txt (20 points)
Image docker.io/mffd3s/nswi177-labs-2021-volume:latest contains a command nswi177-task-volume.
Pull the image and run this command in a container based on this image.
Mount your submission repository under /srv/nswi177/.
Note that your submission repository must have been cloned via SSH.
If everything is fine, the command will print two hexadecimal strings.
Copy them into 13/volume.txt.
13/port.txt (20 points)
Image docker.io/mffd3s/nswi177-labs-2021-port:latest contains a webserver on port 80.
Access this webserver and copy the content of its reply (to GET /) into this file.
Deadline: June 14, AoE
Solutions submitted after the deadline will not be accepted.
Note that at the time of the deadline we will download the contents of your project and start the evaluation. Anything uploaded/modified later on will not be taken into account!
Note that we will be looking only at your master branch (unless explicitly specified otherwise), do not forget to merge from other branches if you are using them.
Changelog
2021-05-26: SELinux (,Z), double entries in /etc/subuid, and podman system migrate.
2021-05-24: Re-login after usermod.