Today's lab will focus on containers, a very lightweight form of virtual machine. At the end, we will use this knowledge to set up a GitLab pipeline that executes code (such as tests) for each commit and keeps our code healthy (in the green).
Setup
Before starting with Podman, ensure you have an up-to-date copy of the examples repository. We will be using the subdirectory 14/.
Podman is not available in IMPAKT labs (actually, it is installed but you will not be able to execute anything). Feel free to use the shared machine linux.ms.mff.cuni.cz. However, it is much more comfortable to use your own machine as you do not have to set up further SSH port forwards etc.
To check that your setup is okay, try the following command:
podman run --rm docker.io/library/alpine:latest cat /etc/os-release
If you see something like the following, you have everything set up. Otherwise feel free to open an Issue on the Forum and we will try to help you (do not forget to state which distribution you are using).
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob df9b9388f04a done
Copying config 0ac33e5f5a done
Writing manifest to image destination
Storing signatures
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.15.4
PRETTY_NAME="Alpine Linux v3.15"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
If you run podman on linux.ms.mff.cuni.cz, always remove unused images. While the system has enough space for experimenting, the images can easily fill up the whole disk. Use podman images to list them and podman rmi IMAGE_ID to remove them once you do not need them (see below for further details).
Running the first container
The first execution will be a bit more complex to give you a taste of what is possible. We will explain the details in the following sections.
The following assumes you are inside the directory 14 in the examples repository. It will launch an Nginx web server.
podman run --rm --publish 8080:80/tcp -v ./web:/usr/share/nginx/html:ro docker.io/library/nginx:1.20.0
You will see similar output to the following.
Trying to pull docker.io/library/nginx:1.20.0...
Getting image source signatures
Copying blob 525e372d6dee done
Copying blob 69692152171a done
Copying blob b141b026b9ce done
Copying blob 8d70dc384fb3 done
Copying blob 965615a5cec8 done
Copying blob 6e60219fdb98 done
Copying config 7ab27dbbfb done
Writing manifest to image destination
Storing signatures
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/05/18 13:15:55 [notice] 1#1: using the "epoll" event method
2021/05/18 13:15:55 [notice] 1#1: nginx/1.20.0
2021/05/18 13:15:55 [notice] 1#1: built by gcc 8.3.0 (Debian 8.3.0-6)
2021/05/18 13:15:55 [notice] 1#1: OS: Linux 5.10.16-arch1-1
2021/05/18 13:15:55 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 524288:524288
2021/05/18 13:15:55 [notice] 1#1: start worker processes
2021/05/18 13:15:55 [notice] 1#1: start worker process 26
2021/05/18 13:15:55 [notice] 1#1: start worker process 27
2021/05/18 13:15:55 [notice] 1#1: start worker process 28
2021/05/18 13:15:55 [notice] 1#1: start worker process 29
Open http://localhost:8080/ in your browser. You should see an NSWI177 Test Page.
If you see 403 Forbidden instead, append ,Z to the -v option. Thus, the command would contain -v ./web:/usr/share/nginx/html:ro,Z.
This is needed (and generally a good practice) when you are running on a machine
with SELinux enabled in enforcing mode (default installation of Fedora but
not on the USB disks from us).
Terminate the execution by killing Podman with Ctrl-C.
Note that the running Nginx web server was printing its log (i.e., the list of accessed pages) to stdout.
Now open the page web/index.html in your browser. Again, you should see an NSWI177 Test Page, but the URL will point to your local filesystem (i.e., file:///home/.../examples/14/web/index.html).
The above example illustrated three important features that are available with containers:
- The web server in the container does not need any configuration or system-wide installation.
- The container can listen on ports of the host system and forward network communication inside the container.
- The container can access host’s files and use them.
All very good features for development, testing as well as distribution of your software.
Pulling and inspecting the images
The first thing that needs to be done when starting a container is to get its image. While Podman is able to pull the image as a part of the run subcommand, it is sometimes useful to fetch it as a separate step.
The command podman images prints a list of images that are present on your system.
The output may look like this.
REPOSITORY                TAG     IMAGE ID      CREATED      SIZE
docker.io/library/nginx   1.20.0  7ab27dbbfbdf  6 days ago   137 MB
docker.io/library/fedora  34      8d788d646766  2 weeks ago  187 MB
...
The repository refers to the on-line repository we fetched the image from. The tag is basically a version string. The image ID uniquely identifies the image; it is generally derived from a cryptographic hash of the image contents. The remaining columns are self-descriptive.
When you execute podman pull IMAGE:TAG, Podman will fetch the image without starting any container. If you use latest as a tag, the latest available version will be fetched.
Pull docker.io/library/python:3-alpine and check that it has appeared in podman images afterwards.
Shorter image names
If you paste the following content into /etc/containers/registries.conf.d/unqualified.conf, you will not need to type docker.io/ in front of every image name. This is called an unqualified search: the listed registries are tried whenever an image name is given without a registry prefix.
unqualified-search-registries = ["docker.io"]
Companies can have their own repositories, and you may list multiple repositories here if you want Podman to try more of them when a fully qualified name is not provided.
Image repository
If you wonder where the images are coming from, have a look at https://hub.docker.com/. Anyone can upload their images there for others to use.
Similarly to the Python Package Index, you may find malicious images here. At least the containers run isolated, so the chances of misbehaviour are somewhat limited (compared to pip install, which you execute in the context of a normal user).
Images from the library group are official images endorsed by Docker itself and hence are relatively trustworthy.
Running containers
After the image is pulled, we can create a container from it.
We will start with an Alpine image because it is very small and thus very fast.
podman run --interactive --tty alpine:latest /bin/sh
If all went fine, you should see an interactive prompt / # and inspecting /etc/os-release should show you the following text (version numbers may differ):
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.13.5
PRETTY_NAME="Alpine Linux v3.13"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"
The run subcommand starts a container from a specified image. With --interactive and --tty (often combined into a single -it) we specify that we want to attach a terminal to the container because we will use it interactively. The last part of the command is the program to run.
Inside the container, we can execute any commands we wish. We are securely contained and the changes will not affect the host system.
Install curl and check that you have functional network access.
Open a second terminal so that we can inspect how the container looks from the outside.
Inside the container, execute sleep 111 and in the other terminal (the one running on the host) execute ps -ef --forest. You should see lines like the following:
student 1477313 1 0 16:29 ? 00:00:00 /usr/bin/conmon ...
student 1477316 1477313 0 16:29 pts/0 00:00:00 \_ /bin/sh
student 1477370 1477316 0 16:33 pts/0 00:00:00 \_ sleep 111
This confirms that the processes inside a container are visible from the outside.
Run ps -ef inside a container (or look into /proc there). What do you see? Is there something surprising?
Also execute podman ps. It prints a list of running containers.
CONTAINER ID  IMAGE                            COMMAND  CREATED        STATUS            PORTS  NAMES
643b5e7cea06  docker.io/library/alpine:latest  /bin/sh  4 minutes ago  Up 4 minutes ago         practical_bohr
The container ID is again a unique identifier; the other columns are self-descriptive. Note that since we have not specified a name, Podman assigned a random one.
If you terminate the session inside the container (exit or Ctrl-D), you will return to the host terminal. Execute podman ps again. It is empty: the container is not running. If you add --all, you will see that the STATUS has changed.
Exited (130) 1 second ago
Note that if we executed podman run ... again, we would start a new container. Try it now.
We will describe the container life cycle later on. If you wish to remove the container now, execute podman rm NAME. In place of NAME, you can use the randomly assigned name or the CONTAINER ID.
Single shot runs
You can pass any command to podman run to be executed. If you know that you would be removing the container immediately afterwards, you can add --rm to tell Podman to remove it automatically once it finishes execution.
podman run --rm alpine:latest cat /etc/os-release
If you want to pass a more complicated command, it is better to pass it via sh -c.
Change the above command to first cd to etc and then call cat os-release.
Why does the following not work? podman run --rm alpine:latest cd /etc && cat os-release
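The reason can be explored without Podman at all, because the command line is split by the host shell before podman is even started. The sketch below uses a hypothetical helper show_args (not part of Podman) that merely prints the arguments it receives; it stands in for podman run --rm alpine:latest.

```shell
# Hypothetical stand-in for "podman run --rm alpine:latest":
# it only prints each argument it receives on its own line.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Unquoted: the host shell splits the line at &&, so show_args gets
# only "cd" and "/etc"; the echo after && runs on the host.
unquoted=$(show_args cd /etc && echo "this part ran on the host")

# Quoted and passed to sh -c: the whole command line arrives as a
# single argument that the shell inside the container would interpret.
quoted=$(show_args sh -c 'cd /etc && cat os-release')

printf 'Unquoted:\n%s\nQuoted:\n%s\n' "$unquoted" "$quoted"
```

With the real command, the quoted form is podman run --rm alpine:latest sh -c 'cd /etc && cat os-release'.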
Managing container life cycle
Starting a container
After we have terminated the interactive session, the container exited. We can call podman start CONTAINER to start it again.
Each container has a so-called entry point that is executed when the container is started. For a service-style container (e.g., with a web server), the service would be started again.
For our Alpine example, the entry point is /bin/sh (shell), so nothing interesting will happen. Check that the container is running with podman ps.
Attaching to a running container
When the container is running, we can attach to it. podman attach basically connects the stdout of the entrypoint to your terminal. With our Alpine container, we can run commands inside the container again.
We can also call podman exec -it CONTAINER CMD, which connects to the running container in a new terminal (like a new tab). For us, running the following would work (replace the name with your container name).
podman exec -it practical_bohr /bin/sh
Run ps -ef inside the container again. Which processes do you see?
Terminating the exec-ed shell returns us back to the host. Terminating the attach-ed shell terminates the whole container.
Containers in background (with names)
For service-style containers (e.g. nginx, which provides the web server), we often want to run them in daemon mode, i.e. in the background. That is possible with the --detach option of the run command. We will also give it the name webserver so that we can easily refer to it.
podman run --detach --name webserver --publish 8080:80/tcp -v ./web:/usr/share/nginx/html:ro nginx:1.20.0
We will explain -v and --publish later on.
This command starts the container and terminates. The webserver is running in the background. Check that you can again access http://localhost:8080/ in your browser.
You can stop such a container with podman stop webserver. This is similar to systemctl stop ..., and that is not a coincidence.
Check that after stopping the webserver, http://localhost:8080/ no longer works.
Starting the container again is possible with podman start webserver.
start and stop and stdout
Note that both start and stop print the name of the container that was started (stopped) on stdout. That is useful when executed in scripts; for interactive use we can simply ignore the output.
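A minimal sketch of how a script might use that output follows. The function name restart_container is made up for this illustration; it relies only on podman stop and podman start printing the container they acted on.

```shell
# Hypothetical helper: restart a container and report what happened.
# Usage: restart_container NAME
restart_container() {
    name=$1
    # Both subcommands print the container they acted on to stdout,
    # so command substitution can capture it.
    stopped=$(podman stop "$name") || return 1
    started=$(podman start "$name") || return 1
    echo "Stopped: $stopped, started: $started"
}
```

For the container from this section, the call would be restart_container webserver.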
Clean-up actions
When we are done with a container, we can remove it (but first, we need to stop it). Executing the following command would remove the webserver container completely.
podman rm webserver
You can also remove pull-ed images using the rmi subcommand. For example, to remove nginx:1.20.0, you can execute the following command.
podman rmi nginx:1.20.0
Note that Podman will refuse to remove an image if it is used by an existing container. Recall that the images are stacked and hence Podman cannot remove the underlying layers.
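Because the order of the clean-up steps matters, it may help to see them together. The following is only a sketch: the function name is an assumption made for this example, and its arguments would be the container name and the image used above.

```shell
# Hypothetical helper: stop and remove a container, then remove its image.
# Usage: remove_container_and_image CONTAINER IMAGE
remove_container_and_image() {
    container=$1
    image=$2
    # 1. A running container has to be stopped first.
    # 2. Then the container itself can be removed.
    # 3. The image can be removed once no container uses it.
    podman stop "$container" &&
        podman rm "$container" &&
        podman rmi "$image"
}
```

A call for the web server example would be remove_container_and_image webserver nginx:1.20.0. The chain stops at the first failing step, so the image is kept if the container could not be removed.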
Limiting the isolation
By default, a container is an isolated world. If you want to access it from the outside, you have to exec into it (for terminal-style work) or publish its services to the outside.
Port forwarding (a.k.a. port publishing)
For server-style containers (e.g. the Nginx one we used above), that means exposing some of its ports to the host computer. That is done with the --publish argument, where you specify which port on the host (e.g., 8080) shall be forwarded into the container, and to which port and protocol there (e.g., 80 and tcp).
Therefore, the argument --publish 8080:80/tcp means that we expect the container itself to offer a service on its port 80 and we want to make this (container's) port available as 8080 on the host. It is similar to SSH port forwarding with -L.
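To make the anatomy of the argument concrete, the sketch below takes a specification of the form HOST_PORT:CONTAINER_PORT/PROTOCOL apart using plain shell parameter expansion. The function describe_publish is hypothetical and handles only this three-part form; it does not call Podman and is not how Podman itself parses the option.

```shell
# Hypothetical helper: explain a --publish value such as 8080:80/tcp.
# Only the HOST_PORT:CONTAINER_PORT/PROTOCOL form is handled.
describe_publish() {
    spec=$1
    host_port=${spec%%:*}        # text before the first colon
    rest=${spec#*:}              # text after the first colon
    container_port=${rest%%/*}   # text before the slash
    protocol=${rest#*/}          # text after the slash
    echo "host port $host_port -> container port $container_port ($protocol)"
}

describe_publish 8080:80/tcp
```

For the value used in this lab, it prints host port 8080 -> container port 80 (tcp).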
We can start the nginx container without --publish, but it does not make much sense. Why?
Volume mounts
Another way to break the container isolation is to bind a certain directory into the container. There are several ways to do that; we will show the --volume (or -v) parameter.
It takes three colon-separated arguments: the source directory on the host, the mount point inside the container, and options.
Our example ./web:/usr/share/nginx/html:ro thus specified that the local (host) directory web shall be visible under /usr/share/nginx/html inside the container in read-only mode. It is very similar to the normal mounts you already know. If you specify rw instead of ro, you can modify the files from inside the container.
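As an illustration of how the three fields fit together, the following sketch wraps the Nginx command from the beginning of this lab into a function that serves an arbitrary directory. The name serve_directory and the default port are assumptions for this example; the Z option is added for machines with SELinux, as discussed earlier.

```shell
# Hypothetical helper: serve a host directory through the Nginx image.
# Usage: serve_directory DIRECTORY [HOST_PORT]
serve_directory() {
    dir=$1
    port=${2:-8080}
    # Turn the directory into an absolute path for the volume source.
    abs_dir=$(cd "$dir" && pwd) || return 1
    # Volume fields: host directory, mount point in the container, options.
    podman run --rm --publish "$port:80/tcp" \
        -v "$abs_dir:/usr/share/nginx/html:ro,Z" \
        docker.io/library/nginx:1.20.0
}
```

Inside the directory 14, calling serve_directory web would behave like the very first example; serve_directory web 9090 would publish the server on port 9090 instead.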
Volume mounting is useful for any service-style container. A typical example is a database server. You start the container and you give it a mounted volume. It will store the actual database (the data files) in this volume (directory). Thus, when the container terminates, your data are actually persistent as they were stored outside of the container.
This has a huge advantage for testing service updates. You stop the container, make a backup of the data directory and start a new container (with a newer version) on top of the same data directory. If everything works fine, you are good to go. Otherwise, you can stop the new container, restore from the backup and return to the old version.
Very simple and effective.
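The update procedure described above could be scripted roughly as follows. This is a sketch under several assumptions: the function name upgrade_service is made up, the host data directory is given as an absolute path, and the mount point inside the container depends on the particular image and therefore has to be supplied by the caller. With rootless Podman, files written by the container may belong to other user IDs, so the copy might need extra privileges; that is not handled here.

```shell
# Hypothetical helper: replace a service container with a newer image
# while keeping a backup of its data directory.
# Usage: upgrade_service NAME NEW_IMAGE HOST_DATA_DIR CONTAINER_DATA_DIR
upgrade_service() {
    name=$1
    new_image=$2
    data_dir=$3
    mount_point=$4
    backup_dir="$data_dir.backup"

    # Refuse to overwrite an earlier backup.
    if [ -e "$backup_dir" ]; then
        echo "Backup $backup_dir already exists" >&2
        return 1
    fi

    podman stop "$name" || return 1
    # Copy the data while nothing is writing to it.
    cp -a "$data_dir" "$backup_dir" || return 1
    # Free the name, then start the new version on the same data.
    podman rm "$name" || return 1
    podman run --detach --name "$name" \
        -v "$data_dir:$mount_point:rw,Z" "$new_image"
}
```

Returning to the old version would follow the same pattern in reverse: stop and remove the new container, restore the data directory from the backup, and run the old image again.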
Exercise
Apache web server
Start the Apache web server on top of the 14/web directory. Use this httpd image. Verify that you are really using the Apache web server.
Python applications
Install the timestamp2iso command system-wide. We recommend using python:3.9-alpine.
Note that you will not need to set up any virtual environment in this case: the whole machine (container) is yours. You can install things system-wide.
GitLab CI
We will now see how to actually configure CI on your GitLab repositories.
In this course we will focus on the simplest configuration where we want to execute tests after each commit. GitLab can be configured for more complex tasks where software can be even deployed to a virtual cloud machine but that is unfortunately out of scope.
If you are interested in this topic, GitLab has an extensive documentation for continuous integration and continuous deployment (CI/CD). The documentation is often densely packed with a lot of information, but it is a great source of knowledge not only about GitLab, but about many software engineering principles in general.
.gitlab-ci.yml
The configuration of the GitLab CI is stored in the file .gitlab-ci.yml, which has to be placed in the root directory of the project.
Your submission repository contains a slightly more complex setup where we fetch the actual configuration on-line so that only active tasks and quizzes are evaluated (without needing you to keep the repository up-to-date).
The timestamp2iso project, however, now contains a very simple GitLab CI configuration.
base-tests:
  image: python:3.9-alpine
  script:
    - apk add bats
    - pip install .
    - ./tests/base.bats
It specifies a pipeline job base-tests (you will see this name in the web UI) that is executed using python:3.9-alpine and it executes three commands. The first one installs a dependency, the second one installs the actual package (the project) and the last one executes simple BATS tests.
Note that GitLab will mount the Git repository into the container first and then execute the commands inside the clone. The commands are executed with set -e: the first failing command terminates the whole pipeline.
Emulate the run locally.
Note that the command you created for running the script locally on top of the given image is virtually identical to the one executed by GitLab. GitLab does some extra caching and other performance-related tweaks, but conceptually, there is nothing more. And your code is tested in a reproducible way in a clean container (that is, in a sense, indistinguishable from a full virtual machine).
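One possible form of such a local command is sketched below. It assumes that it is run from the root of the project clone; the function name run_ci_job_locally and the mount point /srv/project are choices made for this illustration and are not what GitLab uses internally.

```shell
# Hypothetical helper: run the three commands of the base-tests job
# in a fresh container on top of the current directory.
run_ci_job_locally() {
    project_dir=$(pwd)
    # Mount the project, make it the working directory and run the
    # job commands with set -e, so the first failure stops the run.
    podman run --rm \
        -v "$project_dir:/srv/project:rw,Z" \
        --workdir /srv/project \
        docker.io/library/python:3.9-alpine \
        sh -c 'set -e; apk add bats; pip install .; ./tests/base.bats'
}
```

The exit code of run_ci_job_locally is the exit code of the container, so a failing test is visible to the calling script as well.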
Exercise
Add your own pipeline to GitLab that would check that you never use /usr/bin/python in a shebang.
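The check itself could look like the following sketch, which a job's script section might then call. The function name check_shebangs is an assumption, and so is the decision to also reject versioned variants such as /usr/bin/python3; adjust the patterns if your policy differs. The demonstration at the end creates two throw-away files only to show the behaviour.

```shell
# Hypothetical helper: report files whose first line is a shebang
# pointing at /usr/bin/python (with or without a version suffix).
# Returns 1 if at least one such file was found, 0 otherwise.
check_shebangs() {
    bad=0
    for file in "$@"; do
        [ -f "$file" ] || continue
        first_line=$(head -n 1 "$file")
        case "$first_line" in
            '#!/usr/bin/python'*|'#! /usr/bin/python'*)
                echo "Bad shebang in $file: $first_line" >&2
                bad=1
                ;;
        esac
    done
    return "$bad"
}

# Demonstration on two temporary files.
demo_dir=$(mktemp -d)
printf '#!/usr/bin/python3\nprint("hi")\n' > "$demo_dir/bad.py"
printf '#!/usr/bin/env python3\nprint("hi")\n' > "$demo_dir/good.py"
if check_shebangs "$demo_dir"/*.py; then
    echo "All shebangs are fine"
else
    echo "At least one bad shebang found"
fi
rm -r "$demo_dir"
```

In a repository, the list of files could come from git ls-files, for example check_shebangs $(git ls-files), as long as no file name contains whitespace.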
Other bits
Notice how using the GitLab pipeline is easy. You find the right image, specify your script, and GitLab takes care of the rest.
From now on, every project you create on GitLab should have a pipeline that runs the tests (this includes Shellcheck, Pylint etc.). Set it up NOW for your assignments in other courses. Set it up for your Individual Software Project (NPRG045) next year. Use the chance to have your code regularly tested. It will save you time in the long run.
If you are unsure about which image to choose, official images are a good start. The script can have several steps where you install missing dependencies before running your program.
Recall that you do not need to create a virtual environment: the whole machine is yours (and will be removed afterwards), so you can install things globally. Recall the example above where we executed pip install without starting a virtual environment.
There can be multiple jobs defined that are run in parallel (actually, there can be quite complex dependencies between them, but in the following example, all jobs are started at once).
The example below shows a fragment of .gitlab-ci.yml that tests the project on multiple Python versions.
# Default image if no other is specified
image: python:3.10
stages:
  - test
# Commands executed before each "script" section (for any job)
before_script:
  # To have a quick check that the version is correct
  - python --version
  # Install the project
  - python -m pip install ...
# Run unit tests under different versions
unittests3.7:
  stage: test
  image: "python:3.7"
  script:
    - pytest --log-level debug tests/
unittests3.8:
  stage: test
  image: "python:3.8"
  script:
    - pytest --log-level debug tests/
unittests3.9:
  stage: test
  image: "python:3.9"
  script:
    - pytest --log-level debug tests/
unittests3.10:
  stage: test
  image: "python:3.10"
  script:
    - pytest --log-level debug tests/
Graded tasks (deadline: May 29)
14/shellcheck.sh (+ .gitlab-ci.yml) (60 points)
Write a script that runs ShellCheck for all scripts in your repository. Update your .gitlab-ci.yml to execute this script with each commit (push). The pipeline shall fail if any of your scripts contains ShellCheck issues. Name the pipeline shellcheck so we can easily find it.
Feel free to reuse code from the assert_is_shellchecked function in our tests for your implementation. Also consider reusing parts of the script for checking for bad shebangs from the exercise above.
UPDATE: feel free to add your pipeline (job) definition at the end of the existing .gitlab-ci.yml (so that existing pipelines are still executed). You will need to add stage: tests to your pipeline definition (otherwise you might get the error shellcheck job: chosen stage does not exist; available stages are .pre, tests, .post). See the unittests3.10 pipeline definition above for an example.
14/command.txt (15 points)
The image registry.gitlab.com/mffd3s/nswi177/labs-2022-command:latest contains a command nswi177-task-command. Run this command with your GitLab username and paste its output into 14/command.txt.
14/volume.txt (25 points)
The image registry.gitlab.com/mffd3s/nswi177/labs-2022-volume:latest contains a command nswi177-task-volume. Mount your submission repository under /srv/nswi177/ into a container using this image and run this command. If everything is fine, the command will print two hexadecimal strings. Copy them into 14/volume.txt.
Note that your submission repository must have been cloned via SSH.
Learning outcomes
Conceptual knowledge
Conceptual knowledge is about understanding the meaning and context of given terms and putting them into context. Therefore, you should be able to …
- explain what is a container (compared to virtual machine and a process)
- explain where the container isolation is useful
- explain container life-cycle
- explain principles of continuous integration (and reasons why it exists)
- explain why further sandboxing (e.g. virtualenv) is not needed inside a container
Practical skills
Practical skills are usually about using given programs to solve various tasks. Therefore, you should be able to …
- start interactive Podman container
- start service-style Podman container
- expose container ports
- mount a volume into a container
- clean unused containers and images
- prepare single-job GitLab CI configuration to build and test a Python program