Docker, Podman and GitLab CI (14) | Labs | NSWI177

The information below does not apply to the current semester. The page for the current semester is here.

Before-class reading
Setup
Running the first container
Pulling and inspecting the images
Running containers
Managing container life cycle
Clean-up actions
Limiting the isolation
Exercises
GitLab CI
Graded tasks (deadline: May 29)
Learning outcomes

This lab focuses on containers, which behave much like very lightweight virtual machines. At the end of the lab, we will use that knowledge to set up a pipeline on GitLab so that it runs our code (tests, for example) on every commit and keeps our code in a good (green) state.

Before-class reading

You will be able to run the examples on the shared machine linux.ms.mff.cuni.cz. However, it is much more convenient to get them running on your own computer.

The installation part is clearly marked and you can return to it at any later time to finish the configuration on your machine.

A gentle introduction to containers

In the introductory part of the text, we intentionally neglect the difference between an image and a container. We believe this simplifies things for the first steps.

Containers are another option for isolation. So far, we have seen project isolation via sandboxing, and many of you have tried a virtualized installation of Linux.

Containers are somewhere in between. They offer an isolated environment that essentially behaves like a virtualized machine. From the implementation point of view, they are closer to virtual environments because the processes inside a container are visible from the host. We can imagine that a container gets a single directory (including the usual subdirectories such as /dev, /proc or /home) and cannot escape from it.

Because the processes of a container run directly on the host's kernel, containers can only run applications written for the same operating system as the host (unlike a full virtual machine).

Because containers are separated from the host system, they are very useful in many situations. Of course, it is always possible to use a fully virtualized machine (e.g., VirtualBox or QEMU), but containers are lighter and have lower overhead (for example, they start faster).

The separation from the host system is fairly strong: without further configuration, a container cannot access the host's file system and cannot listen on any ports (for incoming connections). It can, however, initiate outgoing communication (e.g., to download packages that need to be installed). A container can also be limited in the amount of RAM it may use. By default, processes inside a container are scheduled as ordinary processes (i.e., they have the same priority), but it is possible to limit their access to the CPU as well (i.e., to slow them down as unimportant tasks).

A typical example is the need to run an isolated server that you need for development. Think of a database or web server, for example. You can certainly install such a server into your system (recall lab 10). But there it is not separated from the system in any way, and uninstalling it is not entirely straightforward either. Recall how virtual environments work: by removing a single directory, we completely remove the whole environment.

Similarly, removing a container is a simple and fast operation, and you can start a new one within a few seconds.

Using a container also lets you specify exactly how everything will look: which processes will be started, which port the container will listen on, etc. This configuration can then be easily codified (similarly to requirements.txt) and thus easily reproduced on a different machine.

Container images are also often used when you need to distribute a complex application that requires several services to run. Instead of writing a detailed manual or providing a disk image for VirtualBox, you can supply a container that only needs to be started. The user then starts the whole container, which sorts out the rest internally and exposes the target service to the outside. For example, the whole of GitLab can be downloaded and used in a container.

Docker and Podman

In this lab, we will look at the basics of Linux containers built on Docker and Podman. The two implementations are essentially the same. Their main commands (docker and podman) support virtually the same arguments and almost always have the same semantics.

The main difference is that Docker is somewhat older (but still actively developed) and was intended for system containers (e.g., if you wanted to run your own GitLab). Podman is somewhat younger and uses newer features of the Linux kernel that allow it to run containers without superuser privileges (which is still a relatively new feature of Linux). In addition, Podman integrates nicely with the rest of the system.

From this point of view, Podman is a perfect choice for developers. Need a database server? Use Podman with the right container and start it. Your database is ready to use, with no need for the privileges of the superuser root (you will often see this called rootless mode).

On the other hand, if you have a somewhat older version of Linux, or the container requires Docker-specific features, Docker may be the choice for you.

Terminology

Two concepts are important in this lab: the image and the container. They are somewhat similar to a class and an object (instance).

An image is like a hard disk for an isolated environment. It contains all the necessary files: executables as well as data files.

To run it, we create a container. A container starts in the same state as the image, but it also has running processes that can change its state. Unless we explicitly ask for it, changes made by the container are not saved back to the image: instead, the container is started with a copy of the image and modifies that copy.

Processes in a container are isolated from their surroundings (the host), and the container does not see the host's processes.

On the other hand, processes in a container are visible on the host. The container's root directory corresponds to some subdirectory on the host. User IDs in the container are translated to user IDs of the host. The same holds for groups.

Docker/Podman usually run their processes with the privileges of the container's root user, which appears from the outside (on the host) as an ordinary user (usually with a very high UID).

Side note: stacking images

New images are usually derived from other, already existing ones. For example, there are base system images, and further specialized images are built from them. This simplifies configuration because we can start from a prepared state and do not have to start from scratch.

To save space, derived images are stored only as a difference against the base (parent) image. The differences are then overlaid on the base image when a container is created.

This improves performance and saves space on disk and in memory (if you run several containers with the same base image, a single shared instance of the base image is enough). Also, when you download a new image, you only need to download the differences if you have already downloaded the base image.

This mechanism is somewhat similar to what Git does. Git pretends that every commit is a complete copy of the whole file tree of the project. Internally, however, it stores unchanged files only once and can compress similar ones as differences.

Distributions and Alpine

Images can be built on top of various distributions. Thanks to that, containers are an easy way to try your program on different distributions without having to install a triple- (or more-) boot or maintain several virtual machines.

You will soon see that many containers are built on the Alpine Linux distribution. It is a minimalist distribution (in both size and complexity): it takes about 6 MB and has no complicated configuration.

Alpine uses apk (Alpine Package Keeper) for package management. For example, the following command installs curl (which is not installed by default):

apk add curl

GitLab CI

In fact, you have already been using GitLab's continuous integration features. A pipeline in GitLab is one of them.

If you have never heard the term continuous integration before, here it is in a nutshell. To ensure that the software you are working on is in a reasonable state, you should run tests as often as possible and fix bugs as early as possible (because the cost of fixing a bug rises dramatically with every day you do not know about it). The solution is that the developer should run all tests on every commit. That is hard to enforce, though, so it is better to do it automatically. CI in its simplest form therefore means that automated tests (e.g., BATS or Python Nose) are run after every push to the origin/master branch, for example on GitLab.

In this lab, you will see how to set up GitLab CI to suit your needs.

It is important to know that GitLab CI can run on top of Podman containers. So to set up a GitLab pipeline, you choose a Podman image and the commands that need to be run in such a container. GitLab then creates the container and runs your commands in it.

Depending on the result of the whole script (i.e., its exit code), GitLab then marks the pipeline as either passed or failed.

Setting up Docker/Podman

Install Docker or Podman.

The following command should help you decide which one you actually need.

grep cgroup /proc/filesystems

If you see only the following line, your kernel does not know cgroups v2, which Podman needs.

nodev	cgroup

If, however, you see the following, you have cgroups v2 available and should use Podman.

nodev	cgroup
nodev	cgroup2

Then continue with the installation. The latest versions of Fedora have already switched to cgroups v2, and installing Podman is the only option there. So install it with sudo dnf install podman.
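The decision described above can also be scripted. The following is only a minimal sketch: the function name suggest_container_tool is made up for this example, and it assumes that the presence of cgroup2 in the list is the only criterion, exactly as in the text above.

```shell
# Hypothetical helper (not part of Podman or Docker): suggest a tool based on
# whether a /proc/filesystems-style file lists cgroup2.
suggest_container_tool() {
    fs_file="${1:-/proc/filesystems}"
    # -w matches cgroup2 as a whole word, so a plain "cgroup" line does not count.
    if grep -qw 'cgroup2' "$fs_file" 2>/dev/null; then
        echo "podman"
    else
        echo "docker"
    fi
}

# Without an argument, the function inspects the running kernel.
suggest_container_tool
```

The optional argument exists only so that the function can be tried on a saved copy of the file.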

All examples will use the podman command. If your distribution does not support Podman, replace it with sudo docker.

Podman: setting up `/etc/subuid` and `/etc/subgid`

As explained above, Podman needs a range of free user and group IDs so that it can map the UIDs and GIDs from the container onto them.

The superuser can assign blocks of UIDs/GIDs to ordinary users, who can then use them this way. These are called sub-UIDs/sub-GIDs, and their configuration is recorded in the files /etc/subuid and /etc/subgid.

First, please check whether your /etc/subuid contains something like intro:100000:65536. If so, everything is ready and you can skip the rest of this section.

Otherwise, make sure the file exists and create a new assignment using usermod:

sudo touch /etc/subuid /etc/subgid
sudo usermod --add-subuids 100000-165535 --add-subgids 100000-165535 YOUR_LOGIN
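Each line in these files has the form login:first-ID:count, so intro:100000:65536 gives the user intro a block of 65536 IDs starting at 100000. The check from the beginning of this section can be sketched as a small shell function; the name has_subid_range and its optional file argument are introduced here only for illustration.

```shell
# Hypothetical helper: succeed if LOGIN has at least one entry in a
# subuid/subgid-style file (lines of the form login:first-id:count).
has_subid_range() {
    login="$1"
    subid_file="${2:-/etc/subuid}"
    grep -q "^${login}:[0-9][0-9]*:[0-9][0-9]*\$" "$subid_file" 2>/dev/null
}

if has_subid_range "$(id -un)"; then
    echo "sub-UIDs are configured"
else
    echo "no sub-UID range found, see the usermod command above"
fi
```

The same function can be pointed at /etc/subgid by passing it as the second argument.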

A system (package) update may occasionally break Podman for various reasons. If that happens to you, always try running podman system migrate first, which usually resolves most errors related to the move to a newer version.

Docker: starting the service

For Docker, you need to ensure that docker is up and running. Typically, the following commands would be sufficient:

sudo package-manager-of-your-distribution install docker
sudo systemctl enable docker
sudo systemctl start docker

Basic functionality test

Run podman info to get basic information about the system. You will probably see something like:

host:
  arch: amd64
  ...
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    ...
  ...
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  ...
  os: linux
...
store:
  graphRoot: $HOME/.local/share/containers/storage
  ...
  runRoot: /run/user/1000/containers
  volumePath: $HOME/.local/share/containers/storage/volumes
version:
  APIVersion: 3.0.0
  ...

When debugging problems with Podman, always paste this information (without any further edits) into the Issue description (paste it as text inside ```, of course, not as a screenshot!).
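The idMappings part of the sample output describes how IDs inside the container translate to IDs on the host: container ID 0 (root) is mapped to host ID 1000, and container IDs from 1 upwards are mapped onto a block of 65536 IDs starting at host ID 100000. As a rough illustration of that arithmetic, the sketch below hard-codes the values from the sample; the function name container_uid_to_host is made up and the numbers on your machine will differ.

```shell
# Hypothetical helper: translate a container UID to a host UID using the two
# ranges from the sample "podman info" output (values assumed, not universal):
#   container 0        -> host 1000            (size 1)
#   container 1..65536 -> host 100000..165535  (size 65536)
container_uid_to_host() {
    cuid="$1"
    if [ "$cuid" -eq 0 ]; then
        echo 1000
    elif [ "$cuid" -ge 1 ] && [ "$cuid" -le 65536 ]; then
        echo $((100000 + cuid - 1))
    else
        echo "container UID ${cuid} is not mapped" >&2
        return 1
    fi
}

container_uid_to_host 0      # prints 1000
container_uid_to_host 1000   # prints 100999
```

This is why files created by a non-root user inside a rootless container show up on the host with an unusually high owner ID.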

To verify that you can run containers, try the following command:

podman run --rm docker.io/library/alpine:latest cat /etc/os-release

If you see something like the following output, everything is ready. Otherwise, feel free to open an Issue on the Forum and we will try to resolve it (do not forget to say which distribution you are using).

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.15.4
PRETTY_NAME="Alpine Linux v3.15"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"

Before-class quiz

The quiz file is in the folder 14 in this GitLab project.

Copy the right language mutation into your project as 14/before.md (i.e., you will need to rename the file).

The questions and space for answers are in the file; fill in your answers between the **[A1]** and **[/A1]** markers.

The before-14 pipeline on GitLab will check that you have submitted the answers in the correct format. For obvious reasons, it cannot check their actual correctness.

Submit the quiz before the start of lab 14.

Setup

Before starting with Podman, ensure you have an up-to-date copy of the examples repository. We will be using the subdirectory 14/.

Podman is not available in the IMPAKT labs (actually, it is installed but you will not be able to execute anything). Feel free to use the shared machine linux.ms.mff.cuni.cz. However, it is much more comfortable to use your own machine as you do not have to set up further SSH port forwards etc.

To check that your setup is okay, try the following command:

podman run --rm docker.io/library/alpine:latest cat /etc/os-release

If you see something like the following output, everything is ready. Otherwise, feel free to open an Issue on the Forum and we will try to resolve it (do not forget to say which distribution you are using).

Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob df9b9388f04a done
Copying config 0ac33e5f5a done
Writing manifest to image destination
Storing signatures
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.15.4
PRETTY_NAME="Alpine Linux v3.15"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"

If you run podman on linux.ms.mff.cuni.cz, always remove unused images. While the system has enough space for experimenting, the images can easily fill up the whole disk. Use podman images to list them and podman rmi IMAGE_ID to remove them once you do not need them (see below for further details).

Running the first container

The first execution will be a bit more complex to give you a taste of what is possible. We will explain the details in the following sections.

The following assumes you are inside the directory 14 in the examples repository. It will launch an Nginx web server.

podman run --rm --publish 8080:80/tcp -v ./web:/usr/share/nginx/html:ro docker.io/library/nginx:1.20.0

You will see similar output to the following.

Trying to pull docker.io/library/nginx:1.20.0...
Getting image source signatures
Copying blob 525e372d6dee done
Copying blob 69692152171a done
Copying blob b141b026b9ce done
Copying blob 8d70dc384fb3 done
Copying blob 965615a5cec8 done
Copying blob 6e60219fdb98 done
Copying config 7ab27dbbfb done
Writing manifest to image destination
Storing signatures
/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
10-listen-on-ipv6-by-default.sh: info: Enabled listen on IPv6 in /etc/nginx/conf.d/default.conf
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/05/18 13:15:55 [notice] 1#1: using the "epoll" event method
2021/05/18 13:15:55 [notice] 1#1: nginx/1.20.0
2021/05/18 13:15:55 [notice] 1#1: built by gcc 8.3.0 (Debian 8.3.0-6)
2021/05/18 13:15:55 [notice] 1#1: OS: Linux 5.10.16-arch1-1
2021/05/18 13:15:55 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 524288:524288
2021/05/18 13:15:55 [notice] 1#1: start worker processes
2021/05/18 13:15:55 [notice] 1#1: start worker process 26
2021/05/18 13:15:55 [notice] 1#1: start worker process 27
2021/05/18 13:15:55 [notice] 1#1: start worker process 28
2021/05/18 13:15:55 [notice] 1#1: start worker process 29

Open http://localhost:8080/ in your browser. You should see an NSWI177 Test Page.

If you see 403 Forbidden instead, append ,Z to the -v argument. Thus, the command would contain -v ./web:/usr/share/nginx/html:ro,Z. This is needed (and generally a good practice) when you are running on a machine with SELinux enabled in enforcing mode (the default on a standard installation of Fedora, but not on the USB disks from us).

When running on linux.ms.mff.cuni.cz you will need to specify a unique port number (only one application can listen at given port).

Virtually any number is fine as long as it is greater than 1024 and does not collide with anything else.

You may also wish to set up SSH port forwarding for that port from linux.ms.mff.cuni.cz so that you can see the result in a graphical browser.

But curl would work fine too :-).

Terminate the execution by killing Podman with Ctrl-C.

Note that the running Nginx webserver was printing its log – i.e., the list of accessed pages – to stdout.

Now open the page web/index.html in your browser. Again, you should see an NSWI177 Test Page, but the URL will point to your local filesystem (i.e., file:///home/.../examples/14/web/index.html).

The above example illustrated three important features that are available with containers:

The web server in the container does not need any configuration or system-wide installation.
The container can listen on ports of the host system and forward network communication inside the container.
The container can access host’s files and use them.

All very good features for development, testing as well as distribution of your software.

Pulling and inspecting the images

The first thing that needs to be done when starting a container is to get its image. While Podman is able to pull the image as a part of the run subcommand, it is sometimes useful to fetch it as a separate step.

The command podman images prints a list of images that are present on your system. The output may look like this.

REPOSITORY                        TAG                  IMAGE ID      CREATED        SIZE
docker.io/library/nginx           1.20.0               7ab27dbbfbdf  6 days ago     137 MB
docker.io/library/fedora          34                   8d788d646766  2 weeks ago    187 MB
...

The repository refers to the on-line repository we fetched the image from. The tag is basically a version string. The image ID uniquely identifies the image; it is generally derived from a cryptographic hash of the image contents. The remaining columns are self-explanatory.

When you execute podman pull IMAGE:TAG, Podman will fetch the image without starting any container. If you use latest as a tag, the image currently tagged latest (by convention, the newest available version) will be fetched.

Pull docker.io/library/python:3-alpine and check that it has appeared in podman images afterwards.

Shorter image names

If you paste the following content into /etc/containers/registries.conf.d/unqualified.conf, you will not need to type docker.io/ in front of every image name. This is called an unqualified search: the listed registries are tried whenever an image name does not specify a registry.

unqualified-search-registries = ["docker.io"]

Companies can have their own repositories, and you may list multiple repositories here if you wish to try more of them when a fully qualified name is not provided.

Image repository

If you wonder where the images are coming from, have a look at https://hub.docker.com/. Anyone can upload their images there for others to use.

As with the Python Package Index, you may find malicious images there. At least the containers run isolated, so the chances of misbehaviour are somewhat limited (compared to a pip install that you execute in the context of a normal user).

Images from the library group are official images endorsed by Docker itself and hence are relatively trustworthy.

Running containers

After the image is pulled, we can create a container from it.

We will start with an Alpine image because it is very small and thus very fast.

podman run --interactive --tty alpine:latest /bin/sh

If all went fine, you should see an interactive prompt / # and inspecting /etc/os-release should show you the following text (version numbers may differ):

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.13.5
PRETTY_NAME="Alpine Linux v3.13"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://bugs.alpinelinux.org/"

The run subcommand starts a container from a specified image. With --interactive and --tty (often combined into a single -it) we specify that we want to attach a terminal to the container because we will use it interactively. The last part of the command is the program to run.

Inside the container, we can execute any commands we wish. We are securely contained and the changes will not affect the host system.

Install curl and check that you have functional network access.

Open a second terminal so that we can inspect how the container looks from the outside.

Inside the container, execute sleep 111 and in the other terminal (the one running on the host) execute ps -ef --forest. You should see lines like the following:

student    1477313       1  0 16:29 ?        00:00:00 /usr/bin/conmon ...
student    1477316 1477313  0 16:29 pts/0    00:00:00  \_ /bin/sh
student    1477370 1477316  0 16:33 pts/0    00:00:00      \_ sleep 111

This confirms that the processes inside a container are visible from the outside.

Run ps -ef inside a container (or look into /proc there). What do you see? Is there something surprising?

Also execute podman ps, which prints the list of running containers.

CONTAINER ID  IMAGE                            COMMAND  CREATED        STATUS            PORTS   NAMES
643b5e7cea06  docker.io/library/alpine:latest  /bin/sh  4 minutes ago  Up 4 minutes ago          practical_bohr

The container ID is again a unique identifier; the other columns are self-explanatory. Note that since we have not specified a name, Podman assigned a random one.

If you terminate the session inside the container (exit or Ctrl-D), you will return to the host terminal.

Execute podman ps again. It is empty: the container is not running. If you add --all, you will see that the STATUS has changed.

Exited (130) 1 second ago

Note that if we executed podman run ... again, we would start a new container. Try it now.

We will describe the container life cycle later on. If you wish to remove the container now, execute podman rm NAME. Instead of NAME, you can use the randomly assigned name or the CONTAINER ID.

Single shot runs

You can pass any command to podman run to be executed. If you know that you would be removing the container immediately afterwards, you can add --rm to tell Podman to remove it automatically once it finishes execution.

podman run --rm alpine:latest cat /etc/os-release

If you want to pass a more complicated command, it is better to pass it via sh -c. Change the above command to first cd to /etc and then call cat os-release. Why does the following not work: podman run --rm alpine:latest cd /etc && cat os-release?

Managing container life cycle

Starting a container

After we have terminated the interactive session, the container exited. We can call podman start CONTAINER to start it again.

Each container has a so-called entry point that is executed when the container is started. For a service-style container (e.g., with a web server), the service would be started again.

For our Alpine example, the entry point is /bin/sh (shell), so nothing interesting will happen.

Check that the container is running with podman ps.

Attaching to a running container

When the container is running, we can attach to it. podman attach basically connects the stdout of the entry point to your terminal. With our Alpine container, we can run commands inside the container again.

We can also call podman exec -it CONTAINER CMD that connects to the running container in a new terminal (like a new tab). For us, running the following would work (replace with your container name).

podman exec -it practical_bohr /bin/sh

Run ps -ef inside the container again. Which processes do you see?

Terminating the exec-ed shell returns us back to the host. Terminating the attach-ed shell terminates the whole container.

Containers in background (with names)

For service-style containers (e.g. nginx that provides the webserver), we often want to run them in daemon mode – in background.

That is possible with a --detach option to the run command.

We will also give it the name webserver so that we can easily refer to it.

podman run --detach --name webserver --publish 8080:80/tcp -v ./web:/usr/share/nginx/html:ro  nginx:1.20.0

We will explain the -v and --publish later on.

This command starts the container and terminates. The webserver is running in the background. Check that you can again access http://localhost:8080/ in your browser.

You can stop such a container with podman stop webserver. Kind of similar to systemctl stop .... Not a coincidence.

Check that after stopping the webserver, http://localhost:8080/ no longer works.

Starting the container again is possible with podman start webserver.

`start` and `stop` and stdout

Note that both start and stop print the name of the container that was started (stopped) on stdout. That is useful in scripts; for interactive use, we can simply ignore the output.

Clean-up actions

When we are done with a container, we can remove it (but first, we need to stop it).

Executing the following command would remove webserver container completely.

podman rm webserver

You can also remove pulled images using the rmi subcommand.

For example, to remove nginx:1.20.0, you can execute the following command.

podman rmi nginx:1.20.0

Note that Podman will refuse to remove an image if it is used by an existing container. Recall that the images are stacked and hence Podman cannot remove the underlying layers.

Limiting the isolation

By default, a container is an isolated world. If you want to access it from the outside, you have to exec into it (for terminal-style work) or publish its services to the outside.

Port forwarding (a.k.a. port publishing)

For server-style containers (e.g., the Nginx one we used above), that means exposing some of their ports to the host computer. That is done with the --publish argument, where you specify which port on the host (e.g., 8080) shall be forwarded into the container, and to which port and protocol there (e.g., 80 and tcp).

Therefore, the argument --publish 8080:80/tcp means that we expect that the container itself offers a service on its port 80 and we want to make this (container’s) port available as 8080. It is similar to SSH port forwarding with -L.
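To make the structure of the argument explicit, the following sketch splits a HOST:CONTAINER/PROTOCOL value into its parts using plain shell parameter expansion. The function describe_publish is hypothetical, it handles only the two-port form used in this lab, and it assumes that the protocol defaults to tcp when omitted.

```shell
# Hypothetical helper: explain a --publish value of the form
# HOST_PORT:CONTAINER_PORT[/PROTOCOL].
describe_publish() {
    spec="$1"
    host_port="${spec%%:*}"        # everything before the first colon
    rest="${spec#*:}"              # everything after the first colon
    container_port="${rest%%/*}"   # drop an optional /protocol suffix
    case "$rest" in
        */*) proto="${rest#*/}" ;;
        *)   proto="tcp" ;;        # assumed default when no protocol is given
    esac
    echo "host port ${host_port} -> container port ${container_port} (${proto})"
}

describe_publish 8080:80/tcp
# prints: host port 8080 -> container port 80 (tcp)
```

Reading the value left to right therefore goes from the outside (host) to the inside (container), the same order as in the -v argument.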

We can start the nginx container without --publish, but it does not make much sense. Why?

Volume mounts

Another way to break the container isolation is to bind a directory into the container. There are several ways to do that; we will show the --volume (or -v) parameter.

It takes three (again colon-separated) arguments: the source directory on the host, the mount point inside the container, and options.

Our example ./web:/usr/share/nginx/html:ro thus specified that local (host) directory web shall be visible under /usr/share/nginx/html inside the container in read-only mode. It is very similar to normal mounts you already know.

If you specify rw instead of ro, you can modify the files inside the container.

Volume mounting is useful for any service-style container. A typical example is a database server. You start the container and give it a mounted volume. It will store the actual database (the data files) in this volume (directory). Thus, when the container terminates, your data persist because they were stored outside the container.

This has a huge advantage for testing service updates. You stop the container, make a backup of the data directory and start a new container (with a newer version) on top of the same data directory. If everything works fine, you are good to go. Otherwise, you can stop the new container, restore from the backup and return to the old version.
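The update procedure described above might be scripted roughly as follows. This is only a sketch under several assumptions: the function name upgrade_service, the RUNNER variable (which lets you substitute docker, or echo for a dry run of the container commands) and the .backup suffix are all invented for this example, and a real database may need additional steps before its files can be copied safely.

```shell
# Hypothetical sketch of the stop / back up / start-newer-version workflow.
# Usage: upgrade_service NAME DATA_DIR MOUNT_POINT NEW_IMAGE
upgrade_service() {
    name="$1"
    data_dir="$2"      # host directory with the data, without a trailing slash
    mount_point="$3"   # where the image expects its data inside the container
    new_image="$4"
    runner="${RUNNER:-podman}"

    # Refuse to overwrite an earlier backup.
    if [ -e "${data_dir}.backup" ]; then
        echo "backup ${data_dir}.backup already exists" >&2
        return 1
    fi

    "$runner" stop "$name" || return 1
    # Keep a copy of the data so that we can return to the old version.
    cp -Rp "$data_dir" "${data_dir}.backup" || return 1
    "$runner" rm "$name" || return 1
    "$runner" run --detach --name "$name" \
        -v "${data_dir}:${mount_point}:rw" "$new_image"
}
```

Calling it as RUNNER=echo upgrade_service NAME DATA_DIR MOUNT_POINT NEW_IMAGE prints the container commands instead of executing them (the backup copy is still made), which is a cheap way to review what would happen.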

Very simple and effective.

Exercises

Apache web server

Start the Apache web server on top of the 14/web directory. Use the httpd image. Verify that you are really using the Apache web server.

Python applications

Install the timestamp2iso command system-wide.

We recommend using python:3.9-alpine.

Note that you will not need to set up any virtual environment in this case: the whole machine (container) is yours. You can install things system-wide.

GitLab CI

We will now see how to actually configure CI on your GitLab repositories.

In this course we will focus on the simplest configuration where we want to execute tests after each commit. GitLab can be configured for more complex tasks where software can be even deployed to a virtual cloud machine but that is unfortunately out of scope.

If you are interested in this topic, GitLab has extensive documentation for continuous integration and continuous deployment (CI/CD). The documentation is often densely packed with information, but it is a great source of knowledge not only about GitLab, but about many software engineering principles in general.

`.gitlab-ci.yml`

The configuration of GitLab CI is stored in the file .gitlab-ci.yml, which has to be placed in the root directory of the project.

Your submission repository contains a bit more complex setup where we fetch actual configuration on-line so that only active tasks and quizzes are evaluated (without needing you to keep the repository up-to-date).

But the configuration for the timestamp2iso project now contains a very simple GitLab CI configuration.

base-tests:
  image: python:3.9-alpine
  script:
    - apk add bats
    - pip install .
    - ./tests/base.bats

It specifies a pipeline job base-tests (you will see this name in the web UI) that is executed using python:3.9-alpine and it executes three commands. The first one installs a dependency, the second one installs the actual package (the project) and the last one executes simple BATS tests.

Note that GitLab will mount the Git repository into the container first and then execute the commands inside the clone. The commands are executed with set -e: the first failing command terminates the whole pipeline.
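The effect of set -e can be tried without GitLab. The sketch below runs a short command list through sh -e, which is our stand-in for how a job script behaves and not GitLab's actual implementation; the function name run_job_script is made up.

```shell
# Hypothetical helper: run a command list with "exit on first error" enabled,
# similar in spirit to how the commands of a CI job are executed.
run_job_script() {
    sh -e -c "$1"
}

# The second command fails, so the third one is never reached.
if output="$(run_job_script 'echo "step 1"; false; echo "step 3"')"; then
    status=0
else
    status=$?
fi
echo "output: ${output}"        # prints: output: step 1
echo "exit status: ${status}"   # prints: exit status: 1
```

A non-zero exit status like this one is what makes the job (and thus the pipeline) appear as failed.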

Emulate the run locally.

Note that the command you created for running the script locally on top of the given image is virtually identical to the one executed by GitLab. GitLab does some extra caching and other performance-related tweaks, but conceptually, there is nothing more. And your code is tested in a reproducible way in a clean container (that is, in a sense, indistinguishable from a full virtual machine).

Exercise

Add your own pipeline to GitLab that checks that you never use /usr/bin/python in a shebang.

Other bits

Notice how easy it is to use the GitLab pipeline. You find the right image, specify your script, and GitLab takes care of the rest.

From now on, every project you create on GitLab should have a pipeline that runs the tests (this includes ShellCheck, Pylint etc.). Set it up NOW for your assignments in other courses. Set it up for your Individual Software Project (NPRG045) next year. Use the chance to have your code regularly tested. It will save you time in the long run.

If you are unsure about which image to choose, official images are a good start. The script can have several steps where you install missing dependencies before running your program.

Recall that you do not need to create a virtual environment: the whole machine is yours (and would be removed afterwards), so you can install things globally. Recall the example above where we executed pip install without starting a virtual environment.

There can be multiple jobs defined that are run in parallel (actually, there can be quite complex dependencies between them, but in the following example, all jobs are started at once).

The example below shows a fragment of .gitlab-ci.yml that tests the project on multiple Python versions.

# Default image if no other is specified
image: python:3.10

stages:
  - test

# Commands executed before each "script" section (for any job)
before_script:
    # To have a quick check that the version is correct
    - python --version
    # Install the project
    - python -m pip install ...

# Run unit tests under different versions
unittests3.7:
  stage: test
  image: "python:3.7"
  script:
    - pytest --log-level debug tests/

unittests3.8:
  stage: test
  image: "python:3.8"
  script:
    - pytest --log-level debug tests/

unittests3.9:
  stage: test
  image: "python:3.9"
  script:
    - pytest --log-level debug tests/

unittests3.10:
  stage: test
  image: "python:3.10"
  script:
    - pytest --log-level debug tests/
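
The same matrix can be emulated locally, one container per Python version. The sketch below only prints the podman commands instead of running them, so it works without Podman installed. The function name print_matrix_commands is hypothetical and the install step (pip install . pytest) is an assumption, because the fragment above elides the actual arguments of pip install.

```shell
#!/bin/sh
# Sketch: print one podman command per Python version, mirroring the jobs above.
# ASSUMPTION: the install step; the original before_script elides it.
steps="cd /root/repo && python --version && pip install . pytest && pytest --log-level debug tests/"

print_matrix_commands() {
    for version in "$@"; do
        # Mount the current directory and run all steps in a fresh container.
        printf 'podman run --rm -v .:/root/repo python:%s /bin/sh -c "%s"\n' \
            "$version" "$steps"
    done
}

print_matrix_commands 3.7 3.8 3.9 3.10
```

Review the printed commands and run them by hand from inside the repository. Unlike GitLab, this runs the versions one after another instead of in parallel.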

Graded tasks (deadline: May 29)

`14/shellcheck.sh` (+ `.gitlab-ci.yml`) (60 points)

Write a script that runs ShellCheck on all the scripts in your repository.

Modify your .gitlab-ci.yml so that it runs this script on every commit (push). The pipeline must fail if any script contains a ShellCheck issue. Name the pipeline shellcheck so that we can find it easily.

You can take inspiration from, or reuse parts of, the assert_is_shellchecked function from our tests. Also consider reusing parts of the shebang-checking example above.

UPDATE: you can safely append your pipeline definition to the end of the existing .gitlab-ci.yml (so that the current pipelines remain active). You will need to add stage: tests to the pipeline definition (otherwise you may encounter the error shellcheck job: chosen stage does not exist; available stages are .pre, tests, .post). See the definition of the unittests3.10 pipeline above for a concrete example.

`14/command.txt` (15 points)

The image registry.gitlab.com/mffd3s/nswi177/labs-2022-command:latest contains the command nswi177-task-command.

Run this command with your GitLab login and copy its output to 14/command.txt.

`14/volume.txt` (25 points)

The image registry.gitlab.com/mffd3s/nswi177/labs-2022-volume:latest contains the command nswi177-task-volume. Mount your repository with the assignments as /srv/nswi177/ in the container and run the command there.

If everything is OK, the command prints two hexadecimal strings. Copy them to 14/volume.txt.

Your repository must be cloned over SSH.

Learning outcomes

Conceptual knowledge

Conceptual knowledge means that you understand the meaning and context of a given topic and can place it into a broader framework. Therefore, you should be able to …

explain what a container is (and compare it with a virtual machine and a process)
explain where the isolation offered by containers is useful
explain the life cycle of a container
explain the principles of continuous integration (and the reasons why it exists)
explain why further sandboxing (e.g. virtualenv) is not needed inside a container

Practical skills

Practical skills are usually about using given programs to solve various tasks. Therefore, you should be able to …

run an interactive container in Podman
run a Podman container with a service
expose ports of a container
mount a volume inside a container
remove unused images and containers
prepare a GitLab CI configuration that builds and tests a Python program

podman run --rm -v .:/root/repo python:3.9-alpine /bin/sh -c "cd /root/repo && apk add bats && pip install . && ./tests/base.bats"

Note that -v takes . as we assume we are inside the tool repository. We mount it to some directory (recall that by default, you run as root inside the container) and then execute the script mentioned in .gitlab-ci.yml.

find [0-9][0-9]/ -type f | while read -r fname; do if head -n 1 "$fname" | grep -q '^#!/usr/bin/python'; then echo "Bad Python shebang for $fname."; fi; done

We can safely assume that we do not have crazy file names in our repository, so we can use find | while read filename (instead of using the safer null-byte separator).

Here is the shell script:

#!/bin/bash

set -ueo pipefail

find [0-9][0-9]/ -type f | (
    exit_code=0
    while read -r fname; do
        if head -n 1 "$fname" | grep -q '^#!/usr/bin/python'; then
            echo "Bad Python shebang for $fname." >&2
            exit_code=1
        fi
    done
    exit $exit_code
)
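
The parentheses around the loop in the script above do real work. In bash and dash, each part of a pipeline runs in a subshell, so a variable set inside a plain while loop is lost once the pipeline ends. Grouping the variable, the loop and the final exit keeps them in one subshell and turns the result into the exit status of the pipeline. A minimal sketch with illustrative variable names and made-up input:

```shell
#!/bin/sh
# Plain pipeline: the loop runs in a subshell, so the assignment does not survive.
found=0
printf 'one\ntwo\n' | while read -r line; do found=1; done
echo "plain pipeline: found=$found"

# Grouped: the variable, the loop and the exit share one subshell,
# and the group's exit status becomes the status of the pipeline.
printf 'one\ntwo\n' | (
    code=0
    while read -r line; do code=1; done
    exit "$code"
) && grouped_status=0 || grouped_status=$?
echo "grouped pipeline: status=$grouped_status"
```

In bash (with default settings) and dash, the first message reports found=0 while the second reports status=1.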

And here is the extension of your .gitlab-ci.yml. We need to specify the extra stage: tests to execute this job along with other jobs from the upstream repository.

# Python shebang
check-bad-python-shebang:
  image: mffd3s/nswi177-base:latest
  stage: tests
  script:
    - bin/check_python_shebang.sh
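
For completeness, here is a sketch of the safer variant with the null-byte separator mentioned in the hint. It is not the course's reference solution: the function name check_shebangs is hypothetical, and it relies on find -print0 and xargs -0, which are widely supported.

```shell
#!/bin/sh
# Sketch: shebang check that tolerates spaces and newlines in file names.
# check_shebangs is a hypothetical helper; it takes the directories to scan.
check_shebangs() {
    # File names are passed NUL-separated from find to xargs,
    # which hands them to the inner shell as separate arguments.
    find "$@" -type f -print0 | xargs -0 sh -c '
        status=0
        for fname in "$@"; do
            first_line=
            # Read only the first line; ignore read errors and empty files.
            IFS= read -r first_line <"$fname" || true
            case "$first_line" in
                "#!/usr/bin/python"*)
                    echo "Bad Python shebang for $fname." >&2
                    status=1
                    ;;
            esac
        done
        exit "$status"
    ' sh
}
```

Calling check_shebangs [0-9][0-9]/ returns a non-zero exit status when any offending file is found, so it can be used in a CI script in the same way as the version above.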

PID   USER     TIME  COMMAND
    1 root      0:00 /bin/sh
   25 root      0:00 ps -ef

/proc would be rather empty (or emptier than usual) as there are only two processes visible in the container.

Also note that process IDs start from 1 (another thing that the kernel maps into containers) and that PID 1 is /bin/sh. Recall that PID 1 is the first process of the machine and its termination means machine shutdown.

podman run --rm alpine:latest /bin/sh -c 'cd /etc && cat os-release'

The command podman run --rm alpine:latest cd /etc && cat os-release will not work because the && cat os-release part is interpreted by the host shell, i.e. it is not run inside the container. The container is asked to run only cd /etc, which is a shell builtin rather than a standalone program, so it typically fails to start.
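
The difference between the two forms can be seen without Podman. In this sketch, show_args is a hypothetical stand-in for podman run --rm alpine:latest that only reports the arguments it receives.

```shell
#!/bin/sh
# Sketch: show_args is a hypothetical stand-in for `podman run --rm alpine:latest`.
# It only reports the arguments it actually received.
show_args() {
    echo "container would get $# argument(s): $*"
}

# Unquoted: the host shell splits the line at && and runs the second half itself.
unquoted=$(show_args cd /etc && echo "this part ran on the host")

# Quoted: the whole command line is handed over as one argument to /bin/sh -c.
quoted=$(show_args /bin/sh -c 'cd /etc && cat os-release')

printf '%s\n---\n%s\n' "$unquoted" "$quoted"
```

In the unquoted case the stand-in sees only cd and /etc. In the quoted case the text cd /etc && cat os-release arrives intact as a single argument for the shell inside the container.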

We will use the Alpine Linux based image for this exercise.

podman pull httpd:alpine

podman run --rm -it -v ./web/:/usr/local/apache2/htdocs/:ro -p 8080:80/tcp httpd:alpine

And in a second terminal, run:

curl --silent -v http://localhost:8080 >/dev/null

podman pull python:3.9-alpine

We run the container (consider removing --rm if you want to experiment with the same container more).

podman run -it --rm python:3.9-alpine /bin/sh

Inside the container, we first install the required package (see hint) and then build and install the utility.

apk add git
python -m pip install git+http://gitlab.mff.cuni.cz/teaching/nswi177/2022/common/timestamp2iso.git

The command was installed system-wide into /usr/local/bin together with required Python modules.

timestamp2iso 1 day ago

Note that timestamp2iso would warn us about timezone issues. For this exercise, we will ignore that.
