Lab #10 | NSWI177

Information below is not for the current semester. The current semester can be found here.

Lab #10 (Apr 27 – May 1)

Before class

Make sure you have completed all the exercises from previous labs. This lab carries on where we stopped last week.
Read this Wikipedia article about Continuous integration.

Topic

Git and remote branches.
Keeping forks up-to-date.
GitLab: CI.

Exercises

Last time we started with a fork of the teaching/nswi177/2020-summer/upstream/examples repository and worked in it. In this lab, we will simulate that work in the upstream repository (i.e. the one you forked from) continues and that you want to keep your repository up to date.

That is a common task, by the way. You work on a new feature but you do not want to miss important updates that are happening in master. As a matter of fact, failing to keep your branch up-to-date with master can complicate merging later on. Depending on the size and activity of the project, it might make sense to merge upstream changes every week or even every day!

In some cases, you may even pull changes from different forks. If you see that someone else is working on a new feature, you may want to try it out and test how it works with your changes.

With Git, all this is possible and (maybe surprisingly) there is very little difference between merging your own (local) branch and merging changes of someone else working in a completely different fork.

To merge changes from a repository other than the default one (e.g. a different project on GitLab), we need to set up so-called remotes.

A remote is Git's way of saying that your local clone also knows about other repositories (such as forks) and can tell you whether there are differences. This is an overly simplified way of looking at things, but it is sufficient for a first acquaintance with Git remotes.

To see your remotes, run (inside your local clone of your fork of the examples repository)

git remote show

It will probably print only origin. That is the default remote: when you run git pull or git push, Git uses origin. Thus, you were using remotes even without knowing about it ;-).

Let’s run

git remote -v show

It prints the specific URLs of each remote. You will probably see two lines for origin: one for fetch (pull) and one for push. You can even configure Git to pull from a different repository than the one you are pushing to. That is not very useful for us at the moment, though.

We will now add a new remote to our repository. This links it with a different project so that we will be able to compare changes between them (again, a simplified view).

Let’s run

git remote add upstream git@gitlab.mff.cuni.cz:teaching/nswi177/2020-summer/upstream/examples.git

We just added a remote named upstream that points to the given address (i.e. the original project). Note that Git is silent in this case.

Run git remote show again. How has the output changed?

Adding the remote has not exchanged any data yet. You have to tell Git to do everything; nothing happens automagically.

Let’s now fetch the changes from our new remote.

git fetch upstream

You should see the typical summary printed when cloning or pulling changes in Git; this time it refers to data from the upstream repository.

In your working tree (directory), however, nothing changed. That is fine: we only asked to fetch the changes, not to apply them.

Still, run git branch and git branch --all to see which branches you now have access to.
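
The steps so far can be tried without touching GitLab at all. The following is only a sketch under stated assumptions: the three directories (upstream, fork.git, clone) are throwaway local stand-ins for the upstream project, your fork on the server and your local clone, and the file contents are made up.

```shell
set -e
# Throwaway sandbox; all names below are hypothetical stand-ins.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com

# 1. The original (upstream) project with one commit on master.
git init -q "$sandbox/upstream"
cd "$sandbox/upstream"
git symbolic-ref HEAD refs/heads/master
echo "Examples" > README.md
git add README.md
git commit -q -m "Initial commit"

# 2. Your fork "on the server" and your local clone of that fork.
git clone -q --bare "$sandbox/upstream" "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/clone"

# 3. Meanwhile, work continues upstream on a new branch.
git checkout -q -b lab/10/csv-calc-tests
echo "echo running tests" > tests.sh
git add tests.sh
git commit -q -m "Add automated tests"
git checkout -q master

# 4. Back in the clone: add the remote, fetch, and look around.
cd "$sandbox/clone"
git remote add upstream "$sandbox/upstream"
git fetch -q upstream
git remote -v show      # origin and upstream, each with fetch and push URLs
git branch --all        # now also lists remotes/upstream/...
ls                      # tests.sh is not here: fetching does not apply changes
# rm -rf "$sandbox"     # clean up when you are done
```

Note that the working tree of the clone is untouched after the fetch; only the list of known remote branches grew.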

We will now investigate how the newly added remote differs.

Let’s start with showing commits on the remote:

git log remotes/upstream/lab/10/csv-calc-tests

Wow: git log can show commits on a certain branch only (yes, remotes/... is actually a branch name: after all, you have seen git branch -a). It also works on files (e.g. git log README.md). It is quite a powerful command.

Fine. What about how the code differs? That is actually even more important: you want to see which changes to the code were made and whether it would be possible to merge them at all.

git diff remotes/upstream/lab/10/csv-calc-tests

You ought to see a patch showing that the newly added remote differs in one file only: automated tests were added.

The tests look pretty good – we want them in our project too.

Let’s merge the remote branch, then:

git merge remotes/upstream/lab/10/csv-calc-tests

Since there should be no conflicts (both branches, master and remotes/upstream/lab/10/csv-calc-tests, changed different files), the merge should complete automatically.

Check your project directory: is the tests.sh file there?
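
The inspect-then-merge sequence can be rehearsed in a sandbox. This is a minimal sketch, assuming made-up repositories and file contents; the --no-edit flag and the master...branch (three-dot) form of git diff are our additions to keep the example non-interactive and to show only what the remote branch added.

```shell
set -e
sandbox=$(mktemp -d)   # throwaway sandbox, hypothetical names throughout
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com

# Upstream project, a "server-side" fork and a local clone of the fork.
git init -q "$sandbox/upstream"
cd "$sandbox/upstream"
git symbolic-ref HEAD refs/heads/master
echo "Examples" > README.md
git add README.md
git commit -q -m "Initial commit"
git clone -q --bare "$sandbox/upstream" "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/clone"

# Upstream adds tests on a separate branch.
git checkout -q -b lab/10/csv-calc-tests
echo "echo running tests" > tests.sh
git add tests.sh
git commit -q -m "Add automated tests"

# You changed a different file in your clone.
cd "$sandbox/clone"
echo "More examples" >> README.md
git commit -q -am "Extend README"

git remote add upstream "$sandbox/upstream"
git fetch -q upstream

# Which commits and which files would the merge bring in?
git log --oneline master..remotes/upstream/lab/10/csv-calc-tests
git diff --stat master...remotes/upstream/lab/10/csv-calc-tests

# Different files changed on each side, so the merge completes on its own.
git merge -q --no-edit remotes/upstream/lab/10/csv-calc-tests
ls
```

Without --no-edit, Git would open your editor so that you can adjust the message of the merge commit.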

Advanced hint: if you do not like the commit message (generally, when you commit and you immediately realize that your commit message has a typo), you can change it. Just type git commit --amend to edit your last commit. If you have not pushed your changes, it will work flawlessly. Otherwise, it is a bit more complicated and it should not be tried by beginners.
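
A minimal sketch of amending, in a throwaway repository with a made-up commit message. Passing -m to --amend is our shortcut to avoid opening an editor; plain git commit --amend lets you edit the message interactively.

```shell
set -e
sandbox=$(mktemp -d)   # throwaway repository, hypothetical content
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com

git init -q "$sandbox/repo"
cd "$sandbox/repo"
echo "Examples" > README.md
git add README.md
git commit -q -m "Add READNE"            # oops, typo in the message

git commit -q --amend -m "Add README"    # replace the last commit
git log --oneline                        # still one commit, new message
```

The amended commit gets a new hash, which is why amending already pushed commits is more complicated.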

Using the same approach, prepare for merge (i.e. do not run git merge yet) with upstream/lab/10/csv-calc-hotfix.

As you probably noticed, the second branch extended the tests but it also contains a typo fix for the README file.

But you already fixed the typo last week (if not, fix it before merging!).

So? The merge will lead to a so-called conflict that we need to resolve manually.

That is quite common and there is no need to be afraid of it. Git is able to help you a lot – when there are changes to different parts of a file, Git is able to merge the changes without any problems. But when both branches change the same lines, it is up to you to resolve it.

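Note that Git compares text, not intent. If your fix differs from the upstream one in any way (different wording, even different whitespace), Git errs on the safe side and reports a conflict. Only when both branches changed the lines in exactly the same way does Git merge them silently.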

Run the merge command now:

git merge remotes/upstream/lab/10/csv-calc-hotfix

This merge will end with an error and Git will inform you about the conflict.

What exactly does the merge command tell you?

Run also git status and investigate its output.

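We need to resolve the conflict. Edit the file (decide which version to keep and delete the <<<<<<<, ======= and >>>>>>> marker lines that Git inserted) and run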

git add README.md

to mark the conflicts in the file as resolved.

Finish the merge by running git commit, as with any normal commit.

Do not forget to push the changes to your repository.
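
The whole conflict cycle (merge fails, inspect, edit, add, commit) can be rehearsed safely in a sandbox. A minimal sketch, assuming made-up repositories and README contents; overwriting the file with echo stands in for editing it by hand.

```shell
set -e
sandbox=$(mktemp -d)   # throwaway sandbox, hypothetical names throughout
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com

# Upstream starts with a typo in the README.
git init -q "$sandbox/upstream"
cd "$sandbox/upstream"
git symbolic-ref HEAD refs/heads/master
echo "This is an exmaple." > README.md
git add README.md
git commit -q -m "Initial commit"
git clone -q --bare "$sandbox/upstream" "$sandbox/fork.git"
git clone -q "$sandbox/fork.git" "$sandbox/clone"

# Upstream fixes the typo on a hotfix branch...
git checkout -q -b lab/10/csv-calc-hotfix
echo "This is an example." > README.md
git commit -q -am "Fix typo in README"

# ...and you fixed the same line in your clone, slightly differently.
cd "$sandbox/clone"
echo "This is an example!" > README.md
git commit -q -am "Fix README typo"

git remote add upstream "$sandbox/upstream"
git fetch -q upstream

if git merge --no-edit remotes/upstream/lab/10/csv-calc-hotfix; then
    echo "merged cleanly"
else
    git status --short                      # README.md is marked UU (both modified)
    cat README.md                           # both versions between conflict markers
    echo "This is an example." > README.md  # "edit": keep the version we want
    git add README.md                       # mark the conflict as resolved
    git commit -q --no-edit                 # conclude the merge
fi
git log --oneline --graph
```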

10.

What would the graphical representation of the commits in GitLab look like now?

Try to sketch it on paper before opening the Graphs page in GitLab.

11.

On your own, merge the lab/10/ci branch of the upstream repository.

What new files appeared in your repository?

12.

The last merge brought a new file called .gitlab-ci.yml into the root of your repository.

If you have not yet pushed to your fork, push the last merge there now as well.

13.

Unless something went terribly wrong, after a while you should see a green tick next to your last commit in GitLab UI.

If you see a blue stopwatch-like icon, wait for a while.

If you see a red X mark, something is broken.

If you do not see any icon (even after a while), verify that you did all the steps (is .gitlab-ci.yml also visible in the root of your project on GitLab?). If you think you did, contact us.

14.

What is the green or red icon?

By adding the .gitlab-ci.yml file to our repository, we have enabled continuous integration for our project. GitLab picks up this file and runs the script in it for each commit.

The script typically executes tests, tries to package the software, and sometimes can even deploy the application to a production environment!

That is called continuous integration (CI) and continuous deployment (CD): our code is tested and shipped with every commit we make (more precisely, with the last commit of every push to GitLab).

For big pieces of software, such an automated pipeline can run for several hours. For our purposes, we will see results within minutes and we are not aiming for automated deployment yet :-)

We will start with simple things, such as Pylinting our code regularly or running our tests.

15.

But enough of theory, let’s look at what actually happened.

Click on the green icon. You should see two jobs: one called csv_calc-linter, the other called csv_calc-tests (depending on where you are, you may need to click on the Status icon first and then see the two stages).

Open both of them.

You should see something that looks like a dump from a terminal. Somewhere near the bottom you should see the execution of shellcheck and of ./tests.sh, together with the output of these tools.

What is happening there?

GitLab created a virtual machine for each of the jobs, installed GNU/Linux into it and then executed commands inside it. After the commands finished, the virtual machine was automatically destroyed.

That means that each of the jobs was running in a completely clean environment. That is extremely important as it practically checks that you have specified all dependencies (in requirements.txt, for example). And it also ensures that you have actually committed all files, set correct rights on them etc.

Quite a lot of things, actually. If your script works in CI, you can be pretty sure that your setup is fine and you have not forgotten anything.

16.

Let’s have a look at how we have configured the virtual machines and how we have told GitLab to actually run shellcheck.

Open .gitlab-ci.yml in your editor and try to understand what is there without reading further.

Ok, have you at least tried looking in the file? Open it NOW.

Basically, in fewer than 15 lines we told GitLab to set up a virtual machine for us and run code in it. Not bad, don’t you think?

The file is in YAML format that you already know from the SSG task.

There are several top-level settings and then configuration for each of the jobs.

The top-level configuration specifies that we are using a virtual machine with Fedora (image: ). So, we are actually not installing the system per se; rather, we use a prepared image of an installed system. You can imagine it as if you had installed Fedora on your machine and made a bit-for-bit copy of your hard drive at the moment you finished the installation.

The jobs have two parts: before_script and script. Both of these specify a list of commands that are executed. Typically, script contains the actual commands related to your project, while before_script prepares the virtual machine.

Here, in before_script we install dependencies. You already know DNF, so there should not be any surprise.

Notice that there is no sudo or similar action to switch to root. Our whole script is running with root privileges. That is not typical elsewhere, but it is the usual approach in most CI/CD solutions, and since the script is confined to a virtual machine, it is quite safe.

And in the script, we run the actual commands.
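
To make the structure concrete, the following sketch writes a configuration of the same shape into a scratch directory. It is not the file from the upstream repository: the image name, the package names and the csv_calc/ path are assumptions made for illustration; only the job names and the shellcheck and ./tests.sh commands come from the text above. The file in your repository is the authoritative one.

```shell
set -e
sandbox=$(mktemp -d)   # scratch directory, not your real repository
cd "$sandbox"

# Hypothetical configuration with the same overall shape as described above.
cat > .gitlab-ci.yml <<'EOF'
image: fedora:latest              # assumed image name

csv_calc-linter:
  before_script:
    - dnf install -y ShellCheck   # prepare the machine (assumed package name)
  script:
    - shellcheck csv_calc/*.sh    # assumed path to the scripts

csv_calc-tests:
  before_script:
    - dnf install -y python3      # assumed dependency
  script:
    - ./tests.sh
EOF

# List the top-level job names found in the file.
grep -E '^[^ #].*:$' .gitlab-ci.yml
```

Each top-level key other than image is one job; GitLab runs the before_script commands of a job first and its script commands afterwards.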

17.

To test how CI works when something breaks, insert an intentional error somewhere.

Easiest way to do this is to break the test itself. For example, change the expected output CSV in tests.sh. Commit this change (remember to use a descriptive commit message) and push it to GitLab.

If everything works as expected, you should receive an e-mail informing about the failure. If not, check your notification settings in GitLab.
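
Why does a changed expected output turn the icon red? GitLab marks a job as failed when a command in its script ends with a non-zero exit status, and a test script reports a mismatch exactly that way. The following is a hypothetical miniature of such a check, not the real tests.sh: the function names are made up and cut stands in for the program under test.

```shell
# Hypothetical miniature of a tests.sh-style check.
check_second_column() {
    expected=$1
    actual=$(printf 'a,b\n1,2\n' | cut -d, -f2)   # stand-in for the tested program
    [ "$actual" = "$expected" ]                    # exit status 0 = pass, 1 = fail
}

# Translate the exit status into what the CI job would report.
verdict() {
    if check_second_column "$1"; then
        echo "job would pass"
    else
        echo "job would fail"
    fi
}

verdict "$(printf 'b\n2')"   # correct expectation
verdict "$(printf 'b\n3')"   # deliberately broken expectation
```

The first call prints "job would pass", the second "job would fail": nothing about the program changed, only the expectation.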

Open an issue for this (artificial) problem. Fix the issue in a separate branch and close it via a merge request. Notice that CI is executed for all branches and for merge requests too.

That is great as it can prevent you from merging bad code at all. In big teams, the policy can be that pushing to master directly is prohibited and any change must go through a merge request. The merge requests are then set up in such a way as to prevent merges where CI tests failed.

Notice how everything is nicely connected in the web UI. If you have closed the issue via a commit message, you should see links to the respective commits in the issue and also in the merge request.
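
The command-line half of this workflow might look as follows. This is a sketch under assumptions: the bare repository origin.git stands in for your project on GitLab, the branch name fix-tests and the file content are made up, and #1 is a hypothetical issue number (use the number of the issue you opened). The merge request itself is created in the web UI.

```shell
set -e
sandbox=$(mktemp -d)   # throwaway sandbox, hypothetical names throughout
export GIT_AUTHOR_NAME=Student GIT_AUTHOR_EMAIL=student@example.com
export GIT_COMMITTER_NAME=Student GIT_COMMITTER_EMAIL=student@example.com

# A repository whose test was broken on purpose, plus a stand-in for GitLab.
git init -q "$sandbox/clone"
cd "$sandbox/clone"
git symbolic-ref HEAD refs/heads/master
echo "expected=WRONG" > tests.sh
git add tests.sh
git commit -q -m "Break the test on purpose"
git clone -q --bare "$sandbox/clone" "$sandbox/origin.git"
git remote add origin "$sandbox/origin.git"

# Fix the problem in a separate branch.
git checkout -q -b fix-tests
echo "expected=42" > tests.sh
git commit -q -am "Restore expected output in tests

Closes #1"

# Publish the branch; on GitLab you would now open a merge request from it.
git push -q -u origin fix-tests
git log --oneline --all
```

With GitLab's default settings, a "Closes #N" line in a commit message closes issue N once the commit reaches the default branch, i.e. when the merge request is merged.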

18.

On your own, add more jobs to CI.

That includes editing the .gitlab-ci.yml file, committing the changes and pushing them to GitLab.

Add the following jobs:

Run tests in SSG (name the job e.g. ssg-tests). This requires nosetests to be installed: install it with DNF first.
Run Pylint on the SSG code. Again, this would require you to first install the right tool.

19.

Note that we have emphasized to always develop your Python projects in virtualenv.

In GitLab CI, it is a bit different: since your code runs with root privileges and the machine would be destroyed, it is simpler to install things directly.

Hence, instead of the trio virtualenv venv, . ./venv/bin/activate and pip install ... you typically use only pip install ... and install things system-wide.

Because you start with a clean state of the machine (i.e. only basic packages are installed) and you destroy it without reusing it, it is perfectly okay to do it like that.

And it nicely simulates what would happen if somebody installs your project system-wide.

20.

Add reasonable checks to your task repository too, to keep your code in good shape.

Running shellcheck */*.sh or similar might be a good start.
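
To see what such a check catches, you can try shellcheck locally first. A minimal sketch, assuming a made-up script bad.sh with a classic mistake (an unquoted variable); the example skips the check when shellcheck is not installed.

```shell
set -e
sandbox=$(mktemp -d)   # scratch directory with a hypothetical script

cat > "$sandbox/bad.sh" <<'EOF'
#!/bin/sh
file=$1
rm $file
EOF

if command -v shellcheck >/dev/null 2>&1; then
    # shellcheck exits with a non-zero status when it finds problems,
    # which is what would make a CI job fail.
    if shellcheck "$sandbox/bad.sh"; then
        lint_status=0
    else
        lint_status=$?
    fi
else
    lint_status=skipped   # shellcheck is not installed on this machine
fi
echo "shellcheck status: $lint_status"
```

Quoting the variable (rm "$file") silences the warning about the unquoted expansion.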