[NSWI004] Several notes on Git, GitLab and CI

Vojtech Horky horky at d3s.mff.cuni.cz
Mon Nov 9 10:32:12 CET 2020


Hello.

On 09. 11. 20 at 9:51, Lukáš Bastián wrote:
> Hi,
> 
> I might have some additional questions and remarks regarding the Git
> workflow that you want us to use:
> 
> 1. One of the ways to develop is to use so-called feature branches. The
> problem with those is that commits can add up, and to keep master clean
> after merging the pull request (or merge request on GitLab), you should
> squash the commits. But that messes with the history if multiple
> people committed to the feature branch (I don't think it does on GitHub,
> but on GitLab I think I saw it happening). How should we deal with
> that? Should we use squash, or does it impact the grading script?
> (Depending on the answer, I might need to contact you separately and
> solve some issues with points.)

Personally, I am not a big fan of commit squashing: you should keep
even your feature branch in such a state that merging it as is (with
--no-ff) does not break things. I have no problem with rewriting history
in not-yet-merged branches.

We definitely do not require your team to squash/rebase/... or even to
use branches. Use whatever you like and are comfortable with.

However, our activity-point scripts count only commits that made it
into master. Hence, if you squash your branch, it will count as only one
commit. I do not think it is possible to count it any other way (i.e.
there is no way to see the commits from before the squash, as the branch
would look dead or would be garbage-collected anyway).
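To see the effect on the commit count, the sketch below builds a throw-away repository with a hypothetical three-commit branch named "feature" and merges it into master once with --no-ff and once with --squash. It only approximates what the activity-point scripts see: it assumes they count non-merge commits reachable from master, which is an illustration, not the actual grading script.

```shell
#!/bin/sh
# Sketch: how many commits end up in master after a --no-ff merge versus a
# squash merge. Everything happens in a throw-away repository; the branch
# name "feature" and the file names are hypothetical.
set -e

make_repo() {
    cd "$(mktemp -d)"
    git init -q
    git symbolic-ref HEAD refs/heads/master   # name the first branch master
    git config user.name "Demo"
    git config user.email "demo@example.com"
    echo base > base.txt
    git add base.txt
    git commit -q -m "Initial commit"
    # Three small, logical commits on a feature branch.
    git checkout -q -b feature
    for i in 1 2 3; do
        echo "part $i" > "part$i.txt"
        git add "part$i.txt"
        git commit -q -m "Add part $i"
    done
    git checkout -q master
}

# Prints the number of non-merge commits reachable from master after merging
# the feature branch in the given way ("no-ff" or "squash").
commits_in_master() {
    make_repo >/dev/null
    if [ "$1" = "squash" ]; then
        git merge --squash feature >/dev/null
        git commit -q -m "Add parts 1-3 (squashed)"
    else
        git merge --no-ff --no-edit -m "Merge branch 'feature'" feature >/dev/null
    fi
    git rev-list --count --no-merges master
}

echo "no-ff:  $(commits_in_master no-ff) commits in master"
echo "squash: $(commits_in_master squash) commits in master"
```

With --no-ff, the initial commit and all three feature commits stay reachable from master (4; the merge commit itself is left out by --no-merges). With --squash, the three feature commits collapse into a single new commit (2 in total), authored by whoever creates it.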

If squashing commits is important to you and you want to keep doing it,
let's see how the points look at the end of the semester. If you are
(by then) missing some points, we can look at the merge requests and
recompute them manually.


>     It is a good practice to always commit code that compiles. For the
>     master/main branch, the rule is usually to commit code that compiles and
>     passes the tests.
> 
> In this case, that means completing the task, which doesn't go hand in
> hand with the fact that you are trying to get us to collaborate and
> "work as a team in a real company" (or at least that's how it felt to
> me): I wouldn't be able to reasonably share code with my colleagues.
> There are ways around this, e.g. having an a02-master branch that we
> all branch from, but that might again be problematic because of what I
> mentioned above: squash on merge and the change of author it causes.

The word "usually" applies here. Note that some of your colleagues are
seeing Git for the first time in this course, hence I decided to mention
the usual rule in long-running projects. There is a difference between a
project that is just being born, when thousands of lines are being added,
and the maintenance of a mature project. Sorry, I should have made this
point clearer.


>     As a matter of fact, if you commit/push every minute to debug your
>     knowledge of C syntax and/or for every word in one code comment, two
>     things may happen. We may consider this gaming of the activity
>     points. And we might be forced to set up some type of accounting for the
>     use of CI to ensure fairness (I already received a complaint from the
>     GitLab administrator about overloading the CI machines).
> 
> Is there a way to set up the CI so that it doesn't build before a PR is
> opened, or so that it cancels the previous pipeline when a new commit is
> pushed? That is the approach used in the company I work for to prevent
> what we are encountering here, at least on feature branches (master
> should always run on each commit/merge, given the small team size).
> There is also the option to make the build periodic, running only if
> there were changes in the last X amount of time.

The pipelines are configured so that redundant jobs are canceled
automatically. You can also add the interruptible [1] flag to your CI
configuration manually.

[1] https://gitlab.mff.cuni.cz/help/ci/yaml/README.md#interruptible

But I do not think it should be necessary. If you work normally and
simply do not push every commit (in a private feature branch, there
really is no reason to do so), all should be fine.


> My comments mostly come from
> a) interest and previous experience with some GitHub workflows: from
> what I see, in your opinion the unit of work should be a commit, which
> in my opinion is not easily achievable given the nature of the task and
> the need to collaborate. I am used to using a PR as the unit of work,
> which would probably require some changes in the CI pipeline.

I am not sure I follow what you mean by a PR as the unit of work.

However, I strongly believe that a commit should represent a logical
unit of change. In this sense, if I add code that detects available
memory, it is a logical unit even if all tests still fail.

And personally, I would add it directly to master, as my teammates will
need it very soon. But I would create a PR later on when refactoring
the code.


> And b) slight frustration, because I am probably one of those who were
> overloading the GitLab machines (although I tried to minimize it once I
> had a look at the queue, saw what was happening, and started canceling
> some runs manually). The reason was that the instructions on how to
> reproduce the dev environment on a local Linux VM were not enough for me
> to do it successfully. So I develop on a feature branch in that VM,
> which gives me more freedom when it comes to a reasonable IDE etc., push
> whenever I want to (because I will open a PR and squash anyway), and
> then pull the branch on a Rotunda machine and debug and test there. The
> CI results are not really what interests me at that point, but the
> pipelines run every time (hence my comments about automatic cancellation
> on a new commit), and when I included some printing, the logs were
> getting out of hand (which is probably where most of the pressure on the
> GitLab CI machines comes from: the amount of logs generated).

I am sorry to hear that the instructions were not clear enough. Could
you perhaps elaborate on this a bit more?

You can also disable CI for a specific commit, but I believe that if you
are using Git to synchronize code between two machines only so that you
can write code on one and test on the other (if I understand the issue
correctly), then your setup should be fixed first, if only to make it
more comfortable for you.
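For the occasional push that should not start a pipeline, GitLab offers two mechanisms: putting [skip ci] (or [ci skip]) in the message of the latest pushed commit, or passing the ci.skip push option (git push -o needs Git 2.10 or newer). The sketch below shows both commands against a throw-away local bare repository standing in for GitLab; that repository runs no CI, and the branch name wip and the file debug.c are hypothetical.

```shell
#!/bin/sh
# Sketch: two ways of asking GitLab not to create a pipeline for a push.
# A throw-away bare repository stands in for the GitLab remote; it runs no CI,
# so this only demonstrates the commands. Names (wip, debug.c) are hypothetical.
set -e

remote=$(mktemp -d)
work=$(mktemp -d)
git init -q --bare "$remote"
# GitLab accepts push options; a plain bare repository must opt in to them.
git -C "$remote" config receive.advertisePushOptions true

git init -q "$work"
cd "$work"
git config user.name "Demo"
git config user.email "demo@example.com"

echo 'int debug_level = 2;' > debug.c
git add debug.c
# Option 1: GitLab skips the pipeline if the latest pushed commit says [skip ci].
git commit -q -m "Temporarily raise debug level [skip ci]"
git push -q "$remote" HEAD:refs/heads/wip

echo 'int debug_level = 0;' > debug.c
git commit -q -a -m "Restore debug level"
# Option 2: keep the message clean and pass the ci.skip push option instead.
git push -q -o ci.skip "$remote" HEAD:refs/heads/wip
```

The push option has the advantage of not leaving a CI-related marker in the permanent history.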

Note that you can also mount the remote disks via SSHFS, and VSCode can
also work in a remote mode. Perhaps your colleagues who are using it
that way (I see several .vscode directories on lab.d3s) can share some
links and comments about this.

Hope this explains things a bit more.

Cheers,
- VH


> 
> Looking forward to your reply; I hope it will bring more clarity so I
> can adjust and prevent future problems of this nature.
> 
> Regards
> Lukáš Bastián
> 
> 
> On Mon, Nov 9, 2020 at 7:07 AM Vojtech Horky <horky at d3s.mff.cuni.cz> wrote:
> 
>     Hello,
> 
>     just a few notes in no particular order.
> 
>     It is possible to set up your Git commit identity (e-mail) in GitLab
>     instead of using the default one.
> 
>     When you are using Git for the first time on a given machine (e.g.
>     lab.d3s.mff.cuni.cz or the Rotunda servers), you should set your Git
>     name and e-mail via the "git config" command (probably with the
>     --global switch). Note that until you do so, Git will warn you about
>     it on every commit.
> 
>     It is a good practice to always commit code that compiles. For the
>     master/main branch, the rule is usually to commit code that compiles
>     and passes the tests.
> 
>     GitLab CI is not your development environment. You are supposed to
>     develop, debug and test on your machine and push to GitLab only
>     reasonable commits. There is no reason to push every single commit to
>     GitLab to see whether it compiles.
> 
>     As a matter of fact, if you commit/push every minute to debug your
>     knowledge of C syntax and/or for every word in one code comment, two
>     things may happen. We may consider this gaming of the activity
>     points. And we might be forced to set up some type of accounting for
>     the use of CI to ensure fairness (I already received a complaint from
>     the GitLab administrator about overloading the CI machines).
> 
>     Note that it is completely fine and highly recommended to create very
>     small, focused commits, but each commit should represent a
>     compact/logical/atomic change of your project. Always think about a
>     reviewer who clicks on your commit: are there only changes related to
>     the topic, and are all the changes related to the topic there?
> 
>     Note that "git add -p" allows you to split a big change into multiple
>     commits quite easily.
> 
>     For further reading, I would recommend [1] about good commit messages
>     and perhaps also [2], as it reiterates some notes about committing in
>     general.
> 
>     [1] https://chris.beams.io/posts/git-commit/
>     [2] https://koukia.ca/git-some-commit-best-practices-and-how-to-undo-your-recent-commits-d13c9dc3144f
> 
>     Hope this helps,
>     - VH
>     _______________________________________________
>     NSWI004 mailing list
>     NSWI004 at d3s.mff.cuni.cz
>     https://d3s.mff.cuni.cz/mailman/listinfo/nswi004
> 
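For the first-run setup described in the quoted notes above, here is a minimal sketch of the git config commands. The name and address are placeholders (use the e-mail you registered as your commit e-mail in GitLab), and the two lines marked "demo only" merely redirect the global configuration to a temporary HOME so that trying the snippet does not touch your real ~/.gitconfig.

```shell
#!/bin/sh
# Sketch: one-time Git identity setup on a new machine (e.g. lab.d3s or a
# Rotunda server). Name and e-mail are placeholders.
set -e

# Demo only: use a temporary HOME so the real ~/.gitconfig is left untouched.
# Remove these two lines when configuring a real account.
HOME=$(mktemp -d); export HOME
unset XDG_CONFIG_HOME

git config --global user.name "Jane Student"
git config --global user.email "jane.student@example.com"

# Show what Git will record as the author of your next commit.
git config --global --get user.name
git config --global --get user.email
```

Without --global, the same commands set the identity for the current repository only.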

