Simple scripts, Git from command line (NSWI177, lab 3)

Information below is not for the current semester.

The goal of this lab is to introduce you to the Git command-line client and how to write reusable scripts.

Do not forget that the Before class reading is mandatory and there is a quiz that you are supposed to complete before coming to the labs.

Before class reading

There are two big topics for this lab. In the first one, we will demonstrate how Linux is suited for interpreted languages. In the second one, we will make our work with GitLab much more efficient and see how to transfer files from it and back to it via a command line client.

Linux scripting

A script in the Linux environment is any program that is interpreted when it is run (i.e., the program is distributed as source code). In this sense, there are shell scripts (the language is the shell as you saw it last time) as well as Python, Ruby or PHP scripts.

The advantage of so-called scripting languages is that they require only a text editor for development and that they are easily portable. The disadvantage is that you need to install the interpreter first. Fortunately, Linux typically comes with many interpreters preinstalled, so starting with a scripting language is very easy.

Simple shell scripts

To write a shell script, we simply write the commands into a file (instead of typing them in a terminal).

Therefore, a simple script that prints some information about your system could be as simple as the following.

cat /proc/cpuinfo
cat /proc/meminfo

If you store this into a file first.sh, then you can execute it with the following command.

bash first.sh

Notice that we ran bash, because that is the shell program (interpreter) we are using, and passed it the name of the input file as an argument.

It will cat those two files (note that we could have executed a single cat with two arguments as well).

Recall that your factor.py script can be executed with the following command (again, we run the right interpreter).

python3 factor.py

Shebang and executable bit

Running scripts by specifying the interpreter to use (i.e., the command to run the script file with) is not very elegant. There is an easier way: we mark the file as executable and Linux handles the rest.

Actually, when we execute the cat command or mc, there is a file (usually in the /bin or /usr/bin directory) that is named cat or mc and marked executable. (For now, imagine the special executable mark as a special file attribute.) Notice that there is no extension.

However, marking the file as executable is only the first half of the solution. Imagine that we create the following content and store it into a file hello.py marked as executable.

print("Hello")

And then we want to run it.

But wait! How will the system know which interpreter to use? For binary executables (e.g., originally from C sources), it is easy as the binary is (almost) directly in the machine code. But here we need an interpreter first.

In Linux, the interpreter is specified via so-called shebang or hashbang. As a matter of fact, you have already encountered it several times: When the first line of the script starts with #! (hence the name hash and bang), Linux expects a path to the interpreter after it and will run this interpreter and ask it to execute the script. If there is no shebang, the behavior is not well defined.

The Linux kernel refuses to execute shebang-less scripts. But if you run them from the shell, the shell will try interpreting them as shell scripts. It is good practice not to rely on this behavior.

For shell scripts, we will be using #!/bin/bash, for Python we need to use #!/usr/bin/env python3. We will explain the env later on; for now, please just remember to use this version.

Note that most interpreters use # to denote a comment which means that no extra handling is needed to skip the first line (as it is really not needed by the interpreter).

You will often encounter #!/bin/sh for shell scripts. For most scripts it actually does not matter: simple constructs work the same in both. We will be using /bin/bash in this course because it offers some rather nice extensions.

You may need to use /bin/sh if you are working on older systems or you need to have your script portable to different flavours of Unix systems.

To complicate things a bit more, on some systems /bin/sh is the same program as /bin/bash, which works because Bash is essentially a superset of sh.

Bottom line is: unless you know what you are doing, stick with #!/bin/bash shebang for now.

Now back to the original question: how is the script executed? The system takes the command from the shebang, appends the actual filename of the script as a parameter, and runs that. When the user specifies more arguments (such as --version), they are appended as well.

For example, if hexdump were actually a shell script, it would start with the following:

#!/bin/bash

# Rest of the code

Executing hexdump -C file.gif would then actually execute the following command:

/bin/bash hexdump -C file.gif

Notice that the only magic thing behind shebang and executable files is that the system assembles a longer command line.

The user does not need to care about the implementation language.
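The assembly of the longer command line can be sketched in Python. This is only an illustration of the idea described above, not how the kernel is implemented; the function name build_command and the temporary demo file are made up for this example.

```python
import os
import tempfile


def build_command(script_path, user_args):
    """Return the argument list the system would effectively run for a script."""
    with open(script_path, "rb") as script:
        first_line = script.readline().decode("utf-8", errors="replace").rstrip("\n")
    if not first_line.startswith("#!"):
        raise ValueError("no shebang on the first line")
    # Linux passes everything after the interpreter path as ONE optional argument.
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        raise ValueError("shebang without an interpreter")
    # interpreter (+ its optional argument), then the script, then the user's arguments
    return parts + [script_path] + list(user_args)


if __name__ == "__main__":
    # Hypothetical file standing in for the imagined shell version of hexdump.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hexdump")
        with open(path, "w") as f:
            f.write("#!/bin/bash\n\n# Rest of the code\n")
        print(build_command(path, ["-C", "file.gif"]))
```

The demo prints a list that starts with /bin/bash, continues with the path of the script and ends with -C and file.gif, which mirrors the command shown above.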

Git principles

So far, our interaction with GitLab was over its GUI. We will switch to the command line for higher efficiency now.

Recall that GitLab is built on top of Git which is the actual versioning system used.

Git offers a command-line client that can download the whole project to your machine, track changes in it, and then upload it back to the server (GitLab in our case, but there are other products, too).

We will go through the actual commands during the labs. Here we give a high-level overview of the operations that a developer typically performs.

The `git` command

Virtually everything around Git is performed by its git command. Its first argument is always the actual action – often called a subcommand – that we want to perform. For example, there is git config to configure Git and git commit to perform a commit.

There is always a built-in help available via the following command:

git SUBCOMMAND --help

Manual pages are also available as man git-SUBCOMMAND.

Git has over 100 subcommands available. Don’t panic, though. We will start with fewer than 10 of them and even quite advanced usage requires knowledge of no more than 20 of them.

Working with Git locally

The very first operation you need to perform is a so-called clone. During cloning, you copy your project source code from the server (GitLab) to your local machine. The server may require authentication for cloning to happen.

Cloning also copies the whole history of the project. Once you clone the project, you can view all the commits you have made so far without needing an internet connection.

The clone is often called a working copy. As a matter of fact, the clone is a 1:1 copy, so if someone deleted the project, you would be able to recreate the source code without any problem. (That is not true about the Issues or the Wiki as it applies only to the Git-versioned part of the project.)

As you will see, the whole project as you see it on GitLab becomes a directory on your hard drive. As usual, there are also GUI alternatives to the commands we will be showing here, but we will devote our attention to the CLI variants only.

Basic workflow

After the project is cloned, you can start editing files. This is completely orthogonal to Git and until you explicitly tell Git to do something, it does not touch your files at all.

It is also important to note that Git will not fetch updates from the server automatically for you. That is, if you clone the project and then modify something on GitLab directly, the changes will not propagate to your working copy unless you explicitly ask for it.

Once you are finished with your changes (e.g., you fixed a certain bug), it is time to tell Git about the new revision.

There are commands that allow you to review the changes. You will see a list of modified files (i.e., files whose content differs from the last commit) and you can also see a so-called diff (sometimes also called a patch) that describes the change.

The diff will typically look like this:

--- 01/factor.py
+++ 01/factor.py
@@ -2,5 +2,7 @@

 def main():
-    print('-')
+    x = get_number()
+    for i in get_factors(x):
+        print(i)

 if __name__ == '__main__':

How to read it? It is a piece of plain text that contains the following information:

the file where the change happened
the context of the change
- line numbers (-2,5 +2,7)
- lines without modifications (starting with space)
the actual change
- lines added (starting with +)
- lines removed (starting with -)
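To get a feeling for the format, a similar diff can be produced with Python's standard difflib module. This is only a sketch: the two versions of the file are typed in by hand here, and the output of git diff contains a few extra header lines that this example does not reproduce.

```python
import difflib

# Old and new content of the file, one list item per line.
old = [
    "",
    "def main():",
    "    print('-')",
    "",
    "if __name__ == '__main__':",
]
new = [
    "",
    "def main():",
    "    x = get_number()",
    "    for i in get_factors(x):",
    "        print(i)",
    "",
    "if __name__ == '__main__':",
]

# unified_diff yields the file header, the line-number header and the changed lines.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="01/factor.py", tofile="01/factor.py", lineterm="")
)
print("\n".join(diff_lines))
```

Lines of the result that start with a space are unchanged context, lines starting with - were removed and lines starting with + were added.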

Once you are happy with these changes, you can stage the changes. This is Git-speak for saying these files (their current content) will be in the next revision. Often, you will stage all changed files. But sometimes you may want to split the commit as you actually worked on two different things and first you commit one part and then the other.

For example, you were fixing a bug, but also encountered a typo somewhere along the way. It is possible to add them both to the same commit, but it is much better to keep the commits well organized. The first commit would be a Bugfix in XY, the second one will be Typo fix. It clearly states what the commit changed.

It is actually similar to how you create functions in a programming language. A single function should do one thing (and do it well). A single commit should capture one change.

After staging all the relevant changes, you create a commit. The commit clears the staging status and you can work on fixing another bug :-).

You basically repeat this as long as you need to make changes. Recall that each commit should capture a reasonable state of the project that is worth returning to later.

Whenever you make a commit, the commit remains local. It is not propagated back to the server. To upload the changes (commits) back to the server, you need to initiate a so-called push. It uploads all new commits (i.e., those made since your clone or your last push) back to the server.
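The stage, commit and push cycle can be pictured with a toy model in Python. It is a deliberately simplified sketch and not how Git stores data; the class ToyRepository and all of its methods are invented for this illustration.

```python
class ToyRepository:
    """A deliberately simplified model of staging, committing and pushing."""

    def __init__(self):
        self.staged = {}           # filename -> content chosen for the next commit
        self.local_commits = []    # commits not yet uploaded to the server
        self.server_commits = []   # commits the server already has

    def stage(self, filename, content):
        """Mark the current content of a file for the next commit."""
        self.staged[filename] = content

    def commit(self, message):
        """Turn the staged changes into a new local commit."""
        if not self.staged:
            raise RuntimeError("nothing staged, nothing to commit")
        self.local_commits.append((message, dict(self.staged)))
        self.staged.clear()  # the commit clears the staging status

    def push(self):
        """Upload all commits the server does not have; return their count."""
        uploaded = len(self.local_commits)
        self.server_commits.extend(self.local_commits)
        self.local_commits.clear()
        return uploaded


if __name__ == "__main__":
    repo = ToyRepository()
    repo.stage("factor.py", "fixed code")
    repo.commit("Bugfix in XY")
    repo.stage("README.md", "fixed text")
    repo.commit("Typo fix")
    print(repo.push())
```

Two separate commits are created locally and a single push uploads both of them at once.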

Working on multiple machines

Things get a little bit more complex when you work on multiple machines (e.g., mornings at a school desktop, evenings at your personal notebook).

The initial part is the same: you create a clone at each machine. But when you pushed changes from one machine, you usually want to bring them to the other machines, too.

One obvious option is to simply remove the working copy and clone from scratch. However, Git is much smarter and is able to do an incremental update by performing a so-called pull. The pull checks the latest changes on the server, compares them with the latest commit in your working copy, and fetches the new commits from the server.

As long as you ensure that you work in the following manner, nothing will ever break:

1. Clone your work on machine A.
2. Work on machine A (and commit the result).
3. Push on A (to server).
4. Move to machine B and clone there.
5. Work on B (commits).
6. Push on B (to server).
7. Move to A and pull (from server).
8. Work on A (commits).
9. Push on A.
10. Pull on B.
11. Work on B.
12. Etc. (i.e., continue from step 6).

If you forget some of the synchronizing pulls or pushes when switching between machines, problems can arise. They are easy to solve, but we will talk about that in later labs.

For now, you can always do a fresh clone and simply copy files with the new changes and commit again (not the right Git way, but it definitely works).
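Why the order of pushes and pulls matters can be shown on a toy model where a history is just a list of commit identifiers. The function commits_to_pull is hypothetical and much simpler than real Git, which is able to reconcile diverged histories; the sketch merely reports them.

```python
def commits_to_pull(local, server):
    """Return the commits a pull would need to fetch, oldest first.

    Both arguments are lists of commit identifiers, oldest first.
    """
    if server[:len(local)] == local:
        # The server history simply continues ours: an incremental update suffices.
        return server[len(local):]
    if local[:len(server)] == server:
        # We are ahead of the server: nothing to fetch, a push is what is missing.
        return []
    raise RuntimeError("histories diverged: a pull or push was forgotten somewhere")


if __name__ == "__main__":
    # Machine A pushed c3 and c4 since machine B was last synchronized.
    print(commits_to_pull(["c1", "c2"], ["c1", "c2", "c3", "c4"]))
```

As long as the workflow above is followed, only the first two branches are ever taken; the error corresponds to committing on one machine without pulling what another machine has pushed in the meantime.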

Before class quiz

The quiz file is available in the 03 folder of this GitLab project.

Copy the right language version into your project as 03/before.md (i.e., you will need to rename the file).

The questions and answers are part of that file. Fill in the answers between the **[A1]** and **[/A1]** markers.

The before-03 pipeline on GitLab will test that your answers are in the correct format. It does not check for actual correctness (for obvious reasons).

Submit your before-class quiz before the start of the next lab.

After class tasks

They will be published next Monday (as happened for lab 02).

Scripts

Let us go back to the first example from the before-class reading.

cat /proc/cpuinfo
cat /proc/meminfo

If you store this into a file first.sh, then you can execute it with the following command:

bash first.sh

But you already know about the shebang, so we will update it and also mark it as an executable.

#!/bin/bash

cat /proc/cpuinfo
cat /proc/meminfo

To mark it as executable, we run the following command:

chmod +x first.sh

Now we can easily execute the script with the following command:

./first.sh
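The executable mark is part of the file's permission bits, and it can also be set and inspected from Python. The following is a minimal sketch with made-up helper names; unlike chmod +x it ignores the umask and always sets the bit for the owner, the group and others.

```python
import os
import stat
import tempfile


def make_executable(path):
    """Add the executable bits to a file, similar in spirit to `chmod +x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def is_marked_executable(path):
    """Report whether the owner's executable bit is set."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "first.sh")
        with open(script, "w") as f:
            f.write("#!/bin/bash\n\ncat /proc/cpuinfo\ncat /proc/meminfo\n")
        print(is_marked_executable(script))  # freshly created files lack the bit
        make_executable(script)
        print(is_marked_executable(script))
```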

The obvious question is: why the seemingly redundant ./ when it only refers to the current directory (recall the previous lab)?

When you type a command (e.g., cat), the shell looks into the so-called $PATH to find the file with the program. Unlike in some other operating systems, the shell does not look into the working directory when the program cannot be found in the $PATH.

To run a program in the current directory, we need to specify its path. Luckily, it does not have to be an absolute path, but a relative one is sufficient. Hence the magic spell of ./.

If you move to another directory, you can execute it by providing a relative path too, such as ../first.sh.
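The lookup that the shell performs can be approximated in a few lines of Python. This is a simplified sketch (the function find_in_path is made up for this example, and real shells do more, such as caching results); the standard library offers shutil.which for the same purpose.

```python
import os


def find_in_path(command, path_variable):
    """Return the first executable file named `command` in the listed directories."""
    for directory in path_variable.split(os.pathsep):
        if not directory:
            continue  # skipped for simplicity; shells treat an empty entry as "."
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None  # note: the current directory was never consulted


if __name__ == "__main__":
    print(find_in_path("cat", os.environ.get("PATH", "")))
    print(find_in_path("first.sh", os.environ.get("PATH", "")))
```

Unless the directory with first.sh happens to be listed in $PATH, the second lookup finds nothing, which is why the explicit ./ is needed.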

Run ls in the directory now. You should see first.sh now printed in green. If not, you can try ls --color or check that you have run chmod correctly.

If you do not have a colorful terminal (unusual but still possible), you can use ls -F to distinguish file types: directories will have a slash appended, executable files will have an asterisk next to their filename.

Changing working directory

Let us modify the program a little bit.

cd /proc
cat cpuinfo
cat meminfo

Run the script again.

Notice that despite the fact that the script changed directory to /proc, when it terminates, we are still in the original directory.

Try inserting pwd to ensure that the script really is inside /proc.

This is an essential takeaway: every process (a running program; this includes scripts) has its own current directory. When it is started, it inherits the directory from its caller (e.g., from the shell it was run from). Then it can change the current directory, but that does not affect other processes in any way. Thus, when the program terminates, the caller is still in the same directory.
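The same behaviour can be observed from Python, where the parent process starts a child that changes its own directory. This is a small sketch; the child uses / instead of /proc only so that the example also works on systems without /proc.

```python
import os
import subprocess
import sys

# Code run by the child process: change directory and report where we are.
CHILD_CODE = "import os; os.chdir('/'); print(os.getcwd())"


def run_child_that_changes_directory():
    """Start a child process that changes its own directory and reports it."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    before = os.getcwd()
    child_directory = run_child_that_changes_directory()
    after = os.getcwd()
    print("child was in:", child_directory)
    print("parent unchanged:", before == after)
```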

Debugging the scripts

If you want to see what is happening, run the script as bash -x first.sh. Try it now. For longer scripts, it is better to print your own messages, as the output of -x tends to become too verbose; it is best kept as a debugging aid.

To print a message to the terminal, you can use the echo command. With a few exceptions (more about these later), all arguments are simply echoed to the terminal.

Create a script echos.sh with the following content and explain the differences:

echo alpha bravo charlie
echo alpha  bravo   charlie
echo "alpha   bravo"   charlie

The answer is given at the end of this page.

Command-line arguments

Command-line arguments (such as -l for ls or -C for hexdump) are the usual way to control the behaviour of CLI tools in Linux. For us, as developers, it is important to learn how to work with them inside our programs.

We will talk about using these arguments in shell scripts later on. Today we will handle them in Python.

Accessing these arguments in Python is very easy. We need to add import sys to our program and then we can access these arguments in the sys.argv list.

Therefore, the following program only prints its arguments.

#!/usr/bin/env python3

import sys

def main():
    for arg in sys.argv:
        print("'{}'".format(arg))

if __name__ == '__main__':
    main()

When we execute it (of course, first we chmod +x it), we will see the following (lines prefixed with $ denote the command, the rest is command output).

$ ./args.py
'./args.py'
$ ./args.py one two
'./args.py'
'one'
'two'
$ ./args.py "one  two"
'./args.py'
'one  two'

Note that the zeroth index is occupied by the command itself (we will not use it now, but it can be used for some clever tricks). Also notice how the second and third commands differ when seen from inside Python.

This should not be surprising: recall the previous lab and the handling of filenames with spaces in them.
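Arguments always arrive as strings, so a program that expects numbers has to convert them itself. The following sketch (the script and the function sum_numbers are made up for illustration) adds up its numeric arguments and skips sys.argv[0].

```python
#!/usr/bin/env python3

import sys


def sum_numbers(arguments):
    """Add up the arguments that look like integers, reporting the others."""
    total = 0
    for arg in arguments:
        try:
            total += int(arg)
        except ValueError:
            print("Ignoring non-numeric argument '{}'".format(arg), file=sys.stderr)
    return total


def main():
    # sys.argv[0] is the script itself, the real arguments start at index 1.
    print(sum_numbers(sys.argv[1:]))


if __name__ == '__main__':
    main()
```

If the file were stored as sum.py (a hypothetical name) and made executable, ./sum.py 1 2 3 would print 6.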

Other interpreters

We will now explore which interpreters we can put in the shebang.

Construct an absolute (!) path (hint: man 1 realpath) to the args.py that we have used above. Use it as a shebang on an otherwise empty file (e.g., use-args) and make this file executable.

And now run it like this:

./use-args
./use-args first second

You will see that argument zero now contains the path to your args.py script. The argument at index one contains the outer script (use-args), and only after these two items come the actual command-line arguments (first and second).

This is essential – when you add a shebang, the interpreter receives the input filename as the first argument. In other words – every Linux-friendly interpreter shall start evaluating a program passed to it as a filename in the first argument.

While it may seem like an exercise in futility, it demonstrates an important principle: GNU/Linux is extremely friendly towards the creation of mini-languages. If you need to create an interpreter for your own mini-language, you only need to make sure it accepts the input filename as the first argument. And voilà, users can create their own executables on top of it.
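To make the principle concrete, here is a sketch of an interpreter for an invented mini-language with a single command, say. The language, the function interpret and the error handling are all assumptions made up for this example; the only important part is that the program to run is taken from the first argument.

```python
#!/usr/bin/env python3

import sys


def interpret(lines):
    """Run a toy language: 'say TEXT' prints TEXT, '#' starts a comment."""
    output = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # this also skips the shebang line of the program itself
        command, _, rest = line.partition(" ")
        if command != "say":
            raise SyntaxError("line {}: unknown command '{}'".format(number, command))
        output.append(rest.strip())
    return output


def main():
    if len(sys.argv) < 2:
        return  # no program given; a real interpreter might read standard input
    # The shebang mechanism hands us the program's filename as the first argument.
    with open(sys.argv[1]) as program:
        for text in interpret(program):
            print(text)


if __name__ == '__main__':
    main()
```

If this interpreter were made executable, any file starting with a shebang that contains its absolute path, followed by lines such as say Hello, could itself be marked executable and run directly.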

As another example, prepare the following file and store it as experiment (with no file extension) and make the file executable:

#!/bin/bash

echo Hello

Note that we decided to drop the extension again altogether. The user does not really need to know which language was used. That is captured by the shebang, after all.

Now change the shebang to #!/bin/cat. Run the program again. What happens? Now run it with an argument (e.g., ./experiment experiment). What happened?

Change the shebang to #!/bin/echo. What happened?

Git on the command line

This section will describe how to use Git on the command line as opposed to using the GUI superstructure offered by GitLab. We already described the motivation for both Git and GitLab in the previous lab. Here we will show how to access the files from the command line to improve your experience when using Git.

While it is possible to edit many files on-line in GitLab, it is much easier to have them locally and use a better editor (or IDE). Furthermore, not all tools have their on-line counterparts and you have to run them locally.

Setting your editor

Git will often need to run your editor. It is essential to ensure it uses the editor of your choice.

We will explain the following steps in more detail later on. For now, ensure that you add the following line to the end of your ~/.bashrc file (replace mcedit with the editor of your choice):

export EDITOR=mcedit

Now open a new terminal and run (including the dollar sign):

$EDITOR ~/.bashrc

If you set the above correctly, you should see .bashrc opened again in your favorite text editor.

You need to close all terminals that were already open for this change to take effect (i.e., before you start using any of the Git commands mentioned below).

Important: never use a graphical editor for $EDITOR unless you really know what you are doing. Git expects a certain behaviour from the editor that is rarely satisfied by GUI editors but is always provided by a TUI-based one.

If you want to know why GUI editors are a bad choice, the explanation is relatively simple: Git will start a new editor for the commit message (see below) and it will assume that the commit message is ready once the editor terminates. However, many GUI editors work in a mode where there is a single instance running and you only open new tabs. In that case, the editor that is launched by Git actually terminates immediately (it only tells the existing editor to open a new file) and Git sees only an empty commit message.
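The problem can be reproduced with a short Python sketch that asks for a message in the same general way: it launches an editor on a temporary file and reads the file as soon as the editor process exits. The function ask_for_message and the two stand-in editors are invented for this demonstration and are not taken from Git.

```python
import os
import subprocess
import sys
import tempfile


def ask_for_message(editor_command):
    """Launch an editor on a temporary file and read the text once it exits."""
    handle, path = tempfile.mkstemp(suffix=".txt")
    os.close(handle)
    try:
        # This call blocks only until the launched process terminates.
        subprocess.run(editor_command + [path], check=True)
        with open(path) as message_file:
            return message_file.read().strip()
    finally:
        os.remove(path)


# Two stand-in "editors" (hypothetical, for demonstration only):
# one writes a message before exiting, the other exits immediately,
# like a GUI editor that merely hands the file over to a running instance.
WELL_BEHAVED = [sys.executable, "-c",
                "import sys; open(sys.argv[1], 'w').write('Typo fix\\n')"]
RETURNS_AT_ONCE = [sys.executable, "-c", "pass"]

if __name__ == "__main__":
    print(repr(ask_for_message(WELL_BEHAVED)))
    print(repr(ask_for_message(RETURNS_AT_ONCE)))
```

The first call yields the message, the second an empty string, because the file is read before anybody had a chance to write into it.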

Configure Git

One of the key concepts in Git is that each commit (change) is authored, i.e., it is known who made it. We will skip commit signing and will not consider identity forgery or theft here.

Thus, we need to tell Git who we are. The following two commands are the absolute minimum you need to execute on any machine (or account) where you want to use Git.

git config --global user.name "My real name"
git config --global user.email "my-email"

The --global flag specifies that this setting is valid for all Git projects. You can change this locally by running the same command without this flag inside a specific project. That can be useful to distinguish your freelance and corporate identities, for example.
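How a project-specific value overrides a global one can be modelled with a chain of dictionaries. This is only a toy model with made-up values; real Git reads its configuration files and has more levels than the two shown here.

```python
from collections import ChainMap

# Hypothetical values, standing in for what `git config` would store.
global_settings = {"user.name": "My real name", "user.email": "private@example.com"}
project_settings = {"user.email": "work@example.com"}


def effective_settings(project, global_):
    """Project-level values win; anything missing falls back to the global ones."""
    return ChainMap(project, global_)


if __name__ == "__main__":
    settings = effective_settings(project_settings, global_settings)
    print(settings["user.name"])   # only set globally
    print(settings["user.email"])  # overridden inside the project
```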

Note that Git does not check the validity of your e-mail address or your name (indeed, there is no way to do it). Therefore, anything can be there. However, if you use your real e-mail address, GitLab will be able to pair the commit with your account etc., which can be quite useful.

The decision is up to you.

Cloning for the first time (`git clone`)

For the following example, we will be using the repository teaching/nswi177/2022/common/csv-templater.

Fork this repository to your own namespace (in GitLab via web browser) first.

Forking a project means creating a copy for yourself on GitLab. Create the fork – you do not have write access to our repository and we do not want you to fight over the same files anyway.

Move to your (forked) project and click on the blue Clone button. You should see Clone with SSH and Clone with HTTPS addresses.

Copy the HTTPS address and use it as the correct address for the clone command:

git clone https://gitlab.mff.cuni.cz/YOUR_LOGIN/csv-templater.git

The command will ask you for your username and password. As usual with our GitLab, please use the SIS credentials.

Note that some environments may offer you some kind of a keyring or another form of a credential helper. Feel free to use them. Later on, we will see how to use SSH and asymmetric cryptography for seamless work with Git projects without any need for username/password handling.

You should now have the csv-templater directory on your machine. Move to it and see what files are there. What about hidden files?

Unless stated otherwise, all commands will be executed from the csv-templater directory.

Making a change (`git status` and `git diff`)

Fix typos on line 11 in the Python script and in the README.md and run git status before and after the change. Read carefully the whole output of this command to understand what it reports.

Create a new file, demo/people.csv with at least three columns and 4 rows. Again, check how git status reports this change in your project directory.

What have you learned?

Run git diff to see how Git tracks the changes you made. Why is this output suitable for source code changes?

Note that git diff is also extremely useful to check that the change you made is correct as it focuses on the context of the change rather than the whole file.

Making the change permanent (`git add` and `git commit`)

Now prepare for your first commit (recall that commit is basically a version or a named state of the project) – run git add csv_templater.py. We will take care of the typo in README.md later.

How does git status differ from the previous state?

Make your first commit via git commit. Do not forget to use a descriptive commit message!

Note that without any other options, git commit will open your text editor. Write the commit message there and quit the editor (save the file first). Your commit is done.

For short commit messages, you may use git commit -m "Typo fix" where the whole commit message is given as argument to the -m option (notice the quotes because of the space).

What will git status look like now? Think about it first before actually running the command.

Sending the changes to the server

We will now propagate your changes back to GitLab by using git push. It will again ask for your password and after that, you should see your changes on GitLab.

Which changes are on GitLab?

Exercise

Add the second typo as a second commit from the command line.

As a third commit, add Date field to demo/call.txt.

Now push the changes to GitLab. Note that all commits were pushed at the same time.

Browsing through the commits (`git log`)

Investigate what is in the Repository -> Commits menu in GitLab. Compare it with the output of git log and git log --oneline.

Getting the changes from the server

Change the title in the README.md to also contain written in Python. But this time make the change on GitLab.

Ensure you first push your local commits.

To update your local clone of the project, execute git pull.

Note that git pull is quite powerful as it can incorporate changes that happened virtually at the same time in both GitLab web UI as well as in your local clone. However, understanding this process requires also knowledge about branches, which is out-of-scope for this lab.

Thus, for now, remember not to mix changes made locally and in the GitLab UI (or on a different machine) without always ending with git push and starting with git pull.

Going further

The command git log shows plenty of information, but often you are interested in recent changes only, for example to refresh your memory of what you were working on.

Hence, the following command would actually make more sense:

git log --max-count=20 --oneline

But that is quite long and difficult to remember. Try the following instead:

git config --global alias.ls 'log --max-count=20 --oneline'

That is even worse! But with the above magic, Git will suddenly start to recognize the following subcommand:

git ls

And that could save time.

Our favorite aliases are for the following commands.

st = status
ci = commit
ll = log --format='tformat:%C(yellow)%h%Creset %an (%cr) %C(yellow)%s%Creset' --max-count=20 --first-parent

Try running them first before adding them to your Git.
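Conceptually, an alias is a textual replacement of the subcommand. The following Python sketch models that idea for simple aliases like the ones above; the function expand_alias is made up and does not cover everything Git aliases can do.

```python
import shlex

# Alias name -> the words it stands for (taken from the examples above).
ALIASES = {
    "ls": "log --max-count=20 --oneline",
    "st": "status",
    "ci": "commit",
}


def expand_alias(arguments, aliases=ALIASES):
    """Replace the subcommand with its alias definition, if there is one."""
    if not arguments or arguments[0] not in aliases:
        return list(arguments)
    # The definition is split into words the way a shell would split it.
    return shlex.split(aliases[arguments[0]]) + list(arguments[1:])


if __name__ == "__main__":
    print(expand_alias(["ls"]))
    print(expand_alias(["ci", "-m", "Typo fix"]))
    print(expand_alias(["push"]))
```

Arguments given after the alias are kept, so git ci -m "Typo fix" behaves like git commit -m "Typo fix".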

Running tests locally

Because you now know about shebangs, executable bits and scripts in general, you have enough knowledge to actually run our tests locally without needing GitLab.

It should make your development faster and more natural as you do not need to wait for GitLab.

Simply execute ./bin/run_tests.sh in the root directory of your project and check the results.

You can even run only a specific subset of tests.

./bin/run_tests.sh tasks/01/factor
./bin/run_tests.sh tasks/01
./bin/run_tests.sh quizzes/02/before

Note: If you are using your own installation of Linux, you might need to install the bats package first.

Graded tasks (deadline: Mar 13)

Starting with these tasks, do not forget to mark your scripts as executable and always add a proper shebang. There will be extra tests for this.

Using Git CLI (20 points)

Use git config to temporarily change your e-mail to YOUR_GITLAB_LOGIN@nswi177.gitlab.mff.cuni.cz (of course, replace YOUR_GITLAB_LOGIN with the right one) and make one commit to your graded task repository with this e-mail. You can create a new file 03/git_cli.txt if you do not know what to change ;-).

Update: This task is checked by GitLab CI. However, GitLab tests may suddenly start failing after some time (i.e. first you see the tests as passing and they fail after several more commits). This is because GitLab does not clone the full repository history when running the pipeline (i.e., only recent commits are retrieved). We will check it on a full-depth clone so it will not impede your grading.

In other words: if this task passed once on GitLab pipeline and started to fail for no apparent reason later on, it is okay.

`03/tree.py` (30 points)

Update your script 02/tree.py to accept a command-line argument for the name of the directory to be listed. When no argument is provided, it should still print the current directory.

Also add support for a -d option to print only directories. The switch can appear before or after the directory specification or it can appear alone when listing the current directory.

`03/factor.py` (20 points)

Rework your script from the first lab to read the number from a command-line argument instead.

`03/architecture.sh` (10 points)

Update your script from the previous lab to have a proper shebang and executable bit set.

`03/git.txt` (20 points)

You will need the following repository.

https://d3s.mff.cuni.cz/f/teaching/nswi177/202122/labs/task-03.git/

There are multiple files in this repository. Copy the one mentioned in the commit messages to 03/git.txt.

In other words, clone the above repository, view existing commits and in the commit messages you will see a filename that you should copy to your own project (as 03/git.txt).

Automated tests only check presence of the file, not that you have copied the right one.

Learning outcomes

Conceptual knowledge

Conceptual knowledge is about understanding the meaning of given terms and putting them into context. Therefore, you should be able to …

explain what is meant by a script in the Linux environment
explain what command-line arguments are
explain what a shebang and an executable bit are and how they influence script execution
explain how parameters (arguments) are passed to a script with a shebang
explain what a Git working copy (clone) is

Practical skills

Practical skills are usually about using given programs to solve various tasks. Therefore, you should be able to …

create a Linux script with a proper shebang
set the executable bit of a script
access command-line arguments in a Python script
configure Git (name and e-mail)
clone a Git repository over HTTPS
review changes in a Git working copy
create a commit in a Git repository
upload new changes to a Git server (e.g., GitLab) and retrieve updates from it
view summary information about previous commits
customize Git with aliases (optional)

Answer to the echos.sh exercise

Recall that we had to add quotes to ls (for example) when a filename contained a space. Otherwise, ls accepts a list of files as arguments.

Here it is very similar – echo accepts a list of words to print (and prints them with a single space in between). If you need to print multiple spaces, you need to surround them with quotes as otherwise the blank space is parsed by the shell as a separator.

Hence, the output (we replaced spaces with ␣ for better clarity in the following dump):

alpha␣bravo␣charlie
alpha␣bravo␣charlie
alpha␣␣␣bravo␣charlie

Note that you may also substitute echo with args.py that we talk about later on to better understand what is happening under the hood.
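The splitting into words can also be tried out in Python, whose standard shlex module approximates the rules the shell uses. This is a sketch of the principle only; the shell does considerably more than shlex.split.

```python
import shlex

commands = [
    'echo alpha bravo charlie',
    'echo alpha  bravo   charlie',
    'echo "alpha   bravo"   charlie',
]

for command in commands:
    words = shlex.split(command)
    # words[0] is the program name; echo joins the rest with single spaces.
    print(words[1:], "->", " ".join(words[1:]))
```

The first two commands produce the same three words, while the quoted variant yields only two arguments, the first of which keeps its inner spaces.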