

Lab #6 (Mar 23 – Mar 27)

Before class

  • Complete all the exercises from lab 3.


  • Shell scripting.


Clone the repository teaching/nswi177/2020-summer/upstream/examples to have local access to the files needed for the following tasks. You do not need to fork this repository to complete the exercises. We will be using scripts from the shell/ directory.
Check the contents of the argv script. In which language is it written? How many lines did you need to check for that? Hint.
Run the argv script with * as the only argument. What will be the output? What about a file with a space in its name?
Run the argv script by specifying its full path (i.e. use something like ~/nswi177/examples/shell/argv instead of ./argv). How does the zeroth argument differ? How does it differ if you specify the interpreter explicitly (e.g. run with bash argv)?
Investigate the program env. Why is its use preferred in the shebang over specifying the path to the interpreter directly (e.g. why do we usually see #!/usr/bin/env python3 instead of #!/usr/bin/python3)? What does env do when no arguments are specified? Hint.
What is the purpose of the program grep?
Use grep to print all lines containing sed in lorem-ipsum.txt. Check what the --color=auto option does. Solution.
Find all lines containing a date in ISO format (YYYY-MM-DD) in lorem-ipsum.txt. Which switch should you use to print only the dates (i.e. not the rest of the line)? Hint. Solution.
Extend the previous example. Print all dates in ISO format that are in lorem-ipsum.txt, each date on a separate line. Print only valid dates from the 20th and 21st century. Hint. Solution.
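
As a rough illustration of the last two grep exercises, the following sketch extracts dates from a small sample file. The sample text and the variable names are made up for this example (the real lorem-ipsum.txt is not reproduced here), and the pattern is only one possible approximation: it accepts years 1900 to 2099 and does not check the number of days per month, so something like 2020-02-31 would still pass.

```shell
# Sample text standing in for lorem-ipsum.txt (hypothetical content).
sample_file="$(mktemp)"
cat >"$sample_file" <<'EOF'
Lorem ipsum 2020-03-23 dolor sit amet 2020-03-27.
Invalid month 2020-13-01, too old 1789-07-14.
EOF

# -E enables extended regular expressions,
# -o prints only the matched text, one match per line.
date_regex='(19|20)[0-9]{2}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])'
dates="$(grep -oE "$date_regex" "$sample_file")"
echo "$dates"

rm -f "$sample_file"
```

Here only the two dates on the first line are printed; the other two are rejected by the month and century parts of the pattern.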

sed is a filter that is able to transform each line according to a regular expression.

Assuming the output from the previous exercise as input, the following command converts the ISO dates to the Czech notation.

sed 's:-0:-:g' | sed 's#\(.*\)-\(.*\)-\(.*\)#\3.\2.\1#'

Note that the first command uses no regular expression metacharacters at all and simply removes the leading zeros. The first character after the s (substitute) command is the delimiter that separates its three parts: pattern, replacement and flags. The second sed invocation assumes that each line has the format YEAR-MONTH-DAY and reverses the order using backslash references (\1, \2, \3) to the text matched by the parenthesized groups.
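
To see what the two substitutions do, the pipeline can be fed a few hand-written dates. The sample dates and variable names below are chosen only for this illustration.

```shell
# Three sample ISO dates, one per line (made up for this example).
iso_dates='2020-03-05
1999-12-31
2020-10-01'

# Step 1 drops a zero that directly follows a dash (leading zeros).
# Step 2 captures the three dash-separated fields and prints them reversed.
czech_dates="$(printf '%s\n' "$iso_dates" \
    | sed 's:-0:-:g' \
    | sed 's#\(.*\)-\(.*\)-\(.*\)#\3.\2.\1#')"
echo "$czech_dates"
```

This prints 5.3.2020, 31.12.1999 and 1.10.2020, each on its own line. Note that the zero in month 10 survives because it does not directly follow a dash.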

Also note that grep and sed use slightly different syntax in the expressions and that the syntax is also different from the one you find in Python (generally – and unfortunately – every language has its own minor differences in the syntax).

Use sed to convert the Czech date notation (from the previous example) back to ISO. Do not forget to add the leading zeros. Solution.

Replace all occurrences of the word amet in lorem-ipsum.txt with the word tema.

Check whether

sed REPLACEMENT_SCRIPT <lorem-ipsum.txt >lorem-ipsum.txt

would work.

Then find out what -i does.

Use git restore to discard the changes to lorem-ipsum.txt from the previous example.
Again, prepare the command for replacing amet with tema, but this time use diff to check the modifications before using the -i option. Hint. Solution.
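
One possible way to approach the last few exercises is sketched below on throw-away temporary files, so the real lorem-ipsum.txt is not touched. The file contents and variable names are assumptions made for this example. The sketch previews the substitution with diff and then shows what happens when the output is redirected into the input file; actually applying the change with -i is left out.

```shell
# A small stand-in for lorem-ipsum.txt (hypothetical content).
work_file="$(mktemp)"
printf 'lorem ipsum dolor sit amet\nconsectetur adipiscing elit\n' >"$work_file"

# Preview: sed writes the modified text to the pipe and diff compares
# the original file with its standard input (-).
# diff exits with status 1 when the inputs differ, hence the "|| true".
preview="$(sed 's/amet/tema/g' "$work_file" | diff "$work_file" -)" || true
echo "$preview"

# Redirecting into the input file: the shell truncates the file
# before sed is even started, so sed reads an empty file.
copy_file="$(mktemp)"
cp "$work_file" "$copy_file"
sed 's/amet/tema/g' <"$copy_file" >"$copy_file"
if [ -s "$copy_file" ]; then truncated=no; else truncated=yes; fi
echo "copy truncated: $truncated"

rm -f "$work_file" "$copy_file"
```

The preview lists only the first line as changed, and the copy ends up empty.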

Can you explain how the conditional works in the following snippet?


if $BE_VERBOSE; then
    echo "Launching now..."
fi
Learn about || and && operators.
Learn about command grouping: ( commands ) and { commands }.
Look inside scopes.sh script. Make sure you understand what is happening with the variable scoping (i.e. export and subshells).
What is the difference between a pipe and a command substitution ($( cmd ))? Solution.
Investigate the difference between ./argv $( echo * ) and ./argv "$( echo * )".
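
The following sketch puts the conditional, the && and || operators and the two kinds of grouping side by side. It is only an illustration: apart from BE_VERBOSE, all variable names are invented here.

```shell
# The variable holds the NAME of a command: "true" or "false".
BE_VERBOSE=true

# After expansion the shell runs the command `true`;
# its exit status 0 selects the then-branch.
if $BE_VERBOSE; then
    message="Launching now..."
else
    message=""
fi

# && runs its right-hand side only when the left side succeeds,
# || only when the left side fails.
and_result="not run"
false && and_result="ran"
or_result="not run"
false || or_result="ran"

# ( ... ) runs in a subshell: the assignment does not leak out.
counter=1
( counter=2 )
after_subshell="$counter"

# { ...; } runs in the current shell: the assignment persists.
{ counter=3; }
after_group="$counter"

echo "$message / $and_result / $or_result / $after_subshell / $after_group"
```

The last line prints "Launching now... / not run / ran / 1 / 3".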

Learn about find command.

Of the many switches that find offers, it is worth remembering at least -type d (or -type f), -name 'pattern' and -exec cmd {} \; because they cover most everyday use.
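
A short sketch of these three switches on a temporary directory tree follows. The directory layout and the variable names are made up for this example.

```shell
# Build a small throw-away directory tree (hypothetical layout).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/src/sub"
touch "$demo_dir/src/main.sh" "$demo_dir/src/sub/helper.sh" "$demo_dir/src/notes.txt"

# -type d selects directories only (the starting directory is included).
dir_count=$(( $(find "$demo_dir" -type d | wc -l) ))

# -type f with -name selects regular files matching a shell pattern;
# the pattern is quoted so that the shell does not expand it first.
script_count=$(( $(find "$demo_dir" -type f -name '*.sh' | wc -l) ))

# -exec runs the command once per match; {} is replaced by the path
# and \; terminates the command.
find "$demo_dir" -type f -name '*.sh' -exec chmod +x {} \;
if [ -x "$demo_dir/src/sub/helper.sh" ]; then helper_executable=yes; else helper_executable=no; fi

echo "directories: $dir_count, scripts: $script_count, helper executable: $helper_executable"
rm -rf "$demo_dir"
```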


Learn about xargs as a smarter alternative to $( ... ).

When is the use of xargs necessary?


Make sure you understand what is happening in the following pipe (i.e. every switch and why it is needed for safe processing of non-trivially named files).

find -print0 | xargs -0 -L 2 ./argv my-argument

Compare with a more trivial variant without the zero-delimiters:

find | xargs -L 2 ./argv my-argument
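
The effect of the zero delimiters can also be observed without the argv script. The sketch below is a simplified variant, not the command from the exercise: it uses printf in place of ./argv and -n 1 in place of -L 2, and it assumes that the temporary directory path itself contains no whitespace. All names are invented for the illustration.

```shell
# Two files, one of them with a space in its name.
demo_dir="$(mktemp -d)"
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt"

# NUL-delimited: every path arrives as exactly one argument.
safe_count=$(( $(find "$demo_dir" -type f -print0 \
    | xargs -0 -n 1 printf '[%s]\n' | wc -l) ))

# Whitespace-delimited: xargs splits "with space.txt" into two arguments.
naive_count=$(( $(find "$demo_dir" -type f \
    | xargs -n 1 printf '[%s]\n' | wc -l) ))

echo "safe: $safe_count arguments, naive: $naive_count arguments"
rm -rf "$demo_dir"
```

With two files on disk, the safe variant passes two arguments while the naive one passes three.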
Examine how find_duplicates.sh works. Do two consecutive sorts make sense? Why are both -0 and -zero needed when xargs is used?
Look at read.sh and learn what read does. Why does read LOGIN NAME read both the first name and the surname? Explain why the script prints the message to standard error and why the command is only printed, not executed. Hint.
Look at bad_sum.sh and good_sum.sh. Explain the difference and why the straightforward approach in bad_sum.sh does not work as expected. Hint.
Refresh how wildcard expansion works, use wildcards.sh as a starting point.
Learn about here-doc syntax. Use heredoc.sh as a starting point.
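
The here-doc syntax and read combine naturally. The sketch below is not taken from read.sh or heredoc.sh; the input lines and variable names are invented. It shows that read assigns the first word to the first variable and the whole rest of the line to the last one.

```shell
report=""

# The quoted delimiter ('EOF') disables expansion inside the here-doc.
# The loop reads from a redirection, so it runs in the current shell
# and the report variable keeps its value after the loop.
while read -r login name; do
    report="${report}login=${login} name=${name};"
done <<'EOF'
jdoe John Doe
asmith Anna Maria Smith
EOF

echo "$report"
```

This prints login=jdoe name=John Doe;login=asmith name=Anna Maria Smith; on a single line.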

Investigate how Python and shell scripting can be easily intermixed, as shown in inlined.sh. Python is used here because the script processes a CSV file with quoting, which is much more difficult to handle with plain shell utilities.

What are the advantages and disadvantages of such an approach?

What is ShellCheck?
Run ShellCheck on all scripts from this lesson. Investigate in how many cases the checker was correct and pointed you to a problematic piece of code.

See gitlab_commits.sh for an example of a bigger shell script.

Focus on the following idioms that are often present in shell scripts:

  • getopt parses command-line arguments (notice also how the while loop is structured)
  • $VERBOSE turns on debugging messages (hint: help :)
  • $DUMP_RESPONSES is used in conditionals
  • use of cURL to post HTTP requests from CLI
  • use of jq to process JSON data
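
The first two idioms could look roughly like the sketch below. It is not copied from gitlab_commits.sh: the option set, the function name parse_args and the variables config_file, project_path and log are all assumptions made for illustration, and the code expects the getopt program from util-linux (the variant that understands -o and -l).

```shell
parse_args() {
    # getopt normalizes the arguments; a non-zero status means a usage error.
    parsed="$(getopt -o 'c:v' -l 'config:,verbose' -- "$@")" || return 1
    # Replace the positional parameters with the normalized list.
    eval set -- "$parsed"

    config_file=""
    log=:    # ':' is a builtin that ignores its arguments and succeeds
    while true; do
        case "$1" in
            -c|--config)  config_file="$2"; shift 2 ;;
            -v|--verbose) log=echo; shift ;;
            --)           shift; break ;;  # end of options
            *)            return 1 ;;
        esac
    done
    # The first remaining argument, if any.
    project_path="${1:-}"
}

parse_args -v --config config.ini some/project/path

# Expands either to ": ..." (silent) or to "echo ..." (verbose).
$log "Using configuration from $config_file" >&2
echo "config=$config_file project=$project_path"
```

Storing a command name in the variable keeps the rest of the script free of conditionals around every debugging message.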

To try it, prepare a config.ini file with your GitLab access token (get one here):


And then run the script as:

./gitlab_commits.sh -c config.ini -p mff -v teaching/nswi177/2020-summer/upstream/csv-templater

Note that this script uses the GitLab API to query GitLab projects in a programmatic way.