Introduction to UNIX (2009/2010)

Annotation from SIS:
This course provides information about, and hands-on practice with, the essentials of UNIX-family operating systems, mainly from the user's point of view. After finishing the course, students should be able to write a mid-sized UNIX shell program.

15:40 Th, SU2

NSWI095 - Summer semester 2009/2010

Information for Czech students

Consultations

  • by e-mail appointment, Malá Strana, second floor, door no. 204

Requirements for course credit

  • The first requirement is enrolment in the group: Thursday 15:40, SU2 (the limit is 15 students)
  • I recommend attending the labs, especially to students who have no practical experience with working in UNIX. Although attendance at the labs is not a requirement for credit, do not underestimate the practical preparation, which is essential for passing the written exam.
  • The second requirement is obtaining at least 75% of the points from the assigned tasks. Every task must be completed within 14 days of being assigned and sent by e-mail. Missing the deadline means 0 points for the task. Task assignments will appear on this page after each lab. You can check your points in SIS (the Grupík application).

For more information see: http://is.cuni.cz/eng/studium/predmety/index.php?do=predmet&kod=NSWI095

Lab 1: 2010-02-25

  • Try it out:
    • UNIX on your own computer by running some Live CD e.g. OpenSUSE Linux Live KDE CD
    • You can also log in to one of the u-plX machines in the lab, e.g. u-pl0.ms.mff.cuni.cz; use PuTTY on Windows or ssh on Linux
  • Introduction to UNIX:
    • There are many UNIX flavours; always keep this in mind when working on a particular version of UNIX
    • What is a shell? There are many different shells (sh, ash, bash ...) with various features (keys, history ...)
    • Useful shortcuts in some shells:
      • CTRL+C = terminates the running process
      • CTRL+Z = suspends the running process (fg resumes it in the foreground, bg resumes it in the background)
      • CTRL+D = sends end-of-file (EOF) on the terminal input: a program reading STDIN sees the end of its input, and a shell at an empty prompt exits (see below what a file descriptor is)
      • CTRL+L = clear screen
      • CTRL+S = suspend printing to the terminal, CTRL+Q continue printing
      • CTRL+R = history search
    • Through shell you can run commands (small programs like ls or large applications like firefox)
    • When executing commands from the shell, you are actually starting processes. By default, a program runs as a "foreground process" (the shell waits until the process terminates)
    • Background processes can be started directly from the shell using "&", e.g. firefox &. In that case, the process runs in parallel with the shell and you can immediately continue executing other commands.
    • Processes usually communicate through file descriptors = imagine channels numbered 0,1,2 ...
    • File descriptor 0 = STDIN = Standard Input = by default connected to the terminal input (the keyboard)
    • File descriptor 1 = STDOUT = Standard Output = by default connected to the terminal output, that's why you can actually see some output :-)
    • File descriptor 2 = STDERR = Standard Error = by default also connected to the terminal output; it carries error messages
    • File descriptors can also refer to files
    • You can use "<" to redirect content of a file to STDIN e.g. grep root < /etc/passwd
    • You can use ">" to redirect STDOUT to a file e.g. cat /etc/passwd > mytmpfile
    • You can combine (chain) multiple processes to perform complex tasks using "pipes" e.g. cat /etc/passwd | grep qmail | grep 200 | head -n 3
    • When piping, there are actually multiple processes running at the same time. The operating system ensures synchronisation among the communicating processes: a process reading from its STDIN waits until the other process writes something to its STDOUT, and a writer waits while the pipe buffer is full
    • You can also send signals to a running process:
      • kill -9 1234 - Kills a process with process ID 1234 (PID) even if not responding (last resort)
      • kill -l - prints a list of all available signals
      • ps aux - shows a list of all running processes with their PIDs and USERs
  • Useful apps:
    • mc - Midnight Commander (available for some UNIXes, especially Linux) - to ease your day-to-day work
    • top or htop - interactively shows CPU and memory usage statistics and list of processes
    • vi or vim - text editor (Some love it, some hate it. You have to know about it anyway.)
    • When running a graphical desktop in Lab, e.g. KDE or GNOME, you can run one of the terminal emulators (shell in a window) like konsole or xterm. A useful shortcut in KDE is ALT-F2 - a fast application launcher.
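
The redirection, pipe and background-process bullets above can be tried in one short session. This is only a minimal sketch: the temporary file and its three sample lines are made up for illustration, so the result does not depend on the local /etc/passwd.

```shell
#!/bin/sh
# Hypothetical sample data standing in for /etc/passwd.
tmpfile=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/sh' \
    'alice:x:1000:1000:Alice:/home/alice:/bin/sh' \
    'bob:x:1001:1000:Bob:/home/bob:/bin/sh' > "$tmpfile"   # ">" sends STDOUT to a file

# "<" connects the file to STDIN of grep.
from_stdin=$(grep root < "$tmpfile")

# A pipe chains processes: STDOUT of one becomes STDIN of the next.
first_two=$(cat "$tmpfile" | grep /home | head -n 2 | cut -d: -f1)

# "&" starts a background process; $! holds its PID.
sleep 30 &
bgpid=$!
kill "$bgpid"                 # sends SIGTERM, the default signal
wait "$bgpid" 2>/dev/null
bgstatus=$?                   # 128 + signal number when killed by a signal

echo "$from_stdin"
echo "$first_two"
echo "background job exit status: $bgstatus"
rm -f "$tmpfile"
```

Using kill without -9 gives the process a chance to clean up; -9 stays the last resort mentioned above.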

Lab 2: 2010-03-04

  • UNIX directory structure, dir separator is '/', normal files vs hidden files
  • nano, mcedit, vim
  • secure access to remote server using ssh
    • GUI tunneling: ssh -X
    • running remote commands: ssh host [command]
  • Difference between redirection ">" and pipe "|" (temporary files vs pipes)
  • shell wildcards (pathname expansion) *, ?, [a-z], e.g. cp *.txt abc
  • some arithmetics: wc, expr, $((..))
  • How to write a shell script
    • chmod +x script.sh
    • #!/bin/sh
    • Variables in shell (bash)
      • ${parameter:-defaultValue} Use a default value if the variable is unset or empty
      • ${parameter:=defaultValue} Assign a default value if the variable is unset or empty
      • ${parameter:?"Error Message"} Display an error message if the parameter is unset or empty
      • ${#var} Length of the string
      • ${var%pattern} Remove the shortest match of the pattern from the rear (end)
      • ${var%%pattern} Remove the longest match of the pattern from the rear (end)
      • ${var:num1:num2} Substring of length num2 starting at offset num1
      • ${var#pattern} Remove the shortest match of the pattern from the front
      • ${var##pattern} Remove the longest match of the pattern from the front
      • ${var/pattern/string} Find and replace (only the first occurrence)
      • ${var//pattern/string} Find and replace all occurrences
    • write a shell script that creates shell scripts
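
A few of the expansions listed above, applied to one value. This is a small sketch with a made-up path; it sticks to the forms that plain /bin/sh understands, so the bash-only ${var:num1:num2} and ${var/pattern/string} are left out.

```shell
#!/bin/sh
path=/home/alice/archive.tar.gz     # hypothetical sample value

base=${path##*/}        # longest match of "*/" removed from the front
dir=${path%/*}          # shortest match of "/*" removed from the end
stem=${base%%.*}        # longest match of ".*" removed from the end
ext=${base#*.}          # shortest match of "*." removed from the front
len=${#base}            # length of the string

unset name
greeting="hello ${name:-nobody}"    # default is used, $name stays unset
: "${name:=guest}"                  # default is assigned to $name

sum=$(( len + 1 ))                  # shell arithmetic
also=$(expr "$len" + 1)             # the same with expr

echo "$dir | $base | $stem | $ext | $len | $greeting | $name | $sum | $also"
```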

Homework:

  • (1pt) Create a shell script that copies files using ssh without using SCP/SFTP
  • (2pt) Create a shell script that computes average length of all file names in the defined directory and stores the value in a new file ~/stats/NNNN. Every execution of the script should increment NNNN by 1.

Example:

  • execution #1: ~/stats/0001
  • execution #2: ~/stats/0002
  • ...

Hints:

  • You can use printf %04d X to format a number
  • Hidden files might help.
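
One pitfall with the printf hint: inside $(( )) a number with leading zeros, such as 0008, is read as octal. A possible way around it is sketched below; the function name next_name is made up for this illustration.

```shell
#!/bin/sh
# next_name PREV prints the four-digit name that follows PREV.
next_name() {
    prev=$1
    # Strip leading zeros first, otherwise 0008 or 0009 would be
    # rejected as invalid octal numbers by the arithmetic expansion.
    while [ "${prev#0}" != "$prev" ]; do
        prev=${prev#0}
    done
    printf '%04d\n' $(( ${prev:-0} + 1 ))
}

next_name 0001
next_name 0009
next_name 0099
```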

Lab 3: 2010-03-11

  • overview of the most important approaches to inter-process communication in shell
  • man test
  • mkfifo
  • find -name -type -exec
  • xargs
  • sort -k -n -r -u -t, uniq, join -1 -2 -v -j -t
  • INNER JOIN
  • LEFT JOIN: -a 1
  • RIGHT JOIN: -a 2
  • INVERTED JOIN: -v
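
The four kinds of join listed above, on two tiny tables. The files and their contents are invented for this sketch; both are already sorted on the join field, which join requires.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL             # fixed collation for sort and join
tmpdir=$(mktemp -d)

# Two hypothetical ':'-delimited tables keyed by field 1.
printf '%s\n' 'alice:Alice A.' 'bob:Bob B.' 'carol:Carol C.' > "$tmpdir/users"
printf '%s\n' 'bob:admin' 'carol:staff' 'dave:guest'         > "$tmpdir/roles"

inner=$(join -t: "$tmpdir/users" "$tmpdir/roles")        # keys present in both files
left=$(join -t: -a 1 "$tmpdir/users" "$tmpdir/roles")    # plus unmatched lines of file 1
right=$(join -t: -a 2 "$tmpdir/users" "$tmpdir/roles")   # plus unmatched lines of file 2
only1=$(join -t: -v 1 "$tmpdir/users" "$tmpdir/roles")   # only unmatched lines of file 1

printf 'INNER:\n%s\nLEFT:\n%s\nRIGHT:\n%s\nINVERTED:\n%s\n' \
    "$inner" "$left" "$right" "$only1"
rm -rf "$tmpdir"
```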

Homework:

  • (1pt) write a command or a shell script that:
    • first looks for all directories on the filesystem that contain the "bin" substring
    • and then lists all files from these directories that contain the 'sh' substring in the file name.
    • every output line should contain a file name and a short text describing the file format (hint: use 'file' tool)
  • (3pt) consider 3 files representing database tables.
    • USRFILE is /etc/passwd
    • GRPFILE is /etc/group
    • TXTFILE is a third file; every line contains two fields delimited by the ':' character.
      • first field is the username or group name
      • second field is some arbitrary text
      • Example: root:This is Admin
    • You have to write a shell script that:
      • For every user in USRFILE prints the following fields delimited by ':' character.
        • user name (field #1 from USRFILE)
        • full name (field #5 from USRFILE)
        • name of the default group (field #1 from GRPFILE)
        • list of users from the default group (field #4 from GRPFILE, it may be empty)
        • text from the TXTFILE if defined for the user
        • text from the TXTFILE if defined for the default group
      • then prints a list of groups that were not printed in the previous step:
        • group name (field #1 from GRPFILE)
        • text from the TXTFILE if defined for the group
  • Notes:
    • use sort and join commands, avoid iteration such as while

Lab 4: 2010-03-18

  • id, whoami, last, w
  • ln, cp -pr
  • touch -t -r, mkdir -p, tail, diff -y, tr
  • grep -v

Homework:

  • (3pt) write a shell script that processes a list of e-mails stored in a file.
    • Your script expects a [command] parameter and uses STDIN as the input text. (Note: The command can also accept additional parameters.)
    • The input text contains e-mail addresses, a single address per line, mixed with other junk lines, e.g.
    bla bla
    myaddr@host.domain
    bla bla
    myotheraddr@host.domain
    bla
    • Your script picks only lines containing valid email addresses and for every address it calls the given command with the address as an argument
    • Example:
    echo -e 'bla\na@a.a\nbla\nb@b.b' | yourscript.sh echo "FOUND: "
    FOUND: a@a.a
    FOUND: b@b.b

Lab 5: 2010-03-25

Regular expressions:

  man pcre
  POSIX CHARACTER CLASSES

Extended regex e.g. [a-z]{5}

  grep VS grep -E
  sed VS sed -r
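
The difference between basic and extended regular expressions shows up with the interval from the example above. A small sketch on four made-up input lines:

```shell
#!/bin/sh
words=$(printf '%s\n' 'abc' 'hello' 'a{5}' 'unix12')   # hypothetical input

# Basic regex (plain grep): braces are literal unless escaped.
bre_literal=$(printf '%s\n' "$words" | grep '[a-z]{5}')       # a letter followed by the text "{5}"
bre_interval=$(printf '%s\n' "$words" | grep '[a-z]\{5\}')    # five lowercase letters in a row

# Extended regex (grep -E): braces form an interval without backslashes.
ere_interval=$(printf '%s\n' "$words" | grep -E '[a-z]{5}')

echo "BRE, unescaped braces: $bre_literal"
echo "BRE, escaped braces:   $bre_interval"
echo "ERE:                   $ere_interval"
```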

Date formatting:

  date -d DATESTR +PATTERN
  date -d @UNIXTIME +PATTERN
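
Both forms can be combined to do date arithmetic through UNIX time. The sketch below assumes GNU date (the -d option is not available on every UNIX flavour) and uses -u so that the results do not depend on the local time zone.

```shell
#!/bin/sh
# Assumption: GNU date; -d is not portable to all UNIX flavours.
iso=$(date -u -d @86400 +%Y-%m-%d)                  # UNIX time -> ISO date
epoch=$(date -u -d '2010-03-25' +%s)                # ISO date  -> UNIX time
next_day=$(date -u -d "@$(( epoch + 86400 ))" +%Y-%m-%d)   # one day (86400 s) later

echo "$iso $epoch $next_day"
```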

Other stuff:

  $RANDOM
  sed -i replace in place
  find 2>&1 -exec | tail

Homework:

  • (1pt) Create two scripts dos2unix and unix2dos that convert between CRLF and LF. Propose some verification mechanism to check the correctness of your scripts.
  • (3pt) Create a script that copies files from a source directory into a destination directory. Only files that match a file name pattern and contain at least one date in ISO format YYYY-MM-DD are copied. For every such file, the script should generate a random date from a given date interval and change all occurrences of ISO dates inside the file, and also the file's modification time, to that date.

The input should be:

  • source directory (e.g. /usr/share/doc)
  • destination directory
  • beginning of a date interval in ISO format (e.g. 2000-01-01)
  • end of a date interval in ISO format (e.g. 2010-01-01)
  • file name pattern (e.g. *.html)
  • maximum number of files to be processed (e.g. 100)

Lab 6: 2010-04-01

  • the seq command generates a sequence of numbers, try the difference between: /bin/echo $(seq 1 1000000) and seq 1 1000000 | xargs /bin/echo
  • for I in $(seq 1 10)
  • ps -o
  • try to avoid cat /dev/stdin in scripts
  • grep -E '(A|B)' vs grep '\(A\|B\)'
  • sed -e ':L;s/a/b/;tL'
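
The last bullet is a sed loop: ":L" defines a label and "tL" jumps back to it whenever the preceding substitution changed something. The sketch below writes the same loop with separate -e options (some sed versions do not accept a ";" right after a label) and adds a second use where a plain "g" flag would not be enough; the function name group_thousands is made up.

```shell
#!/bin/sh
# Repeat s/a/b/ until it no longer matches.
replaced=$(echo 'aaa' | sed -e ':L' -e 's/a/b/' -e 'tL')

# group_thousands reads numbers on STDIN and inserts ',' every three digits.
# Each pass inserts one comma, so the loop is needed to finish the job.
group_thousands() {
    sed -e ':L' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'tL'
}

echo "$replaced"
echo 1234567 | group_thousands
```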

 In "rfc793.txt", count how many
 lines contain the word "packet" in exactly this form, without the quotes.
 The result should be 15. The RFC can be downloaded from the web.
 http://www.rfc-editor.org/rfc/rfc793.txt

 grep -c '\(^\|[^a-zA-Z]\)packet\([^a-zA-Z]\|$\)' rfc793.txt

 In the same file, count the lines that contain the word
 "packet" OR the word "network", in exactly these forms.
 The result is 48.

 grep -c '\(^\|[^a-zA-Z]\)\(packet\|network\)\([^a-zA-Z]\|$\)' rfc793.txt

 The same as the previous task, but do not use the OR (|) extension of regular
 expressions.

 grep -c -e '\bpacket\b' -e '\bnetwork\b' rfc793.txt

 Count the lines of rfc793 that contain strings of digits that are
 at least 3 characters long and in which all digits are THE SAME. So "222"
 is a correct string, "2223" as well, but "22445566" is not.
 The result is 6.

 grep '000\|111\|222\|333\|444\|555\|666\|777\|888\|999' rfc793.txt |wc -l
 grep -c -E '([0-9])\1\1' rfc793.txt

 Add up the UIDs of all users of the system.
 Hint: echo '1 + 1 + 1' | bc

 (getent passwd | cut -d: -f3 | tr '\n' '+'; echo 0) | bc

 Find and print all files under /usr/local whose name
 (not path) starts with neither "ina" nor "bc" and ends with '.conf'

 find /usr/local -name '*.conf' -not -name 'ina*' -not -name 'bc*'

 Count the files in the /tmp subtree that have changed within
 the last 24 hours.

 find /tmp -mtime -1 2>/dev/null | wc -l

Homework:

  • (2pt) In the /usr/local subtree, search the configuration files (*.conf) and find out where the string "standalone" occurs. I want output in which every line shows the file where the string was found.
  • Hint: when you give grep more than one input file on the command line, its output is always in the format "<filename>:<line>", even if the string is found in only one of the files. A good trick is therefore to pass an empty file as the second file, for example "/dev/null"
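
The /dev/null trick from the hint can be seen on a single file. This is only an illustration of the hint on a made-up file, not a solution of the task.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '%s\n' 'mode standalone' 'port 21' > "$tmpdir/a.conf"   # hypothetical file

one=$(grep standalone "$tmpdir/a.conf")               # one file: no file name prefix
two=$(grep standalone "$tmpdir/a.conf" /dev/null)     # two files: prefix is added

echo "$one"
echo "$two"
rm -rf "$tmpdir"
```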

Lab 7: 2010-04-08

  • xargs -n1
  • xargs -I{} something {}
  • "here document", example:
 cat <<===
 some multiline text
 ===
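
The three items above fit into one short sketch: a here document supplies the input, and xargs turns each input line into a command invocation. The function name items and the sample words are made up.

```shell
#!/bin/sh
# A here document feeds the lines between the two "===" markers to STDIN of cat.
items() {
    cat <<===
alpha
beta
gamma
===
}

# -n1: run the command once per input item.
per_item=$(items | xargs -n1 echo item:)

# -I{}: run the command once per input line, substituting the line for {}.
renamed=$(items | xargs -I{} echo "{}.txt <- {}")

echo "$per_item"
echo "$renamed"
```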

 delete all lines containing the string "bash"
 sed -e '/bash/d'

 find another way to solve the previous example, using the "-n" option
 of sed.
 Hint: look up "!" in the man page
 sed -ne '/bash/!p' /etc/passwd

 delete empty lines
 sed '/^$/d' /etc/passwd

 before each line (i.e. on a separate line) print its line number
 sed -e '=' /etc/passwd

 using sed and nothing else, count the lines of the file /etc/passwd
 sed -n '$=' /etc/passwd

 print the last line
 Hint: select the last line and delete all the other, unselected lines;
 again see "!"
 sed '$!d' /etc/passwd

 number the lines of /etc/passwd so that the number is followed by ":", then
 a tab and then the original line
 getent passwd | sed = | sed 'N;s/\n/:\t/'

 collect the lines containing the string "nologin" in the hold space and
 print it when you come across the string "tcsh". I do not want anything other
 than the hold space to be printed.
 Hint: sed functions "H" and "g"
 sed -e '/nologin/H; /tcsh/!d; /tcsh/g' /etc/passwd

 reverse order of lines (emulates "tac")
 sed '1!G;h;$!d'               # method 1
 sed -n '1!G;h;$p'             # method 2

 if a line ends with a backslash, append the next line to it
 sed -e :a -e '/\\$/N; s/\\\n//; ta'

Homework:

  • (5pt) Write a utility to remove closed actions from a todo-list.

Consider a database of todo-lists with the following directory structure: GROUP/TODOLIST.txt

  • There are some directories representing groups
  • every group contains multiple todo-lists
  • every todo-list is a text file that contains multiple definitions of "actions" in the following format:
 %ACTION{ param1="" ... paramN="" }% some text %ENDACTION%
  • Actions may be open or closed, which is denoted by the parameter state="open" or state="closed"
  • A closed action contains the closing date, e.g. closed="2010-03-25"

An example of an "open" action.

 %ACTION{
   created="2010-02-05" creator="Users.G-Man" notify="Users.G-Man"
   due="2010-02-10" priority="normal" state="open" uid="000652"
   who="Users.GordonFreeman" }%
  Kill some Combine soldiers
 %ENDACTION%

An example of a "closed" action.

 %ACTION{
   closed="2010-02-24" closer="Users.GordonFreeman"
   created="2010-02-05" creator="Users.G-Man" notify="Users.G-Man"
   due="2010-02-10" priority="normal" state="closed" uid="000652"
   who="Users.GordonFreeman" }%
  Combine threat mitigated
 %ENDACTION%
  • your script should remove all closed actions that have been closed for more than a specified number of days

Example:

 # remove actions from /var/lib/tododb that have been closed for more than 7 days
 remove_closed_actions.sh 7 /var/lib/tododb

Lab 8: 2010-04-15

  • shell functions
  • $$, kill, trap, sleep, read
 write a script that can handle signals SIGHUP and SIGINT (by printing some message to stdout)
 when started, your script writes its Process ID to the stdout
 your script should wait in an infinite loop
  • flock - advisory locking
 write a script that tries to acquire an exclusive lock on a lock-file,
 writes its PID and sleeps for 1 second, then releases the lock
 and repeats the process again in an infinite loop.

 run 3 instances of the same script and observe the mutual exclusion.
  • time
 create a set of scripts in order to compare their performance.
 all scripts will list 4000 files from /usr/share/doc/ directory, every filename will be prefixed with FILE:
 - script #1: while iteration, shell variable counter, echo printer
 - script #2: while iteration, head counter, echo printer
 - script #3: while iteration, head counter, /bin/echo printer
 - script #4: find -exec iteration, head counter, echo printer
 - script #5: find + xargs iteration, head counter, echo printer
 - script #6: use find, sed and head

 now write a script that measures the performance of every script N-times,
 computes the average of sys+user time and sorts the scripts accordingly.
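
For the signal-handling exercise above, the pieces fit together roughly as follows. This is a minimal sketch: the names on_signal, wait_forever and LAST_SIGNAL are made up, and the infinite loop is defined but not started so that the sketch terminates; instead, the script sends SIGHUP to itself once.

```shell
#!/bin/sh
LAST_SIGNAL=none

# A shell function used as the signal handler.
on_signal() {
    LAST_SIGNAL=$1
    echo "caught SIG$1"
}

trap 'on_signal HUP' HUP
trap 'on_signal INT' INT

echo "PID: $$"                # $$ is the process ID of the script

# The exercise asks for an infinite loop like this one.
# It is not called here so that the sketch finishes.
wait_forever() {
    while :; do
        sleep 1
    done
}

kill -HUP $$                  # send SIGHUP to ourselves to exercise the handler
echo "last signal: $LAST_SIGNAL"
```

To try it interactively, call wait_forever at the end and send the signals from another terminal with kill -HUP or kill -INT followed by the printed PID.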

Lab 9: 2010-04-22

  • mkfifo
  • synchronisation of processes using a pipe read X < pipe
  • producer-consumer
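
The read X < pipe idiom works because opening a named pipe for reading blocks until some process opens it for writing. A minimal sketch of one notification, with a made-up pipe path and message:

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
pipe="$tmpdir/sync"           # hypothetical path of the named pipe
mkfifo "$pipe"

# Producer: does some work in the background, then notifies through the pipe.
( sleep 1; echo "task ready" > "$pipe" ) &

# Consumer: blocks here until the producer opens the pipe and writes a line.
read -r msg < "$pipe"
echo "consumer woke up with: $msg"

wait                          # collect the background producer
rm -rf "$tmpdir"
```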

(extra 5pt)

 create:
 - a named pipe for synchronisation
 - a shared list for writing/reading tasks - one task per line in a form CLASS:TEXT
 - a producer ( reads tasks from STDIN, appends to the shared list and notifies consumers )
 - a consumer that sleeps for a given time and reassigns the task
   from class X to multiple classes Y e.g. A:abc -> B:abc, B:abc
 - a consumer that writes incoming messages directly to the output file

 Consumers are identified by classes e.g. A, B, C ...
 Consumers either wait for notification or perform the assigned task.
 After being notified, a consumer reads the shared list and picks the first line assigned to its CLASS.
 It then performs the task, checks whether there are more tasks assigned to its class and
 finally waits for further notification.

 The data flow should look like this:
   Producer -> class A -> classes B,B -> class C -> output file

Lab 10: 2010-04-29

  • ${#VARIABLE} - length of the value stored in a variable
  • [ " " "<" "$L" ] - check if a variable contains some white-space junk
  • nc -l -p 8000 - listens on port 8000, uses stdin/stdout for TCP communication
  • cat NAMEDPIPE | nc -l -p 8000 | ( read or write to NAMEDPIPE )
 Create a simple HTTP server:

 REQUEST: use a normal pipe for reading the request from the client
   the client sends the first line e.g.
     GET /someimage.png HTTP/1.1
   then a few lines containing headers e.g.
     Host: localhost:9000
     Connection: keep-alive
     Cache-Control: max-age=0
     Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
   and finally an empty line.

 RESPONSE: use a named pipe for sending response to the client
   server responds with
   HTTP/1.0 200 OK
   Server: My bash HTTP server
   Content-length: <size of the data>
   <newline>
   <data>

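The request and response formats described above can be handled by two small functions. This is only a sketch of the text-processing part, without netcat and without the named pipe; the function names parse_request and send_response are made up, and the body is passed as a shell variable, which is fine for text but not for binary files.

```shell
#!/bin/sh
CR=$(printf '\r')             # HTTP lines end with CR LF

# parse_request reads one HTTP request from STDIN and prints the requested path.
parse_request() {
    read -r method path version || return 1
    # Skip the header lines up to the empty line that ends the request.
    while read -r header; do
        header=${header%"$CR"}
        [ -z "$header" ] && break
    done
    printf '%s\n' "$path"
}

# send_response BODY writes a complete response to STDOUT.
send_response() {
    body=$1
    # wc -c counts bytes, which is what Content-length expects.
    length=$(printf '%s' "$body" | wc -c | tr -d ' ')
    printf 'HTTP/1.0 200 OK\r\n'
    printf 'Server: My bash HTTP server\r\n'
    printf 'Content-length: %s\r\n' "$length"
    printf '\r\n'
    printf '%s' "$body"
}

printf 'GET /someimage.png HTTP/1.1\r\nHost: localhost:9000\r\n\r\n' | parse_request
send_response 'hello'
```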
Homework:

  • (5pt) Finish the HTTP server and implement the following features
    • Add support for executable files that provide dynamic content
    • Implement a fallback mechanism that executes index.sh or index.html in case the request evaluates to a directory
    • Implement an error page
    • Based on request parameters set environment variables for the script executed e.g. /test.sh?A=1&B=2 should execute the test.sh script with two environment variables A=1 and B=2
    • You can download more testing data from here.
    • Note: you don't have to solve the concurrency problem of netcat (if multiple HTTP requests aren't working properly)
    • Whoever fixes the concurrency problem gets an extra 5pt

Lab 11: 2010-05-06

Lab 12: 2010-05-13

The file emp.data (column 1 is the employee's name, column 2 is the hourly wage, column 3 is the number of hours worked; the file is taken from the classic book "Aho, Kernighan, Weinberger: The AWK Programming Language")
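
If the original emp.data is not at hand, a stand-in with the same three columns is enough to run the exercises. The rows below are illustrative values chosen for this sketch, not necessarily the contents of the course file; only Katy's total pay of 40 is taken from the expected output shown in (a6).

```shell
#!/bin/sh
# Illustrative stand-in for emp.data: name, hourly rate, hours worked.
cat > emp.data <<'EOF'
Beth    4.00    0
Dan     3.75    0
Katy    4.00    10
Mark    5.00    20
Mary    5.50    22
Susie   4.25    18
EOF

# Two of the exercises, run against the stand-in data.
idle=$(awk '$3 == 0 { print $1 }' emp.data)
paid=$(awk '$3 > 0 { print $1, $2 * $3 }' emp.data)

echo "$idle"
echo "$paid"
```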

  (a1) basic information, history, structure of an AWK program, patterns/actions,
  data types, BEGIN/END, fields in input lines, built-in variables,
  syntax analysis of the program, ...

  (a2) print the file emp.data
  ## awk '{ print }' emp.data


  (a3) print the names and total pay of those employees who worked
  at least one hour. As another example, print only the names of the
  employees who did not work.
  ## awk '$3 > 0 { print $1, $2 * $3 }' emp.data
  ## awk '$3 == 0 { print $1 }' emp.data


  (a4) for each employee, print the number of words on the corresponding line (so
  the numbers will all be the same here). An example of the built-in variable "NF".
  ## awk '{ print $1, NF }' emp.data


  (a5) print each line preceded by its line number
  ## awk '{ print NR, $0 }' emp.data


  (a6) for each employee, print a line in the following format:
  total pay for Katy is 40
  ## awk '{ print "total pay for", $1, "is", $2 * $3 }' emp.data


  (a7) print those employees, together with their pay, who earned more
  than $50
  ## awk '$2 * $3 > 50 { print $1, $2 * $3 }' emp.data


  (a8) print the lines of the file emp.data, but before the first line print this
  header followed by an empty line:
  NAME    RATE    HOURS
  ## awk 'BEGIN { print "NAME\tRATE\tHOURS"; print "" }; {print}' emp.data


  (a9) print a single line containing the names of all employees; this
  is an example of string concatenation.
  ## awk '{ employees = employees $1 " " } END { print employees }' emp.data


  (a10) print the last line of the file emp.data. An example showing that "$0", the
  variable holding the whole line, is not guaranteed to keep its value in the END
  pattern in every AWK implementation.
  ## awk '{ line = $0 }; END { print line }' emp.data


  (a11) print the average pay of the employees who earn more than $6 per
  hour. Then try the same example for those who earn more than $4 per
  hour. Use an "if" statement to guard against division by zero.
  ## awk '$2 > 6 { ++n; sum += $2 * $3 }
  ##      END    { if (n == 0)
  ##                 print "no such employee"
  ##               else
  ##                 print sum / n
  ##             }' emp.data


  #---------------------------------------------------------------------------
  # AWK 2
  #---------------------------------------------------------------------------

  (a12) print the UID/GID from /etc/passwd with a '_' character between them.
  Use "FS" for reading and "next" to skip comments.
  ## awk 'BEGIN { FS=":" }; { if (/^#/) next; print $3 "_" $4 }' /etc/passwd


  (a13) print those lines of /etc/passwd that contain the word "Jan"
  ## awk '/Jan/' /etc/passwd


  (a14) reverse the lines of a file
  ## awk '{ lines[NR]=$0}; END { for (i=NR; i > 0; --i) print lines[i] }'


  (a15) print those lines of /etc/passwd that lie between the lines containing the
  words Friedel and Pechanec.
  ## awk '/Friedel/, /Pechanec/' /etc/passwd


  (a16) using split(,,), process the file /etc/passwd so that you print the login and
  the name of each user
  ## awk '/^[^#]/ { split($0, ln, ":"); print ln[1], ln[5] }' /etc/passwd


  (a17) for each line of the file, print its fields, each on its own line.
  Use "while".
  ## awk '{ i=1; while (i <= NF) { print $i; ++i} }'