
Introduction to UNIX (2010/2011)

Annotation from SIS:
This course gives information and practice in the essentials of UNIX-family operating systems, mainly from the user's point of view. After finishing this course, students should be able to write a mid-range UNIX shell program.

14:00 Wed, SU2

NSWI095 - Summer semester 2010/2011

Information for Czech students

Consultations

  • by appointment via e-mail, Malá Strana, second floor, door no. 204

Requirements for the course credit (zápočet)

  • The first requirement is enrolment in the group: Wednesday 14:00, SU2 (limit of 15 students).
  • The second requirement is attendance at the labs, with at most 3 absences. A student who exceeds 3 absences will receive a special homework assignment.
  • The third requirement is completing the small homework assignments.
  • Do not underestimate practical preparation; it is essential for passing the written exam!

For more information see: http://is.cuni.cz/eng/studium/predmety/index.php?do=predmet&kod=NSWI095

Lab 1: 2011-02-23

  • Try it out:
    • UNIX on your own computer, by booting a Live CD, e.g. Ubuntu Linux
    • You can also log in to one of the u-plX machines in the lab, e.g. u-pl0.ms.mff.cuni.cz, using PuTTY on Windows or ssh on Linux
  • Introduction to UNIX:
    • There are many UNIX flavours
    • What a shell is; there are many different shells (sh, ash, bash ...) with various features (key bindings, history ...)
    • Useful shortcuts in some shells (e.g. bash):
      • CTRL+C = terminates the running process
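      • CTRL+Z = suspends the running process (fg continues it in the foreground, bg continues it in the background)
      • CTRL+D = end of input (EOF): a program reading its STDIN from the terminal sees end-of-file; at an empty shell prompt it exits the shell (see below what a file descriptor is)
      • CTRL+L = clears the screen
      • CTRL+S = suspends printing to the terminal, CTRL+Q resumes printing
      • CTRL+R = searches the command history backwards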
    • Through shell you can run commands (small programs like ls or large applications like firefox)
    • When executing commands from the shell, you are actually starting processes. By default, a program runs as a "foreground process" (the shell waits until the process terminates)
    • Background processes can be started directly from the shell using "&", e.g. xterm &. In that case, the process runs in parallel with the shell and you can immediately continue executing other commands.
    • Processes usually communicate through file descriptors = imagine channels numbered 0,1,2 ...
    • File descriptor 0 = STDIN = Standard Input = by default the shell connects it to the terminal input (your keyboard)
    • File descriptor 1 = STDOUT = Standard Output = by default the shell connects it to the terminal output, that's why you can actually see some output :-)
    • File descriptor 2 = STDERR = Standard Error = for error messages, also connected to the terminal by default
    • File descriptors can also represent files
    • You can use "<" to redirect content of a file to STDIN e.g. grep root < /etc/passwd
    • You can use ">" to redirect STDOUT to a file e.g. cat /etc/passwd > mytmpfile
    • You can combine (chain) multiple processes to perform complex tasks using "pipes", e.g. cat /etc/passwd | grep qmail | grep 200 | head -n 3
    • When piping, multiple processes actually run at the same time. The operating system synchronises the communicating processes: a process reading from its STDIN waits until the previous process writes something to its STDOUT, and a writer waits when the pipe is full until the reader consumes the data.
    • You can also send signals to a running process:
      • kill -9 1234 - kills the process with process ID (PID) 1234 even if it is not responding (last resort)
      • kill -l - prints a list of all available signals
      • ps aux - shows a list of all running processes with their PIDs and USERs
  • Useful apps:
    • mc - Midnight Commander (available for some UNIXes, especially Linux) - to ease your day-to-day work
    • top or htop - interactively shows CPU and memory usage statistics and list of processes
    • vi or vim - text editor (Some love it, some hate it. You have to know about it anyway.)
    • When running a graphical desktop in the lab, e.g. KDE or GNOME, you can run one of the terminal emulators (a shell in a window) like konsole or xterm. A useful shortcut in KDE is ALT+F2, a fast application launcher.
  • Commands mentioned today:
    • ls, mkdir, cp, pwd, echo, :, fg, bg, cd, cat, head, tail, less, more, kill, touch

Tasks:

  • Print content of /etc/passwd
  • Print first 10 lines from /etc/passwd and write it to a file called "p"
  • Append lines 5..10 from /etc/passwd to "p"
  • Print content of "p" with line numbers
  • Copy "p" to "p2"
  • Create a new directory "d"
  • Copy "p" and "p2" into "d"
  • Create an empty file "e"
  • Clear the content of "p2"
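One possible solution to the tasks above uses only the commands from today's list, plus tee. It is a sketch wrapped in a hypothetical function lab1_tasks that works inside a scratch directory passed as its argument, so p, p2, d and e are not created in your home directory. The function body is a subshell, so the cd does not affect your interactive shell.

```shell
#!/bin/sh
# One possible solution to the Lab 1 tasks; lab1_tasks is a hypothetical name.
# The body is a subshell ( ... ), so "cd" does not change the caller's directory.
lab1_tasks() (
  cd "$1" || exit 1
  cat /etc/passwd                              # print /etc/passwd
  head -n 10 /etc/passwd | tee p               # print the first 10 lines and save them as "p"
  head -n 10 /etc/passwd | tail -n +5 >> p     # append lines 5..10
  cat -n p                                     # print "p" with line numbers
  cp p p2                                      # copy "p" to "p2"
  mkdir d                                      # new directory "d"
  cp p p2 d                                    # copy both files into "d"
  touch e                                      # empty file "e"
  : > p2                                       # truncate "p2" to zero length
)
```

For example, lab1_tasks "$(mktemp -d)" runs everything in a fresh temporary directory. Note that "p2" is copied into "d" before it is cleared, following the order of the tasks.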

Lab 2: 2011-03-02 (by Pasky)

  • vim (vimtutor), simpler alternatives mcedit, joe, nano
  • wc, `wc -l`, `wc -c` vs `wc -m`; `cat file | wc -l` vs `wc -l file`
  • basic directory structure, hidden files, glob basics (*, ?), ls -d, ls -a, ls -l, `ls` vs `ls | cat`
  • mail: sending mail, `mail` vs `cat | mail`, reading mail and ~/.forward
  • escaping and quoting: \, ''
  • ssh: basic usage, -X, remote commands, ssh in pipeline (ssh cat for copying files from/to local computer), mentioned: scp, rsync
  • multi-user features: who, id, finger, write, talk, wall
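Two points from this lab are easy to check for yourself: how quoting changes the number of arguments a command receives, and why wc -l file and cat file | wc -l print different things. The sketch below uses hypothetical function names. printf '[%s]\n' prints each argument it receives in brackets on its own line, so you can see exactly how the shell split the words.

```shell
#!/bin/sh
# quoting_demo: "a b c" reaches printf as ONE argument in the first three
# calls and as three separate arguments in the last one.
quoting_demo() {
  printf '[%s]\n' a\ b\ c     # backslash escapes each space
  printf '[%s]\n' 'a b c'     # single quotes: nothing inside is special
  printf '[%s]\n' "a b c"     # double quotes: $VAR and `cmd` would still expand
  printf '[%s]\n' a b c       # unquoted: the shell splits it into three words
}

# wc_demo FILE: wc prints the file name only when it opened the file itself
wc_demo() {
  printf 'one\ntwo\n' > "$1"
  wc -l "$1"          # the count followed by the file name
  cat "$1" | wc -l    # only the count: wc just reads an anonymous pipe
}
```

quoting_demo prints [a b c] three times and then [a], [b] and [c].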

Lab 3: 2011-03-09

  • chmod, ln, df, paste -s -d, cut -d -f
  • How to write a shell script
    • chmod +x script.sh
    • #!/bin/sh
  • variables in shell $VAR, $1
  • A shell script that creates the shell scripts given on the command line and runs them
  • Create a shell script for downloading files through ssh (without using scp or sftp tools)
  • Modify the script above to upload files through ssh
  • extract usernames from /etc/passwd, extract uids from /etc/passwd, create a file in format 'uid_username'
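The following is a possible sketch of the tasks above. download and upload rely on ssh passing its standard input and output through to the remote command, so a plain remote cat is enough. The host and file names are parameters you supply, e.g. u-pl0.ms.mff.cuni.cz. uid_username combines cut and paste from this lab's command list. All three function names are hypothetical.

```shell
#!/bin/sh
# download HOST REMOTE_FILE LOCAL_FILE: the remote cat writes to ssh's stdout,
# which is redirected into the local file
download() {
  ssh "$1" cat "$2" > "$3"
}

# upload LOCAL_FILE HOST REMOTE_FILE: the local file becomes the remote cat's stdin
# (the remote shell parses the command string, hence the inner quotes)
upload() {
  ssh "$2" "cat > '$3'" < "$1"
}

# uid_username PASSWD_FILE: print "uid_username" for every account
uid_username() {
  users=$(mktemp) uids=$(mktemp)
  cut -d: -f1 "$1" > "$users"     # field 1: user name
  cut -d: -f3 "$1" > "$uids"      # field 3: numeric uid
  paste -d_ "$uids" "$users"      # glue the columns back together with "_"
  rm -f "$users" "$uids"
}
```

For example, uid_username /etc/passwd > uids.txt creates the requested file. The upload quoting only protects simple file names; names containing a single quote would need more care.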

Lab 4: 2011-03-16

  • tempfile
  • find -name -type -exec
  • xargs -n1 -I{}
  • sort -k(key) -n(numbers) -r(reverse) -u(unique) -t(delimiter)
  • uniq -c(count occurrences) -d(only duplicates) -i(ignore case)
  • join -1 -2 -v -j -t -o 1.1 ...
    • INNER JOIN: join
    • LEFT OUTER JOIN: join -a 1
    • RIGHT OUTER JOIN: join -a 2
    • INVERTED JOIN (only unpaired lines): join -v 1 or join -v 2
    • remember that both files must be sorted on the join field before they can be joined. Usually, you will use sort -tDELIMITER -kCOLUMN,COLUMN
  • Create a script using join, sort, cut:
    • the script first creates a file containing /etc/passwd sorted by gid (column #4)
    • then a file containing /etc/group sorted by gid (column #3)
    • now the script should find a group for every user (using an INNER JOIN on the gid column) and create a file containing only groupname:username
    • there is a file /afs/ms.mff.cuni.cz/u/s/simkv0bm/table.txt containing username:description on every line
    • the script should find the description for users listed in the table.txt and should print groupname:username:description
    #!/bin/sh
    # temporary files (tempfile is Debian-specific; mktemp works elsewhere)
    # do not name a variable PWD: the shell uses it for the current directory
    PW=`tempfile` GRP=`tempfile` PG=`tempfile` STAB=`tempfile`
    # create files sorted by GID
    sort -t: -k4,4 /etc/passwd > "$PW"
    sort -t: -k3,3 /etc/group > "$GRP"
    # join on the GID column, print only groupname:username, sort by username
    join -t: -1 4 -2 3 -o 2.1,1.1 "$PW" "$GRP" | sort -t: -k2,2 > "$PG"
    # use an OUTER JOIN to add the description for some users
    sort -t: -k1,1 table.txt > "$STAB"
    join -t: -a 1 -1 2 -2 1 -o 1.1,1.2,2.2 "$PG" "$STAB"
    # cleanup
    rm "$PW" "$GRP" "$PG" "$STAB"
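The four join variants listed above differ only in which unpaired lines they keep. The hypothetical function below builds two tiny tables, already sorted on the key in column 1, and runs each variant on them so the difference is visible at a glance.

```shell
#!/bin/sh
# join_demo DIR: create two small sorted tables in DIR and join them four ways
join_demo() (
  cd "$1" || exit 1
  printf 'a:1\nb:2\nc:3\n' > left
  printf 'b:x\nc:y\nd:z\n' > right
  echo INNER; join -t: left right          # only keys present in both files
  echo LEFT;  join -t: -a 1 left right     # ... plus unpaired lines of file 1
  echo RIGHT; join -t: -a 2 left right     # ... plus unpaired lines of file 2
  echo ONLY1; join -t: -v 1 left right     # only the unpaired lines of file 1
)
```

The inner join prints b:2:x and c:3:y; the left join adds a:1, the right join adds d:z, and join -v 1 prints just a:1.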

Lab 5: 2011-03-23

  • Introduction to regular expressions (see man pcresyntax, man grep, POSIX character classes)

grep

  • grep -E -F -P = different kinds of patterns: extended regex, fixed strings, Perl-compatible regex (plain grep uses basic regex, where grouping and intervals must be written as \( \) \{ \})
  • grep -e '...' -e '...' = multiple patterns
  • grep -i -v = case-insensitive matching, inverted matching
  • grep -l -L -m 1 = just a list of files with/without a match; stop reading a file after m (here 1) matching lines
  • grep -n -c = prefix each line with its line number; print only the number of matching lines per file
  • grep -r --include --exclude = search recursively and include/exclude files matching a GLOB pattern
 From the file "rfc793.txt", count how many lines contain the word
 "packet" (exactly in this form, without the quotes).
 The result should be 15. The RFC can be downloaded from the web:
 http://www.rfc-editor.org/rfc/rfc793.txt

 In the same file, count the lines that contain the word
 "packet" OR the word "network", exactly in these forms.
 The result is 48.

 Count the lines of rfc793 that contain strings of digits that are
 at least 3 long and in which all the digits are the SAME. So "222"
 is a correct string, "2223" too, but "22445566" is not.
 The result is 6.
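One way to attack these exercises with grep is shown below, written as hypothetical functions that take the path to rfc793.txt. -w makes "packet" match only as a whole word (not inside "packets"), -c counts matching lines, and the last function uses a basic-regex back-reference \1 to require three copies of the same digit. Whether these exact options reproduce the expected counts 15, 48 and 6 has not been checked here.

```shell
#!/bin/sh
# count_packet FILE: lines containing the word "packet"
count_packet() { grep -cw 'packet' "$1"; }

# count_packet_network FILE: lines containing "packet" or "network" as words
count_packet_network() { grep -cwE 'packet|network' "$1"; }

# count_same_digits FILE: lines with one digit repeated at least 3 times in a row
# (\([0-9]\) captures a digit, \1\1 requires two more copies of it)
count_same_digits() { grep -c '\([0-9]\)\1\1' "$1"; }
```

For example, count_packet rfc793.txt after downloading the RFC.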
  • Create a shell script that processes a list of e-mail addresses.
    • Your script should expect some parameters. The first parameter is a command, the rest are optional parameters for that command.
    • Your script will take STDIN as input and pick out the valid e-mail addresses.
    • Example of an input text: bla bla myaddr@host.domain bla bla myotheraddr@host.domain bla
    • For every valid e-mail address, the given command with its optional parameters will be executed with the e-mail address as the last parameter.
    • Example:
      $ printf 'bla\na@ab.ab\nbla\nbb@bb.bb\n' | yourscript.sh echo "FOUND:"
      FOUND: a@ab.ab
      FOUND: bb@bb.bb

Solution:

  #!/bin/sh
  # one word per line, keep the e-mail-like words, run "$@ ADDRESS" for each
  tr -s ' \t' '\n\n' | grep -E '^[-[:alnum:]_]+(\.[-[:alnum:]_]+)*@([-[:alnum:]]+\.)+[a-z]{2,6}$' |
    xargs -n1 "$@"
  • Find the Java files under /usr/share/doc that contain the string "test". Exclude files whose names end with "Test.java".
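Two possible solutions follow. GNU grep can do the recursion and the name filtering itself with the -r, --include and --exclude options listed above; alternatively, find selects the files and runs grep -l on them. The function names are hypothetical and the directory is a parameter (use /usr/share/doc for the task).

```shell
#!/bin/sh
# find_java_tests DIR: *.java files under DIR containing "test", except *Test.java.
# --exclude comes last on purpose: for a name matching both patterns,
# GNU grep lets the last matching option win.
find_java_tests() {
  grep -rl --include='*.java' --exclude='*Test.java' 'test' "$1"
}

# the same with find selecting the files and grep -l checking their content
find_java_tests_find() {
  find "$1" -type f -name '*.java' ! -name '*Test.java' -exec grep -l 'test' {} +
}
```

For example, find_java_tests /usr/share/doc. The match is case-sensitive; add -i to grep if "Test" should count too.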

Lab 6: 2011-03-30

  • sed
    • t, s, !, d, p
 delete all lines containing the string "bash" (file /etc/passwd)
 sed -e '/bash/d' /etc/passwd

 find another way to solve the previous example, using sed's
 "-n" option.
 Hint: look up "!" in the manual
 sed -ne '/bash/!p' /etc/passwd

 delete empty lines
 sed '/^$/d' /etc/passwd

 before each line (i.e. on a separate line), print its line number
 sed -e '=' /etc/passwd

 using sed and nothing else, count the lines of /etc/passwd
 sed -n '$=' /etc/passwd

 print the last line
 Hint: select the last line and delete all the other, unselected lines,
 again see "!"
 sed '$!d' /etc/passwd

 number the lines of /etc/passwd so that the number is followed by ":",
 then a tab and then the original line (\t in the replacement is a GNU sed extension)
 sed = /etc/passwd | sed 'N;s/\n/:\t/'

 collect the lines containing the string "nologin" in the hold space and
 print the hold space when you encounter the string "tcsh". Nothing other
 than the hold space should be printed.
 Hint: sed functions "H" and "g"
 sed -e '/nologin/H; /tcsh/!d; /tcsh/g' /etc/passwd

 if a line ends with a backslash, append the next line to it
 sed -e ':a;/\\$/N; s/\\\n//; ta'

 reverse order of lines (emulates "tac")
 sed '1!G;h;$!d'               # method 1
 sed -n '1!G;h;$p'             # method 2
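Several of the one-liners above also work as filters on standard input. Wrapping them in small functions with hypothetical names makes them easy to try on a few lines of your own text before running them on /etc/passwd.

```shell
#!/bin/sh
# Some of the sed one-liners above as stdin filters (function names are hypothetical).
drop_bash()    { sed '/bash/d'; }        # delete lines containing "bash"
keep_no_bash() { sed -n '/bash/!p'; }    # the same result using -n and "!"
last_line()    { sed '$!d'; }            # delete every line except the last
count_lines()  { sed -n '$='; }          # print the number of the last line
reverse_lines() { sed '1!G;h;$!d'; }     # emulate tac via the hold space

# join lines ending with a backslash to the following line
join_backslash() { sed -e ':a' -e '/\\$/N; s/\\\n//; ta'; }
```

For example, count_lines < /etc/passwd prints the same number as wc -l < /etc/passwd. The label :a is given as a separate -e expression, which is the portable way to terminate a label.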
  • Download the file unixintro-example-actions.txt, then:
    • print all texts from the actions
    • print the actions with minor priority
    • print all users (closer, who, creator)
    • print the actions that are older than a specific date

Lab 7: 2011-04-06

  • useful shell constructs:
    • for, while, if, seq
    • shell functions
    • $$, kill, trap, sleep, read
 Write a script that handles the signals SIGHUP and SIGINT (by printing a message to stdout).
 When started, your script writes its process ID to stdout.
 Your script should wait in an infinite loop.
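A minimal sketch of the signal task, written as a hypothetical function signal_demo. You can paste its body into a script file as well. trap installs a handler command for each signal. SIGTERM is deliberately left untrapped so that a plain kill still stops the script.

```shell
#!/bin/sh
# signal_demo (hypothetical name): report SIGHUP and SIGINT instead of dying,
# print our PID and wait forever. Stop it with "kill PID" (SIGTERM).
signal_demo() {
  trap 'echo "caught SIGHUP"' HUP
  trap 'echo "caught SIGINT"' INT
  echo "PID: $$"
  while :; do
    sleep 1     # a trap runs once the current sleep returns
  done
}
```

Start it in one terminal, then use kill -HUP PID or kill -INT PID from another one, or press CTRL+C in the first.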
  • flock - advisory locking
 Write a script that tries to acquire an exclusive lock on a lock file,
 writes its PID and sleeps for 1 second, then releases the lock
 and repeats the process in an infinite loop.
 Run 3 instances of the same script and observe the mutual exclusion.

 # a useful construct adapted from man flock:
 # the lock is held only inside the subshell; when the subshell exits,
 # file descriptor 200 (the lock file) is closed and the lock is released,
 # so the script cannot keep the lock indefinitely
 (
   flock -x 200   # exclusive lock; man flock's example uses -s (shared)
   # ... your critical section here ...
 ) 200>yourlockfile
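The lock-file task can be sketched around this construct as follows. It assumes util-linux flock(1) is installed. lock_loop is a hypothetical name, and the optional round count is an addition not in the assignment, there only so the loop can also be run a fixed number of times. Printing "enter" and "leave" makes the mutual exclusion visible: with the lock working, the two words always alternate.

```shell
#!/bin/sh
# lock_loop LOCKFILE [COUNT]: take an exclusive lock, print our PID, sleep 1 s,
# release the lock, repeat. COUNT (hypothetical extra) limits the rounds; 0 = forever.
lock_loop() {
  count=${2:-0} i=0
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    (
      flock -x 9 || exit 1    # blocks while another process holds the lock
      echo "enter $$"
      sleep 1
      echo "leave $$"
    ) 9>"$1"                  # fd 9 closes when the subshell ends: lock released
    i=$((i + 1))
  done
}
```

Run, for example, lock_loop /tmp/mylock & three times and watch the output: no "enter" ever appears between another process's "enter" and "leave".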

Lab 8: 2011-04-13

Overview:

  • Create a set of cooperating scripts: scanner, worker and query
  • The goal is to automatically parse files containing actions (see unixintro-example-actions.txt) and to fill a database that can then be queried manually
  • You can use sed, join, sort, flock, ls, tempfile, rm

Locks:

  • Your application will use two lock files: dir and querylock
  • The workers and scanner will share the dir as an input queue and also as a file lock.
  • The workers and query will share the querylock for mutual exclusion when working with two database tables - actionmap and creatormap

Scripts:

  • scanner will try to acquire the dir lock
    • After successfully acquiring the lock
      • It repeatedly scans the directory (sleeping 1s between scans).
      • If a file appears in the directory, the scanner releases the lock and repeats the whole procedure.
  • worker will also try to acquire the dir lock.
    • After successfully acquiring the lock:
      • If there are no files in the directory, it releases the lock, sleeps for 1s and repeats the whole procedure.
      • If there are some files in the directory, the script removes the first file (moving it to a private location) and releases the dir lock so that other processes can continue.
    • Then it tries to acquire the querylock.
    • After acquiring the querylock, it extracts the uid (unique identifier of an action), creator and actiontext from the input file and updates both database tables mentioned earlier:
      • the table actionmap contains two columns - uid:actiontext
      • the table creatormap also contains two columns - uid:creator
    • Both tables must always contain unique records sorted by the uid column (see man join).
    • After updating the database, the querylock is released.
  • query is a script run by the user.
    • accepts a single parameter creator
    • when executed, it tries to acquire the querylock and then list all actions (the column actiontext) that belong to the given creator

How to test the whole setup:

  • Run the scanner and 3 instances of worker as background processes (logging messages to the same console).
  • Copy several files containing actions to the dir directory.
    • The files should also contain duplicate actions, to test whether the database always contains unique records.
  • Observe how the files disappear from dir and how the database files are filled with the extracted records.
  • Use the query script to list actions from a given creator
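The database side of the setup can be sketched as follows, assuming the tables actionmap and creatormap and the lock file querylock all live in the current directory. db_add and query are hypothetical helper names; the scanner, the worker's dir handling and the parsing of action files are not shown. Merging with sort -u on the uid key keeps both tables unique and sorted, which is exactly what join needs. LC_ALL=C makes sort and join agree on the ordering.

```shell
#!/bin/sh
# db_add UID CREATOR ACTIONTEXT: merge one record into both tables while
# holding querylock; sort -u on the uid key drops duplicate records.
db_add() {
  (
    flock -x 9 || exit 1
    touch actionmap creatormap
    printf '%s:%s\n' "$1" "$3" | LC_ALL=C sort -t: -u -k1,1 -o actionmap - actionmap
    printf '%s:%s\n' "$1" "$2" | LC_ALL=C sort -t: -u -k1,1 -o creatormap - creatormap
  ) 9>querylock
}

# query CREATOR: print the actiontext of every action created by CREATOR.
# Assumes the stored values contain no ':' characters.
query() {
  (
    flock -x 9 || exit 1
    awk -F: -v c="$1" '$2 == c' creatormap |
      LC_ALL=C join -t: -o 2.2 - actionmap
  ) 9>querylock
}
```

A worker would call db_add once per extracted action; the user runs query with a creator name. Filtering creatormap with awk keeps it sorted by uid, so its output can be joined with actionmap directly.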

Lab 9: 2011-04-20

 time - runs programs and summarizes system resource usage;
        you might want to use /usr/bin/time instead of the built-in time
        command

In this assignment, you should get hands-on experience with different techniques for iterating over collections. You will measure the performance of each approach to get a clear picture of its efficiency. Create several scripts that list 4000 files from the /usr/share/doc/ directory, each file name prefixed with "FILE: ". The scripts differ as follows:

  • script #1: while iteration, shell variable counter, echo printer
  • script #2: while iteration, head counter, echo printer
  • script #3: while iteration, head counter, /bin/echo printer
  • script #4: find -printf 'FILE: %p\n' iteration, head counter
  • script #5: find + xargs iteration, head counter, echo printer
  • script #6: use find, sed and head
  • Now write the measurement script that:
    • executes all the previous scripts N-times,
    • computes the average of sys+user time and
    • sorts the scripts accordingly.
  • Notes:
    • "head counter" means that you should limit the number of lines using the head command.
    • "shell variable counter" means that you should count with a shell variable, using eval and [ (test).
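Two of the six variants are sketched below as hypothetical functions, with the directory and the count as parameters; the assignment itself uses /usr/share/doc/ and 4000. Variant #1 counts with $((...)) arithmetic rather than eval, which departs from the assignment's wording. Variant #4 relies on -printf, a GNU find extension.

```shell
#!/bin/sh
# list1 DIR N (script #1): while loop, shell-variable counter, built-in echo
list1() {
  find "$1" -type f | {
    i=0
    # test the counter first so no extra line is read once N is reached
    while [ "$i" -lt "$2" ] && IFS= read -r f; do
      echo "FILE: $f"
      i=$((i + 1))
    done
  }
}

# list4 DIR N (script #4): GNU find formats the lines, head does the counting
list4() {
  find "$1" -type f -printf 'FILE: %p\n' | head -n "$2"
}
```

/usr/bin/time can only time external programs, so for the measurement script save each variant as its own executable file. With GNU time, /usr/bin/time -f '%U %S' ./script1.sh > /dev/null prints the user and system times on stderr, which you can then sum and average over the N runs.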

Lab 10: 2011-04-27

The file emp.data (the 1st column is the employee's name, the 2nd is the hourly wage, the 3rd is the number of hours worked; the file is taken from the classic book "Aho, Kernighan, Weinberger: The AWK Programming Language")

  (a1) basic information, history, structure of an AWK program, patterns/actions,
  data types, BEGIN/END, fields in input lines, built-in variables,
  parsing of the program, ...

  (a2) print the file emp.data
  ## awk '{ print }' emp.data

  (a3) print the names and total pay of the employees who worked at least
  one hour. As a further example, print instead only the names of the
  employees who did not work.
  ## awk '$3 > 0 { print $1, $2 * $3 }' emp.data
  ## awk '$3 == 0 { print $1 }' emp.data


  (a4) for each employee, print the number of words on the corresponding line
  (here the numbers will all be the same). An example of the built-in variable "NF".
  ## awk '{ print $1, NF }' emp.data

  (a5) print the line number before each line
  ## awk '{ print NR, $0 }' emp.data

  (a6) for each employee, print a line in this format:
  total pay for Katy is 40
  ## awk '{ print "total pay for", $1, "is", $2 * $3 }' emp.data

  (a7) print the employees who earned more than $50, together with their
  pay
  ## awk '$2 * $3 > 50 { print $1, $2 * $3 }' emp.data

  (a8) print the lines of emp.data, but before the first line print this
  header followed by an empty line:
  NAME    RATE    HOURS
  ## awk 'BEGIN { print "NAME\tRATE\tHOURS"; print "" }; {print}' emp.data

  (a9) print just one line containing the names of all employees; this
  is an example of string concatenation.
  ## awk '{ employees = employees $1 " " } END { print employees }' emp.data

  (a10) print the last line of emp.data. An example showing that you should not
  rely on "$0" (the variable holding the whole line) keeping its value in the END rule.
  ## awk '{ line = $0 }; END { print line }' emp.data

  (a11) print the average pay of the employees who earn more than $6 per
  hour. Then try the same example for those who earn more than $4 per
  hour. Use an "if" statement to guard against division by zero.
  ## awk '$2 > 6 { ++n; sum += $2 * $3 }
  ##      END    { if (n == 0)
  ##                 print "no such employee"
  ##               else
  ##                 print sum / n
  ##             }' emp.data
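To check the programs above by hand, it helps to run them on input small enough to compute in your head. The three lines below are hypothetical sample data in the emp.data format (name, hourly rate, hours), not the book's file; the function names are hypothetical too. The rate threshold for (a11) is passed in with awk -v.

```shell
#!/bin/sh
# emp_sample: hypothetical data in the emp.data format (name rate hours)
emp_sample() { printf 'Alice 4.00 10\nBob 7.00 0\nCarol 8.00 6\n'; }

# (a3) name and total pay of everyone who worked
worked() { awk '$3 > 0 { print $1, $2 * $3 }'; }

# (a3) names of those who did not work
idle() { awk '$3 == 0 { print $1 }'; }

# (a11) average pay of employees whose rate is above $1, guarded against n == 0
avg_pay() {
  awk -v min="$1" '$2 > min { ++n; sum += $2 * $3 }
                   END { if (n == 0) print "no such employee"; else print sum / n }'
}
```

For example, emp_sample | avg_pay 6 prints 24: Bob (rate 7, pay 0) and Carol (rate 8, pay 48) qualify, and 48 / 2 = 24. With emp_sample | avg_pay 8 nobody qualifies, so the if branch prints "no such employee".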

  #---------------------------------------------------------------------------
  # AWK 2
  #---------------------------------------------------------------------------

  (a12) print UID/GID from /etc/passwd with the character '_' between them.
  Use "FS" for reading and "next" to skip comments.
  ## awk 'BEGIN { FS=":" }; { if (/^#/) next; print $3 "_" $4 }' /etc/passwd

  (a13) print the lines of /etc/passwd that contain the word "Jan"
  ## awk '/Jan/' /etc/passwd

  (a14) reverse the order of the lines of a file
  ## awk '{ lines[NR]=$0}; END { for (i=NR; i > 0; --i) print lines[i] }'

  (a15) print the lines of /etc/passwd that lie between the lines containing
  the words Friedel and Pechanec.
  ## awk '/Friedel/, /Pechanec/' /etc/passwd

  (a16) using split(,,), process /etc/passwd so that you print the login and
  the name of each user
  ## awk '/^[^#]/ { split($0, ln, ":"); print ln[1], ln[5] }' /etc/passwd

  (a17) for each line of a file, print each of its fields on a separate line.
  Use "while".
  ## awk '{ i=1; while (i <= NF) { print $i; ++i} }'

Lab 11: 2011-05-04

Lab 12: 2011-05-11

  • finishing previous tasks