CPS 444/544 Lecture notes: Filters



Coverage: [UPE] Chapter 4, §4.2 (pp. 106-108)


tr (transliterate)

  • only reads from standard input
  • syntax: tr <string1> <string2>
  • converts characters in <string1> to those, respectively, in <string2>
  • example: tr A-Z a-z < myfile (converts upper case to lower case)
  • options:
    • tr -d (delete character(s) in <string1>)
    • tr -c (act on complement of <string1>)
    • tr -s (squeeze strings of repeated characters)
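
The options above can be tried on a one-line input. This is a minimal sketch; the sample text and the variable name text are made up for illustration.

```shell
# Sample input; the text itself is made up for illustration.
text='Hello,   World!  Hello again.'

# Translate upper case to lower case.
printf '%s\n' "$text" | tr 'A-Z' 'a-z'

# -d: delete every comma, exclamation mark, and period.
printf '%s\n' "$text" | tr -d ',!.'

# -s: squeeze each run of spaces down to a single space.
printf '%s\n' "$text" | tr -s ' '

# -c with -s: replace every run of non-letters with one newline,
# which puts each word on its own line.
printf '%s\n' "$text" | tr -cs 'A-Za-z' '\n'
```

The last form is a common first stage in word-counting pipelines, since later filters such as sort and uniq work line by line.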


sort

  • can be fine-tuned to sort columns in a variety of ways
  • sort -n (numeric-sort: compare according to string numerical value)
  • sort -g (general-numeric-sort: compare according to general numerical value)
  • sort -r (reverse sort: reverse the result of comparisons)
  • sort -rn (reverse numeric-sort)
  • sort -d (dictionary order: consider only blanks and alphanumeric characters)
  • sort -b (ignore leading blanks)
  • sort -f (ignore-case: fold lower case to upper case characters)
  • sort -k2 (sort on a key that starts at column 2; use -k2,2 to restrict the key to column 2 alone)
  • sort -t":" -k2 (same, using colon-delimited columns)
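
The difference between string and numeric comparison on a column is easiest to see on a small input. The sketch below assumes three made-up colon-delimited records (name:score:department); the variable name records is hypothetical.

```shell
# Made-up colon-delimited records: name:score:department
records='carol:9:ops
alice:10:dev
bob:100:dev'

# Default: compare whole lines as strings.
printf '%s\n' "$records" | sort

# Sort on field 2 only (-k2,2), compared as strings: "10" < "100" < "9".
printf '%s\n' "$records" | sort -t: -k2,2

# Same key, compared numerically: 9 < 10 < 100.
printf '%s\n' "$records" | sort -t: -k2,2n

# Reverse numeric order on field 2.
printf '%s\n' "$records" | sort -t: -k2,2nr
```

Writing the key as -k2,2 rather than -k2 keeps the comparison from running on into field 3.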


uniq

  • purges duplicate consecutive lines (must be adjacent)
  • fast (linear time)
  • options:
    • uniq -d (only prints the lines which are repeated)
    • uniq -u (only prints the lines which are not repeated)
    • uniq -c (count)
  • example input stream:
    hello
    hi
    hi
    hello
    
    exercise: give the output of each of the following command lines on the above input stream:
    uniq
    uniq -u
    uniq -d
    uniq -c
    
  • to purge all duplicates, first sort and then apply uniq, e.g., sort names | uniq, which is equivalent to sort -u names
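
The effect of sorting before uniq can be checked on a short stream. This is a sketch on made-up input, deliberately different from the exercise above; the variable name lines is hypothetical.

```shell
# Made-up input: "a" and "b" both repeat, but not always on adjacent lines.
lines='b
a
a
b
a
c'

# uniq alone collapses only the adjacent pair of "a" lines.
printf '%s\n' "$lines" | uniq

# Sorting first makes all duplicates adjacent, so each line survives once.
printf '%s\n' "$lines" | sort | uniq

# Equivalent shorthand.
printf '%s\n' "$lines" | sort -u

# Count occurrences, most frequent first.
printf '%s\n' "$lines" | sort | uniq -c | sort -rn
```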


Spellers

  • spell
  • ispell (interactive spell)
  • aspell
  • add the following line to your .vimrc to invoke aspell on the current file in vim using <ctrl-t>:
    map ^T <CR>:!aspell --dont-backup check %<CR>:e! %<CR>


Pipeline of filters

    (recall the UNIX model of computation;
    the communication mechanism is set up for free by the shell)
    $ spell uist2003.tex | sort | uniq
    $ spell uist2003.tex | sort | uniq | wc -l
    $ spell uist2003.tex | sort -u
    $ spell uist2003.tex | sort -u | wc -l
    $ detex 20100115/20100115.tex | nroff
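
The pipelines above depend on spell and on specific files. The sketch below shows the same style of composition using only tr, sort, uniq, wc, and head on a made-up sentence, so it can be run anywhere; the variable name doc is hypothetical.

```shell
# Made-up text standing in for a document.
doc='The cat saw the dog. The dog saw the cat and the bird.'

# Word frequencies, most frequent first:
#   tr -cs   put each word on its own line
#   tr       fold upper case to lower case
#   sort     bring identical words together
#   uniq -c  count each distinct word
#   sort -rn order by count, largest first
#   head     keep the top three
printf '%s\n' "$doc" |
  tr -cs 'A-Za-z' '\n' |
  tr 'A-Z' 'a-z' |
  sort |
  uniq -c |
  sort -rn |
  head -n 3

# Number of distinct words.
printf '%s\n' "$doc" | tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' | sort -u | wc -l
```

Each stage reads standard input and writes standard output, so stages can be added, removed, or reordered without touching the others.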
    


cut and paste

  • extract or merge fields or columns from lines
  • $ who | cut -d" " -f1 | paste - -
  • join (relational database operator)
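
A small sketch of cut, paste, and join working together. The two files, their contents, and the scratch directory are made up for illustration; join assumes both inputs are sorted on the join field, which holds here.

```shell
# Scratch directory; file names and contents are made up.
dir=$(mktemp -d)
printf '1:alice\n2:bob\n3:carol\n' > "$dir/names"   # id:name
printf '1:dev\n2:ops\n3:dev\n'     > "$dir/depts"   # id:department

# cut: extract field 2 of each colon-delimited line.
cut -d: -f2 "$dir/names"

# paste: glue corresponding lines of the two files together
# (separated by a tab by default).
paste "$dir/names" "$dir/depts"

# paste - -: read standard input and lay it out two input lines per row.
cut -d: -f2 "$dir/names" | paste - -

# join: match lines of the two files on field 1.
join -t: "$dir/names" "$dir/depts"
```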


File comparison utilities

  • comm
    • syntax: comm <file1> <file2>
    • meaningful if <file1> and <file2> are sorted
    • merges 2 files and prints each line in one of 3 columns
      1. line(s) only in <file1>
      2. line(s) only in <file2>
      3. line(s) in both <file1> and <file2>
    • an apple
                  cat        both ideas
                  dog
      elephants
      
    • options: which columns to suppress

  • cmp

  • diff (find and output differences between two files or two directories)
    $ diff file1 file2
    $ diff dir1 dir2
    
  • sdiff
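
The comparison utilities can be contrasted on two short sorted files. This is a sketch under assumed inputs; the file names a and b and their contents are made up.

```shell
# Scratch directory with two sorted files; names and contents are made up.
dir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$dir/a"
printf 'banana\ncherry\ndate\n'  > "$dir/b"

# Three-column output: only in a, only in b, in both.
comm "$dir/a" "$dir/b"

# -12 suppresses columns 1 and 2, leaving the lines common to both files.
comm -12 "$dir/a" "$dir/b"

# -23 leaves the lines found only in a; -13 leaves those found only in b.
comm -23 "$dir/a" "$dir/b"
comm -13 "$dir/a" "$dir/b"

# cmp -s prints nothing; its exit status says whether the files are identical.
if cmp -s "$dir/a" "$dir/b"; then echo same; else echo different; fi

# diff lists the changes that turn a into b; it exits non-zero when the files differ.
diff "$dir/a" "$dir/b" || echo 'files differ'
```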


Printing utilities

  • script
  • lpr
  • lpd
  • lpq
  • a2ps (ascii to postscript)
  • enscript
  • nenscript
  • ghostview
  • gv
  • ggv
  • xpdf
  • acroread
  • ps2pdf
  • pdf2ps
  • latex
  • dvips
  • troff
  • nroff
  • expand (converts tabs to spaces)
  • unexpand
  • iconv
  • indent (a pretty printer)
    $ cat .indent.pro # resource file for indent
    -br -nce -cdw -npcs -ncs -bs -brs -brf -i3
    
  • dos2unix
  • unix2dos
  • xfig
  • ppds
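
As a quick illustration of one entry in the list above, expand can be tried on a made-up line containing a single tab:

```shell
# Default tab stops are every 8 columns, so the tab becomes 7 spaces here.
printf 'a\tb\n' | expand

# -t 4 sets tab stops every 4 columns, so the tab becomes 3 spaces.
printf 'a\tb\n' | expand -t 4
```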


References

    [UPE] B.W. Kernighan and R. Pike. The UNIX Programming Environment. Prentice Hall, Upper Saddle River, NJ, 1984.
