CPS 444/544 Lecture notes: sed (stream editor)



Coverage: [UPE] Chapter 4, §4.3 (pp. 108-114)


ex (line editor)

  • vi comes close to a full programming language because its : commands are ex commands
  • a masterpiece in user-interface software design
  • approaches to studying: memorize commands or learn/know general syntax
  • general syntax of ex commands: :[address]command[options]
  • deleting all blank lines:
    • :g/^$/d
    • grep -v '^$' (prints only the non-blank lines rather than editing the file)
  • example addresses
    • 10,20 (lines 10 thru 20)
    • .,100 (current line thru line 100)
    • .,$ (current line thru last line of file)
    • % = 1,$
  • :set list (displays each TAB as ^I and each end of line as $)
  • :set nolist
  • search and replace: :%s/RE/replacement_text/g (same as 1,$s/RE/replacement_text/g); examples:
    • :%s/Alice/Lucy/g (the g makes it global, i.e., replace all occurrences, not just the first, on each line)

    • :%s/hello/& world/g (& represents the matched text)

    • :%s/[TAB]/   /g (replaces TABs with 3 consecutive spaces on every line)

    • :%s/[ TAB][ TAB]*$// (purges trailing whitespace on every line)

    • :%s/fprintf/FPRINTF/g (replaces all occurrences of fprintf with FPRINTF)

    • :.,$s/fprintf/FPRINTF/g (replaces occurrences of fprintf from the current line (.) to the last line of the file ($) with FPRINTF)

    • :10,20s/fprintf/FPRINTF/g (replaces occurrences of fprintf from line 10 to 20 with FPRINTF)

    • :%s/^\([A-Z][a-z-]*\)[,][ ]\([A-Z][a-z-]*\)$/\2 \1/ (converts names from <last>, <first> format to <first> <last> format)

    • :%s/^\([[:alpha:]]*\)[ ]\([[:alpha:]]*\)$/\2, \1/ (undoes the previous transformation, provided the names contain only letters; [[:alpha:]] does not match the hyphens allowed above)

  • move text: :100,200m. (moves lines 100 thru 200 to just after the current line)
  • another example: :10,20w newfile (extracts lines 10 thru 20 and writes them to newfile)
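The two name-conversion commands above use the same RE syntax as sed, so the round trip can be checked non-interactively from the shell. A minimal sketch, assuming a POSIX shell and sed; the sample names and the variable names are illustrative only:

```shell
#!/bin/sh
# Hypothetical sample data in <last>, <first> format.
names='Perugini, Saverio
Yao, Zhongmei'

# Same RE and replacement as the :%s command converting to <first> <last>.
swapped=$(printf '%s\n' "$names" |
  sed 's/^\([A-Z][a-z-]*\)[,][ ]\([A-Z][a-z-]*\)$/\2 \1/')

# Same RE and replacement as the command that undoes the transformation.
restored=$(printf '%s\n' "$swapped" |
  sed 's/^\([[:alpha:]]*\)[ ]\([[:alpha:]]*\)$/\2, \1/')

printf '%s\n' "$swapped" "---" "$restored"
```

Running the undo step on the swapped output reproduces the original list, which is a quick way to confirm that two substitutions are inverses on a given input.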


Essential sed

  • (non-interactive) stream editor

  • beginnings of a complete command language
  • execution model for each line in the input stream:
    1. read input line into pattern space,
    2. apply commands to pattern space,
    3. send pattern space to stdout.



  • similar syntax to ex
  • basic syntax: <condition><action>
  • detailed syntax: [<address>[,<address>]][!]<command>[<arguments>]



  • conditions and actions
    • conditions: /RE/   m,n   $   <condition>!   <condition1>,<condition2>!
    • actions: d   p   q   s/RE/string/   w <filename>   i   a

  • invoking sed
    • sed '<edit commands>' <file(s)>
    • cat <file(s)> | sed '<edit commands>'
    • sed -f <edit commands file> <file(s)>

  • -e option
    • each -e supplies one editing command expression, and it may be repeated, e.g., sed -e 's/a/b/' -e 's/c/d/' file
    • sed -e '<address>{ ... }' file (... represents more than one editing command on separate lines; the braces group them so that the address applies to all of them)
    • if the curly braces ({ }) are omitted, give each editing command its own, possibly distinct, address



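  • the -n option suppresses the automatic output of step 3 above, so only lines printed explicitly with the p action appear in the output
  • without the -n option, the pattern space is printed automatically at the end of every cycle (unless a command such as d cuts the cycle short), so a p action is effectively assumed
  • two examples which produce the same output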
    • one with -n: sed -n '/one/p' file
    • one without -n: sed '/one/!d' file
  • sed -n '/<RE>/p' file = grep <RE> file

  • sed is Turing complete
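The execution model and the -n equivalences above can be observed on a small input. A minimal sketch, assuming a POSIX shell, sed, and grep; the input text and the variable names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical input; each line goes through the read/apply/print cycle on its own.
input='one fish
two fish
red fish
one more'

# Default cycle: s edits each line (first match only), then every line is printed.
subst=$(printf '%s\n' "$input" | sed 's/fish/FISH/')

# Three ways to keep only the lines matching "one".
with_n=$(printf '%s\n' "$input" | sed -n '/one/p')
without_n=$(printf '%s\n' "$input" | sed '/one/!d')
with_grep=$(printf '%s\n' "$input" | grep 'one')

printf '%s\n' "$subst" "---" "$with_n"
```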


Some representative examples

  • sed 's/[TAB]/   /g' main.c # converts every TAB to three consecutive spaces on every line (will changes take effect in the file main.c?)
  • sed 's/[ TAB][ TAB]*$//' main.c # purges trailing whitespace from each line
  • sed 's/index1/index2/g' main.c # replaces every occurrence of the string index1 with the string index2
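  • sed -n '20,30p' file # prints only lines 20 through 30
  • sed '1,10d' file # deletes lines 1 through 10
  • sed '$d' file # deletes the last line
  • du -a | sed 's/.*[TAB]//' # keeps only the file names (ref. [UPE] p. 109)
  • sed 's/^\([A-Z][a-z-]*\)[,][ ]\([A-Z][a-z-]*\)$/\2 \1/' file # converts <last>, <first> to <first> <last>
  • sed '10,20w newfile' file # writes lines 10 through 20 to newfile (every line is still printed)
  • sed '1,/^$/d' file # deletes from line 1 through the next blank line
  • sed -n '/^$/,/^end/p' file # prints from each blank line through the next line beginning with end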
  • sed 's/^/[TAB]/' file (ref. [UPE] p. 109)
  • sed '/./s/^/[TAB]/' file (ref. [UPE] p. 110)
  • sed '/^$/!s/^/[TAB]/' file (! inverts the condition) (ref. [UPE] p. 110)
  • deleting line(s) which contain the strings one or two
    sed '/one/d
         /two/d' file
    
  • put the editing commands above in a file commands.sed and invoke: sed -f commands.sed <file(s)>
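The same two-command edit can be supplied from a file with -f or as separate -e expressions, and braces let one address govern several commands. A minimal sketch, assuming a POSIX shell and sed; commands.sed is the file name used above, while the input and the variable names are hypothetical:

```shell
#!/bin/sh
input='zero
one
two
three'

# The two-line script from above, stored in a file and run with -f.
cat > commands.sed <<'EOF'
/one/d
/two/d
EOF
via_file=$(printf '%s\n' "$input" | sed -f commands.sed)

# The same edit given as two -e expressions, each with its own address.
via_e=$(printf '%s\n' "$input" | sed -e '/one/d' -e '/two/d')

# Braces: the single address 2,3 applies to both commands in the group.
grouped=$(printf '%s\n' "$input" | sed -e '2,3{
s/^/> /
s/$/ </
}')

rm -f commands.sed
printf '%s\n' "$via_file" "---" "$via_e" "---" "$grouped"
```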


More examples

For the remainder of these notes, consider the following file named faculty.details:
    Name: Mehdi Zargham Office: 139 Anderson Hall Course: ASI 150
    Name: Raghava Gowda Office: 142 Anderson Hall Course: CPS 310
    Name: James P. Buckley Office: 146 Anderson Hall Course: CPS 430/530
    Name: Dale Courte Office: 144 Anderson Hall Course: CPS 387
    Name: Saverio Perugini Office: 145 Anderson Hall Course: CPS 444/544
    Name: Zhongmei Yao Office: 150 Anderson Hall Course: CPS 341
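To follow along, the sample file can be recreated with a here-document. A minimal sketch, assuming a POSIX shell and sed; the final command applies = (print line number) to the last line ($) as a quick line count:

```shell
#!/bin/sh
# Recreate faculty.details exactly as listed above.
cat > faculty.details <<'EOF'
Name: Mehdi Zargham Office: 139 Anderson Hall Course: ASI 150
Name: Raghava Gowda Office: 142 Anderson Hall Course: CPS 310
Name: James P. Buckley Office: 146 Anderson Hall Course: CPS 430/530
Name: Dale Courte Office: 144 Anderson Hall Course: CPS 387
Name: Saverio Perugini Office: 145 Anderson Hall Course: CPS 444/544
Name: Zhongmei Yao Office: 150 Anderson Hall Course: CPS 341
EOF

# $ addresses the last line and = prints its number, i.e., the line count.
sed -n '$=' faculty.details   # prints 6
```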
    
Examples:
    sed -n '/CPS/p' faculty.details # same as grep CPS faculty.details
    
    sed '/CPS/!d' faculty.details # same as above
    
    sed -n '/[/]/p' faculty.details # prints lines with a cross-listed course; same as sed -n '/\//p' faculty.details or grep '/' faculty.details
    
    sed '/\//d' faculty.details # prints lines containing a non-cross-listed course; same as grep -v '/' faculty.details
    
    sed 's/^Name:[ ]//' faculty.details # removes "Name: " from the start of each output line (faculty.details itself is unchanged)
    
    sed 's/^Name:[ ]//' faculty.details | sed 's/Office:[ ]//' # removes "Name: " & "Office: " from faculty.details
    
    # how can we purge all attribute labels (i.e., "Name: ", "Office: ", "Course: ")? multiple ways:
    
    sed 's/[A-Za-z][A-Za-z]*: //g' faculty.details
    
    sed 's/[A-Za-z]+: //g' faculty.details # will not work: sed uses basic regular expressions (BREs), in which + is an ordinary character; use \{1,\} or extended REs (sed -E) instead
    
    sed 's/[A-Za-z]\{1,\}: //g' faculty.details
    
    sed 's/^Name:[ ]//' faculty.details | sed 's/Office:[ ]//' | sed 's/Course:[ ]//' # purges all attribute labels
    
    sed 's/^Name:[ ]//;
         s/Office:[ ]//;
         s/Course:[ ]//' faculty.details
    
    cat sedfile
    s/^Name:[ ]//
    s/Office:[ ]//
    s/Course:[ ]//
    
    sed -f sedfile faculty.details
    
    sed 's/^Name:[ ]\(.*\)Office:[ ]\(.*\)Course:[ ]\(.*\)$/\1\2\3/' faculty.details
    
    sed 's/[A-Za-z][A-Za-z]*:[ ]//g' faculty.details
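The label-stripping commands above can be checked against each other. A minimal sketch, assuming a POSIX shell and a sed that accepts -E for extended REs (GNU and BSD sed both do); two lines of faculty.details are inlined so it runs on its own, and the helper name strip is hypothetical:

```shell
#!/bin/sh
# Two lines copied from faculty.details.
data='Name: James P. Buckley Office: 146 Anderson Hall Course: CPS 430/530
Name: Zhongmei Yao Office: 150 Anderson Hall Course: CPS 341'

# Hypothetical helper: run sed with the given arguments over $data.
strip() { printf '%s\n' "$data" | sed "$@"; }

a=$(strip 's/[A-Za-z][A-Za-z]*: //g')
b=$(strip 's/[A-Za-z]\{1,\}: //g')
c=$(strip 's/^Name:[ ]//; s/Office:[ ]//; s/Course:[ ]//')
d=$(strip 's/^Name:[ ]\(.*\)Office:[ ]\(.*\)Course:[ ]\(.*\)$/\1\2\3/')
e=$(strip -E 's/[A-Za-z]+: //g')   # ERE: + means "one or more"
f=$(strip 's/[A-Za-z]+: //g')      # BRE: + is literal, so nothing matches

printf '%s\n' "$a"
```

The first five variants agree, while the BRE version with + leaves the input untouched.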
    


d for delete

  • delete lines from the output stream, not original file

  • examples:
    • sed 'd' faculty.details reads one line at a time into a buffer (work space) and deletes it; since d also starts the next cycle without printing, the output is empty
    • sed '1d' faculty.details reads one line at a time into the buffer, deletes it if it is line 1, and otherwise prints it (in this case, all lines except line 1 are output)
    • sed '$d' faculty.details does the same, but for the last line
    • sed '2,4d' faculty.details deletes lines from 2 up to and including line 4
    • sed '/Yao/,/ran/d' faculty.details deletes lines starting from one which matches Yao up to and including the next one which matches ran; if no later line matches ran, the range extends to the end of the input
    • sed '/Yao/,/ran/!d' faculty.details negates the address (i.e., keeps the lines in the range and deletes all others)
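The addressing forms of d can be compared on a shortened copy of the data. A minimal sketch, assuming a POSIX shell and sed; the input keeps only the surnames from faculty.details, and the helper name del is hypothetical:

```shell
#!/bin/sh
# Surnames from faculty.details, one per line, in the same order.
data='Zargham
Gowda
Buckley
Courte
Perugini
Yao'

# Hypothetical helper: apply one sed script to $data.
del() { printf '%s\n' "$data" | sed "$1"; }

r1=$(del '2,4d')               # line-number range
r2=$(del '/Gowda/,/Courte/d')  # RE range covering the same lines
r3=$(del '/Courte/,/ran/d')    # no later line matches ran: range runs to the end
r4=$(del '/Courte/,/ran/!d')   # negated: keep only the lines in that range

printf '%s\n' "$r1" "---" "$r3" "---" "$r4"
```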


p for print

  • print lines from the buffer

  • examples:
    • sed 'p' faculty.details reads in one line at a time into the buffer and prints each. Notice that by default sed prints what is in the buffer. Therefore, you will get two copies of each line.
    • in sed -n 'p' faculty.details, the -n suppresses the default print action of sed. Therefore, this is the equivalent of doing a cat.
    • we can use the same addresses as before (e.g., sed -n '4,6p' faculty.details prints lines 4 through 6).
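The effect of p with and without -n is easy to see on numbered lines. A minimal sketch, assuming a POSIX shell and sed; the six placeholder lines stand in for faculty.details:

```shell
#!/bin/sh
# Six placeholder lines, "line 1" through "line 6" (printf reuses its format).
data=$(printf 'line %s\n' 1 2 3 4 5 6)

twice=$(printf '%s\n' "$data" | sed 'p')         # explicit p plus automatic print
once=$(printf '%s\n' "$data" | sed -n 'p')       # behaves like cat
middle=$(printf '%s\n' "$data" | sed -n '4,6p')  # lines 4 through 6 only

printf '%s\n' "$twice" | sed -n '$='   # counts the output lines: 12
printf '%s\n' "$middle"
```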


More sed jargon

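  • = prints the current line number on a line by itself
  • a appends text to the output after the current line (the text is written at the end of the cycle, not added to the buffer); use it as a\ followed by a newline and the text you want to append
  • b with no label branches to the end of the script (i.e., skips the remaining commands for the current line)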
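Each of these three commands can be tried on a three-line input. A minimal sketch, assuming a POSIX shell and sed; the input and the variable names are hypothetical:

```shell
#!/bin/sh
data='alpha
beta
gamma'

# = writes the line number on its own line, ahead of the automatic print.
numbered=$(printf '%s\n' "$data" | sed '=')

# a\ queues the text; it is written after the current line when the cycle ends.
appended=$(printf '%s\n' "$data" | sed '/beta/a\
--- after beta ---')

# b with no label jumps to the end of the script, so beta skips the s command.
branched=$(printf '%s\n' "$data" | sed -e '/beta/b' -e 's/$/!/')

printf '%s\n' "$numbered" "---" "$appended" "---" "$branched"
```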


Exercises

Write sed commands/scripts to do the following:
  • delete all blank lines in the file: sed '/^$/d' faculty.details
  • print the lines pertaining to faculty who have offices in Anderson Hall: sed -n '/Anderson Hall/p' faculty.details
  • find the line numbers of the lines describing faculty who teach non-cross-listed undergraduate courses: sed -n '/\//!=' faculty.details
  • You are told that Perugini is an assistant professor and all other faculty are associate professors. Print each professor's rank on a separate line after the given line, in the form Rank: <rank>.
      /Perugini/ {
         a\
         Rank: Assistant Professor
         p
         b
      }
      {
         a\
         Rank: Associate Professor
         p
      }
      
    Put the editing commands above in a file rank.f and invoke it as: sed -n -f rank.f faculty.details. Note that the b command is important: the second block has no address, so it applies to every line, and without the b Perugini's line would fall through to it; he would then be printed twice and also listed as an associate professor. Also note that you can append multiple lines of text; each line except the last must end with a \ (observe the \ after the a command). The braces { and } must be where they are (i.e., the { must end the first line and the } must be on a line by itself).

  • print the lines in the format <name>:<office>:<course> (i.e., strip the labels Name:, Office:, and Course:): sed 's/Name: \(.*\) Office: \(.*\) Course: \(.*\)/\1:\2:\3/' faculty.details
  • print the lines in the format <course>:<office>:<name>: sed 's/Name: \(.*\) Office: \(.*\) Course: \(.*\)/\3:\2:\1/' faculty.details
  • break down every entry onto three lines: sed 's/Name: \(.*\) Office: \(.*\) Course: \(.*\)/\1\n\2\n\3/' faculty.details (\n in the replacement is a GNU sed extension; POSIX sed requires a backslash followed by an actual newline instead)
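The rank script and the other exercise solutions can be run on a two-line sample. A minimal sketch, assuming a POSIX shell and sed; faculty.sample is a hypothetical file holding two lines of faculty.details. The a\ text lines start in column 1 here because sed implementations may differ in whether they keep leading blanks in appended text, and the non-cross-listed query uses the \/ form of the RE:

```shell
#!/bin/sh
# Hypothetical two-line sample taken from faculty.details.
cat > faculty.sample <<'EOF'
Name: Saverio Perugini Office: 145 Anderson Hall Course: CPS 444/544
Name: Zhongmei Yao Office: 150 Anderson Hall Course: CPS 341
EOF

# rank.f as in the exercise, with the appended text starting in column 1.
cat > rank.f <<'EOF'
/Perugini/ {
a\
Rank: Assistant Professor
p
b
}
{
a\
Rank: Associate Professor
p
}
EOF
ranks=$(sed -n -f rank.f faculty.sample)

# <course>:<office>:<name> via the back-references \1, \2, \3.
reordered=$(sed 's/Name: \(.*\) Office: \(.*\) Course: \(.*\)/\3:\2:\1/' faculty.sample)

# Line numbers of lines without a '/', i.e., non-cross-listed courses.
noncross=$(sed -n '/\//!=' faculty.sample)

rm -f rank.f faculty.sample
printf '%s\n' "$ranks" "---" "$reordered" "---" "$noncross"
```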


A tale of two buffers

Normally, sed reads one line at a time into its main buffer (sometimes called the pattern buffer, or pattern space). There is another buffer (called the hold buffer, or hold space) available for use. Some commands to work with this buffer include:
  • h copies the contents of the main buffer into the hold buffer, thus overwriting whatever it was that was already in the hold buffer
  • g copies the contents of the hold buffer into the main buffer, overwriting it
  • H does the same as h, except that it appends: a newline and then the contents of the main buffer are added after the last line in the hold buffer
  • G does the same as g, again in the "append" sense: a newline and then the contents of the hold buffer are added after the last line in the main buffer
  • x exchanges the contents of the two buffers; what was in the hold buffer is now in the pattern buffer, and vice versa
  • N reads in an additional line and appends it to the contents of the pattern buffer; in between the original line and the newly added line, N will insert a newline (\n) character; useful for reading in multiple lines at a time (see flip example below)
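These buffer commands combine into some classic one-liners. A minimal sketch, assuming a POSIX shell and sed; the input and the variable names are hypothetical. The first command reverses the lines, the second collects every line with H and brings the collection back with x, and the third joins pairs of lines with N, using $! so that N is never attempted on the last line:

```shell
#!/bin/sh
data='first
second
third'

# 1!G: append the hold buffer (earlier lines) to every line but the first;
# h: save the result; $p: print the accumulated, reversed text at the end.
reversed=$(printf '%s\n' "$data" | sed -n '1!G;h;$p')

# H: append a newline and the line to the hold buffer; at the last line,
# x swaps it into the pattern buffer, where the newlines become commas.
joined=$(printf '%s\n' "$data" | sed -n 'H;${x;s/\n/,/g;s/^,//;p;}')

# $!N: read the next line unless this is the last; join the pair with " + ".
paired=$(printf '%s\n' "$data" | sed '$!N;s/\n/ + /')

printf '%s\n' "$reversed" "---" "$joined" "---" "$paired"
```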


More exercises

Write sed commands/scripts to (put the solutions in a separate file, and invoke using sed -n -f option):
  • Suppose the department is moving. Move faculty in Anderson Hall to the Science Center, and move those in the Science Center to Miriam Hall. Let faculty keep their old office numbers because they believe their numbers are lucky.
      s/Science Center/Miriam Hall/
      s/Anderson Hall/Science Center/
      p
      
    Notice that you have to do the transformations in this order, else everybody gets assigned to Miriam Hall! This example shows that sed reads in one line at a time, applies all the commands sequentially, then picks the next line, and so on. This is in contrast to reading all lines at once, applying the first command, then reading all again, applying the second command, and so on.

  • Pretty print the file so that each line is preceded by a line describing what it is about (e.g., "The next line is about: Zhongmei Yao").
      {
         h  # hold buffer now contains a copy of the current line
         s/Name: \(.*\) Office: .* Course: .*/The next line is about: \1/
         G # appends hold buffer to pattern buffer
         p
      }
      
  • Completely capitalize the names of faculty.
      {
          h # save the current line in hold buffer
          s/Name: \(.*\) Office: .* Course: .*/\1/
          y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
          G # current buffer contains a capital name, newline, old line
          s/\(.*\)\nName: \(.*\) Office: \(.*\)/Name: \1 Office: \3/
          p
      }
      
  • Flip alternate lines
      $p
      {
         N # read the next line, we now have two lines
         s/\(.*\)\n\(.*\)/\2\n\1/ # flip the two lines
         p # print it
      }
      
  • Delete all the blank lines.
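      /^$/ {
         d
      }
      p

    No b is needed after the d: d ends the cycle immediately, so any later command in the block would never run.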
      
  • Replace multiple blank lines wherever they occur with just one blank line.
      /^$/ {
         N
         /^\n$/D
      }
      p
      
    Notice that this uses a new command, namely D. Like d, D deletes the contents of the pattern (main) buffer; however, while d deletes the entire buffer, D deletes only up to and including the first embedded newline and then restarts the cycle with whatever remains, without reading a new input line.
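Several of these exercises can be checked on small inputs. A minimal sketch, assuming a POSIX shell and sed; the office on line 2 of the first input (210 Science Center) is hypothetical, since no one in faculty.details is in the Science Center. The flip variant uses $!N instead of a separate $p line, and writes the newline in the replacement as a backslash followed by a real newline, which works in POSIX sed as well as GNU sed:

```shell
#!/bin/sh
offices='Office: 139 Anderson Hall
Office: 210 Science Center'

# Moving: the order of the two s commands matters.
right=$(printf '%s\n' "$offices" |
  sed -e 's/Science Center/Miriam Hall/' -e 's/Anderson Hall/Science Center/')
wrong=$(printf '%s\n' "$offices" |
  sed -e 's/Anderson Hall/Science Center/' -e 's/Science Center/Miriam Hall/')

# Capitalize the name, using the hold buffer to keep the original line.
capped=$(printf '%s\n' 'Name: Zhongmei Yao Office: 150 Anderson Hall Course: CPS 341' |
  sed -n 'h
s/Name: \(.*\) Office: .* Course: .*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\(.*\)\nName: \(.*\) Office: \(.*\)/Name: \1 Office: \3/
p')

# Flip each pair of lines; an odd last line is printed as is.
flipped=$(printf '%s\n' 1 2 3 4 5 | sed -n '$!N
s/\(.*\)\n\(.*\)/\2\
\1/
p')

# Squeeze runs of blank lines down to one (the N/D script from above).
squeezed=$(printf 'a\n\n\n\nb\n\nc\n' | sed -n '/^$/{
N
/^\n$/D
}
p')

printf '%s\n' "$right" "---" "$capped" "---" "$flipped" "---" "$squeezed"
```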


References

    [UPE] B.W. Kernighan and R. Pike. The UNIX Programming Environment. Prentice Hall, Upper Saddle River, NJ, Second edition, 1984.
