Text copied from http://www.geocities.com/mparker762/clawk
This package implements nearly all the features provided by the Unix AWK language, albeit with a pretty lisp-y flavor. In addition, it provides a large set of macros for doing more complicated processing beyond that provided by AWK.
To give you an idea of the flavor of CLAWK programming, here's an example from Ch 1 of The Awk Programming Language that calculates the number of employees, total pay for all employees, and average pay.
{ pay = pay + $2 * $3 }
END { print NR, "employees"
print "total pay is", pay
print "average pay is", pay/NR
}
The book gives the text file "emp.data" as example input.
Beth 4.00 0 Dan 3.75 0 Kathy 4.00 10 Mark 5.00 20 Mary 5.50 22 Suzie 4.25 18
When the awk program (called say, SAMPLE.AWK) is executed
with the command
$ awk -f sample.awk
emp.data
it produces the following output:
6 employees total pay is 337.5 average pay is 56.25
Here's a straightforward translation of SAMPLE.AWK using the CLAWK package:
(defawk sample (&aux (pay 0))
(t (incf pay ($* $2 $3)))
(END ($print *NR* "employees")
($print "total pay is" pay)
($print "average pay is " (/ pay *NR*))) )
When this function is executed with (sample "emp.data"), it produces the same output:
6 employees total pay is 337.5 average pay is 56.25 NIL
(the NIL is the return value from the SAMPLE function).
That's nearly as concise as the original AWK. Here's a version that doesn't use the DEFAWK macro, and tries to be a bit more Lispish.
(defun sample (filename &aux (pay 0))
(do-file-fields (filename (name payrate hrsworked))
(declare (ignore name))
(incf pay (* (num payrate) (num hrsworked))))
(format t "~%~D employees" *NR*)
(format t "~%total pay is ~F" pay)
(format t "~%average pay is ~F" (/ pay *NR*)) )
... which prints out the same thing as above.
Here's another example from The AWK Programming Language that shows off the regular expressions a little better. This program performs some basic validity checks on a Unix password file.
BEGIN {
FS = ";" }
NF != 7 {
printf("line %d, does not have 7 fields: %s\n", NR, $0)}
$1 ~ /[^A-Za-z0-9]/ {
printf("line %d, nonalphanumeric user id: %s\n", NR, $0)}
$2 == "" {
printf("line %d, no password: %s\n", NR, $0)}
$3 ~ /[^0-9]/ {
printf("line %d, nonnumeric user id: %s\n", NR, $0)}
$4 ~ /[^0-9]/ {
printf("line %d, nonnumeric group id: %s\n", NR, $0)}
$6 !~ /^\// {
printf("line %d, invalid login directory: %s\n", NR, $0)}
And this is the same program using CLAWK:
(defawk checkpass ()
(BEGIN
(setq *FS* ";"))
((/= *NF* 7)
(format t "~%line ~D, does not have 7 fields: ~S" *NR* $0))
((~ $1 #/[^A-Za-z0-9]/)
(format t "~%line ~D, nonalphanumeric user id: ~S" *NR* $0))
(($== $2 "")
(format t "~%line ~D, no password: ~S" *NR* $0))
((~ $3 #/[^0-9]/)
(format t "~%line ~D, nonnumeric user id: ~S" *NR* $0))
((~ $4 #/[^0-9]/)
(format t "~%line ~D, nonnumeric group id: ~S" *NR* $0))
((!~ $6 #/^\//)
(format t "~%line ~D, invalid login directory: ~S" *NR* $0)) )
Which you must admit is awfully close to the original AWK. Since this uses the #/ readmacro you must evaluate (clawk:install-regex-syntax) before evaluating this sample.
The INSTALL-REGEX-SYNTAX function sets up a readmacro on #\/ that parses to an unescaped #\/, and interns the resulting string in the current package. This (a) gives a nice AWK-ish syntax (b) results in an object that can be printed readably and interacts well in the listener even without Dynamic Windows, and (c) gives CLAWK a convenient place to hook the compiled matcher and some associated information, speeding things up nicely. You can still specify these things using the |...| syntax, but it's not quite as nice, since #\| is a common character in regex patterns.
The INSTALL-CMD-SYNTAX function sets up a readmacro on #` that parses to an unescaped ` character, and expands into a call to CLAWK::CALL-SYSTEM-CATCHING-OUTPUT. This function sends the command to the shell and returns an input stream containing the output of the command. This stream can then be used by CLAWK:FOR-STREAM-LINES or any of the CL stream functions. The readmacro is defined for all non-Symbolics systems, although currently only LispWorks is supported by CALL-SYSTEM-CATCHING-OUTPUT.
The various special variables *FS*, *NR*, *NF*, *RSTART*, *RLENGTH*, etc are fairly obvious if you've ever used AWK, as are the SUB, GSUB, SPLIT, INDEX, MATCH, SUBSTR, and ~ and !~ functions. $SUB, $GSUB, $SPLIT, $INDEX, $MATCH, and $SUBSTR do the same, but they attempt to coerce the various parameters to the right type (so you can pass in a string for a parameter that expects a number).
WITH-PATTERNS is a macro that cleans up the syntax for using dynamically-generated regex patterns. A future extension is for this to use the fast regex compiler instead of the closure-based regex compiler when the compiler is available and the pattern is a string literal.
WITH-FIELDS is a macro for doing line-splitting into variables. WITH-FIELDS takes a optional field list (in destructuring-bind syntax) and an optional string (default is the current line -- see DO-LINES) and field-separator (default is *FS*). If you don't give a set of variables to use, it will use the $ variables. It splits the line using SPLIT, and executes the body forms in an environment containing the variables. The $n variables are locally bound special, so they will revert on exit.
WITH-SUBMATCHES is a macro for processing register submatches. It takes a list of variables and a set of body forms, and binds the variables to the strings corresponding to the register submatches (or nil if the register didn't match), and evaluates the body forms in this environment. This is particularly handy for use in the consequent clauses of match-case.
MATCH-CASE is similar to CASE except that the value must be a string, and the cases are strings or regex symbols. Handy for simulating the implicit outer match structure of an AWK program, without being limited to just the toplevel. I tend to find that my serious AWK programs pretty quickly devolve into a BEGIN clause that calls a bunch of functions to do the real work, and this macro (along with the next few) really make this cleaner; allowing me to reuse that nifty auto-looping / splitting / matching mechanism wherever I need it.
MATCH-WHEN takes a set of forms similar to the toplevel CLAWK forms (with BEGIN, END, and other pattern clauses), but unlike the toplevel it does no looping. It executes all the BEGIN clauses first, then conditionally executes each of the pattern clauses, then executes all the END clauses. It winds up looking a lot like COND, except that it recognizes several special flavors of test expression, and is basically equivalent to
(progn (when test1 . consequent1)
(when test2 . consequent2)
...)
As with real AWK, the default action is to print the current line.
DO-FILE-LINES is a macro that takes a filename and a set of body forms. It opens the file in read mode, and loops over the lines in the file binding the line variable to the current line and executing the body forms. It doesn't do any line splitting, though it maintain the *NR*, *FNR*, and $0 variables. Executing (NEXT) within the body will restart the body at the next input line.
DO-STREAM-LINES is a closely related version that takes a stream (or t for the standard input stream).
DO-FILE-FIELDS is a macro that takes a filename, an optional field-list, and a set of body forms. It opens the file in read mode, and loops over the lines in the file splitting them into the specified field variables (or the $n variables if no field list was given), and executes the body forms in this new environment. Because of the splitting overhead, it's not as snappy as FOR-FILE-LINES. Executing (NEXT) within the body will restart the processing at the next input line.
DO-STREAM-FIELDS is a closely related version that takes a stream (or t for the standard input stream).
WHEN-STREAM-FIELDS and WHEN-FILE-FIELDS implement most of the CLAWK toplevel as a reusable special form. It takes an input stream (or file) and an optional field list, and a set of MATCH-WHEN clauses. The BEGIN clauses will be executed before the file is opened, the pattern clauses will be executed as with MATCH-WHEN, and the END clauses will executed after the file is closed, and the body. One limitation is it doesn't currently support range patterns (it's pretty simple to add, I just haven't gotten around to it).
DEFAWK is the highest-level macro, and one that closely mimics the awk toplevel. It takes a name, a parameter list consisting of &key and &aux parameters, and a set of MATCH-WHEN clauses. It binds the parameter list to the variable ARGS, and evaluates the BEGIN clauses. The BEGIN clauses are free to modify ARGS. Then it loops through the (possibly modified) ARGS, executing the pattern clauses for each stream and pathname in the list. It handles docstrings correctly.
The handling of the parameter list is perhaps the oddest thing about the DEFAWK macro. It is intended to closely match the behavior of a command-line AWK. A pleasant difference is that keyword parameters are automatically parsed out, instead of requiring you to do it manually like in real AWK.
The $+ , $-, $*, $/, $REM, $EXPT, $++, #=, $<, $>, $<=, $>=, $/=, $MIN, $MAX, $ZEROP, $LENGTH, $PRINT functions can take numbers and strings and do the AWK thing, coercing the arguments back and forth as necessary. $++ is the concatenation operator (in AWK this is indicated by space, which clearly doesn't work in the Lisp world). $LENGTH is overloaded on associative arrays (i.e. hashtables) to return the HASH-TABLE-COUNT.
The generic functions INT, STR, and NUM handle the common coercions. AWK does provide an INT function, but string coercions are done by concatenating the value with the empty string, and numeric coercions are done by adding 0 to the value. While these kludges also work in CLAWK, the NUM and STR functions are to be preferred.
The $ARRAY, $AREF, $FOR $IN and $DELETE functions (and macros) implement AWK-like associative arrays (including the amusing SUBSEP behavior for multiple indices), except that unlike AWK there's nothing to keep you from nesting these "arrays".
$0, $1, ..., $20 are symbol-macros that access the fields of the current line. $#0, $#1, ..., $#20 do also, but they attempt to interpret the value as a number.
%0, %1, ..., %20 are symbol-macros that index into the recent submatches. %#0, %#1, ..., %#20 do also, but they attempt to interpret the submatch value as a number.
This system also defines a clawk-user package.