--- /dev/null
+FARE-CSV
+
+This library allows you to read and write CSV files, according to
+any of the prevailing "standards" and their popular variants.
+
+CSV means "Comma-Separated Values". It's a vastly underspecified "standard":
+each and every implementation seems to behave differently, and sometimes
+even major implementations (e.g. Microsoft Excel) change their behavior
+from one version to the next. Moreover, programs using CSV often explicitly
+allow for variants, whereby another character can be used instead of
+the standard comma U+002C #\, as a separator (typically a tab, U+0009, or
+a semicolon, U+003B #\;), and another character can be used for quoting
+instead of the standard double-quote U+0022 #\" (typically a single-quote,
+U+0027 #\'). Finally, some implementations don't handle quotation properly
+when printing, and different implementations do different things with
+respect to line endings. We try to handle all such sensible variants.
+However, one thing we do not try to do is encoding or decoding complex
+objects, as there is no standard whatsoever that covers this.
+The only standardized type for entries is strings, and
+we parse everything as (properly quoted) strings.
+We print strings by properly quoting them, and we PRINC numbers:
+it is up to you to make sure numbers are printed as you desire,
+or else to pass a string if CL's PRINC doesn't do what you want.
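
Because every entry is read back as a string, any conversion is the caller's job. As a minimal sketch (SKETCH-INTERPRET-ROW is a hypothetical helper, not part of this library, and its conversion rules are only an assumption about what a caller might want), a row could be post-processed like this:

```lisp
;; Hypothetical helper, not part of FARE-CSV: post-process one row of
;; string fields such as those returned by a CSV reader.
(defun sketch-interpret-row (row)
  "Map empty fields to NIL, all-digit fields to integers, and leave
every other field as the string it was read as."
  (mapcar (lambda (field)
            ;; The empty-string test must come first: EVERY is true on
            ;; an empty sequence, and PARSE-INTEGER would then signal.
            (cond ((string= field "") nil)
                  ((every #'digit-char-p field) (parse-integer field))
                  (t field)))
          row))

;; Example call on a hand-written row:
(sketch-interpret-row '("widget" "42" "" "4.5"))
```

The example call returns ("widget" 42 NIL "4.5"): only the all-digit field is converted, and "4.5" stays a string because this sketch deliberately does not guess at float syntax.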
+
+By default, we follow the specification from creativyst,
+which seems to describe popular usage:
+ http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
+
+This document says about the same:
+ http://edoceo.com/utilitas/csv-file-format
+
+There is now an RFC that tries to standardize CSV,
+and we support it as well:
+ http://www.rfc-editor.org/rfc/rfc4180.txt
+
+Finally, here's what Perl hackers think CSV is:
+ http://search.cpan.org/~hmbrand/Text-CSV_XS-0.59/CSV_XS.pm
+
+
+==== Exported Functionality ====
+
+fare-csv defines and uses package FARE-CSV.
+
+function READ-CSV-STREAM (STREAM)
+ Read lines from STREAM in CSV format, using the current syntax parameters.
+ Return a list of lists of strings: one list for each line,
+ containing one string for each field.
+ Entries are read as strings;
+ it is up to you to interpret the strings as whatever you want.
+
+function READ-CSV-LINE (STREAM)
+ Read one line from STREAM in CSV format, using the current syntax parameters.
+ Return a list of strings, one for each field in the line.
+ Entries are read as strings;
+ it is up to you to interpret the strings as whatever you want.
+
+function READ-CSV-FILE (PATHNAME &KEY ELEMENT-TYPE EXTERNAL-FORMAT)
+ Open the file designated by PATHNAME, using the provided keys if any,
+ and call READ-CSV-STREAM on it.
+
+function WRITE-CSV-LINES (LINES STREAM)
+ Given a list of LINES, each of them a list of fields, and a STREAM,
+ format those lines as CSV according to the current syntax parameters.
+
+function WRITE-CSV-LINE (FIELDS STREAM)
+ Format one line of FIELDS to STREAM in CSV format,
+ using the current syntax parameters.
+ Each element of the list FIELDS is formatted as follows:
+ if it's a string, write it,
+ only using quotes if needed for escaping;
+ if it's null, write an empty field;
+ if it's a different symbol, write its name as if a string,
+ only using quotes if needed for escaping;
+ if it's a number, format it as per PRINC.
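
The per-field rules above can be sketched in standalone Common Lisp. This is not the library's code: SKETCH-NEEDS-QUOTING-P and SKETCH-FORMAT-FIELD are hypothetical names, and the condition for when quotes are "needed" (the field contains the separator, the quote character, or a CR or LF) is an assumption, since the text above does not spell it out.

```lisp
;; Standalone sketch; these functions are hypothetical and are not
;; part of FARE-CSV.
(defun sketch-needs-quoting-p (string &key (separator #\,) (quote #\"))
  "Assumed rule: quote when STRING contains the separator, the quote
character, or a CR or LF character."
  (find-if (lambda (c)
             (member c (list separator quote #\Return #\Linefeed)))
           string))

(defun sketch-format-field (field &key (separator #\,) (quote #\"))
  "Return FIELD rendered as the text of one CSV field."
  (etypecase field
    (null "")                          ; NIL becomes an empty field
    (number (princ-to-string field))   ; numbers are printed as by PRINC
    (symbol (sketch-format-field (symbol-name field)
                                 :separator separator :quote quote))
    (string
     (if (sketch-needs-quoting-p field :separator separator :quote quote)
         (with-output-to-string (out)
           (write-char quote out)
           (loop :for c :across field :do
             ;; An embedded quote character is written twice.
             (when (char= c quote) (write-char c out))
             (write-char c out))
           (write-char quote out))
         field))))

;; Example: format a whole row and join it with commas.
(format nil "~{~A~^,~}"
        (mapcar #'sketch-format-field '("plain" "a,b" nil 42)))
```

The NULL clause has to come before the SYMBOL clause because NIL is itself a symbol; with the order reversed, an empty field would be written as the string "NIL".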
+
+Constant +CR+
+ a string with the ASCII character 13 (Carriage Return).
+ It was the standard line termination for text on classic Mac OS (before Mac OS X).
+
+Constant +LF+
+ a string with the ASCII character 10 (Line Feed).
+ It's the standard line termination for text on Unix.
+
+Constant +CRLF+
+ a string with the ASCII characters 13 and 10 (CR, LF).
+ It's the standard line termination for text on Windows and in many RFCs.
+
+Variable *SEPARATOR*
+ The separator to use when reading or writing CSV files.
+ A character. By default, a comma: #\,
+
+Variable *QUOTE*
+ The quote character to use when reading or writing CSV files.
+ A character. By default, a double-quote: #\"
+
+Variable *UNQUOTED-QUOTEQUOTE*
+ A boolean that is true iff a pair of quotes
+ represents a quote outside of quotes.
+ Microsoft and RFC4180 say NIL, csv.3tcl says T.
+ A boolean. By default, NIL.
+
+Variable *LOOSE-QUOTE*
+ A boolean that is true iff quotes may appear anywhere in a field,
+ rather than only around the whole field.
+ By default, NIL.
+
+Variable *ALLOW-BINARY*
+ A boolean that is true iff we accept non-ASCII data.
+ A boolean. By default, T.
+
+Variable *KEEP-META-INFO*
+ A boolean that when true causes the reader functions to return
+ for each entry, instead of a string, a list of a string and a plist;
+ the plist currently only has one property, :quoted, that has a boolean value
+ which is true iff the string included quotes.
+ A boolean. By default, NIL.
+
+Variable *EOL*
+ Line ending to use when writing CSV files.
+ A string. By default, +CRLF+ as specified by creativyst.
+
+Variable *LINE-ENDINGS*
+ A list of line endings accepted when parsing a CSV file.
+ Valid elements of that list are the constants +CRLF+, +LF+ and +CR+.
+ By default, contains all three values, as specified by creativyst.
+
+Variable *SKIP-WHITESPACE*
+ A boolean that when true causes initial and final (unquoted) spaces
+ to be ignored while parsing CSV.
+ A boolean. By default, T as specified by creativyst.
+
+Macro WITH-CREATIVYST-CSV-SYNTAX () &BODY BODY
+ A macro in which to wrap a program BODY, around which
+ all the above parameters will be bound to their default value,
+ as specified by creativyst.
+
+Macro WITH-RFC4180-CSV-SYNTAX () &BODY BODY
+ A macro in which to wrap a program BODY, around which
+ all the above parameters will be bound as per the RFC4180 specification.
+ As compared to creativyst, *EOL* is +LF+,
+ *LINE-ENDINGS* doesn't contain +CR+ but only +CRLF+ and +LF+
+ and *SKIP-WHITESPACE* is NIL.
+
+Macro WITH-STRICT-RFC4180-CSV-SYNTAX () &BODY BODY
+ A macro in which to wrap a program BODY, around which
+ all the above parameters will be bound as per the RFC4180 specification,
+ but with a stricter interpretation:
+ only +CRLF+ is accepted as *LINE-ENDINGS*, and we don't *ALLOW-BINARY* data.
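
Since the syntax parameters are special variables, one of the macros above can be combined with an ordinary LET to override a single parameter for the duration of a call. The following is a minimal sketch, assuming ASDF can find the fare-csv system on your machine; SKETCH-READ-SEMICOLON-LINE is a hypothetical helper, not something the library provides.

```lisp
;; Assumes ASDF is available and can locate the fare-csv system.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :asdf)
  (asdf:load-system :fare-csv))

;; Hypothetical helper: parse one line of semicolon-separated text.
(defun sketch-read-semicolon-line (text)
  (fare-csv:with-rfc4180-csv-syntax ()
    ;; Rebind the separator inside the RFC 4180 bindings; the previous
    ;; value is restored automatically when the LET exits.
    (let ((fare-csv:*separator* #\;))
      (with-input-from-string (in text)
        (fare-csv:read-csv-line in)))))

;; The third field is quoted because it contains the separator.
(sketch-read-semicolon-line "a;b;\"c;d\"")
```

Binding with LET, rather than assigning with SETQ, keeps the change local to this call, so other code that reads CSV with the default parameters is unaffected.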
http://www.cliki.net/fare-csv
LICENSE:
- http://www.geocities.com/SoHo/Cafe/5947/bugroff.html
+ http://tunes.org/legalese/bugroff.html
Also under no-restriction BSD license for those who insist.
DEPENDENCIES:
- apt-get install cl-asdf
+ asdf
USAGE:
- (asdf:load-system :fare-csv) ;; or (asdf:oos 'asdf:load-op :fare-csv) if using an old asdf
- (read-csv-line)
- (read-csv-stream s)
+ (asdf:load-system :fare-csv)
(read-csv-file "foo.csv")
+ (read-csv-stream stream)
+ (read-csv-line stream)
+ (write-csv-lines lines stream)
+ (write-csv-line fields stream)
EXAMPLE USE:
...
; -----------------------------------------------------------------------------
;;; Optimization
-(eval-when (:compile-toplevel)
- (declaim (optimize (speed 3) (safety 1) (debug 3))))
+(eval-when (:compile-toplevel :execute)
+ (declaim (optimize (speed 3) (safety 1) (debug 3))
+ #+sbcl (sb-ext:muffle-conditions sb-ext:compiler-note)))
; -----------------------------------------------------------------------------
;;; Thin compatibility layer
#| ;;; Not needed anymore
(eval-when (:compile-toplevel :load-toplevel :execute)
(unless (fboundp 'parse-number)
- (defun parse-number (s)
+ (defun parse-number (string)
(with-standard-io-syntax ()
(let* ((*read-eval* nil)
(*read-default-float-format* 'double-float)
- (n (read-from-string s)))
- (if (numberp n) n)))))) |#
+ (n (read-from-string string)))
+ (when (numberp n) n)))))) |#
; -----------------------------------------------------------------------------
;;; Parameters
;;#+DEBUG (defparameter *max* 2000)
;;#+DEBUG (defun maxbreak () (when (<= *max* 0) (setf *max* 2000) (break)) (decf *max*))
-(defsubst accept-p (x s)
- (let ((c (peek-char nil s nil nil)))
+(defsubst accept-p (x stream)
+ (let ((c (peek-char nil stream nil nil)))
;;#+DEBUG (format t "~&Current char: ~S~%" c)
;;#+DEBUG (maxbreak)
(etypecase x
((or function symbol) (funcall x c))
(integer (eql x (char-code c))))))
-(defsubst accept (x s)
- (and (accept-p x s)
- (read-char s)))
+(defsubst accept (x stream)
+ (and (accept-p x stream)
+ (read-char stream)))
-(defsubst accept-eof (s)
- (not (peek-char nil s nil nil)))
+(defsubst accept-eof (stream)
+ (not (peek-char nil stream nil nil)))
-(defsubst accept-eol (s)
+(defsubst accept-eol (stream)
(block nil
- (when (and *accept-lf* (accept #\Linefeed s)) (return t))
+ (when (and *accept-lf* (accept #\Linefeed stream)) (return t))
(when (or *accept-crlf* *accept-cr*)
- (when (accept #\Return s)
+ (when (accept #\Return stream)
(when *accept-crlf*
- (if (accept #\Linefeed s)
+ (if (accept #\Linefeed stream)
(return t)
(unless *accept-cr*
(error "Carriage-return without Linefeed!"))))
(return t)))
nil))
-(defsubst accept-space (s)
- (accept #'char-space-p s))
+(defsubst accept-space (stream)
+ (accept #'char-space-p stream))
-(defsubst accept-spaces (s)
- (loop for x = (accept-space s)
- while x
- collect x))
+(defsubst accept-spaces (stream)
+ (loop :for x = (accept-space stream) :while x :collect x))
-(defsubst accept-quote (s)
- (accept *quote* s))
+(defsubst accept-quote (stream)
+ (accept *quote* stream))
-(defsubst accept-separator (s)
- (accept *separator* s))
+(defsubst accept-separator (stream)
+ (accept *separator* stream))
-(defun read-csv-line (s)
- "Read CSV from a line, a list of strings, one string for each field."
+(defun read-csv-line (stream)
+ "Read one line from STREAM in CSV format, using the current syntax parameters.
+ Return a list of strings, one for each field in the line.
+ Entries are read as strings;
+ it is up to you to interpret the strings as whatever you want."
(validate-csv-parameters)
(let ((ss (make-string-output-stream))
(fields '())
;;#+DEBUG (format t "~&do-field~%")
(setf had-quotes nil)
(when *skip-whitespace*
- (accept-spaces s))
+ (accept-spaces stream))
;;#+DEBUG (format t "~&do-field, after spaces~%")
(cond
- ((or (accept-eol s) (accept-eof s))
+ ((or (accept-eol stream) (accept-eof stream))
(done))
(t
(do-field-start))))
(do-field-start ()
;;#+DEBUG (format t "~&do-field-start~%")
(cond
- ((accept-separator s)
+ ((accept-separator stream)
(add "") (do-fields))
- ((accept-quote s)
+ ((accept-quote stream)
(cond
- ((and *unquoted-quotequote* (accept-quote s))
+ ((and *unquoted-quotequote* (accept-quote stream))
(add-char *quote*) (do-field-unquoted))
(t
(do-field-quoted))))
;;#+DEBUG (format t "~&do-field-quoted~%")
(setf had-quotes t)
(cond
- ((accept-eof s)
+ ((accept-eof stream)
(error "unexpected end of stream in quotes"))
- ((accept-quote s)
+ ((accept-quote stream)
(cond
- ((accept-quote s)
+ ((accept-quote stream)
(quoted-field-char *quote*))
(*loose-quote*
(do-field-unquoted))
(add (current-string))
(end-of-field))))
(t
- (quoted-field-char (read-char s)))))
+ (quoted-field-char (read-char stream)))))
(quoted-field-char (c)
;;#+DEBUG (format t "~&quoted-field-char~%")
(add-char c)
(do-field-unquoted ()
;;#+DEBUG (format t "~&do-field-unquoted~%")
(if *skip-whitespace*
- (let ((spaces (accept-spaces s)))
+ (let ((spaces (accept-spaces stream)))
(cond
- ((accept-separator s)
+ ((accept-separator stream)
(add (current-string))
(do-fields))
- ((or (accept-eol s) (accept-eof s))
+ ((or (accept-eol stream) (accept-eof stream))
(add (current-string))
(done))
(t
- (loop for x in spaces do (add-char x))
+ (map () #'add-char spaces)
(do-field-unquoted-no-skip))))
(do-field-unquoted-no-skip)))
(do-field-unquoted-no-skip ()
;;#+DEBUG (format t "~&do-field-unquoted-no-skip~%")
(cond
- ((accept-separator s)
+ ((accept-separator stream)
(add (current-string))
(do-fields))
- ((or (accept-eol s) (accept-eof s))
+ ((or (accept-eol stream) (accept-eof stream))
(add (current-string))
(done))
- ((accept-quote s)
+ ((accept-quote stream)
(cond
- ((and *unquoted-quotequote* (accept-quote s))
+ ((and *unquoted-quotequote* (accept-quote stream))
(add-char *quote*) (do-field-unquoted))
(*loose-quote*
(do-field-quoted))
(t
(error "unexpected quote in middle of field"))))
(t
- (add-char (read-char s))
+ (add-char (read-char stream))
(do-field-unquoted))))
(end-of-field ()
;;#+DEBUG (format t "~&end-of-field~%")
(when *skip-whitespace*
- (accept-spaces s))
+ (accept-spaces stream))
(cond
- ((or (accept-eol s) (accept-eof s))
+ ((or (accept-eol stream) (accept-eof stream))
(done))
- ((accept-separator s)
+ ((accept-separator stream)
(do-fields))
(t
(error "end of field expected"))))
(nreverse fields)))
(do-fields))))
-(defun read-csv-stream (s)
- "Read CSV from a stream, returning a list for each line of a list of strings for each field."
- (loop until (accept-eof s)
- collect (read-csv-line s)))
-
-(defun read-csv-file (pathname)
- "Read CSV from a file, returning a list for each line of a list of strings for each field."
- (with-open-file (s pathname :direction :input :if-does-not-exist :error)
- (read-csv-stream s)))
+(defun read-csv-stream (stream)
+ "Read lines from STREAM in CSV format, using the current syntax parameters.
+ Return a list of lists of strings: one list for each line,
+ containing one string for each field.
+ Entries are read as strings;
+ it is up to you to interpret the strings as whatever you want."
+ (loop :until (accept-eof stream) :collect (read-csv-line stream)))
+
+(defun read-csv-file (pathname &rest keys &key element-type external-format)
+ "Open the file designated by PATHNAME, using the provided keys if any,
+ and call READ-CSV-STREAM on it."
+ (declare (ignore element-type external-format))
+ (with-open-stream (stream (apply 'open pathname
+ :direction :input :if-does-not-exist :error keys))
+ (read-csv-stream stream)))
(defun char-needs-quoting (x)
(or (eql x *quote*)
t))
(defun write-csv-lines (lines stream)
- "Write many CSV line to STREAM."
+ "Given a list of LINES, each of them a list of fields, and a STREAM,
+ format those lines as CSV according to the current syntax parameters."
(dolist (x lines)
(write-csv-line x stream)))
(defun write-csv-line (fields stream)
- "Write one CSV line to STREAM."
- (loop for x on fields
- while x
- do
- (write-csv-field (first x) stream)
- (when (cdr x)
- (write-char *separator* stream)))
+ "Format one line of FIELDS to STREAM in CSV format,
+ using the current syntax parameters."
+ (loop :for x :on fields :do
+ (write-csv-field (first x) stream)
+ (when (cdr x)
+ (write-char *separator* stream)))
(write-string *eol* stream))
(defun write-csv-field (field stream)
(defun write-quoted-string (string stream)
(write-char *quote* stream)
- (loop for c across string do
- (when (char= c *quote*)
- (write-char c stream))
- (write-char c stream))
+ (loop :for c :across string :do
+ (when (char= c *quote*)
+ (write-char c stream))
+ (write-char c stream))
(write-char *quote* stream))
-;(trace read-csv-line read-csv-stream)
-
+;;#+DEBUG (trace read-csv-line read-csv-stream)
;;#+DEBUG (write (read-csv-file "test.csv"))
;;#+DEBUG (progn (setq *separator* #\;) (write (read-csv-file "/samba/ciev.csv")))