Log of /src/code/string.lisp (CMUCL)

Revision 1.12.30.27
Thu May 28 16:17:48 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
CVS Tags: unicode-snapshot-2009-06
Changes since 1.12.30.26: +7 -8 lines
Diff to previous 1.12.30.26 , to branch point 1.12
Slightly modify DECOMPOSE so it can operate on non-simple strings.

Revision 1.12.30.26
Wed May 27 20:34:19 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.25: +9 -9 lines
Diff to previous 1.12.30.25 , to branch point 1.12
code/char.lisp:
o Define CODEPOINT-LIMIT
o Define CODEPOINT type

code/extfmts.lisp
code/string.lisp
code/unidata.lisp
pcl/simple-streams/external-formats/utf-32.lisp
pcl/simple-streams/external-formats/utf-8.lisp
o Use the CODEPOINT type in declarations.

Revision 1.12.30.25
Wed May 27 17:39:51 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.24: +35 -1 lines
Diff to previous 1.12.30.24 , to branch point 1.12
code/seq.lisp:
o Moved STRING-REVERSE* and STRING-NREVERSE* to string.lisp because we
  need to use WITH-STRING.

code/string.lisp:
o Fix STRING-REVERSE* and STRING-NREVERSE* which were not properly
  handling non-simple strings.  The following tests were not returning
  "edcba":

(let* ((x (make-array 10
		      :initial-contents "abcdefghij"
		      :fill-pointer 5
		      :element-type 'base-char))
       (y (reverse x)))
  y)

(let* ((x (make-array 10
		      :initial-contents "abcdefghij"
		      :fill-pointer 5
		      :element-type 'character))
       (y (nreverse x)))
  y)

Revision 1.12.30.24
Wed May 27 11:31:38 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.23: +35 -38 lines
Diff to previous 1.12.30.23 , to branch point 1.12
o Revert previous change to STRING-TO-NFC and STRING-TO-NFKC.
o Use WITH-STRING in NORMALIZED-FORM-P so we operate on the underlying
  simple-string data.

Revision 1.12.30.23
Wed May 27 01:06:19 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.22: +15 -13 lines
Diff to previous 1.12.30.22 , to branch point 1.12
NORMALIZED-FORM-P needs simple strings.  We should do this in a
different way, but this will do for now.

Revision 1.12.30.22
Tue May 26 16:25:02 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.21: +19 -73 lines
Diff to previous 1.12.30.21 , to branch point 1.12
code/string.lisp:
o Add function (setf codepoint)
o Add docstrings for STRING-TO-NFC and STRING-TO-NFKC.
o Move things related to pairwise composition to unidata.lisp.

code/unidata.lisp:
o Things related to pairwise composition moved here.
o Adjust *COMPOSITION-EXCLUSION* to include only the non-commented
  items in CompositionExclusions.txt.
o Make BUILD-COMPOSITION-TABLE exclude characters that can be
  derived from the decomposition.  (Basically, ignore the four
  decompositions of length greater than 1 that start with a non-zero
  combining class.)

Revision 1.12.30.21
Tue May 26 02:15:55 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.20: +169 -6 lines
Diff to previous 1.12.30.20 , to branch point 1.12
Add support for Unicode NFC and NFKC forms.  Implement STRING-TO-NFC
and STRING-TO-NFKC.

This probably needs some more work.  The composition table should
probably be a trie stored in unidata.bin instead of the hash table
that we use now.  The composition exclusion list should probably be
in unidata.bin too, instead of here.

These functions pass all of the normalization tests.

Revision 1.12.30.20
Fri May 22 11:31:55 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.19: +2 -2 lines
Diff to previous 1.12.30.19 , to branch point 1.12
Fix bug in DECOMPOSE, which was no longer sorting the combining
characters in canonical combining class order.  We now pass the NFD
and NFKD normalization tests again.

(Fix from Paul)
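The bug above concerns canonical ordering: within each maximal run of characters whose canonical combining class is non-zero, the characters must be stably sorted by that class. The following is a minimal sketch of that rule on a list of integer codepoints, not CMUCL's DECOMPOSE. COMBINING-CLASS and CANONICAL-ORDER are hypothetical names, and COMBINING-CLASS is a stand-in that knows only four combining marks instead of consulting real Unicode data.

```lisp
;;; Sketch of Unicode canonical ordering (not CMUCL's DECOMPOSE).
;;; COMBINING-CLASS is a hypothetical stand-in for a Unicode database
;;; lookup; it only knows a few combining marks.
(defun combining-class (cp)
  (case cp
    (#x0301 230)   ; COMBINING ACUTE ACCENT
    (#x0308 230)   ; COMBINING DIAERESIS
    (#x0323 220)   ; COMBINING DOT BELOW
    (#x0328 202)   ; COMBINING OGONEK
    (t 0)))        ; everything else is treated as a starter

(defun canonical-order (codepoints)
  "Return a fresh list in which each maximal run of codepoints with a
non-zero combining class is stably sorted by combining class."
  (let ((result '())   ; output, accumulated in reverse
        (run '()))     ; current run of combining marks, in reverse
    (flet ((flush-run ()
             (when run
               ;; Restore input order, sort stably, then push onto RESULT.
               (setf result (revappend (stable-sort (nreverse run) #'<
                                                    :key #'combining-class)
                                       result)
                     run '()))))
      (dolist (cp codepoints)
        (if (zerop (combining-class cp))
            (progn (flush-run) (push cp result))
            (push cp run)))
      (flush-run))
    (nreverse result)))
```

For example, `(canonical-order '(#x61 #x301 #x323))` moves the dot below (class 220) ahead of the acute accent (class 230), while marks of equal class keep their relative order because the sort is stable.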

Revision 1.12.30.19
Wed May 20 21:47:36 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.18: +28 -7 lines
Diff to previous 1.12.30.18 , to branch point 1.12
string.lisp:
o Add SURROGATEP function to test if something is a surrogate value.

extfmts.lisp:
utf-16-be.lisp:
utf-16-le.lisp:
utf-16.lisp:
utf-32-be.lisp:
utf-32-le.lisp:
utf-32.lisp:
utf-8.lisp:
o Use SURROGATEP.

Revision 1.12.30.18
Wed May 20 16:30:08 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.17: +13 -6 lines
Diff to previous 1.12.30.17 , to branch point 1.12
Do case-insensitive comparison by converting to lower case instead of
upper case.  This is what Unicode CaseFolding.txt does.  One example
of where it matters: U+1E9E maps to lower case U+DF, but the upper
case version of U+DF is U+DF itself.

char.lisp:
o Change EQUAL-CHAR-CODE to convert to lowercase.

string.lisp:
o Change EQUAL-CHAR-CODEPOINT to convert to lowercase.
o Fix mistake in STRING-LESS-GREATER-EQUAL which was incorrectly
  comparing the codepoints instead of the equal-char-codepoint values.

Revision 1.12.30.17
Tue May 19 20:36:28 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.16: +6 -5 lines
Diff to previous 1.12.30.16 , to branch point 1.12
Fix grammar.

Revision 1.12.30.16
Tue May 19 20:24:19 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.15: +19 -1 lines
Diff to previous 1.12.30.15 , to branch point 1.12
Add UTF16-STRING-P to determine if a string is a valid UTF-16 encoded
string.
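A UTF-16 sequence is well formed when every high surrogate (#xD800 to #xDBFF) is immediately followed by a low surrogate (#xDC00 to #xDFFF) and no low surrogate appears on its own. The sketch below checks that rule, assuming the string is modelled as a vector of integer code units rather than CMUCL characters; HIGH-SURROGATE-P, LOW-SURROGATE-P and VALID-UTF16-UNITS-P are hypothetical names, not CMUCL's UTF16-STRING-P or SURROGATEP.

```lisp
;;; Sketch: well-formedness check for a vector of UTF-16 code units.
;;; All names here are hypothetical illustrations.
(defun high-surrogate-p (unit) (<= #xD800 unit #xDBFF))
(defun low-surrogate-p (unit) (<= #xDC00 unit #xDFFF))

(defun valid-utf16-units-p (units)
  "True if every high surrogate in UNITS is immediately followed by a
low surrogate and no low surrogate appears unpaired."
  (let ((len (length units))
        (i 0))
    (loop while (< i len)
          do (let ((unit (aref units i)))
               (cond ((high-surrogate-p unit)
                      ;; A high surrogate needs a low surrogate right after it.
                      (unless (and (< (1+ i) len)
                                   (low-surrogate-p (aref units (1+ i))))
                        (return-from valid-utf16-units-p nil))
                      (incf i 2))
                     ((low-surrogate-p unit)
                      ;; A low surrogate with no preceding high surrogate.
                      (return-from valid-utf16-units-p nil))
                     (t (incf i)))))
    t))
```

With this sketch, `#(#x61 #xD83D #xDE00)` is accepted, while a trailing lone `#xD83D` or a leading lone `#xDE00` is rejected.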

Revision 1.12.30.15
Mon May 18 13:38:11 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.14: +53 -5 lines
Diff to previous 1.12.30.14 , to branch point 1.12
STRING-LESS-GREATER-EQUAL handles codepoints so STRING-LESSP and
friends now sort in codepoint order (after converting to uppercase).

Revision 1.12.30.14
Tue May 12 16:31:49 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.13: +6 -6 lines
Diff to previous 1.12.30.13 , to branch point 1.12
o Lots of spelling fixes from Paul.
o Add unicode codepoints in final-sigma.lisp (in case the characters
  there don't show up correctly).
o Support partial-fill in READ-INTO-STRING.

Revision 1.12.30.13
Wed May 6 13:05:15 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.12: +5 -1 lines
Diff to previous 1.12.30.12 , to branch point 1.12
Simple docstrings for STRING-TO-NFD and STRING-TO-NFKD.

Revision 1.12.30.12
Mon May 4 14:13:32 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.11: +9 -3 lines
Diff to previous 1.12.30.11 , to branch point 1.12
From Paul: Package and symbol names in Unicode need to be in a
canonical normalization form (eventually, when NFC is implemented).

Revision 1.12.30.11
Sun May 3 13:51:59 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.10: +12 -23 lines
Diff to previous 1.12.30.10 , to branch point 1.12
From Paul.  Use CODEPOINT in %GLYPH-B.

Revision 1.12.30.10
Sun May 3 12:37:02 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.9: +48 -70 lines
Diff to previous 1.12.30.9 , to branch point 1.12
Updates from Paul.

o Use CODEPOINT instead of XCHAR in %GLYPH-F
o Simplify DECOMPOSE

Revision 1.12.30.9
Sat May 2 11:54:37 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.8: +73 -5 lines
Diff to previous 1.12.30.8 , to branch point 1.12
Updates from Paul.  With these changes, we pass the Unicode
normalization test suite successfully for NFD and NFKD.

unidata.lisp:
o Implement algorithmic decomposition of Hangul.

string.lisp:
o Implement Unicode normalization forms NFD and NFKD.
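Algorithmic Hangul decomposition follows the conjoining jamo arithmetic in the Unicode Standard (section 3.12): a precomposed syllable's offset from #xAC00 encodes a leading consonant, a vowel and an optional trailing consonant. The sketch below shows that arithmetic on integer codepoints; HANGUL-DECOMPOSE and the constant names are hypothetical, not the code in unidata.lisp.

```lisp
;;; Sketch of algorithmic Hangul syllable decomposition.
;;; Constant values are the standard Unicode ones; names are hypothetical.
(defconstant +s-base+ #xAC00)   ; first precomposed syllable
(defconstant +l-base+ #x1100)   ; first leading consonant (choseong)
(defconstant +v-base+ #x1161)   ; first vowel (jungseong)
(defconstant +t-base+ #x11A7)   ; one before the first trailing consonant
(defconstant +t-count+ 28)      ; trailing consonants, including "none"
(defconstant +n-count+ 588)     ; 21 vowels * 28 trailing consonants
(defconstant +s-count+ 11172)   ; 19 leading consonants * 588

(defun hangul-decompose (cp)
  "Return the list of conjoining jamo for the precomposed Hangul
syllable CP, or NIL if CP is not a precomposed Hangul syllable."
  (let ((s-index (- cp +s-base+)))
    (when (and (<= 0 s-index) (< s-index +s-count+))
      (let ((l (+ +l-base+ (floor s-index +n-count+)))
            (v (+ +v-base+ (floor (mod s-index +n-count+) +t-count+)))
            (t-index (mod s-index +t-count+)))
        ;; A trailing consonant is present only when T-INDEX is non-zero.
        (if (zerop t-index)
            (list l v)
            (list l v (+ +t-base+ t-index)))))))
```

For instance, U+D4DB decomposes to U+1111 U+1171 U+11B6, while U+AC00 has no trailing consonant and decomposes to just U+1100 U+1161.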

Revision 1.12.30.8
Thu Apr 23 15:10:08 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
CVS Tags: unicode-snapshot-2009-05
Changes since 1.12.30.7: +23 -23 lines
Diff to previous 1.12.30.7 , to branch point 1.12
string.lisp:
o Add Paul's SURROGATES-TO-CODEPOINT and remove
  CODEPOINT-FROM-SURROGATES.
o Change SURROGATES to return characters, not numbers.
o Update callers of SURROGATES to match.

extfmts.lisp:
o Update callers of SURROGATES to match.
o Use CODEPOINT to extract the correct codepoint from a string in
  EF-STRING-TO-OCTETS and EF-OCTETS-TO-STRING.

Revision 1.12.30.7
Wed Apr 22 17:05:51 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.6: +21 -12 lines
Diff to previous 1.12.30.6 , to branch point 1.12
o Add new function CODEPOINT-FROM-SURROGATES to compute the codepoint
  from two surrogate values.  (Should we use a better name?)
o Use the new function in CODEPOINT.
o Add docstrings to the functions.
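Computing a codepoint from two surrogate values is fixed UTF-16 arithmetic: subtract #x10000 from the codepoint, put the high 10 bits in the high surrogate and the low 10 bits in the low surrogate. A minimal sketch of both directions follows, working on integers; PAIR-TO-CODEPOINT and CODEPOINT-TO-PAIR are hypothetical names, not CMUCL's CODEPOINT-FROM-SURROGATES, SURROGATES-TO-CODEPOINT or SURROGATES.

```lisp
;;; Sketch of UTF-16 surrogate-pair arithmetic on integer code units.
;;; Function names are hypothetical illustrations.
(defun pair-to-codepoint (hi lo)
  "Combine high surrogate HI and low surrogate LO into a codepoint."
  (+ #x10000
     (ash (- hi #xD800) 10)   ; top 10 bits of the offset
     (- lo #xDC00)))          ; bottom 10 bits of the offset

(defun codepoint-to-pair (cp)
  "Split a supplementary codepoint CP (#x10000 to #x10FFFF) into its
high and low surrogates, returned as two values."
  (let ((offset (- cp #x10000)))
    (values (+ #xD800 (ash offset -10))
            (+ #xDC00 (ldb (byte 10 0) offset)))))
```

With these definitions, `(pair-to-codepoint #xD83D #xDE00)` returns #x1F600, and `(codepoint-to-pair #x10FFFF)` returns #xDBFF and #xDFFF, the largest surrogate pair.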

Revision 1.12.30.6
Tue Apr 21 17:47:31 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.5: +89 -65 lines
Diff to previous 1.12.30.5 , to branch point 1.12
code/string.lisp:
o From Paul:
  - Handle the ASCII special casing in string.lisp instead of
    unidata.lisp
  - Add utility functions CODEPOINT and SURROGATES.

code/unidata.lisp:
o Remove the ASCII special cases from UNICODE-LOWER, UNICODE-UPPER,
  UNICODE-TITLE.

Revision 1.12.30.5
Mon Apr 20 19:46:48 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.4: +7 -7 lines
Diff to previous 1.12.30.4 , to branch point 1.12
NSTRING-UPCASE and NSTRING-DOWNCASE were referencing the unknown
symbols NEWSTRING and NEW-INDEX.  Replace with STRING and INDEX,
respectively.  I think that's what was intended.

Revision 1.12.30.4
Mon Apr 20 14:26:48 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.3: +155 -33 lines
Diff to previous 1.12.30.3 , to branch point 1.12
From Paul:

    Here's a version of [n]string-(up|down)case that handles non-BMP
    characters.  Also added functionless stubs for normalization
    forms.  Improved string-reverse* and implemented string-nreverse*
    in a way that shouldn't cons (not the original way I worked out,
    which might be faster but is quite complicated).

    (The glyph builder now stops when it hits a combining character
    that's out of sequence (canonical order)---I'm not sure whether or
    not that's the Right Thing to do)

Revision 1.12.30.3
Sat Apr 18 12:27:05 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.2: +42 -2 lines
Diff to previous 1.12.30.2 , to branch point 1.12
More updates from Paul.

code/seq.lisp:
o Update SEQ-DISPATCH to allow a special dispatch form for strings.
o Implement STRING-REVERSE* that correctly handles our UTF-16 strings.
o Implement STRING-NREVERSE*, but this needs work to reduce consing.

code/string.lisp:
o Add GLYPH and SGLYPH to return the glyph from a position in a
  string.

code/exports.lisp:
o Export GLYPH and SGLYPH
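Reversing a UTF-16 string unit by unit would swap each surrogate pair into an invalid low/high order, which is why STRING-REVERSE* needs string-specific handling. The sketch below reverses a vector of integer code units while copying each surrogate pair as one unit. REVERSE-UTF16-UNITS is a hypothetical name, and unlike the glyph-aware code described above it does not keep combining characters attached to their base characters.

```lisp
;;; Sketch: reverse UTF-16 code units without splitting surrogate pairs.
;;; Hypothetical helper; it ignores combining sequences (glyphs).
(defun reverse-utf16-units (units)
  (let* ((len (length units))
         (out (make-array len))
         (j len))                       ; output is filled from the end
    (do ((i 0)) ((>= i len) out)
      (let ((unit (aref units i)))
        (if (and (<= #xD800 unit #xDBFF)
                 (< (1+ i) len)
                 (<= #xDC00 (aref units (1+ i)) #xDFFF))
            ;; Copy the high/low pair together, keeping its order.
            (progn
              (decf j 2)
              (setf (aref out j) unit
                    (aref out (1+ j)) (aref units (1+ i)))
              (incf i 2))
            ;; Any other unit (including a lone surrogate) moves alone.
            (progn
              (decf j)
              (setf (aref out j) unit)
              (incf i)))))))
```

Here `#(#x61 #xD83D #xDE00 #x62)` becomes `#(#x62 #xD83D #xDE00 #x61)`: the letters swap places, but the pair encoding U+1F600 stays in high/low order.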

Revision 1.12.30.2
Wed Apr 15 14:41:55 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.1: +3 -3 lines
Diff to previous 1.12.30.1 , to branch point 1.12
Updates from Paul:

	add numeric values and decompositions to unidata, added
	char-titlecase, and made string-capitalize use title-case
	rather than upper-case, when those are different.

The unidata.bin file needs to be rebuilt, and a cross-compile needs to
be done to support the new unidata.bin format.

Revision 1.12.30.1
Wed Mar 25 21:51:34 2009 UTC (5 years, 1 month ago) by rtoy
Branch: unicode-utf16-extfmt-branch
CVS Tags: unicode-utf16-extfmt-2009-03-27
Changes since 1.12: +27 -3 lines
Diff to previous 1.12
Merge from unicode-utf16 branch, label
unicode-utf16-char-support-2009-03-25 to get character support.

Revision 1.12
Fri Apr 11 15:41:59 2003 UTC (11 years ago) by emarsden
Branch: MAIN
CVS Tags: RELEASE_19f, amd64-merge-start, double-double-array-base, double-double-array-checkpoint, double-double-base, double-double-init-%make-sparc, double-double-init-checkpoint-1, double-double-init-ppc, double-double-init-sparc, double-double-init-sparc-2, double-double-init-x86, double-double-irrat-end, double-double-irrat-start, double-double-reader-base, double-double-reader-checkpoint-1, double-double-sparc-checkpoint-1, dynamic-extent-base, label-2009-03-16, label-2009-03-25, lisp-executable-base, merge-sse2-packed, merge-with-19f, mod-arith-base, ppc_gencgc_snap_2005-05-14, ppc_gencgc_snap_2005-12-17, ppc_gencgc_snap_2006-01-06, pre-telent-clx, prm-before-macosx-merge-tag, release-19a, release-19a-base, release-19a-pre1, release-19a-pre2, release-19a-pre3, release-19b-base, release-19b-pre1, release-19b-pre2, release-19c, release-19c-base, release-19c-pre1, release-19d, release-19d-base, release-19d-pre1, release-19d-pre2, release-19e, release-19e-base, release-19e-pre1, release-19e-pre2, release-19f-base, release-19f-pre1, remove_negative_zero_not_zero, snapshot-2003-10, snapshot-2003-11, snapshot-2003-12, snapshot-2004-04, snapshot-2004-05, snapshot-2004-06, snapshot-2004-07, snapshot-2004-08, snapshot-2004-09, snapshot-2004-10, snapshot-2004-11, snapshot-2004-12, snapshot-2005-01, snapshot-2005-02, snapshot-2005-03, snapshot-2005-04, snapshot-2005-05, snapshot-2005-06, snapshot-2005-07, snapshot-2005-08, snapshot-2005-09, snapshot-2005-10, snapshot-2005-11, snapshot-2005-12, snapshot-2006-01, snapshot-2006-02, snapshot-2006-03, snapshot-2006-04, snapshot-2006-05, snapshot-2006-06, snapshot-2006-07, snapshot-2006-08, snapshot-2006-09, snapshot-2006-10, snapshot-2006-11, snapshot-2006-12, snapshot-2007-01, snapshot-2007-02, snapshot-2007-03, snapshot-2007-04, snapshot-2007-05, snapshot-2007-06, snapshot-2007-07, snapshot-2007-08, snapshot-2007-09, snapshot-2007-10, snapshot-2007-11, snapshot-2007-12, snapshot-2008-01, snapshot-2008-02, 
snapshot-2008-03, snapshot-2008-04, snapshot-2008-05, snapshot-2008-06, snapshot-2008-07, snapshot-2008-08, snapshot-2008-09, snapshot-2008-10, snapshot-2008-11, snapshot-2008-12, snapshot-2009-01, snapshot-2009-02, snapshot-2009-04, snapshot-2009-05, sparc_gencgc, sparc_gencgc_merge, sse2-base, sse2-checkpoint-2008-10-01, sse2-merge-with-2008-10, sse2-merge-with-2008-11, sse2-packed-2008-11-12, sse2-packed-base, unicode-utf16-base, unicode-utf16-extfmts-pre-sync-2008-11, unicode-utf16-extfmts-sync-2008-12, unicode-utf16-string-support, unicode-utf16-sync-2008-07, unicode-utf16-sync-2008-09, unicode-utf16-sync-2008-11, unicode-utf16-sync-2008-12, unicode-utf16-sync-label-2009-03-16
Branch point for: RELEASE-19F-BRANCH, double-double-array-branch, double-double-branch, double-double-reader-branch, dynamic-extent, lisp-executable, mod-arith-branch, ppc_gencgc_branch, release-19a-branch, release-19b-branch, release-19c-branch, release-19d-branch, release-19e-branch, sparc_gencgc_branch, sse2-branch, sse2-packed-branch, unicode-utf16-branch, unicode-utf16-extfmt-branch
Changes since 1.11: +3 -3 lines
Diff to previous 1.11
Instead of ignoring the :element-type argument to MAKE-STRING, we check
that it's a valid subtype of character (then ignore it).

Revision 1.11
Sun Jun 17 19:12:34 2001 UTC (12 years, 10 months ago) by pw
Branch: MAIN
CVS Tags: LINKAGE_TABLE, PRE_LINKAGE_TABLE, UNICODE-BASE, cold-pcl-base, release-18e, release-18e-base, release-18e-pre1, release-18e-pre2
Branch point for: UNICODE-BRANCH, cold-pcl, release-18e-branch
Changes since 1.10: +7 -6 lines
Diff to previous 1.10
From eric Marsden:

Fix some error types to be ANSI compliant.

Revision 1.10
Sun Mar 4 23:37:33 2001 UTC (13 years, 1 month ago) by pw
Branch: MAIN
Changes since 1.9: +3 -1 lines
Diff to previous 1.9
A few well placed inhibit-warnings declarations to suppress noise in
compile-lisp.log. Only 46/130 notes left.

Revision 1.9
Fri Feb 13 16:09:42 1998 UTC (16 years, 2 months ago) by dtc
Branch: MAIN
Changes since 1.8: +4 -3 lines
Diff to previous 1.8
ANSI CL compat. changes:
o Add an optional environment argument to constantp; ignored by CMUCL.
o Add the :element-type keyword to make-string.

Revision 1.8
Fri Jul 12 18:55:24 1996 UTC (17 years, 9 months ago) by ram
Branch: MAIN
CVS Tags: RELEASE_18a
Branch point for: RELENG_18
Changes since 1.7: +10 -7 lines
Diff to previous 1.7
Merged DTC's patch to string<>=*-body which fixes various problems that arose
when :start2 :end2 values were specified.

Revision 1.7
Mon Oct 31 04:11:27 1994 UTC (19 years, 5 months ago) by ram
Branch: MAIN
Changes since 1.6: +1 -3 lines
Diff to previous 1.6
Fix header boilerplate.

Revision 1.6
Fri May 15 17:50:40 1992 UTC (21 years, 11 months ago) by wlott
Branch: MAIN
Changes since 1.5: +2 -2 lines
Diff to previous 1.5
Removed an extra ``)''.

Revision 1.5
Tue May 28 17:25:48 1991 UTC (22 years, 11 months ago) by ram
Branch: MAIN
Changes since 1.4: +7 -14 lines
Diff to previous 1.4
Changed STRING-xxxCASE to not assign arguments.

Revision 1.4
Wed Apr 24 23:37:42 1991 UTC (23 years ago) by ram
Branch: MAIN
Changes since 1.3: +102 -151 lines
Diff to previous 1.3
Changed the WITH-xxx-STRINGs macros to simply use WITH-ARRAY-DATA, now that it
is more clever.  Also, changed them to accept any STRINGable thing, instead of
just strings and symbols.  These macros now bind the offset var instead of
randomly setting it.

Revision 1.3
Fri Feb 8 13:35:59 1991 UTC (23 years, 2 months ago) by ram
Branch: MAIN
Changes since 1.2: +8 -4 lines
Diff to previous 1.2
New file header with RCS header FILE-COMMENT.

Revision 1.2
Fri Aug 24 18:14:26 1990 UTC (23 years, 8 months ago) by wlott
Branch: MAIN
Changes since 1.1: +23 -25 lines
Diff to previous 1.1
Moved MIPS branch onto trunk; no merge necessary.

Revision 1.1
Tue Feb 6 17:27:06 1990 UTC (24 years, 2 months ago) by ram
Branch: MAIN
Initial revision
