Log of /src/code/string.lisp

Revision 1.29
Tue Oct 26 13:56:08 2010 UTC (3 years, 5 months ago) by rtoy
Branch: MAIN
CVS Tags: GIT-CONVERSION, HEAD, cross-sol-x86-2010-12-20, cross-sol-x86-base, cross-sol-x86-merged, cross-sparc-branch-base, snapshot-2010-11, snapshot-2010-12, snapshot-2011-01, snapshot-2011-02, snapshot-2011-03, snapshot-2011-04, snapshot-2011-06, snapshot-2011-07, snapshot-2011-09
Branch point for: cross-sol-x86-branch, cross-sparc-branch
Changes since 1.28: +2 -2 lines
Add trailing newline.

Revision 1.28
Wed Oct 13 18:00:44 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
Changes since 1.27: +8 -8 lines
Some changes to replace calls to gettext with _"" or _N"" for things
compiled with and without Unicode.  This is needed so that the pot
files have the same content for both unicode and non-unicode builds.
(The _"" and _N"" are handled by the reader, so things that are
conditionalized out still get processed, unlike using gettext.)

Revision 1.27
Mon Sep 20 23:01:15 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
Changes since 1.26: +11 -7 lines
o Inhibit warnings from SURROGATEP; I'm tired of seeing the code deletion
  notes now.
o Tell the compiler what type the first return value of CODEPOINT is.
  Apparently, the compiler can't figure that out itself.

Revision 1.26
Wed Sep 15 21:06:38 2010 UTC (3 years, 7 months ago) by rtoy
Branch: MAIN
Changes since 1.25: +15 -5 lines
Add support for Unicode 5.2.  The normalization and wordbreak tests pass.

code/string.lisp:
o In %compose, handle the case where the composite character is
  outside the BMP and thus needs special handling for our UTF-16
  strings.

code/unidata.lisp
o CJK Ideograph range has changed in 5.2.
o Fix bug in build-composition-table.  We were not correctly handling
  the case where the decomposition of a codepoint was outside the
  BMP.  Special care is needed to handle the UTF-16 strings that we
  use.
o The keys for the pairwise composition table are the full codepoints,
  so we need to shift one by 21 bits instead of 16.

tools/build-unidata.lisp
o Update minor version to 2.

i18n/BidiMirroring.txt
i18n/CaseFolding.txt
i18n/CompositionExclusions.txt
i18n/DerivedNormalizationProps.txt
i18n/NameAliases.txt
i18n/NormalizationCorrections.txt
i18n/SpecialCasing.txt
i18n/UnicodeData.txt
i18n/WordBreakProperty.txt
i18n/tests/NormalizationTest.txt
i18n/tests/WordBreakTest.txt
o Updated from Unicode 5.2.

i18n/unidata.bin
o Regenerated from new Unicode 5.2 files.
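
The 21-bit shift mentioned for code/unidata.lisp follows from the size of
the Unicode code space: codepoints run up to #x10FFFF, which needs 21
bits, so packing two codepoints with a 16-bit shift can make distinct
pairs collide once one of them is outside the BMP.  A minimal sketch of
the idea, assuming a hypothetical PAIRWISE-KEY helper (this is not
necessarily how unidata.lisp builds its keys):

```lisp
;; Hypothetical helper, for illustration only.  Packs two codepoints
;; into one integer key by shifting the first one left by SHIFT bits.
(defun pairwise-key (first second &optional (shift 21))
  (logior (ash first shift) second))

;; The largest codepoint needs 21 bits.
(integer-length #x10FFFF)

;; With a 16-bit shift these two different pairs produce the same key.
(= (pairwise-key 1 0 16) (pairwise-key 0 #x10000 16))

;; With a 21-bit shift the keys stay distinct.
(/= (pairwise-key 1 0) (pairwise-key 0 #x10000))
```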

Revision 1.25
Mon Sep 13 21:27:04 2010 UTC (3 years, 7 months ago) by rtoy
Branch: MAIN
Changes since 1.24: +31 -1 lines
Add function to convert a sequence of codepoints to a string and a
function to convert a string to a list of codepoints.

Revision 1.24
Tue Apr 20 17:57:45 2010 UTC (3 years, 11 months ago) by rtoy
Branch: MAIN
CVS Tags: RELEASE_20b, release-20b-pre1, release-20b-pre2, snapshot-2010-05, snapshot-2010-06, snapshot-2010-07, snapshot-2010-08, sparc-tramp-assem-2010-07-19, sparc-tramp-assem-base
Branch point for: RELEASE-20B-BRANCH, sparc-tramp-assem-branch
Changes since 1.23: +8 -8 lines
Change uses of _"foo" to (intl:gettext "foo").  This is because slime
may get confused with source locations if the reader macros are
installed.

Revision 1.23
Mon Apr 19 02:18:04 2010 UTC (4 years ago) by rtoy
Branch: MAIN
Changes since 1.22: +34 -34 lines
Remove _N"" reader macro from docstrings when possible.

Revision 1.22
Fri Mar 19 15:18:59 2010 UTC (4 years, 1 month ago) by rtoy
Branch: MAIN
CVS Tags: post-merge-intl-branch, snapshot-2010-04
Changes since 1.21: +53 -51 lines
Merge intl-branch 2010-03-18 to HEAD.  To build, you need to use
boot-2010-02-1 as the bootstrap file.  You should probably also use
the new -P option for build.sh to generate and update the po files
while building.

Revision 1.21.6.1
Thu Feb 25 20:34:52 2010 UTC (4 years, 1 month ago) by rtoy
Branch: intl-2-branch
Changes since 1.21: +53 -51 lines
Restart internationalization work.  This new branch starts with code from
the intl-branch on date 2010-02-12 18:00:00+0500.  This version works
and

LANG=en@piglatin bin/lisp

works (once the piglatin translation is added).

Revision 1.21.4.2
Wed Feb 10 04:01:27 2010 UTC (4 years, 2 months ago) by rtoy
Branch: intl-branch
CVS Tags: intl-branch-2010-03-18-1300, intl-branch-working-2010-02-11-1000, intl-branch-working-2010-02-19-1000
Changes since 1.21.4.1: +51 -51 lines
Mark translatable strings; update cmucl.pot and ko/cmucl.po
accordingly.

Revision 1.21.4.1
Mon Feb 8 17:15:49 2010 UTC (4 years, 2 months ago) by rtoy
Branch: intl-branch
Changes since 1.21: +3 -1 lines
Add (intl:textdomain "cmucl") to the files to set the textdomain.

Revision 1.21
Sun Oct 18 14:21:24 2009 UTC (4 years, 6 months ago) by rtoy
Branch: MAIN
CVS Tags: amd64-dd-start, intl-2-branch-base, intl-branch-base, pre-merge-intl-branch, snapshot-2009-11, snapshot-2009-12, snapshot-2010-01, snapshot-2010-02, snapshot-2010-03
Branch point for: amd64-dd-branch, intl-2-branch, intl-branch
Changes since 1.20: +6 -6 lines
Merge changes from unicode-string-buffer-impl-branch which gives
faster reads on external-formats.  This is done by adding an
additional buffer to streams so we can convert the entire in-buffer
into characters all at once.

To build this change, you need to do a cross-compile using
boot-2009-10-1-cross.lisp.  Using that build, do a normal build with
these sources.

For a non-unicode build use boot-2009-10-01.lisp with a 20a
non-unicode build.

code/extfmts.lisp:
o Add another slot to the extfmts for copying the state.
o Modify EF-OCTETS-TO-STRING and OCTETS-TO-STRING to support the
  necessary changes for fast formats.  This is incompatible with the
  previous version because the string is not grown if needed.

code/fd-stream-extfmt.lisp:
o Set *enable-stream-buffer-p* to T so we have fast streams.

code/fd-stream.lisp:
o Add new slots to support fast streams.
o In SET-ROUTINES, initialize the new slots appropriately.
o Update UNREAD-CHAR to be able to back up in the string buffer to
  unread.
o Add implementation to copy the state of an external format.

code/stream.lisp:
o Change %SET-FD-STREAM-EXTERNAL-FORMAT to be able to change formats
  even if we've already converted the buffer with a different format.
  We reconvert the buffer with the old format until we reach the
  current character.  Then the remaining octets are converted using
  the new format and stored in the string buffer.
o Add FAST-READ-CHAR-STRING-REFILL to refill the string buffer, like
  FAST-READ-CHAR-REFILL does for the octet in-buffer.

code/struct.lisp:
o Add new slots to hold the string buffer, the current index, and
  length.  These are needed for the fast formats.

code/sysmacs.lisp:
o Update PREPARE-FOR-FAST-READ-CHAR, DONE-WITH-FAST-READ-CHAR, and
  FAST-READ-CHAR to support the string buffer.

code/string.lisp:
o Microoptimization of SURROGATEP to reduce the number of branches.

general-info/release-20b.txt:
o Update with these changes

pcl/simple-streams/external-formats/utf-16-be.lisp:
pcl/simple-streams/external-formats/utf-16-le.lisp:
pcl/simple-streams/external-formats/utf-16.lisp:
o These formats actually have state, so update them to take and handle
  an initial state.  These are needed if the string buffer ends with a
  leading surrogate and the next string buffer starts with a trailing
  surrogate.  The conversion needs to combine the surrogates together.

Revision 1.20.4.1
Wed Oct 7 14:46:38 2009 UTC (4 years, 6 months ago) by rtoy
Branch: unicode-string-buffer-impl-branch
Changes since 1.20: +7 -7 lines
Minor optimization for SURROGATEP to have fewer branches in the
generated code.
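
The log does not say how the branch count was reduced.  One common way
to test for a surrogate with a single comparison is to shift the code
right and compare the remaining high bits, instead of checking both
ends of the range #xD800..#xDFFF.  A sketch on plain integers, with
hypothetical function names (the real SURROGATEP may differ):

```lisp
;; Hypothetical helpers, for illustration only.  CODE is a
;; non-negative integer such as a UTF-16 code unit.

(defun surrogate-code-p (code)
  "True if CODE lies in #xD800..#xDFFF (any surrogate)."
  ;; Every value in that range has the same bits above bit 10.
  (= (ash code -11) #x1B))

(defun leading-surrogate-code-p (code)
  "True if CODE lies in #xD800..#xDBFF."
  (= (ash code -10) #x36))

(defun trailing-surrogate-code-p (code)
  "True if CODE lies in #xDC00..#xDFFF."
  (= (ash code -10) #x37))
```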

Revision 1.20
Tue Sep 15 15:52:43 2009 UTC (4 years, 7 months ago) by rtoy
Branch: MAIN
CVS Tags: unicode-string-buffer-base, unicode-string-buffer-impl-base
Branch point for: unicode-string-buffer-branch, unicode-string-buffer-impl-branch
Changes since 1.19: +0 -149 lines
Oops.  Remove old code that didn't support our UTF-16 strings.

Revision 1.19
Tue Sep 15 15:51:25 2009 UTC (4 years, 7 months ago) by rtoy
Branch: MAIN
Changes since 1.18: +468 -5 lines
Add support for the Unicode word break algorithm for
STRING-CAPITALIZE.  Not sure about the appropriate interface, though.

code/string.lisp:
o Add Unicode word break algorithm.  Based on Scheme code by William
  Clinger.  Used with permission.
o Update STRING-CAPITALIZE to take another keyword arg to indicate if
  we should use the Unicode word break algorithm.  Default is not to
  use the Unicode algorithm.

compiler/fndb.lisp:
o Update defknown for string-capitalize.

i18n/tests/WordBreakTest.txt:
o New test file for the word break algorithm

i18n/tests/word-break-test.lisp:
o New file to run the word break test.

Revision 1.18
Mon Aug 17 14:02:17 2009 UTC (4 years, 8 months ago) by rtoy
Branch: MAIN
CVS Tags: RELEASE_20a, release-20a-base, release-20a-pre1
Branch point for: RELEASE-20A-BRANCH
Changes since 1.17: +10 -10 lines
Use more descriptive argument names for SURROGATEP and
SURROGATES-TO-CODEPOINT.

Revision 1.17
Mon Aug 10 21:22:09 2009 UTC (4 years, 8 months ago) by rtoy
Branch: MAIN
CVS Tags: snapshot-2009-08
Changes since 1.16: +2 -2 lines
Oops.  utf16-string-p was returning NIL if the codepoint was
assigned.  It should return NIL if the codepoint is NOT assigned.

Revision 1.16
Mon Jul 13 14:01:48 2009 UTC (4 years, 9 months ago) by rtoy
Branch: MAIN
Changes since 1.15: +4 -1 lines
Clean up a few compiler warnings about unused variables.

Revision 1.15
Tue Jun 16 21:25:02 2009 UTC (4 years, 10 months ago) by rtoy
Branch: MAIN
CVS Tags: portable-clx-base, portable-clx-import-2009-06-16, snapshot-2009-07
Branch point for: portable-clx-branch
Changes since 1.14: +11 -1 lines
Cleanups for non-unicode build.

code/stream.lisp:
o Only define (setf stream-external-format) for Unicode builds.
o In stream-external-format, don't try to look up the external format
  from the fd-stream structure, which doesn't exist in non-unicode
  builds.

code/string.lisp:
o Conditionalize out things that will only work if unicode is
  available.

tools/worldcom.lisp:
o Only compile fd-stream-extfmt for unicode builds.

Revision 1.14
Tue Jun 16 17:23:15 2009 UTC (4 years, 10 months ago) by rtoy
Branch: MAIN
Changes since 1.13: +13 -8 lines
code/string.lisp:
o Only define STRING-TO-NFD, STRING-TO-NFKD, and STRING-TO-NFKC for
  Unicode builds.  Conditionalize out their support functions too.
o Update export list to be conditional on Unicode too.
o Use new name for get-pairwise-composition.

code/exports.lisp:
o Update export list to be conditional on Unicode for above changes
  in string.lisp.

code/unidata.lisp:
o Change name from GET-PAIRWISE-COMPOSITION to
  UNICODE-PAIRWISE-COMPOSITION to match other Unicode function names.

Revision 1.13
Thu Jun 11 16:03:59 2009 UTC (4 years, 10 months ago) by rtoy
Branch: MAIN
CVS Tags: merged-unicode-utf16-extfmt-2009-06-11
Changes since 1.12: +828 -35 lines
Merge Unicode work to trunk.  From label
unicode-utf16-extfmt-2009-06-11.

Revision 1.12.30.34
Thu Jun 11 14:32:24 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
CVS Tags: unicode-utf16-extfmt-2009-06-11
Changes since 1.12.30.33: +6 -7 lines
Fix typos in string-case-fold.

Revision 1.12.30.33
Thu Jun 11 13:30:01 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.32: +66 -124 lines
Revert previous change that added case folding to string-equal and
friends.  We can't really do that for a couple of reasons:

- Case folding should be done on the NFD form according to the Unicode
  spec
- Full case folding may change the length of the string so it's not
  clear what the return value from string-lessp and friends should be.

Instead, we provide a new function, STRING-CASE-FOLD, to perform case
folding.

code/char.lisp:
o Use lowercase for case insensitive comparisons again.

code/string.lisp:
o Remove :casing option for string-lessp and friends.
o Remove code needed to support :casing option.
o Add STRING-CASE-FOLD to perform case folding operation.

compiler/fndb.lisp:
o Remove :casing option from defknowns.
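
The length problem mentioned above is easy to see with U+00DF, which
CaseFolding.txt folds to "ss" under full case folding.  A toy sketch
with a one-entry table and a hypothetical TOY-FULL-CASE-FOLD function
(this is not STRING-CASE-FOLD, which works from the Unicode data):

```lisp
;; Hypothetical one-entry table, for illustration only.  The real
;; mappings come from CaseFolding.txt.
(defparameter *toy-full-folds*
  (list (cons (code-char #xDF) "ss")))

(defun toy-full-case-fold (string)
  "Return a folded copy of STRING.  The result may be longer than STRING."
  (with-output-to-string (out)
    (loop for ch across string
          for fold = (cdr (assoc ch *toy-full-folds*))
          do (if fold
                 (write-string fold out)              ; one char becomes two
                 (write-char (char-downcase ch) out)))))

;; A 6-character input folds to a 7-character result, so an index into
;; the folded string no longer points into the original string.
(let ((input (format nil "Stra~Ae" (code-char #xDF))))
  (list (length input) (length (toy-full-case-fold input))))
```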

Revision 1.12.30.32
Tue Jun 9 18:16:17 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.31: +73 -13 lines
o Only recognize :simple and :full for the casing parameter.
o Update docstrings to mention the casing parameter.

Revision 1.12.30.31
Tue Jun 9 14:53:13 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.30: +83 -37 lines
code/char.lisp:
o Use simple case folding for case-insensitive character comparisons.

code/string.lisp:
o Add new :CASING parameter for STRING-EQUAL and friends to allow for
  simple or full case folding.  Default is :SIMPLE.
o Update code to allow for simple or full case folding.

compiler/fndb.lisp:
o Tell compiler about new :CASING parameter.

Revision 1.12.30.30
Sat Jun 6 20:53:46 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.29: +119 -3 lines
o Update STRING-TRIM, STRING-LEFT-TRIM, and STRING-RIGHT-TRIM to
  handle surrogates in the string.  If the character bag is a string,
  we properly handle surrogates in the character bag.
o Document that STRING-TO-NFC and STRING-TO-NFKC can both return the
  original string untouched.

Revision 1.12.30.29
Fri Jun 5 19:17:01 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.28: +140 -13 lines
First cut at full-casing support.

code/string.lisp:
o Add :CASING parameter to STRING-UPCASE, STRING-DOWNCASE, and
  STRING-CAPITALIZE to select whether :SIMPLE or :FULL casing is done.
  Default is :SIMPLE.
o Implement full casing for upcase, downcase, and capitalize.

compiler/fndb.lisp:
o Tell compiler about the extra parameter.

Revision 1.12.30.28
Thu Jun 4 15:47:40 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.27: +8 -5 lines
code/unidata.lisp:
o Add UNICODE-ASSIGNED-CODEPOINT-P

code/string.lisp:
o Make UTF16-STRING-P check for unassigned codepoints in the string.

Revision 1.12.30.27
Thu May 28 16:17:48 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
CVS Tags: unicode-snapshot-2009-06
Changes since 1.12.30.26: +7 -8 lines
Slightly modify DECOMPOSE so it can operate on non simple strings.

Revision 1.12.30.26
Wed May 27 20:34:19 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.25: +9 -9 lines
code/char.lisp:
o Define CODEPOINT-LIMIT
o Define CODEPOINT type

code/extfmts.lisp
code/string.lisp
code/unidata.lisp
pcl/simple-streams/external-formats/utf-32.lisp
pcl/simple-streams/external-formats/utf-8.lisp
o Use the CODEPOINT type in declarations.

Revision 1.12.30.25
Wed May 27 17:39:51 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.24: +35 -1 lines
code/seq.lisp:
o Moved STRING-REVERSE* and STRING-NREVERSE* to string.lisp because we
  need to use WITH-STRING.

code/string.lisp:
o Fix STRING-REVERSE* and STRING-NREVERSE* which were not properly
  handling non-simple strings.  The following tests were not returning
  "edcba":

(let* ((x (make-array 10
		      :initial-contents "abcdefghij"
		      :fill-pointer 5
		      :element-type 'base-char))
       (y (reverse x)))
  y)

(let* ((x (make-array 10
		      :initial-contents "abcdefghij"
		      :fill-pointer 5
		      :element-type 'character))
       (y (nreverse x)))
  y)

Revision 1.12.30.24
Wed May 27 11:31:38 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.23: +35 -38 lines
o Revert previous change to STRING-TO-NFC and STRING-TO-NFKC.
o Use WITH-STRING in NORMALIZED-FORM-P so we operate on the underlying
  simple-string data.

Revision 1.12.30.23
Wed May 27 01:06:19 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.22: +15 -13 lines
NORMALIZED-FORM-P needs simple-strings.  We should do this in a
different way, but this will do for now.

Revision 1.12.30.22
Tue May 26 16:25:02 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.21: +19 -73 lines
code/string.lisp:
o Add function (setf codepoint)
o Add docstrings for STRING-TO-NFC and STRING-TO-NFKC.
o Move things related to pairwise composition to unidata.lisp.

code/unidata.lisp:
o Things related to pairwise composition moved here.
o Adjust *COMPOSITION-EXCLUSION* to include only the non-commented
  items in CompositionExclusions.txt.
o Make BUILD-COMPOSITION-TABLE exclude characters that can be
  derived from the decomposition.  (Basically, ignore the four
  decompositions of length greater than 1 that start with a non-zero
  combining class.)

Revision 1.12.30.21
Tue May 26 02:15:55 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.20: +169 -6 lines
Add support for Unicode NFC and NFKC forms.  Implement STRING-TO-NFC
and STRING-TO-NFKC.

This probably needs some more work.  The composition table should
probably be a trie and should be in unidata.bin instead of the hash
table that we use now.  The composition exclusion list should
probably be in unidata.bin too instead of here.

These functions pass all of the normalization tests.

Revision 1.12.30.20
Fri May 22 11:31:55 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.19: +2 -2 lines
Fix bug in DECOMPOSE which was no longer sorting the combining
characters in combining-category order.  We now pass the NFD and NFKD
normalization tests again.

(Fix from Paul)

Revision 1.12.30.19
Wed May 20 21:47:36 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.18: +28 -7 lines
string.lisp:
o Add SURROGATEP function to test if something is a surrogate value.

extfmts.lisp:
utf-16-be.lisp:
utf-16-le.lisp:
utf-16.lisp:
utf-32-be.lisp:
utf-32-le.lisp:
utf-32.lisp:
utf-8.lisp:
o Use SURROGATEP.

Revision 1.12.30.18
Wed May 20 16:30:08 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.17: +13 -6 lines
Do case-insensitive comparison by converting to lower case instead of
upper case.  This is what Unicode CaseFolding.txt does.  One example
of where it matters is U+1E9E is mapped to a lower case U+DF.  But the
upper case version of U+DF is U+DF.

char.lisp:
o Change EQUAL-CHAR-CODE to convert to lowercase.

string.lisp:
o Change EQUAL-CHAR-CODEPOINT to convert to lowercase.
o Fix mistake in STRING-LESS-GREATER-EQUAL which was incorrectly
  comparing the codepoints instead of the equal-char-codepoint values.

Revision 1.12.30.17
Tue May 19 20:36:28 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.16: +6 -5 lines
Fix grammar.

Revision 1.12.30.16
Tue May 19 20:24:19 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.15: +19 -1 lines
Add UTF16-STRING-P to determine if a string is a valid UTF-16 encoded
string.

Revision 1.12.30.15
Mon May 18 13:38:11 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.14: +53 -5 lines
STRING-LESS-GREATER-EQUAL handles codepoints so STRING-LESSP and
friends now sort in codepoint order (after converting to uppercase).

Revision 1.12.30.14
Tue May 12 16:31:49 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.13: +6 -6 lines
o Lots of spelling fixes from Paul.
o Add unicode codepoints in final-sigma.lisp (in case the characters
  there don't show up correctly).
o Support partial-fill in READ-INTO-STRING.

Revision 1.12.30.13
Wed May 6 13:05:15 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.12: +5 -1 lines
Simple docstrings for STRING-TO-NFD and STRING-TO-NFKD.

Revision 1.12.30.12
Mon May 4 14:13:32 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.11: +9 -3 lines
From Paul: Package and symbol names in Unicode need to be in a
canonical normalization form (eventually...when NFC is implemented)

Revision 1.12.30.11
Sun May 3 13:51:59 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.10: +12 -23 lines
From Paul.  Use CODEPOINT in %GLYPH-B.

Revision 1.12.30.10
Sun May 3 12:37:02 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.9: +48 -70 lines
Updates from Paul.

o Use CODEPOINT instead of XCHAR in %GLYPH-F
o Simplify DECOMPOSE

Revision 1.12.30.9
Sat May 2 11:54:37 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.8: +73 -5 lines
Updates from Paul.  With these changes, we pass the Unicode
normalization test suite successfully for NFD and NFKD.

unidata.lisp:
o Implement algorithmic decomposition of Hangul.

string.lisp:
o Implement Unicode normalization forms NFD and NFKD.

Revision 1.12.30.8
Thu Apr 23 15:10:08 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
CVS Tags: unicode-snapshot-2009-05
Changes since 1.12.30.7: +23 -23 lines
string.lisp:
o Add Paul's SURROGATES-TO-CODEPOINT and remove
  CODEPOINT-FROM-SURROGATES.
o Change SURROGATES to return characters, not numbers.
o Update callers of SURROGATES to match.

extfmts.lisp:
o Update callers of SURROGATES to match.
o Use CODEPOINT to extract the correct codepoint from a string in
  EF-STRING-TO-OCTETS and EF-OCTETS-TO-STRING.

Revision 1.12.30.7
Wed Apr 22 17:05:51 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.6: +21 -12 lines
o Add new function CODEPOINT-FROM-SURROGATES to compute the codepoint
  from two surrogate values.  (Should we use a better name?)
o Use the new function in CODEPOINT.
o Add docstrings to the functions.
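
For reference, the standard UTF-16 arithmetic for turning a surrogate
pair into a codepoint, and back, can be sketched as below.  The
function names are hypothetical and the code works on plain integers;
it is not the CODEPOINT-FROM-SURROGATES implementation itself.

```lisp
;; Hypothetical helpers, for illustration only.

(defun surrogate-codes-to-codepoint (hi lo)
  "Combine leading surrogate code HI and trailing surrogate code LO."
  (+ #x10000
     (ash (- hi #xD800) 10)   ; upper 10 bits of the offset
     (- lo #xDC00)))          ; lower 10 bits of the offset

(defun codepoint-to-surrogate-codes (codepoint)
  "Split a codepoint above #xFFFF into leading and trailing surrogate codes."
  (let ((offset (- codepoint #x10000)))
    (values (+ #xD800 (ash offset -10))
            (+ #xDC00 (logand offset #x3FF)))))

;; U+1F600 is stored as the pair #xD83D #xDE00.
(surrogate-codes-to-codepoint #xD83D #xDE00)
```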

Revision 1.12.30.6
Tue Apr 21 17:47:31 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.5: +89 -65 lines
code/string.lisp:
o From Paul:
  - Handle the ASCII special casing in string.lisp instead of
    unidata.lisp
  - Add utility functions CODEPOINT and SURROGATES.

code/unidata.lisp:
o Remove the ASCII special cases from UNICODE-LOWER, UNICODE-UPPER,
  UNICODE-TITLE.

Revision 1.12.30.5
Mon Apr 20 19:46:48 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.4: +7 -7 lines
NSTRING-UPCASE and NSTRING-DOWNCASE were referencing the unknown
symbols NEWSTRING and NEW-INDEX.  Replace with STRING and INDEX,
respectively.  I think that's what was intended.

Revision 1.12.30.4
Mon Apr 20 14:26:48 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.3: +155 -33 lines
From Paul:

    Here's a version of [n]string-(up|down)case that handles non-BMP
    characters.  Also added functionless stubs for normalization
    forms.  Improved string-reverse* and implemented string-nreverse*
    in a way that shouldn't cons (not the original way I worked out,
    which might be faster but is quite complicated).

    (The glyph builder now stops when it hits a combining character
    that's out of sequence (canonical order)---I'm not sure whether or
    not that's the Right Thing to do)

Revision 1.12.30.3
Sat Apr 18 12:27:05 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.2: +42 -2 lines
More updates from Paul.

code/seq.lisp:
o Update SEQ-DISPATCH to allow a special dispatch form for strings.
o Implement STRING-REVERSE* that correctly handles our UTF-16 strings.
o Implement STRING-NREVERSE*, but this needs work to reduce consing.

code/string.lisp:
o Add GLYPH and SGLYPH to return the glyph from a position in a
  string.

code/exports.lisp:
o Export GLYPH and SGLYPH

Revision 1.12.30.2
Wed Apr 15 14:41:55 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.12.30.1: +3 -3 lines
Updates from Paul:

	add numeric values and decompositions to unidata, added
	char-titlecase, and made string-capitalize use title-case
	rather than upper-case, when those are different.

The unidata.bin file needs to be rebuilt, and a cross-compile needs to
be done to support the new unidata.bin format.

Revision 1.12.30.1
Wed Mar 25 21:51:34 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
CVS Tags: unicode-utf16-extfmt-2009-03-27
Changes since 1.12: +27 -3 lines
Merge from unicode-utf16 branch, label
unicode-utf16-char-support-2009-03-25 to get character support.

Revision 1.12.28.1
Tue Mar 24 11:44:20 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-branch
CVS Tags: unicode-utf16-char-support-2009-03-25, unicode-utf16-char-support-2009-03-26
Changes since 1.12: +27 -3 lines
Compare strings using code-point order, since we don't have any kind
of collation support.

Revision 1.12
Fri Apr 11 15:41:59 2003 UTC (11 years ago) by emarsden
Branch: MAIN
CVS Tags: RELEASE_19f, amd64-merge-start, double-double-array-base, double-double-array-checkpoint, double-double-base, double-double-init-%make-sparc, double-double-init-checkpoint-1, double-double-init-ppc, double-double-init-sparc, double-double-init-sparc-2, double-double-init-x86, double-double-irrat-end, double-double-irrat-start, double-double-reader-base, double-double-reader-checkpoint-1, double-double-sparc-checkpoint-1, dynamic-extent-base, label-2009-03-16, label-2009-03-25, lisp-executable-base, merge-sse2-packed, merge-with-19f, mod-arith-base, ppc_gencgc_snap_2005-05-14, ppc_gencgc_snap_2005-12-17, ppc_gencgc_snap_2006-01-06, pre-telent-clx, prm-before-macosx-merge-tag, release-19a, release-19a-base, release-19a-pre1, release-19a-pre2, release-19a-pre3, release-19b-base, release-19b-pre1, release-19b-pre2, release-19c, release-19c-base, release-19c-pre1, release-19d, release-19d-base, release-19d-pre1, release-19d-pre2, release-19e, release-19e-base, release-19e-pre1, release-19e-pre2, release-19f-base, release-19f-pre1, remove_negative_zero_not_zero, snapshot-2003-10, snapshot-2003-11, snapshot-2003-12, snapshot-2004-04, snapshot-2004-05, snapshot-2004-06, snapshot-2004-07, snapshot-2004-08, snapshot-2004-09, snapshot-2004-10, snapshot-2004-11, snapshot-2004-12, snapshot-2005-01, snapshot-2005-02, snapshot-2005-03, snapshot-2005-04, snapshot-2005-05, snapshot-2005-06, snapshot-2005-07, snapshot-2005-08, snapshot-2005-09, snapshot-2005-10, snapshot-2005-11, snapshot-2005-12, snapshot-2006-01, snapshot-2006-02, snapshot-2006-03, snapshot-2006-04, snapshot-2006-05, snapshot-2006-06, snapshot-2006-07, snapshot-2006-08, snapshot-2006-09, snapshot-2006-10, snapshot-2006-11, snapshot-2006-12, snapshot-2007-01, snapshot-2007-02, snapshot-2007-03, snapshot-2007-04, snapshot-2007-05, snapshot-2007-06, snapshot-2007-07, snapshot-2007-08, snapshot-2007-09, snapshot-2007-10, snapshot-2007-11, snapshot-2007-12, snapshot-2008-01, snapshot-2008-02, 
snapshot-2008-03, snapshot-2008-04, snapshot-2008-05, snapshot-2008-06, snapshot-2008-07, snapshot-2008-08, snapshot-2008-09, snapshot-2008-10, snapshot-2008-11, snapshot-2008-12, snapshot-2009-01, snapshot-2009-02, snapshot-2009-04, snapshot-2009-05, sparc_gencgc, sparc_gencgc_merge, sse2-base, sse2-checkpoint-2008-10-01, sse2-merge-with-2008-10, sse2-merge-with-2008-11, sse2-packed-2008-11-12, sse2-packed-base, unicode-utf16-base, unicode-utf16-extfmts-pre-sync-2008-11, unicode-utf16-extfmts-sync-2008-12, unicode-utf16-string-support, unicode-utf16-sync-2008-07, unicode-utf16-sync-2008-09, unicode-utf16-sync-2008-11, unicode-utf16-sync-2008-12, unicode-utf16-sync-label-2009-03-16
Branch point for: RELEASE-19F-BRANCH, double-double-array-branch, double-double-branch, double-double-reader-branch, dynamic-extent, lisp-executable, mod-arith-branch, ppc_gencgc_branch, release-19a-branch, release-19b-branch, release-19c-branch, release-19d-branch, release-19e-branch, sparc_gencgc_branch, sse2-branch, sse2-packed-branch, unicode-utf16-branch, unicode-utf16-extfmt-branch
Changes since 1.11: +3 -3 lines
Diff to previous 1.11 , to selected 1.6
Instead of ignoring the :element-type argument to MAKE-STRING, we check
that it's a valid subtype of character (then ignore it).

Revision 1.11.2.1 - (view) (annotate) - [select for diffs]
Fri Oct 4 23:13:36 2002 UTC (11 years, 6 months ago) by pmai
Branch: UNICODE-BRANCH
Changes since 1.11: +27 -7 lines
Diff to previous 1.11 , to next main 1.29 , to selected 1.6
Checked in Brian Spilsbury's experimental Unicode, locales, and dialect
support patchset.  This lives on its own branch, so that people can
play with it and tweak it, without disturbing 18e release engineering
on the main branch.  Bootstrapping has only been tried on LINKAGE_TABLE
x86/Linux builds.  A working cross-compile script is checked in under
bootfiles/19a/boot1-cross-unicode.lisp.  The script still leaves you
with some interactive errors, on the cross compile, which you should
answer with 2.  See the mailing list for more information.

Revision 1.8.2.2 - (view) (annotate) - [select for diffs]
Sat Mar 23 18:50:12 2002 UTC (12 years ago) by pw
Branch: RELENG_18
CVS Tags: RELEASE_18d
Changes since 1.8.2.1: +9 -6 lines
Diff to previous 1.8.2.1 , to branch point 1.8 , to next main 1.29 , to selected 1.6
Mega commit to bring RELENG_18 branch in sync with HEAD in preparation
for release tagging 18d.

Revision 1.11 - (view) (annotate) - [select for diffs]
Sun Jun 17 19:12:34 2001 UTC (12 years, 10 months ago) by pw
Branch: MAIN
CVS Tags: LINKAGE_TABLE, PRE_LINKAGE_TABLE, UNICODE-BASE, cold-pcl-base, release-18e, release-18e-base, release-18e-pre1, release-18e-pre2
Branch point for: UNICODE-BRANCH, cold-pcl, release-18e-branch
Changes since 1.10: +7 -6 lines
Diff to previous 1.10 , to selected 1.6
From Eric Marsden:

Fix some error types to be ANSI compliant.

Revision 1.10 - (view) (annotate) - [select for diffs]
Sun Mar 4 23:37:33 2001 UTC (13 years, 1 month ago) by pw
Branch: MAIN
Changes since 1.9: +3 -1 lines
Diff to previous 1.9 , to selected 1.6
A few well placed inhibit-warnings declarations to suppress noise in
compile-lisp.log. Only 46/130 notes left.

Revision 1.8.2.1 - (view) (annotate) - [select for diffs]
Tue Jun 23 11:22:32 1998 UTC (15 years, 9 months ago) by pw
Branch: RELENG_18
CVS Tags: RELEASE_18b, RELEASE_18c
Changes since 1.8: +4 -3 lines
Diff to previous 1.8 , to selected 1.6
This (huge) revision brings the RELENG_18 branch up to the current HEAD.
Note code/unix-glibc2.lisp not yet included -- not sure it is ready to go.

Revision 1.9 - (view) (annotate) - [select for diffs]
Fri Feb 13 16:09:42 1998 UTC (16 years, 2 months ago) by dtc
Branch: MAIN
Changes since 1.8: +4 -3 lines
Diff to previous 1.8 , to selected 1.6
ANSI CL compat. changes:
o Add an optional environment argument to constantp; ignored by CMUCL.
o Add the :element-type keyword to make-string.

Revision 1.8 - (view) (annotate) - [select for diffs]
Fri Jul 12 18:55:24 1996 UTC (17 years, 9 months ago) by ram
Branch: MAIN
CVS Tags: RELEASE_18a
Branch point for: RELENG_18
Changes since 1.7: +10 -7 lines
Diff to previous 1.7 , to selected 1.6
Merged DTC's patch to string<>=*-body which fixes various problems that arose
when :start2 :end2 values were specified.

Revision 1.7 - (view) (annotate) - [select for diffs]
Mon Oct 31 04:11:27 1994 UTC (19 years, 5 months ago) by ram
Branch: MAIN
Changes since 1.6: +1 -3 lines
Diff to previous 1.6
Fix header boilerplate.

Revision 1.6 - (view) (annotate) - [selected]
Fri May 15 17:50:40 1992 UTC (21 years, 11 months ago) by wlott
Branch: MAIN
Changes since 1.5: +2 -2 lines
Diff to previous 1.5
Removed an extra ``)''.

Revision 1.5 - (view) (annotate) - [select for diffs]
Tue May 28 17:25:48 1991 UTC (22 years, 10 months ago) by ram
Branch: MAIN
Changes since 1.4: +7 -14 lines
Diff to previous 1.4 , to selected 1.6
Changed STRING-xxxCASE to not assign arguments.

Revision 1.4 - (view) (annotate) - [select for diffs]
Wed Apr 24 23:37:42 1991 UTC (23 years ago) by ram
Branch: MAIN
Changes since 1.3: +102 -151 lines
Diff to previous 1.3 , to selected 1.6
Changed the WITH-xxx-STRING macros to simply use WITH-ARRAY-DATA, now that it
is more clever.  Also changed them to accept any STRINGable thing, instead of
just strings and symbols.  These macros now bind the offset var instead of
randomly setting it.

Revision 1.3 - (view) (annotate) - [select for diffs]
Fri Feb 8 13:35:59 1991 UTC (23 years, 2 months ago) by ram
Branch: MAIN
Changes since 1.2: +8 -4 lines
Diff to previous 1.2 , to selected 1.6
New file header with RCS header FILE-COMMENT.

Revision 1.2 - (view) (annotate) - [select for diffs]
Fri Aug 24 18:14:26 1990 UTC (23 years, 7 months ago) by wlott
Branch: MAIN
Changes since 1.1: +23 -25 lines
Diff to previous 1.1 , to selected 1.6
Moved MIPS branch onto trunk; no merge necessary.

Revision 1.1.1.2 - (view) (annotate) - [select for diffs] (vendor branch)
Sun Jul 29 12:06:27 1990 UTC (23 years, 8 months ago) by wlott
Changes since 1.1.1.1: +4 -7 lines
Diff to previous 1.1.1.1 , to next main 1.29 , to selected 1.6
Fixed with-mumble-string(s) macros to not say (the simple-string foo) when
foo isn't a simple-string.

Revision 1.1.1.1 - (view) (annotate) - [select for diffs] (vendor branch)
Wed Apr 11 17:15:21 1990 UTC (24 years ago) by wlott
Changes since 1.1: +26 -25 lines
Diff to previous 1.1 , to selected 1.6
Initial MIPS version.

Revision 1.1 - (view) (annotate) - [select for diffs]
Tue Feb 6 17:27:06 1990 UTC (24 years, 2 months ago) by ram
Branch: MAIN
Diff to selected 1.6
Initial revision
