Commits · 49f041ad84bf812b25d4fffc210da123400cb6f0 · cmucl / cmucl

May 25, 2013

Remove all the extensions to string-upcase and friends. The extended · 49f041ad

Raymond Toy authored May 25, 2013

functions now live in the new UNICODE package.

 src/code/exports.lisp::
 * Export some unicode functions and constants.

 src/code/string.lisp::
 * Removed the extended versions of string-upcase and friends.
 * Export surrogates function.
 * Make sure with-one-string is defined so the unicode package can use
   it.

 src/code/unicode.lisp::
 * New file with extended versions of string-upcase and friends.

 src/code/unidata.lisp::
 * Export some unicode functions and constants.

 src/compiler/fndb.lisp::
 * Update defknowns for string-upcase and friends.

 src/tools/worldbuild.lisp::
 * Build unicode.lisp

 src/tools/worldcom.lisp::
 * Load unicode.lisp

49f041ad

May 21, 2013
- Don't reverse surrogate pairs in strings; it's not allowed by the · d879838d
  Raymond Toy authored May 20, 2013
  CLHS.
  d879838d
May 19, 2013

Fix ticket:81 and fix ticket:83. · 78cce51d

Raymond Toy authored May 18, 2013

From ticket 81, the tests are now:

{{{
(time (prog1 t (time-rev *s*)))
; Evaluation took:
;   0.49 seconds of real time
;   0.481813 seconds of user run time
;   0.003624 seconds of system run time
;   1,490,776,936 CPU cycles
;   [Run times include 0.13 seconds GC run time]
;   0 page faults and
;   200,073,704 bytes consed.

(time (prog1 t (time-rev *s2*)))
; Evaluation took:
;   0.97 seconds of real time
;   0.965893 seconds of user run time
;   0.005139 seconds of system run time
;   2,980,415,911 CPU cycles
;   [Run times include 0.23 seconds GC run time]
;   0 page faults and
;   400,005,560 bytes consed.
}}}

So the new string-reverse* is 20 times faster for strings without
surrogates and 10 times faster for strings containing only surrogates.

78cce51d

Mar 06, 2013
- Reindent STRING-NEXT-WORD-BREAK neatly. · e129c45a
  Raymond Toy authored Mar 06, 2013
  
  e129c45a
- Implement Rule WB13c for regional indicators. · cae10dd1
  Raymond Toy authored Mar 06, 2013
  
  cae10dd1
Feb 04, 2012

Update docstring for {{{STRING-CAPITALIZE}}} to mention · e7900f28

Raymond Toy authored Feb 03, 2012

{{{:UNICODE-WORD-BREAK}}} keyword parameter that enables the Unicode
word-breaking algorithm to determine word boundaries.

e7900f28

Nov 04, 2011
- Rearrange directory structure. · a9961276
  Raymond Toy authored Nov 03, 2011
  
  a9961276
Sep 25, 2011

Fix ticket:49. In every file-comment, replace the existing $Header$ · 99a5797f

Raymond Toy authored Sep 24, 2011

entries with just the file path, removing the revision number, date,
author and state. The actual information is now computed during
compilation and stored in the fasl itself. (See ticket:48.)

99a5797f

Oct 26, 2010
- Add trailing newline. · c149672c
  rtoy authored Oct 26, 2010
  
  c149672c
Oct 13, 2010

Some changes to replace calls to gettext with _"" or _N"" for things · b22644d4

rtoy authored Oct 13, 2010

compiled with and without Unicode.  This is needed so that the pot
files have the same content for both unicode and non-unicode builds.
(The _"" and _N"" are handled by the reader, so things that are
conditionalized out still get processed, unlike using gettext.)

b22644d4

Sep 20, 2010

o Inhibit warnings from SURROGATEP; I'm tired of seeing the code deletion · c688af8f

rtoy authored Sep 20, 2010

  notes now.
o Tell the compiler what type the first return value of CODEPOINT is.
  Apparently, the compiler can't figure that out itself.

c688af8f

Sep 15, 2010

Add support for Unicode 5.2. The normalization and wordbreak tests pass. · d2b9eace

rtoy authored Sep 15, 2010

code/string.lisp:
o In %compose, handle the case where the composite character is
  outside the BMP and thus needs special handling for our UTF-16
  strings.
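
The special handling mentioned above comes from the fact that a code point outside the BMP occupies two UTF-16 code units. A minimal sketch of that arithmetic in portable Common Lisp follows; the function names are hypothetical and this is not the actual %compose code.

```lisp
;; Hypothetical helpers for illustration; not CMUCL's %compose internals.
(defun codepoint-to-surrogates (codepoint)
  "Return, as two values, the leading and trailing UTF-16 surrogate
code units for CODEPOINT, which must lie outside the BMP."
  (check-type codepoint (integer #x10000 #x10FFFF))
  (let ((offset (- codepoint #x10000)))
    ;; The top 10 bits go to the leading surrogate, the low 10 bits
    ;; to the trailing surrogate.
    (values (+ #xD800 (ldb (byte 10 10) offset))
            (+ #xDC00 (ldb (byte 10 0) offset)))))

(defun pair-to-codepoint (leading trailing)
  "Inverse of CODEPOINT-TO-SURROGATES."
  (+ #x10000
     (ash (- leading #xD800) 10)
     (- trailing #xDC00)))
```

For example, U+1F600 maps to the code units #xD83D and #xDE00, so a composed character above #xFFFF has to be stored as two string elements rather than one.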

code/unidata.lisp
o CJK Ideograph range has changed in 5.2.
o Fix bug in build-composition-table.  We were not correctly handling
  the case where the decomposition of a codepoint was outside the
  BMP.  Special care is needed to handle the UTF-16 strings that we
  use.
o The keys for the pairwise composition table are the full codepoints,
  so we need to shift one by 21 bits instead of 16.
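
To see why the shift matters: code points run up to #x10FFFF, which needs 21 bits, so packing two of them with a 16-bit shift can make distinct pairs collide. The sketch below assumes the key is simply the two code points packed into one integer; the real table layout in unidata.lisp may differ, and the function names are made up for illustration.

```lisp
;; Illustrative only: assumes a key formed by packing two code points.
(defun composition-key (cp1 cp2 &optional (shift 21))
  "Pack code points CP1 and CP2 into a single integer key."
  (logior (ash cp1 shift) cp2))

(defun split-composition-key (key)
  "Recover both code points from a key built with the 21-bit shift."
  (values (ash key -21) (ldb (byte 21 0) key)))

;; With a 16-bit shift these two different pairs share one key,
;; because the second code point of the latter overflows 16 bits:
;;   (composition-key #x0001 #x0041 16)
;;   (composition-key #x0000 #x10041 16)
;; With the default 21-bit shift the keys are distinct.
```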

tools/build-unidata.lisp
o Update minor version to 2.

i18n/BidiMirroring.txt
i18n/CaseFolding.txt
i18n/CompositionExclusions.txt
i18n/DerivedNormalizationProps.txt
i18n/NameAliases.txt
i18n/NormalizationCorrections.txt
i18n/SpecialCasing.txt
i18n/UnicodeData.txt
i18n/WordBreakProperty.txt
i18n/tests/NormalizationTest.txt
i18n/tests/WordBreakTest.txt
o Updated from Unicode 5.2.

i18n/unidata.bin
o Regenerated from new Unicode 5.2 files.

d2b9eace

Sep 13, 2010
- Add function to convert a sequence of codepoints to a string and a · 022d6fd1
  rtoy authored Sep 13, 2010
  function to convert a string to a list of codepoints.
  022d6fd1
Apr 20, 2010
- Change uses of _"foo" to (intl:gettext "foo"). This is because slime · a6577064
  rtoy authored Apr 20, 2010
  may get confused with source locations if the reader macros are
  installed.
  a6577064
Apr 19, 2010
- Remove _N"" reader macro from docstrings when possible. · d671ee6c
  rtoy authored Apr 19, 2010
  
  d671ee6c
Mar 19, 2010

Merge intl-branch 2010-03-18 to HEAD. To build, you need to use · d8544caa

rtoy authored Mar 19, 2010

boot-2010-02-1 as the bootstrap file.  You should probably also use
the new -P option for build.sh to generate and update the po files
while building.

d8544caa

Oct 18, 2009

Merge changes from unicode-string-buffer-impl-branch which gives · 392d3e59

rtoy authored Oct 18, 2009

faster reads on external-formats.  This is done by adding an
additional buffer to streams so we can convert the entire in-buffer
into characters all at once.

To build this change, you need to do a cross-compile using
boot-2009-10-1-cross.lisp.  Using that build, do a normal build with
these sources.

For a non-unicode build use boot-2009-10-01.lisp with a 20a
non-unicode build.

code/extfmts.lisp:
o Add another slot to the extfmts for copying the state.
o Modify EF-OCTETS-TO-STRING and OCTETS-TO-STRING to support the
  necessary changes for fast formats.  This is incompatible with the
  previous version because the string is not grown if needed.

code/fd-stream-extfmt.lisp:
o Set *enable-stream-buffer-p* to T so we have fast streams.

code/fd-stream.lisp:
o Add new slots to support fast streams.
o In SET-ROUTINES, initialize the new slots appropriately.
o Update UNREAD-CHAR to be able to back up in the string buffer to
  unread.
o Add implementation to copy the state of an external format.

code/stream.lisp:
o Change %SET-FD-STREAM-EXTERNAL-FORMAT to be able to change formats
  even if we've already converted the buffer with a different format.
  We reconvert the buffer with the old format until we reach the
  current character.  Then the remaining octets are converted using
  the new format and stored in the string buffer.
o Add FAST-READ-CHAR-STRING-REFILL to refill the string buffer, like
  FAST-READ-CHAR-REFILL does for the octet in-buffer.

code/struct.lisp:
o Add new slots to hold the string buffer, the current index, and
  length.  These are needed for the fast formats.

code/sysmacs.lisp:
o Update PREPARE-FOR-FAST-READ-CHAR, DONE-WITH-FAST-READ-CHAR, and
  FAST-READ-CHAR to support the string buffer.

code/string.lisp:
o Microoptimization of SURROGATEP to reduce the number of branches.
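
One common way to cut branches in a surrogate test is to replace the two range comparisons with a single shift and compare. The sketch below shows that general trick; it is an assumption about the kind of change involved, not the code from this commit, and the function names are hypothetical.

```lisp
;; A generic branch-reduction trick; not necessarily what SURROGATEP does.
(defun surrogate-code-p (code)
  "True if the non-negative integer CODE lies in #xD800..#xDFFF."
  ;; All surrogates share the same top bits: #xD800 >> 11 = #x1B.
  (= (ash code -11) #x1B))

(defun leading-surrogate-code-p (code)
  "True if CODE lies in #xD800..#xDBFF."
  (= (ash code -10) #x36))

(defun trailing-surrogate-code-p (code)
  "True if CODE lies in #xDC00..#xDFFF."
  (= (ash code -10) #x37))
```

Each predicate does one shift and one comparison instead of a lower-bound and an upper-bound check.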

general-info/release-20b.txt:
o Update with these changes

pcl/simple-streams/external-formats/utf-16-be.lisp:
pcl/simple-streams/external-formats/utf-16-le.lisp:
pcl/simple-streams/external-formats/utf-16.lisp:
o These formats actually have state, so update them to take and handle
  an initial state.  These are needed if the string buffer ends with a
  leading surrogate and the next string buffer starts with a trailing
  surrogate.  The conversion needs to combine the surrogates together.
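
The buffer-boundary case described above can be sketched with a small stateful decoder. This is an assumed, simplified model working on vectors of 16-bit code units, not the external-format code itself; `make-utf16-decoder` is a hypothetical name, and substituting U+FFFD for unpaired surrogates is a choice made for this example.

```lisp
;; Simplified model of a decoder that carries state between buffers.
(defun make-utf16-decoder ()
  "Return a closure that decodes successive chunks of UTF-16 code
units into lists of code points.  A leading surrogate left dangling
at the end of one chunk is remembered and combined with the first
unit of the next chunk."
  (let ((pending nil))                  ; carried-over leading surrogate
    (lambda (chunk)
      (let ((result '()))
        (loop for unit across chunk do
          (cond ((and pending (<= #xDC00 unit #xDFFF))
                 ;; Complete a pair begun in this or a previous chunk.
                 (push (+ #x10000
                          (ash (- pending #xD800) 10)
                          (- unit #xDC00))
                       result)
                 (setf pending nil))
                (t
                 ;; A leading surrogate with no partner: substitute U+FFFD.
                 (when pending
                   (push #xFFFD result)
                   (setf pending nil))
                 (cond ((<= #xD800 unit #xDBFF)
                        (setf pending unit))    ; wait for the partner
                       ((<= #xDC00 unit #xDFFF)
                        (push #xFFFD result))   ; lone trailing surrogate
                       (t
                        (push unit result))))))
        (nreverse result)))))
```

Calling the closure first on `#(#x41 #xD83D)` and then on `#(#xDE00 #x42)` yields `(#x41)` followed by `(#x1F600 #x42)`: the pair split across the two chunks is still combined. The sketch does not flush a surrogate that is still pending when the input ends.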

392d3e59

Sep 15, 2009

Oops. Remove old code that didn't support our UTF-16 strings. · 5a8aa73a
rtoy authored Sep 15, 2009

5a8aa73a

Add support for the Unicode word break algorithm for · fc0eb65b

rtoy authored Sep 15, 2009

STRING-CAPITALIZE.  Not sure about the appropriate interface, though.

code/string.lisp:
o Add Unicode word break algorithm.  Based on Scheme code by William
  Clinger.  Used with permission.
o Update STRING-CAPITALIZE to take another keyword arg to indicate if
  we should use the Unicode word break algorithm.  Default is not to
  use the Unicode algorithm.

compiler/fndb.lisp:
o Update defknown for string-capitalize.

i18n/tests/WordBreakTest.txt:
o New test file for the word break algorithm

i18n/tests/word-break-test.lisp:
o New file to run the word break test.

fc0eb65b

Aug 17, 2009
- Use more descriptive argument names for SURROGATEP and · eafa848d
  rtoy authored Aug 17, 2009
  SURROGATES-TO-CODEPOINT.
  eafa848d
Aug 10, 2009
- Oops. utf16-string-p was returning NIL if the codepoint was · f3c86096
  rtoy authored Aug 10, 2009
  assigned.  It should return NIL if the codepoint is NOT assigned.
  f3c86096
Jul 13, 2009
- Clean up a few compiler warnings about unused variables. · 7f4c6202
  rtoy authored Jul 13, 2009
  
  7f4c6202
Jun 16, 2009

Cleanups for non-unicode build. · 8f28c28f

rtoy authored Jun 16, 2009

code/stream.lisp:
o Only define (setf stream-external-format) for Unicode builds.
o In stream-external-format, don't try to look up the external format
  from the fd-stream structure, which doesn't exist in non-unicode
  builds.

code/string.lisp:
o Conditionalize out things that will only work if unicode is
  available.

tools/worldcom.lisp:
o Only compile fd-stream-extfmt for unicode builds.

8f28c28f

code/string.lisp: · a826481f

rtoy authored Jun 16, 2009

o Only define STRING-TO-NFD, STRING-TO-NFKD, and STRING-TO-NFKC for
  Unicode builds.  Conditionalize out their support functions too.
o Update export list to be conditional on Unicode too.
o Use new name for get-pairwise-composition.

code/exports.lisp:
o Update export list to be conditional on Unicode for above changes
  in string.lisp.

code/unidata.lisp:
o Change name from GET-PAIRWISE-COMPOSITION to
  UNICODE-PAIRWISE-COMPOSITION to match other Unicode function names.

a826481f

Jun 11, 2009
- Merge Unicode work to trunk. From label · 68ac9a3e
  rtoy authored Jun 11, 2009
  unicode-utf16-extfmt-2009-06-11.
  68ac9a3e
Apr 11, 2003
- Instead of ignoring the :element-type argument to MAKE-STRING, we check · 8b84985a
  emarsden authored Apr 11, 2003
  that it's a valid subtype of character (then ignore it).
  8b84985a
Jun 17, 2001
- From eric Marsden: · c840823b
  pw authored Jun 17, 2001
  Fix some error types to be ANSI compliant.
  c840823b
Mar 04, 2001
- A few well placed inhibit-warnings declarations to suppress noise in · b641a186
  pw authored Mar 04, 2001
  compile-lisp.log. Only 46/130 notes left.
  b641a186
Feb 13, 1998

ANSI CL compat. changes: · 2e5e2342

dtc authored Feb 13, 1998

o Add an optional environment argument to constantp; ignored by CMUCL.
o Add the :element-type keyword to make-string.

2e5e2342

Jul 12, 1996
- Merged DTC's patch to string<>=*-body which fixes various problems that arose · 6dd10a2f
  ram authored Jul 12, 1996
  when :start2 :end2 values were specified.
  6dd10a2f
Oct 31, 1994
- Fix headed boilerplate. · afd6fcdc
  ram authored Oct 31, 1994
  
  afd6fcdc
Feb 11, 1994
- This commit was manufactured by cvs2svn to create branch 'solaris_patch'. · cc9e635e
  cvs2git authored Feb 11, 1994
  
  cc9e635e
Jan 13, 1993
- This commit was manufactured by cvs2svn to create branch 'new_struct'. · edac2d80
  cvs2git authored Jan 13, 1993
  
  edac2d80
May 15, 1992
- Removed an extra ``)''. · ec96116d
  wlott authored May 15, 1992
  
  ec96116d
May 28, 1991
- Changed STRING-xxxCASE to not assign arguments. · 90593ea0
  ram authored May 28, 1991
  
  90593ea0
Apr 24, 1991

Changed the WITH-xxx-STRINGs macros to use simply WITH-ARRAY-DATA, now that it · e4ae1805

ram authored Apr 24, 1991

is more clever.  Also, changed it to accept any STRINGable thing, instead of
just strings and symbols.  These macros now bind the offset var instead of
randomly setting it.

e4ae1805

Feb 08, 1991
- New file header with RCS header FILE-COMMENT. · 47bd4c98
  ram authored Feb 08, 1991
  
  47bd4c98
Jul 29, 1990
- Fixed with-mumble-string(s) macros to not say (the simple-string foo) when · 8537f9ea
  wlott authored Jul 29, 1990
  foo isn't a simple-string.
  8537f9ea
May 30, 1990
- This commit was manufactured by cvs2svn to create branch 'new_constraint'. · 6a9d4112
  cvs2git authored May 30, 1990
  
  6a9d4112
Apr 11, 1990
- Initial MIPS version. · 0dc529da
  wlott authored Apr 11, 1990
  
  0dc529da