Commits · 49f041ad84bf812b25d4fffc210da123400cb6f0 · cmucl / cmucl

May 25, 2013

Remove all the extensions to string-upcase and friends. The extended · 49f041ad

Raymond Toy authored May 25, 2013

functions now live in the new UNICODE package.

 src/code/exports.lisp::
 * Export some unicode functions and constants.

 src/code/string.lisp::
 * Removed the extended versions of string-upcase and friends.
 * Export surrogates function.
 * Make sure with-one-string is defined so the unicode package can use
   it.

 src/code/unicode.lisp::
 * New file with extended versions of string-upcase and friends.

 src/code/unidata.lisp::
 * Export some unicode functions and constants.

 src/compiler/fndb.lisp::
 * Update defknowns for string-upcase and friends.

 src/tools/worldbuild.lisp::
 * Build unicode.lisp

 src/tools/worldcom.lisp::
 * Load unicode.lisp

49f041ad

Mar 05, 2013
- Update to Unicode 6.2. · 424edfe8
  Raymond Toy authored Mar 04, 2013
```
Still needs work because the word-break tests fail.
```
  424edfe8
Jan 17, 2013

Fix ticket:69 · ce037e96

Raymond Toy authored Jan 16, 2013

Change *unidata-path* to be a pathname object instead of a namestring.

ce037e96

Nov 18, 2012
- Revert changes to unicode-complete; can't complete #\Hangul_syllable_ · ae862666
  Raymond Toy authored Nov 18, 2012
```
but the old version could.  This unfixes Trac #52.
```
  ae862666
Mar 03, 2012
- Return NIL if the prefix isn't a prefix of any name instead of · e32ec7c0
  Raymond Toy authored Mar 03, 2012
```
signaling an error.
```
  e32ec7c0
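
The behavior change can be pictured with a small Python sketch. It is for illustration only and is not the project's Lisp code; `complete_prefix` and the sample names are hypothetical. A prefix that matches nothing yields `None` (standing in for NIL) rather than raising an error:

```python
def complete_prefix(prefix, names):
    """Return the names starting with prefix, or None if there are none.

    Hypothetical helper: None stands in for Lisp's NIL.
    """
    matches = [name for name in names if name.startswith(prefix)]
    # An empty match list is reported as None instead of raising.
    return matches or None


# Made-up sample data, not taken from the real name dictionary.
SAMPLE_NAMES = [
    "LATIN_SMALL_LETTER_A",
    "LATIN_SMALL_LETTER_B",
    "GREEK_SMALL_LETTER_ALPHA",
]
```

A caller such as an editor's completion command can then treat `None` as "no completions" without needing an error handler.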
Feb 05, 2012

Update to Unicode 6.1.0. · 537cc9bb

Raymond Toy authored Feb 04, 2012

  src/code/unidata.lisp:: Update Unicode version.

  src/tools/build-unidata.lisp:: Update Unicode version and update for
  change of format of NameAliases.txt.

  src/i18n/unidata.bin:: Updated with new data.

  src/general-info/release-20d.txt:: Updated.

  src/i18n/BidiMirroring.txt:: Updated to Unicode 6.1.0.
  src/i18n/CaseFolding.txt:: Updated to Unicode 6.1.0.
  src/i18n/CompositionExclusions.txt:: Updated to Unicode 6.1.0.
  src/i18n/DerivedNormalizationProps.txt:: Updated to Unicode 6.1.0.
  src/i18n/NameAliases.txt:: Updated to Unicode 6.1.0.
  src/i18n/NormalizationCorrections.txt::
  src/i18n/SpecialCasing.txt:: Updated to Unicode 6.1.0.
  src/i18n/UnicodeData.txt:: Updated to Unicode 6.1.0.
  src/i18n/WordBreakProperty.txt:: Updated to Unicode 6.1.0.
  src/i18n/tests/NormalizationTest.txt:: Updated to Unicode 6.1.0.
  src/i18n/tests/WordBreakTest.txt:: Updated to Unicode 6.1.0.

537cc9bb

Feb 01, 2012
- Minor fix from Paul: avoid capitalizing mid-name in the completions · 53cb30ad
  Raymond Toy authored Jan 31, 2012
```
list.
```
  53cb30ad
- Fix ticket:52. · 768d6a34
  Raymond Toy authored Jan 31, 2012
```
Thanks to Paul Foley for rewriting {{{UNICODE-COMPLETE-NAME}}} to make
it work.
```
  768d6a34
Nov 04, 2011
- Rearrange directory structure. · a9961276
  Raymond Toy authored Nov 03, 2011
  
  a9961276
Sep 25, 2011

Fix ticket:49. In every file-comment, replace the existing $Header$ · 99a5797f

Raymond Toy authored Sep 24, 2011

entries with just the file path, removing the revision number, date,
author and state. The actual information is now computed during
compilation and stored in the fasl itself. (See ticket:48.)

99a5797f

Jun 27, 2011

Update to Unicode 6.0.0. · 7aa8a23e

rtoy authored Jun 27, 2011


code/unidata.lisp:
o Update unicode version to 6.0.0
o Add pointer to build-unidata.lisp.
tools/build-unidata.lisp:
o Update unicode version to 6.0.0
o Print out directory path so we can see where we're getting the data
  from.


i18n/CaseFolding.txt
i18n/CompositionExclusions.txt
i18n/DerivedNormalizationProps.txt
i18n/NameAliases.txt
i18n/NormalizationCorrections.txt
i18n/SpecialCasing.txt
i18n/UnicodeData.txt
i18n/WordBreakProperty.txt
i18n/tests/NormalizationTest.txt
i18n/tests/WordBreakTest.txt:
o Update with new files from unicode.org.

7aa8a23e

Jun 10, 2011

Add function to load all unicode data into memory. · 55d7f671

rtoy authored Jun 10, 2011

This makes it easy to make an executable image that doesn't need
unidata.bin around.  (Should we do this for normal cores?  It seems to
add about 1 MB to the core size.)

code/unidata.lisp:
o Add LOAD-ALL-UNICODE-DATA to load all unicode data.
o Add UNICODE-DATA-LOADED-P to check that unicode data has been
  loaded.

code/print.lisp:
o If unicode data is loaded, don't check for existence of
  *unidata-path*, because we don't need it.

code/exports.lisp:
o Export LOAD-ALL-UNICODE-DATA.

general-info/release-20c.txt:
o Update info

55d7f671

May 31, 2011

Add -unidata option to specify unidata.bin file. · d9b73849

rtoy authored May 31, 2011

This change requires a cross-compile.  Use boot-2011-04-01-cross.lisp
as the cross-compile script.

bootfiles/20b/boot-2011-04-01-cross.lisp:
o New cross-compile bootstrap file

lisp/lisp.c:
o Recognize -unidata option and set up *UNIDATA-PATH* appropriately.

code/commandline.lisp:
o Add defswitch for unidata so we don't get complaints about unknown
  switch.

code/unidata.lisp:
o Rename +UNIDATA-PATH+ to *UNIDATA-PATH*, since it's not a constant
  anymore.
o Update code to use new name.

code/print.lisp:
o Update code to use *UNIDATA-PATH*

compiler/sparc/parms.lisp:
o Add *UNIDATA-PATH* to list of static symbols.
o Add back in spare-9 and spare-8 static symbols since we need to do a
  cross-compile for this change anyway.

compiler/x86/parms.lisp:
o Add *UNIDATA-PATH* to list of static symbols.
o Reorder the static symbols in a more logical arrangement so that the
  spare symbols are at the end.

i18n/local/cmucl.pot:
o Update

d9b73849

Apr 02, 2011
- Remove extra right parenthesis. · 9fba19dc
  rtoy authored Apr 02, 2011
  
  9fba19dc
Feb 23, 2011

Fix bug where cmucl was no longer recognizing things like · 23fafac4

rtoy authored Feb 23, 2011

#\latin_small_letter_a.  This failure is caused by the new
SEARCH-DICTIONARY function that does partial completion; the
UNICODE-NAME-TO-CODEPOINT function wasn't aware of the new way.

We could change UNICODE-NAME-TO-CODEPOINT to do the appropriate thing
with the new way, but I (rtoy) decided it would be nice to have the
old function around too.  Hence, restore the old version and use it.

23fafac4

Sep 29, 2010

Add a function to create the key from two codepoints that can be used · 7f578279

rtoy authored Sep 29, 2010

as the key for the composition table.  That way the logic is in
exactly one place and not spread out through the code.

7f578279

Sep 21, 2010
- When there's more than one possible completion, we need to keep the · 4223c351
  rtoy authored Sep 21, 2010
```
original completions along with the extensions.
```
  4223c351
Sep 20, 2010

Was mishandling the case where there are no more completions. In this · 74a64db3

rtoy authored Sep 20, 2010

case we were returning the prefix string, but that would be incorrect
if the prefix string is not a valid character. So check that it is
valid and return it. Otherwise do nothing (thereby returning nil) so
slime can note the character is invalid.

74a64db3
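
A rough Python sketch of the rule described above, for illustration only. The real code is Lisp; `finish_completion` is a hypothetical name, and Python's `unicodedata.lookup` stands in for the project's own name lookup. When nothing more can be completed, the prefix is returned only if it already names a character:

```python
import unicodedata


def finish_completion(prefix):
    """Return prefix if it names a valid character, else None.

    Assumes underscores in the prefix stand for spaces in the
    Unicode character name.
    """
    try:
        unicodedata.lookup(prefix.replace("_", " "))
    except KeyError:
        # Not a complete character name: return nothing so the
        # caller can flag the character as invalid.
        return None
    return prefix
```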

Improve completion of Hangul syllables and CJK unified ideographs some · a57bbc1c
rtoy authored Sep 20, 2010
```
more and fix some bugs in previous change.
```
a57bbc1c

Sep 19, 2010

o Move %STR, %STRX and %MATCH around so that we can inline them · 119f21c7

rtoy authored Sep 19, 2010

  (because they're so simple).
o Add some comments for %STR.
o Change implementation of %MATCH to be simpler and add comments on
  why we do what we do and explain what happens if we don't.
o Handle completion of Hangul syllables better:
  - Match "Hangul_S" instead of "Hangul_Syllable" because there's
    #\Hangul_Single_Dot_Tone_Mark.
  - If we match "Hangul_S", try to complete some Hangul syllables so
    we don't fool slime into thinking "Hangul_Syllable_" is the only
    completion.  There are obviously more.
o Handle completion of CJK Unified Ideographs better by trying to
  complete more so slime isn't fooled into thinking
  "CJK_Unified_Ideograph-" is the only possible completion.

119f21c7

o Construction of the Hangul syllable codebook was wrong. To satisfy · dc4cdb68

rtoy authored Sep 19, 2010

  the constraints on the codebook, we just sort them in decreasing
  order of length.
o In %MIP, it might happen that MISMATCH returns NIL, which means a
  match.  In this case, don't change the position.

dc4cdb68
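
The MISMATCH point can be shown with a Python analogue of Common Lisp's MISMATCH. This is a sketch only: `mismatch` and `advance` are hypothetical names, and how %MIP actually uses the result is an assumption. The key detail is that a `None` result (NIL in Lisp) means the sequences match, so the position must be left alone:

```python
def mismatch(a, b):
    """Return the index of the first difference, or None if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One sequence is a proper prefix of the other.
        return min(len(a), len(b))
    return None


def advance(pos, a, b):
    """Hypothetical caller: keep pos unchanged when a and b match."""
    m = mismatch(a, b)
    return pos if m is None else m
```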

Sep 18, 2010

Some Hangul syllables were left out of the Hangul syllable dictionary. · f2065a91
rtoy authored Sep 18, 2010
```
Redo this by looping over all codepoints and selecting the codepoints
that are Hangul syllables.
```
f2065a91
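
A small Python sketch of the approach, for illustration only (the project's code is Lisp and the function names here are hypothetical). It loops over every codepoint and keeps those in the Hangul Syllables block, U+AC00 through U+D7A3, with the range test kept in one predicate:

```python
HANGUL_SYLLABLE_FIRST = 0xAC00
HANGUL_SYLLABLE_LAST = 0xD7A3


def hangul_syllable_p(codepoint):
    """True if codepoint is a precomposed Hangul syllable."""
    return HANGUL_SYLLABLE_FIRST <= codepoint <= HANGUL_SYLLABLE_LAST


def hangul_syllables():
    """Scan the whole codepoint space and collect the Hangul syllables."""
    return [cp for cp in range(0x110000) if hangul_syllable_p(cp)]
```

Scanning the full range is slower than iterating over the block directly, but it cannot leave out a syllable as long as the predicate is right.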

code/unidata.lisp: · 820f2554

rtoy authored Sep 18, 2010

o Update constants to Unicode version 5.2.0.

i18n/unidata.bin:
o Regenerated using Unicode version 5.2.0.

820f2554

code/unidata.lisp: · 3d1d8295

rtoy authored Sep 18, 2010

o Just add some comments on why we don't put the dictionaries in
  unidata.bin.
o Print out some messages when building the hangul and cjk
  dictionaries so the user knows what's happening.

tools/build-unidata.lisp:
o Add some comments on the various parts of unidata.bin.

3d1d8295

Sep 17, 2010

exports.lisp: · 9563cc0b

rtoy authored Sep 17, 2010

o Export STRING-TO-NFC, UNICODE-COMPLETE, and UNICODE-COMPLETE-NAME.

unidata.lisp:
o Add explicit exports.

9563cc0b

Optimize the completion of the Hangul syllables and the CJK unified · d4b307df

rtoy authored Sep 17, 2010

ideographs by using dictionaries.  (Should these dictionaries be part
of unidata.bin so they don't have to be built at run time?  On the
one hand, it makes things simpler, but it unnecessarily bloats
unidata.bin.  I suspect the hangul syllable and cjk ideograph
characters are not used very often.)

o Change NODE-NEXT and CLOSE-NODE to have an optional parameter for
  the dictionary to use.
o Update UNICODE-COMPLETE-NAME to pass the dictionary to NODE-NEXT and
  CLOSE-NODE.
o Update UNICODE-COMPLETE to use the hangul syllable dictionary and
  the cjk ideograph dictionary when searching.
o Fix typo in UNICODE-COMPLETE.
o Add defvars for dictionaries for hangul syllables and cjk
  ideographs.
o Add functions to build the hangul and cjk dictionaries.
o Steal the implementations of BUILD-DICTIONARY, NAME-LOOKUP, and
  ENCODE-NAME from tools/build-unidata.lisp.

d4b307df

Add support for character completion. This is primarily intended to · d4b888a2

rtoy authored Sep 17, 2010

support character completion for slime.  The implementation is from
Paul Foley, with some slight modifications by Raymond Toy to handle a
few corner cases.

o Modify SEARCH-DICTIONARY to take optional current and posn
  parameters so that SEARCH-DICTIONARY can be started from a different
  place.
o Add UNICODE-COMPLETE, which is the main function for character name
  completion.
o Add other support functions for UNICODE-COMPLETE.

d4b888a2

o Fix typo in UNICODE-DECOMP. (It's hangul-syllable-p, not · 34af3581

rtoy authored Sep 17, 2010

  hangule-syllable-p.)
o Move the computation of *reverse-hangule-choseong*,
  *reverse-hangul-jungseong*, and *reverse-hangul-jongseong* to its
  own routine.  Call it in UNICODE-NAME-TO-CODEPOINT.

34af3581

Sep 15, 2010

Pull out the range tests for CJK Ideographs and Hangul Syllables and · 6692aa7e
rtoy authored Sep 15, 2010
```
put the tests into their own functions so that the limits are in one
place.
```
6692aa7e

Add support for Unicode 5.2. The normalization and wordbreak tests pass. · d2b9eace

rtoy authored Sep 15, 2010

code/string.lisp:
o In %compose, handle the case where the composite character is
  outside the BMP and thus needs special handling for our UTF-16
  strings.
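
The special handling comes from UTF-16 itself: a codepoint above U+FFFF is stored as two code units, a surrogate pair. A Python sketch of the standard conversion, for illustration only (these function names are hypothetical and this is not the code in %compose):

```python
def to_surrogates(codepoint):
    """Split a codepoint outside the BMP into a (high, low) surrogate pair."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("codepoint is not outside the BMP")
    offset = codepoint - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a surrogate pair into the full codepoint."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, U+1D15E becomes the pair `0xD834`, `0xDD5E`, so a composition result outside the BMP occupies two string elements instead of one.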

code/unidata.lisp
o CJK Ideograph range has changed in 5.2.
o Fix bug in build-composition-table.  We were not correctly handling
  the case where the decomposition of a codepoint was outside the
  BMP.  Special care is needed to handle the UTF-16 strings that we
  use.
o The keys for the pairwise composition table are the full codepoints,
  so we need to shift one by 21 bits instead of 16.
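
The reason for 21 bits: the largest codepoint, U+10FFFF, needs 21 bits, so a 16-bit shift lets two different pairs collide once either codepoint is outside the BMP. A Python sketch, for illustration only; `composition_key` is a hypothetical name, and which codepoint goes in the high bits is an assumption:

```python
CODEPOINT_BITS = 21  # 0x10FFFF fits in 21 bits


def composition_key(first, second, shift=CODEPOINT_BITS):
    """Pack two full codepoints into one integer key.

    Assumes the first codepoint occupies the high bits.
    """
    return (first << shift) | second


# With a 16-bit shift, distinct pairs can map to the same key:
collide_a = composition_key(0x0001, 0x0000, shift=16)
collide_b = composition_key(0x0000, 0x10000, shift=16)
```

Here `collide_a` and `collide_b` are equal, whereas the default 21-bit shift keeps the two pairs distinct.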

tools/build-unidata.lisp
o Update minor version to 2.

i18n/BidiMirroring.txt
i18n/CaseFolding.txt
i18n/CompositionExclusions.txt
i18n/DerivedNormalizationProps.txt
i18n/NameAliases.txt
i18n/NormalizationCorrections.txt
i18n/SpecialCasing.txt
i18n/UnicodeData.txt
i18n/WordBreakProperty.txt
i18n/tests/NormalizationTest.txt
i18n/tests/WordBreakTest.txt
o Updated from Unicode 5.2.

i18n/unidata.bin
o Regenerated from new Unicode 5.2 files.

d2b9eace

Apr 20, 2010
- Change uses of _"foo" to (intl:gettext "foo"). This is because slime · a6577064
  rtoy authored Apr 20, 2010
```
may get confused with source locations if the reader macros are
installed.
```
  a6577064
Mar 19, 2010

Merge intl-branch 2010-03-18 to HEAD. To build, you need to use · d8544caa

rtoy authored Mar 19, 2010

boot-2010-02-1 as the bootstrap file.  You should probably also use
the new -P option for build.sh to generate and update the po files
while building.

d8544caa

Sep 11, 2009

tools/build-unidata.lisp: · bf4b37ac

rtoy authored Sep 11, 2009

o Add support for word break properties.
o Some cleanup of the code including moving the common code in
  write-ntrie* to write-ntrie.

code/unidata.lisp:
o Add support for word break properties.
o UNICODE-WORD-BREAK-CODE and UNICODE-WORD-BREAK return the property
  code and the property keyword for a codepoint, respectively.

i18n/WordBreakProperty.txt:
o New file for the word break properties.

bf4b37ac

Jul 10, 2009

unidata.lisp: · 176f40f7

rtoy authored Jul 10, 2009

o Add *unidata-version* to hold our revision number.

save.lisp:
o Add Unicode to the herald items.  Just print out the unidata version
  along with the supported Unicode UCD version.

176f40f7

Jul 02, 2009

boot-2009-07.lisp: · 67fc4ac5

rtoy authored Jul 02, 2009

o Bootstrap file needed to compile this change (because the current
  shrink-vector derive-type optimizer didn't handle union types).

compiler/fndb.lisp:
o Make the compiler warn if the result of lisp::shrink-vector is not
  used.  This is a problem because the compiler doesn't know that
  shrink-vector destructively modifies the length of a vector.  As a
  partial solution, warn the user if the result of shrink-vector is
  not used.

code/hash-new.lisp:
code/seq.lisp:
o Make sure the result of shrink-vector is used, to get rid of a new
  compiler warning.

code/unidata.lisp:
o Modify %unicode-full-case so that it doesn't use shrink-vector
  anymore.

compiler/seqtran.lisp:
o Fix shrink-vector derive-type optimizer to handle union types.

tools/build-unidata.lisp:
o Fix typo that somehow got in.
o Make sure the result of shrink-vector is used, to get rid of a new
  compiler warning.

67fc4ac5

Jun 16, 2009

code/string.lisp: · a826481f

rtoy authored Jun 16, 2009

o Only define STRING-TO-NFD, STRING-TO-NFKD, and STRING-TO-NFKC for
  Unicode builds.  Conditionalize out their support functions too.
o Update export list to be conditional on Unicode too.
o Use new name for get-pairwise-composition.

code/exports.lisp:
o Update export list to be conditional on Unicode for above changes
  in string.lisp.

code/unidata.lisp:
o Change name from GET-PAIRWISE-COMPOSITION to
  UNICODE-PAIRWISE-COMPOSITION to match other Unicode function names.

a826481f

Jun 11, 2009
- Merge Unicode work to trunk. From label · 68ac9a3e
  rtoy authored Jun 11, 2009
```
unicode-utf16-extfmt-2009-06-11.
```
  68ac9a3e