
Log of /src/code/unidata.lisp




Revision 1.28
Mon Jun 27 15:11:29 2011 UTC (2 years, 9 months ago) by rtoy
Branch: MAIN
CVS Tags: GIT-CONVERSION, HEAD, snapshot-2011-07, snapshot-2011-09
Changes since 1.27: +7 -4 lines
Diff to previous 1.27
Update to Unicode 6.0.0.


code/unidata.lisp:
o Update unicode version to 6.0.0
o Add pointer to build-unidata.lisp.
tools/build-unidata.lisp:
o Update unicode version to 6.0.0
o Print out directory path so we can see where we're getting the data
  from.


i18n/CaseFolding.txt
i18n/CompositionExclusions.txt
i18n/DerivedNormalizationProps.txt
i18n/NameAliases.txt
i18n/NormalizationCorrections.txt
i18n/SpecialCasing.txt
i18n/UnicodeData.txt
i18n/WordBreakProperty.txt
i18n/tests/NormalizationTest.txt
i18n/tests/WordBreakTest.txt:
o Update with new files from unicode.org.

Revision 1.27
Fri Jun 10 17:38:27 2011 UTC (2 years, 10 months ago) by rtoy
Branch: MAIN
Changes since 1.26: +76 -29 lines
Diff to previous 1.26
Add function to load all unicode data into memory.

This makes it easy to make an executable image that doesn't need
unidata.bin around.  (Should we do this for normal cores?  It seems to
add about 1 MB to the core size.)

code/unidata.lisp:
o Add LOAD-ALL-UNICODE-DATA to load all unicode data.
o Add UNICODE-DATA-LOADED-P to check that unicode data has been
  loaded.

code/print.lisp:
o If unicode data is loaded, don't check for existence of
  *unidata-path*, because we don't need it.

code/exports.lisp:
o Export LOAD-ALL-UNICODE-DATA.

general-info/release-20c.txt:
o Update info

Revision 1.26
Tue May 31 13:26:40 2011 UTC (2 years, 10 months ago) by rtoy
Branch: MAIN
CVS Tags: snapshot-2011-06
Changes since 1.25: +4 -4 lines
Diff to previous 1.25
Add -unidata option to specify unidata.bin file.

This change requires a cross-compile.  Use boot-2011-04-01-cross.lisp
as the cross-compile script.

bootfiles/20b/boot-2011-04-01-cross.lisp:
o New cross-compile bootstrap file

lisp/lisp.c:
o Recognize -unidata option and setup *UNIDATA-PATH* appropriately.

code/commandline.lisp:
o Add defswitch for unidata so we don't get complaints about unknown
  switch.

code/unidata.lisp:
o Rename +UNIDATA-PATH+ to *UNIDATA-PATH*, since it's not a constant
  anymore.
o Update code to use new name.

code/print.lisp:
o Update code to use *UNIDATA-PATH*

compiler/sparc/parms.lisp:
o Add *UNIDATA-PATH* to list of static symbols.
o Add back in spare-9 and spare-8 static symbols since we need to do a
  cross-compile for this change anyway.

compiler/x86/parms.lisp:
o Add *UNIDATA-PATH* to list of static symbols.
o Reorder the static symbols in a more logical arrangement so that the
  spare symbols are at the end.

i18n/local/cmucl.pot:
o Update

Revision 1.25
Sat Apr 2 20:11:31 2011 UTC (3 years ago) by rtoy
Branch: MAIN
CVS Tags: snapshot-2011-04
Changes since 1.24: +3 -4 lines
Diff to previous 1.24
Remove extra right parenthesis.

Revision 1.24
Wed Feb 23 03:02:33 2011 UTC (3 years, 1 month ago) by rtoy
Branch: MAIN
CVS Tags: snapshot-2011-03
Changes since 1.23: +49 -12 lines
Diff to previous 1.23
Fix bug where cmucl was no longer recognizing things like
#\latin_small_letter_a.  This failure is caused by the new
SEARCH-DICTIONARY function that does partial completion; the
UNICODE-NAME-TO-CODEPOINT function wasn't aware of the new behavior.

We could change UNICODE-NAME-TO-CODEPOINT to do the appropriate thing
with the new way, but I (rtoy) decided it would be nice to have the
old function around too.  Hence, restore the old version and use it.

Revision 1.23
Wed Sep 29 20:51:19 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
CVS Tags: cross-sol-x86-2010-12-20, cross-sol-x86-base, cross-sol-x86-merged, cross-sparc-branch-base, snapshot-2010-11, snapshot-2010-12, snapshot-2011-01, snapshot-2011-02
Branch point for: cross-sol-x86-branch, cross-sparc-branch
Changes since 1.22: +15 -5 lines
Diff to previous 1.22
Add a function to create the key from two codepoints that can be used
as the key for the composition table.  That way the logic is in
exactly one place and not spread out through the code.

Revision 1.22
Tue Sep 21 00:57:29 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
Changes since 1.21: +16 -12 lines
Diff to previous 1.21
When there's more than one possible completion, we need to keep the
original completions along with the extensions.

Revision 1.21
Mon Sep 20 01:17:14 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
Changes since 1.20: +10 -4 lines
Diff to previous 1.20
Was mishandling the case where there are no more completions.  In this
case we were returning the prefix string, but that would be incorrect
if the prefix string is not a valid character.  So check that it is
valid and return it.  Otherwise do nothing (thereby returning nil) so
slime can note the character is invalid.

Revision 1.20
Mon Sep 20 00:59:22 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
Changes since 1.19: +35 -52 lines
Diff to previous 1.19
Improve completion of Hangul syllables and CJK unified ideographs some
more and fix some bugs in previous change.

Revision 1.19
Sun Sep 19 23:07:46 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
Changes since 1.18: +63 -28 lines
Diff to previous 1.18
o Move %STR, %STRX and %MATCH around so that we can inline them
  (because they're so simple).
o Add some comments for %STR.
o Change implementation of %MATCH to be simpler and add comments on
  why we do what we do and explain what happens if we don't.
o Handle completion of Hangul syllables better:
  - Match "Hangul_S" instead of "Hangul_Syllable" because there's
    #\Hangul_Single_Dot_Tone_Mark.
  - If we match "Hangul_S", try to complete some Hangul syllables so
    we don't fool slime into thinking "Hangul_Syllable_" is the only
    completion.  There are obviously more.
o Handle completion of CJK Unified Ideographs better by trying to
  complete more so slime isn't fooled into thinking
  "CJK_Unified_Ideograph-" is the only possible completion.

Revision 1.18
Sun Sep 19 02:37:10 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
Changes since 1.17: +22 -11 lines
Diff to previous 1.17
o Construction of the Hangul syllable codebook was wrong.  To satisfy
  the constraints on the codebook, we just sort them in decreasing
  order of length.
o In %MIP, it might happen that MISMATCH returns NIL, which means a
  match.  In this case, don't change the position.

Revision 1.17
Sat Sep 18 21:38:10 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
Changes since 1.16: +10 -14 lines
Diff to previous 1.16
Some Hangul syllables were left out of the Hangul syllable dictionary.
Redo this by looping over all codepoints and selecting the codepoints
that are Hangul syllables.

Revision 1.16
Sat Sep 18 21:10:42 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
Changes since 1.15: +3 -3 lines
Diff to previous 1.15
code/unidata.lisp:
o Update constants to Unicode version 5.2.0.

i18n/unidata.bin:
o Regenerated using Unicode version 5.2.0.

Revision 1.15
Sat Sep 18 20:47:51 2010 UTC (3 years, 6 months ago) by rtoy
Branch: MAIN
Changes since 1.14: +21 -4 lines
Diff to previous 1.14
code/unidata.lisp:
o Just add some comments on why we don't put the dictionaries in
  unidata.bin.
o Print out some messages when building the hangul and cjk
  dictionaries so the user knows what's happening.

tools/build-unidata.lisp:
o Add some comments on the various parts of unidata.bin.

Revision 1.14
Fri Sep 17 23:29:01 2010 UTC (3 years, 7 months ago) by rtoy
Branch: MAIN
Changes since 1.13: +5 -2 lines
Diff to previous 1.13
exports.lisp:
o Export STRING-TO-NFC, UNICODE-COMPLETE, and UNICODE-COMPLETE-NAME.

unidata.lisp:
o Add explicit exports.

Revision 1.13
Fri Sep 17 22:41:26 2010 UTC (3 years, 7 months ago) by rtoy
Branch: MAIN
Changes since 1.12: +198 -42 lines
Diff to previous 1.12
Optimize the completion of the Hangul syllables and the CJK unified
ideographs by using dictionaries.  (Should these dictionaries be part
of unidata.bin so they don't have to be built at run time?  On the
one hand, it makes things simpler, but it unnecessarily bloats
unidata.bin.  I suspect the Hangul syllable and CJK ideograph
characters are not used very often.)

o Change NODE-NEXT and CLOSE-NODE to have an optional parameter for
  the dictionary to use.
o Update UNICODE-COMPLETE-NAME to pass the dictionary to NODE-NEXT and
  CLOSE-NODE.
o Update UNICODE-COMPLETE to use the hangul syllable dictionary and
  the cjk ideograph dictionary when searching.
o Fix typo in UNICODE-COMPLETE.
o Add defvars for dictionaries for hangul syllables and cjk
  ideographs.
o Add functions to build the hangul and cjk dictionaries.
o Steal the implementations of BUILD-DICTIONARY, NAME-LOOKUP, and
  ENCODE-NAME from tools/build-unidata.lisp.

Revision 1.12
Fri Sep 17 15:59:45 2010 UTC (3 years, 7 months ago) by rtoy
Branch: MAIN
Changes since 1.11: +260 -41 lines
Diff to previous 1.11
Add support for character completion.  This is primarily intended to
support character completion for slime.  The implementation is from
Paul Foley, with some slight modifications by Raymond Toy to handle a
few corner cases.

o Modify SEARCH-DICTIONARY to take optional current and posn
  parameters so that SEARCH-DICTIONARY can be started from a different
  place.
o Add UNICODE-COMPLETE, which is the main function for character name
  completion.
o Add other support functions for UNICODE-COMPLETE.

Revision 1.11
Fri Sep 17 02:11:09 2010 UTC (3 years, 7 months ago) by rtoy
Branch: MAIN
Changes since 1.10: +25 -22 lines
Diff to previous 1.10
o Fix typo in UNICODE-DECOMP.  (It's hangul-syllable-p, not
  hangule-syllable-p.)
o Move the computation of *reverse-hangule-choseong*,
  *reverse-hangul-jungseong*, and *reverse-hangul-jongseong* to a
  separate routine.  Call it in UNICODE-NAME-TO-CODEPOINT.

Revision 1.10
Wed Sep 15 23:32:06 2010 UTC (3 years, 7 months ago) by rtoy
Branch: MAIN
Changes since 1.9: +20 -13 lines
Diff to previous 1.9
Pull out the range tests for CJK Ideographs and Hangul Syllables and
put the tests into their own functions so that the limits are in one
place.

Revision 1.9
Wed Sep 15 21:06:38 2010 UTC (3 years, 7 months ago) by rtoy
Branch: MAIN
Changes since 1.8: +31 -21 lines
Diff to previous 1.8
Add support for Unicode 5.2.  The normalization and wordbreak tests pass.

code/string.lisp:
o In %compose, handle the case where the composite character is
  outside the BMP and thus needs special handling for our UTF-16
  strings.

code/unidata.lisp
o CJK Ideograph range has changed in 5.2.
o Fix bug in build-composition-table.  We were not correctly handling
  the case where the decomposition of a codepoint was outside the
  BMP.  Special care is needed to handle the UTF-16 strings that we
  use.
o The keys for the pairwise composition table are built from full
  codepoints, so we need to shift one by 21 bits instead of 16.
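
Why 21 bits: the largest codepoint, #x10FFFF, needs 21 bits, so with a
16-bit shift a second codepoint outside the BMP spills into the bits
of the first and distinct pairs can end up with the same key.  A
minimal Python sketch of such a key and of a pairwise composition
table built from two-codepoint canonical decompositions.  The function
names and the toy data are hypothetical, not CMUCL's; the real table
is built from the data in unidata.bin.

```python
# Hypothetical sketch, not CMUCL's code: pack two codepoints into one key
# for a pairwise (canonical) composition table.
CODEPOINT_BITS = 21                        # #x10FFFF needs 21 bits
CODEPOINT_MASK = (1 << CODEPOINT_BITS) - 1


def composition_key(first, second):
    """Combine two codepoints into a single collision-free integer key."""
    if not (0 <= first <= 0x10FFFF and 0 <= second <= 0x10FFFF):
        raise ValueError("not a codepoint")
    return (first << CODEPOINT_BITS) | second


def split_key(key):
    """Recover (first, second) from a key made by composition_key."""
    return key >> CODEPOINT_BITS, key & CODEPOINT_MASK


def build_composition_table(decompositions, exclusions=frozenset()):
    """Map composition_key(a, b) -> composite for each composite whose
    canonical decomposition is exactly (a, b) and which is not excluded."""
    table = {}
    for composite, decomp in decompositions.items():
        if len(decomp) == 2 and composite not in exclusions:
            table[composition_key(*decomp)] = composite
    return table


# With only a 16-bit shift, different pairs can collide:
def key16(first, second):
    return (first << 16) | second

assert key16(0x0001, 0x10000) == key16(0x0001, 0x0000)

# Toy data: U+00E9 = U+0065 U+0301; U+0958 = U+0915 U+093C is listed in
# CompositionExclusions.txt, so it must not be recomposed.
table = build_composition_table(
    {0x00E9: (0x0065, 0x0301), 0x0958: (0x0915, 0x093C)},
    exclusions={0x0958},
)
```

Looking up a pair is then table.get(composition_key(a, b)), which
returns None when the pair does not compose.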

tools/build-unidata.lisp
o Update minor version to 2.

i18n/BidiMirroring.txt
i18n/CaseFolding.txt
i18n/CompositionExclusions.txt
i18n/DerivedNormalizationProps.txt
i18n/NameAliases.txt
i18n/NormalizationCorrections.txt
i18n/SpecialCasing.txt
i18n/UnicodeData.txt
i18n/WordBreakProperty.txt
i18n/tests/NormalizationTest.txt
i18n/tests/WordBreakTest.txt
o Updated from Unicode 5.2.

i18n/unidata.bin
o Regenerated from new Unicode 5.2 files.

Revision 1.8.4.1
Tue Sep 14 05:58:01 2010 UTC (3 years, 7 months ago) by rtoy
Branch: RELEASE-20B-BRANCH
CVS Tags: RELEASE_20b
Changes since 1.8: +15 -7 lines
Diff to previous 1.8 , to next main 1.28
UNICODE-NAME-TO-CODEPOINT was incorrectly accepting any value after
#\cjk_unified_ideograph-nnnn and returning the character whose code
was nnnn. This is wrong.

o Add a new function to check for valid ranges for CJK unified
  ideographs.
o Use it in UNICODE-NAME-TO-CODEPOINT and UNICODE-NAME.
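
The fix amounts to parsing the hex suffix and then refusing any value
that is not inside a unified-ideograph range.  A minimal Python sketch
of that check.  The ranges below are an assumption taken to match the
<CJK Ideograph ..., First>/<..., Last> entries of Unicode 6.0's
UnicodeData.txt; the authoritative limits should come from the data
file for whatever Unicode version is in use.  The function names are
hypothetical.

```python
# Hypothetical sketch: parse "CJK_UNIFIED_IDEOGRAPH-nnnn" and reject
# codepoints that are not unified ideographs.
# Ranges assumed from Unicode 6.0 UnicodeData.txt (<CJK Ideograph ..., First/Last>).
CJK_UNIFIED_RANGES = (
    (0x3400, 0x4DB5),     # Extension A
    (0x4E00, 0x9FCB),     # main block
    (0x20000, 0x2A6D6),   # Extension B
    (0x2A700, 0x2B734),   # Extension C
    (0x2B740, 0x2B81D),   # Extension D
)
PREFIX = "CJK_UNIFIED_IDEOGRAPH-"
HEX_DIGITS = frozenset("0123456789ABCDEF")


def cjk_ideograph_p(cp):
    """True if CP lies in one of the unified ideograph ranges."""
    return any(lo <= cp <= hi for lo, hi in CJK_UNIFIED_RANGES)


def cjk_name_to_codepoint(name):
    """Return the codepoint for a CJK ideograph name, or None."""
    name = name.upper()
    if not name.startswith(PREFIX):
        return None
    digits = name[len(PREFIX):]
    if not digits or not set(digits) <= HEX_DIGITS:
        return None                              # empty or non-hex suffix
    cp = int(digits, 16)
    return cp if cjk_ideograph_p(cp) else None   # the range check is the fix
```

Without the final range check, a name like
"CJK_UNIFIED_IDEOGRAPH-0041" would map to #\A, which is the bug this
revision describes.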

Revision 1.8
Tue Apr 20 17:57:45 2010 UTC (3 years, 11 months ago) by rtoy
Branch: MAIN
CVS Tags: release-20b-pre1, release-20b-pre2, snapshot-2010-05, snapshot-2010-06, snapshot-2010-07, snapshot-2010-08, sparc-tramp-assem-2010-07-19, sparc-tramp-assem-base
Branch point for: RELEASE-20B-BRANCH, sparc-tramp-assem-branch
Changes since 1.7: +5 -5 lines
Diff to previous 1.7
Change uses of _"foo" to (intl:gettext "foo").  This is because slime
may get confused with source locations if the reader macros are
installed.

Revision 1.7
Fri Mar 19 15:19:00 2010 UTC (4 years, 1 month ago) by rtoy
Branch: MAIN
CVS Tags: post-merge-intl-branch, snapshot-2010-04
Changes since 1.6: +6 -5 lines
Diff to previous 1.6
Merge intl-branch 2010-03-18 to HEAD.  To build, you need to use
boot-2010-02-1 as the bootstrap file.  You should probably also use
the new -P option for build.sh to generate and update the po files
while building.

Revision 1.6.10.1
Thu Feb 25 20:34:52 2010 UTC (4 years, 1 month ago) by rtoy
Branch: intl-2-branch
Changes since 1.6: +6 -5 lines
Diff to previous 1.6 , to next main 1.28
Restart internationalization work.  This new branch starts with code from
the intl-branch on date 2010-02-12 18:00:00+0500.  This version works,
and

LANG=en@piglatin bin/lisp

works (once the piglatin translation is added).

Revision 1.6.8.2
Wed Feb 10 04:01:27 2010 UTC (4 years, 2 months ago) by rtoy
Branch: intl-branch
CVS Tags: intl-branch-2010-03-18-1300, intl-branch-working-2010-02-11-1000, intl-branch-working-2010-02-19-1000
Changes since 1.6.8.1: +5 -5 lines
Diff to previous 1.6.8.1 , to branch point 1.6 , to next main 1.28
Mark translatable strings; update cmucl.pot and ko/cmucl.po
accordingly.

Revision 1.6.8.1
Mon Feb 8 17:15:49 2010 UTC (4 years, 2 months ago) by rtoy
Branch: intl-branch
Changes since 1.6: +3 -2 lines
Diff to previous 1.6
Add (intl:textdomain "cmucl") to the files to set the textdomain.

Revision 1.6
Fri Sep 11 16:22:35 2009 UTC (4 years, 7 months ago) by rtoy
Branch: MAIN
CVS Tags: amd64-dd-start, intl-2-branch-base, intl-branch-base, pre-merge-intl-branch, snapshot-2009-11, snapshot-2009-12, snapshot-2010-01, snapshot-2010-02, snapshot-2010-03, unicode-string-buffer-base, unicode-string-buffer-impl-base
Branch point for: amd64-dd-branch, intl-2-branch, intl-branch, unicode-string-buffer-branch, unicode-string-buffer-impl-branch
Changes since 1.5: +24 -2 lines
Diff to previous 1.5
tools/build-unidata.lisp:
o Add support for word break properties.
o Some cleanup of the code including moving the common code in
  write-ntrie* to write-ntrie.

code/unidata.lisp:
o Add support for word break properties.
o UNICODE-WORD-BREAK-CODE and UNICODE-WORD-BREAK return the property
  code and the property keyword for a codepoint, respectively.

i18n/WordBreakProperty.txt:
o New file for the word break properties.

Revision 1.5
Fri Jul 10 04:17:49 2009 UTC (4 years, 9 months ago) by rtoy
Branch: MAIN
CVS Tags: RELEASE_20a, release-20a-base, release-20a-pre1, snapshot-2009-08
Branch point for: RELEASE-20A-BRANCH
Changes since 1.4: +3 -1 lines
Diff to previous 1.4
unidata.lisp:
o Add *unidata-version* to hold our revision number.

save.lisp:
o Add Unicode to the herald items.  Just print out the unidata version
  along with the supported Unicode UCD version.

Revision 1.4
Thu Jul 2 21:00:48 2009 UTC (4 years, 9 months ago) by rtoy
Branch: MAIN
Changes since 1.3: +7 -8 lines
Diff to previous 1.3
boot-2009-07.lisp:
o Bootstrap file needed to compile this change (because the current
  shrink-vector derive-type optimizer didn't handle union types).

compiler/fndb.lisp:
o Make the compiler warn if the result of lisp::shrink-vector is not
  used.  This is a problem because the compiler doesn't know that
  shrink-vector destructively modifies the length of a vector.  As a
  partial solution, warn the user if the result of shrink-vector is
  not used.

code/hash-new.lisp:
code/seq.lisp:
o Make sure the result of shrink-vector is used, to get rid of a new
  compiler warning.

code/unidata.lisp:
o Modify %unicode-full-case so that it doesn't use shrink-vector
  anymore.

compiler/seqtran.lisp:
o Fix shrink-vector derive-type optimizer to handle union types.

tools/build-unidata.lisp:
o Fix a typo that had crept in.
o Make sure the result of shrink-vector is used, to get rid of a new
  compiler warning.

Revision 1.3
Tue Jun 16 17:23:15 2009 UTC (4 years, 10 months ago) by rtoy
Branch: MAIN
CVS Tags: portable-clx-base, portable-clx-import-2009-06-16, snapshot-2009-07
Branch point for: portable-clx-branch
Changes since 1.2: +2 -2 lines
Diff to previous 1.2
code/string.lisp:
o Only define STRING-TO-NFD, STRING-TO-NFKD, and STRING-TO-NFKC for
  Unicode builds.  Conditionalize out their support functions too.
o Update export list to be conditional on Unicode too.
o Use new name for get-pairwise-composition.

code/exports.lisp:
o Update export list to be conditional on Unicode for above changes
  in string.lisp.

code/unidata.lisp:
o Change name from GET-PAIRWISE-COMPOSITION to
  UNICODE-PAIRWISE-COMPOSITION to match other Unicode function names.

Revision 1.2
Thu Jun 11 16:03:59 2009 UTC (4 years, 10 months ago) by rtoy
Branch: MAIN
CVS Tags: merged-unicode-utf16-extfmt-2009-06-11
Changes since 1.1: +1054 -0 lines
Diff to previous 1.1
Merge Unicode work to trunk.  From label
unicode-utf16-extfmt-2009-06-11.

Revision 1.1.2.30
Wed Jun 10 00:45:08 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
CVS Tags: unicode-utf16-extfmt-2009-06-11
Changes since 1.1.2.29: +3 -1 lines
Diff to previous 1.1.2.29 , to branch point 1.1 , to next main 1.28
Add link to Hangul composition sample code.

Revision 1.1.2.29
Tue Jun 9 13:07:50 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.28: +42 -2 lines
Diff to previous 1.1.2.28 , to branch point 1.1
Add CaseFolding.txt to unidata.bin so we can do case-insensitive
comparisons according to Unicode.

i18n/CaseFolding.txt:
o New file

code/unidata.lisp
o Add new slots to the unidata structure to hold the simple and full
  case-folding information.
o Add UNICODE-CASE-FOLD-SIMPLE and UNICODE-CASE-FOLD-FULL functions to
  return the case-folded codepoint or string for the simple and full
  options, respectively.

tools/build-unidata.lisp:
o Add new slots to the unidata structure and the ucdent structure to
  hold the case folding information from CaseFolding.txt.
o Update routines to read the case folding data and to write the data
  to unidata.bin.
o Speed optimization: Use a hash table whose key is the codepoint and
  whose value is the index into the vector.  This preserves the
  structure of the code but vastly improves the speed of reading and
  processing the unicode data files, especially for the derived
  normalization properties.  (We should just replace the vector with
  the hash table.)
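
CaseFolding.txt tags each mapping with a status: C (common), S
(simple), F (full) or T (Turkic).  Per the file's own header, simple
folding uses the C and S entries (one codepoint to one codepoint) and
full folding uses the C and F entries (one codepoint to possibly
several).  A minimal Python sketch of that split, using a few lines
copied in the CaseFolding.txt format; the function names are
hypothetical, and this does not reflect how the data is laid out in
unidata.bin.

```python
# Hypothetical sketch: split CaseFolding.txt into simple and full maps.
SAMPLE = [
    "0041; C; 0061; # LATIN CAPITAL LETTER A",
    "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S",
    "1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S",
    "1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S",
]


def parse_case_folding(lines):
    """Return (simple, full): simple maps int -> int, full maps int -> list."""
    simple, full = {}, {}
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop comments
        if not data:
            continue
        code, status, mapping = (f.strip() for f in data.split(";")[:3])
        cp = int(code, 16)
        target = [int(h, 16) for h in mapping.split()]
        if status in ("C", "S"):               # simple folding: C + S
            simple[cp] = target[0]
        if status in ("C", "F"):               # full folding: C + F
            full[cp] = target
    return simple, full


def case_fold_simple(cp, simple):
    """Fold one codepoint; codepoints without a mapping fold to themselves."""
    return simple.get(cp, cp)


def case_fold_full(s, full):
    """Fold a string; a single character may expand to several."""
    return "".join(
        "".join(chr(c) for c in full.get(ord(ch), [ord(ch)])) for ch in s
    )


simple, full = parse_case_folding(SAMPLE)
```

Note that U+00DF has no simple folding at all, so it folds to itself
under the simple option but to "ss" under the full option.  Python's
own str.casefold() performs full case folding, e.g. "A\u00df".casefold()
returns "ass".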

Revision 1.1.2.28
Fri Jun 5 18:46:17 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.27: +4 -3 lines
Diff to previous 1.1.2.27 , to branch point 1.1
Oops.  Forgot to call the default converter in %UNICODE-FULL-CASE and
forgot to return the string.

Revision 1.1.2.27
Fri Jun 5 16:22:09 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.26: +66 -1 lines
Diff to previous 1.1.2.26 , to branch point 1.1
tools/build-unidata.lisp:
o Add support for reading SpecialCasing.txt to support full-casing
  operation.  (Currently does not support language-specific cases or
  context dependent cases.)
o Update some prints
o Add check to write-unidata to produce an error if we try to write
  more objects than we have allocated space for in the index table.

code/unidata.lisp:
o Support loading the full case tables
o Add functions to produce the full case string for a codepoint.

Revision 1.1.2.26
Thu Jun 4 15:47:40 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.25: +5 -1 lines
Diff to previous 1.1.2.25 , to branch point 1.1
code/unidata.lisp:
o Add UNICODE-ASSIGNED-CODEPOINT-P

code/string.lisp:
o Make UTF16-STRING-P check for unassigned codepoints in the string.

Revision 1.1.2.25
Fri May 29 16:12:40 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
CVS Tags: unicode-snapshot-2009-06
Changes since 1.1.2.24: +19 -21 lines
Diff to previous 1.1.2.24 , to branch point 1.1
tools/build-unidata.lisp:
o Read composition exclusions from the composition exclusions file
  and save them in unidata.bin.

code/unidata.lisp:
o Read composition exclusions from unidata.bin
o Use the exclusions from unidata.bin instead of the
  hand-initialized list.

i18n/unidata.bin:
o Updated with composition exclusions list.

Revision 1.1.2.24
Thu May 28 15:04:29 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.23: +23 -23 lines
Diff to previous 1.1.2.23 , to branch point 1.1
Remove (debug 0) so DESCRIBE can give better information about these
functions.  (Should we adjust safety and space too?  We probably don't
need unsafe code everywhere.)

Revision 1.1.2.23
Wed May 27 20:34:19 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.22: +29 -29 lines
Diff to previous 1.1.2.22 , to branch point 1.1
code/char.lisp:
o Define CODEPOINT-LIMIT
o Define CODEPOINT type

code/extfmts.lisp
code/string.lisp
code/unidata.lisp
pcl/simple-streams/external-formats/utf-32.lisp
pcl/simple-streams/external-formats/utf-8.lisp
o Use the CODEPOINT type in declarations.

Revision 1.1.2.22
Tue May 26 16:25:03 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.21: +76 -1 lines
Diff to previous 1.1.2.21 , to branch point 1.1
code/string.lisp:
o Add function (setf codepoint)
o Add docstrings for STRING-TO-NFC and STRING-TO-NFKC.
o Move things related to pairwise composition to unidata.lisp.

code/unidata.lisp:
o Things related to pairwise composition moved here.
o Adjust *COMPOSITION-EXCLUSION* to include only the non-commented
  items in CompositionExclusions.txt.
o Make BUILD-COMPOSITION-TABLE exclude characters that can be
  derived from the decomposition.  (Basically, ignore the four
  decompositions of length greater than 1 that start with a non-zero
  combining class.)

Revision 1.1.2.21
Mon May 25 20:08:28 2009 UTC (4 years, 10 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.20: +78 -4 lines
Diff to previous 1.1.2.20 , to branch point 1.1
Add support for quick check normalization properties.  (From Paul.)

i18n/DerivedNormalizationProps.txt:
o New file containing the normalization data we need.

tools/build-unidata.lisp:
o Read the normalization properties and build unidata.bin to include
  four new tries, one each for NFC/NFKC/NFD/NFKD.
o Add new 1 and 2 bit tries.

code/unidata.lisp:
o Read the new data
o Add new functions to return the quick check normalization data.

code/stream-vector-io.lisp:
code/stream.lisp:
o Add support for 1, 2, and 4 bit vectors for stream I/O.

Revision 1.1.2.20
Thu May 14 17:50:48 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.19: +3 -3 lines
Diff to previous 1.1.2.19 , to branch point 1.1
The Unicode 1.0 names were being stored in the wrong slots of
*unicode-data*, overwriting the Unicode names.  Put them in the right
slots.

Revision 1.1.2.19
Mon May 11 16:46:47 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.18: +15 -3 lines
Diff to previous 1.1.2.18 , to branch point 1.1
o Add constants for the magic number and the (expected) Unicode major,
  minor, and update version to make the code slightly easier to read.

Revision 1.1.2.18
Wed May 6 13:26:18 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.17: +3 -3 lines
Diff to previous 1.1.2.17 , to branch point 1.1
Use TRUNCATE instead of FLOOR.  (Works around an issue with type
derivation and the maybe-inlined FLOOR function.)

Revision 1.1.2.17
Mon May 4 14:10:31 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.16: +100 -11 lines
Diff to previous 1.1.2.16 , to branch point 1.1
From Paul.  Support Unicode names for Hangul syllables and the CJK
ideographs.  These names can all be computed from the codepoint.
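
For Hangul, the computation is specified in the Unicode Standard
(section 3.12, Conjoining Jamo Behavior): the syllable's offset from
U+AC00 splits into leading-consonant, vowel and trailing-consonant
indices, and the name concatenates their jamo short names (from
Jamo.txt).  The CJK case is simpler still: the name is
"CJK UNIFIED IDEOGRAPH-" followed by the codepoint in hex.  A minimal
Python sketch of the Hangul case; the function name is hypothetical
and this is not the implementation that was committed.

```python
# Sketch of algorithmic Hangul syllable names (Unicode Standard, sec. 3.12).
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S", "SS", "",
          "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
          "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
          "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
          "K", "T", "P", "H"]
S_BASE = 0xAC00
T_COUNT = len(JAMO_T)                      # 28 trailing (incl. none)
N_COUNT = len(JAMO_V) * T_COUNT            # 588 syllables per leading consonant
S_COUNT = len(JAMO_L) * N_COUNT            # 11172 precomposed syllables


def hangul_syllable_name(cp):
    """Return the Unicode name of a precomposed Hangul syllable, or None."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return None
    l, rest = divmod(s_index, N_COUNT)     # leading consonant index
    v, t = divmod(rest, T_COUNT)           # vowel and trailing indices
    return "HANGUL SYLLABLE " + JAMO_L[l] + JAMO_V[v] + JAMO_T[t]
```

Running the same arithmetic backwards (matching the name's parts
against the three tables) gives the name-to-codepoint direction.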

Revision 1.1.2.16
Sat May 2 11:54:37 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.15: +30 -13 lines
Diff to previous 1.1.2.15 , to branch point 1.1
Updates from Paul.  With these changes, we pass the Unicode
normalization test suite successfully for NFD and NFKD.

unidata.lisp:
o Implement algorithmic decomposition of Hangul.

string.lisp:
o Implement Unicode normalization forms NFD and NFKD.
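
Algorithmic Hangul decomposition uses the index arithmetic from the
Unicode Standard (section 3.12): instead of looking the syllable up in
a table, its offset from U+AC00 is split into a leading consonant, a
vowel and an optional trailing consonant, each mapped to a conjoining
jamo codepoint.  A minimal Python sketch under that definition; the
function name is hypothetical.

```python
# Sketch of algorithmic Hangul decomposition (Unicode Standard, sec. 3.12).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT        # 588
S_COUNT = L_COUNT * N_COUNT        # 11172


def decompose_hangul(cp):
    """Return the canonical jamo decomposition of CP as a list of
    codepoints, or [cp] if CP is not a precomposed Hangul syllable."""
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    # T_BASE itself means "no trailing consonant" (LV syllable).
    return [l, v] if t == T_BASE else [l, v, t]
```

Because the mapping is pure arithmetic, none of the 11172 Hangul
decompositions need to be stored in unidata.bin.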

Revision 1.1.2.15
Fri May 1 11:42:49 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
CVS Tags: unicode-snapshot-2009-05
Changes since 1.1.2.14: +5 -7 lines
Diff to previous 1.1.2.14 , to branch point 1.1
Updates from Paul:

o Fix some typos in comments.
o Change UNICODE-DECOMP to use T to get compatibility decompositions.
o Fix error in returning the compatibility decompositions.

Revision 1.1.2.14
Tue Apr 21 18:11:06 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.13: +3 -4 lines
Diff to previous 1.1.2.13 , to branch point 1.1
From Paul.

Make the decomp structure store strings instead of an array of 16-bit
integers, since the space is the same.

Revision 1.1.2.13
Tue Apr 21 17:47:31 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.12: +22 -40 lines
Diff to previous 1.1.2.12 , to branch point 1.1
code/string.lisp:
o From Paul:
  - Handle the ASCII special casing in string.lisp instead of
    unidata.lisp
  - Add utility functions CODEPOINT and SURROGATES.

code/unidata.lisp:
o Remove the ASCII special cases from UNICODE-LOWER, UNICODE-UPPER,
  UNICODE-TITLE.

Revision 1.1.2.12
Mon Apr 20 19:44:08 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.11: +40 -22 lines
Diff to previous 1.1.2.11 , to branch point 1.1
For UNICODE-LOWER, UNICODE-UPPER, and UNICODE-TITLE, add a special case
to handle ASCII without loading unidata.bin.  This handles the issue
of these functions getting called early in the init process before
unicode is set up, in, for example, STRING-DOWNCASE, which is called
when setting up search lists.

Revision 1.1.2.11
Mon Apr 20 14:06:00 2009 UTC (4 years, 11 months ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.10: +19 -18 lines
Diff to previous 1.1.2.10 , to branch point 1.1
Bug fix from Paul:

Some unnamed chars printed as "#\" because this [unicode-name+] was
returning an empty string instead of NIL.

Revision 1.1.2.10
Sun Apr 19 04:15:27 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.9: +15 -10 lines
Diff to previous 1.1.2.9 , to branch point 1.1
More updates from Paul:

	changes the order [of the unicode categories], which fixes
	some bugs, too.  Need to rebuild unidata.bin once more.

Revision 1.1.2.9
Sat Apr 18 03:54:22 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.8: +8 -3 lines
Diff to previous 1.1.2.8 , to branch point 1.1
UNICODE-NAME-TO-CODEPOINT and UNICODE-1.0-NAME-TO-CODEPOINT need to
return NIL if the name can't be found.  This fixes the issue where
(NAME-CHAR "a") didn't return NIL since "a" isn't a name.

(From Paul)

Revision 1.1.2.8
Sat Apr 18 01:34:15 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.7: +2 -1 lines
Diff to previous 1.1.2.7 , to branch point 1.1
Support printing symbols with Unicode letters that have no case like
Hangul.

code/unidata.lisp:
o Add +UNICODE-CATEGORY-OTHER+ to represent Unicode category Lo.

code/print.lisp:
o New attribute OTHERCASE-ATTRIBUTE for Unicode category Lo.
o Update ATTRIBUTE-NAMES to include new attribute.
o In SYMBOL-QUOTEP, adjust letter-attribute to include
  othercase-attribute as appropriate.
o In REINIT-CHAR-ATTRIBUTES, initialize character attributes to
  include Unicode characters with category Lo.

Revision 1.1.2.7
Thu Apr 16 17:08:30 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.6: +8 -2 lines
Diff to previous 1.1.2.6 , to branch point 1.1
o Fix bug in LOAD-SCASE:  the ntrie is 32 bits, not 16.
o Add constants for upper and lower case categories. (Primarily for
  use in char.lisp, so we don't ever have to modify char.lisp for
  this.)

Revision 1.1.2.6
Thu Apr 16 14:13:55 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.5: +142 -1 lines
Diff to previous 1.1.2.5 , to branch point 1.1
Document the dictionary and ntrie structures.  (From Paul Foley.)

Revision 1.1.2.5
Wed Apr 15 21:19:05 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.4: +134 -70 lines
Diff to previous 1.1.2.4 , to branch point 1.1
Another update from Paul:

	added combining class, bidi info, and Unicode 1.0 names -
	that's everything from the base UnicodeData.txt (and a few
	additions).

New files: BidiMirroring.txt and NormalizationCorrections.txt

Updated unidata.bin too.

Revision 1.1.2.4
Wed Apr 15 14:41:55 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.3: +78 -2 lines
Diff to previous 1.1.2.3 , to branch point 1.1
Updates from Paul:

	add numeric values and decompositions to unidata, added
	char-titlecase, and made string-capitalize use title-case
	rather than upper-case, when those are different.

The unidata.bin file needs to be rebuilt, and a cross-compile needs to
be done to support the new unidata.bin format.

Revision 1.1.2.3
Tue Apr 14 20:55:12 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.2: +232 -143 lines
Diff to previous 1.1.2.2 , to branch point 1.1
New implementation of the unidata structures from Paul.  He says he

    changed the implementation to use a three way split of the
    codepoint instead of binary search, renamed a few things, altered
    the way it encodes the general category information slightly, so
    that "Cn" (nonexistent character) turns into #x00 (was #x08), and
    fixed the case-conversion code (which ignored titlecase
    characters).

Updated unidata.bin too with the new data.
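
The "three way split of the codepoint" replaces a binary search with
three array references: the high bits pick a middle block, the middle
bits pick a leaf, and the low bits index into the leaf.  Identical
blocks are stored once, which keeps the table small because most of
the codepoint space is uniform.  A minimal Python sketch of the
technique; the 8/8/5 bit split and all names are assumptions for
illustration, and the actual ntrie layout documented in unidata.lisp
may differ.

```python
# Hypothetical three-level trie: bit widths 8/8/5 are an assumption,
# not the layout of unidata.bin.
HI_SHIFT = 13                   # bits 13..20 index the top table
MID_BITS, LO_BITS = 8, 5        # bits 5..12 pick a leaf, bits 0..4 index it
MID_MASK, LO_MASK = (1 << MID_BITS) - 1, (1 << LO_BITS) - 1


def build_trie(value_of, max_cp=0x10FFFF):
    """Tabulate value_of(cp) for cp in [0, max_cp], sharing identical blocks."""
    leaves, leaf_ids = [], {}
    mids, mid_ids = [], {}
    top = []
    for hi in range((max_cp >> HI_SHIFT) + 1):
        mid_block = []
        for mid in range(1 << MID_BITS):
            base = (hi << HI_SHIFT) | (mid << LO_BITS)
            leaf = tuple(value_of(base + lo) for lo in range(1 << LO_BITS))
            if leaf not in leaf_ids:              # store each distinct leaf once
                leaf_ids[leaf] = len(leaves)
                leaves.append(leaf)
            mid_block.append(leaf_ids[leaf])
        mid_block = tuple(mid_block)
        if mid_block not in mid_ids:              # and each distinct middle block
            mid_ids[mid_block] = len(mids)
            mids.append(mid_block)
        top.append(mid_ids[mid_block])
    return top, mids, leaves


def trie_lookup(trie, cp):
    """Three array references replace a binary search over ranges."""
    top, mids, leaves = trie
    mid_block = mids[top[cp >> HI_SHIFT]]
    return leaves[mid_block[(cp >> LO_BITS) & MID_MASK]][cp & LO_MASK]
```

Building over the full range calls value_of about 1.1 million times,
which is one reason to precompute such tables offline, as
tools/build-unidata.lisp does for unidata.bin.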

Revision 1.1.2.2
Sun Apr 12 00:55:38 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1.2.1: +9 -16 lines
Diff to previous 1.1.2.1 , to branch point 1.1
Enable the optimization settings.

Revision 1.1.2.1
Sat Apr 11 12:04:26 2009 UTC (5 years ago) by rtoy
Branch: unicode-utf16-extfmt-branch
Changes since 1.1: +299 -0 lines
Diff to previous 1.1
Import Paul's new routines for storing and accessing the Unicode
data.

i18n/NameAliases.txt:
o New file:  Unicode NameAliases

tools/build-unidata.lisp:
o New file: Reads UnicodeData.txt and NameAliases.txt and creates
  unidata.bin that is accessed by Lisp to obtain unicode information.

code/unidata.lisp:
o New file:  Lisp interface to unidata.bin

code/char.lisp:
o Updated to use the new interface

code/print.lisp:
o Can't set up the character-attributes array with full Unicode data at
  startup because the search-list isn't set up yet.  Hence, only
  initialize part of the array, and use an
  *after-save-initializations* function to fill the array with Unicode
  data after the search-list has been initialized.

compiler/srctran.lisp:
o Update deftransforms to use the new interface.

tools/make-main-dist.sh:
o Copy unidata.bin into the distribution.

tools/worldbuild.lisp:
o Load unidata.lisp

tools/worldcom.lisp:
o Compile unidata.lisp

Revision 1.1
Sat Apr 11 12:04:26 2009 UTC (5 years ago) by rtoy
Branch: MAIN
Branch point for: unicode-utf16-extfmt-branch
FILE REMOVED
file unidata.lisp was initially added on branch unicode-utf16-extfmt-branch.
