Update to Unicode 6.0.0.
code/unidata.lisp:
  o Update unicode version to 6.0.0.
  o Add pointer to build-unidata.lisp.
tools/build-unidata.lisp:
  o Update unicode version to 6.0.0.
  o Print out directory path so we can see where we're getting the data from.
i18n/CaseFolding.txt
i18n/CompositionExclusions.txt
i18n/DerivedNormalizationProps.txt
i18n/NameAliases.txt
i18n/NormalizationCorrections.txt
i18n/SpecialCasing.txt
i18n/UnicodeData.txt
i18n/WordBreakProperty.txt
i18n/tests/NormalizationTest.txt
i18n/tests/WordBreakTest.txt:
  o Update with new files from unicode.org.
Merge fix for long standing bug where the trie for Unicode 1.0 names was given the wrong split value.
Oops. Fix long standing bug where the trie for Unicode 1.0 names was given the wrong split value.
Simple refactoring: Add function to write out a dictionary and use it to write out the unicode name dictionaries.
code/unidata.lisp:
  o Just add some comments on why we don't put the dictionaries in unidata.bin.
  o Print out some messages when building the hangul and cjk dictionaries so the user knows what's happening.
tools/build-unidata.lisp:
  o Add some comments on the various parts of unidata.bin.
Add support for Unicode 5.2. The normalization and wordbreak tests pass.
code/string.lisp:
  o In %compose, handle the case where the composite character is outside the BMP and thus needs special handling for our UTF-16 strings.
code/unidata.lisp:
  o CJK Ideograph range has changed in 5.2.
  o Fix bug in build-composition-table. We were not correctly handling the case where the decomposition of a codepoint was outside the BMP. Special care is needed to handle the UTF-16 strings that we use.
  o The keys for the pairwise composition table are the full codepoints, so we need to shift one of them by 21 bits instead of 16.
tools/build-unidata.lisp:
  o Update minor version to 2.
i18n/BidiMirroring.txt
i18n/CaseFolding.txt
i18n/CompositionExclusions.txt
i18n/DerivedNormalizationProps.txt
i18n/NameAliases.txt
i18n/NormalizationCorrections.txt
i18n/SpecialCasing.txt
i18n/UnicodeData.txt
i18n/WordBreakProperty.txt
i18n/tests/NormalizationTest.txt
i18n/tests/WordBreakTest.txt:
  o Updated from Unicode 5.2.
i18n/unidata.bin:
  o Regenerated from new Unicode 5.2 files.
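The 21-bit shift follows from the codepoint range: the largest codepoint, #x10FFFF, needs 21 bits, so packing two full codepoints with a 16-bit shift lets the high bits of the second codepoint spill into the field of the first. The Python sketch below only illustrates that arithmetic under the assumption that a key is a plain integer; `composition_key`, `unpack_key`, and `bad_key` are hypothetical names, not functions from code/unidata.lisp.

```python
# The largest Unicode codepoint is #x10FFFF, which needs 21 bits.
MAX_CODEPOINT = 0x10FFFF
CODEPOINT_BITS = 21
LOW_MASK = (1 << CODEPOINT_BITS) - 1


def composition_key(first, second):
    """Pack two full codepoints into one integer key without overlap."""
    for cp in (first, second):
        if not 0 <= cp <= MAX_CODEPOINT:
            raise ValueError("not a codepoint: %#x" % cp)
    return (first << CODEPOINT_BITS) | second


def unpack_key(key):
    """Recover the (first, second) pair from a composition key."""
    return key >> CODEPOINT_BITS, key & LOW_MASK


def bad_key(first, second):
    """16-bit shift: only collision-free when both codepoints are in the BMP."""
    return (first << 16) | second


# With a 16-bit shift, a second codepoint above #xFFFF causes a collision:
print(hex(bad_key(0x0, 0x10000)), hex(bad_key(0x1, 0x10000)))  # 0x10000 0x10000
# With a 21-bit shift the keys stay distinct:
print(hex(composition_key(0x0, 0x10000)), hex(composition_key(0x1, 0x10000)))  # 0x10000 0x210000
```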
tools/build-unidata.lisp:
  o Add support for word break properties.
  o Some cleanup of the code, including moving the common code in write-ntrie* to write-ntrie.
code/unidata.lisp:
  o Add support for word break properties.
  o UNICODE-WORD-BREAK-CODE and UNICODE-WORD-BREAK return the property code and the property keyword for a codepoint, respectively.
i18n/WordBreakProperty.txt:
  o New file for the word break properties.
boot-2009-07.lisp:
  o Bootstrap file needed to compile this change (because the current shrink-vector derive-type optimizer didn't handle union types).
compiler/fndb.lisp:
  o Make the compiler warn if the result of lisp::shrink-vector is not used. Ignoring the result is a problem because the compiler doesn't know that shrink-vector destructively modifies the length of a vector. As a partial solution, warn the user if the result of shrink-vector is not used.
code/hash-new.lisp:
code/seq.lisp:
  o Make sure the result of shrink-vector is used, to get rid of a new compiler warning.
code/unidata.lisp:
  o Modify %unicode-full-case so that it doesn't use shrink-vector anymore.
compiler/seqtran.lisp:
  o Fix shrink-vector derive-type optimizer to handle union types.
tools/build-unidata.lisp:
  o Fix a typo that crept in.
  o Make sure the result of shrink-vector is used, to get rid of a new compiler warning.
Merge Unicode work to trunk. From label unicode-utf16-extfmt-2009-06-11.
Refactor WRITE-UNIDATA by moving common code that writes ntries into their own routines.
Add CaseFolding.txt to unidata.bin so we can do case-insensitive comparisons according to Unicode.
i18n/CaseFolding.txt:
  o New file.
code/unidata.lisp:
  o Add new slots to the unidata structure to hold the simple and full case-folding information.
  o Add UNICODE-CASE-FOLD-SIMPLE and UNICODE-CASE-FOLD-FULL functions to return the case-folded codepoint or string for the simple and full options, respectively.
tools/build-unidata.lisp:
  o Add new slots to the unidata structure and the ucdent structure to hold the case folding information from CaseFolding.txt.
  o Update routines to read the case folding data and to write the data to unidata.bin.
  o Speed optimization: Use a hash table whose key is the codepoint and whose value is the index into the vector. This preserves the structure of the code but vastly improves the speed of reading and processing the unicode data files, especially for the derived normalization properties. (We should just replace the vector with the hash table.)
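In CaseFolding.txt each entry carries a status field: C (common), S (simple), F (full), or T (Turkic). Simple folding uses the C and S entries and always maps one codepoint to one codepoint; full folding uses the C and F entries and may map to several codepoints, which is why one of the functions returns a codepoint and the other a string. A minimal Python sketch, assuming the T entries are ignored; `parse_case_folding`, `fold_simple`, and `fold_full` are hypothetical helpers, not the CMUCL functions, and the sample lines are written in the file's format.

```python
SAMPLE = [
    "# Sample lines in CaseFolding.txt format",
    "0041; C; 0061; # LATIN CAPITAL LETTER A",
    "00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S",
    "1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S",
    "1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S",
]


def parse_case_folding(lines):
    """Return (simple, full) dicts built from CaseFolding.txt lines."""
    simple = {}  # codepoint -> codepoint, from C and S entries
    full = {}    # codepoint -> tuple of codepoints, from C and F entries
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # blank or comment-only line
        code, status, mapping = [f.strip() for f in data.split(";")[:3]]
        cp = int(code, 16)
        target = tuple(int(x, 16) for x in mapping.split())
        if status in ("C", "S"):
            simple[cp] = target[0]
        if status in ("C", "F"):
            full[cp] = target
        # Status T (Turkic dotted/dotless i) is skipped in this sketch.
    return simple, full


def fold_simple(cp, simple):
    """Simple folding: one codepoint in, one codepoint out."""
    return simple.get(cp, cp)


def fold_full(cp, full):
    """Full folding: may expand to several codepoints, so return a string."""
    return "".join(chr(c) for c in full.get(cp, (cp,)))


simple, full = parse_case_folding(SAMPLE)
print(fold_full(0xDF, full))             # ss
print(hex(fold_simple(0xDF, simple)))    # 0xdf (no simple mapping, so unchanged)
```

Codepoints with no entry fold to themselves, so the tables only need to store the exceptions.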
tools/build-unidata.lisp:
  o Add support for reading SpecialCasing.txt to support full-casing operation. (Currently does not support language-specific cases or context dependent cases.)
  o Update some prints.
  o Add check to write-unidata to produce an error if we try to write more objects than we have allocated space for in the index table.
code/unidata.lisp:
  o Support loading the full case tables.
  o Add functions to produce the full case string for a codepoint.
tools/build-unidata.lisp:
  o Read composition exclusions from the composition exclusions file and save them in unidata.bin.
code/unidata.lisp:
  o Read composition exclusions from unidata.bin.
  o Use the exclusions from unidata.bin instead of using the hand-initialized list.
i18n/unidata.bin:
  o Updated with composition exclusions list.
Add support for quick check normalization properties. (From Paul.)
i18n/DerivedNormalizationProps.txt:
  o New file containing the normalization data we need.
tools/build-unidata.lisp:
  o Read the normalization properties and build unidata.bin to include four new tries, one each for NFC, NFKC, NFD, and NFKD.
  o Add new 1- and 2-bit tries.
code/unidata.lisp:
  o Read the new data.
  o Add new functions to return the quick check normalization data.
code/stream-vector-io.lisp:
code/stream.lisp:
  o Add support for 1-, 2-, and 4-bit vectors for stream I/O.
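One plausible reason for both 1- and 2-bit tries: NFD_QC and NFKD_QC only take the values Yes and No, while NFC_QC and NFKC_QC can also be Maybe and so need two bits. DerivedNormalizationProps.txt lists only the No and Maybe ranges, leaving Yes as the default. The Python sketch below packs such values into a byte array; `PackedVector`, `load_quick_check`, and the Y=0/N=1/M=2 encoding are assumptions for illustration, not the layout used in unidata.bin.

```python
class PackedVector:
    """Fixed-length vector of unsigned ints, `bits` (1, 2, or 4) per element."""

    def __init__(self, length, bits):
        if bits not in (1, 2, 4):
            raise ValueError("bits must be 1, 2, or 4")
        self.length = length
        self.bits = bits
        self.per_byte = 8 // bits
        self.mask = (1 << bits) - 1
        # Round up so a partially filled last byte still has room.
        self.data = bytearray((length + self.per_byte - 1) // self.per_byte)

    def _locate(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        byte, slot = divmod(i, self.per_byte)
        return byte, slot * self.bits

    def __getitem__(self, i):
        byte, shift = self._locate(i)
        return (self.data[byte] >> shift) & self.mask

    def __setitem__(self, i, value):
        if not 0 <= value <= self.mask:
            raise ValueError("value %r does not fit in %d bits" % (value, self.bits))
        byte, shift = self._locate(i)
        cleared = self.data[byte] & ~(self.mask << shift) & 0xFF
        self.data[byte] = cleared | (value << shift)


QC_CODES = {"Y": 0, "N": 1, "M": 2}  # hypothetical encoding; Y is the default


def load_quick_check(lines, prop, size=0x110000):
    """Build a packed quick-check table for one property, e.g. "NFC_QC"."""
    bits = 2 if prop in ("NFC_QC", "NFKC_QC") else 1  # Y/N/M vs. Y/N
    vec = PackedVector(size, bits)  # starts all zeros, i.e. "Y"
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) != 3 or fields[1] != prop:
            continue  # a different property, or a binary one
        lo, _, hi = fields[0].partition("..")
        for cp in range(int(lo, 16), int(hi or lo, 16) + 1):
            vec[cp] = QC_CODES[fields[2]]
    return vec


SAMPLE_QC = [
    "0340..0341    ; NFC_QC; N # sample range",
    "0300..0304    ; NFC_QC; M # sample range",
    "0344          ; NFD_QC; N # sample single codepoint",
]
nfc = load_quick_check(SAMPLE_QC, "NFC_QC", size=0x400)
print(nfc[0x41], nfc[0x300], nfc[0x340])  # 0 2 1
```

At one bit per codepoint, a full-range NFD_QC table is 0x110000 / 8 bytes before any trie compression.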
o Add constants for the magic number and the Unicode major, minor, and upgrade version to make the code slightly easier to read.
o Add optional arg to BUILD-UNIDATA to allow user to specify where the Unicode files are. (Requires updating READ-DATA and FOREACH-UCD.) This is useful when the original default directory doesn't work.
Document the magic number for the unidata.bin file.
Updates from Paul:
build-unidata.lisp:
  o Fix bug in PACK-DECOMP, which was not computing surrogate pairs correctly.
  o Fix up and add some comments.
  o Move the handling of NameAliases, NormalizationCorrections, and BidiMirroring into READ-DATA.
unidata.bin:
  o Updated to reflect corrections in PACK-DECOMP.
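For reference, UTF-16 represents a codepoint above #xFFFF as a surrogate pair: subtract #x10000, then put the top 10 bits of the 20-bit result in a high surrogate (base #xD800) and the bottom 10 bits in a low surrogate (base #xDC00). That is the computation a decomposition packer for UTF-16 strings has to get right. A short Python sketch of the arithmetic; the function names are hypothetical and this is not the PACK-DECOMP code.

```python
def to_surrogate_pair(cp):
    """Split a supplementary codepoint (#x10000..#x10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary codepoint: %#x" % cp)
    offset = cp - 0x10000             # a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Inverse of to_surrogate_pair."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def to_utf16_units(codepoints):
    """Encode a sequence of codepoints as a list of 16-bit code units."""
    units = []
    for cp in codepoints:
        if cp > 0xFFFF:
            units.extend(to_surrogate_pair(cp))
        else:
            units.append(cp)
    return units


print([hex(u) for u in to_utf16_units([0x41, 0x1D11E])])
# ['0x41', '0xd834', '0xdd1e']
```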
More updates from Paul: changes the order [of the unicode categories], which fixes some bugs, too. Need to rebuild unidata.bin once more.
Another update from Paul: added combining class, bidi info, and Unicode 1.0 names; that's everything from the base UnicodeData.txt (and a few additions). New files: BidiMirroring.txt and NormalizationCorrections.txt. Updated unidata.bin too.
Updates from Paul: add numeric values and decompositions to unidata, added char-titlecase, and made string-capitalize use title-case rather than upper-case, when those are different. The unidata.bin file needs to be rebuilt, and a cross-compile needs to be done to support the new unidata.bin format.
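The entries above pull numeric values, decompositions, title case, combining class, bidi info, and Unicode 1.0 names out of UnicodeData.txt, which has 15 semicolon-separated fields per line. The Python sketch below splits a line along the field layout documented in UAX #44 and applies its rule that an empty titlecase field means the titlecase mapping equals the uppercase mapping. `UcdEntry` and `parse_unicode_data_line` are hypothetical and are not meant to mirror the ucdent structure in build-unidata.lisp.

```python
from typing import NamedTuple, Optional


class UcdEntry(NamedTuple):
    code: int
    name: str
    category: str          # General_Category, e.g. "Lu", "Lt"
    combining_class: int   # Canonical_Combining_Class
    bidi_class: str
    decomposition: str     # raw field, e.g. "<compat> 0044 017E"
    numeric: str           # field 8, the general numeric value (may be "")
    bidi_mirrored: bool
    unicode1_name: str
    upper: Optional[int]
    lower: Optional[int]
    title: Optional[int]


def _opt_hex(field):
    return int(field, 16) if field else None


def parse_unicode_data_line(line):
    """Split one UnicodeData.txt line into its documented fields."""
    f = line.rstrip("\n").split(";")
    if len(f) != 15:
        raise ValueError("expected 15 fields, got %d" % len(f))
    upper = _opt_hex(f[12])
    return UcdEntry(
        code=int(f[0], 16),
        name=f[1],
        category=f[2],
        combining_class=int(f[3]),
        bidi_class=f[4],
        decomposition=f[5],
        numeric=f[8],
        bidi_mirrored=(f[9] == "Y"),
        unicode1_name=f[10],
        upper=upper,
        lower=_opt_hex(f[13]),
        # UAX #44: an empty titlecase field means "same as uppercase".
        title=_opt_hex(f[14]) if f[14] else upper,
    )


dz = parse_unicode_data_line(
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5"
)
print(hex(dz.upper), hex(dz.lower), hex(dz.title))  # 0x1c4 0x1c6 0x1c5
```

U+01C5 is the kind of character where title case and upper case differ, which is the distinction string-capitalize needs.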
New implementation of the unidata structures from Paul. He says he changed the implementation to use a three way split of the codepoint instead of binary search, renamed a few things, altered the way it encodes the general category information slightly, so that "Cn" (nonexistent character) turns into #x00 (was #x08), and fixed the case-conversion code (which ignored titlecase characters). Updated unidata.bin too with the new data.
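A three-way split turns each lookup into three array indexes instead of a binary search over ranges: the codepoint is cut into high, middle, and low bit fields, and identical blocks are stored once so the tables stay small. The log does not say how many bits each field gets, so the 7/7/7 split below is an assumption, as are the function names; this Python sketch shows the general technique, not the unidata.bin format.

```python
# Hypothetical split of a 21-bit codepoint into three 7-bit fields.
MID_BITS = LO_BITS = 7
MID_MASK = (1 << MID_BITS) - 1
LO_MASK = (1 << LO_BITS) - 1
TOP_SIZE = (0x10FFFF >> (MID_BITS + LO_BITS)) + 1  # top entries to cover all codepoints


def build_trie(values, default=0):
    """Build (top, mids, leaves) from a dict mapping codepoint -> value.

    Identical leaf blocks and identical middle blocks are stored once,
    which keeps the tables small for sparse Unicode data.
    """
    leaves, leaf_ids = [], {}
    mids, mid_ids = [], {}
    top = []
    for hi in range(TOP_SIZE):
        mid_block = []
        for mid in range(1 << MID_BITS):
            base = (hi << (MID_BITS + LO_BITS)) | (mid << LO_BITS)
            leaf = tuple(values.get(base + lo, default) for lo in range(1 << LO_BITS))
            if leaf not in leaf_ids:
                leaf_ids[leaf] = len(leaves)
                leaves.append(leaf)
            mid_block.append(leaf_ids[leaf])
        mid_block = tuple(mid_block)
        if mid_block not in mid_ids:
            mid_ids[mid_block] = len(mids)
            mids.append(mid_block)
        top.append(mid_ids[mid_block])
    return top, mids, leaves


def trie_lookup(trie, cp):
    """Three array indexes, no search: high, middle, then low bits of cp."""
    top, mids, leaves = trie
    hi = cp >> (MID_BITS + LO_BITS)
    mid = (cp >> LO_BITS) & MID_MASK
    lo = cp & LO_MASK
    return leaves[mids[top[hi]][mid]][lo]


trie = build_trie({0x41: 1, 0x1F600: 2, 0x10FFFF: 3})
print(trie_lookup(trie, 0x41), trie_lookup(trie, 0x42), trie_lookup(trie, 0x1F600))  # 1 0 2
print(len(trie[1]), len(trie[2]))  # 4 4
```

A binary search over sorted ranges costs O(log n) comparisons per lookup; the trie costs a fixed three indexes, paid for by building the tables once up front.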
Import Paul's new routines for storing and accessing the Unicode data.
i18n/NameAliases.txt:
  o New file: Unicode NameAliases.
tools/build-unidata.lisp:
  o New file: Reads UnicodeData.txt and NameAliases.txt and creates unidata.bin, which is accessed by Lisp to obtain Unicode information.
code/unidata.lisp:
  o New file: Lisp interface to unidata.bin.
code/char.lisp:
  o Updated to use the new interface.
code/print.lisp:
  o Can't set up the character-attributes array with full Unicode data at startup because the search-list isn't set up yet. Hence, only initialize part of the array, and use an *after-save-initializations* function to fill the array with Unicode data after the search-list has been initialized.
compiler/srctran.lisp:
  o Update deftransforms to use the new interface.
tools/make-main-dist.sh:
  o Copy unidata.bin into the distribution.
tools/worldbuild.lisp:
  o Load unidata.lisp.
tools/worldcom.lisp:
  o Compile unidata.lisp.
file build-unidata.lisp was initially added on branch unicode-utf16-extfmt-branch.