Add trailing newline.
Some changes to replace calls to gettext with _"" or _N"" for things compiled with and without Unicode. This is needed so that the pot files have the same content for both unicode and non-unicode builds. (The _"" and _N"" are handled by the reader, so things that are conditionalized out still get processed, unlike using gettext.)
o Inhibit warnings from SURROGATEP; I'm tired of seeing the code deletion notes now. o Tell the compiler what type the first return value of CODEPOINT is. Apparently, the compiler can't figure that out itself.
Add support for Unicode 5.2. The normalization and wordbreak tests pass.
code/string.lisp:
  o In %compose, handle the case where the composite character is outside the BMP and thus needs special handling for our UTF-16 strings.
code/unidata.lisp:
  o CJK Ideograph range has changed in 5.2.
  o Fix bug in build-composition-table. We were not correctly handling the case where the decomposition of a codepoint was outside the BMP. Special care is needed to handle the UTF-16 strings that we use.
  o The keys for the pairwise composition table are the full codepoints, so we need to shift one by 21 bits instead of 16.
tools/build-unidata.lisp:
  o Update minor version to 2.
i18n/BidiMirroring.txt
i18n/CaseFolding.txt
i18n/CompositionExclusions.txt
i18n/DerivedNormalizationProps.txt
i18n/NameAliases.txt
i18n/NormalizationCorrections.txt
i18n/SpecialCasing.txt
i18n/UnicodeData.txt
i18n/WordBreakProperty.txt
i18n/tests/NormalizationTest.txt
i18n/tests/WordBreakTest.txt:
  o Updated from Unicode 5.2.
i18n/unidata.bin:
  o Regenerated from new Unicode 5.2 files.
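The 21-bit shift mentioned in this entry can be illustrated with a small Python sketch. It is not the CMUCL code: the helper name composition_key and the use of bitwise OR to pack the pair are assumptions made for illustration. It only shows why a 16-bit shift is too narrow once either codepoint lies outside the BMP.

```python
# Illustrative sketch only; composition_key is a hypothetical helper,
# not a function from unidata.lisp.
MAX_CODEPOINT = 0x10FFFF  # largest Unicode codepoint; fits in 21 bits


def composition_key(first, second, shift=21):
    """Pack two codepoints into one integer key."""
    return (first << shift) | second


# A pair of codepoints outside the BMP, used only as sample input.
a, b = 0x11099, 0x110BA

# With a 16-bit shift, bit 16 of `second` overlaps the low bit of `first`,
# so (a, b) and (a, b & 0xFFFF) produce the same key.
collides = composition_key(a, b, shift=16) == composition_key(a, b & 0xFFFF, shift=16)

# With a 21-bit shift the two fields never overlap.
distinct = composition_key(a, b) != composition_key(a, b & 0xFFFF)

print(collides, distinct)
```

Any shift of at least 21 bits would work, since every codepoint is below 2^21.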
Add function to convert a sequence of codepoints to a string and a function to convert a string to a list of codepoints.
Change uses of _"foo" to (intl:gettext "foo"). This is because slime may get confused with source locations if the reader macros are installed.
Remove _N"" reader macro from docstrings when possible.
Merge intl-branch 2010-03-18 to HEAD. To build, you need to use boot-2010-02-1 as the bootstrap file. You should probably also use the new -P option for build.sh to generate and update the po files while building.
Restart internationalization work. This new branch starts with code from the intl-branch on date 2010-02-12 18:00:00+0500. This version works and LANG=en@piglatin bin/lisp works (once the piglatin translation is added).
Mark translatable strings; update cmucl.pot and ko/cmucl.po accordingly.
Add (intl:textdomain "cmucl") to the files to set the textdomain.
Merge changes from unicode-string-buffer-impl-branch which gives faster reads on external-formats. This is done by adding an additional buffer to streams so we can convert the entire in-buffer into characters all at once.
To build this change, you need to do a cross-compile using boot-2009-10-1-cross.lisp. Using that build, do a normal build with these sources. For a non-unicode build use boot-2009-10-01.lisp with a 20a non-unicode build.
code/extfmts.lisp:
  o Add another slot to the extfmts for copying the state.
  o Modify EF-OCTETS-TO-STRING and OCTETS-TO-STRING to support the necessary changes for fast formats. This is incompatible with the previous version because the string is not grown if needed.
code/fd-stream-extfmt.lisp:
  o Set *enable-stream-buffer-p* to T so we have fast streams.
code/fd-stream.lisp:
  o Add new slots to support fast streams.
  o In SET-ROUTINES, initialize the new slots appropriately.
  o Update UNREAD-CHAR to be able to back up in the string buffer to unread.
  o Add implementation to copy the state of an external format.
code/stream.lisp:
  o Change %SET-FD-STREAM-EXTERNAL-FORMAT to be able to change formats even if we've already converted the buffer with a different format. We reconvert the buffer with the old format until we reach the current character. Then the remaining octets are converted using the new format and stored in the string buffer.
  o Add FAST-READ-CHAR-STRING-REFILL to refill the string buffer, like FAST-READ-CHAR-REFILL does for the octet in-buffer.
code/struct.lisp:
  o Add new slots to hold the string buffer, the current index, and length. These are needed for the fast formats.
code/sysmacs.lisp:
  o Update PREPARE-FOR-FAST-READ-CHAR, DONE-WITH-FAST-READ-CHAR, and FAST-READ-CHAR to support the string buffer.
code/string.lisp:
  o Microoptimization of SURROGATEP to reduce the number of branches.
general-info/release-20b.txt:
  o Update with these changes
pcl/simple-streams/external-formats/utf-16-be.lisp:
pcl/simple-streams/external-formats/utf-16-le.lisp:
pcl/simple-streams/external-formats/utf-16.lisp:
  o These formats actually have state, so update them to take and handle an initial state. These are needed if the string buffer ends with a leading surrogate and the next string buffer starts with a trailing surrogate. The conversion needs to combine the surrogates together.
Minor optimization for SURROGATEP to have fewer branches in the generated code.
Oops. Remove old code that didn't support our UTF-16 strings.
Add support for the Unicode word break algorithm for STRING-CAPITALIZE. Not sure about the appropriate interface, though.
code/string.lisp:
  o Add Unicode word break algorithm. Based on Scheme code by William Clinger. Used with permission.
  o Update STRING-CAPITALIZE to take another keyword arg to indicate if we should use the Unicode word break algorithm. Default is not to use the Unicode algorithm.
compiler/fndb.lisp:
  o Update defknown for string-capitalize.
i18n/tests/WordBreakTest.txt:
  o New test file for the word break algorithm
i18n/tests/word-break-test.lisp:
  o New file to run the word break test.
Use more descriptive argument names for SURROGATEP and SURROGATES-TO-CODEPOINT.
Oops. utf16-string-p was returning NIL if the codepoint was assigned. It should return NIL if the codepoint is NOT assigned.
Clean up a few compiler warnings about unused variables.
Cleanups for non-unicode build.
code/stream.lisp:
  o Only define (setf stream-external-format) for Unicode builds.
  o In stream-external-format, don't try to look up the external format from the fd-stream structure, which doesn't exist in non-unicode builds.
code/strings.lisp:
  o Conditionalize out things that will only work if unicode is available.
tools/worldcom.lisp:
  o Only compile fd-stream-extfmt for unicode builds.
code/string.lisp:
  o Only define STRING-TO-NFD, STRING-TO-NFKD, and STRING-TO-NFKC for Unicode builds. Conditionalize out their support functions too.
  o Update export list to be conditional on Unicode too.
  o Use new name for get-pairwise-composition.
code/exports.lisp:
  o Update export list to be conditional on Unicode for above changes in string.lisp.
code/unidata.lisp:
  o Change name from GET-PAIRWISE-COMPOSITION to UNICODE-PAIRWISE-COMPOSITION to match other Unicode function names.
Merge Unicode work to trunk. From label unicode-utf16-extfmt-2009-06-11.
Fix typos in string-case-fold.
Revert previous change that added case folding to string-equal and friends. We can't really do that for a couple of reasons:
  - Case folding should be done on the NFD form according to the Unicode spec
  - Full case folding may change the length of the string so it's not clear what the return value from string-lessp and friends should be.
Instead, we provide a new function, STRING-CASE-FOLD, to perform case folding.
code/char.lisp:
  o Use lowercase for case-insensitive comparisons again.
code/string.lisp:
  o Remove :casing option for string-lessp and friends.
  o Remove code needed to support :casing option.
  o Add STRING-CASE-FOLD to perform case folding operation.
compiler/fndb.lisp:
  o Remove :casing option from defknowns.
o Only recognize :simple and :full for the casing parameter. o Update docstrings to mention the casing parameter.
code/char.lisp:
  o Use simple case folding for case-insensitive character comparisons.
code/string.lisp:
  o Add new :CASING parameter for STRING-EQUAL and friends to allow for simple or full case folding. Default is :SIMPLE.
  o Update code to allow for simple or full case folding.
compiler/fndb.lisp:
  o Tell compiler about new :CASING parameter.
o Update STRING-TRIM, STRING-LEFT-TRIM, and STRING-RIGHT-TRIM to handle surrogates in the string. If the character bag is a string, we properly handle surrogates in the character bag. o Document that STRING-TO-NFC and STRING-TO-NFKC can both return the original string untouched.
First cut at full-casing support.
code/string.lisp:
  o Add :CASING parameter to STRING-UPCASE, STRING-DOWNCASE, and STRING-CAPITALIZE to select whether :SIMPLE or :FULL casing is done. Default is :SIMPLE.
  o Implement full casing for upcase, downcase, and capitalize.
compiler/fndb.lisp:
  o Tell compiler about the extra parameter.
code/unidata.lisp: o Add UNICODE-ASSIGNED-CODEPOINT-P code/string.lisp: o Make UTF16-STRING-P check for unassigned codepoints in the string.
Slightly modify DECOMPOSE so it can operate on non simple strings.
code/char.lisp:
  o Define CODEPOINT-LIMIT
  o Define CODEPOINT type
code/extfmts.lisp
code/string.lisp
code/unidata.lisp
pcl/simple-streams/external-formats/utf-32.lisp
pcl/simple-streams/external-formats/utf-8.lisp:
  o Use the CODEPOINT type in declarations.
code/seq.lisp:
  o Moved STRING-REVERSE* and STRING-NREVERSE* to string.lisp because we need to use WITH-STRING.
code/string.lisp:
  o Fix STRING-REVERSE* and STRING-NREVERSE* which were not properly handling non-simple strings. The following tests were not returning "edcba":

      (let* ((x (make-array 10
                            :initial-contents "abcdefghij"
                            :fill-pointer 5
                            :element-type 'base-char))
             (y (reverse x)))
        y)

      (let* ((x (make-array 10
                            :initial-contents "abcdefghij"
                            :fill-pointer 5
                            :element-type 'character))
             (y (nreverse x)))
        y)
o Revert previous change to STRING-TO-NFC and STRING-TO-NFKC. o Use WITH-STRING in NORMALIZED-FORM-P so we operate on the underlying simple-string data.
NORMALIZED-FORM-P needs simple-strings. We should do this in a different way, but this will do for now.
code/string.lisp:
  o Add function (setf codepoint)
  o Add docstrings for STRING-TO-NFC and STRING-TO-NFKC.
  o Move things related to pairwise composition to unidata.lisp.
code/unidata.lisp:
  o Things related to pairwise composition moved here.
  o Adjust *COMPOSITION-EXCLUSION* to include only the non-commented items in CompositionExclusions.txt.
  o Make BUILD-COMPOSITION-TABLE exclude characters that can be derived from the decomposition. (Basically, ignore the four decompositions of length greater than 1 that start with a non-zero combining class.)
Add support for Unicode NFC and NFKC forms. Implement STRING-TO-NFC and STRING-TO-NFKC. This probably needs some more work. The composition table should probably be a trie and should be in unidata.bin instead of the hash table that we use now. The composition exclusion list should probably be in unidata.bin too instead of here. These functions pass all of the normalization tests.
Fix bug in DECOMPOSE which was no longer sorting the combining characters in combining-category order. We now pass the NFD and NFKD normalization tests again. (Fix from Paul)
string.lisp:
  o Add SURROGATEP function to test if something is a surrogate value.
extfmts.lisp:
utf-16-be.lisp:
utf-16-le.lisp:
utf-16.lisp:
utf-32-be.lisp:
utf-32-le.lisp:
utf-32.lisp:
utf-8.lisp:
  o Use SURROGATEP.
Do case-insensitive comparison by converting to lower case instead of upper case. This is what Unicode CaseFolding.txt does. One example of where it matters: U+1E9E is mapped to lower-case U+DF, but the upper-case version of U+DF is U+DF.
char.lisp:
  o Change EQUAL-CHAR-CODE to convert to lowercase.
string.lisp:
  o Change EQUAL-CHAR-CODEPOINT to convert to lowercase.
  o Fix mistake in STRING-LESS-GREATER-EQUAL which was incorrectly comparing the codepoints instead of the equal-char-codepoint values.
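The U+1E9E example in this entry can be reproduced with a short Python sketch. This is an illustration, not the CMUCL code, and the two helper names are made up for it. One caveat: Python's str.upper applies full case mappings, so it turns U+DF into "SS" rather than leaving it as U+DF as the simple mapping described in the entry does. Either way, the upper-case comparison fails to match the pair while the lower-case comparison succeeds.

```python
# Illustrative sketch only; helper names are hypothetical.
capital_sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S
small_sharp_s = "\u00df"    # LATIN SMALL LETTER SHARP S


def equal_by_lower(x, y):
    """Case-insensitive comparison via lower case."""
    return x.lower() == y.lower()


def equal_by_upper(x, y):
    """Case-insensitive comparison via upper case."""
    return x.upper() == y.upper()


print(equal_by_lower(capital_sharp_s, small_sharp_s))
print(equal_by_upper(capital_sharp_s, small_sharp_s))
```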
Add UTF16-STRING-P to determine if a string is a valid UTF-16 encoded string.
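As a rough illustration of what a UTF-16 validity check involves, here is a Python sketch over a list of 16-bit code units. It is an assumption about the general technique, not a transcription of UTF16-STRING-P: it only verifies that every leading surrogate is followed by a trailing surrogate and that no trailing surrogate appears on its own. All names are hypothetical.

```python
# Illustrative sketch only; names are hypothetical, not from string.lisp.
def is_lead(unit):
    """True for a leading (high) surrogate code unit."""
    return 0xD800 <= unit <= 0xDBFF


def is_trail(unit):
    """True for a trailing (low) surrogate code unit."""
    return 0xDC00 <= unit <= 0xDFFF


def utf16_valid(units):
    """Check that surrogates in a sequence of code units are well paired."""
    i = 0
    while i < len(units):
        unit = units[i]
        if is_lead(unit):
            # A lead surrogate must be followed by a trail surrogate.
            if i + 1 >= len(units) or not is_trail(units[i + 1]):
                return False
            i += 2
        elif is_trail(unit):
            # A trail surrogate with no preceding lead is invalid.
            return False
        else:
            i += 1
    return True
```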
STRING-LESS-GREATER-EQUAL handles codepoints so STRING-LESSP and friends now sort in codepoint order (after converting to uppercase).
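Why codepoint order matters for UTF-16 strings can be shown with a small Python sketch (an illustration with hypothetical helper names, not the CMUCL code). Comparing raw 16-bit code units sorts any surrogate pair before BMP characters in the range U+E000 to U+FFFF, which disagrees with codepoint order.

```python
# Illustrative sketch only; helper names are hypothetical.
def utf16_units(s):
    """Return the UTF-16 code units of a Python string as integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def codepoint_less(a, b):
    """Compare by full codepoints."""
    return [ord(c) for c in a] < [ord(c) for c in b]


def code_unit_less(a, b):
    """Compare by raw UTF-16 code units."""
    return utf16_units(a) < utf16_units(b)


bmp_char = "\uffff"         # last BMP codepoint
astral_char = "\U00010000"  # first codepoint outside the BMP

print(codepoint_less(bmp_char, astral_char))
print(code_unit_less(bmp_char, astral_char))
```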
o Lots of spelling fixes from Paul. o Add unicode codepoints in final-sigma.lisp (in case the characters there don't show up correctly). o Support partial-fill in READ-INTO-STRING.
Simple docstrings for STRING-TO-NFD and STRING-TO-NFKD.
From Paul: Package and symbols names in Unicode need to be in a canonical normalization form (eventually...when NFC is implemented)
From Paul. Use CODEPOINT in %GLYPH-B.
Updates from Paul. o Use CODEPOINT instead of XCHAR in %GLYPH-F o Simplify DECOMPOSE
Updates from Paul. With these changes, we pass the Unicode normalization test suite successfully for NFD and NFKD.
unidata.lisp:
  o Implement algorithmic decomposition of Hangul.
string.lisp:
  o Implement Unicode normalization forms NFD and NFKD.
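For readers unfamiliar with algorithmic Hangul decomposition, the arithmetic defined by the Unicode Standard can be sketched in Python as follows. The constants come from the standard; the function name is hypothetical, and this is not the code in unidata.lisp.

```python
# Illustrative sketch of the Unicode Standard's Hangul syllable arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total


def decompose_hangul(cp):
    """Return the jamo codepoints for a precomposed Hangul syllable.

    Codepoints outside the syllable block are returned unchanged.
    """
    s_index = cp - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [cp]
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trail = T_BASE + s_index % T_COUNT
    # T_BASE itself means "no trailing consonant".
    return [lead, vowel] if trail == T_BASE else [lead, vowel, trail]
```

Because the mapping is pure arithmetic, no per-syllable table entries are needed for the 11172 syllables.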
string.lisp:
  o Add Paul's SURROGATES-TO-CODEPOINT and remove CODEPOINT-FROM-SURROGATES.
  o Change SURROGATES to return characters, not numbers.
  o Update callers of SURROGATES to match.
extfmts.lisp:
  o Update callers of SURROGATES to match.
  o Use CODEPOINT to extract the correct codepoint from a string in EF-STRING-TO-OCTETS and EF-OCTETS-TO-STRING.
o Add new function CODEPOINT-FROM-SURROGATES to compute the codepoint from two surrogate values. (Should we use a better name?) o Use the new function in CODEPOINT. o Add docstrings to the functions.
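The computation of a codepoint from two surrogate values follows the standard UTF-16 arithmetic, sketched here in Python. The function names are hypothetical stand-ins and the code is an illustration, not the Lisp implementation.

```python
# Illustrative sketch of standard UTF-16 surrogate arithmetic.
def is_surrogate(unit):
    """True if the 16-bit value lies in the surrogate range."""
    return 0xD800 <= unit <= 0xDFFF


def surrogates_to_codepoint(lead, trail):
    """Combine a leading and a trailing surrogate into one codepoint."""
    return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)


def codepoint_to_surrogates(cp):
    """Split a codepoint above U+FFFF into (lead, trail) surrogates."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)
```

Each surrogate carries 10 bits of the offset from U+10000, which is why the pair covers exactly the range U+10000 to U+10FFFF.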
code/string.lisp:
  o From Paul:
    - Handle the ASCII special casing in string.lisp instead of unidata.lisp
    - Add utility functions CODEPOINT and SURROGATES.
code/unidata.lisp:
  o Remove the ASCII special cases from UNICODE-LOWER, UNICODE-UPPER, UNICODE-TITLE.
NSTRING-UPCASE and NSTRING-DOWNCASE were referencing the unknown symbols NEWSTRING and NEW-INDEX. Replace with STRING and INDEX, respectively. I think that's what was intended.
From Paul: Here's a version of [n]string-(up|down)case that handles non-BMP characters. Also added functionless stubs for normalization forms. Improved string-reverse* and implemented string-nreverse* in a way that shouldn't cons (not the original way I worked out, which might be faster but is quite complicated). (The glyph builder now stops when it hits a combining character that's out of sequence (canonical order)---I'm not sure whether or not that's the Right Thing to do)
More updates from Paul.
code/seq.lisp:
  o Update SEQ-DISPATCH to allow a special dispatch form for strings.
  o Implement STRING-REVERSE* that correctly handles our UTF-16 strings.
  o Implement STRING-NREVERSE*, but this needs work to reduce consing.
code/string.lisp:
  o Add GLYPH and SGLYPH to return the glyph from a position in a string.
code/exports.lisp:
  o Export GLYPH and SGLYPH
Updates from Paul: add numeric values and decompositions to unidata, added char-titlecase, and made string-capitalize use title-case rather than upper-case, when those are different. The unidata.bin file needs to be rebuilt, and a cross-compile needs to be done to support the new unidata.bin format.
Merge from unicode-utf16 branch, label unicode-utf16-char-support-2009-03-25 to get character support.
Compare strings using code-point order, since we don't have any kind of collation support.
Instead of ignoring the :element-type argument to MAKE-STRING, we check that it's a valid subtype of character (then ignore it).
Checked in Brian Spilsbury's experimental Unicode, locales, and dialect support patchset. This lives on its own branch, so that people can play with it and tweak it, without disturbing 18e release engineering on the main branch. Bootstrapping has only been tried on LINKAGE_TABLE x86/Linux builds. A working cross-compile script is checked in under bootfiles/19a/boot1-cross-unicode.lisp. The script still leaves you with some interactive errors, on the cross compile, which you should answer with 2. See the mailing list for more information.
Mega commit to bring RELENG_18 branch in sync with HEAD in preparation for release tagging 18d.
From Eric Marsden: Fix some error types to be ANSI compliant.
A few well placed inhibit-warnings declarations to suppress noise in compile-lisp.log. Only 46/130 notes left.
This (huge) revision brings the RELENG_18 branch up to the current HEAD. Note code/unix-glib2.lisp not yet included -- not sure it is ready to go.
ANSI CL compat. changes: o Add an optional environment argument to constantp; ignored by CMUCL. o Add the :element-type keyword to make-string.
Merged DTC's patch to string<>=*-body which fixes various problems that arose when :start2 :end2 values were specified.
Fix header boilerplate.
Removed an extra ``)''.
Changed STRING-xxxCASE to not assign arguments.
Changed the WITH-xxx-STRINGs macros to use simply WITH-ARRAY-DATA, now that it is more clever. Also, changed it to accept any STRINGable thing, instead of just strings and symbols. These macros now bind the offset var instead of randomly setting it.
New file header with RCS header FILE-COMMENT.
Moved MIPS branch onto trunk; no merge necessary.
Fixed with-mumble-string(s) macros to not say (the simple-string foo) when foo isn't a simple-string.
Initial MIPS version.