Embeddable Common-Lisp

Experimenting: Windows console

Written by jjgarcia on 2012-05

Not long ago a user reported problems with ECL when running under Windows with the Chinese codepage 936. I was really surprised to verify that ECL was indeed aborting on any multibyte character, but only when the character was typed in a terminal window (the so-called Windows console).

A bit of investigation revealed that the Microsoft C runtime (CRT) is quite broken (http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/), or at least not very POSIX/ANSI conformant. Roughly, some versions of the runtime behave differently when the standard input or output of a program is connected to a console. In that case they try to be very clever and translate the Unicode input/output handled by the console to and from the selected codepage. As a side effect, the CRT rejects multibyte characters unless read() is given a buffer large enough to hold a whole character. For instance, if a character encoded as the bytes F7 8C is typed at the console, read(0, buffer, 1) will abort instead of returning F7. Put another way: input cannot be read one byte at a time.
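To illustrate the behaviour one would expect from byte-at-a-time reads, here is a minimal Python sketch. It is not ECL's code: it assumes Python's `cp936` codec as a stand-in for the console codepage, and `decode_bytewise` is a hypothetical helper name. The incremental decoder keeps the lead byte of a multibyte character and only emits the character once the trail byte arrives.

```python
import codecs


def decode_bytewise(data, encoding="cp936"):
    """Decode `data` one byte at a time and return the characters in order."""
    decoder = codecs.getincrementaldecoder(encoding)()
    chars = []
    for i in range(len(data)):
        # Feed exactly one byte; an incomplete multibyte sequence yields "".
        piece = decoder.decode(data[i:i + 1])
        if piece:
            chars.append(piece)
    # final=True makes the decoder complain if the input stopped mid-character.
    decoder.decode(b"", final=True)
    return chars
```

For the example in the text one would call `decode_bytewise(b"\xf7\x8c")`: the first byte alone produces nothing, and the character only appears after the second byte has been fed.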

All these problems disappear when the user forgets about POSIX/ANSI file descriptors and works with the Windows API directly. Indeed, there are MSDN blogs out there asking you to do so... And so we did.

As of now, ECL incorporates new ANSI streams for handling the Windows console. Currently they cannot be created by the user; ECL activates them whenever the input or output of a program is connected to a console. The stream is interactive, and read, write, listen, etc. all work as expected.
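The selection step described above can be sketched as follows. This is only an illustration in Python, not how ECL does it internally: `stream_kind` is a hypothetical helper, and it relies on `isatty()`, whereas a native Windows implementation would more likely query the console handle through the Windows API.

```python
import io
import sys


def stream_kind(stream):
    """Return "console" for an interactive terminal stream, else "file"."""
    try:
        return "console" if stream.isatty() else "file"
    except (AttributeError, ValueError):
        # No isatty() method, or the stream has already been closed.
        return "file"


if __name__ == "__main__":
    # The result for sys.stdin depends on how the program was launched.
    print(stream_kind(sys.stdin))
    # An in-memory buffer is never a console.
    print(stream_kind(io.BytesIO(b"data")))
```

The point of the design is that the decision is made once, when the stream is set up, so redirected input and output keep going through the ordinary file path.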

This has the added benefit that ECL now detects which codepage the console uses and activates the matching external format by default. You are free to correct it with (SETF STREAM-EXTERNAL-FORMAT), if I recall correctly.
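As a rough sketch of the idea of mapping a console codepage to an encoding, here is a Python version. The helper names `external_format_for_codepage` and `console_codepage` are hypothetical, and the `cp<number>` naming is Python's codec convention, not ECL's external-format names. The Windows call used is `GetConsoleCP` from kernel32, reached through `ctypes`.

```python
import codecs
import sys


def external_format_for_codepage(codepage, default="utf-8"):
    """Map a Windows codepage number to a Python codec name."""
    try:
        return codecs.lookup("cp%d" % codepage).name
    except LookupError:
        # Python has no codec for this codepage: use the caller's default.
        return default


def console_codepage():
    """Return the console input codepage on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported lazily; windll only exists on Windows
    return ctypes.windll.kernel32.GetConsoleCP()
```

With codepage 936 the lookup resolves to Python's `gbk` codec. The explicit `default` plays the role of the manual override mentioned above: the detected value is only a starting point.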

I would like to ask people working with Windows to test and help debug this new feature.