Question about UIOP:ENSURE-DIRECTORY-PATHNAME

Richard M Kreuter kreuter at progn.net
Fri Jul 7 02:11:06 UTC 2023


Robert Goldman <rpgoldman at sift.net> wrote:

> While addressing ASDF issue #140...

Hi Robert,

I don't know if you ever saw my writeup about this operation (sometimes
called by its LispM name, PATHNAME-AS-DIRECTORY) on sbcl-devel, but here
you go:

https://sourceforge.net/p/sbcl/mailman/message/37699633/

The root of the problem in ASDF issue #140 is that CMUCL-descended
pathname implementations expose a subtle, low-prevalence (and IMO
pointless [*]) trap for users: string-valued pathname components, and
string-valued arguments to MAKE-PATHNAME, are in "native" syntax, not
namestring syntax:

;; on Unix, where #\\ is the escape character
* (setq p (pathname "a\\\\b"))
#P"a\\\\b"
* (pathname-name p)
"a\\b"
* (file-namestring p)
"a\\\\b"

The "native" syntax is the one you must use as arguments to
MAKE-PATHNAME, but the namestring syntax is the one you're implicitly
using if you pass strings around as pathname designators, and that
you explicitly receive from namestring functions.
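
To make the distinction concrete, here is a small sketch of the same
thing seen from the MAKE-PATHNAME side. It assumes SBCL on Unix; other
implementations may well behave differently, and the variable name is
only for illustration.

```lisp
;; Sketch only; assumes SBCL on Unix. Other implementations may differ.
;; MAKE-PATHNAME takes its string arguments in "native" syntax, so the
;; asterisk here is a literal character in the name, not a wildcard.
(defvar *literal* (make-pathname :name "a*"))

;; The component comes back unchanged, in native syntax.
(pathname-name *literal*)

;; The namestring functions return namestring syntax, where the
;; asterisk is expected to appear escaped with a backslash.
(namestring *literal*)

;; Not wild, because the asterisk was given in native syntax.
(wild-pathname-p *literal*)

;; Parsing the same characters as a namestring gives a wild pathname.
(wild-pathname-p (pathname "a*"))
```

So the same two characters mean different things depending on whether
they arrive through MAKE-PATHNAME or through the namestring parser.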

IOW, juggling pathname components (like ENSURE-DIRECTORY-PATHNAME does)
or concatenating strings to compose portions of file specifications is
formally "unsound" around the edge cases unless you've considered the
"provenance" or "intended syntax" of each string. In particular, using
the result of FILE-NAMESTRING as an element of the directory list to
MAKE-PATHNAME is unsound.
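
A minimal sketch of that unsound pattern, again assuming SBCL on Unix.
NAIVE-AS-DIRECTORY is a hypothetical helper written for this example,
not the actual UIOP or SB-COVER code.

```lisp
;; Sketch only; assumes SBCL on Unix. NAIVE-AS-DIRECTORY is a
;; hypothetical helper, not ASDF's or UIOP's code.
(defun naive-as-directory (pathname)
  "Unsound: puts a namestring-syntax string into the directory list."
  (make-pathname
   :directory (append (or (pathname-directory pathname) '(:relative))
                      ;; FILE-NAMESTRING returns namestring syntax, but
                      ;; MAKE-PATHNAME reads this string as native syntax.
                      (list (file-namestring pathname)))
   :name nil :type nil :version nil
   :defaults pathname))

;; The file name here is the three characters a, backslash, b.
(defvar *file* (pathname "a\\\\b"))

;; The last directory component is expected to keep the escape
;; character from the namestring, so it holds two backslashes and
;; names a different directory than the file name suggested.
(car (last (pathname-directory (naive-as-directory *file*))))
```

For ordinary file names the two syntaxes coincide, which is why this
kind of code appears to work until a name contains the escape character
or a wildcard character.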

Note: although I'm saying that certain things are notionally unsound,
filenames containing an asterisk, question mark, left bracket, or
backslash are of course exceedingly rare on Unix. So the problem I'm
describing here doesn't occur very often in practice. On the
other hand, the very rareness of such things probably makes it /more
likely/ that folks have low-prevalence, unknown-severity bugs lurking in
their programs. :-(

Anyhow, in the sbcl-devel message I've linked to above, I dissected a
few further semantic issues in the rendition of this operation in
SB-COVER; some of those issues might also be relevant to
ENSURE-DIRECTORY-PATHNAME, though I'm not sure.

Regards,
Richard

[*] The underlying issue is that Unix has no wildcard syntax, and so no
need for escape syntax either. So somebody at CMU circa 1990 had to
decide how to represent

  (pathname-name "a*")    ;; wild, by custom
  (pathname-name "a\\*")  ;; the not-wild analogue of the preceding
  (pathname-name "a\\\\") ;; not wild, a logical result given the preceding

They could've just said "a pathname's component strings are subsequences
of the namestring" and been done with it, but instead they decided

  (pathname-name "a*")
  => #<PATTERN "a" :MULTI-CHAR-WILD>
  (pathname-name "a\\*")
  => "a*"
  (pathname-name "a\\\\")
  => "a\\"

I imagine that these representations were chosen to speed up
manufacturing filenames for system calls: you can concatenate
components' strings without examining those strings' elements.

If that was the rationale, that use case could have been handled
differently: had they stored subsequences of namestrings, they could
have stored 3 bits somewhere in the pathname indicating whether the
name, type and directory contained strings that needed "examination"
(i.e., either they contain wildcard characters or the escape character)
when making a filename. This would have allowed for all reasonable
filenames to be composed by simple concatenation, without exposing two
almost-always-equivalent but formally subtly incompatible string
syntaxes to users.

(Also, PATTERNs are pointless, space-wasting, inefficient, user-hostile
nonsense. But that's a different story.)



More information about the asdf-devel mailing list