ViewVC logotype

Contents of /meta-cvs/F-54B5FF01DC6392F28A104A8A58761CB6

Parent Directory Parent Directory | Revision Log Revision Log

Revision 1.12 - (show annotations)
Wed Feb 4 07:49:07 2004 UTC (10 years, 2 months ago) by kaz
Branch: MAIN
CVS Tags: asdf-import-branch~merged-to-HEAD-0, mcvs-1-1-98, asdf-import-branch~branch-point, mcvs-1-1-0, HEAD
Branch point for: asdf-import-branch
Changes since 1.11: +13 -0 lines
Merging from mcvs-1-0-branch.
1 Meta-CVS --- A Directory Structure
2 Versioning Layer Over
3 The Concurrent Versions System.
5 Kaz Kylheku
6 Originally published January 25, 2002
7 Edited and Revised February 3, 2004
10 "Directory versioning is a Hard Problem" -- Subversion FAQ
12 "Any problem in computer science can be solved with
13 another layer of indirection" -- David Wheeler
16 Abstract
18 This is Meta-CVS, Meta-(Concurrent Versions System), a front end for
19 CVS. It supports the concurrent and independent versioning of
20 files, as well as a directory structure, by several people. I have
21 it been using it for a few weeks now, mostly just to version the
22 Meta-CVS sources themselves. It uses the cvs program in such a way that
23 you can not only version the file contents, but you can move and
24 rename files. These changes are committed to the repository, and
25 can be picked up by an update, which will incorporate them by
26 rearranging the working copy accordingly. There can be conflicting
27 parallel changes to the structure, which can be resolved like any
28 other conflict. It is all Lisp.
31 Contents
33 1. Introduction . . . . . . . . . . . . . . . . . . . . . . Line 42
34 2. Data Representation Overview . . . . . . . . . . . . . . . . . 75
35 2.1 File Mapping Example . . . . . . . . . . . . . . . . . . 120
36 2.2 Symbolic links . . . . . . . . . . . . . . . . . . . . 210
37 2.3 Synchronization . . . . . . . . . . . . . . . . . . . . 210
38 2.4 Partial Sandboxes . . . . . . . . . . . . . . . . . . . 210
39 3. Surprising Advantages . . . . . . . . . . . . . . . . . . . . 247
40 3.1 File Adding conflicts . . . . . . . . . . . . . . . . . 260
41 3.2 File Removal conflicts . . . . . . . . . . . . . . . . . 286
42 3.3 Diffing and Patching . . . . . . . . . . . . . . . . . 320
45 1. Introduction
47 The software known as CVS has been in existence since the year
48 1986, when its first version, consisting of shell scripts acting
49 as a front end to RCS commands, was posted to Usenet by Dick
50 Grune. Over the next fifteen years, CVS was turned into a C
51 program, enhanced and debugged. But in its present form, version
52 1.11, it still has annoying quirks and some serious limitations.
54 One of the biggest limitations of CVS is it does not treat the
55 directory structure of a module as a versioned object. Meta-CVS
56 solves this problem not by intruding in any way into the
57 well-debugged and time-tested CVS code, but by introducing a layer
58 of indirection. Meta-CVS retains the fundamental capabilites of
59 CVS: the ability to branch and merge, to work in parallel, to work
60 over a variety of network transports and so on. CVS worked as a
61 front end for RCS; similarly, Meta-CVS is a front end for CVS.
63 It turns out that Meta-CVS solves a few other infelicities in CVS
64 as well. A few tricky scenarios that cause grief in CVS are no
65 longer problems in Meta-CVS, such as: two developers concurrently
66 adding the same file, or one developer removing a file that
67 another is working on.
69 Meta-CVS works by creating a special representation of the
70 versioned file tree, and this special representation is what is
71 stored in CVS. Thus the naive direct mapping between the versioned
72 tree and the tree in the repository is avoided.
74 The aim of this paper is to document this simple representation
75 and explain how it supports the directory versioning operation.
78 2. Data Representation Overview
80 In order to obtain, from CVS, the ability to perform parallel
81 version control over any object, it is necessary to represent that
82 object as a text file. This is a given. CVS can effectively handle
83 only text input in its merging and conflict identification
84 algorithms. A critical non-functional constraint in the
85 requirements of Meta-CVS is that CVS is not to be modified in any
86 way; nobody should have to to install new CVS code on a client or
87 server machine to use Meta-CVS. Morever, the CVS code is fragile C
88 that has been debugged for over a decade (and counting).
90 To treat the file structure as a versioned entity, therefore, it
91 is necessary to represent it as a text file. What structure should
92 that text file have?
94 Firstly, it would be highly desirable if small changes, such as
95 renaming a few files, gave rise to small differences. Moreover,
96 a single change should only affect at most one line or two in the
97 text file. This property would allow for parallel changes with
98 minimal conflicts. The text file representation should also be
99 human readable and editable, because humans will have to resolve
100 conflicts in it.
102 Secondly, a file must somehow retain its identity and CVS history
103 when its path name changes. This means that we must never change
104 the name of the file, at least not the name which is known to CVS.
106 Meta-CVS represents the file structure of a project as a simple
107 entity called a ``file mapping''. The file mapping associates path
108 names with a flat database of files. Both the mapping and the
109 files are stored in CVS. The files have machine-generated names;
110 only through the mapping are they given actual names as they
111 appear to the users. The names known to CVS are called ``F-
112 files''.
114 Meta-CVS manipulates the mapping as a simple data structure in the
115 Lisp language. Lisp has a built-in parser and formatter for
116 reading a printed representation of a list object and producing a
117 printed representation. Thus the text file format for the Meta-CVS
118 mapping is simply a file containing a Lisp association list, with
119 special care taken to print each element of the association on a
120 separate line of text, and maintaining a consistently sorted order.
122 The separation of the directory structure from a flat file database
123 is nothing new; separation of a directory service from a flat file
124 service is a common theme in the design of filesystems.
126 Meta-CVS imitates the UNIX filesystem to the extent that a
127 restore command hunts down ``unlinked'' files and places them
128 into a lost+found directory under cryptic names derived from
129 their ID's, which behave analogously to inode numbers!
130 In UNIX, a file can remain in use while it is deleted from
131 the directory structure, and so it is in Meta-CVS. A file
132 removal in Meta-CVS is a non-destructive unlinking from the directory
133 structure.
135 2.1 File Mapping Example
137 Suppose that some project 'foo' consists of these files:
139 foo/README
140 foo/inc/foo.h
141 foo/src/Makefile
142 foo/src/foo.c
144 what does a Meta-CVS representation look like? This is best
145 understood in terms of the working copy checked out from CVS via
146 Meta-CVS, which contains these things:
148 foo/MCVS/CVS/Entries
149 foo/MCVS/CVS/... other CVS metadata ...
151 foo/MCVS/F-123D61C8FE942733281D2B08C15CD438
152 foo/MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h
153 foo/MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0
154 foo/MCVS/F-1C43C940D8745CAA78752C1206316B55.c
155 foo/MCVS/MAP
158 foo/README
159 foo/inc/foo.h
160 foo/src/Makefile
161 foo/src/foo.c
163 There is a subdirectory called MCVS, which contains a CVS
164 subdirectory. This MCVS subdirectory is in fact the CVS
165 ``sandbox''. Everything else under foo are the working files.
166 Thus every Meta-CVS working copy is just an ordinary file tree,
167 except that the top level directory contains a MCVS subdirectory
168 with interesting contents.
170 What are these files under MCVS? There are some files with cryptic
171 names like F-123D...438. Then there are two files MAP and
174 Firstly, it should be understood that the F- files and MAP are
175 versioned in CVS. On the other hand, MAP-LOCAL is a file that is
176 not known to CVS, but important to the functioning of Meta-CVS.
178 The four F- files are the actual CVS representations of
179 foo/README, foo/src/foo.c, foo/src/Makefile and foo/inc/foo.h.
181 What establishes the relationship between the F- names and the
182 human readable paths is the association list in the MAP file,
183 which, in the early versions of Meta-CVS looked like this:
185 (("MCVS/F-123D61C8FE942733281D2B08C15CD438"
186 "README")
187 ("MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h"
188 "inc/foo.h")
189 ("MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
190 "src/Makefile")
191 ("MCVS/F-1C43C940D8745CAA78752C1206316B55.c"
192 "src/foo.c"))
194 The MAP-LOCAL file, upon checkout, is simply an exact copy of MAP.
195 The purpose of MAP-LOCAL is to keep track of the actual mapping
196 that exists in the user's checked out copy. When an update
197 operation is performed, it may incorporate changes from the
198 repository into MAP, causing the MAP to no longer reflect the
199 local file structure. In fact MAP can at that point contain
200 unresolved conflicts, so that it is not usable by Meta-CVS,
201 requiring manual intervention. The MAP-LOCAL copy, however,
202 remains untouched and consistent.
204 Because Meta-CVS maintains a local copy of the mapping, the
205 Meta-CVS update operation can compute the differences between the
206 new mapping coming from the repository and the local mapping.
207 These differences can then be translated into
208 filesystem-rearranging actions that change the shape of the
209 working copy to bring it up to date. Then MAP and MAP-LOCAL are
210 once again identical.
212 This rearranging is the heart of the Meta-CVS system. Everything
213 else is largely just manipulations of the mappings. For example,
214 renaming a file is simple. Open up MCVS/MAP in a text editor, and
215 change a path (taking care not to create a duplicate, or otherwise
216 corrupt the mapping). Then save it and run the mcvs update.
217 Meta-CVS will propagate the change you made by physically
218 relocating that file. If you like what you have done, simply
219 commit. You can commit at the CVS level within the MCVS
220 directory. But of course, a Meta-CVS file renaming operation is
221 provided, and so is a commit operation, which in addition to
222 running CVS also ensures that the F- files are properly
223 synchronized with their unfolded counterparts.
226 2.2 Symbolic Links
228 In August 2002, support for symbolic links was added to Meta-CVS,
229 and the format of the mapping became more complicated to reflect
230 that. The syntax was extended to allow for different kinds of
231 entries, as well as future extensibility. Each entry now has
232 a Lisp keyword symbol in its first position which identifies
233 its type. The rest of the list specifies the type-specific properties.
234 Currently, there are two types of entries :FILE and :SYMLINK.
236 Right around that time, support for versioned property lists was
237 also added.
239 The previous section's example now looks like this:
241 ((:FILE
242 "MCVS/F-123D61C8FE942733281D2B08C15CD438"
243 "README")
244 (:FILE
245 "MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h"
246 "inc/foo.h")
247 (:FILE
248 "MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
249 "src/Makefile")
250 (:FILE
251 "MCVS/F-1C43C940D8745CAA78752C1206316B55.c"
252 "src/foo.c"))
254 Executable files have additional material after the path name.
255 Symbolic links look like this:
258 "S-DF03GA1200347CF1935509371F8C1765"
259 "src/foo.c"
260 "../foo.c")
262 which asserts the existence of a symbolic link called src/foo.c
263 whose target is ../foo.c.
265 Both currently supported map entries have an ID string in the second
266 position, and an object path in the third position. The syntax
267 varies after that.
269 Incidentally, Meta-CVS continues to recognize and parse the old
270 format. Once the mapping object is read from the MAP file, the
271 abstract syntax tree is examined to determine whether it conforms
272 to the old syntax or new. Nobody uses the old syntax; only old
273 versions stored in the repository of Meta-CVS itself.
276 2.3 Synchronization
278 The next problem to tackle is how to establish the correspondence
279 between the F- files and the working files. Meta-CVS does this in a
280 platform-specific way, namely by relying on Unix hard links.
282 When Meta-CVS checks out a sandbox, it creates hard links, so that
283 a F- file and its corresponding working file are in fact the same
284 filesystem object. Thus ``unpacking'' the F- files through the
285 mapping does not require the mass duplication of of file data,
286 only the creation of directories and links.
288 The problem is that some operations ``break'' this hard link
289 connection by unlinking a file and overwriting it with a new one
290 that has the same name. The CVS update operation does this, for
291 instance. If cvs up creates a new F- file, that file is no longer
292 connected with the working file.
294 To keep the two synchronized, Meta-CVS performs a synchronization
295 operation. This operation sweeps over the file map, and repairs
296 any broken links. If either of the two files is missing, then a
297 link is created. If both are present, but are distinct objects,
298 then the one with the most recent modification timestamp
299 supersedes; the other is unlinked and replaced with a link to the
300 newer one.
302 A synchronization must be done before any operation which can
303 cause a file to be moved, removed, or to be committed to the CVS
304 repository. In all these situations, the F- files must have
305 the correct contents.
307 The Meta-CVS update operation must perform synchronization twice:
308 before the CVS update to ensure that the F- files carry all of the
309 local changes; then after the CVS update to make sure that any
310 newly incorporated changes propagate back to the working copy.
312 The current behavior of Meta-CVS is more subtle than the above
313 description implies. The synchronization does not process the
314 entire MAP for commands that operate only on a subtree; instead,
315 entries corresponding to that subtree are filtered out of the
316 mapping. Secondly, the synchronization is direction-sensitive.
317 For instance, before a CVS commit, it makes sense to synchronize
318 from the tree to the CVS sandbox, not in the opposite direction.
319 Immediately after a commit, it makes sense to push in the opposite
320 direction, in case CVS modified the commited files (for instance
321 by altering keyword expansions).
323 2.4 Partial Sandboxes
325 Sometimes it is desirable to pull out just a subtree of a larger
326 project from a repository. How can this be done in a version
327 control system that represents the whole tree as a versioned object?
328 Wanting to check out part of the tree seems roughly equivalent to wanting
329 to check out half of a file.
331 Meta-CVS solves this problem by supporting the concept of a partial
332 sandbox. This is a checkout that has the full mapping in the CVS
333 sandbox. A local file called DISPLACED is written which contains
334 the relative pathname of the root of the subtree that is checked out.
335 For example if the testcases/optimization subdirectory of
336 the x-compiler project is checked out, then the DISPLACED file
337 contains the path testcases/optimization.
339 All of the algorithms in Meta-CVS are aware of the DISPLACED path,
340 and properly translate between the /abstract/ paths contained in
341 the mapping and the shorter, /real/ paths in the sandbox tree.
342 This translation is a no-op when there is no DISPLACED file---that
343 is, when the sandbox is a full one.
345 Partial sandboxes behave robustly with respect to mapping changes
346 arriving from the repository. If another user commits a change
347 that moves a file from some currently invisible part of the tree
348 into the visible subtree, this works properly. The opposite direction
349 likewise.
351 Partial sandboxes are used by the grab command to store a new
352 external drop into just a subtree of the project on a particular
353 branch or the trunk. To do this, the grab command's already
354 contorted algorithm had to be infused with translations between
355 abstract and real paths.
358 3. Surprising Advantages
360 The Meta-CVS representation brings with it a few advantages which
361 were not immediately obvious in the design stages, but came to
362 light during development. In addition to the lack of directory
363 structure versioning, CVS has a few other infelicities which go
364 away under Meta-CVS. Also, bringing in the capability to version
365 directory structure also brings in a new concern. Free software
366 developers uses patches to communicate code changes to each other.
367 The traditional tools for producing and applying patches, like
368 CVS, do not handle directory versioning. Meta-CVS has some answers
369 to these problems.
371 3.1 File Adding Conflicts
373 In CVS, it can happen that two (or more) developers working on the
374 same module, add a file to the same directory, and all use the
375 same file name. The first developer commits the file, and then
376 problems occurs for the subsequent developers who try to commit.
377 CVS complains that the file was independently added by a second
378 party, and not allow the commit to proceed.
380 In Meta-CVS, this cannot happen. Meta-CVS recognizes that if two
381 people add a file, it is not the same file. Names do not determine
382 equivalence, semantics does! When a file is added to Meta-CVS, a
383 F- file is created to represent it. That F- file name contains a
384 randomly chosen 128-bit number, expressed in hexadecimal. It is
385 extremely unlikely that two such numbers will collide, so in
386 practice, one will ``never'' see the aforementioned CVS error
387 message.
389 Instead, what will happen when developers choose the same path
390 name for a file is that either a conflict will arise in the MAP
391 file, which will have to be resolved, or else the mapping will
392 contain a duplicate path name, which can be detected by Meta-CVS
393 as an error which again, the users must resolve. Each file is a
394 separate object with its own version history; that two objects
395 accidentally map to the same name is a minor, correctable problem.
397 3.2 File Removal Conflicts
399 CVS does not behave very well when one developer deletes a file,
400 via cvs remove, and another tries to continue comitting changes.
402 This is really just an instance of the classic problem of
403 computing the object lifetimes, translated to the domain of
404 version control.
406 The cleanest solution to the problem of computing object lifetimes
407 is garbage collection, which ensures that as long as an object can
408 still be used, it persists, and thereafter, it is automatically
409 removed when the system finds it necessary or convenient to do so.
411 It turns out that Meta-CVS supports a kind of garbage collection
412 concept. When a file is removed, it does not have to be subject to
413 ``cvs remove''. It only has to be removed from the file mapping,
414 but the F- file can remain unremoved. What this means is that the
415 F- file continues to be checked out, so it occupies bandwidth and
416 space. What happens if a user has outstanding changes, and
417 performs an Meta-CVS update which removes the file? The link
418 synchronization ensures that the outstanding changes are
419 transferred to the F- file before the removal. So the changes are
420 not lost! It is possible to manually restore that F- file in the
421 MAP to give it a ``new lease on life''. This is analogous to
422 sifting through garbage, to salvage it by making it reachable
423 again. And, of course, changes to the F- file itself can be committed to
424 CVS whether or not it is reentered into the map.
426 The space problem can be dealt with by a Meta-CVS ``garbage
427 collection'' routine that can be invoked administratively. This
428 will sweep through the F- files, identify any which have no
429 mapping, and ``cvs remove'' these.
431 3.3 Diffing and Patching
433 Another surprising advantage of Meta-CVS is that it addresses the
434 problem of distributing patches which patch the file system
435 structure as well as contents.
437 The F- and MAP files in fact constitute an interchange format for
438 the distribution of program source which, in principle, amplifies
439 the capabilities of any change management tools that are based on
440 flat files.
442 A developer can obtain a copy of a project in Meta-CVS form, then
443 work on making changes, including the renaming of paths. These
444 changes are represented in a new Meta-CVS file set. A diff is
445 computed between the new and the old. Someone with a copy of the
446 original can patch it, to reproduce the changes. All that is
447 needed is the Meta-CVS software to realize the rearrangements.

  ViewVC Help
Powered by ViewVC 1.1.5