ViewVC logotype

Contents of /meta-cvs/F-54B5FF01DC6392F28A104A8A58761CB6

Parent Directory Parent Directory | Revision Log Revision Log

Revision 1.11 - (show annotations)
Wed Feb 4 07:42:32 2004 UTC (10 years, 2 months ago) by kaz
Branch: MAIN
Changes since 1.10: +139 -40 lines
Merging from mcvs-1-0-branch.
1 Meta-CVS --- A Directory Structure
2 Versioning Layer Over
3 The Concurrent Versions System.
5 Kaz Kylheku
6 Originally published January 25, 2002
7 Edited and Revised February 3, 2004
10 "Directory versioning is a Hard Problem" -- Subversion FAQ
12 "Any problem in computer science can be solved with
13 another layer of indirection" -- David Wheeler
16 Abstract
18 This is Meta-CVS, Meta-(Concurrent Versions System), a front end for
19 CVS. It supports the concurrent and independent versioning of
20 files, as well as a directory structure, by several people. I have
21 it been using it for a few weeks now, mostly just to version the
22 Meta-CVS sources themselves. It uses the cvs program in such a way that
23 you can not only version the file contents, but you can move and
24 rename files. These changes are committed to the repository, and
25 can be picked up by an update, which will incorporate them by
26 rearranging the working copy accordingly. There can be conflicting
27 parallel changes to the structure, which can be resolved like any
28 other conflict. It is all Lisp.
31 Contents
33 1. Introduction . . . . . . . . . . . . . . . . . . . . . . Line 42
34 2. Data Representation Overview . . . . . . . . . . . . . . . . . 75
35 2.1 File Mapping Example . . . . . . . . . . . . . . . . . . 120
36 2.2 Symbolic links . . . . . . . . . . . . . . . . . . . . 210
37 2.3 Synchronization . . . . . . . . . . . . . . . . . . . . 210
38 2.4 Partial Sandboxes . . . . . . . . . . . . . . . . . . . 210
39 3. Surprising Advantages . . . . . . . . . . . . . . . . . . . . 247
40 3.1 File Adding conflicts . . . . . . . . . . . . . . . . . 260
41 3.2 File Removal conflicts . . . . . . . . . . . . . . . . . 286
42 3.3 Diffing and Patching . . . . . . . . . . . . . . . . . 320
45 1. Introduction
47 The software known as CVS has been in existence since the year
48 1986, when its first version, consisting of shell scripts acting
49 as a front end to RCS commands, was posted to Usenet by Dick
50 Grune. Over the next fifteen years, CVS was turned into a C
51 program, enhanced and debugged. But in its present form, version
52 1.11, it still has annoying quirks and some serious limitations.
54 One of the biggest limitations of CVS is it does not treat the
55 directory structure of a module as a versioned object. Meta-CVS
56 solves this problem not by intruding in any way into the
57 well-debugged and time-tested CVS code, but by introducing a layer
58 of indirection. Meta-CVS retains the fundamental capabilites of
59 CVS: the ability to branch and merge, to work in parallel, to work
60 over a variety of network transports and so on. CVS worked as a
61 front end for RCS; similarly, Meta-CVS is a front end for CVS.
63 It turns out that Meta-CVS solves a few other infelicities in CVS
64 as well. A few tricky scenarios that cause grief in CVS are no
65 longer problems in Meta-CVS, such as: two developers concurrently
66 adding the same file, or one developer removing a file that
67 another is working on.
69 Meta-CVS works by creating a special representation of the
70 versioned file tree, and this special representation is what is
71 stored in CVS. Thus the naive direct mapping between the versioned
72 tree and the tree in the repository is avoided.
74 The aim of this paper is to document this simple representation
75 and explain how it supports the directory versioning operation.
78 2. Data Representation Overview
80 In order to obtain, from CVS, the ability to perform parallel
81 version control over any object, it is necessary to represent that
82 object as a text file. This is a given. CVS can effectively handle
83 only text input in its merging and conflict identification
84 algorithms. A critical non-functional constraint in the
85 requirements of Meta-CVS is that CVS is not to be modified in any
86 way; nobody should have to to install new CVS code on a client or
87 server machine to use Meta-CVS. Morever, the CVS code is fragile C
88 that has been debugged for over a decade (and counting).
90 To treat the file structure as a versioned entity, therefore, it
91 is necessary to represent it as a text file. What structure should
92 that text file have?
94 Firstly, it would be highly desirable if small changes, such as
95 renaming a few files, gave rise to small differences. Moreover,
96 a single change should only affect at most one line or two in the
97 text file. This property would allow for parallel changes with
98 minimal conflicts. The text file representation should also be
99 human readable and editable, because humans will have to resolve
100 conflicts in it.
102 Secondly, a file must somehow retain its identity and CVS history
103 when its path name changes. This means that we must never change
104 the name of the file, at least not the name which is known to CVS.
106 Meta-CVS represents the file structure of a project as a simple
107 entity called a ``file mapping''. The file mapping associates path
108 names with a flat database of files. Both the mapping and the
109 files are stored in CVS. The files have machine-generated names;
110 only through the mapping are they given actual names as they
111 appear to the users. The names known to CVS are called ``F-
112 files''.
114 Meta-CVS manipulates the mapping as a simple data structure in the
115 Lisp language. Lisp has a built-in parser and formatter for
116 reading a printed representation of a list object and producing a
117 printed representation. Thus the text file format for the Meta-CVS
118 mapping is simply a file containing a Lisp association list, with
119 special care taken to print each element of the association on a
120 separate line of text, and maintaining a consistently sorted order.
122 2.1 File Mapping Example
124 Suppose that some project 'foo' consists of these files:
126 foo/README
127 foo/inc/foo.h
128 foo/src/Makefile
129 foo/src/foo.c
131 what does a Meta-CVS representation look like? This is best
132 understood in terms of the working copy checked out from CVS via
133 Meta-CVS, which contains these things:
135 foo/MCVS/CVS/Entries
136 foo/MCVS/CVS/... other CVS metadata ...
138 foo/MCVS/F-123D61C8FE942733281D2B08C15CD438
139 foo/MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h
140 foo/MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0
141 foo/MCVS/F-1C43C940D8745CAA78752C1206316B55.c
142 foo/MCVS/MAP
145 foo/README
146 foo/inc/foo.h
147 foo/src/Makefile
148 foo/src/foo.c
150 There is a subdirectory called MCVS, which contains a CVS
151 subdirectory. This MCVS subdirectory is in fact the CVS
152 ``sandbox''. Everything else under foo are the working files.
153 Thus every Meta-CVS working copy is just an ordinary file tree,
154 except that the top level directory contains a MCVS subdirectory
155 with interesting contents.
157 What are these files under MCVS? There are some files with cryptic
158 names like F-123D...438. Then there are two files MAP and
161 Firstly, it should be understood that the F- files and MAP are
162 versioned in CVS. On the other hand, MAP-LOCAL is a file that is
163 not known to CVS, but important to the functioning of Meta-CVS.
165 The four F- files are the actual CVS representations of
166 foo/README, foo/src/foo.c, foo/src/Makefile and foo/inc/foo.h.
168 What establishes the relationship between the F- names and the
169 human readable paths is the association list in the MAP file,
170 which, in the early versions of Meta-CVS looked like this:
172 (("MCVS/F-123D61C8FE942733281D2B08C15CD438"
173 "README")
174 ("MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h"
175 "inc/foo.h")
176 ("MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
177 "src/Makefile")
178 ("MCVS/F-1C43C940D8745CAA78752C1206316B55.c"
179 "src/foo.c"))
181 The MAP-LOCAL file, upon checkout, is simply an exact copy of MAP.
182 The purpose of MAP-LOCAL is to keep track of the actual mapping
183 that exists in the user's checked out copy. When an update
184 operation is performed, it may incorporate changes from the
185 repository into MAP, causing the MAP to no longer reflect the
186 local file structure. In fact MAP can at that point contain
187 unresolved conflicts, so that it is not usable by Meta-CVS,
188 requiring manual intervention. The MAP-LOCAL copy, however,
189 remains untouched and consistent.
191 Because Meta-CVS maintains a local copy of the mapping, the
192 Meta-CVS update operation can compute the differences between the
193 new mapping coming from the repository and the local mapping.
194 These differences can then be translated into
195 filesystem-rearranging actions that change the shape of the
196 working copy to bring it up to date. Then MAP and MAP-LOCAL are
197 once again identical.
199 This rearranging is the heart of the Meta-CVS system. Everything
200 else is largely just manipulations of the mappings. For example,
201 renaming a file is simple. Open up MCVS/MAP in a text editor, and
202 change a path (taking care not to create a duplicate, or otherwise
203 corrupt the mapping). Then save it and run the mcvs update.
204 Meta-CVS will propagate the change you made by physically
205 relocating that file. If you like what you have done, simply
206 commit. You can commit at the CVS level within the MCVS
207 directory. But of course, a Meta-CVS file renaming operation is
208 provided, and so is a commit operation, which in addition to
209 running CVS also ensures that the F- files are properly
210 synchronized with their unfolded counterparts.
213 2.2 Symbolic Links
215 In August 2002, support for symbolic links was added to Meta-CVS,
216 and the format of the mapping became more complicated to reflect
217 that. The syntax was extended to allow for different kinds of
218 entries, as well as future extensibility. Each entry now has
219 a Lisp keyword symbol in its first position which identifies
220 its type. The rest of the list specifies the type-specific properties.
221 Currently, there are two types of entries :FILE and :SYMLINK.
223 Right around that time, support for versioned property lists was
224 also added.
226 The previous section's example now looks like this:
228 ((:FILE
229 "MCVS/F-123D61C8FE942733281D2B08C15CD438"
230 "README")
231 (:FILE
232 "MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h"
233 "inc/foo.h")
234 (:FILE
235 "MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
236 "src/Makefile")
237 (:FILE
238 "MCVS/F-1C43C940D8745CAA78752C1206316B55.c"
239 "src/foo.c"))
241 Executable files have additional material after the path name.
242 Symbolic links look like this:
245 "S-DF03GA1200347CF1935509371F8C1765"
246 "src/foo.c"
247 "../foo.c")
249 which asserts the existence of a symbolic link called src/foo.c
250 whose target is ../foo.c.
252 Both currently supported map entries have an ID string in the second
253 position, and an object path in the third position. The syntax
254 varies after that.
256 Incidentally, Meta-CVS continues to recognize and parse the old
257 format. Once the mapping object is read from the MAP file, the
258 abstract syntax tree is examined to determine whether it conforms
259 to the old syntax or new. Nobody uses the old syntax; only old
260 versions stored in the repository of Meta-CVS itself.
263 2.3 Synchronization
265 The next problem to tackle is how to establish the correspondence
266 between the F- files and the working files. Meta-CVS does this in a
267 platform-specific way, namely by relying on Unix hard links.
269 When Meta-CVS checks out a sandbox, it creates hard links, so that
270 a F- file and its corresponding working file are in fact the same
271 filesystem object. Thus ``unpacking'' the F- files through the
272 mapping does not require the mass duplication of of file data,
273 only the creation of directories and links.
275 The problem is that some operations ``break'' this hard link
276 connection by unlinking a file and overwriting it with a new one
277 that has the same name. The CVS update operation does this, for
278 instance. If cvs up creates a new F- file, that file is no longer
279 connected with the working file.
281 To keep the two synchronized, Meta-CVS performs a synchronization
282 operation. This operation sweeps over the file map, and repairs
283 any broken links. If either of the two files is missing, then a
284 link is created. If both are present, but are distinct objects,
285 then the one with the most recent modification timestamp
286 supersedes; the other is unlinked and replaced with a link to the
287 newer one.
289 A synchronization must be done before any operation which can
290 cause a file to be moved, removed, or to be committed to the CVS
291 repository. In all these situations, the F- files must have
292 the correct contents.
294 The Meta-CVS update operation must perform synchronization twice:
295 before the CVS update to ensure that the F- files carry all of the
296 local changes; then after the CVS update to make sure that any
297 newly incorporated changes propagate back to the working copy.
299 The current behavior of Meta-CVS is more subtle than the above
300 description implies. The synchronization does not process the
301 entire MAP for commands that operate only on a subtree; instead,
302 entries corresponding to that subtree are filtered out of the
303 mapping. Secondly, the synchronization is direction-sensitive.
304 For instance, before a CVS commit, it makes sense to synchronize
305 from the tree to the CVS sandbox, not in the opposite direction.
306 Immediately after a commit, it makes sense to push in the opposite
307 direction, in case CVS modified the commited files (for instance
308 by altering keyword expansions).
310 2.4 Partial Sandboxes
312 Sometimes it is desirable to pull out just a subtree of a larger
313 project from a repository. How can this be done in a version
314 control system that represents the whole tree as a versioned object?
315 Wanting to check out part of the tree seems roughly equivalent to wanting
316 to check out half of a file.
318 Meta-CVS solves this problem by supporting the concept of a partial
319 sandbox. This is a checkout that has the full mapping in the CVS
320 sandbox. A local file called DISPLACED is written which contains
321 the relative pathname of the root of the subtree that is checked out.
322 For example if the testcases/optimization subdirectory of
323 the x-compiler project is checked out, then the DISPLACED file
324 contains the path testcases/optimization.
326 All of the algorithms in Meta-CVS are aware of the DISPLACED path,
327 and properly translate between the /abstract/ paths contained in
328 the mapping and the shorter, /real/ paths in the sandbox tree.
329 This translation is a no-op when there is no DISPLACED file---that
330 is, when the sandbox is a full one.
332 Partial sandboxes behave robustly with respect to mapping changes
333 arriving from the repository. If another user commits a change
334 that moves a file from some currently invisible part of the tree
335 into the visible subtree, this works properly. The opposite direction
336 likewise.
338 Partial sandboxes are used by the grab command to store a new
339 external drop into just a subtree of the project on a particular
340 branch or the trunk. To do this, the grab command's already
341 contorted algorithm had to be infused with translations between
342 abstract and real paths.
345 3. Surprising Advantages
347 The Meta-CVS representation brings with it a few advantages which
348 were not immediately obvious in the design stages, but came to
349 light during development. In addition to the lack of directory
350 structure versioning, CVS has a few other infelicities which go
351 away under Meta-CVS. Also, bringing in the capability to version
352 directory structure also brings in a new concern. Free software
353 developers uses patches to communicate code changes to each other.
354 The traditional tools for producing and applying patches, like
355 CVS, do not handle directory versioning. Meta-CVS has some answers
356 to these problems.
358 3.1 File Adding Conflicts
360 In CVS, it can happen that two (or more) developers working on the
361 same module, add a file to the same directory, and all use the
362 same file name. The first developer commits the file, and then
363 problems occurs for the subsequent developers who try to commit.
364 CVS complains that the file was independently added by a second
365 party, and not allow the commit to proceed.
367 In Meta-CVS, this cannot happen. Meta-CVS recognizes that if two
368 people add a file, it is not the same file. Names do not determine
369 equivalence, semantics does! When a file is added to Meta-CVS, a
370 F- file is created to represent it. That F- file name contains a
371 randomly chosen 128-bit number, expressed in hexadecimal. It is
372 extremely unlikely that two such numbers will collide, so in
373 practice, one will ``never'' see the aforementioned CVS error
374 message.
376 Instead, what will happen when developers choose the same path
377 name for a file is that either a conflict will arise in the MAP
378 file, which will have to be resolved, or else the mapping will
379 contain a duplicate path name, which can be detected by Meta-CVS
380 as an error which again, the users must resolve. Each file is a
381 separate object with its own version history; that two objects
382 accidentally map to the same name is a minor, correctable problem.
384 3.2 File Removal Conflicts
386 CVS does not behave very well when one developer deletes a file,
387 via cvs remove, and another tries to continue comitting changes.
389 This is really just an instance of the classic problem of
390 computing the object lifetimes, translated to the domain of
391 version control.
393 The cleanest solution to the problem of computing object lifetimes
394 is garbage collection, which ensures that as long as an object can
395 still be used, it persists, and thereafter, it is automatically
396 removed when the system finds it necessary or convenient to do so.
398 It turns out that Meta-CVS supports a kind of garbage collection
399 concept. When a file is removed, it does not have to be subject to
400 ``cvs remove''. It only has to be removed from the file mapping,
401 but the F- file can remain unremoved. What this means is that the
402 F- file continues to be checked out, so it occupies bandwidth and
403 space. What happens if a user has outstanding changes, and
404 performs an Meta-CVS update which removes the file? The link
405 synchronization ensures that the outstanding changes are
406 transferred to the F- file before the removal. So the changes are
407 not lost! It is possible to manually restore that F- file in the
408 MAP to give it a ``new lease on life''. This is analogous to
409 sifting through garbage, to salvage it by making it reachable
410 again. And, of course, changes to the F- file itself can be committed to
411 CVS whether or not it is reentered into the map.
413 The space problem can be dealt with by a Meta-CVS ``garbage
414 collection'' routine that can be invoked administratively. This
415 will sweep through the F- files, identify any which have no
416 mapping, and ``cvs remove'' these.
418 3.3 Diffing and Patching
420 Another surprising advantage of Meta-CVS is that it addresses the
421 problem of distributing patches which patch the file system
422 structure as well as contents.
424 The F- and MAP files in fact constitute an interchange format for
425 the distribution of program source which, in principle, amplifies
426 the capabilities of any change management tools that are based on
427 flat files.
429 A developer can obtain a copy of a project in Meta-CVS form, then
430 work on making changes, including the renaming of paths. These
431 changes are represented in a new Meta-CVS file set. A diff is
432 computed between the new and the old. Someone with a copy of the
433 original can patch it, to reproduce the changes. All that is
434 needed is the Meta-CVS software to realize the rearrangements.

  ViewVC Help
Powered by ViewVC 1.1.5