/[meta-cvs]/meta-cvs/F-54B5FF01DC6392F28A104A8A58761CB6
ViewVC logotype

Contents of /meta-cvs/F-54B5FF01DC6392F28A104A8A58761CB6

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.11 - (hide annotations)
Wed Feb 4 07:42:32 2004 UTC (10 years, 2 months ago) by kaz
Branch: MAIN
Changes since 1.10: +139 -40 lines
Merging from mcvs-1-0-branch.
1 kaz 1.11 Meta-CVS --- A Directory Structure
2     Versioning Layer Over
3     The Concurrent Versions System.
4 kaz 1.1
5 kaz 1.11 Kaz Kylheku
6     Originally published January 25, 2002
7     Edited and Revised February 3, 2004
8 kaz 1.1
9 kaz 1.4
10 kaz 1.11 "Directory versioning is a Hard Problem" -- Subversion FAQ
11 kaz 1.9
12 kaz 1.11 "Any problem in computer science can be solved with
13     another layer of indirection" -- David Wheeler
14 kaz 1.4
15    
16 kaz 1.1 Abstract
17    
18 kaz 1.5 This is Meta-CVS, Meta-(Concurrent Versions System), a front end for
19 kaz 1.1 CVS. It supports the concurrent and independent versioning of
20     files, as well as a directory structure, by several people. I have
21 kaz 1.2 it been using it for a few weeks now, mostly just to version the
22 kaz 1.5 Meta-CVS sources themselves. It uses the cvs program in such a way that
23 kaz 1.2 you can not only version the file contents, but you can move and
24     rename files. These changes are committed to the repository, and
25     can be picked up by an update, which will incorporate them by
26 kaz 1.1 rearranging the working copy accordingly. There can be conflicting
27 kaz 1.2 parallel changes to the structure, which can be resolved like any
28     other conflict. It is all Lisp.
29 kaz 1.1
30    
31     Contents
32    
33 kaz 1.9 1. Introduction . . . . . . . . . . . . . . . . . . . . . . Line 42
34     2. Data Representation Overview . . . . . . . . . . . . . . . . . 75
35 kaz 1.11 2.1 File Mapping Example . . . . . . . . . . . . . . . . . . 120
36     2.2 Symbolic links . . . . . . . . . . . . . . . . . . . . 210
37     2.3 Synchronization . . . . . . . . . . . . . . . . . . . . 210
38     2.4 Partial Sandboxes . . . . . . . . . . . . . . . . . . . 210
39 kaz 1.9 3. Surprising Advantages . . . . . . . . . . . . . . . . . . . . 247
40 kaz 1.11 3.1 File Adding conflicts . . . . . . . . . . . . . . . . . 260
41     3.2 File Removal conflicts . . . . . . . . . . . . . . . . . 286
42     3.3 Diffing and Patching . . . . . . . . . . . . . . . . . 320
43 kaz 1.4
44 kaz 1.1
45     1. Introduction
46    
47     The software known as CVS has been in existence since the year
48 kaz 1.3 1986, when its first version, consisting of shell scripts acting
49     as a front end to RCS commands, was posted to Usenet by Dick
50     Grune. Over the next fifteen years, CVS was turned into a C
51     program, enhanced and debugged. But in its present form, version
52     1.11, it still has annoying quirks and some serious limitations.
53 kaz 1.1
54 kaz 1.3 One of the biggest limitations of CVS is it does not treat the
55 kaz 1.5 directory structure of a module as a versioned object. Meta-CVS
56     solves this problem not by intruding in any way into the
57     well-debugged and time-tested CVS code, but by introducing a layer
58     of indirection. Meta-CVS retains the fundamental capabilites of
59     CVS: the ability to branch and merge, to work in parallel, to work
60     over a variety of network transports and so on. CVS worked as a
61     front end for RCS; similarly, Meta-CVS is a front end for CVS.
62    
63     It turns out that Meta-CVS solves a few other infelicities in CVS
64     as well. A few tricky scenarios that cause grief in CVS are no
65     longer problems in Meta-CVS, such as: two developers concurrently
66 kaz 1.1 adding the same file, or one developer removing a file that
67     another is working on.
68    
69 kaz 1.5 Meta-CVS works by creating a special representation of the
70     versioned file tree, and this special representation is what is
71     stored in CVS. Thus the naive direct mapping between the versioned
72     tree and the tree in the repository is avoided.
73 kaz 1.1
74     The aim of this paper is to document this simple representation
75     and explain how it supports the directory versioning operation.
76    
77    
78     2. Data Representation Overview
79    
80     In order to obtain, from CVS, the ability to perform parallel
81 kaz 1.2 version control over any object, it is necessary to represent that
82     object as a text file. This is a given. CVS can effectively handle
83     only text input in its merging and conflict identification
84 kaz 1.3 algorithms. A critical non-functional constraint in the
85 kaz 1.5 requirements of Meta-CVS is that CVS is not to be modified in any
86     way; nobody should have to to install new CVS code on a client or
87     server machine to use Meta-CVS. Morever, the CVS code is fragile C
88 kaz 1.3 that has been debugged for over a decade (and counting).
89 kaz 1.1
90     To treat the file structure as a versioned entity, therefore, it
91     is necessary to represent it as a text file. What structure should
92     that text file have?
93    
94     Firstly, it would be highly desirable if small changes, such as
95     renaming a few files, gave rise to small differences. Moreover,
96     a single change should only affect at most one line or two in the
97     text file. This property would allow for parallel changes with
98     minimal conflicts. The text file representation should also be
99     human readable and editable, because humans will have to resolve
100     conflicts in it.
101    
102     Secondly, a file must somehow retain its identity and CVS history
103     when its path name changes. This means that we must never change
104     the name of the file, at least not the name which is known to CVS.
105    
106 kaz 1.5 Meta-CVS represents the file structure of a project as a simple
107     entity called a ``file mapping''. The file mapping associates path
108     names with a flat database of files. Both the mapping and the
109     files are stored in CVS. The files have machine-generated names;
110     only through the mapping are they given actual names as they
111     appear to the users. The names known to CVS are called ``F-
112     files''.
113 kaz 1.1
114 kaz 1.5 Meta-CVS manipulates the mapping as a simple data structure in the
115 kaz 1.2 Lisp language. Lisp has a built-in parser and formatter for
116 kaz 1.10 reading a printed representation of a list object and producing a
117 kaz 1.5 printed representation. Thus the text file format for the Meta-CVS
118 kaz 1.2 mapping is simply a file containing a Lisp association list, with
119     special care taken to print each element of the association on a
120 kaz 1.10 separate line of text, and maintaining a consistently sorted order.
121 kaz 1.1
122     2.1 File Mapping Example
123    
124 kaz 1.6 Suppose that some project 'foo' consists of these files:
125 kaz 1.1
126 kaz 1.11 foo/README
127     foo/inc/foo.h
128     foo/src/Makefile
129     foo/src/foo.c
130 kaz 1.1
131 kaz 1.5 what does a Meta-CVS representation look like? This is best
132     understood in terms of the working copy checked out from CVS via
133     Meta-CVS, which contains these things:
134 kaz 1.1
135 kaz 1.11 foo/MCVS/CVS/Entries
136     foo/MCVS/CVS/... other CVS metadata ...
137 kaz 1.1
138 kaz 1.11 foo/MCVS/F-123D61C8FE942733281D2B08C15CD438
139     foo/MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h
140     foo/MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0
141     foo/MCVS/F-1C43C940D8745CAA78752C1206316B55.c
142     foo/MCVS/MAP
143     foo/MCVS/MAP-LOCAL
144    
145     foo/README
146     foo/inc/foo.h
147     foo/src/Makefile
148     foo/src/foo.c
149 kaz 1.1
150     There is a subdirectory called MCVS, which contains a CVS
151 kaz 1.3 subdirectory. This MCVS subdirectory is in fact the CVS
152     ``sandbox''. Everything else under foo are the working files.
153 kaz 1.5 Thus every Meta-CVS working copy is just an ordinary file tree,
154     except that the top level directory contains a MCVS subdirectory
155     with interesting contents.
156 kaz 1.1
157 kaz 1.2 What are these files under MCVS? There are some files with cryptic
158     names like F-123D...438. Then there are two files MAP and
159 kaz 1.1 MAP-LOCAL.
160    
161 kaz 1.3 Firstly, it should be understood that the F- files and MAP are
162     versioned in CVS. On the other hand, MAP-LOCAL is a file that is
163 kaz 1.5 not known to CVS, but important to the functioning of Meta-CVS.
164 kaz 1.1
165     The four F- files are the actual CVS representations of
166     foo/README, foo/src/foo.c, foo/src/Makefile and foo/inc/foo.h.
167    
168     What establishes the relationship between the F- names and the
169 kaz 1.3 human readable paths is the association list in the MAP file,
170 kaz 1.11 which, in the early versions of Meta-CVS looked like this:
171 kaz 1.1
172 kaz 1.11 (("MCVS/F-123D61C8FE942733281D2B08C15CD438"
173     "README")
174     ("MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h"
175     "inc/foo.h")
176     ("MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
177     "src/Makefile")
178     ("MCVS/F-1C43C940D8745CAA78752C1206316B55.c"
179     "src/foo.c"))
180 kaz 1.1
181     The MAP-LOCAL file, upon checkout, is simply an exact copy of MAP.
182     The purpose of MAP-LOCAL is to keep track of the actual mapping
183     that exists in the user's checked out copy. When an update
184 kaz 1.3 operation is performed, it may incorporate changes from the
185 kaz 1.1 repository into MAP, causing the MAP to no longer reflect the
186     local file structure. In fact MAP can at that point contain
187 kaz 1.5 unresolved conflicts, so that it is not usable by Meta-CVS,
188     requiring manual intervention. The MAP-LOCAL copy, however,
189     remains untouched and consistent.
190    
191     Because Meta-CVS maintains a local copy of the mapping, the
192     Meta-CVS update operation can compute the differences between the
193     new mapping coming from the repository and the local mapping.
194     These differences can then be translated into
195     filesystem-rearranging actions that change the shape of the
196     working copy to bring it up to date. Then MAP and MAP-LOCAL are
197     once again identical.
198 kaz 1.1
199 kaz 1.5 This rearranging is the heart of the Meta-CVS system. Everything
200     else is largely just manipulations of the mappings. For example,
201 kaz 1.1 renaming a file is simple. Open up MCVS/MAP in a text editor, and
202 kaz 1.3 change a path (taking care not to create a duplicate, or otherwise
203 kaz 1.5 corrupt the mapping). Then save it and run the mcvs update.
204     Meta-CVS will propagate the change you made by physically
205     relocating that file. If you like what you have done, simply
206     commit. You can commit at the CVS level within the MCVS
207     directory. But of course, a Meta-CVS file renaming operation is
208     provided, and so is a commit operation, which in addition to
209     running CVS also ensures that the F- files are properly
210     synchronized with their unfolded counterparts.
211 kaz 1.1
212 kaz 1.11
213     2.2 Symbolic Links
214    
215     In August 2002, support for symbolic links was added to Meta-CVS,
216     and the format of the mapping became more complicated to reflect
217     that. The syntax was extended to allow for different kinds of
218     entries, as well as future extensibility. Each entry now has
219     a Lisp keyword symbol in its first position which identifies
220     its type. The rest of the list specifies the type-specific properties.
221     Currently, there are two types of entries :FILE and :SYMLINK.
222    
223     Right around that time, support for versioned property lists was
224     also added.
225    
226     The previous section's example now looks like this:
227    
228     ((:FILE
229     "MCVS/F-123D61C8FE942733281D2B08C15CD438"
230     "README")
231     (:FILE
232     "MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h"
233     "inc/foo.h")
234     (:FILE
235     "MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
236     "src/Makefile")
237     (:FILE
238     "MCVS/F-1C43C940D8745CAA78752C1206316B55.c"
239     "src/foo.c"))
240    
241     Executable files have additional material after the path name.
242     Symbolic links look like this:
243    
244     (:SYMLINK
245     "S-DF03GA1200347CF1935509371F8C1765"
246     "src/foo.c"
247     "../foo.c")
248    
249     which asserts the existence of a symbolic link called src/foo.c
250     whose target is ../foo.c.
251    
252     Both currently supported map entries have an ID string in the second
253     position, and an object path in the third position. The syntax
254     varies after that.
255    
256     Incidentally, Meta-CVS continues to recognize and parse the old
257     format. Once the mapping object is read from the MAP file, the
258     abstract syntax tree is examined to determine whether it conforms
259     to the old syntax or new. Nobody uses the old syntax; only old
260     versions stored in the repository of Meta-CVS itself.
261    
262    
263     2.3 Synchronization
264 kaz 1.1
265     The next problem to tackle is how to establish the correspondence
266 kaz 1.5 between the F- files and the working files. Meta-CVS does this in a
267 kaz 1.1 platform-specific way, namely by relying on Unix hard links.
268    
269 kaz 1.5 When Meta-CVS checks out a sandbox, it creates hard links, so that
270     a F- file and its corresponding working file are in fact the same
271 kaz 1.1 filesystem object. Thus ``unpacking'' the F- files through the
272 kaz 1.3 mapping does not require the mass duplication of of file data,
273 kaz 1.1 only the creation of directories and links.
274    
275     The problem is that some operations ``break'' this hard link
276     connection by unlinking a file and overwriting it with a new one
277     that has the same name. The CVS update operation does this, for
278     instance. If cvs up creates a new F- file, that file is no longer
279     connected with the working file.
280    
281 kaz 1.5 To keep the two synchronized, Meta-CVS performs a synchronization
282 kaz 1.1 operation. This operation sweeps over the file map, and repairs
283     any broken links. If either of the two files is missing, then a
284     link is created. If both are present, but are distinct objects,
285 kaz 1.5 then the one with the most recent modification timestamp
286     supersedes; the other is unlinked and replaced with a link to the
287     newer one.
288 kaz 1.1
289 kaz 1.3 A synchronization must be done before any operation which can
290     cause a file to be moved, removed, or to be committed to the CVS
291     repository. In all these situations, the F- files must have
292     the correct contents.
293 kaz 1.1
294 kaz 1.5 The Meta-CVS update operation must perform synchronization twice:
295 kaz 1.2 before the CVS update to ensure that the F- files carry all of the
296     local changes; then after the CVS update to make sure that any
297     newly incorporated changes propagate back to the working copy.
298 kaz 1.11
299     The current behavior of Meta-CVS is more subtle than the above
300     description implies. The synchronization does not process the
301     entire MAP for commands that operate only on a subtree; instead,
302     entries corresponding to that subtree are filtered out of the
303     mapping. Secondly, the synchronization is direction-sensitive.
304     For instance, before a CVS commit, it makes sense to synchronize
305     from the tree to the CVS sandbox, not in the opposite direction.
306     Immediately after a commit, it makes sense to push in the opposite
307     direction, in case CVS modified the commited files (for instance
308     by altering keyword expansions).
309    
310     2.4 Partial Sandboxes
311    
312     Sometimes it is desirable to pull out just a subtree of a larger
313     project from a repository. How can this be done in a version
314     control system that represents the whole tree as a versioned object?
315     Wanting to check out part of the tree seems roughly equivalent to wanting
316     to check out half of a file.
317    
318     Meta-CVS solves this problem by supporting the concept of a partial
319     sandbox. This is a checkout that has the full mapping in the CVS
320     sandbox. A local file called DISPLACED is written which contains
321     the relative pathname of the root of the subtree that is checked out.
322     For example if the testcases/optimization subdirectory of
323     the x-compiler project is checked out, then the DISPLACED file
324     contains the path testcases/optimization.
325    
326     All of the algorithms in Meta-CVS are aware of the DISPLACED path,
327     and properly translate between the /abstract/ paths contained in
328     the mapping and the shorter, /real/ paths in the sandbox tree.
329     This translation is a no-op when there is no DISPLACED file---that
330     is, when the sandbox is a full one.
331    
332     Partial sandboxes behave robustly with respect to mapping changes
333     arriving from the repository. If another user commits a change
334     that moves a file from some currently invisible part of the tree
335     into the visible subtree, this works properly. The opposite direction
336     likewise.
337    
338     Partial sandboxes are used by the grab command to store a new
339     external drop into just a subtree of the project on a particular
340     branch or the trunk. To do this, the grab command's already
341     contorted algorithm had to be infused with translations between
342     abstract and real paths.
343 kaz 1.2
344    
345     3. Surprising Advantages
346    
347 kaz 1.5 The Meta-CVS representation brings with it a few advantages which
348     were not immediately obvious in the design stages, but came to
349     light during development. In addition to the lack of directory
350     structure versioning, CVS has a few other infelicities which go
351     away under Meta-CVS. Also, bringing in the capability to version
352     directory structure also brings in a new concern. Free software
353     developers uses patches to communicate code changes to each other.
354     The traditional tools for producing and applying patches, like
355     CVS, do not handle directory versioning. Meta-CVS has some answers
356     to these problems.
357 kaz 1.2
358     3.1 File Adding Conflicts
359    
360     In CVS, it can happen that two (or more) developers working on the
361     same module, add a file to the same directory, and all use the
362 kaz 1.3 same file name. The first developer commits the file, and then
363     problems occurs for the subsequent developers who try to commit.
364     CVS complains that the file was independently added by a second
365     party, and not allow the commit to proceed.
366 kaz 1.2
367 kaz 1.5 In Meta-CVS, this cannot happen. Meta-CVS recognizes that if two
368     people add a file, it is not the same file. Names do not determine
369     equivalence, semantics does! When a file is added to Meta-CVS, a
370     F- file is created to represent it. That F- file name contains a
371     randomly chosen 128-bit number, expressed in hexadecimal. It is
372     extremely unlikely that two such numbers will collide, so in
373     practice, one will ``never'' see the aforementioned CVS error
374     message.
375 kaz 1.2
376     Instead, what will happen when developers choose the same path
377     name for a file is that either a conflict will arise in the MAP
378     file, which will have to be resolved, or else the mapping will
379 kaz 1.5 contain a duplicate path name, which can be detected by Meta-CVS
380     as an error which again, the users must resolve. Each file is a
381     separate object with its own version history; that two objects
382     accidentally map to the same name is a minor, correctable problem.
383 kaz 1.2
384     3.2 File Removal Conflicts
385    
386     CVS does not behave very well when one developer deletes a file,
387     via cvs remove, and another tries to continue comitting changes.
388    
389 kaz 1.3 This is really just an instance of the classic problem of
390     computing the object lifetimes, translated to the domain of
391     version control.
392 kaz 1.2
393     The cleanest solution to the problem of computing object lifetimes
394     is garbage collection, which ensures that as long as an object can
395 kaz 1.3 still be used, it persists, and thereafter, it is automatically
396     removed when the system finds it necessary or convenient to do so.
397 kaz 1.2
398 kaz 1.5 It turns out that Meta-CVS supports a kind of garbage collection
399 kaz 1.2 concept. When a file is removed, it does not have to be subject to
400     ``cvs remove''. It only has to be removed from the file mapping,
401 kaz 1.3 but the F- file can remain unremoved. What this means is that the
402 kaz 1.10 F- file continues to be checked out, so it occupies bandwidth and
403 kaz 1.5 space. What happens if a user has outstanding changes, and
404     performs an Meta-CVS update which removes the file? The link
405 kaz 1.3 synchronization ensures that the outstanding changes are
406 kaz 1.10 transferred to the F- file before the removal. So the changes are
407 kaz 1.3 not lost! It is possible to manually restore that F- file in the
408     MAP to give it a ``new lease on life''. This is analogous to
409     sifting through garbage, to salvage it by making it reachable
410 kaz 1.10 again. And, of course, changes to the F- file itself can be committed to
411     CVS whether or not it is reentered into the map.
412 kaz 1.2
413 kaz 1.5 The space problem can be dealt with by a Meta-CVS ``garbage
414 kaz 1.2 collection'' routine that can be invoked administratively. This
415     will sweep through the F- files, identify any which have no
416     mapping, and ``cvs remove'' these.
417    
418     3.3 Diffing and Patching
419    
420 kaz 1.5 Another surprising advantage of Meta-CVS is that it addresses the
421 kaz 1.2 problem of distributing patches which patch the file system
422     structure as well as contents.
423    
424 kaz 1.4 The F- and MAP files in fact constitute an interchange format for
425     the distribution of program source which, in principle, amplifies
426     the capabilities of any change management tools that are based on
427     flat files.
428    
429 kaz 1.5 A developer can obtain a copy of a project in Meta-CVS form, then
430     work on making changes, including the renaming of paths. These
431     changes are represented in a new Meta-CVS file set. A diff is
432     computed between the new and the old. Someone with a copy of the
433     original can patch it, to reproduce the changes. All that is
434     needed is the Meta-CVS software to realize the rearrangements.

  ViewVC Help
Powered by ViewVC 1.1.5