/[meta-cvs]/meta-cvs/F-54B5FF01DC6392F28A104A8A58761CB6
ViewVC logotype

Contents of /meta-cvs/F-54B5FF01DC6392F28A104A8A58761CB6

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.3 - (hide annotations)
Sat Jan 26 20:46:43 2002 UTC (12 years, 2 months ago) by kaz
Branch: MAIN
CVS Tags: cvs-options-passthrough
Changes since 1.2: +100 -84 lines
Lots of wording changes and clarifications.
1 kaz 1.1 MCVS---Meta CVS
2     A Directory Structure Versioning Layer Over
3     The Concurrent Version System.
4    
5     Kaz Kylheku
6     January 25, 2002
7    
8     Abstract
9    
10     This is MCVS, Meta-(Concurrent Versions System), a front end for
11     CVS. It supports the concurrent and independent versioning of
12     files, as well as a directory structure, by several people. I have
13 kaz 1.2 it been using it for a few weeks now, mostly just to version the
14     MCVS sources themselves. It uses the cvs program in such a way that
15     you can not only version the file contents, but you can move and
16     rename files. These changes are committed to the repository, and
17     can be picked up by an update, which will incorporate them by
18 kaz 1.1 rearranging the working copy accordingly. There can be conflicting
19 kaz 1.2 parallel changes to the structure, which can be resolved like any
20     other conflict. It is all Lisp.
21 kaz 1.1
22    
23     Contents
24    
25     1. Introduction . . . . . . . . . . . . . . . . . . . . . . Line 32
26 kaz 1.3 2. Data Representation Overview . . . . . . . . . . . . . . . . . 67
27     2.1 File Mapping Example . . . . . . . . . . . . . . . . . . 111
28     2.2 Synchronization . . . . . . . . . . . . . . . . . . . . 200
29     3. Surprising Advantages . . . . . . . . . . . . . . . . . . . . 236
30     3.1 File Adding conflicts . . . . . . . . . . . . . . . . . 249
31     3.2 File Removal conflicts . . . . . . . . . . . . . . . . . 274
32     3.3 Diffing and Patching . . . . . . . . . . . . . . . . . 308
33 kaz 1.1
34     1. Introduction
35    
36     The software known as CVS has been in existence since the year
37 kaz 1.3 1986, when its first version, consisting of shell scripts acting
38     as a front end to RCS commands, was posted to Usenet by Dick
39     Grune. Over the next fifteen years, CVS was turned into a C
40     program, enhanced and debugged. But in its present form, version
41     1.11, it still has annoying quirks and some serious limitations.
42 kaz 1.1
43 kaz 1.3 One of the biggest limitations of CVS is it does not treat the
44     directory structure of a module as a versioned object. MCVS solves
45     this problem not by intruding in any way into the well-debugged
46     and time-tested CVS code, but by introducing a layer of
47     indirection. MCVS retains the fundamental capabilites of CVS: the
48     ability to branch and merge, to work in parallel, to work over a
49     variety of network transports and so on. CVS worked as a front end
50     for RCS; similarly, MCVS is a front end for CVS.
51 kaz 1.1
52     It turns out that MCVS solves a few other infelicities in CVS as
53     well. A few tricky scenarios that cause grief in CVS are no
54     longer problems in MCVS, such as: two developers concurrently
55     adding the same file, or one developer removing a file that
56     another is working on.
57    
58     MCVS works by creating a special representation of the versioned
59     file tree, and this special representation is what is stored in
60     CVS. Thus the naive direct mapping between the versioned tree and
61     the tree in the repository is avoided.
62    
63     The aim of this paper is to document this simple representation
64     and explain how it supports the directory versioning operation.
65    
66    
67     2. Data Representation Overview
68    
69     In order to obtain, from CVS, the ability to perform parallel
70 kaz 1.2 version control over any object, it is necessary to represent that
71     object as a text file. This is a given. CVS can effectively handle
72     only text input in its merging and conflict identification
73 kaz 1.3 algorithms. A critical non-functional constraint in the
74     requirements of MCVS is that CVS is not to be modified in any way;
75     nobody should have to to install new CVS code on a client or
76     server machine to use MCVS. Morever, the CVS code is fragile C
77     that has been debugged for over a decade (and counting).
78 kaz 1.1
79     To treat the file structure as a versioned entity, therefore, it
80     is necessary to represent it as a text file. What structure should
81     that text file have?
82    
83     Firstly, it would be highly desirable if small changes, such as
84     renaming a few files, gave rise to small differences. Moreover,
85     a single change should only affect at most one line or two in the
86     text file. This property would allow for parallel changes with
87     minimal conflicts. The text file representation should also be
88     human readable and editable, because humans will have to resolve
89     conflicts in it.
90    
91     Secondly, a file must somehow retain its identity and CVS history
92     when its path name changes. This means that we must never change
93     the name of the file, at least not the name which is known to CVS.
94    
95     MCVS represents the file structure of a project as a simple entity
96 kaz 1.3 called a ``file mapping''. The file mapping associates path names
97     with a flat database of files. Both the mapping and the files are
98     stored in CVS. The files have machine-generated names; only
99     through the mapping are they given actual names as they appear to
100     the users. The names known to CVS are called ``F- files''.
101 kaz 1.1
102 kaz 1.2 MCVS manipulates the mapping as a simple data structure in the
103     Lisp language. Lisp has a built-in parser and formatter for
104     reading a printed representation of a List object and producing a
105     printed representation. Thus the text file format for the MCVS
106     mapping is simply a file containing a Lisp association list, with
107     special care taken to print each element of the association on a
108     separate line of text, and maintaining a consistently sorted order
109     to maximize the chances of minimal merges.
110 kaz 1.1
111     2.1 File Mapping Example
112    
113     Suppose that some project 'Foo' consists of these files:
114    
115     foo/README
116     foo/inc/foo.h
117     foo/src/Makefile
118     foo/src/foo.c
119    
120     what does a MCVS representation look like? This is best
121     understood in terms of the working copy checked out from CVS
122     via MCVS, which contains these things:
123    
124     foo/MCVS/CVS/Entries
125     foo/MCVS/CVS/... other CVS metadata ...
126    
127     foo/MCVS/F-123D61C8FE942733281D2B08C15CD438
128     foo/MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2
129     foo/MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0
130     foo/MCVS/F-1C43C940D8745CAA78752C1206316B55
131     foo/MCVS/MAP
132     foo/MCVS/MAP-LOCAL
133    
134     foo/README
135     foo/inc/foo.h
136     foo/src/Makefile
137     foo/src/foo.c
138    
139     There is a subdirectory called MCVS, which contains a CVS
140 kaz 1.3 subdirectory. This MCVS subdirectory is in fact the CVS
141     ``sandbox''. Everything else under foo are the working files.
142     Thus every MCVS working copy is just an ordinary file tree, except
143     that the top level directory contains a MCVS subdirectory with
144 kaz 1.2 interesting contents.
145 kaz 1.1
146 kaz 1.2 What are these files under MCVS? There are some files with cryptic
147     names like F-123D...438. Then there are two files MAP and
148 kaz 1.1 MAP-LOCAL.
149    
150 kaz 1.3 Firstly, it should be understood that the F- files and MAP are
151     versioned in CVS. On the other hand, MAP-LOCAL is a file that is
152     not known to CVS, but important to the functioning of MCVS.
153 kaz 1.1
154     The four F- files are the actual CVS representations of
155     foo/README, foo/src/foo.c, foo/src/Makefile and foo/inc/foo.h.
156    
157     What establishes the relationship between the F- names and the
158 kaz 1.3 human readable paths is the association list in the MAP file,
159     which looks something like this:
160 kaz 1.1
161     (("MCVS/F-123D61C8FE942733281D2B08C15CD438"
162     "README")
163     ("MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2"
164     "inc/foo.h")
165     ("MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
166     "src/Makefile")
167     ("MCVS/F-1C43C940D8745CAA78752C1206316B55"
168     "src/foo.c"))
169    
170     The MAP-LOCAL file, upon checkout, is simply an exact copy of MAP.
171     The purpose of MAP-LOCAL is to keep track of the actual mapping
172     that exists in the user's checked out copy. When an update
173 kaz 1.3 operation is performed, it may incorporate changes from the
174 kaz 1.1 repository into MAP, causing the MAP to no longer reflect the
175     local file structure. In fact MAP can at that point contain
176 kaz 1.3 unresolved conflicts, so that it is not usable by MCVS, requiring
177 kaz 1.1 manual intervention. The MAP-LOCAL copy, however, remains
178 kaz 1.3 untouched and consistent.
179 kaz 1.1
180 kaz 1.3 Because MCVS maintains a local copy of the mapping, the MCVS
181     update operation can compute the differences between the new
182     mapping coming from the repository and the local mapping. These
183     differences can then be translated into filesystem-rearranging
184     actions that change the shape of the working copy to bring it up
185     to date. Then MAP and MAP-LOCAL are once again identical.
186 kaz 1.1
187     This rearranging is the heart of the MCVS system. Everything else
188     is largely just manipulations of the mappings. For example,
189     renaming a file is simple. Open up MCVS/MAP in a text editor, and
190 kaz 1.3 change a path (taking care not to create a duplicate, or otherwise
191     corrupt the mapping). Then save it and run the mcvs update. MCVS
192     will propagate the change you made by physically relocating that
193     file. If you like what you have done, simply commit. You can
194     commit at the CVS level within the MCVS directory. But of course,
195     a front end MCVS file renaming operation is provided, and so is a
196     commit operation, which in addition to running CVS also ensures
197     that the F- files are properly synchronized with their unfolded
198     counterparts.
199 kaz 1.1
200     2.2 Synchronization
201    
202     The next problem to tackle is how to establish the correspondence
203     between the F- files and the working files. MCVS does this in a
204     platform-specific way, namely by relying on Unix hard links.
205    
206     When MCVS checks out a sandbox, it creates hard links, so that a
207     F- file and its corresponding working file are in fact the same
208     filesystem object. Thus ``unpacking'' the F- files through the
209 kaz 1.3 mapping does not require the mass duplication of of file data,
210 kaz 1.1 only the creation of directories and links.
211    
212     The problem is that some operations ``break'' this hard link
213     connection by unlinking a file and overwriting it with a new one
214     that has the same name. The CVS update operation does this, for
215     instance. If cvs up creates a new F- file, that file is no longer
216     connected with the working file.
217    
218     To keep the two synchronized, MCVS performs a synchronization
219     operation. This operation sweeps over the file map, and repairs
220     any broken links. If either of the two files is missing, then a
221     link is created. If both are present, but are distinct objects,
222     then the one with the most recent modification timestamp supersedes;
223     the other is unlinked and replaced with a link to the newer one.
224    
225 kaz 1.3 A synchronization must be done before any operation which can
226     cause a file to be moved, removed, or to be committed to the CVS
227     repository. In all these situations, the F- files must have
228     the correct contents.
229 kaz 1.1
230     The MCVS update operation must perform synchronization twice:
231 kaz 1.2 before the CVS update to ensure that the F- files carry all of the
232     local changes; then after the CVS update to make sure that any
233     newly incorporated changes propagate back to the working copy.
234    
235    
236     3. Surprising Advantages
237    
238     The MCVS representation brings with it a few advantages which were
239     not immediately obvious in the design stages, but came to light
240     during development. In addition to the lack of directory structure
241     versioning, CVS has a few other infelicities which go away under
242     MCVS. Also, bringing in the capability to version directory
243     structure also brings in a new concern. Free software developers
244     uses patches to communicate code changes to each other. The
245     traditional tools for producing and applying patches, like CVS, do
246 kaz 1.3 not handle directory versioning. MCVS has some answers to these
247     problems.
248 kaz 1.2
249     3.1 File Adding Conflicts
250    
251     In CVS, it can happen that two (or more) developers working on the
252     same module, add a file to the same directory, and all use the
253 kaz 1.3 same file name. The first developer commits the file, and then
254     problems occurs for the subsequent developers who try to commit.
255     CVS complains that the file was independently added by a second
256     party, and not allow the commit to proceed.
257 kaz 1.2
258     In MCVS, this cannot happen. MCVS recognizes that if two people
259     add a file, it is not the same file. Names do not determine
260 kaz 1.3 equivalence, semantics does! When a file is added to MCVS, a F- file
261 kaz 1.2 is created to represent it. That F- file name contains a randomly
262     chosen 128-bit number, expressed in hexadecimal. It is extremely
263     unlikely that two such numbers will collide, so in practice, one
264     will ``never'' see the aforementioned CVS error message.
265    
266     Instead, what will happen when developers choose the same path
267     name for a file is that either a conflict will arise in the MAP
268     file, which will have to be resolved, or else the mapping will
269     contain a duplicate path name, which can be detected by MCVS as an
270 kaz 1.3 error which again, the users must resolve. Each file is a separate
271     object with its own version history; that two objects accidentally
272     map to the same name is a minor, correctable problem.
273 kaz 1.2
274     3.2 File Removal Conflicts
275    
276     CVS does not behave very well when one developer deletes a file,
277     via cvs remove, and another tries to continue comitting changes.
278    
279 kaz 1.3 This is really just an instance of the classic problem of
280     computing the object lifetimes, translated to the domain of
281     version control.
282 kaz 1.2
283     The cleanest solution to the problem of computing object lifetimes
284     is garbage collection, which ensures that as long as an object can
285 kaz 1.3 still be used, it persists, and thereafter, it is automatically
286     removed when the system finds it necessary or convenient to do so.
287 kaz 1.2
288     It turns out that MCVS supports a kind of garbage collection
289     concept. When a file is removed, it does not have to be subject to
290     ``cvs remove''. It only has to be removed from the file mapping,
291 kaz 1.3 but the F- file can remain unremoved. What this means is that the
292     F- file contines to be checked out, so it occupies bandwidth and
293     space. What happens if a user has outstanding changes, and
294     performs an MCVS update which removes the file? The link
295     synchronization ensures that the outstanding changes are
296     transferred to the F- file before the update. So the changes are
297     not lost! It is possible to manually restore that F- file in the
298     MAP to give it a ``new lease on life''. This is analogous to
299     sifting through garbage, to salvage it by making it reachable
300     again. And, of course, the F- file can be committed to CVS whether
301     or not it is reentered into the map.
302 kaz 1.2
303     The space problem can be dealt with by a MCVS ``garbage
304     collection'' routine that can be invoked administratively. This
305     will sweep through the F- files, identify any which have no
306     mapping, and ``cvs remove'' these.
307    
308     3.3 Diffing and Patching
309    
310     Another surprising advantage of MCVS is that it addresses the
311     problem of distributing patches which patch the file system
312     structure as well as contents.
313    
314     The F- and MAP files can be treated as a data interchange format.
315     That is to say, the raw MCVS representation of a project can be
316 kaz 1.3 exported from CVS and shipped to developers, instead of the
317     unpacked file tree. The developers can place that MCVS
318     representation into their own repositories and work on it using
319     MCVS. When they are done, they can produce a CVS-level diff over the
320     MCVS representation. That patch captures any directory restructuring
321     that was done, and can be applied to a MCVS representation elsewhere
322     to reproduce those changes.
323    
324     In principle, it should be possible to use MCVS for producing
325     patches, without using CVS. Essentially, the MCVS directory
326     with its MAP and F- files is nothing more than a file archive,
327     albeit one that is amenable to patching.

  ViewVC Help
Powered by ViewVC 1.1.5