Contents of /meta-cvs/F-54B5FF01DC6392F28A104A8A58761CB6

Revision 1.2
Sat Jan 26 06:32:51 2002 UTC by kaz
Branch: MAIN
Changes since 1.1: +118 -35 lines
Nearly says everything I want now.
1 MCVS---Meta CVS
2 A Directory Structure Versioning Layer Over
3 The Concurrent Versions System.
5 Kaz Kylheku
6 January 25, 2002
8 Abstract
10 This is MCVS, Meta-(Concurrent Versions System), a front end for
11 CVS. It supports the concurrent and independent versioning of
12 files, as well as a directory structure, by several people. I have
13 been using it for a few weeks now, mostly just to version the
14 MCVS sources themselves. It uses the cvs program in such a way that
15 you can not only version the file contents, but you can move and
16 rename files. These changes are committed to the repository, and
17 can be picked up by an update, which will incorporate them by
18 rearranging the working copy accordingly. There can be conflicting
19 parallel changes to the structure, which can be resolved like any
20 other conflict. It is all Lisp.
23 Contents
25 1. Introduction . . . . . . . . . . . . . . . . . . . . . . Line 32
26 2. Data Representation Overview . . . . . . . . . . . . . . . . . 64
27 2.1 File Mapping Example . . . . . . . . . . . . . . . . . . 107
28 2.2 Synchronization . . . . . . . . . . . . . . . . . . . . 193
29 3. Surprising Advantages . . . . . . . . . . . . . . . . . . . . 230
30 3.1 File Adding Conflicts . . . . . . . . . . . . . . . . . 242
31 3.2 File Removal Conflicts . . . . . . . . . . . . . . . . . 265
32 3.3 Diffing and Patching . . . . . . . . . . . . . . . . . 298
34 1. Introduction
36 The software known as CVS has been in existence since the year
37 1986, when its first version was posted to Usenet by Dick Grune.
38 The original version was a set of shell scripts layered on top of
39 RCS commands. Over the next fifteen years, CVS was turned into
40 a C program, enhanced and debugged. But in its present form,
41 version 1.11, it still has a few drawbacks.
43 One of the biggest drawbacks is that CVS does not treat the
44 directory and file name structure of a module as a versioned
45 object. MCVS solves this problem not by intruding in any way into
46 the well-debugged and time-tested CVS code, but by introducing a
47 layer of indirection. MCVS retains the fundamental capabilities of
48 CVS: the ability to branch and merge, to work in parallel, to work
49 over a variety of network transports and so on.
51 It turns out that MCVS solves a few other infelicities in CVS as
52 well. A few tricky scenarios that cause grief in CVS are no
53 longer problems in MCVS, such as: two developers concurrently
54 adding the same file, or one developer removing a file that
55 another is working on.
57 MCVS works by creating a special representation of the versioned
58 file tree, and this special representation is what is stored in
59 CVS. Thus the naive direct mapping between the versioned tree and
60 the tree in the repository is avoided.
62 The aim of this paper is to document this simple representation
63 and explain how it supports the directory versioning operation.
66 2. Data Representation Overview
68 In order to obtain, from CVS, the ability to perform parallel
69 version control over any object, it is necessary to represent that
70 object as a text file. This is a given. CVS can effectively handle
71 only text input in its merging and conflict identification
72 algorithms.
74 To treat the file structure as a versioned entity, therefore, it
75 is necessary to represent it as a text file. What structure should
76 that text file have?
78 Firstly, it would be highly desirable if small changes, such as
79 renaming a few files, gave rise to small differences. Moreover,
80 a single change should only affect at most one line or two in the
81 text file. This property would allow for parallel changes with
82 minimal conflicts. The text file representation should also be
83 human readable and editable, because humans will have to resolve
84 conflicts in it.
86 Secondly, a file must somehow retain its identity and CVS history
87 when its path name changes. This means that we must never change
88 the name of the file, at least not the name which is known to CVS.
90 MCVS represents the file structure of a project as a simple entity
91 called a file mapping. The file mapping associates path names
92 with a flat database of files. Both the mapping and the files
93 are stored in CVS. The files have machine-generated names; only
94 through the mapping are they given actual names as they appear
95 to the users.
97 MCVS manipulates the mapping as a simple data structure in the
98 Lisp language. Lisp has a built-in parser and formatter for
99 reading a printed representation of a Lisp object and producing a
100 printed representation. Thus the text file format for the MCVS
101 mapping is simply a file containing a Lisp association list, with
102 special care taken to print each element of the association on a
103 separate line of text, and maintaining a consistently sorted order
104 to maximize the chances of minimal merges.
107 2.1 File Mapping Example
109 Suppose that some project 'Foo' consists of these files:
111 foo/README
112 foo/inc/foo.h
113 foo/src/Makefile
114 foo/src/foo.c
116 What does a MCVS representation look like? This is best
117 understood in terms of the working copy checked out from CVS
118 via MCVS, which contains these things:
120 foo/MCVS/CVS/Entries
121 foo/MCVS/CVS/... other CVS metadata ...
123 foo/MCVS/F-123D61C8FE942733281D2B08C15CD438
124 foo/MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2
125 foo/MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0
126 foo/MCVS/F-1C43C940D8745CAA78752C1206316B55
127 foo/MCVS/MAP
128 foo/MCVS/MAP-LOCAL
130 foo/README
131 foo/inc/foo.h
132 foo/src/Makefile
133 foo/src/foo.c
135 There is a subdirectory called MCVS, which contains a CVS
136 subdirectory. This MCVS subdirectory is in fact the CVS ``sand
137 box''. Everything else under foo is a working file. Thus
138 every MCVS working copy is just an ordinary file tree, except that
139 the top level directory contains a MCVS subdirectory with
140 interesting contents.
142 What are these files under MCVS? There are some files with cryptic
143 names like F-123D...438. Then there are two files, MAP and
144 MAP-LOCAL.
146 Firstly, it should be understood that the F- files and MAP are
147 versioned in CVS. MAP-LOCAL is a file that is not known to CVS,
148 but important to the functioning of MCVS.
150 The four F- files are the actual CVS representations of
151 foo/README, foo/src/foo.c, foo/src/Makefile and foo/inc/foo.h.
153 What establishes the relationship between the F- names and the
154 human readable paths is the contents of the MAP file, which look
155 something like this:
157 (("MCVS/F-123D61C8FE942733281D2B08C15CD438"
158 "README")
159 ("MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2"
160 "inc/foo.h")
161 ("MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
162 "src/Makefile")
163 ("MCVS/F-1C43C940D8745CAA78752C1206316B55"
164 "src/foo.c"))
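
The format above can be read and written with very little code. The
following Python sketch is only an illustration, not the MCVS
implementation (which is Lisp and uses the Lisp reader and printer).
It assumes every entry is a list of exactly two strings, as in the
example; the names parse_map and format_map are hypothetical.

```python
import re

# Matches one double-quoted Lisp string, allowing backslash escapes.
_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"', re.S)


def _unescape(s):
    # In Lisp string syntax a backslash quotes the next character.
    return re.sub(r'\\(.)', r'\1', s, flags=re.S)


def _quote(s):
    # Escape backslashes first, then double quotes.
    return '"' + s.replace('\\', '\\\\').replace('"', '\\"') + '"'


def parse_map(text):
    """Return a list of (f_name, path) pairs from MAP text.

    Assumption: each entry holds exactly two strings. This is not a
    general Lisp reader.
    """
    strings = [_unescape(s) for s in _STRING.findall(text)]
    if len(strings) % 2 != 0:
        raise ValueError("odd number of strings in MAP")
    return list(zip(strings[0::2], strings[1::2]))


def format_map(entries):
    """Render entries sorted by F- name, one entry per line."""
    lines = ["(%s %s)" % (_quote(f), _quote(p)) for f, p in sorted(entries)]
    return "(" + "\n ".join(lines) + ")\n"
```

Writing one entry per line in a stable sorted order is what keeps a
rename down to a one-line difference that CVS can merge.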
166 The MAP-LOCAL file, upon checkout, is simply an exact copy of MAP.
167 The purpose of MAP-LOCAL is to keep track of the actual mapping
168 that exists in the user's checked out copy. When an update
169 operation is performed, it will incorporate changes from the
170 repository into MAP, causing the MAP to no longer reflect the
171 local file structure. In fact MAP can at that point contain
172 unresolved conflicts, so that it is not useable by MCVS, requiring
173 manual intervention. The MAP-LOCAL copy, however, remains
174 untouched and therefore consistent.
176 By maintaining a local copy, the MCVS update operation can compute
177 the differences between the local file mapping and the new mapping
178 coming from the repository. These differences can then be
179 translated into filesystem-rearranging actions that change the
180 shape of the working copy to bring it up to date.
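
As a rough illustration of that comparison, here is a Python sketch.
It assumes each mapping has been loaded into a dictionary from F- name
to path; the function name diff_maps and the three result lists are
inventions for this example, not MCVS internals.

```python
def diff_maps(local, new):
    """Compare two mappings given as {f_name: path} dictionaries.

    Returns (moves, adds, removes):
      moves   - (f_name, old_path, new_path) for files whose path changed
      adds    - (f_name, path) for files present only in the new mapping
      removes - (f_name, path) for files present only in the local mapping
    """
    moves = [(f, local[f], new[f])
             for f in sorted(local.keys() & new.keys())
             if local[f] != new[f]]
    adds = [(f, new[f]) for f in sorted(new.keys() - local.keys())]
    removes = [(f, local[f]) for f in sorted(local.keys() - new.keys())]
    return moves, adds, removes
```

Each tuple corresponds to one filesystem action: a move renames a
working file, an add creates a new link, and a remove unlinks the
working file while leaving the F- file alone.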
182 This rearranging is the heart of the MCVS system. Everything else
183 is largely just manipulations of the mappings. For example,
184 renaming a file is simple. Open up MCVS/MAP in a text editor, and
185 change one of the paths, taking care not to create a duplicate.
186 Then save it and run mcvs update. MCVS will propagate the
187 change you made by physically relocating that file. If you like
188 what you have done, simply commit. You can commit at the CVS level
189 within the MCVS directory. But of course, a front end MCVS file
190 renaming operation is provided, and so is a commit operation,
191 which in addition to running CVS also ensures that the F- files
192 are properly synchronized with their unfolded counterparts.
195 2.2 Synchronization
197 The next problem to tackle is how to establish the correspondence
198 between the F- files and the working files. MCVS does this in a
199 platform-specific way, namely by relying on Unix hard links.
201 When MCVS checks out a sandbox, it creates hard links, so that a
202 F- file and its corresponding working file are in fact the same
203 filesystem object. Thus ``unpacking'' the F- files through the
204 mapping does not require the movement of massive quantities of data,
205 only the creation of directories and links.
207 The problem is that some operations ``break'' this hard link
208 connection by unlinking a file and overwriting it with a new one
209 that has the same name. The CVS update operation does this, for
210 instance. If cvs up creates a new F- file, that file is no longer
211 connected with the working file.
213 To keep the two synchronized, MCVS performs a synchronization
214 operation. This operation sweeps over the file map, and repairs
215 any broken links. If either of the two files is missing, then a
216 link is created. If both are present, but are distinct objects,
217 then the one with the most recent modification timestamp supersedes;
218 the other is unlinked and replaced with a link to the newer one.
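
The repair rule for a single map entry might be sketched in Python as
follows. This is an illustration under stated assumptions, not the
MCVS code: the function name synchronize and its return strings are
made up, and the choice to let the F- file win when both timestamps
are equal is an assumption, since the rule above does not cover ties.

```python
import os


def synchronize(f_path, work_path):
    """Make f_path and work_path hard links to the same file.

    Returns a short string describing the action taken.
    """
    f_exists = os.path.exists(f_path)
    w_exists = os.path.exists(work_path)
    if not f_exists and not w_exists:
        return "both missing"
    if not w_exists:
        # Create any missing directories, then link the working file.
        os.makedirs(os.path.dirname(work_path) or ".", exist_ok=True)
        os.link(f_path, work_path)
        return "linked working file"
    if not f_exists:
        os.link(work_path, f_path)
        return "linked F- file"
    f_stat, w_stat = os.stat(f_path), os.stat(work_path)
    if os.path.samestat(f_stat, w_stat):
        return "already linked"
    # Distinct objects: the more recently modified one supersedes.
    # Assumption: on a tie the F- file is kept.
    if f_stat.st_mtime >= w_stat.st_mtime:
        newer, older = f_path, work_path
    else:
        newer, older = work_path, f_path
    os.unlink(older)
    os.link(newer, older)
    return "relinked to " + newer
```

A full synchronization pass would call this once for every entry in
the file map.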
220 Most such clobbering needs to take place when the user edits the
221 working file, in order to propagate the change to the F- file. So,
222 for instance, prior to committing, a synchronization is done.
224 The MCVS update operation must perform synchronization twice:
225 before the CVS update to ensure that the F- files carry all of the
226 local changes; then after the CVS update to make sure that any
227 newly incorporated changes propagate back to the working copy.
230 3. Surprising Advantages
232 The MCVS representation brings with it a few advantages which were
233 not immediately obvious in the design stages, but came to light
234 during development. In addition to the lack of directory structure
235 versioning, CVS has a few other infelicities which go away under
236 MCVS. Moreover, the capability to version directory
237 structure brings with it a new concern. Free software developers
238 use patches to communicate code changes to each other. The
239 traditional tools for producing and applying patches, like CVS, do
240 not handle directory versioning.
242 3.1 File Adding Conflicts
244 In CVS, it can happen that two (or more) developers working on the
245 same module add a file to the same directory, and all use the
246 same file name. A problem occurs for the developers who lose the
247 race to commit the file. CVS complains that the file was
248 independently added by a second party, and does not allow the commit to
249 proceed.
251 In MCVS, this cannot happen. MCVS recognizes that if two people
252 add a file, it is not the same file. Names do not determine
253 equivalence, semantics do! When a file is added to MCVS, a F- file
254 is created to represent it. That F- file name contains a randomly
255 chosen 128-bit number, expressed in hexadecimal. It is extremely
256 unlikely that two such numbers will collide, so in practice, one
257 will ``never'' see the aforementioned CVS error message.
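
For illustration, a name of this shape can be produced in Python as
shown below. How MCVS itself draws its random number is not described
here, so the use of the secrets module and the function name
new_f_name are assumptions of this sketch.

```python
import secrets


def new_f_name():
    """Return 'F-' followed by 32 uppercase hexadecimal digits.

    16 random bytes give the 128 bits mentioned in the text.
    """
    return "F-" + secrets.token_hex(16).upper()
```

With 128 random bits, even a project that has added a billion files
has a chance of a name collision far below anything worth planning for.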
259 Instead, what will happen when developers choose the same path
260 name for a file is that either a conflict will arise in the MAP
261 file, which will have to be resolved, or else the mapping will
262 contain a duplicate path name, which can be detected by MCVS as an
263 error which, again, the users must resolve.
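
The duplicate check is simple to express. The Python sketch below
assumes the mapping is available as a list of (F- name, path) pairs;
the function name duplicate_paths is hypothetical.

```python
from collections import defaultdict


def duplicate_paths(entries):
    """Return {path: [f_names]} for each path claimed by several F- files."""
    by_path = defaultdict(list)
    for f_name, path in entries:
        by_path[path].append(f_name)
    return {p: fs for p, fs in by_path.items() if len(fs) > 1}
```

An empty result means the mapping is usable; anything else names the
paths that the users have to disambiguate.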
265 3.2 File Removal Conflicts
267 CVS does not behave very well when one developer deletes a file,
268 via cvs remove, and another tries to continue committing changes.
270 This is really just an instance of the classic problem of computing the
271 lifetime of an object, transported to the domain of version
272 control.
274 The cleanest solution to the problem of computing object lifetimes
275 is garbage collection, which ensures that as long as an object can
276 still be used, it persists.
278 It turns out that MCVS supports a kind of garbage collection
279 concept. When a file is removed, it does not have to be subject to
280 ``cvs remove''. It only has to be removed from the file mapping,
281 but the F- file can remain in CVS. What this means is that the F-
282 file continues to be checked out, so it occupies bandwidth and
283 space. But it also means that the victim of a file removal is not
284 adversely affected. Upon updating to the latest MAP file in which
285 the working file no longer exists, MCVS will cause that file to
286 disappear locally. However, the F- file still exists, and thanks
287 to the synchronization, it contains that user's working changes.
288 It is possible to manually restore that F- file in the MAP to
289 give it a ``new lease on life''. This is analogous to sifting
290 through garbage and salvaging an object by making it reachable
291 again.
293 The space problem can be dealt with by a MCVS ``garbage
294 collection'' routine that can be invoked administratively. This
295 will sweep through the F- files, identify any which have no
296 mapping, and ``cvs remove'' these.
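
The identification step of such a sweep could look like the following
Python sketch. It only lists candidates and deliberately stops short
of running ``cvs remove''. The function name unmapped_f_files and the
assumption that map entries name files as "MCVS/F-..." (as in the
example in section 2.1) belong to this illustration.

```python
import os


def unmapped_f_files(mcvs_dir, entries):
    """List F- files in mcvs_dir that no MAP entry refers to.

    entries is a list of (f_name, path) pairs, where f_name looks
    like 'MCVS/F-...'.
    """
    mapped = {os.path.basename(f_name) for f_name, _path in entries}
    return sorted(name for name in os.listdir(mcvs_dir)
                  if name.startswith("F-") and name not in mapped)
```

Keeping the listing separate from the removal lets an administrator
review the candidates before anything is deleted from the repository.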
298 3.3 Diffing and Patching
300 Another surprising advantage of MCVS is that it addresses the
301 problem of distributing patches which patch the file system
302 structure as well as contents.
304 The F- and MAP files can be treated as a data interchange format.
305 That is to say, the raw MCVS representation of a project can be
306 exported from CVS and shipped to developers. The developers can
307 place that MCVS representation into their own repositories and
308 work on it. When they are done, they can produce a CVS-level
309 diff over the MCVS representation. That patch captures any
310 directory restructuring that was done, and can be applied to
311 a MCVS representation elsewhere to reproduce those changes.
