
Contents of /meta-cvs/F-54B5FF01DC6392F28A104A8A58761CB6


Revision 1.9
Fri Apr 5 03:12:13 2002 UTC by kaz
Branch: MAIN
CVS Tags: mcvs-0-14, mcvs-0-11, mcvs-0-13, mcvs-0-12
Changes since 1.8: +10 -9 lines
New quote.
1 Meta-CVS --- A Directory Structure
2 Versioning Layer Over
3 The Concurrent Versions System.
5 Kaz Kylheku
6 January 25, 2002
9 "Directory versioning is a Hard Problem" -- Subversion FAQ
11 "Any problem in computer science can be solved with
12 another layer of indirection" -- David Wheeler
15 Abstract
17 This is Meta-CVS, Meta-(Concurrent Versions System), a front end for
18 CVS. It supports the concurrent and independent versioning of
19 files, as well as a directory structure, by several people. I have
20 been using it for a few weeks now, mostly just to version the
21 Meta-CVS sources themselves. It uses the cvs program in such a way that
22 you can not only version the file contents, but you can move and
23 rename files. These changes are committed to the repository, and
24 can be picked up by an update, which will incorporate them by
25 rearranging the working copy accordingly. There can be conflicting
26 parallel changes to the structure, which can be resolved like any
27 other conflict. It is all Lisp.
30 Contents
32 1. Introduction . . . . . . . . . . . . . . . . . . . . . . Line 42
33 2. Data Representation Overview . . . . . . . . . . . . . . . . . 75
34 2.1 File Mapping Example . . . . . . . . . . . . . . . . . . 120
35 2.2 Synchronization . . . . . . . . . . . . . . . . . . . . 210
36 3. Surprising Advantages . . . . . . . . . . . . . . . . . . . . 247
37 3.1 File Adding Conflicts . . . . . . . . . . . . . . . . . 260
38 3.2 File Removal Conflicts . . . . . . . . . . . . . . . . . 286
39 3.3 Diffing and Patching . . . . . . . . . . . . . . . . . 320
42 1. Introduction
44 The software known as CVS has been in existence since the year
45 1986, when its first version, consisting of shell scripts acting
46 as a front end to RCS commands, was posted to Usenet by Dick
47 Grune. Over the next fifteen years, CVS was turned into a C
48 program, enhanced and debugged. But in its present form, version
49 1.11, it still has annoying quirks and some serious limitations.
51 One of the biggest limitations of CVS is that it does not treat the
52 directory structure of a module as a versioned object. Meta-CVS
53 solves this problem not by intruding in any way into the
54 well-debugged and time-tested CVS code, but by introducing a layer
55 of indirection. Meta-CVS retains the fundamental capabilities of
56 CVS: the ability to branch and merge, to work in parallel, to work
57 over a variety of network transports and so on. CVS worked as a
58 front end for RCS; similarly, Meta-CVS is a front end for CVS.
60 It turns out that Meta-CVS solves a few other infelicities in CVS
61 as well. A few tricky scenarios that cause grief in CVS are no
62 longer problems in Meta-CVS, such as: two developers concurrently
63 adding the same file, or one developer removing a file that
64 another is working on.
66 Meta-CVS works by creating a special representation of the
67 versioned file tree, and this special representation is what is
68 stored in CVS. Thus the naive direct mapping between the versioned
69 tree and the tree in the repository is avoided.
71 The aim of this paper is to document this simple representation
72 and explain how it supports the directory versioning operation.
75 2. Data Representation Overview
77 In order to obtain, from CVS, the ability to perform parallel
78 version control over any object, it is necessary to represent that
79 object as a text file. This is a given. CVS can effectively handle
80 only text input in its merging and conflict identification
81 algorithms. A critical non-functional constraint in the
82 requirements of Meta-CVS is that CVS is not to be modified in any
83 way; nobody should have to install new CVS code on a client or
84 server machine to use Meta-CVS. Moreover, the CVS code is fragile C
85 that has been debugged for over a decade (and counting).
87 To treat the file structure as a versioned entity, therefore, it
88 is necessary to represent it as a text file. What structure should
89 that text file have?
91 Firstly, it would be highly desirable if small changes, such as
92 renaming a few files, gave rise to small differences. Moreover,
93 a single change should only affect at most one line or two in the
94 text file. This property would allow for parallel changes with
95 minimal conflicts. The text file representation should also be
96 human readable and editable, because humans will have to resolve
97 conflicts in it.
99 Secondly, a file must somehow retain its identity and CVS history
100 when its path name changes. This means that we must never change
101 the name of the file, at least not the name which is known to CVS.
103 Meta-CVS represents the file structure of a project as a simple
104 entity called a ``file mapping''. The file mapping associates path
105 names with a flat database of files. Both the mapping and the
106 files are stored in CVS. The files have machine-generated names;
107 only through the mapping are they given actual names as they
108 appear to the users. The names known to CVS are called ``F-
109 files''.
111 Meta-CVS manipulates the mapping as a simple data structure in the
112 Lisp language. Lisp has a built-in parser and formatter for
113 reading a printed representation of a Lisp object and producing a
114 printed representation. Thus the text file format for the Meta-CVS
115 mapping is simply a file containing a Lisp association list, with
116 special care taken to print each element of the association on a
117 separate line of text, and maintaining a consistently sorted order
118 to maximize the chances of minimal merges.
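
The effect of the one-entry-per-line, sorted layout can be
illustrated with a small sketch. It is written in Python rather
than Lisp purely for illustration; the function name format_map,
the shortened F- names, and the choice of sorting by F- file name
are assumptions made for this example, not details of Meta-CVS.

```python
def format_map(mapping):
    """Render {f_file: path} as a Lisp-style association list.

    Each entry goes on its own line and entries are sorted, so that
    one rename touches exactly one line of the text file.
    Simplification: strings containing quotes are not escaped.
    """
    entries = sorted(mapping.items())  # assumed sort key: F- file name
    lines = ['("{}" "{}")'.format(f_file, path) for f_file, path in entries]
    return "(" + "\n ".join(lines) + ")\n"


# Hypothetical, shortened F- names used only for this example.
old = {"MCVS/F-AAAA": "README", "MCVS/F-BBBB.c": "src/foo.c"}
new = dict(old)
new["MCVS/F-BBBB.c"] = "src/bar.c"  # rename src/foo.c to src/bar.c

# Compare the two renderings line by line.
changed = [
    (a, b)
    for a, b in zip(format_map(old).splitlines(), format_map(new).splitlines())
    if a != b
]
```

Here the rename changes a single line of the rendered text, which
is the property that lets CVS merge parallel renames of different
files without conflict.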
120 2.1 File Mapping Example
122 Suppose that some project 'foo' consists of these files:
124 foo/README
125 foo/inc/foo.h
126 foo/src/Makefile
127 foo/src/foo.c
129 What does a Meta-CVS representation look like? This is best
130 understood in terms of the working copy checked out from CVS via
131 Meta-CVS, which contains these things:
133 foo/MCVS/CVS/Entries
134 foo/MCVS/CVS/... other CVS metadata ...
136 foo/MCVS/F-123D61C8FE942733281D2B08C15CD438
137 foo/MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h
138 foo/MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0
139 foo/MCVS/F-1C43C940D8745CAA78752C1206316B55.c
140 foo/MCVS/MAP
141 foo/MCVS/MAP-LOCAL
143 foo/README
144 foo/inc/foo.h
145 foo/src/Makefile
146 foo/src/foo.c
148 There is a subdirectory called MCVS, which contains a CVS
149 subdirectory. This MCVS subdirectory is in fact the CVS
150 ``sandbox''. Everything else under foo is a working file.
151 Thus every Meta-CVS working copy is just an ordinary file tree,
152 except that the top level directory contains an MCVS subdirectory
153 with interesting contents.
155 What are these files under MCVS? There are some files with cryptic
156 names like F-123D...438. Then there are two files, MAP and
157 MAP-LOCAL.
159 Firstly, it should be understood that the F- files and MAP are
160 versioned in CVS. On the other hand, MAP-LOCAL is a file that is
161 not known to CVS, but important to the functioning of Meta-CVS.
163 The four F- files are the actual CVS representations of
164 foo/README, foo/src/foo.c, foo/src/Makefile and foo/inc/foo.h.
166 What establishes the relationship between the F- names and the
167 human readable paths is the association list in the MAP file,
168 which looks something like this:
170 (("MCVS/F-123D61C8FE942733281D2B08C15CD438"
171 "README")
172 ("MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h"
173 "inc/foo.h")
174 ("MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
175 "src/Makefile")
176 ("MCVS/F-1C43C940D8745CAA78752C1206316B55.c"
177 "src/foo.c"))
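
To make the structure of this file concrete, here is a minimal
sketch that reads the association list above into a dictionary.
Meta-CVS itself uses the Lisp reader for this; the Python function
parse_map below is a hypothetical stand-in that only understands
pairs of double-quoted strings.

```python
import re

MAP_TEXT = '''
(("MCVS/F-123D61C8FE942733281D2B08C15CD438"
  "README")
 ("MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2.h"
  "inc/foo.h")
 ("MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
  "src/Makefile")
 ("MCVS/F-1C43C940D8745CAA78752C1206316B55.c"
  "src/foo.c"))
'''


def parse_map(text):
    """Return {f_file: path} from a MAP-style association list.

    Simplification: this collects the quoted strings and pairs them
    up in order; it is not a general Lisp reader and does not
    process backslash escapes inside strings.
    """
    strings = re.findall(r'"((?:[^"\\]|\\.)*)"', text)
    if len(strings) % 2 != 0:
        raise ValueError("map must contain (F-file, path) pairs")
    return dict(zip(strings[0::2], strings[1::2]))


mapping = parse_map(MAP_TEXT)
```

Looking up an F- name in the resulting dictionary gives the path
under which that file appears in the working copy.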
179 The MAP-LOCAL file, upon checkout, is simply an exact copy of MAP.
180 The purpose of MAP-LOCAL is to keep track of the actual mapping
181 that exists in the user's checked out copy. When an update
182 operation is performed, it may incorporate changes from the
183 repository into MAP, causing the MAP to no longer reflect the
184 local file structure. In fact MAP can at that point contain
185 unresolved conflicts, so that it is not usable by Meta-CVS,
186 requiring manual intervention. The MAP-LOCAL copy, however,
187 remains untouched and consistent.
189 Because Meta-CVS maintains a local copy of the mapping, the
190 Meta-CVS update operation can compute the differences between the
191 new mapping coming from the repository and the local mapping.
192 These differences can then be translated into
193 filesystem-rearranging actions that change the shape of the
194 working copy to bring it up to date. Then MAP and MAP-LOCAL are
195 once again identical.
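
The comparison of the two mappings can be sketched as follows. This
is an illustration under assumptions, not the Meta-CVS
implementation: the function name plan_rearrangement and the three
result lists are invented for the example, and real code would also
have to handle ordering problems such as two files swapping names.

```python
def plan_rearrangement(local, incoming):
    """Compare two {f_file: path} mappings and list the actions
    needed to turn the local tree shape into the incoming one."""
    moves, adds, removes = [], [], []
    for f_file, new_path in sorted(incoming.items()):
        old_path = local.get(f_file)
        if old_path is None:
            adds.append((f_file, new_path))    # file new to this sandbox
        elif old_path != new_path:
            moves.append((old_path, new_path)) # same file, new path
    for f_file, old_path in sorted(local.items()):
        if f_file not in incoming:
            removes.append(old_path)           # no longer mapped
    return moves, adds, removes


# Hypothetical, shortened F- names used only for this example.
local = {"MCVS/F-01": "README", "MCVS/F-02.c": "src/foo.c"}
incoming = {"MCVS/F-02.c": "lib/foo.c", "MCVS/F-03.h": "inc/foo.h"}
moves, adds, removes = plan_rearrangement(local, incoming)
```

Because the F- name is the key, a file whose path changed shows up
as a move rather than as an unrelated removal and addition.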
197 This rearranging is the heart of the Meta-CVS system. Everything
198 else is largely just manipulations of the mappings. For example,
199 renaming a file is simple. Open up MCVS/MAP in a text editor, and
200 change a path (taking care not to create a duplicate, or otherwise
201 corrupt the mapping). Then save it and run the mcvs update.
202 Meta-CVS will propagate the change you made by physically
203 relocating that file. If you like what you have done, simply
204 commit. You can commit at the CVS level within the MCVS
205 directory. But of course, a Meta-CVS file renaming operation is
206 provided, and so is a commit operation, which in addition to
207 running CVS also ensures that the F- files are properly
208 synchronized with their unfolded counterparts.
210 2.2 Synchronization
212 The next problem to tackle is how to establish the correspondence
213 between the F- files and the working files. Meta-CVS does this in a
214 platform-specific way, namely by relying on Unix hard links.
216 When Meta-CVS checks out a sandbox, it creates hard links, so that
217 a F- file and its corresponding working file are in fact the same
218 filesystem object. Thus ``unpacking'' the F- files through the
219 mapping does not require the mass duplication of file data,
220 only the creation of directories and links.
222 The problem is that some operations ``break'' this hard link
223 connection by unlinking a file and overwriting it with a new one
224 that has the same name. The CVS update operation does this, for
225 instance. If cvs up creates a new F- file, that file is no longer
226 connected with the working file.
228 To keep the two synchronized, Meta-CVS performs a synchronization
229 operation. This operation sweeps over the file map, and repairs
230 any broken links. If either of the two files is missing, then a
231 link is created. If both are present, but are distinct objects,
232 then the one with the most recent modification timestamp
233 supersedes; the other is unlinked and replaced with a link to the
234 newer one.
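
The rule for a single map entry can be sketched as below, using
Python's os.link, os.path.samefile and os.path.getmtime. The
function name synchronize, its return values, and the choice to
let the F- file win when the timestamps are equal are assumptions
made for this example; the text above does not specify them.

```python
import os


def synchronize(f_file, work_file):
    """Repair the hard link between an F- file and its working file."""
    f_exists = os.path.exists(f_file)
    w_exists = os.path.exists(work_file)
    if not f_exists and not w_exists:
        return "missing"                      # nothing to link
    if f_exists and not w_exists:
        # Create the working file's directory if needed, then link.
        os.makedirs(os.path.dirname(work_file) or ".", exist_ok=True)
        os.link(f_file, work_file)
        return "linked"
    if w_exists and not f_exists:
        os.link(work_file, f_file)
        return "linked"
    if os.path.samefile(f_file, work_file):
        return "ok"                           # already one object
    # Both exist as distinct objects: the newer one supersedes.
    # Assumption: on equal timestamps the F- file wins.
    if os.path.getmtime(f_file) >= os.path.getmtime(work_file):
        newer, older = f_file, work_file
    else:
        newer, older = work_file, f_file
    os.unlink(older)
    os.link(newer, older)
    return "relinked"
```

A full synchronization pass would call such a function once for
every entry in the file map.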
236 A synchronization must be done before any operation which can
237 cause a file to be moved, removed, or to be committed to the CVS
238 repository. In all these situations, the F- files must have
239 the correct contents.
241 The Meta-CVS update operation must perform synchronization twice:
242 before the CVS update to ensure that the F- files carry all of the
243 local changes; then after the CVS update to make sure that any
244 newly incorporated changes propagate back to the working copy.
247 3. Surprising Advantages
249 The Meta-CVS representation brings with it a few advantages which
250 were not immediately obvious in the design stages, but came to
251 light during development. In addition to the lack of directory
252 structure versioning, CVS has a few other infelicities which go
253 away under Meta-CVS. Moreover, the capability to version
254 directory structure brings with it a new concern. Free software
255 developers use patches to communicate code changes to each other.
256 The traditional tools for producing and applying patches, like
257 CVS, do not handle directory versioning. Meta-CVS has some answers
258 to these problems.
260 3.1 File Adding Conflicts
262 In CVS, it can happen that two (or more) developers working on the
263 same module, add a file to the same directory, and all use the
264 same file name. The first developer commits the file, and then
265 problems occur for the subsequent developers who try to commit.
266 CVS complains that the file was independently added by a second
267 party, and does not allow the commit to proceed.
269 In Meta-CVS, this cannot happen. Meta-CVS recognizes that if two
270 people add a file, it is not the same file. Names do not determine
271 equivalence, semantics does! When a file is added to Meta-CVS, a
272 F- file is created to represent it. That F- file name contains a
273 randomly chosen 128-bit number, expressed in hexadecimal. It is
274 extremely unlikely that two such numbers will collide, so in
275 practice, one will ``never'' see the aforementioned CVS error
276 message.
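
A name of this shape can be produced as in the following sketch.
The function name new_f_file_name and the use of Python's secrets
module as the random source are assumptions; keeping the original
suffix is inferred from the example names shown earlier, such as
the .h and .c endings on the F- files for foo.h and foo.c.

```python
import os
import secrets


def new_f_file_name(path):
    """Return an F- name for a newly added file.

    The name is "F-" followed by 128 random bits written as 32
    uppercase hexadecimal digits, plus the suffix of the original
    path, if it has one.
    """
    suffix = os.path.splitext(path)[1]   # "" for README or Makefile
    return "F-" + secrets.token_hex(16).upper() + suffix


# Two developers adding the same path get two distinct F- files.
first = new_f_file_name("src/foo.c")
second = new_f_file_name("src/foo.c")
```

Since the two names differ, CVS sees two unrelated files, and the
clash over the shared path surfaces only in the mapping.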
278 Instead, what will happen when developers choose the same path
279 name for a file is that either a conflict will arise in the MAP
280 file, which will have to be resolved, or else the mapping will
281 contain a duplicate path name, which can be detected by Meta-CVS
282 as an error which again, the users must resolve. Each file is a
283 separate object with its own version history; that two objects
284 accidentally map to the same name is a minor, correctable problem.
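
Detecting the duplicate-path case is a simple scan of the mapping,
as the sketch below suggests. The function name duplicate_paths and
the shape of its result are assumptions made for this example.

```python
from collections import defaultdict


def duplicate_paths(mapping):
    """Return {path: [f_files]} for every path claimed by more
    than one F- file in a {f_file: path} mapping."""
    by_path = defaultdict(list)
    for f_file, path in mapping.items():
        by_path[path].append(f_file)
    return {
        path: sorted(f_files)
        for path, f_files in by_path.items()
        if len(f_files) > 1
    }


# Hypothetical, shortened F- names used only for this example.
merged = {
    "MCVS/F-0A.c": "src/util.c",   # added by one developer
    "MCVS/F-0B.c": "src/util.c",   # added independently by another
    "MCVS/F-0C": "README",
}
clashes = duplicate_paths(merged)
```

The users can then resolve the clash by editing the mapping so
that one of the two files gets a different path.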
286 3.2 File Removal Conflicts
288 CVS does not behave very well when one developer deletes a file,
289 via cvs remove, and another tries to continue committing changes.
291 This is really just an instance of the classic problem of
292 computing the object lifetimes, translated to the domain of
293 version control.
295 The cleanest solution to the problem of computing object lifetimes
296 is garbage collection, which ensures that as long as an object can
297 still be used, it persists, and thereafter, it is automatically
298 removed when the system finds it necessary or convenient to do so.
300 It turns out that Meta-CVS supports a kind of garbage collection
301 concept. When a file is removed, it does not have to be subject to
302 ``cvs remove''. It only has to be removed from the file mapping,
303 but the F- file can remain unremoved. What this means is that the
304 F- file continues to be checked out, so it occupies bandwidth and
305 space. What happens if a user has outstanding changes, and
306 performs a Meta-CVS update which removes the file? The link
307 synchronization ensures that the outstanding changes are
308 transferred to the F- file before the update. So the changes are
309 not lost! It is possible to manually restore that F- file in the
310 MAP to give it a ``new lease on life''. This is analogous to
311 sifting through garbage, to salvage it by making it reachable
312 again. And, of course, the F- file can be committed to CVS whether
313 or not it is reentered into the map.
315 The space problem can be dealt with by a Meta-CVS ``garbage
316 collection'' routine that can be invoked administratively. This
317 will sweep through the F- files, identify any which have no
318 mapping, and ``cvs remove'' these.
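
The sweep amounts to a set difference between the F- files present
and the F- files named in the map. The sketch below only reports
the candidates and does not invoke cvs; the function name
unmapped_f_files and its arguments are assumptions made for this
example.

```python
def unmapped_f_files(f_files, mapping):
    """Return the F- files that no entry in the map refers to.

    f_files: iterable of F- file names present in the sandbox
    mapping: {f_file: path} as read from the MAP file
    """
    return sorted(set(f_files) - set(mapping))


# Hypothetical, shortened F- names used only for this example.
present = ["MCVS/F-01", "MCVS/F-02.c", "MCVS/F-03.h"]
mapping = {"MCVS/F-01": "README", "MCVS/F-03.h": "inc/foo.h"}
garbage = unmapped_f_files(present, mapping)
```

Each name in the result is a candidate for ``cvs remove''; until
that happens, the file can still be restored by adding it back to
the map.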
320 3.3 Diffing and Patching
322 Another surprising advantage of Meta-CVS is that it addresses the
323 problem of distributing patches which patch the file system
324 structure as well as contents.
326 The F- and MAP files in fact constitute an interchange format for
327 the distribution of program source which, in principle, amplifies
328 the capabilities of any change management tools that are based on
329 flat files.
331 A developer can obtain a copy of a project in Meta-CVS form, then
332 work on making changes, including the renaming of paths. These
333 changes are represented in a new Meta-CVS file set. A diff is
334 computed between the new and the old. Someone with a copy of the
335 original can patch it, to reproduce the changes. All that is
336 needed is the Meta-CVS software to realize the rearrangements.
