/[meta-cvs]/meta-cvs/F-54B5FF01DC6392F28A104A8A58761CB6
ViewVC logotype

Contents of /meta-cvs/F-54B5FF01DC6392F28A104A8A58761CB6

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.4 - (show annotations)
Sun Jan 27 16:49:51 2002 UTC (12 years, 2 months ago) by kaz
Branch: MAIN
CVS Tags: mcvs-0-1, mcvs-0-0
Changes since 1.3: +26 -22 lines
Added David Wheeler quote about indirection.
Fix section numbering.
Condense text about diffing and patching.
1 MCVS---Meta CVS
2 A Directory Structure Versioning Layer Over
3 The Concurrent Version System.
4
5 Kaz Kylheku
6 January 25, 2002
7
8
9 "Any problem in computer science can be solved with
10 another layer of indirection" -- David Wheeler
11
12
13
14 Abstract
15
16 This is MCVS, Meta-(Concurrent Versions System), a front end for
17 CVS. It supports the concurrent and independent versioning of
18 files, as well as a directory structure, by several people. I have
19 it been using it for a few weeks now, mostly just to version the
20 MCVS sources themselves. It uses the cvs program in such a way that
21 you can not only version the file contents, but you can move and
22 rename files. These changes are committed to the repository, and
23 can be picked up by an update, which will incorporate them by
24 rearranging the working copy accordingly. There can be conflicting
25 parallel changes to the structure, which can be resolved like any
26 other conflict. It is all Lisp.
27
28
29 Contents
30
31 1. Introduction . . . . . . . . . . . . . . . . . . . . . . Line 41
32 2. Data Representation Overview . . . . . . . . . . . . . . . . . 74
33 2.1 File Mapping Example . . . . . . . . . . . . . . . . . . 118
34 2.2 Synchronization . . . . . . . . . . . . . . . . . . . . 207
35 3. Surprising Advantages . . . . . . . . . . . . . . . . . . . . 243
36 3.1 File Adding conflicts . . . . . . . . . . . . . . . . . 256
37 3.2 File Removal conflicts . . . . . . . . . . . . . . . . . 281
38 3.3 Diffing and Patching . . . . . . . . . . . . . . . . . 315
39
40
41 1. Introduction
42
43 The software known as CVS has been in existence since the year
44 1986, when its first version, consisting of shell scripts acting
45 as a front end to RCS commands, was posted to Usenet by Dick
46 Grune. Over the next fifteen years, CVS was turned into a C
47 program, enhanced and debugged. But in its present form, version
48 1.11, it still has annoying quirks and some serious limitations.
49
50 One of the biggest limitations of CVS is it does not treat the
51 directory structure of a module as a versioned object. MCVS solves
52 this problem not by intruding in any way into the well-debugged
53 and time-tested CVS code, but by introducing a layer of
54 indirection. MCVS retains the fundamental capabilites of CVS: the
55 ability to branch and merge, to work in parallel, to work over a
56 variety of network transports and so on. CVS worked as a front end
57 for RCS; similarly, MCVS is a front end for CVS.
58
59 It turns out that MCVS solves a few other infelicities in CVS as
60 well. A few tricky scenarios that cause grief in CVS are no
61 longer problems in MCVS, such as: two developers concurrently
62 adding the same file, or one developer removing a file that
63 another is working on.
64
65 MCVS works by creating a special representation of the versioned
66 file tree, and this special representation is what is stored in
67 CVS. Thus the naive direct mapping between the versioned tree and
68 the tree in the repository is avoided.
69
70 The aim of this paper is to document this simple representation
71 and explain how it supports the directory versioning operation.
72
73
74 2. Data Representation Overview
75
76 In order to obtain, from CVS, the ability to perform parallel
77 version control over any object, it is necessary to represent that
78 object as a text file. This is a given. CVS can effectively handle
79 only text input in its merging and conflict identification
80 algorithms. A critical non-functional constraint in the
81 requirements of MCVS is that CVS is not to be modified in any way;
82 nobody should have to to install new CVS code on a client or
83 server machine to use MCVS. Morever, the CVS code is fragile C
84 that has been debugged for over a decade (and counting).
85
86 To treat the file structure as a versioned entity, therefore, it
87 is necessary to represent it as a text file. What structure should
88 that text file have?
89
90 Firstly, it would be highly desirable if small changes, such as
91 renaming a few files, gave rise to small differences. Moreover,
92 a single change should only affect at most one line or two in the
93 text file. This property would allow for parallel changes with
94 minimal conflicts. The text file representation should also be
95 human readable and editable, because humans will have to resolve
96 conflicts in it.
97
98 Secondly, a file must somehow retain its identity and CVS history
99 when its path name changes. This means that we must never change
100 the name of the file, at least not the name which is known to CVS.
101
102 MCVS represents the file structure of a project as a simple entity
103 called a ``file mapping''. The file mapping associates path names
104 with a flat database of files. Both the mapping and the files are
105 stored in CVS. The files have machine-generated names; only
106 through the mapping are they given actual names as they appear to
107 the users. The names known to CVS are called ``F- files''.
108
109 MCVS manipulates the mapping as a simple data structure in the
110 Lisp language. Lisp has a built-in parser and formatter for
111 reading a printed representation of a List object and producing a
112 printed representation. Thus the text file format for the MCVS
113 mapping is simply a file containing a Lisp association list, with
114 special care taken to print each element of the association on a
115 separate line of text, and maintaining a consistently sorted order
116 to maximize the chances of minimal merges.
117
118 2.1 File Mapping Example
119
120 Suppose that some project 'Foo' consists of these files:
121
122 foo/README
123 foo/inc/foo.h
124 foo/src/Makefile
125 foo/src/foo.c
126
127 what does a MCVS representation look like? This is best
128 understood in terms of the working copy checked out from CVS
129 via MCVS, which contains these things:
130
131 foo/MCVS/CVS/Entries
132 foo/MCVS/CVS/... other CVS metadata ...
133
134 foo/MCVS/F-123D61C8FE942733281D2B08C15CD438
135 foo/MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2
136 foo/MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0
137 foo/MCVS/F-1C43C940D8745CAA78752C1206316B55
138 foo/MCVS/MAP
139 foo/MCVS/MAP-LOCAL
140
141 foo/README
142 foo/inc/foo.h
143 foo/src/Makefile
144 foo/src/foo.c
145
146 There is a subdirectory called MCVS, which contains a CVS
147 subdirectory. This MCVS subdirectory is in fact the CVS
148 ``sandbox''. Everything else under foo are the working files.
149 Thus every MCVS working copy is just an ordinary file tree, except
150 that the top level directory contains a MCVS subdirectory with
151 interesting contents.
152
153 What are these files under MCVS? There are some files with cryptic
154 names like F-123D...438. Then there are two files MAP and
155 MAP-LOCAL.
156
157 Firstly, it should be understood that the F- files and MAP are
158 versioned in CVS. On the other hand, MAP-LOCAL is a file that is
159 not known to CVS, but important to the functioning of MCVS.
160
161 The four F- files are the actual CVS representations of
162 foo/README, foo/src/foo.c, foo/src/Makefile and foo/inc/foo.h.
163
164 What establishes the relationship between the F- names and the
165 human readable paths is the association list in the MAP file,
166 which looks something like this:
167
168 (("MCVS/F-123D61C8FE942733281D2B08C15CD438"
169 "README")
170 ("MCVS/F-156CAB88D4EEE703E8C4B4146B5094E2"
171 "inc/foo.h")
172 ("MCVS/F-15EA9689ACF749C314CE6FC5255DC4B0"
173 "src/Makefile")
174 ("MCVS/F-1C43C940D8745CAA78752C1206316B55"
175 "src/foo.c"))
176
177 The MAP-LOCAL file, upon checkout, is simply an exact copy of MAP.
178 The purpose of MAP-LOCAL is to keep track of the actual mapping
179 that exists in the user's checked out copy. When an update
180 operation is performed, it may incorporate changes from the
181 repository into MAP, causing the MAP to no longer reflect the
182 local file structure. In fact MAP can at that point contain
183 unresolved conflicts, so that it is not usable by MCVS, requiring
184 manual intervention. The MAP-LOCAL copy, however, remains
185 untouched and consistent.
186
187 Because MCVS maintains a local copy of the mapping, the MCVS
188 update operation can compute the differences between the new
189 mapping coming from the repository and the local mapping. These
190 differences can then be translated into filesystem-rearranging
191 actions that change the shape of the working copy to bring it up
192 to date. Then MAP and MAP-LOCAL are once again identical.
193
194 This rearranging is the heart of the MCVS system. Everything else
195 is largely just manipulations of the mappings. For example,
196 renaming a file is simple. Open up MCVS/MAP in a text editor, and
197 change a path (taking care not to create a duplicate, or otherwise
198 corrupt the mapping). Then save it and run the mcvs update. MCVS
199 will propagate the change you made by physically relocating that
200 file. If you like what you have done, simply commit. You can
201 commit at the CVS level within the MCVS directory. But of course,
202 a front end MCVS file renaming operation is provided, and so is a
203 commit operation, which in addition to running CVS also ensures
204 that the F- files are properly synchronized with their unfolded
205 counterparts.
206
207 2.2 Synchronization
208
209 The next problem to tackle is how to establish the correspondence
210 between the F- files and the working files. MCVS does this in a
211 platform-specific way, namely by relying on Unix hard links.
212
213 When MCVS checks out a sandbox, it creates hard links, so that a
214 F- file and its corresponding working file are in fact the same
215 filesystem object. Thus ``unpacking'' the F- files through the
216 mapping does not require the mass duplication of of file data,
217 only the creation of directories and links.
218
219 The problem is that some operations ``break'' this hard link
220 connection by unlinking a file and overwriting it with a new one
221 that has the same name. The CVS update operation does this, for
222 instance. If cvs up creates a new F- file, that file is no longer
223 connected with the working file.
224
225 To keep the two synchronized, MCVS performs a synchronization
226 operation. This operation sweeps over the file map, and repairs
227 any broken links. If either of the two files is missing, then a
228 link is created. If both are present, but are distinct objects,
229 then the one with the most recent modification timestamp supersedes;
230 the other is unlinked and replaced with a link to the newer one.
231
232 A synchronization must be done before any operation which can
233 cause a file to be moved, removed, or to be committed to the CVS
234 repository. In all these situations, the F- files must have
235 the correct contents.
236
237 The MCVS update operation must perform synchronization twice:
238 before the CVS update to ensure that the F- files carry all of the
239 local changes; then after the CVS update to make sure that any
240 newly incorporated changes propagate back to the working copy.
241
242
243 3. Surprising Advantages
244
245 The MCVS representation brings with it a few advantages which were
246 not immediately obvious in the design stages, but came to light
247 during development. In addition to the lack of directory structure
248 versioning, CVS has a few other infelicities which go away under
249 MCVS. Also, bringing in the capability to version directory
250 structure also brings in a new concern. Free software developers
251 uses patches to communicate code changes to each other. The
252 traditional tools for producing and applying patches, like CVS, do
253 not handle directory versioning. MCVS has some answers to these
254 problems.
255
256 3.1 File Adding Conflicts
257
258 In CVS, it can happen that two (or more) developers working on the
259 same module, add a file to the same directory, and all use the
260 same file name. The first developer commits the file, and then
261 problems occurs for the subsequent developers who try to commit.
262 CVS complains that the file was independently added by a second
263 party, and not allow the commit to proceed.
264
265 In MCVS, this cannot happen. MCVS recognizes that if two people
266 add a file, it is not the same file. Names do not determine
267 equivalence, semantics does! When a file is added to MCVS, a F- file
268 is created to represent it. That F- file name contains a randomly
269 chosen 128-bit number, expressed in hexadecimal. It is extremely
270 unlikely that two such numbers will collide, so in practice, one
271 will ``never'' see the aforementioned CVS error message.
272
273 Instead, what will happen when developers choose the same path
274 name for a file is that either a conflict will arise in the MAP
275 file, which will have to be resolved, or else the mapping will
276 contain a duplicate path name, which can be detected by MCVS as an
277 error which again, the users must resolve. Each file is a separate
278 object with its own version history; that two objects accidentally
279 map to the same name is a minor, correctable problem.
280
281 3.2 File Removal Conflicts
282
283 CVS does not behave very well when one developer deletes a file,
284 via cvs remove, and another tries to continue comitting changes.
285
286 This is really just an instance of the classic problem of
287 computing the object lifetimes, translated to the domain of
288 version control.
289
290 The cleanest solution to the problem of computing object lifetimes
291 is garbage collection, which ensures that as long as an object can
292 still be used, it persists, and thereafter, it is automatically
293 removed when the system finds it necessary or convenient to do so.
294
295 It turns out that MCVS supports a kind of garbage collection
296 concept. When a file is removed, it does not have to be subject to
297 ``cvs remove''. It only has to be removed from the file mapping,
298 but the F- file can remain unremoved. What this means is that the
299 F- file contines to be checked out, so it occupies bandwidth and
300 space. What happens if a user has outstanding changes, and
301 performs an MCVS update which removes the file? The link
302 synchronization ensures that the outstanding changes are
303 transferred to the F- file before the update. So the changes are
304 not lost! It is possible to manually restore that F- file in the
305 MAP to give it a ``new lease on life''. This is analogous to
306 sifting through garbage, to salvage it by making it reachable
307 again. And, of course, the F- file can be committed to CVS whether
308 or not it is reentered into the map.
309
310 The space problem can be dealt with by a MCVS ``garbage
311 collection'' routine that can be invoked administratively. This
312 will sweep through the F- files, identify any which have no
313 mapping, and ``cvs remove'' these.
314
315 3.3 Diffing and Patching
316
317 Another surprising advantage of MCVS is that it addresses the
318 problem of distributing patches which patch the file system
319 structure as well as contents.
320
321 The F- and MAP files in fact constitute an interchange format for
322 the distribution of program source which, in principle, amplifies
323 the capabilities of any change management tools that are based on
324 flat files.
325
326 A developer can obtain a copy of a project in MCVS form, then work
327 on making changes, including the renaming of paths. These changes
328 are represented in a new MCVS file set. A diff is computed between
329 the new and the old. Someone with a copy of the original can patch
330 it, to reproduce the changes. All that is needed is the MCVS
331 software to realize the rearrangements.

  ViewVC Help
Powered by ViewVC 1.1.5