==========================================================
PHILIP-JOSE, a farmer for distributed computations in Lisp
==========================================================

Summary
=======

This package implements a "farmer": a program that controls a farm of
computers and coordinates them to carry out a large computation.

The basic usage model is that all the flow control happens in the farmer,
whereas all the intensive computation happens in worker processes,
each of which need only know how to do a simple part of the whole.
The overall computation may have many components and sub-components
that combine massive parallelism with sequential dependencies,
where the control flow may depend on intermediate results,
where failures of components can happen, be detected and acted upon, etc.

License
=======

This software is released under the bugroff license. Use at your own risk.
http://www.geocities.com/SoHo/Cafe/5947/bugroff.html

At the insistence of several hackers, I hereby state what is obvious to me,
that they can reuse any software released under the bugroff license
and publish it as part or totality of packages under any other license
they see fit if it really matters to them,
including a BSD-style license or an MIT license.
Yes they can. Of course, if they choose a proprietary software license,
they only deserve scorn. But even that, they may do!

Communication
=============

Communication happens through a simple request/response protocol (like HTTP).

The current protocol is designed to make it very easy to pass around Lisp data
as the arguments and results of computations. It does that well,
but its current incarnation has many limitations.

Most importantly, the current system is intrinsically insecure.
DO NOT USE IT ON THE INTERNET.
Use it only within a trusted network, or wrapped over encrypted lines.

Also, the protocol is NOT designed for transmitting large amounts of data.
If workers need to share data with the farmer and/or with each other,
they should access it through independent means, such as a file server.
The farmer, as it currently stands, is only designed to handle
the general flow control of the computation.

Workers are spawned by the farmer itself, using ssh or fork+exec,
from a farm of registered machines.
A simple mechanism to register and unregister machines is provided.
A validation may be run on each worker machine
to decide whether it qualifies for the current computation.

Because simple forking (without exec) is not (currently) supported
(and because it is not usually available for distributed computations),
an independent means of storing and retrieving data is once again needed
if workers are to share it.

Flow Control
============

On the plus side, the flow control of the computation
can be written in a very natural way.
The farmer supports sequential, parallel and first-wins subcomputations.
Moreover, it provides a universal primitive (i.e. delimited continuations)
on top of which you can build your own control structures
if you ever need something more elaborate.
And indeed, the high-level control structures just mentioned
were built in terms of this primitive.

I am told the implementation of continuations I am using (arnesi)
allows continuations to be serialized, so in theory,
you could build upon this to achieve mobile code between several farmers
(as in the Tube), persistent threads, load-balancing, etc.

Note that arnesi's call/cc and kall are more like shift and reset
than like Scheme's call/cc.
I admit I haven't tried nesting them syntactically
to see whether they still do the right thing,
and even less to see whether they handle exceptions properly.
I'd have to compare the results to those of the examples
given in the following documents:

* http://okmij.org/ftp/Computation/Continuations.html
* http://mumble.net/~campbell/scheme/bshift.scm
* http://calculist.blogspot.com/2007/01/non-native-shiftreset-with-exceptions.html

Threading
=========

The farmer is written with the assumption that
it will run on a sequential Lisp implementation,
or in a single thread of a multithreaded Lisp implementation.
IT WILL NOT WORK CORRECTLY IN A MULTITHREADED ENVIRONMENT.
Its concurrent programming activities are built
out of a home-grown green thread mechanism on top of arnesi's call/cc.

Philip-Jose provides green threads intended to be used on the farmer.
Because of their execution model, all computations between two calls
to threading or I/O primitives are done in a single atomic transaction.

The upside is that you can get a lot done without ever having to use locks
or any other explicit mutual exclusion mechanism --
just avoid calling any computation with complex control flow
between two operations the results of which
should be seen atomically by other threads.
Many places in the existing code indeed rely on this sequential execution
to access various shared data structures in atomic transactions
without the explicit use of locks.

The downside is that when you really need such a mutual exclusion mechanism,
it is not provided (yet), and other parts of the code
haven't been made to use any such mechanism (yet).
Writing and providing such mechanisms should be pretty easy, however,
and is left as an exercise to the astute contributor.

Though a skeleton of a multithreaded server is provided,
it is not tested, probably buggy, lacking in features,
and will not work in conjunction with other code in philip-jose.

I'm told that the author of IOLib is building a better, more featureful,
more stable infrastructure for I/O in Lisp.
Philip-Jose is no such infrastructure.

TO-DO List
==========

1- package the system into something usable.
    Provide documentation, etc.
2- use non-blocking I/O to provide better networking in farmers:
    the ability to be both a server and a client, etc.
3- find out how to serialize continuations to provide richer capabilities:
    persistent (more robust) threads, and mobile code
    (including load balancing).
4- implement fork() as a more efficient (though trickier) alternative
    to fork+exec() for spawning new clients that share initial state.
5- provide a faster alternative to arnesi's with-call/cc,
    based on e.g. Screamer.
6- implement interfaces that are compatible
    with existing distributed systems?
    (Scheme) Termite, Askemos, Tube, Kali, Dreme, [Something by Queinnec]
    (CL) StarLisp, NetCLOS, GBBopen, ...
    Erlang
7- use hunchentoot or such to support HTTP instead of the current protocol.
8- implement a simple map/reduce for local-tasks, for worker-jobs.
9- implement pure streams, and mappings between pure & impure streams,
    so that I/O in competing threads only gets committed
    in the winning thread;
    this also allows for easy backtracking with I/O operations.
    One thread may (read-string "foo" s)
    and just catch the exception and die if it is not available.
10- have a get-read-buffer interface
    so as not to read character-by-character.
11- have thread IDs, and guard various event-handlers
    with a test for the thread still being alive.
    Alternatively, have a mechanism to (atomically) remove
    event-handlers, queued jobs, etc., from their respective queue, set, etc.
12- enforce atomicity with a global flag \*atomic* or such;
    when this flag is on, weak yields turn into NOPs
    and strong yields raise an error.
    This means we also have to distinguish
    between weak yields and strong yields.
13- have sub-queues, etc., for scheduling.
14- provide some explicit mechanisms
    for mutual exclusion and transactionality.
15- merge with Erlang-in-Lisp.

Behaving like an Erlang node
============================

* see distel: http://bc.tech.coop/blog/070528.html
* the Erlang distribution protocol can be found in the source release
  at erts/emulator/internal_doc/erl_ext_dist.txt

Using ancillary data to BSD UNIX sockets
========================================

If you are sending multiple fds in the same sendmsg(),
make sure you send them all in a single CMSG,
because multiple CMSGs of the same type are broken on many kernels.
See man cmsg and SCM_RIGHTS.
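
The advice about a single CMSG can be sketched as follows. This is only
an illustration, written in Python rather than Lisp because Python's
standard socket module exposes sendmsg() and recvmsg() directly; it is
not part of philip-jose. The helper names send_fds_single_cmsg,
recv_fds_single_cmsg and demo are hypothetical. The sketch assumes a
UNIX system with AF_UNIX sockets: all descriptors are packed into one
array of C ints and sent as one SCM_RIGHTS control message.

```python
import array
import os
import socket


def send_fds_single_cmsg(sock, payload, fds):
    """Send payload together with all fds in ONE SCM_RIGHTS control message.

    Hypothetical helper for illustration only.
    """
    packed = array.array("i", fds)  # every descriptor in one C int array
    return sock.sendmsg(
        [payload],
        [(socket.SOL_SOCKET, socket.SCM_RIGHTS, packed)],  # a single CMSG
    )


def recv_fds_single_cmsg(sock, bufsize, maxfds):
    """Receive payload and up to maxfds descriptors sent as above."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(maxfds * fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop a truncated trailing integer, if there is one.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return data, list(fds)


def demo():
    """Pass the write ends of two pipes across a socket pair."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    received = []
    try:
        send_fds_single_cmsg(left, b"fds", [w1, w2])
        data, received = recv_fds_single_cmsg(right, 16, 2)
        # The received descriptors are duplicates of w1 and w2.
        os.write(received[0], b"one")
        os.write(received[1], b"two")
        return data, os.read(r1, 3), os.read(r2, 3)
    finally:
        for fd in [r1, w1, r2, w2] + received:
            os.close(fd)
        left.close()
        right.close()


if __name__ == "__main__":
    print(demo())
```

Sizing the receive buffer with CMSG_LEN for the expected number of
descriptors keeps the kernel from truncating the control message; a
receiver that cannot know the count in advance would have to pick an
upper bound and check the returned flags.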