Summer of Code is Google's program to help students get involved in free software development; they provide a formal structure, and funding, for students to spend the summer helping real software projects.

Here at Monotone, we are (hopefully!) a mentor organization for Summer of Code 2007. The purpose of this page is to answer any questions you might have about applying to join our project under the Summer of Code program. But, if you still have any questions after reading it, feel free to drop us a line at monotone-devel@nongnu.org.

Who are we/what is Monotone? (overview)

Monotone is a version control system (VCS) -- i.e., the basic program that other projects use to distribute their code, coordinate their work, and generally rely on to get everything done. Because people rely on us to hold all of their work, we have to be extremely reliable. Because people can't get work done at all without their VCS, we need to be portable, to keep working in strange situations (e.g., even when people are offline), and to be easy to use -- so we end up with all sorts of interesting distributed system and user interface problems. And if we do a good enough job, we can guide people to whole new styles of working, so that ''everyone's'' hacking becomes easier, more efficient, and less stressful.

The code itself is written in C++, with a somewhat idiosyncratic style that makes C++ much more pleasant to use than it might otherwise be. Configuration and testing are done using Lua. A comprehensive test suite also makes it easy to make complex changes, without worrying that one has accidentally broken critical functionality. Generally speaking, there are tasks to be done involving user interface design, networking code, database storage, novel algorithms... if you like systems programming and are looking for a project that's not "just another web app", then maybe monotone is for you.

For more information on the project generally, good places to start are the homepage and manual. You might also want to browse recent commits or recent mailing list posts. To see the source, you can download a recent release; the file "HACKING" in the source has a number of useful tips on some of the odder bits of monotone.

If you have any questions, or just want to chat, then our IRC channel (#monotone on irc.oftc.net, NOT on freenode, or you may be able to use this link: irc://irc.oftc.net/#monotone) is always open -- though depending on the time of day, you might have to hang around for a bit for someone to turn up and notice your question. And, of course, there's always the mailing list.

Project ideas

The classic model for a Summer of Code project is a complete new program or module, that one student starts writing at the beginning of the summer, and finishes at the end. It's our experience, though, that this is totally unlike how we actually write software. We all work on a single (and relatively small by lines-of-code) program, and whenever we think of a change we want to make, the very first question is how we can break it down into small independent changes, so that we can distribute the work and get our initial work into a release as soon as possible. As far as we're concerned, the whole point of Summer of Code is to get you participating in this communal effort and learning how this kind of software actually gets written.

So this year we're trying an experiment. Instead of putting up a bunch of giant projects and telling you to pick one, we're putting up a list of small- to medium-sized projects, and you get to pick several. Some of these (esp. at the top of the list) are quite simple and straightforward; some of them (esp. at the bottom of the list) require significant design effort and knowledge of monotone's architecture. (For your information, the magic word is "xyzzy"; see the application below for details.) We strongly recommend that you start with some easier projects and move on to harder ones; keep in mind that when you first start, you will have lots of things to figure out (how monotone's code is put together at all, how to write tests, how to submit patches, ...), and the easier projects are designed to help you figure that stuff out and prepare you for harder ones.

You can also propose to work on projects not on this list, but in that case we strongly recommend that you discuss your ideas with us first, on IRC or the mailing list.

  • mtn edit-log: Like most VCSes, when you commit a revision in monotone it prompts you to describe your changes. In monotone, though, you don't actually have to wait until you commit -- if you prefer, you can describe your changes while you go, by editing the file _MTN/log. This project is simply to create a command that makes this more convenient -- instead of having to know the name of the internal log file, users can simply run this command and have an editor pop up, exactly as if they had typed mtn commit, just without any committing happening. (This task probably involves some cleanup of the code to edit commit messages, too; it's kind of grotty right now.)
  • Support for mtn attach/mtn detach: mtn detach removes bookkeeping information (i.e., the _MTN directory) from a workspace. mtn attach turns an existing directory into a workspace by creating an _MTN directory. This is useful, for instance, to people who repeatedly import other people's software (released as .tar.gz files) into monotone.
  • Database access pattern visualization: Speeding up monotone requires having really good insight into how it is accessing the database, because the most important factor in how fast we go is how well we utilize caching. Therefore, it would be nice to first instrument monotone to somehow output what files/rosters it is accessing in the database, and then build some scripts that display this information usefully. (For instance, by using graphviz to draw a graph, and then animating it.)
  • Fix commit message output: When you run mtn commit, a text editor pops up to let you describe your changes. Monotone automatically puts a description of those changes into the buffer, for you to refer to while writing your description. However, right now, that description is not very good -- it is the raw revision text that we will save to the database. We have code to generate a nice user-friendly description of changes -- it's what mtn status uses already. So the task is simply to make the message that comes up when you type mtn commit be the same as you would get if you typed mtn status.
  • mtn rename --guess: People sometimes rename files, but do it with mv or in their file manager, instead of using mtn rename. This means that mtn doesn't know what has happened. Right now, they can fix things up after the fact by issuing a series of mtn rename commands describing each rename they actually did, but it would be convenient to have a command that figured out what had happened by itself, using heuristics. mtn rename --guess should look at what files have disappeared and what unknown files have appeared, and if they appear to be the same, internally issue a rename for them.
  • mtn clean: This command simply deletes unknown or ignored files from the workspace.
  • Ignoring whitespace in diff: It would be nice if monotone's internal diff could ignore whitespace changes on request.
  • Commit message templates: Some projects would like to set a template, so that when a user runs mtn commit, their text editor is pre-filled with some text. To implement this, pre-fill _MTN/log with the contents of the file .mtn-template, if it exists.
  • Automatic updating of sample output in tutorial: Our Tutorial walks the user through a bunch of basic uses of monotone, showing commands and their outputs. But the outputs are always getting out of date, as we change monotone. Come up with some way to automatically go through the tutorial executing each command with the current version of monotone, and putting the current output into the file. (Note, you don't have to do this by parsing the .texi source directly, you could move the tutorial text out into another file that is easier to parse, and then generate the .texi from it.)
  • Make our ssh-agent support work on win32: We recently grew support for using ssh-agent for key storage and signing requests, but so far only the code to talk to ssh-agent on POSIX-style systems has been written. We could support PuTTY's "Pageant" ssh-agent variant on Win32. This would involve looking at the source code for PuTTY to understand how PuTTY communicates with Pageant (to give you an idea, it uses the Windows thread message queues and shared memory) and writing our own implementation of this code so that we can talk to Pageant directly.
  • Subversion importer: Add code to monotone to read in Subversion dump format, and write it out to a monotone database.
  • Remove boost::filesystem::path dependency: boost::filesystem::path is a library that we use for some path handling and file I/O stuff. It is not very good, and it is one of our largest external dependencies. Fortunately, our use of it is mostly split off into a few files and insulated from the program, so it should not be too hard to replace it. (The most exciting part is writing a function that can normalize arbitrary file paths.)
  • Line-ending handling: Many version control systems have a feature where you can arrange that certain files automagically change their line endings when they are checked out -- so, for instance, a source file could have unix-style line endings (LF) when checked out on unix, and windows-style line endings (CRLF) when checked out on windows. Add support for this feature to monotone. (This requires coming up with a design for this feature that people can agree on, and some changes to how monotone accesses the workspace.)
  • Output formatting support for mtn automate: Monotone has lots of commands that are designed to be easy to use programmatically, from scripts or whatever. Many of them involve dumping lots of revision ids to the screen. Invent a simple templating language or some other sensible design, and arrange that users can, for instance, request that they see dates instead of revision ids when calling a certain command. (E.g., mtn automate select h:net.venge.monotone --format='$date $author $revision', would print out, for each head of the branch net.venge.monotone, the date it was committed, the author, and the revision id. Or whatever the templating looked like, you get to design that part, this is just an example.)
  • Selectors overhaul: Monotone has a mini-language for referring to revisions, called selectors. So, for instance, anywhere you specify a revision, you can use a hexadecimal id, or you can say something like h:net.venge.monotone to refer to the current head of the branch net.venge.monotone, or you could say t:monotone-0.33 to refer to the revision with tag monotone-0.33, etc. This works well -- in fact, it works so well that we want to redo the current somewhat hacky and incomplete implementation and turn it into a more full-fledged and complete feature. There are lots of ideas for what exactly this would look like; you get to figure out which of them is best, get consensus from the community, and implement an interpreter for your new mini-language.
  • Automate man page generation: We recently deleted our manual page, because it had gotten so out of date that it was totally useless. We already maintain reference documentation in two places -- in the source code (for the use of mtn --help), and in the reference manual. Two places is enough. So this project is, add some code to monotone so that it can generate its own man page from the --help strings that are already in the binary, and modify the build system to automatically generate the man page in this way.
  • Make netsync start up faster: Right now, whenever you push your work to a server, or pull other people's latest work from a server (mtn push, mtn pull, mtn sync), there is an annoying pause while monotone builds some "merkle tries" in memory (these are a neat data structure that lets us do very efficient synchronization). Eliminate this pause, probably by creating some sort of efficient cache of them on disk.
  • Redo subprocess pipe support on win32: Monotone has code to spawn a child process and talk to its stdin/stdout, which it uses to implement, for instance, network synchronization over ssh. The code for this is really really elaborate and sucky, because there are two completely different implementations for unix and win32, and the win32 code is completely insane. This is because the win32 api is mostly insane, and we can't fix the win32 api, but we recently learned of a nice trick that would eliminate the insanity in this particular case, and make this code nice and simple and work the same way on both unix and win32. (Basically, the trick is to bind a port on localhost, connect to it over loopback, and then treat the two sockets the same way you treat the sockets that socketpair(2) gives you on a POSIX OS.)
  • Synchronization over dumb protocols: Right now, monotone can only synchronize to another monotone server, over its own special protocol. This causes annoyance for people who don't want to run another server, or who can't get to such a server because of firewalls, or who only have web hosting, etc. etc. So it would be nice if we could synchronize over protocols like HTTP and SFTP. Fortunately, we have a design for how to accomplish such synchronization, which exists in the net.venge.monotone.dumb branch, along with a prototype implementation in Python. Your job is to clean up the on-disk format into something sustainable long-term (some parts are a bit hacky right now), and start moving support for this kind of synchronization into monotone proper. In the ideal case, we end up with the ability to pull over HTTP and push/pull over SFTP directly from monotone.
  • Log UI: There are a lot of ways that mtn log's user interface could be better (see, e.g., LogUI). Make it better.
  • Conflict handling: Sometimes, when you try to merge two revisions, there are conflicts. Right now, we do not have a nice way to write such conflicts into the workspace, and let the user deal with them. We do know pretty much how to accomplish this, but there's a bunch of code to be written to keep track of what conflicts exist, present them to the user, let the user deal with them, etc.
  • .mtn-ignore cleanup: .mtn-ignore is a file that lists patterns; any files in the workspace that match those patterns are considered "ignored". However, the code that implements this is (a) grotty, and split in a strange way between C++ and Lua, (b) using regular expressions for its patterns, while we decided at the MtnSummit that it should be some form of glob. Fix these issues.
  • Grand unified workspace scan caching: One of the bottleneck operations in a VCS is looking at the user's workspace and figuring out which files they have actually modified. This requires looking in some way at every file in the project (which may be 100,000+ files), and practically every interesting operation (diff, status, commit, update, ...) has to do this before it can do any useful work. Fortunately, there are a lot of optimizations that are possible, including scanning directories in just the right order, using directory timestamps to detect the creation of new files or the deletion of old ones, etc. Right now, we do have a basic cache to detect changed files by using timestamps, called inodeprints. However, it does not implement any of these advanced optimizations, and is not useful for detecting added or removed files. Implement these optimizations.
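
As a rough illustration of the kind of heuristic that mtn rename --guess (described in the list above) could start from, here is a minimal Python sketch that pairs disappeared files with newly appeared unknown files when their contents are identical. The function name guess_renames and its dictionary inputs are hypothetical and not part of monotone; a real implementation would live in monotone's C++ code, read its data from the workspace, and would probably want fuzzier matching for files that were renamed and then edited.

```python
import hashlib
from collections import defaultdict


def guess_renames(missing, unknown):
    """Pair missing paths with unknown paths that have identical content.

    missing: dict mapping a tracked-but-missing path to the SHA1 hex digest
             recorded for it (hypothetical input format).
    unknown: dict mapping an untracked path to its current contents (bytes).
    Returns a dict {old_path: new_path} containing only unambiguous matches.
    """
    # Index the unknown files by the hash of their contents.
    unknown_by_hash = defaultdict(list)
    for path, data in unknown.items():
        unknown_by_hash[hashlib.sha1(data).hexdigest()].append(path)

    # Count how many missing files share each hash, so that we can
    # refuse to guess when several old files had the same content.
    missing_per_hash = defaultdict(int)
    for digest in missing.values():
        missing_per_hash[digest] += 1

    renames = {}
    for old_path, digest in sorted(missing.items()):
        candidates = unknown_by_hash.get(digest, [])
        if len(candidates) == 1 and missing_per_hash[digest] == 1:
            renames[old_path] = candidates[0]
    return renames
```

Anything the sketch cannot match unambiguously is simply left out of the result, so the user would still handle those cases with explicit mtn rename commands.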

  • botan/ajisai: Monotone currently uses Botan for its crypto requirements. It might be useful if we could use the in-development Ajisai library for SSL of Netsync (see NetsyncTodo). TODO: find out what the status of Ajisai is. (Short answer: SSLv3/TLSv1.0 support, works, missing many features and doesn't support anything but BSD sockets). One immediate improvement to Botan would be making use of its assembly-code modules for SHA1 and big-number math routines - they would just need integration with Monotone's autoconf/automake-based build system.
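
The loopback trick mentioned under "Redo subprocess pipe support on win32" can be sketched in a few lines. This is only an illustration in Python (monotone itself is C++); the helper name loopback_socketpair is made up for this example, and a production version would need more careful error handling than shown here.

```python
import socket


def loopback_socketpair():
    """Return two connected TCP sockets, emulating socketpair(2) over loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS to pick any free port on localhost.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # The kernel completes the handshake for a listening socket,
            # so this connect returns before accept() is called.
            client.connect(listener.getsockname())
            server, peer = listener.accept()
        except OSError:
            client.close()
            raise
    finally:
        # The listening socket is only needed to set up the pair.
        listener.close()
    # Minimal sanity check (an assumption of this sketch): make sure the
    # accepted connection really is our own client and not a stranger.
    if peer != client.getsockname():
        client.close()
        server.close()
        raise ConnectionError("unexpected peer connected to the loopback listener")
    return client, server
```

Once the pair exists, both ends are ordinary sockets, so the same code path can serve unix and win32; one end would be handed to the child process and the other kept by the parent.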

Guitone is a front-end to monotone, written using C++ and Qt. There are some tasks that could also improve it:

  • A graph view: guitone is still lacking a kick-ass graph view, one of the most demanded features. Possible C++/Qt libraries which could be used here include libqanava, libtulip or graphviz.
  • A kpart module: Besides the stand-alone Qt application, there have been thoughts about creating a kpart add-in for KDE's konqueror. Implement this!
  • A Windows Explorer Add-In: For Win32 there exist popular VCS frontends including TortoiseCVS for CVS and TortoiseSVN for Subversion which make it plain dead simple for users to deal with version control, by acting as Windows Explorer add-ins. We need one of these for monotone as well!
  • A Win32 installer: Since the majority of guitone users are Win32 users it would be great to have a full-featured installer and uninstaller for the application.
  • A new build system: This is for Python enthusiasts: waf is an alternative build system based on SCons which unfortunately does not yet support Win32 native builds. guitone probably needs some more tweaks here and there to make this really rock (including further automation for creating all kinds of distribution packages). Work started in the net.venge.monotone.guitone.waf branch.

Questions for applicants

This is the application questionnaire which we have submitted to Google; when you apply through their website, they will ask you a number of generic questions, and then give you this template to complete:

(If you have any questions about this application form, you can try #monotone on irc.oftc.net via IRC or monotone-devel@nongnu.org via email for general questions, and njs@pobox.com for private questions.)

BACKGROUND:
Who are you, beyond some characters on a screen?  Tell us a little about yourself.

Why are you applying to SoC?  What do you hope to get out of it?

Why are you applying to monotone, instead of some other project?  What about it appeals to you?

We would very much like to read some code, even if it is just for a school project or whatever.  Please give a URL for some code you have written in the past (alternatively, email a .zip or .tar.gz file to njs@pobox.com, and note here that you have done so):

Have you talked to us already, for instance on IRC or the mailing list?  If so, what nick/email did you use?  (This is to help us match up the people we remember talking to with the apps on Google's site.)

LOGISTICS:
What is your work schedule this summer?  In particular, when do you anticipate starting work, will you be gone for any time during the summer, and roughly how much time do you anticipate spending on Summer of Code work each week?

How can we contact you if we have questions about your application?  Please include any or all of IRC nicks, email addresses, IM screennames, phone numbers, whatever you feel comfortable giving us (we will not make them public) and will allow us to reach you.

We know that there is a lot of writing on our Summer of Code wiki page, but we wouldn't have written it all if we didn't think it was important.  To show us that you've read that page, what is the magic word that is mentioned in parentheses at one point in the text?

PLANS:
Please list the projects you plan to work on this summer, if accepted.  At a minimum, this list should include what the projects are, how long you anticipate each will take, a breakdown of where that time will go, and what results you will achieve at each stage:

Please list some places where the schedule you just gave could run into problems.  Which do you think are the most likely, and what will you do to adjust if they do occur?

Final question: Why should we choose you to receive money for doing the above work, rather than some other student?

What happens if I'm selected?

TBD

  • Subscribe to the mailing list and start lurking on IRC.
  • Requirement of weekly status updates.
  • Probably a requirement that you complete, or at least have code to show for, one quickie before the actual start date of the program (per Leslie's comments that we can require some small amount of work before the first payment this year, but we need to figure out what the official word on such things is).
  • Note that we don't actually expect that anyone will end up following their schedule exactly; we just want you to be working and engaging.
  • Something about how we define "success".