Hobby-hacking Eric

2008-11-19

iterative committing

A few weeks ago, I saw this interesting complaint about distributed revision control advocacy:
But really, to read some of these articles, you'd think 99.9% of OSS contributions come from people who live on planes, only get 10% uptime on their broadband at home, and are incapable of spending the five minutes required to install something like Subversion locally for use with side projects.
This particular complaint resonated with me because I've always had a slight feeling that all this talk of airplanes and intermittent online access is missing the point.

I think what would help these discussions is to introduce the idea that there are really two ways to be disconnected: the involuntary way that most people talk about, and the voluntary way which is the really interesting one.

To be involuntarily disconnected is to be literally or technically offline. The universe prevents you from phoning home because it broke your wifi card or plunked you deep in an Amazonian rain forest. True, a distributed revision control system lets you continue hacking in the face of such adversity; but this fact isn't very convincing to some folks who are used to centralised revision control. How often in today's world are you really involuntarily offline? The trick is that sometimes your disconnectedness is entirely voluntary. I don't really mean that in the sense of unplugging your cable modem and calling a moratorium on network access for the day. The minute you want to commit to a server and can't because of missing network access, you are offline involuntarily, even if this came as the result of a voluntary decision.

What I mean is that distributed revision control allows you to have pockets of deliberate disconnectedness from your peers. You want to work on something in little bits and pieces, you want to version control your work in progress, but you don't want to inflict your uncompleted work on your friends. A distributed VCS gives you a chance to step back for a moment and continue working with the benefit of version control. There are two alternatives to stepping back, neither of which is really acceptable. The first is to go ahead and commit your stuff with wild abandon, with the consequence that you pollute the change history with unfinished work and potentially make life difficult for your friends. The second alternative is NOT to commit your stuff at all, with the consequence that you lose the ability to track and log your work as you go along.

A distributed revision control system gives you the choice of iterative committing. It doesn't really matter if you are online or offline actually. Sometimes you just want to commit something for your own sake and only later decide if the commits should be shared with the main repository or not. In the meantime you can choose to go back, undo a commit, redo a commit, undo all your intermediary commits and lump them all into one big commit, update from the main repository and then rework your commit in the new context. These are the choices that a distributed revision control system offers.
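
To make that concrete, here is a minimal sketch in Python of the choices listed above. It is a toy model, not how darcs or any other system is implemented: the names Repo, WorkingCopy, commit, undo, squash and publish are all made up for illustration, and a "commit" here is nothing more than a message string.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy stand-in for the main repository: an ordered list of commit messages."""
    commits: list[str] = field(default_factory=list)


@dataclass
class WorkingCopy:
    """Hypothetical local clone holding commits that have not been shared yet."""
    main: Repo
    local: list[str] = field(default_factory=list)

    def commit(self, message: str) -> None:
        # Record locally, for your own sake; the main repository is untouched.
        self.local.append(message)

    def undo(self) -> str:
        # Take back the most recent unpublished commit.
        return self.local.pop()

    def squash(self, message: str) -> None:
        # Lump all the intermediary commits into one big commit.
        self.local = [message]

    def publish(self) -> None:
        # Only at this point do your peers see anything.
        self.main.commits.extend(self.local)
        self.local = []


main = Repo()
mine = WorkingCopy(main)

mine.commit("wip: parser skeleton")
mine.commit("wip: broken experiment")
mine.undo()                          # change your mind, nobody ever knows
mine.commit("wip: parser handles comments")
assert main.commits == []            # the main repository is still clean

mine.squash("add parser with comment support")
mine.publish()
print(main.commits)
```

The point of the sketch is only that committing and sharing are separate steps, and everything before publish() stays negotiable.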

It's heartening to see that the idea of using a distributed VCS is catching on, that people are starting to adopt the likes of darcs, git, mercurial and bzr for their work. It means that the joy of iterative committing is spreading. Of course, I am partial to one of these systems in particular. Perhaps in a future article, I can describe what I think is the essential difference between darcs and our estimable competitors. I think I will call it iterative merging.

Happy committing in the meantime!


14 comments:

ernestkoe said...

great post, does darcs support binary diffs?

kowey said...

Alas, it does not at the moment. Our current approach to handling binary diffs is to treat each new version of the binary as removing the old version and replacing it with a whole new one. It is certainly less than ideal. Binary diffs are one of our planned optimisations, however, and they are certainly well within the reach of darcs. It's just one of those fiddly things that we need to implement one day :-)

Please post a comment on our bugtracker, specifically on issue1233, explaining why this might be important to you.

Jakub Narebski said...

First, iterative committing looks like a fancy word for creating a perfect patch series. If the feature you work on is large and complicated to implement in full, you should do this in steps. And that means working on a topic branch. Additionally, you usually would not create the patch series perfectly on the first try, so you would have to insert patches in the series (for example, a fix to a bug which you noticed while implementing the feature), split patches if they became too large, etc. And you can sanely do that only if the series of patches you work on is not published; i.e. if you are voluntarily disconnected (at least wrt. the given topic branch you work on).

Second, merging for me is joining two lines of history, selecting some state as representing this join. By definition merge then cannot be incremental; you can have incremental resolving of merge conflict (e.g. using git-stash in git). What you call incremental merge is IMHO not merge at all...

Side note: I am a Git user (and a bit of a developer).

Michael said...

It seems to me that this advantage of DVCSen - a voluntary disconnect from collaborators, while retaining the backup benefits of revision control - could be, albeit awkwardly, simulated in a centralized system.

To do this, we could have everyone create a private branch on the repository/depot. You would commit to that branch first, then merge onto the trunk or shared parent branch when you feel your work is finished.

This approach would largely solve the "voluntary disconnect" problem while offering some benefits over the DVCS solution, such as the ability for other people to see what you are doing in case you need their help with a hard problem.

The downside, of course, is that you would lose all the other benefits of using DVCSen. Also, if your VCS is not so smart about keeping its repository small, this approach could radically increase the size of the repository - especially bad when you have a hosted system with limited space.

Anonymous said...

Michael: There are commercial systems which do exactly that, for companies which want the control of a centralised system but the convenience of lightweight branching. One such is Accurev.

Jakub Narebski said...

@Michael: true, you can simulate private branches (voluntary disconnect) in a centralized SCM, but the way they can be implemented (more like: faked) in a centralized VCS (SCM) has the following disadvantages: they would be visible whether you want them or not (with a DVCS you can create a new public repository to share work in progress); you have administrative headaches with allowing people to create new branches and to delete failed experiments; you have to deal with the headaches of creating namespaces so everybody can name their branches as they want; and of course you need tools to be able to work incrementally on a series of patches effectively (equivalents of "git commit --amend", "git rebase --interactive", various patch management interfaces). So yes, you can reimplement this part of a DVCS, badly ;-) And why use a poor imitation instead of a proper distributed version control system... ?

What you cannot do with a centralized VCS, by its very definition, is create the network of trust that Linus Torvalds was talking about in his Google Tech Talk about Git (IMHO the most important part of this presentation). This, for example, allows people to work on some experimental topics and features in groups and subgroups.

kowey said...

Thanks for the comments, everybody!

@Jakub: I'm having fun learning how to use git right now :-) Hopefully if enough of us in the darcs community get a good handle on git, it will help us to improve darcs.

One of the things we want to steal from you is the ability to do something akin to a rebase. For the most part, patch commutation makes rebase unnecessary (spurious dependencies can just be automatically commuted out)... but sometimes commutation (what we think of as the clean way) isn't enough and you have to bring out a more forceful approach. Otherwise, I am also intrigued by this CryptoDAG model that Zooko seems to be very excited about, and want to learn more.

I'm not going to say very much about merging right now until I've gathered my thoughts. For what it's worth, you seem to be distinguishing between an implicit merge and an explicit merge (I'm just making these words up again, uh-oh). To achieve an explicit merge in the darcs world, I think we would just create a tag (which is a trivial patch that depends on other patches).

kowey said...

@Michael and @Silhouette

Good point, having a branch in a centralised revision control system allows you to have a certain distance from your peers. But I think Jakub's answer is pretty much the right one -- centralised branches just aren't good enough. Their being public, for example, means that you don't actually have the choice of making something vanish from your history altogether, or editing a commit.

It is a question of scale, as Silhouette points out, although not in the sense that you might think. It's not so much a question of "I need to do a lot of branching, therefore I need a system that allows for lightweight branching". It's more a question of "hey, I've got this system that allows for lightweight branching and WOW, actually I never realised how useful it was to be able to create personal branches all over the place".

This is why selling the idea of a DVCS isn't always easy. It does no good to convince people that a tool supports certain features, unless you can also convince them that the new workflow associated with these features is useful too.

The idea is that when you reduce the cost of something (branching) from merely small to truly infinitesimal, something new emerges! It doesn't just mean that you can do what you used to do, only cheaper; it means that you can do something completely different, taking advantage of the fact that, say branching, is now free. Anybody got a good analogy for this phenomenon?

Jakub Narebski said...

@kowey: The problem with comparing Darcs (and its idea of what merge means) with other distributed version control systems lies IMHO in the fact that Darcs is something between a classic version control system (SCM) and a patch management system.

I wonder if it would make sense for TopGit (which is a new patch management interface on top of Git, with its concept of branch dependency and explicit serialization to a Quilt-like patch series) to "borrow" detecting textual patch dependencies by the use of patch commutation...

Sam said...

@kowey: "It doesn't just mean that you can do what you used to do, only cheaper; it means that you can do something completely different, taking advantage of the fact that, say branching, is now free. Anybody got a good analogy for this phenomenon?"

It reminds me of the story I heard about IBM deciding not to invest in Xerox. IBM estimated the market for photocopiers based on the existing market for carbon paper.

Anonymous said...

Having switched from centralized control to a distributed version control system (git) a few years ago, I agree very much with your post.

One way I have described the difference between the two has been to say that a DVCS splits the act of committing from the act of publishing. So maybe thinking of it that way is a good way to communicate it to non-believers.

As an aside, you could potentially have a centralized VCS that gives you commit-like powers without publishing (or publishing to a non-public area on the central server). But at that point, you almost have a DVCS, so why not just use one?

kowey said...

@Jakub: that's an interesting way of looking at it.

If you or any of the TopGit developers have questions about patch commutation or the feasibility of using patch commutation in TopGit, I'm sure the darcs community would be more than happy to help.

Aidan Delaney said...

I like the term iterative committing. It's closer to what my mind does than what the tool does.

I'm a mere human so I tackle large problems in some kind of stepwise manner, be it iterative improvement or divide and conquer etc... So to have a tool that allows me to check in my small changes that map my mental model can only help to communicate what I was thinking about the code at the time. This, in turn, can help others understand the design changes I've made. I for one welcome our new iterative committing overlords.

I wrote a paper on something similar a while back http://foss.it.brighton.ac.uk/downloads/ppig2006.pdf

Plus, I've finished the bottle of leffe you left in my house on Sunday.

Zooko said...

I'm a bit late in contributing to these blog comments, but this is why the act which is "darcs record" in darcs and "svn commit" in svn should not be called "commit" in darcs. Because you aren't *committing* to anything. You can still change your mind. Now, "darcs push" -- that's commitment. You can't change your mind and make it as though you never did it. (Unless you get everyone to whom you pushed the patch to agree to pretend you didn't do that.)