Hobby-hacking Eric

2006-12-31

distributed chiming in

It seems that distributed version control has become a somewhat hot topic lately. The latter two posts make the case that being able to work offline is extremely useful, both for road warriors and for users with less than ideal Internet access. Yes, this does seem like a pretty good motivation for distributed version control. Indeed, it was my laptop and my dialup connection (aside from sheer curiosity) that first got me using darcs two years ago. But now I have one of these fancy ADSL connections and less need to travel or hack offline while doing so. Yet I continue to use and love darcs. I'm sure this is something that bzr, git, mercurial, etc. users can attest to: yes, offline versioning is indeed a great feature, but there is something more.

Warning: this is a rather long post. My apologies to planet haskellers and other busy readers.

one mechanism - many benefits


The thing that attracts me to a system like darcs is its conceptual elegance. From one single mechanism, you get the following features for free:
  1. Painless initialisation
  2. Offline versioning
  3. Branching and merging
  4. Easier collaboration with outsiders


These are all the same thing in the darcs world, no fanciness at work whatsoever. I suppose it's not very convincing to sell simplicity in itself, so in the rest of this post, I'm going to explore these four benefits of a distributed model and discuss what their implications might be for you the developer.

Painless initialisation

Getting started is easier because you don't have any central repositories to set up. That might sound awfully petty. After all, setting up a central repository is nothing more than a mkdir and cvs checkout. But it's much more of an inconvenience than you might think.

Setting up a central repository means you have to think in advance about putting your repository in the right place. You can't, for instance, set something up locally, change your mind, decide that you want a server, and switch over instantaneously. You COULD tarball your old repository, move it to the server, and either fiddle with your local configuration or check out your repository again. But why should you? Why jump through all the hoops? The steps are simple, but they add friction. How many times have you NOT set up a repository for your code because it would have been a pain (a 30 second pain, but a pain nonetheless)? How many times have you put off a repository move because it was a pain? Painless initialisation means two things: (1) instant gratification and (2) the ability to change your mind. I would argue that such painlessness is crucial because it brings down the barrier of inconvenience to the point where you actually do the things you are supposed to do.

Branching and merging

A well thought out distributed version control system does not need to have a notion of branching and merging. Why? Because a branch can simply be exactly the same concept as a repository, as a checkout. No need to learn two sets of concepts or operations, or two views of your version control universe. Just think of them as one and the same. Now, you might be worried about, say, the redundancy of all this (gee! wouldn't that mean that branches take up a lot of space?)... but eh... details.

For starters, disk space is cheap, at least much cheaper than it was in the past. There might be cases where you are trying to version very large binary files, but for many programming jobs, we are only shuffling text around, so why worry? Besides, branches are supposed to be disposed of one day or another (merged), right? It's not like they're going to be that long lived, otherwise it's just a fork. Moreover, worrying about disk space is the version control system's job, not yours. You could have a VCS that tries very very hard to save space. For example, it could try to use hard links whenever possible, in much the same manner as "snapshot" backup systems. Disk space is not what you the programmer should be worrying about. It's similar to the case being made for having a second or third monitor: programmer time is more valuable than disk space.

Offline versioning

Previous posts have discussed this at length. It's still useful, even if your Internet connection is superb. It's useful because it lets you hold on to things that aren't quite ready for the central repository, but worth versioning until you're more confident about your code.

Collaboration with outsiders

Open source and free software projects thrive on contributions from outsiders. For example, in the past year, 80% of the 360 patches to the darcs repository have come from somebody other than David Roundy, the original author of darcs. I'm cheating a little bit because many of these patches are also from known insiders. Then again, all of the known insiders were outside contributors at some point. The switch from outsider to insider has been for the most part informal: you send enough patches in and people eventually get to know you. And that's it; very little in the way of formal processes.

Outsider collaboration is made easier for two reasons: offline versioning and decentralisation.

By offline versioning, I mean that people can make their modifications locally, retrieve changes from the central repository and still have their modifications intact, versioned and ready to go. Consider a popular project, like the mail client mutt. Some mutt users have patches that are useful for a few people, but not appropriate for the central repository. So they make their changes available in the form of a Unix patch. If you're lucky, the patch applies to the version of mutt that you've downloaded. If you're not so lucky, you've got some cleaning up to do and a new set of patches. I'm not talking about merging or conflict resolution, per se. Assume the conflict resolution requires human intervention. You've fixed things so that it compiles against the new version. What do you do exactly? Make a patch to the patched version? "Update" the original patch so that it works with the new version of the repository? And what about the original author, what does s/he do with your patch? These kinds of things are not so hard in themselves, but they are a major source of friction. They gum up the works of free software development, or any large project, open source or closed.

If you are a project maintainer, having a tool that handles offline versioning means that it is easier for you to accept changes from outside contributors (zero insertion force - no need to apply patches and re-commit them).

If you are a contributor, having an offline versioning tool means that it's easier for you to submit modifications to the project. You don't have to manually create patches: you don't have to keep around a clean and working copy of the project, you don't have to worry about where you do your diffs (so that the patch --strip options come out right), you don't have to worry about what happens when the central repository changes and your patch no longer applies. Again, I'm not referring to conflict resolution. If there are conflicts, somebody will have to resolve them; but the resolution and versioning of these conflicts should involve as little bureaucracy as possible. For extra credit points, some version control systems even implement a "send" feature in which you submit your local modifications via email. The maintainers of the repository can then choose to apply the patch at their leisure. These aren't regular Unix diff patches, mind you, they are intelligent patches with all the version-tracking goodness built in.

Offline versioning adds convenience to the mix, a technical benefit. If you flip it around and look at it in terms of distributed control, you can see some pretty subtle social consequences as well. Since there is no need for a central repository, there is a lot less pressure for the central maintainer to accept patches or reject them outright, because you know that the outside contributors can get along fine with their modifications safely versioned in their local repositories. If worst comes to worst, the outside contributors can place their repositories online and have people work from there instead. It sounds like a fork, which sucks... but sometimes, fork happens. Look, sometimes you get forks for differences in opinion, disagreements between developers, or general unpleasantness. But sometimes you get more innocent forks; for example, the main developer suddenly got a new job and is now working 60 hours a week. S/he is still committed to the project, but to be honest s/he hasn't been looking at patches for the last month. No big deal, the rest of us will just be working from this provisional repository until the main developer gets back on his/her feet. There's a social and a technical aspect to forking. Distributed version control greatly simplifies the technical aspect, and that in turn mellows out the social one. Distributed version control means that life goes on.

simplicity and convenience


I'm really only making two points here. Simplicity matters. It reduces the learning curve for newbies and removes the need for experienced users to carry mental baggage around. Convenience matters. It reduces the friction that leads you to put off the things you could be doing and it removes some of the technical barriers to wide-ranging collaboration. I could always be mistaken, of course. Perhaps there is some bigger picture, some forest to my trees; and upon discovering said forest I find myself deeply chagrined, getting all worked up over something so silly as patches. But until that time, I will continue to use darcs and love it for how much easier it makes my life.


2006-12-27

the Haskell metatutorial

Thanks to ndm for so clearly articulating the idea. Here is my attempt at implementing the Haskell metatutorial. Please add your stuff or even just expand the guide tree. Embrace the explosion of tutorials. Make something useful of it.


15:05:09 < AStorm> YAHT leaps a bit too far for me. I'd like something complete but less steep.
15:05:47 < metaperl> YAHT is probably as good as it gets INHO
15:05:49 < metaperl> IMHO
15:06:01 < uccus> there should be a grant unification project for Haskell tutorials
15:06:02 < metaperl> the "algorithms" book is not bad either
15:06:26 < kowey> uccus: the wikibook attempts to remix heavily
15:06:28 < uccus> *grand [blushes]
15:07:01 < kowey> we've got yaht, write yourself a scheme, jeff newbern's stuff, some original content, all mashed up and duct-taped together
15:07:03 < ndm> what i would like is a meta-tutorial
15:07:14 < ndm> a list of questions about haskell, what does this do, do you understand this etc
15:07:26 < ndm> and if you say no, it points you at a tutorial which explains it
15:07:28 < uccus> well, mashed up and duct-taped is not good
15:07:41 < ndm> is there a tutorial on pattern matching, for instance?
15:07:44 < uccus> aah. yes. I agree with ndm
15:07:47 < kowey> we could use some heavy editing
15:07:48 < ndm> which covers ~, !, @ etc
15:08:12 < kowey> right, me too... it's like the malcolm gladwell stuff
15:08:17 < uccus> the wikibook can do that
15:08:31 < kowey> many "right" flavours of coffee, pepsi; extra-chunky tomato sauce, etc
15:08:40 < uccus> it's divided into sections... they can contain links to complete tutorials
15:08:59 < uccus> everyone has a different style of tutoring you know...
15:09:05 < kowey> i agree
15:09:16 < kowey> the wikibook right now is newbie-oriented
15:09:28 < kowey> but we could steer it towards choose-your-own-adventureness
15:09:43 < uccus> kowey: the wikibook right now has different steams for newbie/advanced(?)
15:09:45 < kowey> comments on the discussion page on how we could implement this would be quite welcome
15:09:55 < kowey> we have two tracks, newbie and advanced
15:10:16 < kowey> although the advanced track assumes you've basically just gotten through the newbie track... it tries to be a friendly "advanced"
15:10:19 < uccus> yes, but shouldn't there be more?
15:10:39 < uccus> tracks?
15:10:41 < kowey> well, it's got two tracks in terms of material, one track in terms of style
15:11:04 < kowey> what ndm is talking about is having multiple tracks in terms of style (well... style, level)
15:11:10 < uccus> I mean, the grand Haskell wikibook should contain things that are really advanced
15:11:25 < uccus> like tutes for gtk2hs...
15:11:37 < ndm> kowey: i more meant accepting there will be loads of tutorials, but trying to point people at those which will teach them something new
15:11:42 < uccus> *that* should be called advanced
15:11:44 < kowey> i tend to suspect that's more the Haskell wiki's job
15:12:13 < kowey> although there is http://en.wikibooks.org/wiki/Haskell/GUI
15:13:07 < uccus> aaah. thanks kowey. that's enough I think.
15:13:33 < kowey> ndm: i think we're saying the same thing, although i'm speaking horribly imprecisely


2006-12-25

unraveller

I wish unravelling code was easier. For me, coming to grips with a large software project often consists of undoing bundles of yarn, the big chain, multiple dependencies coming in, multiple dependees going out. You want to find out where function foo is used. Well, it turns out that it's used by bar, so now you have to find out where bar is used, and oh it's used by quux... and... oh boy.
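The foo, bar, quux chase above is essentially a walk over an inverted call graph. Here is a minimal sketch of just that walk, assuming somebody has already extracted a map from each function to the functions it calls (which is the hard part, and not shown). The names CallGraph, callersOf, usersOf and example are all made up for illustration; this is not an existing tool, and it says nothing about how to present the result without overwhelming the user.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- | Hypothetical call graph: each function name mapped to the functions it calls.
type CallGraph = Map.Map String [String]

-- | Invert the graph: each function mapped to the set of functions that call it.
callersOf :: CallGraph -> Map.Map String (Set.Set String)
callersOf g = Map.fromListWith Set.union
  [ (callee, Set.singleton caller)
  | (caller, callees) <- Map.toList g
  , callee <- callees
  ]

-- | Everything that uses the given function, directly or indirectly.
usersOf :: CallGraph -> String -> Set.Set String
usersOf g start = go Set.empty [start]
  where
    rev = callersOf g
    go seen []       = seen
    go seen (f : fs) =
      let direct = Map.findWithDefault Set.empty f rev
          new    = Set.difference direct seen   -- skip anything already visited
      in go (Set.union seen new) (Set.toList new ++ fs)

-- | A toy graph matching the foo/bar/quux story, plus one unrelated function.
example :: CallGraph
example = Map.fromList
  [ ("main",  ["quux"])
  , ("quux",  ["bar"])
  , ("bar",   ["foo"])
  , ("foo",   [])
  , ("other", ["baz"])
  ]
```

With this toy graph, usersOf example "foo" collects bar, quux and main, and leaves other out. Since everything low-level eventually leads back to main, the raw set is exactly the thicket being complained about; the filtering is where the real work would be.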

I wish there was some tool that would let me start from the center of the big old yarn ball and swim out to the surface. I'm not sure how something like this would work either. Ultimately, all the low-level stuff is used in your main, right? So how can you display this kind of information without just overwhelming the user?

Ctags/Hasktags aren't too useful here. They solve the opposite problem, in which you've got some top level function and you want to drill down to the guts. Graphviz-ing your modules just gives you a thicket, something to marvel at, but not a great source of insight. Dep trees in some hypothetical browser don't sound that useful either (sigh, click the triangle, expand the subtree, whoah! too much! collapse! collapse!).

The mental image I get is that you run it on your code:
unravel Population.applyToPop *.lhs *.hs
and you'd get a browseable, filtered view of the source code, the relevant bits highlighted, useful links in the right places, but not so many that you feel overwhelmed. I want it to be a standalone tool (like a graphical diff utility), something I can run without having to learn how to use your favourite IDE. Likewise, I want it to be quick and easy (again, the standard for ease being a graphical diff). I don't want to set up a project folder or whatever just so I can run the tool.

This is one annoying aspect of being a user, hell a consumer of any kind. I know I want something, but I don't know exactly what, except that I'd probably recognise it when I see it. The ideas in my head are muddled. If somebody sat me down for a user-interview, I'd just sort of ramble on incoherently for 10 minutes, get confused and wander off. Dear whoever, please build me an unraveller. Think like an Apple UI engineer here. Build the tool from user-experience-in, not from functionality-out. Don't listen to my precise demands, because I don't really know what I want. Just help me understand this code.

Anybody know what it is I'm looking for? (If you do have such a tool, a good exercise would be to run it on darcs. Can it help you figure out what a Slurpy is? How about a Population?)


2006-12-22

Denotational semantics

Apfelmus from IRC just wrote what looks like a very nice chapter on denotational semantics. I am reading it now and learning lots. Why not check it out and post comments on the talk page? I'm sure he'll appreciate any comments the Haskell community might have, or anyone with an interest for that matter.

Thanks, apfelmus; what a great Christmas present!


2006-12-17

It's alive!

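Hier sur #haskell.fr, on était plus que 2 personnes pour la première fois. Grace aux iniatives de Nanar et cie, il est possible qu'un veritable communauté francophone d'Haskelliens (?) verra le jour. [Yesterday on #haskell.fr, there were more than two of us for the first time. Thanks to the initiatives of Nanar and co., a real French-speaking community of Haskellers (?) may yet see the light of day.]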

Sorry for that attempt at parlezing français. I'm pleased to see that the French Haskell community has now increased two-fold to an amazing four people (on #haskell.fr at least). Venez nombreux or something. It'll be fun. Thanks to Nanar for kickstarting things with a wiki/mailing list/channel creation initiative.


2006-12-10

rewriting PLEAC Haskell?

One complaint I have heard about the Haskell cookbook is that it is rather unidiomatic and thus not very helpful for people trying to learn Haskell. For example, one particularly shocking thing the implementation does is to shadow the (.) operator to make it more "object-like": o . f = f o (reverse dollar? euro?). This leads to snippets of code which, as one of the wikibook commenters put it, are barely recognisable as Haskell:
s''' = s1.split " ".reverse.join " "
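For comparison, here is a rough sketch of how the same snippet might look in more conventional Haskell, assuming its intent is to reverse the order of the words in a string. The names reverseWords, reverseWords' and (|>) are made up for illustration and are not taken from PLEAC. Note also that words and unwords collapse runs of whitespace, which a split and join on a single space would not.

```haskell
-- | Reverse the order of the words in a string, written as an ordinary
-- right-to-left composition with the Prelude's (.).
reverseWords :: String -> String
reverseWords = unwords . reverse . words

-- | The "object-like" reverse application, under a hypothetical name so that
-- the Prelude's (.) keeps its usual meaning.
infixl 1 |>
(|>) :: a -> (a -> b) -> b
x |> f = f x

-- | The same function in left-to-right style, without shadowing anything.
reverseWords' :: String -> String
reverseWords' s = s |> words |> reverse |> unwords
```

For instance, reverseWords "one two three" gives "three two one". If the left-to-right reading is what matters, Data.Function in base already exports (&), which behaves like the hypothetical (|>) here, so there is little reason to take over (.).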


PLEAC Haskell in its present form is not very suitable for educational purposes, but what if the Haskell community ran through and cleaned it up? Only the first two chapters are implemented anyway, so it doesn't seem to be all that much; the only substantial thing to rewrite perhaps being the soundex code. I personally cannot invest any time in this, being already behind in other projects like darcs and wxhaskell, but it might be a fun project for Haskell enthusiasts, or even yellow-belt Haskellers trying to come to grips with the language.

If interested, you should probably subscribe to their mailing list and maybe bounce around some ideas on the Haskell café. Another thing to consider is contacting the original author Yoann. It would be good to get him on board, maybe with a little gentle persuasion. I mean, he probably thought it was a good idea to make the language more recognisable to newcomers. Nice thought... but maybe he would now agree that newbies would be better off with more idiomatic Haskell.