koweycode

2013-03-17

moved to erickow.com

I've migrated my various blogs to erickow.com. Readers of this blog may be more interested in a limited set of posts, for example those tagged haskell or darcs, than in the unified whole, and may want to point their RSS readers there instead. (That said, I don't seem to blog very much these days anyway and tend to use G+ for my techie commentary.)

Also, if you have a WordPress and/or Blogger blog you'd like to import to Hakyll, you may be interested in my hakyll-convert utility.

2011-05-03

untangling a cabal install problem

I sometimes have trouble translating abstract general explanations to my particular concrete cases. I hope that by sharing a very concrete situation I experienced, other users may recognise themselves and get unstuck on their own problems.

I finally untangled a cabal install problem that's been bugging me for some time, almost driving me to use cabal-dev on all my packages (which seems like it might be a bit inconvenient).

So I have a fairly standard setup (at least, it was standard when I wrote this post), GHC 6.12.3 with the latest released Haskell Platform. I'm working on two packages, GenI and nltg-hillwalking, simultaneously. Switching from one to the other is painful. When I try to install GenI, typing "cabal install" results in this horribly disheartening sequence, where it installs random, haskell98, cpphs, haskell-src-exts, derive and finally GenI. If I then switch back to working on hillwalking, I get another discouraging sequence involving random (again!), QuickCheck, test-framework and nltg-hillwalking. And going back to working on GenI, I go through the same pain again.

It took me a while to work out that the problem was just the interaction between these two packages. Having had a chance to chat about this with Duncan and Ian, I got a bit of a clue about what the problem might be. Indeed, when I ran "cabal install --dry-run -v2", this little bit of output caught my eye:

In order, the following would be installed:

random-1.0.0.3 (reinstall) changes: time-1.1.4 -> 1.1.2.4
haskell98-1.0.1.1 (reinstall)
cpphs-1.11 (new package)
haskell-src-exts-1.10.2 (new package)
derive-2.4.2 (new package)
GenI-0.21 (new package)

See that little arrow? It says that the reason random, the cause of all my heartache, is being reinstalled is that it wants to depend on an older version of time. Why on earth would it want to do that? ... Oh, because I told it to. Apparently, some past version of myself decided to put this dependency in GenI.cabal: time == 1.1.2.4

Oops!

I think the problem looks like this. GenI uses the derive package, which triggers a chain of dependencies all the way down to random and time. Unfortunately, GenI also directly depends on time but now we have an issue. I'm not entirely clear on why this causes a recompile as opposed to the more usual "this will likely cause an error" output (maybe the latter is only appropriate for direct dependencies, ie. if derive depended on time itself?).

By forcing GenI to use this old version of time, I was indirectly forcing it to install a version of the random package that depends on this old version. In doing so, I would clobber the version of the random package that QuickCheck uses.

Fixing the issue in GenI was relatively straightforward. Did I really need to be using such a constrained version of time? It turns out that time == 1.1.* works perfectly fine (taking advantage of the PVP promise of backwards compatibility in all A.B.* versions of a package). Just one little dependency and everything works a lot more smoothly.
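To make the difference between the two constraints concrete, here is a minimal Python sketch of how an exact constraint and a wildcard constraint treat the versions of time mentioned above. The function name matches is made up for illustration, and it only models "==" with an optional trailing ".*", not the full Cabal constraint syntax.

```python
def matches(constraint: str, version: str) -> bool:
    """Check a version against an '== X.Y.Z' or '== X.Y.*' constraint.

    Only these two forms are modelled; real Cabal constraints are richer.
    """
    wanted = constraint.replace("==", "").strip().split(".")
    actual = version.split(".")
    if wanted[-1] == "*":
        prefix = wanted[:-1]
        # A wildcard matches any version that starts with the given prefix.
        return actual[:len(prefix)] == prefix
    return actual == wanted


for constraint in ("== 1.1.2.4", "== 1.1.*"):
    for version in ("1.1.2.4", "1.1.4", "1.2"):
        print(constraint, version, matches(constraint, version))
```

Under the exact constraint only 1.1.2.4 is acceptable, whereas the wildcard accepts both 1.1.2.4 and the already installed 1.1.4, so nothing needs rebuilding.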

So what did I learn from this?

take a deep breath - I think when I'm faced with these issues, I'm feeling really impatient to get on with my work. But solving the issue involves recognising just some silly little problem, which can be hard to do when I'm being impatient. So part of the trick is to defocus somehow and shift to poking mode.
use cabal install --dry-run -v2 and study the end part: what packages are we trying to install and why? The -v2 is important because it tells you why packages are being installed.
??? hunt for the offending dependency - for me this was a simple case of staring at GenI.cabal. What if GenI depended on some library which in turn depended on time-1.1.2.4? I guess the answer would lie in the list of packages that cabal-install says it would install. The dependency must lie *somewhere* in the chain.
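If the install plan is long, the hunt can be partly mechanised. The following Python sketch picks out the reinstalls that come with a "changes:" note. It assumes the plan lines look exactly like the output quoted in this post (other cabal-install versions may print something different), and the names PLAN, LINE and reinstalls_with_changes are hypothetical.

```python
import re

# The plan lines quoted earlier in this post, pasted in as sample input.
PLAN = """\
random-1.0.0.3 (reinstall) changes: time-1.1.4 -> 1.1.2.4
haskell98-1.0.1.1 (reinstall)
cpphs-1.11 (new package)
haskell-src-exts-1.10.2 (new package)
derive-2.4.2 (new package)
GenI-0.21 (new package)
"""

# Assumed shape: package-version (status), optionally followed by "changes: ..."
LINE = re.compile(r"^(\S+) \(([^)]+)\)(?: changes: (.+))?$")


def reinstalls_with_changes(plan: str):
    """Return (package, changes) for every reinstall that alters a dependency."""
    found = []
    for line in plan.splitlines():
        m = LINE.match(line.strip())
        if m and m.group(2) == "reinstall" and m.group(3):
            found.append((m.group(1), m.group(3)))
    return found


for package, changes in reinstalls_with_changes(PLAN):
    print(f"{package} would be rebuilt because: {changes}")
```

On the sample above this reports only random-1.0.0.3, together with the time-1.1.4 -> 1.1.2.4 change that turned out to be the culprit.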

If I understand correctly, this may actually be an improvement over the pre GHC 6.12 days before the ABI hash was introduced. I don't actually know, but I could imagine there's something that'd make one random not-quite-compatible with the other, even if they're both version 1.0.0.3 and silently swapping one out for the other would cause subtle breakage. At least now, we know if something is wrong and we can fix it relatively easily by just reinstalling the missing package.

This dependency stuff must be really tricky! It looks like there may be some work that could make life better, for example, a Nix-like approach where both versions of random 1.0.0.3 could co-exist. But we should be glad in the meantime that Duncan et al have not torn their hair out yet. (Just think of the pre-Cabal-install days if it helps, life's much better now, isn't it?)

2011-04-19

why darcs users care about consistency

In the Darcs community, we've been discussing the recent blog posts saying that Git is inconsistent, that it cannot be made to be consistent.

With Darcs being the foil to Git for the purposes of this discussion, I thought it would be useful if I cleared up a few points, particularly this first one:

consistency is a usability issue

When people say they like Darcs, they don't generally talk about it having a beautiful or elegant theory. Instead, they talk about how easy and simple it is to use, about how they never really had to grapple with a learning curve or feel stupid for doing something wrong.

What makes Darcs so simple to use? Did it hit the right notes by accident or through David Roundy's good taste? Or is usability merely in the eye of the beholder? Some of these explanations may be true, but I think what lies at the heart of Darcs' usability is that it supports a very simple way of understanding a repository:

a darcs repository is a set of patches

This mental model may not be suitable for everybody, and in the long run Darcs may need to improve its support for history tracking. But if you want to understand why, for all its current shortcomings, people continue to use and develop Darcs, you must appreciate how refreshingly simple the set-of-patches mental model can be. As a Darcs user you are freed from a lot of the artefacts of worrying about commit order. Collaborating with people is just a question of shuffling patches around, with no merge states, no rebases, way fewer spurious dependencies to worry about.

But simplicity is hard. In order to make this simple world view possible, Darcs has to guarantee a property that any ordering of patches allowed by Darcs commutation rules is equivalent. If Darcs gives you the option of skipping a patch, it has to work hard to make sure that if you include the patch later on, that the repository you get is equivalent. That's what the patch theory fuss is about. While it's useful that Darcs tends to attract purists and math geeks, we're really not engaged in the pursuit of some sort of ivory tower theoretical elegance for its own sake. Ultimately what we're after is usability.

A good user interface minimises work for the user, be it cognitive, memory or physical work. The joy of Darcs is being able to focus cognitive work on our real jobs, and not on babysitting version control systems. So when Russell O'Connor says that merges ought to be associative, he's not saying this to tick some sort of mathematical box. What I think he's really saying is that, as a Darcs user, he doesn't want to worry about the difference between pushing patches one at a time vs all in one go. Consistency is a usability issue.

darcs is imperfect

Darcs is very much a work in progress. Some users have felt let down by Darcs: whenever performance grew to be unacceptable for their repositories, when they hit one exponential merge too many, or when Darcs just plain did something wrong. Even our much vaunted usability has cracks at the edges, a confirmation prompt too many, an inconsistent flag set, a non-reversible operation or two.

I particularly want to make sure I'm very clear about this point:

darcs patch theory is incomplete

We still don't know how to cope with complicated conflicts. Moreover the implementation of our first two theories is somewhat buggy. Darcs copes well enough with most everyday conflicts, but if a conflict gets hairy enough, Darcs will crash and emit a nasty message. This is one of the reasons why we don't recommend Darcs for large repositories.

Our version of "don't do that" is not to maintain long term feature branches without merging back to the trunk on a regular basis. This is not acceptable for bigger projects, but for smaller projects like Darcs itself, the trade-off between a simple user interface in the general case, and the occasional hairy conflict can be worth it. In the long run, we have to fix this. We are revising our patch theory again, this time taking a much more rigorous and systematic approach to the problem.

In the interim, we will be gaining some powerful new tools to help work around the problem, namely a new "darcs rebase" feature that will allow users to smooth away conflicts rather than letting them get out of hand. This will be a crucial bridging tool while we continue to attack the patch theory problem.

patch theory is simple at heart

I am in the awkward position of being a non-expert maintainer, having to defer a lot of thinking about software engineering and patch theory to the rest of the Darcs team. In a way, this is healthy for Darcs, because we have long suffered from an excess concentration of expertise. Inverting the pie so that you basically have the number one Darcs Fan as the maintainer is useful because it forces everybody else to break things down into words an Eric can understand.

The good news is that basic patch theory is one of these things an Eric can understand: patches have inverses and may sometimes be commuted. Just learning the core theory teaches you how merging and cherry picking works, why you can trust the set-of-patches abstraction and most importantly, how simple Darcs is. So we're not after some kind of magical AI here, nor are we trying to guess user intention. The things we do with patches are much more mechanical, systematically adjusting patches to context, one at a time, click-clack on the abacus until the merge is complete.
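To give a flavour of "inverses and commutation", here is a toy Python sketch. It is emphatically not Darcs' implementation: the Insert and Remove classes and the commute function are invented for illustration, patches only add or remove whole lines, and any two inserts are allowed to commute, whereas real Darcs patches sometimes refuse to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

    def apply(self, lines):
        return lines[:self.pos] + [self.text] + lines[self.pos:]

    def invert(self):
        return Remove(self.pos, self.text)


@dataclass(frozen=True)
class Remove:
    pos: int
    text: str

    def apply(self, lines):
        assert lines[self.pos] == self.text, "patch does not fit this context"
        return lines[:self.pos] + lines[self.pos + 1:]

    def invert(self):
        return Insert(self.pos, self.text)


def commute(first: Insert, second: Insert):
    """Reorder two consecutive inserts, adjusting positions to the new context."""
    if second.pos <= first.pos:
        # second landed at or before first's line, so first shifts down by one
        return second, Insert(first.pos + 1, first.text)
    # second landed after first's line; without first it sits one line higher
    return Insert(second.pos - 1, second.text), first


base = ["a", "b"]
p, q = Insert(1, "x"), Insert(3, "y")
q2, p2 = commute(p, q)
same = q.apply(p.apply(base)) == p2.apply(q2.apply(base))
undone = p.invert().apply(p.apply(base)) == base
print(same, undone)
```

Both checks print True: applying the patches in either order (after adjusting them to their new context) gives the same file, and a patch followed by its inverse leaves the file untouched. That mechanical position-shifting is the "click-clack on the abacus" in miniature.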

patch vs snapshot is not so important

We think it's important to continue working on Darcs because we are exploring territory that no other version control system is looking at - patch-based version control. That said, patches and snapshots are duals of each other. We think that things that Darcs can do are possible in snapshot based version control and we would be very interested to see work in that direction.

The secret to Darcs merging is that it replaces guesswork (fuzz factor) with history. A darcs patch only exists in the context of its predecessors, and if we want to apply a patch to a different context, we mechanically transform the patch to fit. We think this sort of history-aware merging could be implemented in Git. In fact, we would be excited to see somebody taking up the challenge. Git fans! How about stealing history-aware merging from us?

exponential merges still exist but there are fewer of them

We have developed two versions of patch theory. The second version avoids a lot of the common causes of exponential merge blowups, but it is still possible to trigger them. Recent Darcs repositories are created using version 2 of the theory. For compatibility's sake, repositories created before Darcs 2 came along tend to still be using version 1 of the theory (we only recommend converting if conflicts become a problem).

The most well-known remaining cause of blowups in theory 2 is the problem of "conflict fights" where one side of the conflict resolves the conflict and gets on with their life without propagating the resolution back to the other side. What tends to happen there is that we not only encounter the conflict again in the future, but we also conflict with the resolution!

So life is definitely better with Darcs 2. We've given the exponential merge problem a good knock on the head, but it's still staggering around and we're working our way to the finishing blow.

performance is improving

I think that when people complain about Darcs being slow, they're not talking about the exponential merge problem. They're mostly referring to day-to-day issues like the time it takes to check out a repository. Our recent focus has been to solve a lot of these pedestrian performance issues. For example, the upcoming Darcs 2.8 is likely to use a new "packs" feature which makes it possible to fetch a repository in the form of two larger tarballs rather than thousands of little patch files. This makes a big difference!

Another improvement we hope to bring to Darcs 2.8 is the performance of the darcs annotate command (cf. git blame). Annotate has been neglected for a while, and to make things better, we've basically reimplemented the command from scratch with more readable output to boot. As an example of something fixed along the way, one misfeature of the old annotate is that it would work by applying all the patches relevant to a given file, building it up from the very beginning. But if you think about it, annotating a file is really about annotating its current state; we don't care about ancient history! So one of the Darcs hackers had the sort of idea that’s obvious in hindsight: rather than applying patches forwards from the beginning of history, we simply unapply them from the end. Much faster.

We're not yet trying to compete with Git when working on these performance issues. We admire the performance that Git can deliver and we agree that getting speed right is a usability issue (too slow and your user loses their train of thought). But we've been picking a lot of low hanging fruit lately, solving problems that make Darcs faster with very little cost. We hope you'll like the results!

2011-02-18

practical QuickCheck revisited - separate testing hierarchy

I'll begin this post with a quote from 2009-Eric:

This may go down as the kind of bad advice that "seemed like a good idea at the time".

The advice in question was to "bake unit tests in". The basic idea was that whatever module you write should have its own testSuite function exposing unit tests for that particular module. The advantages were simplicity (no parallel test hierarchy), the ability to ship a binary with self-tests, and the ability to test non-exported functions, helper code with a granularity that lends itself more to testing (easier to think of tests for them).

I was unconvinced by the counterargument that it was not a good idea to mix testing and business logic. To be clear, I did agree with the spirit of the advice -- I'm not about to go around questioning the kind of wisdom a community gains by watching rockets blow up -- but I felt that I was not advocating any such mixing. All I wanted was to put my testing code in the same file as the business code, cordoned off in a testing section at the end of the file if you want, without any sort of if-testing-mode-do-X logic. So I thought that the counterargument was right, but that it didn't apply to this particular context. (I'd be interested to see when/if I change my mind on this, maybe it leads to temptation to mix logic, which is bad.)

In any case, I don't need to change my mind on that particular point. Being the kind of person that only learns the hard way, I've found myself forced to divorce my test code from the business code after all. It's mainly a practical problem of dependencies (this was pointed out by Echo Nolan and Ivan Miljenovic). Forcing users to install QuickCheck and test-framework, when they probably don't care about testing, when they just see your module as yet another dependency on the road to some other more pressing goal, is really a bit anti-social.

The problem isn't installing the package per se (it all happens automatically with cabal install), but dealing with package version dependencies. So GenI depends on test-framework 2.x and QuickCheck 1.2. What if I go away for a few years, stop hacking on GenI and in the meantime the rest of the world moves on to using QuickCheck 2.x and test-framework 3? What happens when they try to install GenI and cabal install needs to rebuild the random package, which then breaks QuickCheck-2.4 because it depends on random too? Headaches all around.

I think I can live with a separate hierarchy. Arguing with past-Eric a bit:

All the extra modules and what not are not that big a deal (and I could probably let myself go wrt imports, etc).
Who cares if there's an extra geni-test binary, which only gets enabled with -ftest anyway?
Self tests, shmelf tests. Seriously, who is going to run that geni --test function anyway?
If I forget to cabal configure -ftest, I can always cabal configure again and build.
If I'm really desperate to test some internal function, I could always export an alias like testingFoo for every foo I want to test, applying a sort of Pythonesque we're-all-grownups-here principle.
Also maybe forcing yourself to test only the exported functions, enforces a kind of general black-box thinking which is healthy if you're writing a library.

So, with apologies to Ivan for not understanding his rants 2 years ago, and also to anyone who may have listened to 2009-Eric, for any messes I got you and your users into, I'm retracting that particular bit of advice and separating my test hierarchies like a good boy. Let's see if 2013-Eric decides to post some kind of retraction retraction.

2010-12-02

personal gitit wiki on MacOS X

Here's a quick little recipe for using gitit as a personal wiki on MacOS X. I assume here you already have the wiki itself set up, and now you just want it to run automatically in the background whenever you log in. You can do this by using launchd.

Download net.johnmacfarlane.gitit.plist from this Gist
Replace the WorkingDirectory with the path to your personal wiki
Replace the last part of the PATH to include your cabal directory, and possibly something like /usr/local/bin or /opt/local/bin if you're using Git instead of Darcs
Save the file in ~/Library/LaunchAgents
Test it with launchctl load ~/Library/LaunchAgents, maybe using the Console application to search for logs should something go wrong.
Log out and log back in (or maybe even restart your computer if you want to be sure)
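For orientation, here is a rough Python sketch of the kind of job description such a plist contains. It is not the contents of the Gist: the Label is guessed from the file name, every path is a placeholder, and only standard launchd.plist keys are used.

```python
import plistlib

# All values below are placeholders; substitute your own paths.
job = {
    "Label": "net.johnmacfarlane.gitit",       # guessed from the file name
    "ProgramArguments": ["gitit"],
    "WorkingDirectory": "/Users/you/wiki",      # path to your personal wiki
    "EnvironmentVariables": {
        # the last entries stand in for your cabal directory and Git location
        "PATH": "/usr/bin:/bin:/usr/local/bin:/Users/you/.cabal/bin",
    },
    "RunAtLoad": True,                          # start when the agent is loaded
}

plist_bytes = plistlib.dumps(job)
print(plist_bytes.decode("utf-8"))
```

The printed XML has the same overall shape as a launchd agent file, which may make the WorkingDirectory and PATH edits in the steps above easier to spot.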

Helpful bits and pieces:

This MacGeekery article
launchd.plist man page
Property List Editor in Developer Tools (beats looking at XML)

Edit 2010-12-04: Fixed broken link

2010-09-22

Early Career Researcher: the computer game

Here's an idea for a computer game called Early Career Researcher. The simple version would be a fairly mindless turn-based RPG-esque deal. Nothing earth shattering in terms of game mechanics, but perhaps an amusing toy.

You have

personal attributes (eg. writing, social skills, initiative)
inputs (eg. ideas, papers to review)
daily resources (eg. time, energy)
actions (eg. check email, write paper, write grant proposal, lab work [or some generic term for "actual" research leg work], take nap, go to pub)
outcomes (eg. paper accepted, grant awarded, contract extension)
light bulbs (XP)

The goal of the game is just to maximise light bulbs. The basic model is that every turn consists of a "day" (a day should take about 5-10 minutes to play). In each day, you can do any number of actions, but the kinds of actions are limited by the inputs and daily resources you have. For example, you could write a paper, but in order to do so, you'd need a paper-topic resource to consume, not to mention time. Likewise, you could check your email and it may only take a few minutes, but it could also use up a lot of your energy. Actions may result in outcomes, but whether or not they do so depends on a combination of personal attributes and luck. For example, writing a paper may result in paper accepted, depending on writing skills, research-fu and the dice roll. Going to the pub (presumably chatting with colleagues) may result in ideas, depending on social skills, creativity and the dice roll. Outcomes generate inputs (eg. ideas) and light bulbs (XP). If you get enough XP to level up, you can use your light bulbs to purchase personal attributes.

As the game develops it should become clearer that it's important to choose your actions wisely, and also to pay attention to the notion of balance. Spending all your time doing lab work or writing grant proposals may seem like a good idea, but if you fail to spend enough time in the pub or take sufficient naps, you may not generate sufficient idea resources to make very much progress. Or maybe if you're too lazy and spending all your time just trying to be inspired, you just don't make sufficient practical progress to get anywhere.
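As a starting point, the core loop might look something like the Python sketch below. Everything concrete in it is an assumption made for illustration: the attribute values, the costs in time and energy, the dice thresholds and the light bulb rewards are all invented, and only two of the actions are modelled.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Researcher:
    attributes: dict = field(default_factory=lambda: {"writing": 3, "social": 2})
    inputs: dict = field(default_factory=lambda: {"idea": 1})
    time: int = 8          # daily resource, in hours
    energy: int = 10       # daily resource
    light_bulbs: int = 0   # XP


def write_paper(r: Researcher, rng: random.Random) -> bool:
    """Consume an idea, time and energy; maybe get a paper accepted."""
    if r.inputs["idea"] < 1 or r.time < 4 or r.energy < 5:
        return False  # action not available today
    r.inputs["idea"] -= 1
    r.time -= 4
    r.energy -= 5
    # Outcome depends on a personal attribute plus a six-sided dice roll.
    if r.attributes["writing"] + rng.randint(1, 6) >= 6:
        r.light_bulbs += 10
        return True
    return False


def go_to_pub(r: Researcher, rng: random.Random) -> bool:
    """Spend time chatting with colleagues; maybe come back with an idea."""
    if r.time < 2:
        return False
    r.time -= 2
    if r.attributes["social"] + rng.randint(1, 6) >= 5:
        r.inputs["idea"] += 1
        r.light_bulbs += 1
        return True
    return False


rng = random.Random(0)
me = Researcher()
for action in (write_paper, go_to_pub, go_to_pub):
    print(action.__name__, action(me, rng), me.inputs, me.light_bulbs)
```

Each action checks that the required inputs and daily resources are there, spends them, and then lets attributes plus a dice roll decide the outcome. Balance would come from tuning the made-up numbers.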

So if anybody wants to code this up as a little exercise...

2010-03-28

hsgtd and friends 1: mutt inbox and actions

I've been practising the methodology of Getting Things Done for over 4 years now, but I'm still not very good at it.

I hope to write a small series of postings showing my current GTD state of the art. I hope it will be useful to somebody out there and that I will get some ideas on fine-tuning my approach.

Another hope I have is to reach out to technical people who are resisting "becoming more organised" because of the apparent overhead involved. I hope to demonstrate that you can actually get a lot of mileage out of a handful of shell scripts and simple practices (keeping all your mail in a single folder).

Ingredients

mutt - The appeal here is to have a mail client that is malleable and which can talk to 3rd-party software. So it doesn't necessarily have to be as old school as mutt, just scriptable and capable of playing with others.
hsgtd - a command line GTD tracker written in 351 lines of Haskell. Everything is stored in a simple text file

I also use mairix, xmonad and Unison, but these will likely only be relevant in future postings.

Background

In this first instalment, I would like to talk about how I deal with inbox triage. It's useful to know a little bit of GTD terminology for this.

Inbox - things which are not yet triaged. Practicing GTD is like using an issue tracker; you decouple triage from actions. One priority in GTD is to empty out the inbox by performing triage on all items. Working this way is efficient because you avoid looking at the same item or having the same thought about it (gee, I oughta...) twice. Things go in stages.
Next actions - One of the results from the triage process is a set of "next actions", concrete physical actions like, eg. call Bob 398-0811 to see if he wants that spare external disk drive

I use two different programs: mutt to view my inbox, and hsgtd to view my list of next actions. In this series of posts, I'll be exploring how mutt and hsgtd might talk to each other.

Inbox triage : from email to next actions

The most common source of next actions for me is my email, so it is very important for me to have good integration between my hsgtd list and my email. In particular, one thing I like to be able to do is to read an email, figure out what "next action" to do with it, record that next action, and pin that email to the next action for reference.

To this end, I have a simple shell script and muttrc macro that you can copy from the hsgtd contrib directory. The shell script greps an email from stdin for its message id and reads the command line parameters for the next action text. It combines the two by adding an hsgtd action using the message ID as a project name. Here's the script to show you how simple and stupid it is:

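#!/bin/bash
# Read the email on stdin and turn its Message-ID header into a ":<id>" suffix.
MSGID=$(grep -i '^message-id' | head -n 1 | sed 's/Message-I[Dd]: /:/')
# Add an action made of the command line text plus that suffix (the project name).
hsgtd add "$@" "$MSGID"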

To make this work with mutt, I also have a small macro that lets me call the shell script whenever I'm viewing a message:

macro pager \Ca "|email-add-action"
macro index \Ca "|email-add-action"

Triage example

So how does this get used in practice? Let's say my inbox has a patch to Darcs from Guillaume.

If you saw Merlin Mann's Inbox Zero talk, there are 5 "verbs" you can apply to an inbox item. Let's run through these. Clearly this is not a mail I want to [i] delete, and for a variety of reasons, it's not something I want to [ii] delegate, or to [iii] defer. Let's look at the email in mutt:

I can't [iv] respond yet because I need to take some time out to review the patch, so I need to [v] track an action for this to do later. I hit Control-a in mutt, and type in "@darcs review this". This creates an action in hsgtd. If I later visit hsgtd and type "list" to see the actions available, I will see the email from Guillaume:

By the way, if you're wondering about the "@darcs", the use of an at-sign before a word is an hsgtd convention for contexts. Contexts are a useful way of dividing up actions because they signify certain constraints on where you can perform the actions (typical contexts might be @home, @work). I use @darcs because working on darcs is sometimes something I'll do in one block at a time. If I type "list @darcs" in hsgtd, it will show me only the actions for that context:

Back to the main story. We've now added Guillaume's message to hsgtd. Let's take a closer look at the entry that was created. You see the original action text that we typed in, "@darcs review this". Notice how the context @darcs was helpfully highlighted in yellow. In green you will also see a strange suffix like ":<4ba5fc74.0e0db80a.261d.ffff8b51@mx.google.com>". This is useful for three reasons:

It creates a GTD "project" for that email. Sometimes dealing with an email requires more than one action. In the GTD world, any set of >1 action is considered a project.
[most important] It gives you a means for retrieving the email that goes with this action when you are actually predisposed to do that action.
It allows you to be fairly oblique in your next action texts, you can type in any short string which seems to be meaningful without having to be super-precise about it.

Next up: waiting and review

In this posting, we saw a way of extracting "next actions" from your mutt inbox and storing them in an hsgtd list. In a future posting, I hope to expand on this by exploring delegation (asking somebody else to act) and review (going over your actions and delegated items). Actually, the review was what initially motivated this blog posting. I'd finally worked out how to create a virtual mailbox of my hsgtd-tracked items and wanted to show it off. But that will have to wait as this post is long enough as it is.

2010-03-20

darcs team at ZuriHac

Just a quick photo showing what happens when you give a bunch of Darcs hackers a flipchart and a marker pen...

(With thanks to David Anderson for gamely taking this photo for our collective memory)

This was the result of a lively discussion on the future darcs rebase feature, which will make maintaining long-term branches in Darcs a lot easier. Perhaps it'll be ready in early 2011. We'll be sure to take our time to get this right...

2010-01-21

heapgraph tool

Here is a small program to help draw diagrams of heap graphs.

You feed it (via stdin) a text file written in a silly little language, for example:

graph g0
node n0 (closure "double" (closure "(*)" "5" "4"))
 
graph g1
node n0 (closure "(+)" n1 n1)
node n1 (closure "(*)" "5" "4")
 
graph g2
node n0 (closure "(+)" n1 n1)
node n1 "20"
 
graph g3
node n0 "40"

...pipe the results through Graphviz

./heapgraph < example | dot -T pdf -o example.pdf

...and what you get back is a little series of graphs like the following:

I never worked out how to tell graphviz to draw the subgraphs from top to bottom instead of left to right. Help would be appreciated :-)

The context is that I'm in the process of reading Rabhi and Lapalme's Algorithms: A Functional Programming Approach. One of its introductory chapters has an explanation of graph reduction. It occurred to me that I ought to write lots of little graphs and just walk through them. The general idea is that maybe one of the impediments to my understanding Haskell laziness/strictness was sheer impatience, that I was being far too motivated to make my programs go faster. I'm hoping that a slower and more methodical approach will work, for example, starting by making sure I understand basic ideas like a heap first.

Perhaps such a tool will be useful for you if you are in a similar position, or if you happen to be teaching this sort of stuff.
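
The four graphs in the example correspond to the reduction of double (5 * 4). As a small companion sketch (my own illustration, not something heapgraph produces), Debug.Trace can make the sharing of node n1 visible. The names double and result are just for this example.

```haskell
import Debug.Trace (trace)

-- | Same shape as the "double" closure in graph g0.
double :: Int -> Int
double x = x + x

-- | The argument is a single thunk (node n1 in graph g1) shared by both
--   operands of (+); the trace message marks the moment it is reduced.
result :: Int
result = double (trace "reducing 5 * 4" (5 * 4))
```

Evaluating result in GHCi should give 40, and I would expect the trace message to show up only once, since both operands of (+) point at the same node.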

2009-10-08

darcs hashed-storage work merged (woo!)

The following is a copy of my recent post to the darcs-users mailing list.

Hi everybody,

So you may have noticed me saying this in a couple of recent threads. Petr Ročkai's hashed-storage work from his 2009 Google Summer of Code project has been merged!

I thought I would take a few moments to give everybody an overview of how this work benefits us, and where we'll be going in the future.

In a nutshell

What does this mean for you? Faster repository-local operations.

Hashed format repositories (with darcs-1 and darcs-2 patches alike) should now be faster to use on a daily basis. We saw the very beginnings of this work in Darcs 2.3.0 with a faster darcs whatsnew. Now these speed improvements cover all repository-local operations.

The next Darcs beta is a couple of months away, but before that, I would like to encourage you to try this out for yourself:

darcs get --lazy http://darcs.net
cd darcs.net
cabal install

For best results, please run darcs optimize --upgrade followed by darcs optimize --pristine. Pay attention over the next couple of weeks when you try a record, amend, revert, unrecord. If we've done our work right, there should be nothing to see. Darcs should be less noticeable, with fewer "Synchronizing pristine" messages and a faster return to the command prompt. We think you'll like it. But please get back to us. Is Darcs faster for you?

If you're particularly interested, I will step through these changes in greater detail at the end of this message. Meanwhile, I would like to step back a little and take stock of how these improvements fit in to the bigger picture.

The road ahead

The hashed storage work is a big step forward and definitely a cause for celebration. I think it is useful to reflect on this progress and consider how it fits in with our progress since darcs 1.0.9:

ssh connection sharing (darcs transfer mode)
HTTP pipelining
lazy repositories
the global cache

and now

index-based diffing
hashed-storage efficiency

We cannot promise that Darcs will magically become fast overnight. But what we can and will do is continue chipping away at it, solving problems one at a time; release by release, a little bit better, a little bit faster every time until one day we can look back and marvel at all the progress we've made.

So Petr's work makes Darcs easier to live with on a day-to-day basis. But that's not enough. Now we need to turn our attention to that crucial first impression; what happens when people try Darcs out for the first time is that they darcs get a repository they want and... then... they... wait...

This is embarrassing, but we can fix it. In fact, we already have started working on the problem. The next version of hashed-storage will likely introduce a notion of "packs" in which the many often very small files that Darcs keeps track of will be concatenated into more substantial "packs" that compress better and reduce the ill effects of latency. My hope is that we will be able to complete the packs work by Darcs 2.5.

There's a lot more progress to be made: smarter patch representations, tuning for large patches, file-to-patch caching for long histories. And that's just performance! For more details about our performance work, please have a look at

http://tinyurl.com/darcs-performance2

If you could do anything to help, benchmark, profile, anything at all, please let us know :-)

The fight continues.

Thank-you!

Petr and Ganesh deserve a huge round of applause. Petr, thanks for thinking up this work, getting it done and pushing it through. Ganesh, thanks for an extremely thorough and thoughtful review. The two of you, thanks for holding on, for tenacious cooperation in the face of adversity.

Thanks also to all the wider Darcs community for all your support, comments, patch reviews.

I'm looking forward to seeing you at the upcoming Darcs hacking sprint. The sprint will take place in Vienna, Austria on the weekend of 14-15 November. Everybody, especially Darcs and Haskell newbies, is welcome to join in. Details on http://wiki.darcs.net/Sprints/2009-11

And if I may take a paragraph to mention this, Darcs needs your support. Every little counts, if you can send patches, review patches, tweak documentation, profile, benchmark, submit bug reports. Barring that, you could also make a contribution to our travel fund via the Software Freedom Conservancy. See http://darcs.net/donations.html for details.

Thanks everybody and enjoy!

Eric

Changes in detail

Darcs uses an "index" file to compute working directory and pristine cache diffs. This avoids timestamps going out of synch when you have multiple local branches, which saves a huge and needless slowdown.
Hashed storage is more efficient in general. Even if you already have perfect timestamps, the new optimisations should make Darcs faster in general.
The new 'darcs optimize --pristine' reduces spurious mismatches on directories.
Darcs no longer requires a one second sleep after applying patches.

2009-09-11

cabal installing graphical apps on MacOS X

I have a graphical command line tool written in wxHaskell. For the longest time, my tool was relatively easy to install on Linux but a pain on MacOS X because my users had to jump through extra post-installation hoops like creating application bundles.

Thanks to some very patient help from Beelsebob, quicksilver and dcoutts on #haskell, I was finally able to cobble together a Setup.hs file that takes care of these steps for me. Now when I write install instructions for my program, I no longer need to add extra bullet points telling people to turn knobs and twiggle blops just to run the GUI. It just works.

Note that this was written with wxHaskell in mind. I hope that folks using gtk2hs and qtHaskell either do not have this problem or can make use of a similar solution.

desiderata

What I wanted was for the 'cabal install' command to work as well on MacOS X as it did under Linux. My core desiderata were:

Ability to call my application from the command line the same way you would under Linux with command line arguments correctly recognised
No need for the user to add extra junk to the path (besides $HOME/.cabal/bin which they'll already have added)
No manual intervention after cabal install (eg calling scripts to create application bundles)
No need to be super-user.

basic ideas

The basic ideas behind this solution are

Replace "foo" with a shell script that calls "foo.app/Contents/MacOS/foo"
MacOS X Leopard seems to want graphical applications to live in application bundles. At least for wxHaskell if you invoke "foo" you get a GUI that does not respond to input. On the other hand, if you invoke "foo.app/Contents/MacOS/foo" you get something that works.
Use a Cabal postInst to create the application bundle in the bin dir.

basic solution

Here is the solution. (I'll send it as a mail to the wxhaskell-users mailing list too)

-- --------------- BEGIN Setup.hs EXAMPLE ------------------------------
import Control.Monad (foldM_, forM_)
import Data.Maybe ( fromMaybe )
import System.Cmd
import System.Exit
import System.Info (os)
import System.FilePath
import System.Directory ( doesFileExist, copyFile, removeFile, createDirectoryIfMissing )

import Distribution.PackageDescription
import Distribution.Simple.Setup
import Distribution.Simple
import Distribution.Simple.LocalBuildInfo

main :: IO ()
main = defaultMainWithHooks $ addMacHook simpleUserHooks
 where
  addMacHook h =
   case os of
    "darwin" -> h { postInst = appBundleHook } -- is it OK to treat darwin as synonymous with MacOS X?
    _        -> h

appBundleHook :: Args -> InstallFlags -> PackageDescription -> LocalBuildInfo -> IO ()
appBundleHook _ _ pkg localb =
 forM_ exes $ \app ->
   do createAppBundle theBindir (buildDir localb </> app </> app)
      customiseAppBundle (appBundlePath theBindir app) app
        `catch` \err -> putStrLn $ "Warning: could not customise bundle for " ++ app ++ ": " ++ show err
      removeFile (theBindir </> app)
      createAppBundleWrapper theBindir app
 where
  theBindir = bindir $ absoluteInstallDirs pkg localb NoCopyDest
  exes = fromMaybe (map exeName $ executables pkg) mRestrictTo

-- ----------------------------------------------------------------------
-- helper code for application bundles
-- ----------------------------------------------------------------------

-- | 'createAppBundle' @d p@ - creates an application bundle in @d@
--   for program @p@, assuming that @d@ already exists and is a directory.
--   Note that only the filename part of @p@ is used.
createAppBundle :: FilePath -> FilePath -> IO ()
createAppBundle dir p =
 do createDirectoryIfMissing False $ bundle
    createDirectoryIfMissing True  $ bundleBin
    createDirectoryIfMissing True  $ bundleRsrc
    copyFile p (bundleBin </> takeFileName p)
 where
  bundle     = appBundlePath dir p
  bundleBin  = bundle </> "Contents/MacOS"
  bundleRsrc = bundle </> "Contents/Resources"

-- | 'createAppBundleWrapper' @d p@ - creates a script in @d@ that calls
--   @p@ from the application bundle @d </> takeFileName p <.> "app"@
createAppBundleWrapper :: FilePath -> FilePath -> IO ()
createAppBundleWrapper bindir p =
  writeFile (bindir </> takeFileName p) scriptTxt
 where
  scriptTxt = "`dirname $0`" </> appBundlePath "." p </> "Contents/MacOS" </> takeFileName p ++ " \"$@\""

appBundlePath :: FilePath -> FilePath -> FilePath
appBundlePath dir p = dir </> takeFileName p <.> "app"

-- optional stuff: to be discussed later
mRestrictTo = Nothing
customiseAppBundle _ _ = return ()
-- --------------- END Setup.hs EXAMPLE ---------------------------------
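
One thing to watch: writeFile on its own does not mark the new wrapper as executable. Below is a small sketch of how that could be handled, assuming a version of the directory package that provides setOwnerExecutable. The names wrapperScript and makeExecutable are hypothetical, and the "#!/bin/sh" line is my own addition rather than part of the script above.

```haskell
import System.Directory (getPermissions, setOwnerExecutable, setPermissions)
import System.FilePath ((<.>), (</>), takeFileName)

-- | Text of the wrapper script for executable @p@ (hypothetical helper).
--   Only the filename part of @p@ is used, as in createAppBundleWrapper.
wrapperScript :: FilePath -> String
wrapperScript p = unlines
  [ "#!/bin/sh"                                -- assumption: not in the original
  , "`dirname $0`/" ++ bundleExe ++ " \"$@\""  -- same call as scriptTxt
  ]
 where
  name      = takeFileName p
  bundleExe = (name <.> "app") </> "Contents" </> "MacOS" </> name

-- | Set the owner's executable bit on an already written file.
makeExecutable :: FilePath -> IO ()
makeExecutable path =
 do perms <- getPermissions path
    setPermissions path (setOwnerExecutable True perms)
```

The idea would be to write wrapperScript app to the bin dir and then call makeExecutable on the resulting file.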

fancier solution

I also have some extra wishlist items.

Possibility of installing in --global
Fancy custom app bundles with custom icons and what not

Global installation might already be working with this basic script, but I haven't tested it yet. Fancy app bundles sort of work (if I double-click it in Finder, I get a customised icon, but running it from the command line does not give me one).

Here are extra hooks I created for this:

-- ------------- BEGIN FANCY Setup.hs ADDENDUM ------------------------
-- | Put here IO actions needed to add any fancy things (eg icons)
--   you want to your application bundle.
customiseAppBundle :: FilePath -- ^ app bundle path
                   -> FilePath -- ^ full path to original binary
                   -> IO ()
customiseAppBundle bundleDir p =
 case takeFileName p of
  "geni" ->
    do hasRez <- doesFileExist "/Developer/Tools/Rez"
       if hasRez
          then do -- set the icon
                  copyFile "etc/macstuff/Info.plist" (bundleDir </> "Contents/Info.plist")
                  copyFile "etc/macstuff/wxmac.icns" (bundleDir </> "Contents/Resources/wxmac.icns")
                  -- no idea what this does
                  system ("/Developer/Tools/Rez -t APPL Carbon.r -o " ++ bundleDir </> "Contents/MacOS/geni")
                  writeFile (bundleDir </> "PkgInfo") "APPL????"
                  -- tell Finder about the icon
                  system ("/Developer/Tools/SetFile -a C " ++ bundleDir </> "Contents")
                  return ()
          else putStrLn "Developer Tools not found.  Too bad; no fancy icons for you."
  _      -> return () -- nothing to customise for any other executable

-- | Put here the list of executables which contain a GUI.  If they all
--   contain a GUI (or you don't really care that much), just put Nothing
mRestrictTo :: Maybe [String]
mRestrictTo = Just ["geni"]
-- ------------- END FANCY Setup.hs ADDENDUM ---------------------------

2009-07-29

vim and building with cabal

I don't know about you, but I've got map ,m :make<Enter> in my .vimrc to bind comma-m to my build program. This could be "ant" for Java files (for example) and "make" otherwise.

Now here is a snippet to set it to "cabal build" as needed

"-----------------------8<--------------------------
function! SetToCabalBuild()
  if glob("*.cabal") != ''
    set makeprg=cabal\ build
  endif
endfunction

autocmd BufEnter *.hs,*.lhs :call SetToCabalBuild()
"-----------------------8<--------------------------

Apologies for making noise in case this is already redundant with a piece of Claus Reinke's very interesting and modular-looking Haskell mode for Vim (which I've been promising myself to install some day). Perhaps the above will be useful anyway for those of us still limping along with configuration files cobbled together from bits and bobs on the web.

2009-07-28

some ideas for practical QuickCheck

I think I've found some answers to my practical QuickCheck questions. This post may be fairly long as I'm trying to make it concrete and explicit enough to overcome the kind of inertia I had when I was still resisting testing.

How do I make my tests easy to run?

1. Use test-framework

The key thing to know about test-framework is that it is very easy to get started. Just visit the friendly web page and copy the example.

Note: An earlier post suggested the testrunner package developed for Darcs, but at the time we didn't realise that test-framework already had all the features needed.

2. Support cabal test

Here's a Setup.hs recipe I copied. The handy property of this code is that it runs your tests straight from your dist/build directory.

-- EXAMPLE Setup.hs FILE 1 -----------------------------------------------
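import Distribution.PackageDescription ( PackageDescription )
import Distribution.Simple
import Distribution.Simple.LocalBuildInfo ( LocalBuildInfo, buildDir )
import System.Cmd ( system )
import System.FilePath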

main = defaultMainWithHooks hooks
  where hooks = simpleUserHooks { runTests = runTests' }

runTests' :: Args -> Bool -> PackageDescription -> LocalBuildInfo -> IO ()
runTests' _ _ _ lbi = system testprog >> return ()
  where testprog = (buildDir lbi) </> "test" </> "test"
-- -----------------------------------------------------------------------

The code snippet for your Setup.hs file comes from Greg Bacon's Setting up a Simple Test with Cabal (I tacked on an import). As you can see, the recipe assumes you're building an executable called "test" (see Greg's post on how to do this)

3. Bake your unit tests in

This may go down as the kind of bad advice that "seemed like a good idea at the time". For now, I can justify this by saying that it may be reassuring to users to be able to just run the same tests that I'm running and see for themselves that their program thinks it's working.

I've been working on a program called GenI. To help people test this program, I've added a simple "--tests" switch. Now people can run geni --tests for a self check. If they want, they can also "cabal test", using this slight modification to Greg's setup file (to call geni itself and to pass the --tests flag in).

-- EXAMPLE Setup.hs FILE 2 -----------------------------------------------

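import Distribution.PackageDescription ( PackageDescription )
import Distribution.Simple
import Distribution.Simple.LocalBuildInfo ( LocalBuildInfo, buildDir )
import System.Cmd ( system )
import System.FilePath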

main = defaultMainWithHooks hooks
  where hooks = simpleUserHooks { runTests = runTests' }

runTests' :: Args -> Bool -> PackageDescription -> LocalBuildInfo -> IO ()
runTests' _ _ _ lbi = system testprog >> return ()
  where testprog = (buildDir lbi) </> "geni" </> "geni --tests"

-- -----------------------------------------------------------------------

As for GenI, whenever I see --tests in my arguments (for example "--tests" `elem` args), I just pass control to another module, which in turn strips the switch out and passes the rest of the arguments to test-framework.

-- EXAMPLE TEST-FRAMEWORK WRAPPER ------------------------------------------
module NLP.GenI.Test where

import System.Environment ( getArgs )
import Test.Framework

import NLP.GenI.GeniVal ( testSuite )
import NLP.GenI.Tags ( testSuite )
import NLP.GenI.Simple.SimpleBuilder ( testSuite )

runTests :: IO ()
runTests =
 do args <- filter (/= "--tests") `fmap` getArgs
    flip defaultMainWithArgs args
     [ NLP.GenI.GeniVal.testSuite
     , NLP.GenI.Tags.testSuite
     , NLP.GenI.Simple.SimpleBuilder.testSuite
     ]
-- -----------------------------------------------------------------------

There are some other things going on in this file, notably the organisation of test suites. More on that later.
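
For completeness, the dispatch on the --tests switch could look something like the sketch below. This is not GenI's actual main; Mode and chooseMode are names invented for the illustration.

```haskell
-- | What the program should do, given its command line arguments.
--   Hypothetical type, for illustration only.
data Mode
  = RunTests [String]     -- ^ remaining arguments, for test-framework
  | RunNormally [String]  -- ^ arguments for the program proper
  deriving (Eq, Show)

-- | Look for the --tests switch and strip it out if present.
chooseMode :: [String] -> Mode
chooseMode args
  | "--tests" `elem` args = RunTests (filter (/= "--tests") args)
  | otherwise             = RunNormally args
```

A main function would then call getArgs, pass the result to chooseMode, and hand control to the test wrapper in the RunTests case.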

Where should I put my properties?

4. Put tests in the same module (where relevant)

If a test is specific to one module, I tend to put it in that same source file. I do this because

It lets me test functions that I don't want to export
The tests serve as documentation
It forces me to update my tests along with my code

This approach is in contrast to (a) having one big tests module and (b) having a separate test hierarchy. It may turn out to be useful to have a single big tests module as well, for example, for tests that cross the boundary from one module to the next. That need has not arisen for me yet. Likewise, I don't particularly believe in a separation between tests and code, although on the other hand some very experienced hackers seem to do so, so I'll just have to let experience teach me why.

How do I avoid repeating myself?

5. Provide a testSuite function for each module

Commenting on my last post, Josef kindly pointed out that the book-keeping I feared isn't so bad in practice. He's right. Nevertheless, I want to avoid it. To do this, I make each of my modules export a testSuite function. Here is what one of my modules looks like, just focusing on the test suite

-- EXAMPLE MODULE --------------------------------------------------------
module NLP.GenI.GeniVal where

-- SKIPPED MAIN IMPORTS ...

import Test.Framework
import Test.Framework.Providers.HUnit
import Test.Framework.Providers.QuickCheck
import Test.QuickCheck
import Test.HUnit

-- SKIPPED MAIN CODE

testSuite = testGroup "unification"
 [ testProperty "self" prop_unify_self
 , testProperty "anonymous variables" prop_unify_anon
 , testProperty "symmetry" prop_unify_sym
 , testCase "evil unification" test_evil
 ]

-- SKIPPED THE TESTS THEMSELVES
-- -----------------------------------------------------------------------

If you'll scroll up to the example that's marked TEST-FRAMEWORK WRAPPER, you'll see how these test suites are used in practice. Note the small trick of using the qualified module name to identify the test suite.

Anyway, the general principle of having a per-module test suite comes from Aidan Delaney's Organising Unit Tests in Haskell. The main difference between his approach and mine is that I mix tests and code rather liberally.

Conclusion

I hope that some of these hints will make testing easier for you, or perhaps even get you started. If you still find yourself putting testing off, let me know. I'll be curious to see what else makes us resist. One thing that would probably be helpful is an extra guide to writing Arbitrary instances for QuickCheck, and also writing good properties that control the space well. Maybe even getting started with SmallCheck.

Note that I am still somewhat new to testing and have only recently started these practices. So take these ideas with the usual salt. Thanks to Greg, Reinier, Aidan, and also folks who commented on my previous posts.

2009-06-24

Haskell syntax highlighting on Wikipedia and Wikibooks

If you edit the Haskell Wikibook and Wikipedia entries with Haskell in them, you may be interested to note that Haskell syntax highlighting is now available on all Wikimedia projects.

Example:

<source lang="haskell">
-- foo
let x = foo
</source>

2009-06-08

testrunner for practical quickcheck

I had mentioned in a previous post three practical problems I had getting started with QuickCheck. My third question in that post was:

How do I make my tests easy to run? Do I have to write my own RunTests module? Should I just use something like quickcheck-script?

And one of the replies I got:

I'm sure people are writing tests, but we all hack up harnesses in our own idiosyncratic ways.... -- blackdog

Maybe we can do better. Instead of everybody hacking up their own harness, how about having one test harness that everybody wants to use? We may even have a candidate for such a harness. Reinier Lamers has recently released a "testrunner" package which supports some rather nice features:

It can run unit tests in parallel.
It can run QuickCheck and HUnit tests as well as simple boolean expressions.
It comes with a ready-made main function for your unit test executable.
This main function recognizes command-line arguments to select tests by name and replay QuickCheck tests.

That's all really good stuff, but I think the number one best feature for me would be the little tutorial on its homepage.

Testrunner is work that Reinier started in the context of the darcs project. We were trying to make our own custom test suite faster and more useful. Looking ahead, Reinier did this not just by tweaking and tuning the harness we had, but by writing a more general-purpose harness that does the things we wanted and that other projects will hopefully want as well. So do you have a Haskell project that needs testing? Or maybe you are already doing some tests, but you just wish you could squeeze a little more out of them? Give testrunner a try!

Edit 2009-06-08 17:15
It turns out there is a second candidate, or rather a first candidate since test-framework has been around for months. Embarrassingly enough, I had started to use test-framework for my own stuff, but I never realised how feature complete it was. Maybe it'll be time to merge projects? I'll see what Reinier thinks. Apologies to Max...

2009-02-26

inkscape layers

Here's a small program that I wrote to extract a subset of layers from an Inkscape file. It may be handy if you have to give a talk and you want to include some "animated" overlays in your slides.

I'm writing this post because I'm pleased to be able to automate this process at last. Also, I want to demonstrate that you don't have to be particularly clever or ambitious to get some good practical use out of Haskell.

usage

So I've got my Inkscape file with a "base" layer and several steps of my animation "zero", "one", "two", "three".

If I do "inkscape-layers myfile.svg base > /tmp/foo.svg && inkscape --export-pdf=/tmp/foo.pdf /tmp/foo.svg", I get just the base layer, which isn't very interesting:

Now if I do inkscape-layers myfile.svg base zero (and convert the resulting SVG into a PDF as above), I get the zeroth layer:

Likewise, to build the rest of my animation, inkscape-layers myfile.svg base one

inkscape-layers myfile.svg base two

Now instead of going clickity-click all over the place, I just dump this in my Makefile. If I ever have to change something about my animation (for example, in the base layer), I just run "make" and rebuild it automatically.

Yay, Haskell! Well, I'm sure you could just as easily have written this in your favourite programming language; I just like to randomly credit Haskell for making my life easier :-D

the code

I may upload this to Hackage if I can bundle some other useful Inkscape tools with it:

import Data.Maybe (fromMaybe)
import System.Environment (getArgs, getProgName)
import System.IO (hPutStrLn, stdout, stderr)
import Text.XML.Light

main =
 do args  <- getArgs
    pname <- getProgName
    case args of
      (f:ls) -> go f ls
      _      -> hPutStrLn stderr $ unwords [ "Usage:", pname, "filename", "layer1", "[layer2 [.. layer N]]" ]

go f ls =
 do d <- goodXML =<< parseXMLDoc `fmap` readFile f
    let o = stdout -- we may want to make this more flexible later
    hPutStrLn o . showTopElement . wrapTop walk $ d
 where
  goodXML = maybe (fail "bad XML") return
  --
  walk x@(Elem el) =
   let lbl = fromMaybe "" (findAttr qLABEL el)
       x2  = Elem $ el { elContent = map walk (elContent el) }
   in case () of _ | not (isLayer el) -> x2
                   | lbl `elem` ls    -> x2
                   | otherwise        -> Text blank_cdata
  walk x = x

isLayer el = elName el == qSVG "g" && findAttr qGROUP_MODE el == Just "layer"

qLABEL      = qInkscape "label"
qGROUP_MODE = qInkscape "groupmode"

qSVG l = QName l (Just nsSVG) Nothing
nsSVG = "http://www.w3.org/2000/svg"

qInkscape l = QName l (Just nsINKSCAPE) Nothing
nsINKSCAPE="http://www.inkscape.org/namespaces/inkscape"

wrapTop f e =
 case f (Elem e) of
 (Elem e) -> e
 _ -> error "programmer error: top content is not an element"

Note: as an exercise: modify the attributes of all exported layers so that they are visible. In Inkscape, I tend to make layers invisible so I don't get confused by them. But then Inkscape does not export them, which is annoying. This seems to be a simple matter of replacing "display:none" with "display:inline" in the style attribute (watch out, there could be more than one!). The 'split' library on Hackage could be handy for that.
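As a starting point for the exercise, here is a minimal sketch of the string-rewriting half only. It assumes the style attribute is a semicolon-separated list of declarations, and it uses a small hand-rolled splitter from base rather than the 'split' library; the names splitOn and makeVisible are made up for illustration and are not part of the program above.

```haskell
import Data.List (intercalate)

-- | Split a string on a separator character.
-- (Hand-rolled stand-in for what the 'split' library would provide.)
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Replace every "display:none" declaration in a style string with
-- "display:inline", leaving all other declarations untouched.
makeVisible :: String -> String
makeVisible = intercalate ";" . map fixDecl . splitOn ';'
 where
  fixDecl decl
    -- ignore spaces so that "display: none" is caught as well
    | filter (/= ' ') decl == "display:none" = "display:inline"
    | otherwise                              = decl
```

For instance, makeVisible "fill:red;display:none" gives "fill:red;display:inline". Hooking this into walk would presumably mean applying it to the attrVal of each style attribute in the elAttribs of a kept layer; that part is left as the actual exercise.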

2009-02-21

implementing join in terms of (>>=)

One of the things I got out of the Typeclassopedia is a somewhat more mature understanding of monads (at last!). As a bonus side-effect it has also given me a slightly better understanding of myself. Specifically, I learned I often have trouble learning things because I suffer from a sort of "failure to unify". I thought I might make a note of it for the benefit of anybody else who is interested in how they learn... or not, as the case may be.

So,

we have (>>=) :: m a -> (a -> m b) -> m b
we want join :: m (m x) -> m x

My mind drew a complete blank. So I went with something "direct" via do notation:

join mmx =
 do mx <- mmx
    x  <- mx
    return x

Those last two lines are redundant:

join mmx =
 do mx <- mmx
    mx

Hang on, Eric, surely you don't need the crutch of do notation...

join mmx = mmx >>= (\mx -> mx)

That's just id:

join mmx = mmx >>= id

But wait! Surely that can't be right! Doesn't (>>=) require something of type a -> m b? And isn't id giving me m x -> m x? I stared at that for a while, almost panicking. What did I do wrong? And then it clicked. Of course, the a in a -> m b could stand in for any type, including m x. Just because it doesn't have a little m in it, doesn't mean that it's constrained not to have one.

A simpler version of this kind of error, although one that didn't get me this time: just because we have a and b doesn't mean we actually have to have two different types. They can be different, but don't need to be. And that is my "failure to unify": inventing completely illusory constraints and not seeing through them.

And so join is just (>>= id). It took a little struggle, but it was well worth it!
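The steps above can be checked by the compiler. The sketch below uses made-up names (joinDo, bindAtJoin, joinId) so nothing clashes with Control.Monad.join. Dropping the last two lines of the do block is the monad right-identity law, mx >>= return = mx, and bindAtJoin is just (>>=) with its type written out for a := m x and b := x, which is the unification that was so hard to see.

```haskell
-- The "direct" version via do notation.
joinDo :: Monad m => m (m x) -> m x
joinDo mmx =
 do mx <- mmx
    x  <- mx
    return x

-- (>>=) :: m a -> (a -> m b) -> m b, specialised by choosing
-- a := m x and b := x. The compiler accepts this signature,
-- so id :: m x -> m x is a perfectly good second argument.
bindAtJoin :: Monad m => m (m x) -> (m x -> m x) -> m x
bindAtJoin = (>>=)

-- The final definition.
joinId :: Monad m => m (m x) -> m x
joinId = (>>= id)
```

For example, joinId [[1,2],[3]] gives [1,2,3] and joinId (Just (Just 'a')) gives Just 'a', the same as joinDo on the same inputs.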

(PS, in my original attempt, I used the more conventional m (m a) when thinking of the types instead of what I reported here, m (m x). The reason I reported the latter is because I didn't want to confuse the discussion with another stumbling block I have, which is a "failure to rename", i.e. forgetting that two things called a in different contexts are actually two separate things. It's like speaking a foreign language. Just because you are aware that you have to do something, doesn't mean you will always do it automatically. Anyway, the "failure to rename" may very likely have conspired with the "failure to unify" in making me confused for a while.)

2009-02-16

announcing: burrito tutorial support group

It's really for the best if you leave these sorts of things out in the open.

The first step is to ask for forgiveness, right?

2009-02-04

practical quickcheck (wanted)

Despite all the glowing reports on how useful QuickCheck is, I find that I still have a lot of resistance to using it. A lot of resistance comes from uncertainty, so in this post, I'm going to write down some of my half-formulated questions about using QuickCheck.

Now, there may not be any right answer to these questions, but I'm writing them down anyway so that other people in my shoes know that they are not alone. Later on, as I find the answers that work for me, I'll hopefully put together some notes on 'Practical QuickCheck'.

Where should I put my properties? Xmonad and darcs seem to put them in a single properties module, but it would seem more natural to me to stick them in the same module as the functions I'm quickchecking. That said, I imagine that some properties can be thought of as being cross-module, so maybe a properties module would make sense.
How do I avoid redundancy, and generally repeating myself? Ideally, I would just write a property and be done with it. It would annoy me to have to keep updating some list of properties somewhere else (duplication). That said, maybe it's not really duplication if the list serves a secondary purpose of grouping the properties into some sensible hierarchy. Maybe the real question is "how do I make sure I don't forget to run all my properties?"
How do I make my tests easy to run? Do I have to write my own RunTests module? Should I just use something like quickcheck-script?

I might update this list later as I think of more "best practices" questions. Hopefully I can follow this up with a short article teaching myself and others that really getting started with QuickCheck is easy easy easy (or maybe a link to a pre-existing article of the sort). The Real World Haskell chapter on it seems helpful.

2009-01-30

haskell-ji

As a programmer, I find myself struggling with a lot of really mundane and stupid-looking issues like "how should I name my variables", or "should acronyms be kept upper case (XML), or smooshed down for easier CamelCasing (Xml)?" and finally "what order should my code go in?"

These questions do not so much keep me up at night, but cause me an inordinate amount of flip-flopping in my code. Not remembering my preference du jour, I'll sometimes do things four different ways in code and later on suffer because I forgot that in one bit of code, I had named something parseXML and in the other bit, I had named it xmlParse.

The good news is that things are settling down on at least one front. It seems that all the versions of Eric past and present are settling on a consensus on How To Lay Code Out. The result is a set of directional tips, akin to the kind of thing you learn when you are writing Chinese Hanzi (Japanese Kanji):

Types before code
High-level before low-level -- For example, generally using where instead of let...in, but also "higher-level" functions first, "detail" functions later
Input before output -- It's not that this was ever up for debate, it's just that sometimes, I'll write it the other way without realising that I'm doing it.
Odds and ends last -- At the very end of my code: an odds-and-ends section for all those little snippets of code you copy around but that are too small to justify making a library, e.g.
```
import Data.Function (on)
import Data.List (groupBy, sortBy)

-- group items by key; head is safe because groupBy never yields an empty group
buckets :: Ord b => (a -> b) -> [a] -> [ (b,[a]) ]
buckets f = map (\xs -> (f (head xs), xs))
        . groupBy ((==) `on` f)
        . sortBy (compare `on` f)
```
Do you have an odds-and-ends.hs file on your computer?

Notice that the tips are not always compatible with each other, but they do sort of point in the same general direction.
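To make the tips concrete, here is a small made-up file laid out that way. Everything in it apart from buckets (Tally, report, tally, render) is hypothetical, and reading "input before output" as "the reading side goes above the writing side" is only one interpretation.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)

-- Types before code
type Tally = [(Int, [String])]  -- (word length, words of that length)

-- High-level before low-level: the overall pipeline comes first
report :: String -> String
report = render . tally

-- Input before output: the reading side ...
tally :: String -> Tally
tally = buckets length . words

-- ... then the writing side, with its detail tucked into a where
render :: Tally -> String
render = unlines . map line
 where
  line (n, ws) = show n ++ ": " ++ unwords ws

-- Odds and ends last
buckets :: Ord b => (a -> b) -> [a] -> [ (b,[a]) ]
buckets f = map (\xs -> (f (head xs), xs))
        . groupBy ((==) `on` f)
        . sortBy (compare `on` f)
```

Here report "a bb c dd eee" produces the three lines "1: a c", "2: bb dd" and "3: eee".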

Phew, I'm glad I'm starting to get at least this bit sorted. I really hope it reduces the amount of pointless erician flip-flopping. It's no big deal -- civilisation does not collapse because of inconsistent case conventions -- but it is a nuisance. This kind of thing is on the order of silly American-style dates vs. European-style dates causing confusion, where we could all just be using International yyyy-mm-dd dates, and while we're at it, 24 hour time, the metric system and A4 paper...
