koweycode

Hobby-hacking Eric

2013-03-17

moved to erickow.com

I've migrated my various blogs to erickow.com.  Readers of this blog may be more interested in a limited set of posts, for example those tagged haskell or darcs, and may want to point their RSS readers at those instead of the unified whole.  (That said, I don't seem to blog very much these days anyway and tend to use G+ for my techie commentary.)

Also if you have a Wordpress and/or Blogger blog you'd like to import to hakyll, you may be interested in my hakyll-convert utility. 


2011-05-03

untangling a cabal install problem

I sometimes have trouble translating abstract general explanations to my particular concrete cases.  I hope that by sharing a very concrete situation I experienced, other users may recognise themselves and get unstuck on their own problems.

I finally untangled a cabal install problem that's been bugging me for some time, one that almost drove me to use cabal-dev on all my packages (which seems like it might be a bit inconvenient).

So I have a fairly standard setup (at least, it was standard when I wrote this post): GHC 6.12.3 with the latest released Haskell Platform.  I'm working on two packages, GenI and nltg-hillwalking, simultaneously.  Switching from one to the other is painful.  When I try to install GenI, typing "cabal install" results in this horribly disheartening sequence, where it installs random, haskell98, cpphs, haskell-src-exts, derive and finally GenI.  If I then switch back to working on hillwalking, I get another discouraging sequence involving random (again!), QuickCheck, test-framework and nltg-hillwalking.  And going back to working on GenI, I go through the same pain again.

It took me a while to work out that the problem was just the interaction between these two packages.  Having had a chance to chat about this with Duncan and Ian, I got a bit of a clue about what the problem might be.  Indeed, when I ran  "cabal install --dry-run -v2", this little bit of output caught my eye:

In order, the following would be installed:
random-1.0.0.3 (reinstall) changes: time-1.1.4 -> 1.1.2.4
haskell98-1.0.1.1 (reinstall)
cpphs-1.11 (new package)
haskell-src-exts-1.10.2 (new package)
derive-2.4.2 (new package)
GenI-0.21 (new package)

See that little arrow?  It says that random, the cause of all my heartache, is being reinstalled because it wants to depend on an older version of time.  Why on earth would it want to do that?  ... Oh, because I told it to.  Apparently, some past version of myself decided to put this dependency in GenI.cabal: time == 1.1.2.4

Oops!

I think the problem looks like this.  GenI uses the derive package, which triggers a chain of dependencies all the way down to random and time.  Unfortunately, GenI also directly depends on time, and now we have an issue.  I'm not entirely clear on why this causes a recompile as opposed to the more usual "this will likely cause an error" output (maybe the latter is only appropriate for direct dependencies, i.e. if derive depended on time itself?).


By forcing GenI to use this old version of time, I was indirectly forcing it to install a version of  the random package that depends on this old version.  In doing so, I would clobber the version of the random package that QuickCheck uses.

Fixing the issue in GenI was relatively straightforward.  Did I really need to be using such a constrained version of time?  It turns out that time == 1.1.* works perfectly fine (taking advantage of the PVP promise of backwards compatibility across all A.B.* versions of a package).  Just one little dependency change, and everything works a lot more smoothly.
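In GenI.cabal terms, the fix was a one-line change to the version constraint; the surrounding build-depends entries below are invented for illustration:

```cabal
build-depends:
    base >= 4,
    derive,
    -- was: time == 1.1.2.4  (pinned so tightly it forced the reinstalls)
    time == 1.1.*
```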

So what did I learn from this?
  1. take a deep breath - when I'm faced with these issues, I'm usually feeling really impatient to get on with my work.  But solving the issue involves recognising some silly little problem, which can be hard to do when I'm being impatient.  So part of the trick is to defocus somehow and shift to poking mode.
  2. use cabal install --dry-run -v2 and study the end of the output: what packages are we trying to install, and why?  The -v2 is important because it tells you why packages are being installed.
  3. hunt for the offending dependency - for me this was a simple case of staring at GenI.cabal.  What if GenI had depended on some library which in turn depended on time-1.1.2.4?  I guess the answer would lie in the list of packages that cabal-install says it would install.  The dependency must lie *somewhere* in the chain.
If I understand correctly, this may actually be an improvement over the pre-GHC-6.12 days, before the ABI hash was introduced.  I don't actually know, but I could imagine there's something that would make one random not-quite-compatible with the other, even if they're both version 1.0.0.3, and silently swapping one out for the other would cause subtle breakage.  At least now we know when something is wrong, and we can fix it relatively easily by just reinstalling the broken package.

This dependency stuff must be really tricky!  It looks like there may be some work that could make life better, for example, a Nix-like approach where both versions of random 1.0.0.3 could co-exist.  But we should be glad in the meantime that Duncan et al have not torn their hair out yet.  (Just think of the pre-Cabal-install days if it helps, life's much better now, isn't it?)


2011-04-19

why darcs users care about consistency

In the Darcs community, we've been discussing the recent blog posts saying that Git is inconsistent, and that it cannot be made consistent.

With Darcs being the foil to Git for the purposes of this discussion, I thought it would be useful if I cleared up a few points, particularly this first one:

consistency is a usability issue

When people say they like Darcs, they don't generally talk about it having a beautiful or elegant theory. Instead, they talk about how easy and simple it is to use, about how they never really had to grapple with a learning curve or feel stupid for doing something wrong.

What makes Darcs so simple to use? Did it hit the right notes by accident or through David Roundy's good taste? Or is usability merely in the eye of the beholder? Some of these explanations may be true, but I think what lies at the heart of Darcs' usability is that it supports a very simple way of understanding a repository:

a darcs repository is a set of patches

This mental model may not be suitable for everybody, and in the long run Darcs may need to improve its support for history tracking.  But if you want to understand why, for all its current shortcomings, people continue to use and develop Darcs, you must appreciate how refreshingly simple the set-of-patches mental model can be.  As a Darcs user you are freed from worrying about commit order and its artefacts.  Collaborating with people is just a question of shuffling patches around, with no merge states, no rebases, and way fewer spurious dependencies to worry about.

But simplicity is hard.  In order to make this simple world view possible, Darcs has to guarantee the property that any ordering of patches allowed by its commutation rules is equivalent.  If Darcs gives you the option of skipping a patch, it has to work hard to make sure that if you include the patch later on, the repository you get is equivalent.  That's what the patch theory fuss is about.  While it's useful that Darcs tends to attract purists and math geeks, we're really not engaged in the pursuit of some sort of ivory tower theoretical elegance for its own sake.  Ultimately what we're after is usability.

A good user interface minimises work for the user, be it cognitive, memory or physical work.  The joy of Darcs is being able to focus cognitive work on our real jobs, and not on babysitting version control systems.  So when Russell O'Connor says that merges ought to be associative, he's not saying this to tick some sort of mathematical box; what I think he's really saying is that, as a Darcs user, he doesn't want to worry about the difference between pushing patches one at a time versus all in one go.  Consistency is a usability issue.

darcs is imperfect

Darcs is very much a work in progress.  Some users have felt let down by Darcs: whenever performance grew to be unacceptable for their repositories, when they hit one exponential merge too many, or when Darcs just plain did something wrong. Even our much vaunted usability has cracks at the edges, a confirmation prompt too many, an inconsistent flag set, a non-reversible operation or two.

I particularly want to make sure I'm very clear about this point:

darcs patch theory is incomplete

We still don't know how to cope with complicated conflicts.  Moreover, the implementation of our first two theories is somewhat buggy.  Darcs copes well enough with most everyday conflicts, but if a conflict gets hairy enough, Darcs will crash and emit a nasty message.  This is one of the reasons why we don't recommend Darcs for large repositories.

Our version of "don't do that" is: don't maintain long-term feature branches without merging back to the trunk on a regular basis.  This is not acceptable for bigger projects, but for smaller projects like Darcs itself, the trade-off between a simple user interface in the general case and the occasional hairy conflict can be worth it.  In the long run, we have to fix this.  We are revising our patch theory again, this time taking a much more rigorous and systematic approach to the problem.

In the interim, we will be gaining some powerful new tools to help work around the problem, namely a new "darcs rebase" feature that will allow users to smooth away conflicts rather than letting them get out of hand. This will be a crucial bridging tool while we continue to attack the patch theory problem.

patch theory is simple at heart

I am in the awkward position of being a non-expert maintainer, having to defer a lot of thinking about software engineering and patch theory to the rest of the Darcs team. In a way, this is healthy for Darcs, because we have long suffered from an excess concentration of expertise. Inverting the pie so that you basically have the number one Darcs Fan as the maintainer is useful because it forces everybody else to break things down into words an Eric can understand.

The good news is that basic patch theory is one of these things an Eric can understand: patches have inverses and may sometimes be commuted.  Just learning the core theory teaches you how merging and cherry picking works, why you can trust the set-of-patches abstraction and most importantly, how simple Darcs is. So we're not after some kind of magical AI here, nor are we trying to guess user intention. The things we do with patches are much more mechanical, systematically adjusting patches to context, one at a time, click-clack on the abacus until the merge is complete.
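To give a flavour of just how mechanical this click-clack is, here is a toy sketch of patches with inverses and commutation.  This is nothing like Darcs' real implementation (the single-hunk representation is invented purely for illustration), but it shows the moving parts:

```haskell
-- A toy patch: at a 1-based line number, remove some lines, insert others.
data Patch = Hunk Int [String] [String]
  deriving (Show, Eq)

apply :: Patch -> [String] -> [String]
apply (Hunk l old new) doc =
  let (pre, rest) = splitAt (l - 1) doc
  in pre ++ new ++ drop (length old) rest

-- Inverses: just swap what is removed and what is inserted.
invert :: Patch -> Patch
invert (Hunk l old new) = Hunk l new old

-- Lines added minus lines removed by a hunk.
offset :: Patch -> Int
offset (Hunk _ old new) = length new - length old

-- Commute a sequence (p then q) into (q' then p') when the hunks don't
-- overlap, systematically adjusting line numbers to the new context.
commute :: Patch -> Patch -> Maybe (Patch, Patch)
commute p@(Hunk l1 _ new1) q@(Hunk l2 old2 new2)
  | l2 >= l1 + length new1 = Just (Hunk (l2 - offset p) old2 new2, p)
  | l1 >= l2 + length old2 = Just (q, shift (offset q) p)
  | otherwise              = Nothing  -- overlapping hunks: a conflict
  where shift d (Hunk l o n) = Hunk (l + d) o n
```

Commuting never changes the combined effect: applying p then q gives the same file as applying the commuted pair, which is the "any allowed ordering is equivalent" property at work.  The Nothing case (overlapping hunks) is where conflict handling begins.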

patch vs snapshot is not so important

We think it's important to continue working on Darcs because we are exploring territory that no other version control system is looking at - patch-based version control. That said, patches and snapshots are duals of each other. We think that things that Darcs can do are possible in snapshot based version control and we would be very interested to see work in that direction.

The secret to Darcs merging is that it replaces guesswork (fuzz factor) with history. A darcs patch only exists in the context of its predecessors, and if we want to apply a patch to a different context, we mechanically transform the patch to fit. We think this sort of history-aware merging could be implemented in Git. In fact, we would be excited to see somebody taking up the challenge. Git fans! How about stealing history-aware merging from us?

exponential merges still exist but there are fewer of them

We have developed two versions of patch theory. The second version avoids a lot of the common causes of exponential merge blowups, but it is still possible to trigger them. Recent Darcs repositories are created using version 2 of the theory. For compatibility's sake, repositories created before Darcs 2 came along tend to still be using version 1 of the theory (we only recommend converting if conflicts become a problem).

The most well-known remaining cause of blowups in theory 2 is the problem of "conflict fights" where one side of the conflict resolves the conflict and gets on with their life without propagating the resolution back to the other side. What tends to happen there is that we not only encounter the conflict again in the future, but we also conflict with the resolution!

So life is definitely better with Darcs 2. We've given the exponential merge problem a good knock on the head, but it's still staggering around and we're working our way to the finishing blow.

performance is improving

I think that when people complain about Darcs being slow, they're not talking about the exponential merge problem.  They're mostly referring to day-to-day issues like the time it takes to check out a repository.  Our recent focus has been to solve a lot of these pedestrian performance issues.  For example, the upcoming Darcs 2.8 is likely to include a new "packs" feature, which makes it possible to fetch a repository in the form of two larger tarballs rather than thousands of little patch files.  This makes a big difference!

Another improvement we hope to bring to Darcs 2.8 is the performance of the darcs annotate command (cf. git blame).  Annotate had been neglected for a while, and to make things better, we've basically reimplemented the command from scratch, with more readable output to boot.  As an example of something fixed along the way, one misfeature of the old annotate is that it would work by applying all the patches relevant to a given file, building it up from the very beginning.  But if you think about it, annotating a file is really about annotating its current state; we don't care about ancient history!  So one of the Darcs hackers had the sort of idea that's obvious in hindsight: rather than applying patches forwards from the beginning of history, we simply unapply them from the end.  Much faster.

We're not yet trying to compete with Git when working on these performance issues. We admire the performance that Git can deliver and we agree that getting speed right is a usability issue (too slow and your user loses their train of thought).  But we've been picking a lot of low hanging fruit lately, solving problems that make Darcs faster with very little cost. We hope you'll like the results!


2011-02-18

practical QuickCheck revisited - separate testing hierarchy

I'll begin this post with a quote from 2009-Eric:
This may go down as the kind of bad advice that "seemed like a good idea at the time".
The advice in question was to "bake unit tests in".  The basic idea was that whatever module you write should have its own testSuite function exposing unit tests for that particular module.  The advantages were simplicity (no parallel test hierarchy), the ability to ship a binary with self-tests, and the ability to test non-exported functions and helper code, whose granularity lends itself more to testing (it's easier to think of tests for them).

I was unconvinced by the counterargument that it was not a good idea to mix testing and business logic.  To be clear, I did agree with the spirit of the advice -- I'm not about to go around questioning the kind of wisdom a community gains by watching rockets blow up -- but I felt that I was not advocating any such mixing.  All I wanted was to put my testing code in the same file as the business code, cordoned off in a testing section at the end of the file, without any sort of if-testing-mode-do-X logic.  So I thought that the counterargument was right, but that it didn't apply to this particular context.  (I'd be interested to see when/if I change my mind on this; maybe it leads to the temptation to mix logic, which is bad.)

In any case, I don't need to change my mind on that particular point. Being the kind of person that only learns the hard way, I've found myself forced to divorce my test code from the business code after all. It's mainly a practical problem of dependencies (this was pointed out by Echo Nolan and Ivan Miljenovic). Forcing users to install QuickCheck and test-framework when they probably don't care about testing, when they just see your module as yet another dependency on the road to some other more pressing goal, is really a bit anti-social.

The problem isn't installing the package per se (it all happens automatically with cabal install), but dealing with package version dependencies.  So GenI depends on test-framework 2.x and QuickCheck 1.2.  What if I go away for a few years, stop hacking on GenI, and in the meantime the rest of the world moves on to using QuickCheck 2.x and test-framework 3?  What happens when they try to install GenI and cabal install needs to rebuild the random package, which then breaks QuickCheck-2.4 because it depends on random too?  Headaches all around.

I think I can live with a separate hierarchy. Arguing with past-Eric a bit:
  1. All the extra modules and what not are not that big a deal (and I could probably let myself go wrt imports, etc).
  2. Who cares if there's an extra geni-test binary, which only gets enabled with -ftest anyway?
  3. Self tests, shmelf tests.  Seriously, who is going to run that geni --test function anyway?
  4. If I forget to cabal configure -ftest, I can always cabal configure again and rebuild.
  5. If I'm really desperate to test some internal function, I could always export an alias like testingFoo for every foo I want to test, applying a sort of Pythonesque we're-all-grownups-here principle. 
  6. Also maybe forcing yourself to test only the exported functions, enforces a kind of general black-box thinking which is healthy if you're writing a library.
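For what it's worth, point 5 might look something like this in practice (the module and function names here are made up):

```haskell
-- In a hypothetical Data.Widget module, the export list would name both
-- the public function and the testing alias:
--   module Data.Widget (mkLabel, testingNormalise) where

-- The public API.
mkLabel :: String -> String
mkLabel s = "[" ++ testingNormalise s ++ "]"

-- An internal helper we still want to exercise from the separate test
-- hierarchy; the testing* prefix signals "we're all grownups here".
testingNormalise :: String -> String
testingNormalise = unwords . words
```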
So, with apologies to Ivan for not understanding his rants 2 years ago, and to anyone that may have listened to 2009-Eric for any messes I got you and your users in: I'm retracting that particular bit of advice and separating my test hierarchies like a good boy.  Let's see if 2013-Eric decides to post some kind of retraction retraction.


2010-12-02

personal gitit wiki on MacOS X

Here's a quick little recipe for using gitit as a personal wiki on MacOS X. I assume here you already have the wiki itself set up, and now you just want it to run automatically in the background whenever you log in. You can do this by using launchd.
  1. Download net.johnmacfarlane.gitit.plist from this Gist
  2. Replace the WorkingDirectory with the path to your personal wiki
  3. Replace the last part of the PATH to include your cabal directory, and possibly something like /usr/local/bin or /opt/local/bin if you're using Git instead of Darcs
  4. Save the file in ~/Library/LaunchAgents
  5. Test it with launchctl load ~/Library/LaunchAgents, maybe using the Console application to search for logs should something go wrong.
  6. Log out and log back in (or maybe even restart your computer if you want to be sure)
Helpful bits and pieces:
  • This MacGeekery article
  • launchd.plist man page
  • Property List Editor in Developer Tools (beats looking at XML)
Edit 2010-12-04: Fixed broken link


2010-09-22

Early Career Researcher: the computer game

Here's an idea for a computer game called Early Career Researcher. The simple version would be a fairly mindless, turn-based, RPG-esque deal. Nothing earth-shattering in terms of game mechanics, but perhaps an amusing toy.

You have
  • personal attributes (eg. writing, social skills, initiative)
  • inputs (eg. ideas, papers to review)
  • daily resources (eg. time, energy)
  • actions (eg. check email, write paper, write grant proposal, lab work [or some generic term for "actual" research leg work], take nap, go to pub)
  • outcomes (eg. paper accepted, grant awarded, contract extension)
  • light bulbs (XP)
The goal of the game is just to maximise light bulbs. The basic model is that every turn consists of a "day" (a day should take about 5-10 minutes to play). In each day, you can do any number of actions, but the kinds of actions are limited by the inputs and daily resources you have. For example, you could write a paper, but in order to do so, you'd need a paper-topic resource to consume, not to mention time. Likewise, you could check your email and it may only take a few minutes, but it could also use up a lot of your energy. Actions may result in outcomes, but whether or not they do so depends on a combination of personal attributes and luck. For example, writing a paper may result in paper accepted, depending on writing skills, research-fu and the dice roll. Going to the pub (presumably chatting with colleagues) may result in ideas, depending on social skills, creativity and the dice roll. Outcomes generate inputs (eg. ideas) and light bulbs (XP). If you get enough XP to level up, you can use your light bulbs to purchase personal attributes.

As the game develops, it should become clearer that it's important to choose your actions wisely, and also to pay attention to the notion of balance. Spending all your time doing lab work or writing grant proposals may seem like a good idea, but if you fail to spend enough time in the pub or take sufficient naps, you may not generate enough idea resources to make very much progress. Or if you're too lazy and spend all your time just trying to be inspired, you may not make enough practical progress to get anywhere.

So if anybody wants to code this up as a little exercise...
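In case that helps anybody get started, here is a rough sketch of the turn model in Haskell; every type, number and rule below is invented for illustration:

```haskell
-- Hypothetical types for the game described above.
data Attrs = Attrs { writing, social :: Int } deriving Show

data Action = CheckEmail | WritePaper | GoToPub | TakeNap
  deriving (Show, Eq)

data Day = Day
  { time, energy :: Int   -- daily resources
  , ideas      :: Int     -- inputs
  , lightbulbs :: Int     -- XP
  } deriving Show

-- One action during a day: consume resources, and with the right
-- attributes plus a lucky roll (a number in [0,1)), produce outcomes.
step :: Attrs -> Double -> Action -> Day -> Day
step a roll act d = case act of
  WritePaper
    | ideas d > 0 && time d >= 4 ->
        let accepted = roll * fromIntegral (writing a) > 2  -- dice + skill
        in d { time = time d - 4, ideas = ideas d - 1
             , lightbulbs = lightbulbs d + (if accepted then 3 else 0) }
  GoToPub
    | energy d >= 2 ->
        let inspired = roll * fromIntegral (social a) > 1
        in d { energy = energy d - 2
             , ideas = ideas d + (if inspired then 1 else 0) }
  TakeNap -> d { time = time d - 1, energy = energy d + 3 }
  _ -> d  -- not enough resources: the action fizzles
```

A real game would thread a random generator and a fuller catalogue of actions and outcomes through this, but the shape (attributes plus luck turning resources into outcomes) is all in step.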


2010-03-28

hsgtd and friends 1: mutt inbox and actions

I've been practising the methodology of Getting Things Done for over 4 years now, but I'm still not very good at it.

I hope to write a small series of postings showing my current GTD state of the art. I hope it will be useful to somebody out there and that I will get some ideas on fine-tuning my approach.

Another hope I have is to reach out to technical people who are resisting "becoming more organised" because of the apparent overhead involved. I hope to demonstrate that you can actually get a lot of mileage out of a handful of shell scripts and simple practices (like keeping all your mail in a single folder).

Ingredients

  • mutt - The appeal here is to have a mail client that is malleable and which can talk to 3rd-party software. So it doesn't necessarily have to be as old school as mutt, just scriptable and capable of playing with others.
  • hsgtd - a command line GTD tracker written in 351 lines of Haskell. Everything is stored in a simple text file.
I also use mairix, xmonad and Unison, but these will likely only be relevant in future postings.

Background

In this first instalment, I would like to talk about how I deal with inbox triage. It's useful to know a little bit of GTD terminology for this.
  • Inbox - things which are not yet triaged. Practicing GTD is like using an issue tracker; you decouple triage from actions. One priority in GTD is to empty out the inbox by performing triage on all items. Working this way is efficient because you avoid looking at the same item or having the same thought about it (gee, I oughta...) twice. Things go in stages.
  • Next actions - One of the results from the triage process is a set of "next actions", concrete physical actions like, eg. call Bob 398-0811 to see if he wants that spare external disk drive
I use two different programs: mutt to view my inbox, and hsgtd to view my list of next actions. In this series of posts, I'll be exploring how mutt and hsgtd might talk to each other.

Inbox triage : from email to next actions

The most common source of next actions for me is my email, so it is very important for me to have good integration between my hsgtd list and my email. In particular, one thing I like to be able to do is to read an email, figure out what "next action" to do with it, record that next action, and pin that email to the next action for reference.

To this end, I have a simple shell script and muttrc macro that you can copy from the hsgtd contrib directory. The shell script greps an email from stdin for its message id and reads the command line parameters for the next action text. It combines the two by adding an hsgtd action using the message ID as a project name. Here's the script to show you how simple and stupid it is:
#!/bin/bash
# Pull the Message-ID header out of the email on stdin, and record a new
# hsgtd action whose last word is the message ID (prefixed with ":").
MSGID=$(grep -i '^message-id:' | head -n 1 | sed 's/^[Mm]essage-[Ii][Dd]: /:/')
hsgtd add "$@" "$MSGID"
To make this work with mutt, I also have a small macro that lets me call the shell script whenever I'm viewing a message:
macro pager \Ca "|email-add-action"
macro index \Ca "|email-add-action"

Triage example

So how does this get used in practice? Let's say my inbox has a patch to Darcs from Guillaume.

If you saw Merlin Mann's Inbox Zero talk, there are 5 "verbs" you can apply to an inbox item. Let's run through these. Clearly this is not a mail I want to [i] delete, and for a variety of reasons, it's not something I want to [ii] delegate or [iii] defer. Let's look at the email in mutt:
I can't [iv] respond yet because I need to take some time out to review the patch, so I need to [v] track an action to do later. I hit Control-a in mutt and type in "@darcs review this". This creates an action in hsgtd. If I later visit hsgtd and type "list" to see the actions available, I will see the email from Guillaume:

By the way, if you're wondering about the "@darcs", the use of an at-sign before a word is an hsgtd convention for contexts. Contexts are a useful way of dividing up actions because they signify certain constraints on where you can perform the actions (typical contexts might be @home, @work). I use @darcs because working on darcs is sometimes something I'll do in one block at a time. If I type "list @darcs" in hsgtd, it will show me only the actions for that context:

Back to the main story. We've now added Guillaume's message to hsgtd. Let's take a closer look at the entry that was created. You see the original action text that we typed in: "@darcs review this". Notice how the context @darcs was helpfully highlighted in yellow. In green you will also see a strange suffix like ":<4ba5fc74.0e0db80a.261d.ffff8b51@mx.google.com>". This is useful for three reasons:
  1. It creates a GTD "project" for that email. Sometimes dealing with an email requires more than one action. In the GTD world, any set of >1 action is considered a project.
  2. [most important] It gives you a means for retrieving the email that goes with this action when you are actually predisposed to do that action.
  3. It allows you to be fairly oblique in your next action texts; you can type in any short string which seems meaningful without having to be super-precise about it.

Next up: waiting and review

In this posting, we saw a way of extracting "next actions" from your mutt inbox and storing them in an hsgtd list. In a future posting, I hope to expand on this by exploring delegation (asking somebody else to act) and review (going over your actions and delegated items). Actually, the review was what initially motivated this blog posting. I'd finally worked out how to create a virtual mailbox of my hsgtd-tracked items and wanted to show it off. But that will have to wait as this post is long enough as it is.