Hobby-hacking Eric

2011-05-03

untangling a cabal install problem

I sometimes have trouble translating abstract general explanations to my particular concrete cases.  I hope that by sharing a very concrete situation I experienced, other users may recognise themselves and get unstuck on their own problems.

I finally untangled a cabal install problem that's been bugging me for some time, almost driving me to use cabal-dev on all my packages (which seems like it might be a bit inconvenient).

So I have a fairly standard setup (at least, it was standard when I wrote this post): GHC 6.12.3 with the latest released Haskell Platform.  I'm working on two packages, GenI and nltg-hillwalking, simultaneously.  Switching from one to the other is painful.  When I try to install GenI, typing "cabal install" results in a horribly disheartening sequence where it installs random, haskell98, cpphs, haskell-src-exts, derive and finally GenI.  If I then switch back to working on hillwalking, I get another discouraging sequence involving random (again!), QuickCheck, test-framework and nltg-hillwalking.  And going back to working on GenI, I go through the same pain again.

It took me a while to work out that the problem was just the interaction between these two packages.  Having had a chance to chat about this with Duncan and Ian, I got a bit of a clue about what the problem might be.  Indeed, when I ran "cabal install --dry-run -v2", this little bit of output caught my eye:

In order, the following would be installed:
random-1.0.0.3 (reinstall) changes: time-1.1.4 -> 1.1.2.4
haskell98-1.0.1.1 (reinstall)
cpphs-1.11 (new package)
haskell-src-exts-1.10.2 (new package)
derive-2.4.2 (new package)
GenI-0.21 (new package)

See that little arrow?  It says that random, the cause of all my heartache, is being reinstalled because it wants to depend on an older version of time.  Why on earth would it want to do that?  ... Oh, because I told it to.  Apparently, some past version of myself decided to put this dependency in GenI.cabal: time == 1.1.2.4

Oops!

I think the problem looks like this.  GenI uses the derive package, which triggers a chain of dependencies all the way down to random and time.  Unfortunately, GenI also depends directly on time, and the exact version it asks for (1.1.2.4) is not the one that the installed random was built against (1.1.4).  I'm not entirely clear on why this causes a recompile as opposed to the more usual "this will likely cause an error" output (maybe the latter is only appropriate for direct dependencies, i.e. if derive depended on time itself?).


By forcing GenI to use this old version of time, I was indirectly forcing cabal to install a copy of the random package built against this old version.  In doing so, I would clobber the copy of random that QuickCheck uses.

Fixing the issue in GenI was relatively straightforward.  Did I really need to be using such a constrained version of time?  It turns out that time == 1.1.* works perfectly fine (taking advantage of the Package Versioning Policy (PVP) promise of backwards compatibility in all A.B.* versions of a package).  Just one little change to a dependency and everything works a lot more smoothly.
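To make the change concrete, here is a minimal sketch of what the edit looks like in a .cabal file.  Only the two constraints on time come from the story above; the use of a build-depends field is the standard place for such a constraint, and the rest of GenI.cabal (the stanza it sits in and the other dependencies) is left out as I'm only illustrating the one line.

```cabal
-- Sketch only: the other entries of the real dependency list are omitted.
--
-- Before (pins one exact version, so random gets rebuilt against it):
--   build-depends: time == 1.1.2.4
--
-- After (any 1.1.x release will do, including the installed time-1.1.4):
build-depends: time == 1.1.*
```

With the looser range, the solver is free to keep the copy of random that is already installed, since the time it was built against now satisfies GenI's constraint as well.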

So what did I learn from this?
  1. take a deep breath - I think when I'm faced with these issues, I'm feeling really impatient to get on with my work.  But solving the issue involves recognising just some silly little problem, which can be hard to do when I'm being impatient.  So part of the trick is to defocus somehow and shift to poking mode.
  2. use cabal install --dry-run -v2 and study the end part: what packages are we trying to install and why?  The -v2 is important because it tells you why packages are being installed.
  3. ??? hunt for the offending dependency - for me this was a simple case of staring at GenI.cabal.  What if GenI depended on some library which in turn depended on time-1.1.2.4?  I guess the answer would lie in the list of packages that cabal-install says it would install.  The dependency must lie *somewhere* in the chain.

If I understand correctly, this may actually be an improvement over the pre-GHC 6.12 days, before the ABI hash was introduced.  I don't actually know, but I could imagine there's something that'd make one random not-quite-compatible with the other, even if they're both version 1.0.0.3, so that silently swapping one out for the other would cause subtle breakage.  At least now we know if something is wrong, and we can fix it relatively easily by just reinstalling the missing package.

This dependency stuff must be really tricky!  It looks like there may be some work that could make life better, for example, a Nix-like approach where both copies of random 1.0.0.3 could co-exist.  But we should be glad in the meantime that Duncan et al have not torn their hair out yet.  (Just think of the pre-cabal-install days if it helps; life's much better now, isn't it?)