koweycode: 2009

2009-10-08

darcs hashed-storage work merged (woo!)

The following is a copy of my recent post to the darcs-users mailing list.

Hi everybody,

So you may have noticed me saying this in a couple of recent threads. Petr Ročkai's hashed-storage work from his 2009 Google Summer of Code project has been merged!

I thought I would take a few moments to give everybody an overview of how this work benefits us, and where we'll be going in the future.

In a nutshell

What does this mean for you? Faster repository-local operations.

Hashed format repositories (with darcs-1 and darcs-2 patches alike) should now be faster to use on a daily basis. We saw the very beginnings of this work in Darcs 2.3.0 with a faster darcs whatsnew. Now these speed improvements cover all repository-local operations.

The next Darcs beta is a couple of months away, but before that, I would like to encourage you to try this out for yourself:

darcs get --lazy http://darcs.net
cd darcs.net
cabal install

For best results, please run darcs optimize --upgrade followed by darcs optimize --pristine. Pay attention over the next couple of weeks when you try a record, amend, revert, unrecord. If we've done our work right, there should be nothing to see. Darcs should be less noticeable, with fewer "Synchronizing pristine" messages and a faster return to the command prompt. We think you'll like it. But please get back to us. Is Darcs faster for you?

If you're particularly interested, I will step through these changes in greater detail at the end of this message. Meanwhile, I would like to step back a little and take stock of how these improvements fit in to the bigger picture.

The road ahead

The hashed storage work is a big step forward and definitely a cause for celebration. I think it is useful to reflect on this progress and consider how it fits in with our progress since darcs 1.0.9:

ssh connection sharing (darcs transfer mode)
HTTP pipelining
lazy repositories
the global cache

and now

index-based diffing
hashed-storage efficiency

We cannot promise that Darcs will magically become fast overnight. But what we can and will do is continue chipping away at it, solving problems one at a time; release by release, a little bit better, a little bit faster every time until one day we can look back and marvel at all the progress we've made.

So Petr's work makes Darcs easier to live with on a day-to-day basis. But that's not enough. Now we need to turn our attention to that crucial first impression; what happens when people try Darcs out for the first time is that they darcs get a repository they want and... then... they... wait...

This is embarrassing, but we can fix it. In fact, we already have started working on the problem. The next version of hashed-storage will likely introduce a notion of "packs" in which the many often very small files that Darcs keeps track of will be concatenated into more substantial "packs" that compress better and reduce the ill effects of latency. My hope is that we will be able to complete the packs work by Darcs 2.5.

There's a lot more progress to be made: smarter patch representations, tuning for large patches, file-to-patch caching for long histories. And that's just performance! For more details about our performance work, please have a look at

http://tinyurl.com/darcs-performance2

If you could do anything to help, benchmark, profile, anything at all, please let us know :-)

The fight continues.

Thank-you!

Petr and Ganesh deserve a huge round of applause. Petr, thanks for thinking up this work, getting it done and pushing it through. Ganesh, thanks for an extremely thorough and thoughtful review. The two of you, thanks for holding on, for tenacious cooperation in the face of adversity.

Thanks also to all the wider Darcs community for all your support, comments, patch reviews.

I'm looking forward to seeing you at the upcoming Darcs hacking sprint. The sprint will take place in Vienna, Austria on the weekend of 14-15 November. Everybody, especially Darcs and Haskell newbies, is welcome to join in. Details on http://wiki.darcs.net/Sprints/2009-11

And if I may take a paragraph to mention this, Darcs needs your support. Every little counts, if you can send patches, review patches, tweak documentation, profile, benchmark, submit bug reports. Barring that, you could also make a contribution to our travel fund via the Software Freedom Conservancy. See http://darcs.net/donations.html for details.

Thanks everybody and enjoy!

Eric

Changes in detail

Darcs uses an "index" file to compute working directory and pristine cache diffs. This avoids timestamps going out of synch when you have multiple local branches, which saves a huge and needless slowdown.
Hashed storage is more efficient in general. Even if you already have perfect timestamps, the new optimisations should make Darcs faster in general.
The new 'darcs optimize --pristine' reduces spurious mismatches on directories.
Darcs no longer requires a one second sleep after applying patches.
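
To make the first point a bit more concrete, here is a toy model of index-based diffing. It is only a sketch of the general idea and not the hashed-storage implementation: the names IndexEntry, FileStat and check, the choice of fields, and the example data are all assumptions made for illustration.

```haskell
import qualified Data.Map as Map

-- | What a (hypothetical) index remembers about one tracked file.
data IndexEntry = IndexEntry
  { ieSize  :: Integer  -- ^ size in bytes when the hash was computed
  , ieMTime :: Integer  -- ^ modification time (seconds) at that moment
  , ieHash  :: String   -- ^ content hash recorded at that moment
  } deriving (Eq, Show)

-- | What a cheap stat call says about the working copy right now.
data FileStat = FileStat
  { fsSize  :: Integer
  , fsMTime :: Integer
  } deriving (Eq, Show)

type Index = Map.Map FilePath IndexEntry

data Verdict = Unchanged | NeedsRehash | Untracked
  deriving (Eq, Show)

-- | Decide from stat information alone whether a file has to be re-read.
--   Only files whose size or timestamp differ from the index are suspects.
check :: Index -> FilePath -> FileStat -> Verdict
check idx path st =
  case Map.lookup path idx of
    Nothing -> Untracked
    Just e
      | ieSize e == fsSize st && ieMTime e == fsMTime st -> Unchanged
      | otherwise -> NeedsRehash

-- | Made-up example data.
exampleIndex :: Index
exampleIndex = Map.fromList
  [ ("README",   IndexEntry 120 1000 "hash-of-readme")
  , ("Setup.hs", IndexEntry 300 1000 "hash-of-setup")
  ]

main :: IO ()
main = mapM_ report
  [ ("README",   FileStat 120 1000)
  , ("Setup.hs", FileStat 310 1042)
  , ("NEWS",     FileStat 10 1042)
  ]
 where
  report (path, st) =
    putStrLn (path ++ ": " ++ show (check exampleIndex path st))
```

In this model the working copy is compared against the repository's own index rather than against the timestamps of the pristine files, so one branch touching its files does not invalidate the bookkeeping of another.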

2009-09-11

cabal installing graphical apps on MacOS X

I have a graphical command line tool written in wxHaskell. For the longest time, my tool was relatively easy to install on Linux but a pain on MacOS X because my users had to jump through extra post-installation hoops like creating application bundles.

Thanks to some very patient help from Beelsebob, quicksilver and dcoutts on #haskell, I was finally able to cobble together a Setup.hs file that takes care of these post-installation steps automatically. Now when I write install instructions for my program, I no longer need to add extra bullet points telling people to turn knobs and twiggle blops just to run the GUI. It just works.

Note that this was written with wxHaskell in mind. I hope that folks using gtk2hs and qtHaskell either do not have this problem or can make use of a similar solution.

desiderata

What I wanted was for the 'cabal install' command to work as well on MacOS X as it did under Linux. My core desiderata were:

Ability to call my application from the command line the same way you would under Linux with command line arguments correctly recognised
No need for the user to add extra junk to the path (besides $HOME/.cabal/bin which they'll already have added)
No manual intervention after cabal install (eg calling scripts to create application bundles)
No need to be super-user.

basic ideas

The basic ideas behind this solution are

Replace "foo" with a shell script that calls "foo.app/Contents/MacOS/foo"
MacOS X Leopard seems to want graphical applications to live in application bundles. At least for wxHaskell, if you invoke "foo" you get a GUI that does not respond to input. On the other hand, if you invoke "foo.app/Contents/MacOS/foo" you get something that works.
Use a Cabal postInst to create the application bundle in the bin dir.

basic solution

Here is the solution. (I'll send it as a mail to the wxhaskell-users mailing list too)

-- --------------- BEGIN Setup.hs EXAMPLE ------------------------------
import Control.Monad (foldM_, forM_)
import Data.Maybe ( fromMaybe )
import System.Cmd
import System.Exit
import System.Info (os)
import System.FilePath
import System.Directory ( doesFileExist, copyFile, removeFile, createDirectoryIfMissing )

import Distribution.PackageDescription
import Distribution.Simple.Setup
import Distribution.Simple
import Distribution.Simple.LocalBuildInfo

main :: IO ()
main = defaultMainWithHooks $ addMacHook simpleUserHooks
 where
  addMacHook h =
   case os of
    "darwin" -> h { postInst = appBundleHook } -- is it OK to treat darwin as synonymous with MacOS X?
    _        -> h

appBundleHook :: Args -> InstallFlags -> PackageDescription -> LocalBuildInfo -> IO ()
appBundleHook _ _ pkg localb =
 forM_ exes $ \app ->
   do createAppBundle theBindir (buildDir localb </> app </> app)
      customiseAppBundle (appBundlePath theBindir app) app
        `catch` \err -> putStrLn $ "Warning: could not customise bundle for " ++ app ++ ": " ++ show err
      removeFile (theBindir </> app)
      createAppBundleWrapper theBindir app
 where
  theBindir = bindir $ absoluteInstallDirs pkg localb NoCopyDest
  exes = fromMaybe (map exeName $ executables pkg) mRestrictTo

-- ----------------------------------------------------------------------
-- helper code for application bundles
-- ----------------------------------------------------------------------

-- | 'createAppBundle' @d p@ - creates an application bundle in @d@
--   for program @p@, assuming that @d@ already exists and is a directory.
--   Note that only the filename part of @p@ is used.
createAppBundle :: FilePath -> FilePath -> IO ()
createAppBundle dir p =
 do createDirectoryIfMissing False $ bundle
    createDirectoryIfMissing True  $ bundleBin
    createDirectoryIfMissing True  $ bundleRsrc
    copyFile p (bundleBin </> takeFileName p)
 where
  bundle     = appBundlePath dir p
  bundleBin  = bundle </> "Contents/MacOS"
  bundleRsrc = bundle </> "Contents/Resources"

-- | 'createAppBundleWrapper' @d p@ - creates a script in @d@ that calls
--   @p@ from the application bundle @d </> takeFileName p <.> "app"@
createAppBundleWrapper :: FilePath -> FilePath -> IO ()
createAppBundleWrapper bindir p =
  writeFile (bindir </> takeFileName p) scriptTxt
 where
  scriptTxt = "`dirname $0`" </> appBundlePath "." p </> "Contents/MacOS" </> takeFileName p ++ " \"$@\""

appBundlePath :: FilePath -> FilePath -> FilePath
appBundlePath dir p = dir </> takeFileName p <.> "app"

-- optional stuff: to be discussed later
mRestrictTo = Nothing
customiseAppBundle _ _ = return ()
-- --------------- END Setup.hs EXAMPLE ---------------------------------
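
To see what the wrapper ends up containing, here is a small standalone sketch that rebuilds the same text as scriptTxt above for a given executable. The function name wrapperScript and the example path are made up for illustration, and System.FilePath.Posix is used (an assumption, the Setup.hs itself imports System.FilePath) so that the result does not depend on the platform running the sketch.

```haskell
import System.FilePath.Posix ((</>), (<.>), takeFileName)

-- | The one-line shell script that stands in for the real binary.
--   Only the file name part of the argument is used.
wrapperScript :: FilePath -> String
wrapperScript p = target ++ " \"$@\""
 where
  exe    = takeFileName p
  -- <.> binds tighter than </>, so this reads: dir / . / exe.app / Contents/MacOS / exe
  target = "`dirname $0`" </> "." </> exe <.> "app" </> "Contents/MacOS" </> exe

main :: IO ()
main = putStrLn (wrapperScript "dist/build/geni/geni")
```

Running this prints `dirname $0`/./geni.app/Contents/MacOS/geni "$@", which is a path relative to wherever the wrapper itself lives. Note that writeFile does not set the executable bit, so a real hook may also need to adjust the permissions of the wrapper.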

fancier solution

I also have some extra wishlist items.

Possibility of installing in --global
Fancy custom app bundles with custom icons and what not

Global installation might already be working with this basic script, but I haven't tested it yet. Fancy app bundles sort of work (if I double-click it in Finder, I get a customised icon, but running it from the command line does not give me one).

Here are extra hooks I created for this:

-- ------------- BEGIN FANCY Setup.hs ADDENDUM ------------------------
-- | Put here IO actions needed to add any fancy things (eg icons)
--   you want to your application bundle.
customiseAppBundle :: FilePath -- ^ app bundle path
                   -> FilePath -- ^ full path to original binary
                   -> IO ()
customiseAppBundle bundleDir p =
 case takeFileName p of
  "geni" ->
    do hasRez <- doesFileExist "/Developer/Tools/Rez"
       if hasRez
          then do -- set the icon
                  copyFile "etc/macstuff/Info.plist" (bundleDir </> "Contents/Info.plist")
                  copyFile "etc/macstuff/wxmac.icns" (bundleDir </> "Contents/Resources/wxmac.icns")
                  -- no idea what this does
                  system ("/Developer/Tools/Rez -t APPL Carbon.r -o " ++ bundleDir </> "Contents/MacOS/geni")
                  writeFile (bundleDir </> "PkgInfo") "APPL????"
                  -- tell Finder about the icon
                  system ("/Developer/Tools/SetFile -a C " ++ bundleDir </> "Contents")
                  return ()
          else putStrLn "Developer Tools not found.  Too bad; no fancy icons for you."
  _      -> return ()

-- | Put here the list of executables which contain a GUI.  If they all
--   contain a GUI (or you don't really care that much), just put Nothing
mRestrictTo :: Maybe [String]
mRestrictTo = Just ["geni"]
-- ------------- END FANCY Setup.hs ADDENDUM ---------------------------

2009-07-29

vim and building with cabal

I don't know about you, but I've got map ,m :make<Enter> in my .vimrc to bind comma-m to my build program. This could be "ant" for Java files (for example) and "make" otherwise.

Now here is a snippet to set it to "cabal build" as needed

"-----------------------8<--------------------------
function! SetToCabalBuild()
  if glob("*.cabal") != ''
    set makeprg=cabal\ build
  endif
endfunction

autocmd BufEnter *.hs,*.lhs :call SetToCabalBuild()
"-----------------------8<--------------------------

Apologies for making noise in case this is already redundant with a piece of Claus Reinke's very interesting and modular-looking Haskell mode for Vim (which I've been promising myself to install some day). Perhaps the above will be useful anyway for those of us still limping along with configuration files cobbled together from bits and bobs on the web.

2009-07-28

some ideas for practical QuickCheck

I think I've found some answers to my practical QuickCheck questions. This post may be fairly long as I'm trying to make it concrete and explicit enough to overcome the kind of inertia I had when I was still resisting testing.

How do I make my tests easy to run?

1. Use test-framework

The key thing to know about test-framework is that it is very easy to get started. Just visit the friendly web page and copy the example.

Note: An earlier post suggested the testrunner package developed for Darcs, but at the time we didn't realise that test-framework already had all the features needed.
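
For a rough idea of how little is needed, here is a minimal sketch of a test-framework runner. It assumes the test-framework, test-framework-quickcheck2 and test-framework-hunit packages are installed; the property and the test names are invented for illustration and have nothing to do with GenI.

```haskell
import Data.List (sort)
import Test.Framework (Test, defaultMain, testGroup)
import Test.Framework.Providers.HUnit (testCase)
import Test.Framework.Providers.QuickCheck2 (testProperty)
import Test.HUnit ((@?=))

-- | A QuickCheck property: sorting twice is the same as sorting once.
prop_sort_idempotent :: [Int] -> Bool
prop_sort_idempotent xs = sort (sort xs) == sort xs

-- | Properties and plain unit tests can sit side by side in one group.
tests :: [Test]
tests =
  [ testGroup "sort"
      [ testProperty "idempotent" prop_sort_idempotent
      , testCase "small example" (sort [3, 1, 2] @?= [1, 2, 3 :: Int])
      ]
  ]

-- | defaultMain parses the command line and runs everything.
main :: IO ()
main = defaultMain tests
```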

2. Support cabal test

Here's a Setup.hs recipe I copied. The handy property of the code is that it runs your tests straight from your dist/build directory.

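-- EXAMPLE Setup.hs FILE 1 -----------------------------------------------
import Distribution.PackageDescription ( PackageDescription )
import Distribution.Simple
import Distribution.Simple.LocalBuildInfo ( LocalBuildInfo, buildDir )
import System.Cmd ( system )
import System.FilePath

main = defaultMainWithHooks hooks
  where hooks = simpleUserHooks { runTests = runTests' }

-- run the "test" executable straight from the build directory
runTests' :: Args -> Bool -> PackageDescription -> LocalBuildInfo -> IO ()
runTests' _ _ _ lbi = system testprog >> return ()
  where testprog = (buildDir lbi) </> "test" </> "test"
-- -----------------------------------------------------------------------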

The code snippet for your Setup.hs file comes from Greg Bacon's Setting up a Simple Test with Cabal (I tacked on an import). As you can see, the recipe assumes you're building an executable called "test" (see Greg's post on how to do this)

3. Bake your unit tests in

This may go down as the kind of bad advice that "seemed like a good idea at the time". For now, I can justify this by saying that it may be reassuring to users to be able to just run the same tests that I'm running and see for themselves that their program thinks it's working.

I've been working on a program called GenI. To help people test this program, I've added a simple "--tests" switch. Now people can run geni --tests for a self check. If they want, they can also "cabal test", using this slight modification to Greg's setup file (to call geni itself and to pass the --tests flag in).

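-- EXAMPLE Setup.hs FILE 2 -----------------------------------------------

import Distribution.PackageDescription ( PackageDescription )
import Distribution.Simple
import Distribution.Simple.LocalBuildInfo ( LocalBuildInfo, buildDir )
import System.Cmd ( system )
import System.FilePath

main = defaultMainWithHooks hooks
  where hooks = simpleUserHooks { runTests = runTests' }

-- call geni itself from the build directory, passing the --tests flag
runTests' :: Args -> Bool -> PackageDescription -> LocalBuildInfo -> IO ()
runTests' _ _ _ lbi = system testprog >> return ()
  where testprog = (buildDir lbi) </> "geni" </> "geni --tests"

-- -----------------------------------------------------------------------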

As for GenI, whenever I see --tests in my arguments (for example "--tests" `elem` args), I just pass control to another module, which in turn strips the switch out and passes the rest of the arguments to test-framework.

-- EXAMPLE TEST-FRAMEWORK WRAPPER ------------------------------------------
module NLP.GenI.Test where

import System.Environment ( getArgs )
import Test.Framework

import NLP.GenI.GeniVal ( testSuite )
import NLP.GenI.Tags ( testSuite )
import NLP.GenI.Simple.SimpleBuilder ( testSuite )

runTests :: IO ()
runTests =
 do args <- filter (/= "--tests") `fmap` getArgs
    flip defaultMainWithArgs args
     [ NLP.GenI.GeniVal.testSuite
     , NLP.GenI.Tags.testSuite
     , NLP.GenI.Simple.SimpleBuilder.testSuite
     ]
-- -----------------------------------------------------------------------

There are some other things going on in this file, notably the organisation of test suites. More on that later.
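
The other half of the arrangement, the check for --tests in the program's own main, is not shown in this post. A minimal sketch of it could look like the following, where runTestsWith and realMain are hypothetical stand-ins (the real program would hand over to the test wrapper module and to its normal entry point respectively).

```haskell
import System.Environment (getArgs)

testFlag :: String
testFlag = "--tests"

-- | Did the user ask for the self check?
wantsTests :: [String] -> Bool
wantsTests = elem testFlag

-- | Everything except the switch is passed on to the test runner.
stripTestFlag :: [String] -> [String]
stripTestFlag = filter (/= testFlag)

-- | Hypothetical stand-in for the test wrapper module.
runTestsWith :: [String] -> IO ()
runTestsWith args = putStrLn ("would run the test suite with " ++ show args)

-- | Hypothetical stand-in for the program's normal entry point.
realMain :: [String] -> IO ()
realMain args = putStrLn ("would run the program with " ++ show args)

main :: IO ()
main = do
  args <- getArgs
  if wantsTests args
    then runTestsWith (stripTestFlag args)
    else realMain args
```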

Where should I put my properties?

4. Put tests in the same module (where relevant)

If a test is specific to one module, I tend to put it in that same source file. I do this because

It lets me test functions that I don't want to export
The tests serve as documentation
It forces me to update my tests along with my code

This approach is in contrast to (a) having one big tests module and (b) having a separate test hierarchy. It may turn out to be useful to have a single big tests module as well, for example, for tests that cross the boundary from one module to the next. That need has not arisen for me yet. Likewise, I don't particularly believe in a separation between tests and code, although on the other hand some very experienced hackers seem to do so, so I'll just have to let experience teach me why.

How do I avoid repeating myself?

5. Provide a testSuite function for each module

Commenting on my last post, Josef kindly pointed out that the book-keeping I feared isn't so bad in practice. He's right. Nevertheless, I want to avoid it. To do this, I make each of my modules export a testSuite function. Here is what one of my modules looks like, just focusing on the test suite

-- EXAMPLE MODULE --------------------------------------------------------
module NLP.GenI.GeniVal where

-- SKIPPED MAIN IMPORTS ...

import Test.Framework
import Test.Framework.Providers.HUnit
import Test.Framework.Providers.QuickCheck
import Test.QuickCheck
import Test.HUnit

-- SKIPPED MAIN CODE

testSuite = testGroup "unification"
 [ testProperty "self" prop_unify_self
 , testProperty "anonymous variables" prop_unify_anon
 , testProperty "symmetry" prop_unify_sym
 , testCase "evil unification" test_evil
 ]

-- SKIPPED THE TESTS THEMSELVES
-- -----------------------------------------------------------------------

If you'll scroll up to the example that's marked TEST-FRAMEWORK WRAPPER, you'll see how these test suites are used in practice. Note the small trick of using the qualified module name to identify the test suite.

Anyway, the general principle of having a per-module test suite comes from Aidan Delaney's Organising Unit Tests in Haskell. The main difference between his approach and mine is that I mix tests and code rather liberally.
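
To give a feel for what such a module might contain once the skipped parts are filled in, here is a toy version that mixes code and tests in one file. This is not GenI's unification: the Val type, the unify function and the three properties are invented for illustration, and the sketch assumes the QuickCheck 2 provider (test-framework-quickcheck2) rather than the provider imported in the example above.

```haskell
import Test.Framework (Test, defaultMain, testGroup)
import Test.Framework.Providers.QuickCheck2 (testProperty)

-- | Toy values: Nothing is an anonymous variable, Just n is a constant.
type Val = Maybe Int

-- | Toy unification: anonymous unifies with anything, constants only
--   with themselves.  The outer Nothing means failure.
unify :: Val -> Val -> Maybe Val
unify Nothing y = Just y
unify x Nothing = Just x
unify (Just a) (Just b)
  | a == b    = Just (Just a)
  | otherwise = Nothing

-- The tests live right next to the code they exercise.
prop_unify_self :: Val -> Bool
prop_unify_self x = unify x x == Just x

prop_unify_anon :: Val -> Bool
prop_unify_anon x = unify Nothing x == Just x

prop_unify_sym :: Val -> Val -> Bool
prop_unify_sym x y = unify x y == unify y x

-- | The one thing a test runner needs to import from this module.
testSuite :: Test
testSuite = testGroup "toy unification"
  [ testProperty "self" prop_unify_self
  , testProperty "anonymous variables" prop_unify_anon
  , testProperty "symmetry" prop_unify_sym
  ]

main :: IO ()
main = defaultMain [testSuite]
```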

Conclusion

I hope that some of these hints will make testing easier for you, or perhaps even get you started. If you still find yourself putting testing off, let me know. I'll be curious to see what else makes us resist. One thing that would probably be helpful is an extra guide to writing Arbitrary instances for QuickCheck, and also writing good properties that control the space well. Maybe even getting started with SmallCheck.

Note that I am still somewhat new to testing and have only recently started these practices. So take these ideas with the usual salt. Thanks to Greg, Reinier, Aidan, and also folks who commented on my previous posts.

2009-06-24

Haskell syntax highlighting on Wikipedia and Wikibooks

If you edit the Haskell Wikibook and Wikipedia entries with Haskell in them, you may be interested to note that Haskell syntax highlighting is now available on all Wikimedia projects.

Example:

<source lang="haskell">
-- foo
let x = foo
</source>

2009-06-08

testrunner for practical quickcheck

I had mentioned in a previous post three practical problems I had getting started with QuickCheck. My third question in this post was:

How do I make my tests easy to run? Do I have to write my own RunTests module? Should I just use something like quickcheck-script?

And one of the replies I got:

I'm sure people are writing tests, but we all hack up harnesses in our own idiosyncratic ways.... -- blackdog

Maybe we can do better. Instead of everybody hacking up their own harness, how about having one test harness that everybody wants to use? We may even have a candidate for such a harness. Reinier Lamers has recently released a "testrunner" package which supports some rather nice features:

It can run unit tests in parallel.
It can run QuickCheck and HUnit tests as well as simple boolean expressions.
It comes with a ready-made main function for your unit test executable.
This main function recognizes command-line arguments to select tests by name and replay QuickCheck tests.

That's all really good stuff, but I think the number one best feature for me would be the little tutorial on its homepage.

Testrunner is work that Reinier started in the context of the darcs project. We were trying to make our own custom test suite faster and more useful. Seeing ahead, Reinier did it not just by tweaking and tuning the harness we had, but by writing a more general-purpose harness that did the things we wanted and that other projects would hopefully want as well. So do you have a Haskell project that needs testing? Or maybe you already are doing some tests, but you just wish you could squeeze a little more out of them? Give testrunner a try!

Edit 2009-06-08 17:15
It turns out there is a second candidate, or rather a first candidate since test-framework has been around for months. Embarrassingly enough, I had started to use test-framework for my own stuff, but I never realised how feature complete it was. Maybe it'll be time to merge projects? I'll see what Reinier thinks. Apologies to Max...

2009-02-26

inkscape layers

Here's a small program that I wrote to extract a subset of layers from an Inkscape file. It may be handy if you have to give a talk and you want to include some "animated" overlays in your slides.

I'm writing this post because I'm pleased to be able to automate this process at last. Also, I want to demonstrate that you don't have to be particularly clever or ambitious to get some good practical use out of Haskell.

usage

So I've got my Inkscape file with a "base" layer and several steps of my animation "zero", "one", "two", "three".

If I do inkscape-layers myfile.svg base > /tmp/foo.svg && inkscape --export-pdf=/tmp/foo.pdf /tmp/foo.svg, I get just the base layer, which isn't very interesting:

Now if I do inkscape-layers myfile.svg base zero (and convert the resulting SVG into a PDF as above), I get the zeroth layer:

Likewise, to build the rest of my animation, inkscape-layers myfile.svg base one

inkscape-layers myfile.svg base two

Now instead of going clickity-click all over the place, I just dump this in my Makefile. If I ever have to change something about my animation (for example, in the base layer), I just run "make" and rebuild it automatically.

Yay, Haskell! Well, I'm sure you could just as easily have written this in your favourite programming language; I just like to randomly credit Haskell for making my life easier :-D

the code

I may upload this to Hackage if I can bundle some other useful Inkscape tools with it:

import Data.Maybe (fromMaybe)
import System.Environment (getArgs, getProgName)
import System.IO (hPutStrLn, stdout, stderr)
import Text.XML.Light

main =
 do args  <- getArgs
    pname <- getProgName
    case args of
      (f:ls) -> go f ls
      _      -> hPutStrLn stderr $ unwords [ "Usage:", pname, "filename", "layer1", "[layer2 [.. layer N]]" ]

go f ls =
 do d <- goodXML =<< parseXMLDoc `fmap` readFile f
    let o = stdout -- we may want to make this more flexible later
    hPutStrLn o . showTopElement . wrapTop walk $ d
 where
  goodXML = maybe (fail "bad XML") return
  --
  walk x@(Elem el) =
   let lbl = fromMaybe "" (findAttr qLABEL el)
       x2  = Elem $ el { elContent = map walk (elContent el) }
   in case () of _ | not (isLayer el) -> x2
                   | lbl `elem` ls    -> x2
                   | otherwise        -> Text blank_cdata
  walk x = x

isLayer el = elName el == qSVG "g" && findAttr qGROUP_MODE el == Just "layer"

qLABEL      = qInkscape "label"
qGROUP_MODE = qInkscape "groupmode"

qSVG l = QName l (Just nsSVG) Nothing
nsSVG = "http://www.w3.org/2000/svg"

qInkscape l = QName l (Just nsINKSCAPE) Nothing
nsINKSCAPE="http://www.inkscape.org/namespaces/inkscape"

wrapTop f e =
 case f (Elem e) of
 (Elem e) -> e
 _ -> error "programmer error: top content is not an element"

Note: as an exercise, modify the attributes of all exported layers so that they are visible. In Inkscape, I tend to make layers invisible so I don't get confused by them. But then Inkscape does not export them, which is annoying. This seems to be a simple matter of replacing "display:none" with "display:inline" in the style attribute (watch out, there could be more than one!). The 'split' library on Hackage could be handy for that.
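
Here is one possible starting point for that exercise. It is only a sketch: it works on the raw text of a style attribute, it uses a small hand-written splitter instead of the 'split' library, and the names splitOnChar and makeVisible are made up for illustration. Wiring it into walk is still left to the reader.

```haskell
import Data.List (intercalate)

-- Split a string on a separator character, keeping empty chunks.
-- (Hypothetical helper; the 'split' library offers richer versions.)
splitOnChar :: Char -> String -> [String]
splitOnChar c s =
  case break (== c) s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : splitOnChar c rest

-- Rewrite every "display:none" declaration in a style attribute value
-- to "display:inline", leaving all other declarations untouched.
makeVisible :: String -> String
makeVisible = intercalate ";" . map fixDecl . splitOnChar ';'
  where
    fixDecl decl
      | filter (/= ' ') decl == "display:none" = "display:inline"
      | otherwise                              = decl
```

For example, makeVisible "fill:red;display:none;stroke:none" gives "fill:red;display:inline;stroke:none".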

2009-02-21

implementing join in terms of (>>=)

One of the things I got out of the Typeclassopedia is a somewhat more mature understanding of monads (at last!). As a bonus side-effect it has also given me a slightly better understanding of myself. Specifically, I learned I often have trouble learning things because I suffer from a sort of "failure to unify". I thought I might make a note of it for the benefit of anybody else who is interested in how they learn... or not, as the case may be.

So,

we have (>>=) :: m a -> (a -> m b) -> m b
we want join :: m (m x) -> m x

My mind drew a complete blank. So I went with something "direct" via do notation:

join mmx =
 do mx <- mmx
    x  <- mx
    return x

Those last two lines are redundant:

join mmx =
 do mx <- mmx
    mx

Hang on, Eric, surely you don't need the crutch of do notation...

join mmx = mmx >>= (\mx -> mx)

That's just id:

join mmx = mmx >>= id

But wait! Surely that can't be right! Doesn't (>>=) require something of type a -> m b? And isn't id giving me m x -> m x? I stared at that for a while, almost panicking. What did I do wrong? And then it clicked. Of course, the a in a -> m b could stand in for any type, including m x. Just because it doesn't have a little m in it, doesn't mean that it's constrained not to have one.

A simpler version of this kind of error, although one that didn't get me this time: just because we have a and b doesn't mean we actually have to have two different types. They can be different, but they don't need to be. And that is my "failure to unify": inventing completely illusory constraints and not seeing through them.

And so join is just (>>= id). It took a little struggle, but it was well worth it!
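
To convince myself, here is a small sketch that spells out the unification. The names join' and bindAtMaybe are made up for this note (join' just avoids clashing with Control.Monad.join), and bindAtMaybe is nothing more than (>>=) with its type pinned down so that a is Maybe x and b is x.

```haskell
-- join written with (>>=), as derived above.
join' :: Monad m => m (m x) -> m x
join' mmx = mmx >>= id

-- (>>=) :: m a -> (a -> m b) -> m b, specialised by hand with
-- m = Maybe, a = Maybe x, b = x. The function argument then has
-- type Maybe x -> Maybe x, which is exactly what id provides.
bindAtMaybe :: Maybe (Maybe x) -> (Maybe x -> Maybe x) -> Maybe x
bindAtMaybe = (>>=)

exampleMaybe :: Maybe Int
exampleMaybe = join' (Just (Just 3))      -- Just 3

exampleNothing :: Maybe Int
exampleNothing = join' (Just Nothing)     -- Nothing

exampleList :: [Int]
exampleList = join' [[1, 2], [3]]         -- [1,2,3]
```

The fact that bindAtMaybe type checks at all is the whole point: nothing stops a from being instantiated to a type that itself contains m.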

(PS, in my original attempt, I used the more conventional m (m a) when thinking of the types instead of what I reported here, m (m x). The reason I reported the latter is that I didn't want to confuse the discussion with another stumbling block I have, which is a "failure to rename", i.e. forgetting that two things called a in different contexts are actually two separate things. It's like speaking a foreign language. Just because you are aware that you have to do something doesn't mean you will always do it automatically. Anyway, the "failure to rename" may very likely have conspired with the "failure to unify" in making me confused for a while.)

2009-02-16

announcing: burrito tutorial support group

It's really for the best if you leave these sorts of things out in the open.

The first step is to ask for forgiveness, right?

2009-02-04

practical quickcheck (wanted)

Despite all the glowing reports on how useful QuickCheck is, I find that I still have a lot of resistance to using it. A lot of resistance comes from uncertainty, so in this post, I'm going to write down some of my half-formulated questions about using QuickCheck.

Now, there may not be any right answer to these questions, but I'm writing them down anyway so that other people in my shoes know that they are not alone. Later on, as I find the answers that work for me, I'll hopefully put together some notes on 'Practical QuickCheck'.

Where should I put my properties? Xmonad and darcs seem to put them in a single properties module, but it would seem more natural to me to stick them in the same module as the functions I'm quickchecking. That said, I imagine that some properties can be thought of as being cross-module, so maybe a properties module would make sense.
How do I avoid redundancy, and generally repeating myself? Ideally, I would just write a property and be done with it. It would annoy me to have to keep updating some list of properties somewhere else (duplication). That said, maybe it's not really duplication if the list serves a secondary purpose of grouping the properties into some sensible hierarchy. Maybe the real question is "how do I make sure I don't forget to run all my properties?"
How do I make my tests easy to run? Do I have to write my own RunTests module? Should I just use something like quickcheck-script?
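
To make the second and third questions concrete, here is roughly the kind of hand-rolled harness I have in mind. It is a sketch only: it deliberately does not use QuickCheck (plain Bool checks stand in for real properties), and every name in it (NamedTest, allTests, selectTests, report) is made up for illustration.

```haskell
import Data.List (isInfixOf)

-- A test is just a name paired with a boolean check.
type NamedTest = (String, Bool)

-- The single list that has to be kept up to date by hand.
-- This is the "duplication" the second question worries about.
allTests :: [NamedTest]
allTests =
  [ ("reverse/involution", reverse (reverse [1, 2, 3 :: Int]) == [1, 2, 3])
  , ("length/append",      length ([1, 2] ++ [3 :: Int]) == 3)
  ]

-- Keep only the tests whose name contains the given pattern.
selectTests :: String -> [NamedTest] -> [NamedTest]
selectTests pat = filter ((pat `isInfixOf`) . fst)

-- Names of the tests that did not hold.
failures :: [NamedTest] -> [String]
failures ts = [name | (name, ok) <- ts, not ok]

-- One-line summary suitable for printing from a main function.
report :: [NamedTest] -> String
report ts =
  case failures ts of
    []  -> "all " ++ show (length ts) ++ " tests passed"
    bad -> "FAILED: " ++ unwords bad
```

From GHCi, putStrLn (report allTests) runs everything and putStrLn (report (selectTests "reverse" allTests)) runs a subset. The list allTests is where a forgotten property would silently go missing.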

I might update this list later as I think of more "best practices" questions. Hopefully I can follow this up with a short article teaching myself and others that really getting started with QuickCheck is easy easy easy (or maybe a link to a pre-existing article of the sort). The Real World Haskell chapter on it seems helpful.

2009-01-30

haskell-ji

As a programmer, I find myself struggling with a lot of really mundane and stupid-looking issues like "how should I name my variables", or "should acronyms be kept upper case (XML), or smooshed down for easier CamelCasing (Xml)?" and finally "what order should my code go in?"

These questions do not so much keep me up at night as cause me an inordinate amount of flip-flopping in my code. Not remembering my preference du jour, I'll sometimes do things four different ways in code and later on suffer because I forgot that in one bit of code, I had named something parseXML and in the other bit, I had named it xmlParse.

The good news is that things are settling down on at least one front. It seems that all the versions of Eric past and present are settling on a consensus on How To Lay Code Out. The result is a set of directional tips, akin to the kind of thing you learn when you are writing Chinese Hanzi (Japanese Kanji):

Types before code
High-level before low-level -- For example, generally using where instead of let...in, but also "higher-level" functions first, "detail" functions later
Input before output -- It's not that this was ever up for debate, it's just that sometimes, I'll write it the other way without realising that I'm doing it.
Odds and ends last -- At the very end of my code: an odds-and-ends section for all those little snippets of code that you copy around but that are too small to justify making a library, e.g.
```
import Data.Function (on)
import Data.List (groupBy, sortBy)

-- Group elements by key; buckets come out in ascending key order
buckets :: Ord b => (a -> b) -> [a] -> [ (b,[a]) ]
buckets f = map (\xs -> (f (head xs), xs))
        . groupBy ((==) `on` f)
        . sortBy (compare `on` f)
```
Do you have an odds-and-ends.hs file on your computer?
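
For the curious, here is what buckets does in practice. The block repeats the definition with the imports it needs so that it stands alone; the example values (byLength, byParity) are made up for illustration.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)

-- Same definition as in the odds-and-ends tip above.
-- head is safe here because groupBy never produces an empty group.
buckets :: Ord b => (a -> b) -> [a] -> [(b, [a])]
buckets f = map (\xs -> (f (head xs), xs))
          . groupBy ((==) `on` f)
          . sortBy (compare `on` f)

-- Bucket some words by their length.
byLength :: [(Int, [String])]
byLength = buckets length ["a", "bb", "c", "dd", "eee"]
-- [(1,["a","c"]),(2,["bb","dd"]),(3,["eee"])]

-- Bucket some numbers by parity (False sorts before True).
byParity :: [(Bool, [Int])]
byParity = buckets odd [1, 2, 3, 4]
-- [(False,[2,4]),(True,[1,3])]
```

Since sortBy is a stable sort, the elements inside each bucket keep their original relative order.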

Notice that the tips are not always compatible with each other, but they do sort of point in the same general direction.

Phew, I'm glad I'm starting to get at least this bit sorted. I really hope it reduces the amount of pointless erician flip-flopping. It's no big deal -- civilisation does not collapse because of inconsistent case conventions -- but it is a nuisance. This kind of thing is on the order of silly American-style dates vs. European-style dates causing confusion, where we could all just be using International yyyy-mm-dd dates, and while we're at it, 24 hour time, the metric system and A4 paper...

2009-01-08

fold diagram revisited?

z
|
f----1----f
|    :    |
f----2----f
|    :    |
f----3----f
|    :    |
f----4----f
|    :    |
f----5----f
     :    |
     []   z
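
One way to read the diagram, assuming the left column is foldl and the right column is foldr over the list in the middle: on the left, z enters at the top and meets 1 first; on the right, z enters at the bottom and meets the last element first. The sketch below makes that visible by building the expression as a string. The names showL and showR are made up, and the list is cut down to three elements to keep the output short.

```haskell
-- Combine an accumulator with an element, foldl-style.
showL :: String -> Int -> String
showL acc x = "(f " ++ acc ++ " " ++ show x ++ ")"

-- Combine an element with an accumulator, foldr-style.
showR :: Int -> String -> String
showR x acc = "(f " ++ show x ++ " " ++ acc ++ ")"

-- Left column of the diagram: z starts at the top.
leftFold :: String
leftFold = foldl showL "z" [1, 2, 3]
-- "(f (f (f z 1) 2) 3)"

-- Right column of the diagram: z starts at the bottom.
rightFold :: String
rightFold = foldr showR "z" [1, 2, 3]
-- "(f 1 (f 2 (f 3 z)))"
```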
