Hobby-hacking Eric

2008-12-30

riot is almost a Haskell mail client

In case anybody wanted to write a mail client in Haskell, I should point out that Tuomov's riot (Riot is an Information Organisation Tool) outliner
  • provides a sort of mutt-like user interface and
  • stores its outlines as mailboxes (using in-reply-to to treat outline ancestry as thread ancestry).

So that's some of the work done for you :-)


2008-11-19

iterative committing

A few weeks ago, I saw this interesting complaint about distributed revision control advocacy:
But really, to read some of these articles, you'd think 99.9% of OSS contributions come from people who live on planes, only get 10% uptime on their broadband at home, and are incapable of spending the five minutes required to install something like Subversion locally for use with side projects.
This particular complaint resonated with me because I've always had a slight feeling that all this talk of airplanes and intermittent online access is missing the point.

I think what would help these discussions is to introduce the idea that there are really two ways to be disconnected: the involuntary way that most people talk about, and the voluntary way which is the really interesting one.

To be involuntarily disconnected is to be literally or technically offline. The universe prevents you from phoning home because it broke your wifi card or plunked you deep in an Amazonian rain forest. True, a distributed revision control system lets you continue hacking in the face of such adversity; but this fact isn't very convincing to some folks who are used to centralised revision control. How often in today's world are you really involuntarily offline? The trick is that sometimes your disconnectedness is entirely voluntary. I don't really mean that in the sense of unplugging your cable modem and calling a moratorium on network access for the day. The minute you want to commit to a server and can't because of missing network access, you are offline involuntarily, even if this came as the result of a voluntary decision.

What I mean is that distributed revision control allows you to have pockets of deliberate disconnectedness from your peers. You want to work on something in little bits and pieces, you want to version control your work in progress, but you don't want to inflict your uncompleted work on your friends. A distributed VCS gives you a chance to step back for a moment and continue working with the benefit of version control. There are two alternatives to stepping back, neither of which is really acceptable. The first is to go ahead and commit your stuff with wild abandon, with the consequence that you pollute the change history with unfinished work and potentially make life difficult for your friends. The second alternative is NOT to commit your stuff at all, with the consequence that you lose the ability to track and log your work as you go along.

A distributed revision control system gives you the choice of iterative committing. Actually, it doesn't really matter whether you are online or offline. Sometimes you just want to commit something for your own sake and only later decide if the commits should be shared with the main repository or not. In the meantime you can choose to go back, undo a commit, redo a commit, undo all your intermediary commits and lump them all into one big commit, update from the main repository and then rework your commit in the new context. These are the choices that a distributed revision control system offers.

It's heartening to see that the idea of using a distributed VCS is catching on, that people are starting to adopt the likes of darcs, git, mercurial and bzr for their work. It means that the joy of iterative committing is spreading. Of course, I am partial to one of these systems in particular. Perhaps in a future article, I can describe what I think is the essential difference between darcs and our estimable competitors. I think I will call it iterative merging.

Happy committing in the meantime!


2008-11-07

timesheet helper

I wish there was a simple, no-fuss command line timesheet helper in the spirit of cabal-install and twidge. The kind of interactions I imagine are:

09:00 # timesheet start work draft 3 of the paper
10:00 # timesheet start darcs dwn
10:45 # timesheet start work regression test for ppack
12:00 # timesheet stop
12:30 # timesheet start darcs roadmap
13:15 # timesheet start work regression test for ppack
16:30 # timesheet start darcs patch review
17:00 # timesheet start work meeting
18:30 # timesheet stop

18:30 # timesheet summary
Today 2008-11-07
-------------------
darcs: 2h
work: 7h

18:30 # timesheet details
Today 2008-11-07
-------------------
darcs: 2h
* dwn: 45m
* roadmap: 45m
* patch review: 30m
work: 7h
* draft 3 of the paper: 1h
* regression test for ppack: 4h 30m
* meeting: 1h 30m

(Note the assumption here that you are never working on two tasks at the same time; clocking in to a new task automatically clocks you out of the old one.) The key to this application is simplicity. In its present state, gpe-timesheet (0.32) uses too many confirmation dialogues to be really useful. Loggr is nice and simple, but if I close my browser window, I lose track of things. Another property I would like to have is for the application to be forgiving of mistakes. If it stored timesheets in a simple text format, for example, I could just edit out my mistakes in a text editor.
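To make the clock-in/clock-out rule concrete, here is a minimal Haskell sketch of how the summary and details views could be computed. Everything in it is hypothetical: the Entry type, the function names, and the assumption that each log line has already been parsed into a time plus an optional (project, task) pair. Each task is simply credited with the time until the next entry, since starting a new task stops the old one.

```haskell
import qualified Data.Map as Map

-- Hypothetical parsed log line: minutes since midnight, plus Just (project, task)
-- for "timesheet start" or Nothing for "timesheet stop"
type Entry = (Int, Maybe (String, String))

-- "09:00" -> 540 (no error handling in this sketch)
minutesOf :: String -> Int
minutesOf s = let (h, m) = break (== ':') s in read h * 60 + read (drop 1 m)

-- Starting a task implicitly stops the previous one, so each task runs until
-- the next entry of any kind. "stop" entries contribute no time, and a task
-- still running at the end of the log is not counted.
durations :: [Entry] -> [((String, String), Int)]
durations es =
  [ (task, end - start) | ((start, Just task), (end, _)) <- zip es (drop 1 es) ]

-- Minutes per project, as in "timesheet summary"
summary :: [Entry] -> Map.Map String Int
summary es = Map.fromListWith (+) [ (project, mins) | ((project, _), mins) <- durations es ]

-- Minutes per (project, task), as in "timesheet details"
details :: [Entry] -> Map.Map (String, String) Int
details = Map.fromListWith (+) . durations
```

Because the whole state is just a list of timestamped lines, a plain text log that you fix up by hand would map onto it directly.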

For Haskellers, I also wish that we had a common library for writing command line applications with subcommands and switches. This would be useful for darcs, cabal-install, twidge, this timesheet application, and more.
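In the meantime, base's System.Console.GetOpt gets you part of the way. Here is a rough sketch, assuming a hypothetical timesheet command set: the first argument picks the subcommand, and GetOpt sorts the remaining arguments into switches and plain words.

```haskell
import System.Console.GetOpt

-- Hypothetical switches, shared by all subcommands in this sketch
data Flag = Verbose | Date String
  deriving (Show, Eq)

options :: [OptDescr Flag]
options =
  [ Option ['v'] ["verbose"] (NoArg Verbose) "print more detail"
  , Option ['d'] ["date"] (ReqArg Date "DATE") "which day to report on"
  ]

-- Hypothetical subcommands, mirroring the timesheet interactions
data Command = Start String | Stop | Summary | Details
  deriving (Show, Eq)

-- The first word picks the subcommand; GetOpt handles the switches,
-- and (with Permute) switches may appear anywhere after it
parseCommand :: [String] -> Either String ([Flag], Command)
parseCommand [] = Left "no subcommand given"
parseCommand (sub : rest) =
  case getOpt Permute options rest of
    (flags, args, []) -> fmap (\cmd -> (flags, cmd)) (toCommand sub args)
    (_, _, errs) -> Left (concat errs)
  where
    toCommand "start" ws@(_ : _) = Right (Start (unwords ws))
    toCommand "stop" [] = Right Stop
    toCommand "summary" [] = Right Summary
    toCommand "details" [] = Right Details
    toCommand other _ = Left ("unknown or malformed subcommand: " ++ other)
```

A shared library would presumably go further (per-subcommand switches, generated help via usageInfo), but even this much is the kind of boilerplate each program currently reinvents.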


2008-10-30

official darcs blog!

Darcs weekly news has moved! It will now be hosted on the official darcs blog at http://blog.darcs.net.

The latest entry, darcs weekly news #10, has been posted on the new blog.


2008-10-26

darcs hacking sprint - Team Brighton Day 2

Ganesh and Ian, slurpies and curl


More Important Looking Things for the whiteboard (faster slurpies, courtesy of Ganesh)


Team Brighton. (having worked out the auto-timer on Eric's camera)


Sprint on!


Sprint wrap-up later...


2008-10-25

darcs hacking sprints - some pictures from Team Brighton

Just a little update from day 1. Who's doing what?



Ganesh (Heffalump) profiling away and drinking coffee from a University of Brighton mug!




Ian (Igloo) looking serious and Campy



Eric (kowey) unscattering his brain



Healthy hacking (Maltesers conveniently obscured by kettle)



Hope to have a report up after the sprint!


2008-10-23

darcs weekly news #9

News and discussions

  1. Enfranchising darcs! An update on the build systems question
  2. Darcs hacking sprint in 2 days!
  3. What does it mean to commute? Darcs hackers like to talk about 'commuting' patches all the time. But what does that mean? Jason explains and provides a tiny bit of code for us to play with.
  4. darcsweb 1.1-rc1. Alberto Bertogli reports a release candidate for darcsweb 1.1, with support for darcs 2 repositories, and syntax highlighting if the pygments module is available.
  5. First impressions of darcs. A Pythonista named Benjamin tries darcs out for the first time. Here are his likes and dislikes.
  6. Choosing a revision control system. Daniel Carrera compares darcs with Monotone, Mercurial and Bazaar. Daniel finds our "brilliant patch management" to be unique, but what can we learn from the others?
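For readers wondering about item 3, here is a toy illustration of commuting. It is my own sketch, not Jason's code and not darcs's real patch types: two patches that each insert lines into a file. Commuting "p1 then p2" means finding "p2' then p1'" with the same overall effect; the line offsets get adjusted, and when p2 lands inside or right next to what p1 added, this sketch treats the two as dependent and refuses.

```haskell
-- Hypothetical toy patch: insert the given lines before position n (0-based)
data Ins = Ins Int [String]
  deriving (Show, Eq)

apply :: Ins -> [String] -> [String]
apply (Ins n ls) xs = take n xs ++ ls ++ drop n xs

-- commute (p1, p2), where p1 is applied first and then p2.
-- Returns (p2', p1') such that applying p2' then p1' gives the same file,
-- or Nothing when p2 depends on p1.
commute :: (Ins, Ins) -> Maybe (Ins, Ins)
commute (Ins n1 ls1, Ins n2 ls2)
  -- p2 is entirely after p1's lines: pull p2 back by p1's length
  | n2 > n1 + length ls1 = Just (Ins (n2 - length ls1) ls2, Ins n1 ls1)
  -- p2 is entirely before p1: p1 gets pushed forward by p2's length
  | n2 < n1 = Just (Ins n2 ls2, Ins (n1 + length ls2) ls1)
  -- touching or overlapping: treat as dependent
  | otherwise = Nothing
```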

Reviewers

  • Jason Dagit

New contributors

  • Christian Kellermann
  • Salvatore Insalaco
  • J. Garrett Morris

Issues resolved in the last week (1)

issue784 Salvatore Insalaco

Patches applied in the last week (66)


See text entry for details.


2008-10-17

darcs weekly news #8

News and discussions

  1. Improving the darcs build system? David Roundy is doing some interesting work on building darcs with his franchise build system. There are also attempts by other folks to Cabalise darcs. Discussions are underway about the future of building darcs.
  2. Type Correct Changes: A Safe Approach to Version Control Implementation. Jason Dagit gave a Galois tech talk on the use of Haskell GADTs to make darcs code more transparent, robust and approachable.
  3. Haskell, static typing, type witnesses and darcs. David Roundy gave a darcs talk at the ACM (5 October), presenting darcs and also explaining how the type witnesses are helping us to avoid errors in the code.
  4. Darcs hacking sprint only 9 days away!

Reviewers

Thanks to our patch reviewers for this week for giving David a hand!

  • Trent W. Buck
  • Jason Dagit
  • Nathan Gray
  • Simon Michael

Issues resolved in the last week (3)

issue1062 Eric Kow
issue1105 Dmitry Kurochkin
issue1139 David Roundy

Patches applied in the last week (96)


See text entry for details.


2008-10-10

darcs 2.1.0 released!

I am delighted to announce the release of darcs 2.1.0, available at

http://darcs.net/darcs-2.1.0.tar.gz

What has changed?

This version provides over 20 bug fixes and 7 new features since darcs 2.0.2. The most notable changes are:

  • Defaulting to darcs-2. The darcs initialize command now creates darcs-2 format repositories by default. This change will make the improved conflict handling and merging semantics from darcs 2 available to more users. Note that no action is required on your part. Darcs will continue working with all pre-existing repositories. You can explicitly request an old-fashioned repository if needed.

  • Better HTTP support. Dmitry Kurochkin has refined our HTTP support and fixed several HTTP-related bugs from darcs 2.0.2. There is also an experimental --http-pipelining feature you can enable on the command line (or in your defaults file) for faster downloading. Note: --http-pipelining is enabled by default for libwww, and also for libcurl 7.19.1 (not yet released at the time of this writing).

  • Repository correctness. David Roundy has resolved a longstanding 'pending patch' regression (originally reported in February 2008). Needless to say, the offending case has been moved to our regression testing suite.

See the attached ChangeLog for more details.

What should I do?

Upgrade! Binary versions should be available shortly, either from your favourite package distributor or by third party contributors.

Other than installing the new darcs, no action is required on your part to perform this upgrade. Darcs 2, including this particular version, is 100% compatible with your pre-existing repositories.

If you have not done so already, you should consider using the hashed repository format in place of your current old-fashioned repositories. This format offers greater protection against accidental corruption and better support for case-insensitive file systems. It also provides some very nice performance features, including lazy fetching of patches and a global cache (both optional).

If darcs 1 compatibility is not a concern, you could also upgrade your repositories all the way to the darcs 2 format. In addition to the robustness and performance features above, this gives you the improved merging semantics and conflict handling that give darcs 2 its name.

More details about upgrading to darcs 2 here:

http://wiki.darcs.net/index.html/DarcsTwo

What comes next?

We will now be shifting to a time based release model, with the next darcs release planned for January 2009.

For the next release of darcs, we will be focusing on darcs's day-to-day performance issues. We want darcs to fetch repositories as fast as it possibly can over a network, and we especially want to rehabilitate known slow commands like darcs annotate. We believe that a few simple and practical changes can really improve the darcs experience for most users.

Think you can help? We would love to hear from you. In fact, the first darcs hacking sprint (25-26 October) is fast approaching! We have three venues available: Brighton, Paris and Portland and everybody is invited to come hack. See http://wiki.darcs.net/index.html/Sprints for details.

Thanks everybody, and enjoy!



darcs weekly news #7

News and discussions

  1. Darcs 2.1.0 released! With 20 bug fixes and 7 new features. Notable changes: darcs-2 repositories by default, HTTP robustness and better pending patch handling.
  2. Optimising darcs annotate. Darcs annotate is too slow. Proposed solution: create a cache mapping filenames to patches. Stay tuned for fast annotate in the future...
  3. Eleven new contributors since darcs 2.0.2. Thanks, Alex, Florent, Gaetan, Judah, Matthias, Max, Nathaniel, Steve, Taylor, Thorkil, and Vlad!
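As a rough idea of what item 2's cache could look like, here is a hypothetical Haskell sketch (not darcs code): an index from each filename to the patches that touch it, so that annotate would only need to look at those patches. It ignores renames and uses plain strings as stand-ins for patches.

```haskell
import qualified Data.Map as Map

-- Hypothetical stand-in: a patch is just a name plus the files it touches
type PatchName = String

-- Index every file to the patches that touch it, oldest first.
-- (Renames and moves are ignored in this sketch.)
buildIndex :: [(PatchName, [FilePath])] -> Map.Map FilePath [PatchName]
buildIndex patches = Map.map reverse (foldl addPatch Map.empty patches)
  where
    -- prepend each patch, then reverse once at the end to restore repository order
    addPatch idx (p, files) = foldl (\m f -> Map.insertWith (++) f [p] m) idx files

-- The only patches annotate would need to examine for a given file
patchesTouching :: FilePath -> Map.Map FilePath [PatchName] -> [PatchName]
patchesTouching = Map.findWithDefault []
```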

Reviewers

Thanks to our patch reviewers for this week for giving David a hand!

  • Trent Buck
  • Tommy Pettersson

Issues resolved in the last week (6)

issue1104 Dmitry Kurochkin
issue1109 Dmitry Kurochkin
issue1111 Tommy Pettersson
issue1124 Thorkil Naur
issue1128 Benjamin Franksen
issue1131 Dmitry Kurochkin

Patches applied in the last week (35)


See text entry for details.


2008-10-02

darcs weekly news #6

News and discussions

  1. Third pre-release of darcs 2.1.0. Release pushed back to 17 October latest for more testing. We're getting very close to the finish line!
  2. Darcs ideas in other VCS. Kirill Smelkov has kind words for us on behalf of the NAVY project, which is moving away from darcs. Best of luck to Kirill with whatever revision control system NAVY choose! While we are delighted that "Good ideas behind [darcs] were adopted by youth", we still have a thing or two to show these whippersnappers.
  3. Haddock + Hoogle == Javadoc on steroids. Simon Michael has combined haddock and hoogle to give us a lovely darcs code browser. In the meantime, Florent Becker has been adding value to this browser by sending in lots of haddock patches. Many thanks to Simon and Florent!
  4. Patch theory update. Ian gives us his latest progress on documenting, prototyping and improving darcs patch theory. "[S]ome proofs are finally starting to appear, albeit rather handwavey for now". Go Ian!

Reviewers

Thanks to our patch reviewers for this week for giving David a hand!

  • Simon Michael

Issues resolved in the last week (5)

issue1003 David Roundy
issue1043 David Roundy
issue1078 Dmitry Kurochkin
issue1102 Eric Kow
issue1110 David Roundy

Patches applied in the last week (47)

See text entry for details.


2008-09-25

darcs weekly news #5

News and discussions

  1. Second pre-release of darcs 2.1.0 (formerly known as 2.0.3). This version of darcs will produce darcs-2 format repositories by default.
  2. New issue manager - Thorkil Naur. The darcs team now has an official Issue Manager role. Thorkil will be ensuring that incoming reports are responded to in a timely manner, and that all bugs are eventually moved to a resolved state.
  3. Hoogling the darcs source?

Issues resolved in the last week (5)

issue27 David Roundy
issue53 Eric Kow
issue805 David Roundy
issue1039 Dmitry Kurochkin
issue1041 Vlad Dogaru

Patches applied in the last week (54)

See text entry for details.


2008-09-18

darcs weekly news #4

News and discussions

  1. First pre-release of darcs 2.0.3. This version of darcs has some very nice bug fixes on offer. A few more user-friendliness tweaks are planned for the actual release.
  2. Third venue confirmed for darcs hacking sprint, 25-26 October. Brighton, Portland and now Paris are all CONFIRMED. Come hack with us!
  3. code.haskell.org upgrades to darcs 2! /usr/bin/darcs is now darcs 2.0.2 on this server. No action is needed on the user's part.
  4. Retiring GHC 6.4. Nobody seems to be using GHC 6.4 to compile darcs after all, so we shall be dropping support for it.

Reviewers

Thanks to our patch reviewers for this week for giving David a hand!

  • Jason Dagit
  • Nathan Gray
  • Trent W. Buck

Issues resolved in the last week (6)

issue691 Dmitry Kurochkin
issue709 David Roundy
issue885 David Roundy
issue1012 David Roundy
issue1054 Dmitry Kurochkin
issue1057 David Roundy

Patches applied in the last week (86)

See the 2008-09-17 text entry for details.


2008-09-10

darcs weekly news #3

News and discussions

  1. Venues confirmed for the darcs hacking sprint, 25-26 October. Brighton and Portland are CONFIRMED; Paris is likely. Come hack with us!
  2. Planning darcs 2.0.3. We have started making steps towards a release for the end of September. Eric thinks we are only a buildbot and a couple of bugfixes away from a prerelease.
  3. Darcs patch theory. Ian Lynagh continues his patch theory research. He has written up a nice explanation and a working prototype of a darcs-like patch theory.
  4. Retiring GHC 6.4. The darcs team would like to know if anybody is still using GHC 6.4 to compile darcs, so that we can focus on later versions (6.6 and 6.8).

Reviewers

Thanks to our patch reviewers for this week for giving David a hand!

  • Jason Dagit
  • Trent Buck

Issues resolved in the last week (7)

issue844 David Roundy
issue924 Eric Kow
issue1015 Ganesh Sittampalam
issue1037 Dmitry Kurochkin
issue1049 David Roundy
issue1050 Eric Kow
issue1063 Eric Kow

Patches applied in the last week (72)

See the darcs weekly news #3 email for the full list.


2008-09-06

darcs hacking sprint (25-26 October 2008)

Some news on the darcs hacking sprint. We have at least two venues confirmed, hopefully three shortly.

Venues

We plan to host the sprint across three sites:

  • CONFIRMED: Brighton, UK (University of Brighton)
  • CONFIRMED: Portland, USA (Galois)
  • likely: Paris, France (Université Paris Diderot)

So if you were waiting to book tickets, this is the time!

For more details, please see http://wiki.darcs.net/index.html/Sprints

Agenda

During this first sprint, we shall be focusing our attention on the day to day performance issues that darcs users commonly face.

This is what we are reaching for:

  1. Fast network operations. We want to make it very pleasant for users to darcs get a repository and pull some patches to it over http and ssh. Git does this very well, and we plan to learn from them.

  2. Cutting memory consumption. We want to profile the heck out of operations like darcs record, darcs convert and darcs whatsnew. What's eating up all the memory? And how can we cut it down to size?

  3. Responsiveness. Sometimes basic darcs commands can take long enough for programmers to lose their train of thought. We want to track down these lost seconds and kill that dreaded context switch.

Of course, if you are interested in other areas, then you can work on those instead.

Note that if you are new to the darcs code or to Haskell, there will also be a lot of interesting jobs for you to get started with. Everyone will have something to hack on, so come join us!

Thanks very much to the University of Brighton, Galois and University of Paris VII for their generous offers.

Hope to see you there, everyone! :-)



2008-09-03

darcs weekly news #2

News and discussions

  1. Growing the darcs team: The darcs unstable repository is coming back, with David Roundy as its maintainer. Eric will be taking care of stable and keeping it closely in sync.
  2. Shiny new IRC logs: Thanks to Moritz Lenz, the #darcs and #darcs-theory IRC channels are now being logged with fancy formatting and search capability.
  3. Hacking darcs: Petr Ročkai shares his recent hopes and experiences as a darcs user turned developer. Come share the excitement!

Reviewers

Thanks to our patch reviewers for this week for giving David a hand!

  • Jason Dagit
  • Nathan Gray
  • Eric Kow
  • Petr Ročkai

Issues resolved in the last week (1)

issue966 Dmitry Kurochkin

fix apply_inv_to_matcher_inclusive. http://bugs.darcs.net/issue966

Patches applied in the last week (37)

2008-08-31 David Roundy
  • don't show ssh stderr output unless we're passed --debug.
  • fix bug in --list-options (tab completion).
  • fix bug in makeRelative.
2008-08-30 Ganesh Sittampalam
  • add warning to configure about Haskell zlib speed
  • make use of Haskell zlib dependent on bytestring
  • add option to use Haskell zlib package
2008-08-22 Eric Kow
  • Remove unused FileSystem module.
  • Add a link to a repository browser for darcs's code.
2008-08-29
  • Replace grep invocation by perl code
2008-08-24 David Roundy
  • clean up network/get.sh test.
  • fix type of withRepository and friends.
  • fix recent bug in --list-options.
2008-08-28 Dmitry Kurochkin
  • Check for package random on windows, used in Ssh module.
  • Debug messages in curl module.
2008-08-28 David Roundy
  • TAG working version.
2008-08-27 Dmitry Kurochkin
  • Use InclusiveOrExclusive instead of Bool in apply_inv_to_matcher.
2008-08-27 David Roundy
  • add more modules to make witnesses.
2008-08-27 Jason Dagit
  • updates to Darcs.Patch.Unit for type witnesses
2008-08-27 Dmitry Kurochkin
  • Refactor get_matcher and apply_inv_to_matcher functions from Darcs.Match module.
  • Resolve issue966: fix apply_inv_to_matcher_inclusive.
  • Simplify withCurrentDirectory.
2008-08-27 Jason Dagit
  • updates to Sealed.lhs to support type witness refactor in commands
  • updates to Ordered.lhs to support type witness refactor in commands
  • make Annotate.lhs compile with type witnesses
2008-08-27 David Roundy
  • fix type witnesses in Internal.
2008-08-27 Jason Dagit
  • updates to Repository.Internal to fix conflicts and support type witness refactor in commands
  • fix error in Properties due to new commuteFL
  • fix minor type witness compile error with new commuteFL
  • fix conflicts with get_extra changes
  • improve reporting for bug in get_extra
  • Finish refactor of Unrevert as well as making it pass double-unrevert.sh
  • add double-unrevert.sh test
  • partial type witnesses in Unrevert
2008-08-26 Eric Kow
  • More ChangeLog entries since 2.0.2
2008-08-27 David Roundy
  • fix bug in defaultrepo.
2008-08-26 Jason Dagit
  • fix accidental reversal in tentativelyAddToPending
  • minor refator to get_extra improve comments


2008-07-30

simple random numbers in Haskell

Random numbers are the kind of thing I use rarely enough that by the time I want to use them, I have forgotten the relevant details, but frequently enough that I get annoyed whenever it happens.

Hopefully these notes will be useful to somebody in a similar situation.

two things to know

(1) import System.Random

(2) randomIO :: Random a => IO a


The one function you really need to know about is randomIO (the type of this function is Random a => IO a; don't worry if you do not understand the type, it suffices to know that it involves IO). In this example, we generate and use a random Int:
import System.Random

main :: IO ()
main =
  do r <- randomIO
     print (r + 1 :: Int)
-- Note re the ':: Int' above: Haskell can't figure out from
-- the context exactly what type of number you want, so we
-- constrain it to Int
One neat feature is that you can randomly generate anything that implements the Random typeclass. In the example below, we generate a random Bool. Notice how we do not do anything differently, except to treat the result as a Bool (i.e. by applying not to it).
import System.Random

main :: IO ()
main =
  do r <- randomIO
     print (not r)


A useful exercise, if you know about typeclasses, is to implement Random for one of your own types. The toEnum function may be useful.
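If you want to check your answer, here is one possible solution, sketched for a made-up Suit type: it borrows Int's randomR and converts back and forth with fromEnum and toEnum. It defines both random and randomR explicitly, which I believe should work across versions of the random package.

```haskell
import System.Random

-- A made-up type to generate randomly
data Suit = Clubs | Diamonds | Hearts | Spades
  deriving (Show, Eq, Enum, Bounded)

instance Random Suit where
  -- pick an Int in the corresponding range, then turn it back into a Suit
  randomR (lo, hi) g =
    let (i, g') = randomR (fromEnum lo, fromEnum hi) g
    in (toEnum i, g')
  -- an unconstrained Suit is one anywhere in the full range
  random = randomR (minBound, maxBound)
```

With the instance in place, randomIO :: IO Suit works exactly like the Int and Bool examples.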

more advanced stuff

  1. you can use randomRIO :: Random a => (a,a) -> IO a to generate random numbers constrained within a range
  2. Instead of using the functions randomIO and randomRIO, you can separate obtaining a random number generator from using it. Doing so allows you to minimise your reliance on the IO monad. It also makes your code easier to debug, because you can opt to always pass the same generator to it and make life much more predictable. See the functions random and randomR for details.
  3. a potentially handy trick is to generate an infinite list of random numbers, which you can then pass to a function. See the randoms function for details.
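Here is a small sketch of all three, using dice rolls as an arbitrary example. The names rollDie, rollTwo, rolls and rollMany are my own; randomRIO, randomR, randomRs, mkStdGen and newStdGen come from System.Random.

```haskell
import System.Random

-- (1) randomRIO: a number within a range, straight from IO
rollDie :: IO Int
rollDie = randomRIO (1, 6)

-- (2) randomR with an explicit generator, threaded by hand:
-- the same generator always gives the same rolls
rollTwo :: StdGen -> ((Int, Int), StdGen)
rollTwo g0 =
  let (a, g1) = randomR (1, 6) g0
      (b, g2) = randomR (1, 6) g1
  in ((a, b), g2)

-- (3) randomRs: an infinite list of values in a range
rolls :: StdGen -> [Int]
rolls = randomRs (1, 6)

-- Touch IO once to get a generator, then stay pure
rollMany :: Int -> IO [Int]
rollMany n = do
  g <- newStdGen
  return (take n (rolls g))
```

For debugging, swapping newStdGen for a fixed mkStdGen seed makes every run repeatable.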


Edit: fixed s/randomR/randomIO/


2008-07-27

pandoc gets mediawiki support

Pandoc is a universal document converter. You feed it documents in one format (say, HTML) and it spits them out in another one (say, ODF). Assuming it works correctly, Pandoc has the potential to replace all those little one-to-one converters (e.g. latex2html) in my toolbox. Just the one simple Pandoc.

And now, thanks to fiddlosopher (John MacFarlane?), it knows how to write Mediawiki files! Mediawiki? That's the syntax/software that powers Wikipedia, Wikibooks, and a whole slew of organisational or community wikis (like HaskellWiki).

Hey, Haskellers probably have a lot of LaTeX documents lying around. Maybe this is their chance to get them onto the Haskell wiki?

We're halfway to being able to do a roundtrip between LaTeX and Mediawiki! All we need is for somebody [maybe John :-D] to implement a Mediawiki reader for Pandoc and things could get mighty interesting... Oh, and if anybody is working on a wiki with direct LaTeX support, hats off to you! Sometimes Mediawiki is a fact of life, though.


2008-07-23

rose zipper on hackage

I guess this isn't big enough to go on the haskell@ mailing list: I have uploaded Krasimir Angelov and Iavor S. Diatchki's Data.Tree implementation of zippers onto hackage. The package is called rosezipper and it is available under the BSD3 license.

For the interested, "The Zipper is an idiom that uses the idea of “context” to the means of manipulating locations in a data structure." (Haskell wiki).

For me, zippers are just a very nice way to navigate and edit trees. By "nice", I mean elegant, efficient and purely functional. Before learning about zippers, I only knew how to navigate trees from top to bottom; if I wanted to go back up a node, or visit a sibling node, I basically had to start over from the root. Zippers allow me to walk the tree in any direction, visiting a node's parent, children and siblings without starting over from the top. This kind of thing is especially handy for Natural Language Processing people and, basically, anybody who eats trees for a living.
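To give a feel for the idea, here is a bare-bones zipper over Data.Tree written from scratch. It is only an illustration; it is not the rosezipper API, and the names (Loc, Crumb, firstChild, nextSibling, parent, root and so on) are hypothetical. The list of crumbs remembers each parent's label and the siblings to the left and right, which is what lets you climb back up without starting over from the root.

```haskell
import Data.Tree (Tree (..))

-- What we remember about the parent when stepping down into one of its
-- children: its label, the siblings to our left (nearest first), and
-- the siblings to our right
data Crumb a = Crumb a [Tree a] [Tree a]

-- A location: the subtree in focus plus the path of crumbs back to the root
type Loc a = (Tree a, [Crumb a])

fromTree :: Tree a -> Loc a
fromTree t = (t, [])

firstChild :: Loc a -> Maybe (Loc a)
firstChild (Node x (c : cs), crumbs) = Just (c, Crumb x [] cs : crumbs)
firstChild _ = Nothing

nextSibling :: Loc a -> Maybe (Loc a)
nextSibling (t, Crumb x ls (r : rs) : crumbs) = Just (r, Crumb x (t : ls) rs : crumbs)
nextSibling _ = Nothing

prevSibling :: Loc a -> Maybe (Loc a)
prevSibling (t, Crumb x (l : ls) rs : crumbs) = Just (l, Crumb x ls (t : rs) : crumbs)
prevSibling _ = Nothing

-- Rebuild the parent node from the crumb we left behind
parent :: Loc a -> Maybe (Loc a)
parent (t, Crumb x ls rs : crumbs) = Just (Node x (reverse ls ++ [t] ++ rs), crumbs)
parent _ = Nothing

label :: Loc a -> a
label (Node x _, _) = x

modifyLabel :: (a -> a) -> Loc a -> Loc a
modifyLabel f (Node x cs, crumbs) = (Node (f x) cs, crumbs)

-- Climb all the way up and hand back the whole (possibly edited) tree
root :: Loc a -> Tree a
root loc = maybe (fst loc) root (parent loc)
```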

If you would like to learn more, I would recommend Apfelmus's very friendly tutorial (part of the Haskell wikibook).

Thanks to Krasimir and Iavor for implementing this and for allowing me to package it up.


2008-07-22

encodings-aware hex editor

Here's another coding-project idea: I would like to see a hex editor that knows how to display characters in encodings other than ASCII (specifically: I want to debug messed-up UTF-8 text files).

Google and apt-cache search reveal no such editor, at least not in the free/open-source world, nor in Linux or MacOS X freeware land. On Debian-based systems, there are a couple that handle some Japanese encodings, but nothing that deals with UTF-8.

Likely features:
  • toggle between an ASCII-only mode and a show-as-UTF-8 mode
  • good UI for the fact that UTF-8 characters have a variable length in bytes
  • graceful handling of encoding errors


Haskellers could possibly do this as a part (plugin?) of Yi, or maybe just a completely standalone product.

And if you want a slightly simpler project, a UTF-8 hex dumper would be good. Hmmph... come to think of it, maybe it would have been more productive to just go write that instead of this blog post.

Edit: Well, I went ahead and made a stupid little dumper for my needs. Here is the output on some sample corrupted UTF-8:
20 28 5b 47 65 6f 72 67 69 61 6e                     ([Georgian
3a 20 e183a1 e183 3f e183a5 e183 : ს«e1 83»?ქ«e1 83»
3f e183 20 e18397 e18395 e18394 e1839a e183 ?«e1 83» თველ«e1 83»
3f 5d 0a ?]
20 28 5b 47 65 72 6d 61 6e 3a 20                     ([German:
44 65 75 74 73 63 68 6c 61 6e 64 Deutschland
5d 20 5b 49 50 41 3a 20 cb88 64 c994 ] [IPA: ˈdɔ
c9aa 74 ca83 6c 61 6e 74 5d 29 2c 20 ɪtʃlant]),
6f 66 66 69 63 69 61 6c 6c 79 20 officially
74 68 65 20 46 65 64 65 72 61 6c the Federal
20 52 65 70 75 62 6c 69 63 20 6f Republic o
66 20 47 65 72 6d 61 6e 79 20 28 f Germany (
42 75 6e 64 65 73 72 65 70 75 62 Bundesrepub
6c 69 6b 20 44 65 75 74 73 63 68 lik Deutsch
6c 61 6e 64 2c 20 5b 49 50 41 3a land, [IPA:
20 cb88 62 ca8a 6e 64 c999 73 72 65 70 ˈbʊndəsrep
75 62 6c 69 cb 3f 6b 20 cb88 64 ubli«cb»?k ˈd
c994 c9aa 74 ca83 6c 61 6e 74 5d 29 2c ɔɪtʃlant]),
20 69 73 20 61 20 63 6f 75 6e 74 is a count
72 79 20 69 6e 20 43 65 6e 74 72 ry in Centr
61 6c 20 45 75 72 6f 70 65 2e 20 al Europe.
0a
Highlighting added by hand. I should probably go figure out how to colourise the corrupted characters. Or maybe I should just go ahead and package this and put it up on hackage? Make it available via darcs? I would need a decent name. So far, I have hexy-xxy and hexdump-utf8, neither of which is that great :-/
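For the curious, here is a rough sketch of the idea behind such a dumper. It is a reconstruction for illustration, not the dumper that produced the output above: group the bytes into UTF-8 sequences, print each sequence's bytes as one hex token, and show anything that fails to decode in «guillemets». It deliberately skips some validity checks (overlong encodings and surrogates are not rejected).

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)
import Numeric (showHex)

-- Either a well-formed UTF-8 sequence with its character, or bytes that failed to decode
data Chunk = Good [Word8] Char | Bad [Word8]
  deriving (Show, Eq)

-- How many continuation bytes a lead byte announces (Nothing if it cannot start a sequence).
-- Simplification: overlong encodings and surrogates are not rejected.
contCount :: Word8 -> Maybe Int
contCount b
  | b < 0x80 = Just 0
  | b >= 0xC2 && b <= 0xDF = Just 1
  | b >= 0xE0 && b <= 0xEF = Just 2
  | b >= 0xF0 && b <= 0xF4 = Just 3
  | otherwise = Nothing

isCont :: Word8 -> Bool
isCont b = b .&. 0xC0 == 0x80

-- Combine a lead byte and its continuation bytes into a code point
codePoint :: [Word8] -> Int
codePoint [] = 0
codePoint (b : cs) = foldl step (fromIntegral b .&. leadMask) cs
  where
    leadMask = case length cs of
      0 -> 0x7F
      1 -> 0x1F
      2 -> 0x0F
      _ -> 0x07
    step acc c = acc * 64 + (fromIntegral c .&. 0x3F)

chunks :: [Word8] -> [Chunk]
chunks [] = []
chunks (b : bs) =
  case contCount b of
    Just n
      | (cs, rest) <- splitAt n bs
      , length cs == n
      , all isCont cs
      , codePoint (b : cs) <= 0x10FFFF ->
          Good (b : cs) (toEnum (codePoint (b : cs))) : chunks rest
    Just n ->
      -- truncated or malformed: keep the lead byte plus whatever continuation bytes did follow
      let cs = takeWhile isCont (take n bs)
       in Bad (b : cs) : chunks (drop (length cs) bs)
    Nothing -> Bad [b] : chunks bs

hex2 :: Word8 -> String
hex2 w = let s = showHex w "" in if length s < 2 then '0' : s else s

hexCol :: Chunk -> String
hexCol (Good ws _) = concatMap hex2 ws
hexCol (Bad ws) = concatMap hex2 ws

-- Control characters become '.', undecodable bytes are shown in guillemets
textCol :: Chunk -> String
textCol (Good _ c)
  | c < ' ' = "."
  | otherwise = [c]
textCol (Bad ws) = "«" ++ unwords (map hex2 ws) ++ "»"

-- One row of the dump: (hex column, text column)
dumpLine :: [Word8] -> (String, String)
dumpLine ws = (unwords (map hexCol cs), concatMap textCol cs)
  where
    cs = chunks ws
```

To dump a real file, one could read it with Data.ByteString.readFile, unpack it to [Word8], split it into rows of a few bytes each, and print dumpLine for every row.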


2008-07-21

simply reading and writing UTF-8 in Haskell

A year and a half ago, I posted what seemed to be the simplest recipe for reading and writing UTF-8 in Haskell. In this post, I will provide an even simpler recipe, made possible by Eric Mertens' utf8-string package.

For those who are not familiar with Haskell, its internal representation for characters is Unicode, but for IO it effectively assumes that it is reading and writing in the ISO-8859-1 (Latin-1) encoding. This used to be annoying for those of us who wanted to work with the UTF-8 encoding, but now there is a very simple solution, perfect for those of us who don't want to think too much and just get the job done.

the example


The sample problem from my last post was to take a UTF-8 encoded file as input, reverse all its lines, and write the results to a new file named after the original with a ".rev" extension appended. The solution might be self-explanatory if you are used to Haskell, but I will make some minor comments below, just in case.

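import System.IO.UTF8
import Prelude hiding (readFile, writeFile)
import System.Environment (getArgs)

main :: IO ()
main =
  do args <- getArgs
     mapM_ reverseUTF8File args

-- Read a UTF-8 file and write its line-reversed contents to <file>.rev
reverseUTF8File :: FilePath -> IO ()
reverseUTF8File f =
  do c <- readFile f
     writeFile (f ++ ".rev") $ reverseLines c

reverseLines :: String -> String
reverseLines = unlines . map reverse . lines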

In the above code, we use some drop-in replacements for some System.IO functions. Some of these functions are also provided in the Prelude, so we must hide them so that they do not overlap with what we import. (Alternatively, we could import the UTF-8 ones qualified, which could be handy in contexts where we want the option of reading and writing in UTF-8 without committing to it). The rest is straightforward. Notice that we do not jump through any hoops whatsoever. In fact, you can pretty much take any pre-existing Haskell program that you have written and turn it into a UTF-8 version by changing the import statements.

Here are the results of running this script on a UTF-8 sampler:
)udrU( یتوہ ںیہن فیلکت ےھجم روا ںوہ اتکس اھک چناک ںیم 
)othsaP( يوږوخ هن ام هغه ،مش ېلړوخ هشيش هز
)naeroK(요아않 지프아 도래그 .요어있 수 을먹 를리유 는나
)keerG( .ατοπίτ ωθάπ αν ςίρωχ άιλαυγ ανέμσαπσ ωάφ αν ώροπΜ
)cidnalecI / aksnelsÍ( .gim aðiem ða sseþ ná relg ðite teg gÉ
)hsiloP( .izdokzs ein im i ,ołkzs ćśej ęgoM
)nainamoR( .etșenăr ăm un ae iș ălcits cnânăm ăs toP
)nainiarkU( .ьтидокшоп ен інем онов й ,олкш итсї ужом Я
)nainemrA( ։րենըչ տսիգնահնա իծնի և լետւո իկապա մանրԿ
)naigroeG( .ავიკტმ არა ად მაჭვ სანიმ
)idniH( .तह हन डप ईक स सउ झम ,ह तकस ख चक म
)werbeH( .יל קיזמ אל הזו תיכוכז לוכאל לוכי ינא
)hsiddiY( .ײװ טשינ רימ טוט סע ןוא זאלג ןסע ןעק ךיא
)cibarA( .ينملؤي ل اذه و جاجزلا لكأ ىلع رداق انأ
)esenapaJ( 。んせまけつ傷を私はれそ。すまれらべ食をスラ
)iahT( บจเนฉหใำทมไนมตแ ดไกจะรกนกนฉ
)slobmys ycnerruc( ₯·₮·₭·₫·₪·₩·₨·₧·₦·₥·₤·₣·₢·₡·¢·$·€·£·¥


The utf8-string package is available on HackageDB. Thanks to Eric M. for providing this little wrapper! It's a perfect example of the kind of thing which seems obvious... after somebody else has thought to do it.


2008-05-13

recurring problem (boring text file merging)

I keep solving variations of this problem at work, whether I'm trying to merge some log files together, or identify token offsets with bits of parse tree. I had better jot it down so that I don't forget there may be something more general hidden behind all this.

mergeFoo :: [a] -> [(Int,Int,b)] -> [Either a ([a],b)]




I'm not necessarily looking for a solution -- I could just boil one out from my previous solutions -- but I am at least officially and publicly reminding myself that I shouldn't keep solving the same thing over and over again (unless I'm engaged in some kind of lateral thinking exercise, which is a different story).
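For future reference, here is one possible reading of that signature as a sketch. The assumptions are mine, not the post's: each (start, end, label) triple is a zero-based, half-open [start, end) span of token positions, and the spans are sorted and non-overlapping. Covered tokens are grouped with their label as Right; uncovered tokens pass through as Left.

```haskell
-- | One possible reading of mergeFoo (assumed semantics, see above):
-- spans are zero-based, half-open [start, end), sorted, non-overlapping.
mergeFoo :: [a] -> [(Int, Int, b)] -> [Either a ([a], b)]
mergeFoo = go 0
  where
    -- pos is the position of the head of the remaining token list
    go _ xs [] = map Left xs              -- no spans left
    go _ [] _ = []                        -- no tokens left; spare spans dropped
    go pos xs@(x : rest) spans@((start, end, label) : more)
      | pos < start = Left x : go (pos + 1) rest spans
      | otherwise =
          let (inside, after) = splitAt (end - pos) xs
          in Right (inside, label) : go (pos + length inside) after more

main :: IO ()
main = print (mergeFoo "abcdef" [(1, 3, 'X'), (4, 5, 'Y')])
```

With these assumptions, mergeFoo "abcdef" [(1,3,'X'),(4,5,'Y')] gives [Left 'a', Right ("bc",'X'), Left 'd', Right ("e",'Y'), Left 'f']. Log merging would presumably need a different reading (timestamps rather than offsets), which is perhaps the more general thing hiding behind all this.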


2008-05-09

lispparser on hackage

Ever wanted a LISP S-expressions parser?

I have. I do some natural language processing work, where some people like to output parse trees as S-expressions. Very natural. But then I always balk because I have to go whip up a little parser for it, which I know to be easy in principle, but... well, you know how that goes.

Anyway, if you're at my level of programming mediocrity, the one where "write an S-expressions parser" makes you think "I know this is easy, but do I have to?", then perhaps the lispparser package is for you! I guess this is too minor a package to warrant a mailing list announcement, but I've taken a bit of Jonathan Tang's tutorial code and put it on hackage as lispparser. If you think it needs improvement, I might consider putting a darcs2 repository online somewhere.
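For the record, here is roughly how small such a parser can be: a hand-rolled sketch using only base. It is not the lispparser package's code or API; SExpr, parseSExpr and readSExpr are hypothetical names. It handles bare atoms and nested parentheses, but not strings, quoting or comments, which is enough for Penn Treebank-style parse trees.

```haskell
import Data.Char (isSpace)

-- | Hypothetical S-expression type: an atom or a parenthesised list.
data SExpr = Atom String | List [SExpr]
  deriving (Show, Eq)

-- | Parse one S-expression, returning it with the leftover input.
parseSExpr :: String -> Maybe (SExpr, String)
parseSExpr input = case dropWhile isSpace input of
  '(' : rest -> parseList rest []
  ')' : _    -> Nothing            -- unexpected closing paren
  []         -> Nothing            -- ran out of input
  str        -> let (tok, rest) = break isDelim str
                in Just (Atom tok, rest)
  where
    isDelim c = isSpace c || c == '(' || c == ')'
    -- Collect elements (in reverse) until the matching ')'.
    parseList str acc = case dropWhile isSpace str of
      ')' : rest -> Just (List (reverse acc), rest)
      []         -> Nothing        -- unbalanced parentheses
      str'       -> do
        (e, rest) <- parseSExpr str'
        parseList rest (e : acc)

-- | Parse a complete string; only whitespace may follow the expression.
readSExpr :: String -> Maybe SExpr
readSExpr s = case parseSExpr s of
  Just (e, rest) | all isSpace rest -> Just e
  _                                 -> Nothing

main :: IO ()
main = print (readSExpr "(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
```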


2008-05-05

lingscore

A little bit work-related. In a mail that I'm about to send out to the Corpora mailing list:
We're looking for implementations of scoring algorithms for coreference resolution. Specifically, the algorithms we are interested in are MUC-6 (Vilain et al., 1995), B-CUBED (Bagga and Baldwin, 1998), and CEAF (Luo, 2005).

Our hope is to compare a few pieces of coreference resolution software. Does anybody have software, preferably standalone, that we could use to calculate these scores?


I am sorely tempted to just sit down for a few moments and create these (scoring) tools myself.

It'd be a small Haskell package called 'lingscore', probably a library and an executable. I'd stick the scorers under the 'NLP.Evaluation' module hierarchy. The library would be dedicated to NLP evaluation algorithms: no actual NLP, just the scoring algorithms for evaluation campaigns. It should not be difficult, and would very slightly advance the agenda of making Haskell a viable platform for NLP hacking.

I quite like the idea of using Haskell for the stupid reason that type signatures make it a bit clearer what kind of inputs we're expecting and what kind of outputs we can produce.
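As a taste of what such a library might contain, here is a sketch of B-CUBED (Bagga and Baldwin, 1998) in its common uniform-weight form. For each mention m, precision is |K(m) ∩ R(m)| / |R(m)| and recall is |K(m) ∩ R(m)| / |K(m)|, where K(m) and R(m) are the key and response clusters containing m; both are averaged over all mentions. The names Clustering and bcubed are hypothetical, the code assumes key and response cover exactly the same mentions, and it would presumably live somewhere under NLP.Evaluation. Note how the signature alone says what goes in and what comes out.

```haskell
import Data.List (find)
import Data.Maybe (fromMaybe)

-- | A clustering: each inner list is one entity's mentions.
type Clustering m = [[m]]

-- | The cluster containing a mention, or [] if it is absent.
clusterOf :: Eq m => Clustering m -> m -> [m]
clusterOf cs m = fromMaybe [] (find (elem m) cs)

-- | B-CUBED (precision, recall, F1) with uniform mention weights.
-- Assumes key and response partition the same set of mentions;
-- otherwise some ratios divide by zero.
bcubed :: Eq m => Clustering m -> Clustering m -> (Double, Double, Double)
bcubed key response = (p, r, f)
  where
    mentions = concat key
    size :: [a] -> Double
    size = fromIntegral . length
    -- number of mentions sharing m's entity in both key and response
    overlap m = size [x | x <- clusterOf key m, x `elem` clusterOf response m]
    mean xs = sum xs / size xs
    p = mean [overlap m / size (clusterOf response m) | m <- mentions]
    r = mean [overlap m / size (clusterOf key m) | m <- mentions]
    f = if p + r == 0 then 0 else 2 * p * r / (p + r)

main :: IO ()
main = print (bcubed [[1, 2, 3], [4, 5 :: Int]] [[1, 2], [3, 4, 5]])
```

For the key {1,2,3},{4,5} against the response {1,2},{3,4,5}, precision and recall both come out to 11/15 (about 0.733).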


2008-04-28

short day (reg 5h)

Edit 23:16 - Argh! This post was meant for koweynlg, my daily work diary. Not so much for public consumption (like koweycode), but public to keep me honest. Sorry for the noise (and the meta noise).

Got here at 10h, leaving at 18h; various tea and cake breaks in between. Almost done with the REG stuff. Got my tri-text framework set up (more and more scripts, some of them throwaway). I am very grateful for the automation, and am thanking past-eric (of 1 week ago) for making the results file generator. Trying to do this by hand [what was I thinking] would have been suicide.

Back to reading Grosz and Sidner.

More REG stuff tomorrow. Not done yet.


native speakers wanted

Sorry, this is a little off-topic for koweycode, but we would be interested to have some native speakers participating in a little experiment.

All you have to do is to read some encyclopaedic texts and plug some stuff into drop-down-boxes along the way. There are NO WRONG ANSWERS. We want to see your answers and to learn from them. It only takes a few minutes (but you can do as much of it as you want).

We are now running the second instalment of an experiment in which we ask people to select referential expressions (REs) that refer to the main subject in the context of a simple encyclopaedic text. The idea is to investigate to what extent people agree when choosing REs. The experiment is designed as a multiple choice task, where REs can be selected from a menu. The texts are short and can usually be done in under a minute.

We would like to ask any native speakers of English who have a few minutes to spare to help us complete the experiment. It would be great if participants could do at least three texts, but you can do as many as you like.

There is more information on the experiment website. To participate, simply read the instructions and then click on the 'start experiment' button at the bottom of the page:

http://www.nltg.brighton.ac.uk/home/Anja.Belz/CMSR

Any non-native speakers who would like to try out the experiment can do so at this alternative website:

http://www.nltg.brighton.ac.uk/home/Anja.Belz/TESTDRIVE

We would be very grateful for any feedback, comments and suggestions.
Many thanks for your time,

Anja Belz


Regretfully, this has not been coded up in Haskell. But surely that is forgivable :-)


2008-04-11

darcs 2 at last!

I'm sure you've all seen David's announcement: darcs 2.0.0 is out!

what's good

In short, darcs 2.0 is safer and faster.

Particularly, the dreaded exponential-time merge bug has now been largely resolved. Let me say it more carefully: while it may still be possible to run into exponential time merges, our improvements to conflict-handling should make it considerably less common. We hope that nobody ever runs into such a situation in practice.

Other key points are the improved hashed inventory and pristine cache, which make darcs more robust (you no longer have to worry about third party tools like Eclipse or Unison messing things up by mucking around with darcs internals), and the ssh-connection mode, which speeds up SSH access a lot and kills the typing-your-password-ten-million-times issue dead (at most you'll have to type it in twice).

what's bad

On the one hand, darcs 2.0.0 should be much smoother and faster for most users. On the other hand, people with large repositories (e.g. GHC-sized) might find certain operations somewhat slower. David does not (yet) have ideas on how to make things better for such users, and is even recommending that they switch to something else. If you've got a repository of darcs' size (over 5000 patches, 6 years, 131 contributors) or smaller, you should continue using darcs, because we still think it works better: we're still the only ones around to offer deep cherry picking... something which we think would be hard to do without radically changing the way other VCSes work. If you would like to prove us wrong, please do so and we would be most grateful!

Also, taking advantage of darcs 2 will require you to upgrade your repository to the darcs-2 format (see darcs convert), which, unfortunately, is not compatible with older versions of darcs. People with new repositories should definitely start using this format. People with old repositories should probably do so at the earliest convenient moment, although this means your users will have to upgrade. Please switch to the new format. It will make everybody's lives easier.

The final piece of bad news: we're going to have to shift to a lighter weight development model, something which puts less strain on David and the rest of the contributors. The consequences are that patches might get less review [one maintainer and not two], and that you'll be seeing less of David on the mailing lists. The good news in the bad news is that our lighter weight development model is now being supported by increased automation of the administrative stuff. For example, our bug tracker is now integrated with the darcs repository so that it automatically knows when a ticket has been resolved by a patch. This increased automation gives us extra rigour and more time to think about making darcs better. The only thing we need is more of us. If you want a place to hone your Haskell, Perl or C... or if you think you know a thing or two about user interfaces, please spend some time with us.

to sum up...

Have you been hesitating to try darcs out? Well, now is a good time to do so: we have fixed our killer bugs, as well as the kind of minor nuisances that get to most of us. Or... are you looking for something to work on? Uncle David needs you!

[note: Thanks to David Roundy for comments on a draft of this post]


2008-04-02

wxHaskell 0.10.3 is out!

As you may have noticed in Jeremy's announcement, wxHaskell 0.10.3 is now available for download. This version offers the following key improvements over 0.9.4:
  • Support for Unicode builds of wxWidgets
  • Support for wxWidgets 2.6.x (support for wxWidgets 2.4.2 retained if
    you compile from source)
  • Support for building with GHC 6.6.x and 6.8.x
  • Parts of wxHaskell are now built with Cabal
  • Profiling support
  • Smaller generated binary sizes (using --split-objs)
At the moment, wxHaskell works with wxWidgets 2.6 (the previous stable release, still widely available). But we're working to get you wxWidgets 2.8 support as soon as we can.

In the meantime, here's a screenshot of the wxFruit paddle-ball example running on my Mac. (Thanks to shelarcy for putting up the package!)


This was painlessly installed with the help of cabal-install. You don't have to use cabal-install (the wxHaskell sourceforge site includes binaries), but it may make dealing with dependencies easier. The only caveat you might want to know about when using Cabal to install wxHaskell and its companion libraries is that wxcore should be installed as root, i.e. sudo cabal install wxcore.


2008-03-15

XTC on hackage

Just a quick note to say that XTC (XTC: eXtended & Typed Controls for wxHaskell) is available on hackage.

the haddock


The XTC library provides a typed interface to several wxHaskell controls.
  • radio view (typed radio box)
  • single-selection list view (typed single-selection list box)
  • multiple-selection list view (typed multiple-selection list box)
  • choice view (typed choice box)
  • value entry (typed text entry)
XTC controls keep track of typed values and items, rather than being string based. Selections in XTC controls consist of actual values instead of indices.

my notes

The XTC library was developed at Utrecht University, and has been used to develop Dazzle, a Bayesian Network toolbox, and very much a "real world" application. You can read more about XTC and Dazzle in their Haskell Workshop paper from 2005.

If you're using wxhaskell, XTC could make your code a bit cleaner, without imposing a steep learning curve. Here is a quick example of the library in action.




And here is the source code. Notice how we work directly with the Fruit type, eschewing any intermediary strings:
import Graphics.UI.WX
import Graphics.UI.XTC

data Fruit = Apple | Banana | Orange deriving Show

instance Labeled Fruit where
  toLabel = show

main :: IO ()
main = start $
  do f <- frame []
     txt <- staticText f [ text := "pick a fruit and I will give you a slogan" ]
     radioV <- mkRadioView f Vertical [Apple, Banana, Orange] []
     -- update the slogan whenever the selection changes
     set radioV [ on select :=
                    do mf <- get radioV typedSelection
                       set txt [ text := slogan mf ] ]
     set f [ layout := margin 5 $ column 1
               [ hfill $ widget txt, widget radioV ] ]

slogan :: Fruit -> String
slogan Orange = "orange you glad I didn't say 'orange'?"
slogan Apple = "an apple a day keeps, well you know"
slogan Banana = "buh-naaaaa-naaa"


If you like this kind of thing, be sure to also check out AutoForms and Phooey.


2008-03-12

wxhaskell 0.10.3rc1

In case you missed Jeremy's announcement, the first release candidate for wxhaskell 0.10.3 is now ready for download.

Highlights of 0.10.3 rc1 include:

  • Support for Unicode builds of wxWidgets
  • Support for wxWidgets 2.6.x (support for wxWidgets 2.4.2 retained if
    you compile from source)
  • Support for building with GHC 6.6.x and 6.8.x
  • Parts of wxHaskell are now built with Cabal
  • Profiling support
  • Smaller generated binary sizes (using --split-objs)

See Jeremy's message for a more complete list.

Note that we have postponed the goal of supporting wxWidgets 2.8. We are definitely making this a priority for wxhaskell 0.11. In the meantime, we will focus on making wxhaskell 0.10.3 easy for everyone to install. So please let us know if you run into any trouble. (Many thanks to Neil Mitchell and other users who have helped in testing.)

Finally, users of MacOS X Leopard have reported difficulty building the older wxWidgets 2.6. According to the wxWidgets wiki (see the wiki page for more details), doing so is indeed possible if you use the Tiger SDK. We would love to hear from you if you have succeeded in doing this.


2008-02-28

operation Roundy Tears

I have a very serious issue that I'd like to raise with the darcs and Haskell communities: you're not being evil enough.

Darcs2 is getting closer and closer to completion (I am not saying this in any official capacity), but you've all been pretty complacent about making it hurt. Sure, some of you have done performance testing, for which thanks, and yes, some of you have thrown in a couple of conflict related tests. But the closest we have ever come is a
darcs: src/Darcs/Patch/Real.lhs:422:21-50: Irrefutable pattern failed for
pattern Data.Maybe.Just a2'


And that was fixed within a week.

This isn't good enough. Be more evil! Submit tests to our bugs/ directory. Think of devious conflicting ways to make darcs fall down.

Make... David... cry...

Please.

[note: the best way to participate in Operation Roundy Tears is to use the --darcs-2 format; you can get a darcs2 repository from a darcs1 repository by using darcs convert]


2008-02-24

wxhaskell components and news

So, I made this little diagram showing the basic components of wxhaskell. It might not be entirely correct, but I hope it will be useful for anybody who wants to help out.



Also, as you might have noticed, wxhaskell is now on hackage as an experimental pre-release. Let me know if you have any trouble building it, or getting it to run sample applications. There are still a few painful bits: (1) you still have to use wxWidgets 2.6 and not the newer 2.8 [we're working on it]; (2) it assumes your wxWidgets is compiled with --enable-mediactrl (this should be relatively easy for us to fix); (3) for Linux, wxcore 0.10.2 requires that you configure wxhaskell --with-opengl (the darcs version fixes this); and (4) for Windows... well, I don't know; shelarcy can build it just fine using Visual Studio, I think, and the darcs version of Cabal should now be happier with our Cabal files.

In other news, we're now much more disciplined about using the wiki to note problems installing wxhaskell and to propose solutions. We're also now paying closer attention to the bug tracker (triaging them), and have made it much easier for users to submit bug demonstrators (see our bugs/ directory). I hope these new habits will make us more responsive.

So we're not really ready for an official release, but we're getting closer. I'm hoping we get there sooner rather than later. I want to see more people playing with neat tools like Phooey and AutoForms, both of which are on hackage.


2008-02-22

maybench underway

The maybench project (formerly checkquick) is now underway. Maybench is a tool for comparing the performance of two versions of the same program, on a series of benchmarks that you design. Maybench aims to be easy to use, almost as easy as running 'time your-program arg1..arg2'. Ideally, it should be very straightforward for outsiders to write timing tests for your programming project and contribute them as part of your performance testing suite.

We have a Google Code page, a mailing list and a darcs repository:

darcs get http://code.haskell.org/maybench

The repository basically consists of the preliminary code written by tehgeekmeister and also some new code by ertai (see darcs-benchmark). I'm also hoping that some of the code written for nobench can be used, for example, to generate fancy reports. Right now, running maybench looks like this:
% dist/build/maybench/maybench 'sleep 5' 'sleep 3'
"sleep 3" took 60.0% of the time "sleep 5" took.

As a first priority, we're going to get maybench usable for benchmarking darcs. After that we'll start thinking of how to generalise it, so that it can be used for the Haskell benchmark suite, for example, or for your software.
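To make the idea concrete, here is a minimal sketch of the core comparison: run each command a few times, take the mean wall-clock time, and phrase the ratio like the sample output above. This is not maybench's actual code; timeCommand and report are hypothetical names, the fixed count of 3 runs is arbitrary, and it assumes base 4.11 or later (for GHC.Clock) plus the process package that ships with GHC.

```haskell
import Control.Monad (replicateM_)
import GHC.Clock (getMonotonicTime)
import System.Environment (getArgs)
import System.Process (system)
import Text.Printf (printf)

-- | Run a shell command n times; return the mean wall-clock seconds.
timeCommand :: Int -> String -> IO Double
timeCommand n cmd = do
  start <- getMonotonicTime
  replicateM_ n (system cmd)       -- exit codes are ignored here
  end <- getMonotonicTime
  return ((end - start) / fromIntegral n)

-- | Phrase the comparison like the sample output above.
report :: String -> Double -> String -> Double -> String
report old oldT new newT =
  printf "%s took %.1f%% of the time %s took." (show new) (100 * newT / oldT) (show old)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [old, new] -> do
      oldT <- timeCommand 3 old    -- 3 runs each: an arbitrary choice
      newT <- timeCommand 3 new
      putStrLn (report old oldT new newT)
    _ -> putStrLn "usage: compare 'old command' 'new command'"
```

Invoked with 'sleep 5' 'sleep 3' as arguments, it should print something close to the sample line, give or take process start-up overhead. Averaging over several runs is what the "run a program N times" wish boils down to; a real tool would also want to report variance.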

Interested? Come join us!


2008-02-18

darcs circa 2003

There's always room for improvement.



Archaeological findings courtesy of Zooko on issue687.

Darcs exercise: can you find the original patch in the darcs darcs repository where this logo was introduced? My hope is that the darcs UI is friendly enough that you just say "duh, of course!"


2008-02-08

checkquick

Darcs needs a benchmarking tool. After 'stopwatcH', this is the first name I came up with, horrible as it is. I was thinking that surely, this is a general problem, so we should throw something up on hackage. The basic wish is to be able to run a program N times (comparing it with a different version of the same program).

Can you help? If you've got code to share, put it up! If not, and you want to contribute to the project, comment on this blog. If nobody gets moving [and gives me a better name], I guess I'll create a 'checkquick' project, probably using code.google.com and code.haskell.org. If nobody submits anything and I really do have to start this project (note: I do not want to), I am hoping for a liberal commit model, where we pretty much hand out push rights to anybody who wants them.

I've already a little description written up, just in case:
Checkquick is a tool for comparing the performance between two versions of the same program, on a series of benchmarks that you design.

Checkquick aims to be easy to use, almost as easy as running 'time your-program arg1..arg2'. Ideally, it should be easy for outsiders to write timing tests for your programming project and contribute them as part of your performance testing suite.

It is written in Haskell and named after the illustrious, though wholly unrelated, quickcheck.

