Hobby-hacking Eric

2008-05-13

recurring problem (boring text file merging)

I keep solving variations of this problem at work, whether I'm trying to merge some log files together, or identify token offsets with bits of parse tree. I had better jot it down so that I don't forget there may be something more general hidden behind all this.

mergeFoo :: [a] -> [(Int,Int,b)] -> [Either a ([a],b)]




I'm not necessarily looking for a solution -- I could just boil one out from my previous solutions -- but I am at least officially and publicly reminding myself that I shouldn't keep solving the same thing over and over again (unless I'm engaged in some kind of lateral thinking exercise, which is a different story).
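For the record, here is one way the signature above could be read; it is a sketch under assumptions, not a boiled-down version of any earlier solution. The assumptions are mine: each triple is (start, end, label) with a zero-based start and an exclusive end, and the spans arrive sorted by start and do not overlap. Tokens outside any span come back as Left; each span comes back as Right with the tokens it covers.

```haskell
-- A sketch of one possible reading of mergeFoo.
-- Assumptions (not stated in the post): spans are (start, end, label),
-- start is zero-based, end is exclusive, spans are sorted and disjoint.
mergeFoo :: [a] -> [(Int, Int, b)] -> [Either a ([a], b)]
mergeFoo = go 0
  where
    -- No spans left: every remaining token is unannotated.
    go _ xs [] = map Left xs
    -- No tokens left: spans that point past the end are dropped.
    go _ [] _ = []
    go pos xs@(x : rest) spans@((start, end, label) : more)
      -- Not yet at the next span: emit the token on its own.
      | pos < start = Left x : go (pos + 1) rest spans
      -- At the span: take the covered tokens and attach the label.
      | otherwise =
          let (covered, after) = splitAt (end - pos) xs
          in Right (covered, label) : go (pos + length covered) after more
```

With these conventions, mergeFoo "abcdef" [(1,3,"X"),(4,5,"Y")] gives [Left 'a', Right ("bc","X"), Left 'd', Right ("e","Y"), Left 'f']. Overlapping or nested spans (which parse trees will certainly produce) are exactly the part this sketch does not handle.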


2008-05-09

lispparser on hackage

Ever wanted a Lisp S-expression parser?

I have. I do some natural language processing work, where some people like to output parse trees as S-expressions. Very natural. But then I always balk because I have to go whip up a little parser for it, which I know to be easy in principle, but... well, you know how that goes.

Anyway, if you're at my level of programming mediocrity, the one where "write an S-expression parser" makes you think "I know this is easy, but do I have to?", then perhaps the lispparser package is for you! I guess this is too minor a package to warrant a mailing list announcement, but I've taken a bit of Jonathan Tang's tutorial code and put it on hackage as lispparser. If you think it needs improvement, I might consider putting a darcs2 repository online somewhere.
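To give an idea of how small the "easy in principle" parser can be, here is a minimal sketch using ReadP from base. It is not the lispparser API: the SExpr type and the names sexpr and parseSExpr are made up for this illustration, and it only knows about atoms and parenthesised lists (no strings, numbers, quoting or comments).

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP

-- A deliberately tiny S-expression type (hypothetical, for illustration).
data SExpr = Atom String | List [SExpr]
  deriving (Eq, Show)

-- Parse one S-expression, skipping any leading whitespace.
sexpr :: ReadP SExpr
sexpr = skipSpaces *> (list <++ atom)
  where
    -- An atom is a run of characters that are neither spaces nor parentheses.
    atom = Atom <$> munch1 isAtomChar
    -- A list is zero or more S-expressions between parentheses.
    list = List <$> between (char '(') (skipSpaces *> char ')') (many sexpr)
    isAtomChar c = not (isSpace c) && c `notElem` "()"

-- Parse a whole string; Nothing if it is not a single well-formed expression.
parseSExpr :: String -> Maybe SExpr
parseSExpr input =
  case readP_to_S (sexpr <* skipSpaces <* eof) input of
    ((e, _) : _) -> Just e
    []           -> Nothing
```

For a parse tree such as "(S (NP John) (VP sleeps))" this returns Just (List [Atom "S", List [Atom "NP", Atom "John"], List [Atom "VP", Atom "sleeps"]]), and an unbalanced input such as "(S (NP John)" returns Nothing.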


2008-05-05

lingscore

A little bit work-related. In a mail that I'm about to send out to the Corpora mailing list:
We're looking for implementations of scoring algorithms for coreference resolution. Specifically, the algorithms we are interested in are MUC-6 (Vilain et al., 1995), B-CUBED (Bagga and Baldwin, 1998), and CEAF (Luo, 2005).

Our hope is to compare a few pieces of coreference resolution software. Does anybody have preferably standalone software that we could use to calculate these scores?


I am sorely tempted to just sit down for a few moments and create these (scoring) tools myself.

It'd be a small Haskell package called 'lingscore', probably a library and an executable. I'd stick the scorers under the 'NLP.Evaluation' module hierarchy. The library would be dedicated to NLP evaluation algorithms. No actual NLP, just the scoring algorithms for evaluation campaigns. Should not be difficult, and would very slightly advance the agenda of making Haskell a viable platform for NLP-hacking.

I quite like the idea of using Haskell for the stupid reason that type signatures make it a bit clearer what kind of inputs we're expecting and what kind of outputs we can produce.
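As a rough illustration of the type-signature point, here is what a B-CUBED scorer might look like. This is only a sketch: lingscore does not exist yet, the names Mention, Chain and bCubed are hypothetical, and it assumes that the key and the response contain the same mentions, with no mention repeated inside a chain. For each mention, precision is the overlap between its key chain and its response chain divided by the size of the response chain, recall divides by the size of the key chain instead, and both are averaged over all mentions with equal weights.

```haskell
import Data.List (intersect)

-- Hypothetical types: a mention is just an identifier, and a chain is
-- the list of mentions that refer to the same entity.
type Mention = Int
type Chain = [Mention]

-- B-CUBED (precision, recall) for a key (gold) and a response (system)
-- partition. Assumes both sides cover the same mentions.
bCubed :: [Chain] -> [Chain] -> (Double, Double)
bCubed key response = (average precisions, average recalls)
  where
    mentions = concat key
    -- The chain containing a mention; a mention missing from one side
    -- is treated as a singleton chain (an assumption of this sketch).
    chainOf chains m = case filter (m `elem`) chains of
      (c : _) -> c
      []      -> [m]
    -- Number of mentions shared by the key chain and the response chain.
    overlap m = fromIntegral (length (chainOf key m `intersect` chainOf response m))
    size chains m = fromIntegral (length (chainOf chains m))
    precisions = [overlap m / size response m | m <- mentions]
    recalls = [overlap m / size key m | m <- mentions]
    average xs = if null xs then 0 else sum xs / fromIntegral (length xs)
```

With key [[1,2,3],[4,5]] and response [[1,2],[3,4,5]], both precision and recall work out to 11/15. The signature says most of what a user needs to know: two partitions of mentions go in, a pair of scores comes out.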