koweycode

2007-11-06

PhD viva (defense)

I'll be presenting my thesis next Wednesday. You are cordially invited to attend the defence and to join us for drinks afterwards.

Surface realisation: ambiguity and determinism
Eric Kow

14 November 2007
Amphi C, LORIA
Nancy, France

Surface realisation is a subtask of natural language generation. It may be viewed as the inverse of parsing: given a grammar and a representation of meaning, the surface realiser produces a natural language string that the grammar associates with the input meaning. This thesis presents three extensions to GenI, a realisation algorithm for Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG).

The first extension improves the efficiency of the realiser with respect to lexical ambiguity. It adapts the "electrostatic tagging" optimisation from parsing: lexical items are associated with a set of polarities, and combinations of those items whose polarities do not sum to neutral are filtered out.

The second extension deals with the number of outputs returned by the realiser. Normally, the GenI algorithm returns all of the sentences associated with the input logical form. Whilst these sentences can be seen as having the same core meaning, they often convey subtle distinctions in emphasis or style, and it is important for generation systems to be able to control these extra factors. Here, we show how the input specification can be augmented with annotations that provide the required fine-grained control. The extension builds on the fact that the FB-LTAG grammar used by the generator was constructed from a "metagrammar", explicitly putting to use the linguistic generalisations encoded within it.

The final extension provides a means for the realiser to act as a metagrammar-debugging environment. Mistakes in the metagrammar can have widespread consequences for the grammar. Since the realiser can output all strings associated with a semantic input, it can be used to find out what these mistakes are, and crucially, their precise location in the metagrammar.

2007-10-29

darcs conflicts faq

Pekka Pessi and I have put together an FAQ on the darcs conflict problem. It has three major sections:

Everyday conflicts: what conflicts are and how you should deal with them in the general case.
The big conflicts bug: what it is, how you can avoid it, and what to do if you have run into trouble.
Darcs 2.0: from a user perspective, how it will change from darcs 1.x (not very much), and what resolving conflicts might look like (rollback will play a bigger role). David has been making some great progress. Nevertheless, our best estimates are that we will be ready for alpha testing only in Feb 2008, and release by the second quarter of that year.

There may be some mistakes or gaps in the FAQ, but I hope that you will find it useful.

2007-10-24

getting things done with mutt 3 (action counter)

Here's a short tip for putting a reminder on your desktop wallpaper of how many next-action and waiting-for messages you have. The result looks like this:

ACTION:  4
WAITING: 1

The approach is to combine a small shell script with the wonderful GeekTool program for Mac OS X (similar programs exist for Linux, I'm sure). GeekTool lets you display arbitrary text, such as the output of a shell command, on your desktop. One of my GeekTool windows is set to run the script and display its results in a big red font.

The shell script is pretty dumb; it just counts the number of files in the cur directory of each Maildir mailbox. If you're using the mbox format, I guess you'll have to find some other way to count messages. Counting the lines that begin with 'From ' (the mbox message separator), maybe?

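#!/bin/sh

# Print "MAILBOX:  count" for a Maildir mailbox, counting the messages
# in its "cur" subdirectory (an empty box gives 0).
count_messages () {
  # wc -l pads its output with spaces on some systems; strip them
  n=$(ls -1 "${HOME}/Mail/current/$1/cur" | wc -l | tr -d ' ')
  # pad the "MAILBOX:" label to 8 columns so the counts line up
  printf '%-8s %d\n' "$1:" "$n"
}

count_messages ACTION
count_messages WAITING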
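
For mbox users, counting the separator lines could be sketched as below. In mbox, every message begins with a line starting with 'From ' (mbox writers escape such lines in message bodies as '>From '), so counting those lines counts messages. This is only a sketch under assumptions: each mailbox is taken to be a single mbox file named after the mailbox under ${HOME}/Mail, and MAILROOT and count_mbox_messages are hypothetical names made up for this example.

```shell
#!/bin/sh
# Hypothetical mbox variant of the counter.
# Assumption: each mailbox is one mbox file, e.g. ${HOME}/Mail/ACTION.
# MAILROOT can be overridden to point somewhere else.
MAILROOT=${MAILROOT:-$HOME/Mail}

count_mbox_messages () {
  if [ -f "$MAILROOT/$1" ]; then
    # each message starts with a "From " separator line
    n=$(grep -c '^From ' "$MAILROOT/$1")
  else
    n=0   # a missing mailbox counts as empty
  fi
  # same layout as the Maildir version: label padded to 8 columns
  printf '%-8s %d\n' "$1:" "$n"
}

count_mbox_messages ACTION
count_mbox_messages WAITING
```

Note that grep -c still prints 0 for an empty file (it merely exits with status 1), so empty boxes show up as zero here too.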

I'm not expecting this to have a huge impact on my productivity, but maybe this will be that extra little bit that counts.

Edit 2007-10-25: Fixed counting to account for empty boxes. Zero. We like that number.

2007-09-21

kowthese 1.0

I thought I might announce version 1.0 of my PhD thesis: Surface realisation: ambiguity and determinism

This is the version that went out to the committee, so it is not definitive (you may prefer to wait for version 1.1 in December). It's also not very Haskell-related, aside from the fact that the software inside is written in Haskell. If you are interested, the overall topic area is computational linguistics (natural language processing), specifically, natural language generation.

Oh, and the darcs repository for the thesis (LaTeX sources):

 darcs get http://www.loria.fr/~kow/kowthese

The viva/defense will be held on 2007-11-14.

2007-08-25

getting things done with mutt 2 (auto-review)

Last year, I posted some ideas for applying GTD to mutt. As you may have heard, GTD is a simple methodology to help people stay on top of things. The basic idea in my post was to use an on-board X-Label editor to associate each message with a 'next action' or a 'waiting for' tag, and to store them in the respective mailboxes ACTION and WAITING. This was accompanied by a small colour configuration to highlight the X-Label field of each message, thus making it clear at all times what the next action was.

After one year of use, I can report that I am 95% happy with the system. It is simple and effective... however, not entirely eric-proof. In this post, I propose a small addition to tighten up the system and make it more resistant to my foolishness. The problem is that GTD is a heavily review-oriented system. Once you move tasks out of your head and onto some external device (e.g. a pad of paper), you must also consult that device from time to time or risk forgetting to do them. For example, one thing that can easily happen to me is that I will move messages into ACTION and WAITING and simply forget they are there.

This is where the idea of automated review comes in. What I propose is a simple method for reminding yourself that you have next actions to perform or things that you are waiting on. It consists of a shell script and a crontab entry. First the script (I call it gtd-review):


#!/bin/sh

# Print the X-Label, From, Subject and Date lines of every message in a
# mailbox (stripping the From/Subject/Date field names), with a blank
# line after each X-Label line.
# note: I am using the maildir format; if you are using mbox, you should
# just replace 'find "$1" -type f -exec cat {} +' with 'cat "$1"'
summarise () {
  find "$1" -type f -exec cat {} + |
    sed -n -e '/^X-Label:/G' -e '/^X-Label:/p' \
           -e '/^From:/p'    -e '/^Subject:/p' -e '/^Date:/p' |
    sed -e 's/^From: //' -e 's/^Date: //' -e 's/^Subject: //'
}

echo '======================================================================'
echo 'ACTIONS'
echo '======================================================================'
echo
summarise "${HOME}/Mail/ACTION"

echo '======================================================================'
echo 'WAITING'
echo '======================================================================'
echo
summarise "${HOME}/Mail/WAITING"

And now the crontab (cron turns an unescaped % into a newline, hence the backslashes in the date format):

@daily gtd-review 2>/dev/null | mail -s "GTD review `date +\%Y-\%m-\%d`" me@myaddress.com

I'm sure you could improve on this. For example, I would rather the dates were presented in yyyy-mm-dd format and accompanied with a friendlier description like "3 weeks ago"... but working on that would probably count as fidgeting.
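
For the record, the fidgeting in question might look roughly like the sketch below, which turns a Date: header value into yyyy-mm-dd plus a rough age. It assumes GNU date, whose -d option parses the RFC 2822 dates found in Date: headers (the BSD date shipped with Mac OS X does not accept -d this way); friendly_date and NOW are hypothetical names for this example.

```shell
#!/bin/sh
# Hypothetical helper: "Date:" header value -> "yyyy-mm-dd (age)".
# Assumption: GNU date (-d parses RFC 2822 dates); BSD date will not work.
# NOW (seconds since the epoch) may be set to fix "today", e.g. for testing.
friendly_date () {
  then_s=$(date -d "$1" +%s) || return 1
  now_s=${NOW:-$(date +%s)}
  days=$(( (now_s - then_s) / 86400 ))
  # anything less than a day old (or in the future) counts as today
  if [ "$days" -lt 1 ]; then
    ago="today"
  elif [ "$days" -eq 1 ]; then
    ago="yesterday"
  elif [ "$days" -lt 14 ]; then
    ago="$days days ago"
  else
    ago="$(( days / 7 )) weeks ago"
  fi
  printf '%s (%s)\n' "$(date -d "$1" +%Y-%m-%d)" "$ago"
}
```

To hook it into gtd-review, one could feed it the stripped Date: lines of each message, e.g. sed -n 's/^Date: //p' "$file" | while read -r d; do friendly_date "$d"; done.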

Anyway, I hope others find this to be useful.


2007-08-16

a history of monad tutorials

Here's a historical overview of monad tutorials since Phil Wadler's original observation that monads can be implemented in Haskell, where they turn out to be extremely useful.

I originally wanted to write a real history, analysing how people have tried to teach monads over the years, but I guess this is about all I have time for: dates, authors and blurbs. Corrections and additions are always welcome! As you can tell, I have not read them all.

Edit 2007-08-17: I have updated and moved this timeline to Haskellwiki. This might be useful when some future Haskell archaeologist tries to figure out the precise "ah-ha!" moment when every single programmer in the world 'got' monads.

2007-08-14

Haskell 是一门函数式编程语言。

Looks like some wikibookian(s) have embarked upon a Chinese translation of the Haskell wikibook. Good luck to them! Chinese speakers might be interested in jumping on board. As for non-Chinese speakers, the sentence above is "Haskell is a functional programming language."