Hobby-hacking Eric

2008-07-30

simple random numbers in Haskell

Random numbers are the kind of thing I use rarely enough that by the time I want to use them, I have forgotten the relevant details, but frequently enough that I get annoyed whenever it happens.

Hopefully these notes will be useful to somebody in a similar situation.

two things to know

(1) import System.Random

(2) randomIO :: Random a => IO a


The one function you really need to know about is randomIO. Its type is Random a => IO a; don't worry if you do not understand the type, it suffices to know that it involves IO. In this example, we generate and use a random Int:
import System.Random

main =
  do r <- randomIO
     print (r + 1 :: Int)
     -- Note re the ':: Int' above: Haskell can't figure out from
     -- the context exactly what type of number you want (left alone,
     -- its defaulting rules would quietly pick Integer), so we
     -- constrain it to Int
One neat feature is that you can randomly generate anything that implements the Random typeclass. In the example below, we generate a random Bool. Notice that we do not do anything differently, except treat the result as a Bool (i.e. by applying not to it).
import System.Random

main =
  do r <- randomIO
     print (not r)


A useful exercise, if you know about typeclasses, is to implement Random for one of your own types. The toEnum function may be useful.
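For the exercise above, one possible answer looks like this. It is only a sketch: Suit is a made-up type, and the trick is to let the existing Random instance for Int do the work, using fromEnum and toEnum to move between the two. Both class methods are defined explicitly, so it does not rely on any default methods of a particular version of the random package.

```haskell
import System.Random

-- A hypothetical enumeration type, used only for this illustration.
data Suit = Hearts | Diamonds | Clubs | Spades
  deriving (Show, Eq, Enum, Bounded)

instance Random Suit where
  -- Draw an Int between the positions of the two bounds, then map it
  -- back to a constructor with toEnum.
  randomR (lo, hi) g =
    let (n, g') = randomR (fromEnum lo, fromEnum hi) g
    in (toEnum n, g')
  -- With no bounds given, pick from the whole type.
  random = randomR (minBound, maxBound)

main :: IO ()
main = do
  s <- randomIO :: IO Suit
  print s
```

Once the instance exists, randomRIO, randoms and friends work for Suit too, with no extra code.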

more advanced stuff

  1. You can use randomRIO :: Random a => (a,a) -> IO a to generate random values within a given range (both ends inclusive).
  2. Instead of using the functions randomIO and randomRIO, you can separate obtaining a random number generator from using the generator. Doing so allows you to minimise your reliance on the IO monad. It also makes your code easier to debug, because you can opt to always pass it the same generator and make life much more predictable. See the functions random and randomR for details.
  3. A potentially handy trick is to generate an infinite list of random numbers, which you can then pass to a function. See the randoms function for details.
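Here is a small sketch putting the three items above side by side. The names rollIO, rollTwo and flips are made up for the example; mkStdGen, newStdGen, randomR, randomRs and randoms come from System.Random.

```haskell
import System.Random

-- (1) A six-sided die in IO; both ends of the range are included.
rollIO :: IO Int
rollIO = randomRIO (1, 6)

-- (2) Pure version: the generator goes in and the updated one comes out,
-- so the same starting generator always produces the same rolls.
rollTwo :: StdGen -> ((Int, Int), StdGen)
rollTwo g0 =
  let (a, g1) = randomR (1, 6) g0
      (b, g2) = randomR (1, 6) g1
  in ((a, b), g2)

-- (3) An infinite stream of coin flips; take only what you need.
flips :: StdGen -> [Bool]
flips = randoms

main :: IO ()
main = do
  d <- rollIO
  print d
  -- A fixed seed makes these two lines reproducible between runs.
  let g = mkStdGen 42
  print (fst (rollTwo g))
  print (take 5 (flips g))
  -- Or get a generator from IO once, then stay pure from there on.
  g' <- newStdGen
  print (take 5 (randomRs (1, 6 :: Int) g'))
```

Running it twice should print the same pair and the same five Bools both times (they come from the fixed seed 42), while the first and last lines will usually change.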


Edit: fixed s/randomR/randomIO/


2008-07-27

pandoc gets mediawiki support

Pandoc is a universal document converter. You feed it documents in one format (say, HTML) and it spits them out in another (say, ODF). Assuming it works correctly, Pandoc has the potential to replace all those little one-to-one converters (e.g. latex2html) in my toolbox. Just the one simple Pandoc.

And now, thanks to fiddlosopher (John MacFarlane?), it knows how to write Mediawiki files! Mediawiki? That's the syntax/software that powers Wikipedia, Wikibooks, and a whole slew of organisational or community wikis (like HaskellWiki).

Hey, Haskellers probably have a lot of LaTeX documents lying around. Maybe this is their chance to get them onto the Haskell wiki?

We're halfway to being able to do a round trip between LaTeX and Mediawiki! All we need is for somebody [maybe John :-D] to implement a Mediawiki reader for Pandoc, and things could get mighty interesting... Oh, and if anybody is working on a wiki with direct LaTeX support, hats off to you! Sometimes Mediawiki is a fact of life, though.
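For the LaTeX-to-wiki direction, the pandoc command-line tool does it in one go (pandoc -f latex -t mediawiki). From Haskell, something like the sketch below should work, assuming a recent pandoc library (2.x or later), whose API is quite different from the one that existed when this was written. latexToMediaWiki is just a name chosen for the example.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.Environment (getArgs)
import Text.Pandoc

-- Parse LaTeX into Pandoc's document model, then render it as MediaWiki
-- markup. runIOorExplode turns any conversion error into an exception.
latexToMediaWiki :: T.Text -> IO T.Text
latexToMediaWiki src = runIOorExplode $
  readLaTeX def src >>= writeMediaWiki def

-- Convert every LaTeX file named on the command line, printing the result.
main :: IO ()
main = getArgs >>= mapM_ (\p -> TIO.readFile p >>= latexToMediaWiki >>= TIO.putStrLn)
```

Default reader and writer options (def) are used throughout; real documents with custom macros may need tweaking.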


2008-07-23

rose zipper on hackage

I guess this isn't big enough news for the haskell@ mailing list: I have uploaded Krasimir Angelov and Iavor S. Diatchki's Data.Tree implementation of zippers onto Hackage. The package is called rosezipper and it is available under the BSD3 license.

For the interested, "The Zipper is an idiom that uses the idea of “context” to the means of manipulating locations in a data structure." (Haskell wiki).

For me, zippers are just a very nice way to navigate and edit trees. By "nice", I mean elegant, efficient and purely functional. Before learning about zippers, I only knew how to navigate trees from top to bottom; if I wanted to go back up a node, or visit a sibling node, I basically had to start over from the root. Zippers allow me to walk the tree in any direction, visiting a node's parent, children and siblings without starting over from the top. This kind of thing is especially handy for Natural Language Processing people or, basically, anybody who eats trees for a living.
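To make the idea concrete, here is a tiny hand-rolled zipper over Data.Tree. It is a simplified illustration with made-up names (Zipper, Crumb, nextSibling and so on) and is not the rosezipper API; the package's own types and functions differ. The point is the "context": each step down leaves behind a crumb holding the parent's label and the siblings on either side, which is exactly what is needed to step sideways or back up without restarting from the root.

```haskell
import Data.Tree (Tree (..), drawTree)

-- Hypothetical minimal rose-tree zipper, for illustration only.
-- A Crumb holds the parent's label, the left siblings (nearest first)
-- and the right siblings (nearest first).
data Crumb a = Crumb a [Tree a] [Tree a]

-- The focused subtree plus the trail of crumbs back up to the root.
data Zipper a = Zipper (Tree a) [Crumb a]

fromTree :: Tree a -> Zipper a
fromTree t = Zipper t []

label :: Zipper a -> a
label (Zipper t _) = rootLabel t

-- Move down to the leftmost child, if there is one.
firstChild :: Zipper a -> Maybe (Zipper a)
firstChild (Zipper (Node x (c : cs)) bs) = Just (Zipper c (Crumb x [] cs : bs))
firstChild _ = Nothing

-- Move right to the next sibling without going back through the root.
nextSibling :: Zipper a -> Maybe (Zipper a)
nextSibling (Zipper t (Crumb x ls (r : rs) : bs)) =
  Just (Zipper r (Crumb x (t : ls) rs : bs))
nextSibling _ = Nothing

-- Move up one level, rebuilding the parent node from the crumb.
parent :: Zipper a -> Maybe (Zipper a)
parent (Zipper t (Crumb x ls rs : bs)) =
  Just (Zipper (Node x (reverse ls ++ t : rs)) bs)
parent _ = Nothing

-- Edit the label at the focus; this only rebuilds one node.
modifyLabel :: (a -> a) -> Zipper a -> Zipper a
modifyLabel f (Zipper (Node x cs) bs) = Zipper (Node (f x) cs) bs

-- Zip all the way back up and return the whole (possibly edited) tree.
toTree :: Zipper a -> Tree a
toTree z@(Zipper t _) = maybe t toTree (parent z)

-- A toy parse tree: (S (NP John) (VP sleeps))
sentence :: Tree String
sentence = Node "S" [Node "NP" [Node "John" []], Node "VP" [Node "sleeps" []]]

main :: IO ()
main = do
  -- Down to NP, across to its sibling VP, relabel it, then rebuild.
  let edited = do
        np <- firstChild (fromTree sentence)
        vp <- nextSibling np
        return (toTree (modifyLabel (++ "-intrans") vp))
  mapM_ (putStr . drawTree) edited
```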

If you would like to learn more, I would recommend Apfelmus's very friendly tutorial (part of the Haskell wikibook).

Thanks to Krasimir and Iavor for implementing this and for allowing me to package it up.


2008-07-22

encodings-aware hex editor

Here's another coding-project idea: I would like to see a hex editor that knows how to display characters in encodings other than ASCII (specifically, I want to debug messed-up UTF-8 text files).

Google and apt-cache search reveal no such editor, at least not in the free/open-source world or in Linux and Mac OS X freeware land. On Debian-based systems, there are a couple that handle some Japanese encodings, but nothing that deals with UTF-8.

Likely features:
  • toggle between an ASCII-only mode and a show-as-UTF-8 mode
  • good UI for the fact that UTF-8 characters have a variable length in bytes
  • graceful handling of encoding errors


Haskellers could possibly do this as part of Yi (a plugin?), or maybe just as a completely standalone program.

And if you want a slightly simpler project, a UTF-8 hex dumper would be good. Hmmph... come to think of it, maybe it would have been more productive to just go write that instead of this blog post.

Edit: Well, I went ahead and made a stupid little dumper for my needs. Here is its output on some sample corrupted UTF-8:
20 28 5b 47 65 6f 72 67 69 61 6e                     ([Georgian
3a 20 e183a1 e183 3f e183a5 e183                    : ს«e1 83»?ქ«e1 83»
3f e183 20 e18397 e18395 e18394 e1839a e183         ?«e1 83» თველ«e1 83»
3f 5d 0a                                            ?]
20 28 5b 47 65 72 6d 61 6e 3a 20                     ([German:
44 65 75 74 73 63 68 6c 61 6e 64                    Deutschland
5d 20 5b 49 50 41 3a 20 cb88 64 c994                ] [IPA: ˈdɔ
c9aa 74 ca83 6c 61 6e 74 5d 29 2c 20                ɪtʃlant]),
6f 66 66 69 63 69 61 6c 6c 79 20                    officially
74 68 65 20 46 65 64 65 72 61 6c                    the Federal
20 52 65 70 75 62 6c 69 63 20 6f                     Republic o
66 20 47 65 72 6d 61 6e 79 20 28                    f Germany (
42 75 6e 64 65 73 72 65 70 75 62                    Bundesrepub
6c 69 6b 20 44 65 75 74 73 63 68                    lik Deutsch
6c 61 6e 64 2c 20 5b 49 50 41 3a                    land, [IPA:
20 cb88 62 ca8a 6e 64 c999 73 72 65 70               ˈbʊndəsrep
75 62 6c 69 cb 3f 6b 20 cb88 64                     ubli«cb»?k ˈd
c994 c9aa 74 ca83 6c 61 6e 74 5d 29 2c              ɔɪtʃlant]),
20 69 73 20 61 20 63 6f 75 6e 74                     is a count
72 79 20 69 6e 20 43 65 6e 74 72                    ry in Centr
61 6c 20 45 75 72 6f 70 65 2e 20                    al Europe.
0a
The highlighting was done by hand. I should probably go figure out how to colourise the corrupted characters. Or maybe I should just go ahead and package this and put it up on Hackage? Make it available via darcs? I would need a decent name. So far, I have hexy-xxy and hexdump-utf8, neither of which is that great :-/
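The dumper's source is not shown here, so the following is only a rough sketch of how such a tool might work, not the code that produced the output above. It groups bytes into UTF-8 sequences, shows each well-formed character next to its bytes, and marks anything malformed as «e1 83», 11 units per line. Unit, units, hexOf and textOf are names made up for the sketch; the layout (a 48-column hex field, control characters shown as '.') is an arbitrary choice. It needs the bytestring package, which ships with GHC.

```haskell
import qualified Data.ByteString as B
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, isControl)
import Data.Word (Word8)
import Numeric (showHex)
import System.Environment (getArgs)
import System.IO (hSetEncoding, stdout, utf8)

-- One display unit: a well-formed character with the bytes it came
-- from, or bytes that do not form a valid UTF-8 sequence.
data Unit = Good [Word8] Char | Bad [Word8]

-- Sequence length announced by a lead byte; Nothing for bytes that can
-- never start a sequence (continuation bytes, C0, C1, F5..FF).
seqLen :: Word8 -> Maybe Int
seqLen b
  | b < 0x80 = Just 1
  | b >= 0xC2 && b < 0xE0 = Just 2
  | b >= 0xE0 && b < 0xF0 = Just 3
  | b >= 0xF0 && b < 0xF5 = Just 4
  | otherwise = Nothing

isCont :: Word8 -> Bool
isCont b = b .&. 0xC0 == 0x80

-- Combine the lead byte's payload bits with 6 bits per continuation byte.
codePoint :: Word8 -> [Word8] -> Int
codePoint b cs = foldl step (fromIntegral b .&. leadMask) cs
  where
    leadMask = [0x7F, 0x1F, 0x0F, 0x07] !! length cs
    step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)

-- Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.
valid :: Int -> Int -> Bool
valid len cp = cp >= minFor len && cp <= 0x10FFFF && not (cp >= 0xD800 && cp <= 0xDFFF)
  where
    minFor n = [0, 0x80, 0x800, 0x10000] !! (n - 1)

units :: [Word8] -> [Unit]
units [] = []
units (b : bs) = case seqLen b of
  Nothing -> Bad [b] : units bs
  Just n ->
    let cs = takeWhile isCont (take (n - 1) bs)
        rest = drop (length cs) bs
        cp = codePoint b cs
    in if length cs == n - 1 && valid n cp
         then Good (b : cs) (chr cp) : units rest
         else Bad (b : cs) : units rest

hex2 :: Word8 -> String
hex2 w = let s = showHex w "" in if length s < 2 then '0' : s else s

hexOf :: Unit -> String
hexOf (Good ws _) = concatMap hex2 ws
hexOf (Bad ws) = concatMap hex2 ws

-- Malformed bytes are shown as «e1 83»; control characters as '.'.
textOf :: Unit -> String
textOf (Good _ c)
  | isControl c = "."
  | otherwise = [c]
textOf (Bad ws) = "«" ++ unwords (map hex2 ws) ++ "»"

chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (a, b) = splitAt k xs in a : chunksOf k b

-- Hex column padded to a fixed width, then the decoded text.
renderLine :: [Unit] -> String
renderLine us = padTo 48 (unwords (map hexOf us)) ++ " " ++ concatMap textOf us
  where
    padTo n s = s ++ replicate (n - length s) ' '

main :: IO ()
main = do
  hSetEncoding stdout utf8
  getArgs >>= mapM_ dump
  where
    dump p = do
      bytes <- B.readFile p
      mapM_ (putStrLn . renderLine) (chunksOf 11 (units (B.unpack bytes)))
```

The file is read as raw bytes, so no decoding happens before the dumper gets to look at it; only stdout is told to use UTF-8.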


2008-07-21

simply reading and writing UTF-8 in Haskell

A year and a half ago, I posted what seemed to be the simplest recipe for reading and writing UTF-8 in Haskell. In this post, I will provide an even simpler recipe, made possible by Eric Mertens' utf8-string package.

For those who are not familiar with Haskell: its internal representation for characters is Unicode, but for IO it effectively assumes that it is reading and writing ISO 8859-1 (Latin-1). This used to be annoying for those of us who wanted to work with the UTF-8 encoding, but now there is a very simple solution, perfect for those of us who don't want to think too much and just get the job done.

the example


The sample problem from my last post was to take a UTF-8 encoded file as input, reverse all its lines, and write the results to a file with the same name plus a ".rev" extension. The solution might be self-explanatory if you are used to Haskell, but I will make some minor comments below, just in case.

import System.IO.UTF8
import Prelude hiding (readFile, writeFile)
import System.Environment (getArgs)

main :: IO ()
main =
  do args <- getArgs
     mapM_ reverseUTF8File args

reverseUTF8File :: FilePath -> IO ()
reverseUTF8File f =
  do c <- readFile f
     writeFile (f ++ ".rev") $ reverseLines c

reverseLines :: String -> String
reverseLines = unlines . map reverse . lines

In the above code, we use drop-in replacements for some System.IO functions. Some of these functions are also exported by the Prelude, so we must hide the Prelude versions to avoid clashing with the ones we import. (Alternatively, we could import the UTF-8 versions qualified, which is handy in contexts where we want the option of reading and writing UTF-8 without committing to it.) The rest is straightforward. Notice that we do not jump through any hoops whatsoever. In fact, you can take pretty much any pre-existing Haskell program that you have written and turn it into a UTF-8 version just by changing its import statements.
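For what it is worth, GHC 6.12 and later do the encoding in the Handle layer itself, so on those versions the same program can be written with plain System.IO by setting the encoding on each handle. A sketch, assuming GHC 6.12 or newer:

```haskell
import System.Environment (getArgs)
import System.IO

-- Read f as UTF-8 and write its reversed lines to f ++ ".rev", also as UTF-8.
reverseUTF8File :: FilePath -> IO ()
reverseUTF8File f = do
  c <- withFile f ReadMode $ \h -> do
    hSetEncoding h utf8
    s <- hGetContents h
    -- Force the lazy read completely before withFile closes the handle.
    length s `seq` return s
  withFile (f ++ ".rev") WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h (reverseLines c)

reverseLines :: String -> String
reverseLines = unlines . map reverse . lines

main :: IO ()
main = getArgs >>= mapM_ reverseUTF8File
```

Setting the encoding explicitly means the program does not depend on the user's locale; without hSetEncoding, these GHC versions fall back on the locale's encoding.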

Here are the results of running this script on a UTF-8 sampler:
)udrU( یتوہ ںیہن فیلکت ےھجم روا ںوہ اتکس اھک چناک ںیم 
)othsaP( يوږوخ هن ام هغه ،مش ېلړوخ هشيش هز
)naeroK(요아않 지프아 도래그 .요어있 수 을먹 를리유 는나
)keerG( .ατοπίτ ωθάπ αν ςίρωχ άιλαυγ ανέμσαπσ ωάφ αν ώροπΜ
)cidnalecI / aksnelsÍ( .gim aðiem ða sseþ ná relg ðite teg gÉ
)hsiloP( .izdokzs ein im i ,ołkzs ćśej ęgoM
)nainamoR( .etșenăr ăm un ae iș ălcits cnânăm ăs toP
)nainiarkU( .ьтидокшоп ен інем онов й ,олкш итсї ужом Я
)nainemrA( ։րենըչ տսիգնահնա իծնի և լետւո իկապա մանրԿ
)naigroeG( .ავიკტმ არა ად მაჭვ სანიმ
)idniH( .तह हन डप ईक स सउ झम ,ह तकस ख चक म
)werbeH( .יל קיזמ אל הזו תיכוכז לוכאל לוכי ינא
)hsiddiY( .ײװ טשינ רימ טוט סע ןוא זאלג ןסע ןעק ךיא
)cibarA( .ينملؤي ل اذه و جاجزلا لكأ ىلع رداق انأ
)esenapaJ( 。んせまけつ傷を私はれそ。すまれらべ食をスラ
)iahT( บจเนฉหใำทมไนมตแ ดไกจะรกนกนฉ
)slobmys ycnerruc( ₯·₮·₭·₫·₪·₩·₨·₧·₦·₥·₤·₣·₢·₡·¢·$·€·£·¥


The utf8-string package is available on HackageDB. Thanks to Eric M. for providing this little wrapper! It's a perfect example of the kind of thing which seems obvious... after somebody else has thought to do it.