Python

Spellcorrector


18th of April 2007

I think a lot of Python people have seen Peter Norvig's beautiful article, How to Write a Spelling Corrector. So have I, and I couldn't wait to write my own little version of it to fit my needs.

The changes I added were:

If you're still reading at this point, it's quite likely that you're a coder, so you'll prefer to see code that shows how it works:

 >>> from spellcorrector import Spellcorrector
 >>> sc = Spellcorrector('en')
 >>> sc.correct('caracter')
 u'character'
 >>> sc.correct(u'caracter')
 u'character'
 >>> sc.suggestions(u'caracter')
 [u'character']
 >>> sc.suggestions(u'spell')
 [u'smell', u'shell', u'sell', u'spell', u'swell', u'spill', u'spells']
 >>> sc.suggestions(u'spel')
 [u'spell', u'sped']
 >>> sc.suggestions(u'spel', detailed=True)
 [{'count': 9, 'percentage': 90.0, 'word': u'spell'}, \
 {'count': 1, 'percentage': 10.0, 'word': u'sped'}]
 >>> # Physics database usage example
 ... 
 >>> sc.correct('Planck')
 u'black'
 >>> sc.correct('Curie')
 u'sure'
 >>> sc.train(['Planck','Curie','Einstein','Heisenberg'])
 >>> sc.correct('Planck')
 u'planck'
 >>> sc.correct('curie')
 u'curie'
 >>> sc.save('Physicist_words.txt')
 >>> del sc
 >>> file('Physicist_words.txt').read()
 'planck\ncurie\neinstein\nheisenberg'
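
For readers who want to see what sits behind an interface like the one above, here is a minimal Python 3 sketch. It is not the code from the download: `SketchCorrector` is a hypothetical stand-in, the tiny training text is made up, and only candidates one edit away are considered (Norvig's article also looks two edits away). The `edits1` helper follows the approach described in Norvig's article.

```python
import re
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string one delete, transpose, replace or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


class SketchCorrector:
    """Hypothetical stand-in, NOT the Spellcorrector class from the download."""

    def __init__(self, text=""):
        self.counts = Counter()
        self.train(re.findall(r"[a-z]+", text.lower()))

    def train(self, words):
        # Words are stored lowercased, so 'Planck' is learned as 'planck'.
        self.counts.update(w.lower() for w in words)

    def suggestions(self, word, detailed=False):
        word = word.lower()
        # Known words among the input itself and its one-edit neighbours.
        candidates = {w for w in edits1(word) | {word} if w in self.counts}
        # Most frequent first; ties are broken alphabetically.
        ranked = sorted(candidates, key=lambda w: (-self.counts[w], w))
        if not detailed:
            return ranked
        total = sum(self.counts[w] for w in ranked)
        return [
            {"word": w, "count": self.counts[w],
             "percentage": 100.0 * self.counts[w] / total}
            for w in ranked
        ]

    def correct(self, word):
        word = word.lower()
        if word in self.counts:
            return word  # already known, nothing to correct
        ranked = self.suggestions(word)
        return ranked[0] if ranked else word


# Made-up training text: 'spell' nine times, 'sped' and 'character' once each.
sc = SketchCorrector("spell " * 9 + "sped character")
print(sc.correct("caracter"))   # character
print(sc.suggestions("spel"))   # ['spell', 'sped']
sc.train(["Planck", "Curie"])
print(sc.correct("Planck"))     # planck
```

Keeping the word counts in a `Counter` means training on extra words is just another `update` call, which is roughly what a `train()` method needs.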

A lot more can probably be done to improve it, but it works quite well as a foundation for an application that mimics Google's "Did you mean: ..." feature.

I've actually already implemented this in the search feature of a not-yet-launched website for art. Since the art site contains non-English names like "Corneille", "Doucet" or "Belartio", I had to train my spellcorrector for that particular application so that a perfectly fine search for "attentif" didn't become "Did you mean: _attentive_".
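
The effect of that training step can be illustrated with a small Python 3 sketch. It is only an approximation under stated assumptions: `did_you_mean` and its parameters are hypothetical names, the word lists are made up, and the matching uses the standard library's `difflib.get_close_matches` (string similarity) rather than the edit-and-frequency approach from Norvig's article.

```python
import difflib


def did_you_mean(query, vocabulary, domain_words=()):
    """Return a corrected query string, or None if nothing should be suggested.

    Hypothetical helper: `vocabulary` is the general word list and
    `domain_words` are application-specific terms that must be left alone.
    """
    known = {w.lower() for w in vocabulary} | {w.lower() for w in domain_words}
    corrected = []
    changed = False
    for token in query.lower().split():
        if token in known:
            corrected.append(token)  # known word, trained names included
            continue
        # Sorting the candidates keeps the result deterministic.
        matches = difflib.get_close_matches(token, sorted(known), n=1, cutoff=0.8)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            corrected.append(token)  # no close match, keep the token as typed
    return " ".join(corrected) if changed else None


vocabulary = ["attentive", "painting", "portrait"]

# Untrained: the perfectly fine search gets "corrected".
print(did_you_mean("attentif", vocabulary))                             # attentive
# Trained with the site's own terms: no suggestion is offered.
print(did_you_mean("attentif", vocabulary, domain_words=["attentif"]))  # None
```

Returning `None` when nothing changed lets the search page skip the "Did you mean" line entirely for queries that are already fine.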

I'll blog more about that application once I get it up and running on a public domain.

To take this early code experiment for a spin, download: spellcorrector-0.1.2.tar.bz2 (6.7Mb), spellcorrector-0.1.4.tar.bz2 (6.7Mb), spellcorrector-0.1.5.tar.bz2 (6.7Mb)


