Grammar and style checker for the Luxembourgish language

Over the weekend I started writing what will be the first grammar and style checker ever developed for the Luxembourgish language. Unfortunately, exams at university start in about a month, so I won’t really be able to continue working on it until the beginning of August. Read on for a little more about what I’ve done already and what I plan to do.

I have already completed a sentence and word tokenizer that splits a text into sentences and words while handling special cases such as dates and abbreviations (these contain periods, so not every period ends a sentence). Besides that, I wrote a prototype of a part-of-speech tagger (which assigns grammatical information to the tokens returned by the tokenizer) and implemented a few basic grammar and style rules.
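To give an idea of the period problem described above, here is a minimal sketch of such a tokenizer in Java. It is not the actual code of the checker: the class name `TokenizerSketch`, the tiny abbreviation list and the two rules (a period inside a token never ends a sentence, a period after a known abbreviation doesn't either) are assumptions made purely for illustration, and the example sentences are only test input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only; names and rules are assumptions, not the real checker.
class TokenizerSketch {
    // Hypothetical abbreviation list (stored without the final period).
    // A real checker would need a curated list for Luxembourgish.
    private static final Set<String> ABBREVIATIONS = Set.of("dr", "z.b", "asw", "etc");

    // True if the token is a known abbreviation such as "Dr." or "z.B.".
    static boolean isAbbreviation(String token) {
        if (!token.endsWith(".")) return false;
        String stem = token.substring(0, token.length() - 1).toLowerCase(Locale.ROOT);
        return ABBREVIATIONS.contains(stem);
    }

    // Decides whether the character at index i closes the sentence that began at start.
    private static boolean endsSentence(String text, int start, int i) {
        char c = text.charAt(i);
        if (c != '.' && c != '!' && c != '?') return false;
        // Punctuation followed directly by a non-space character sits inside
        // a token, as in the date 12.06.2006 or the abbreviation z.B.
        if (i + 1 < text.length() && !Character.isWhitespace(text.charAt(i + 1))) return false;
        if (c != '.') return true;
        // Look at the whole word in front of the period.
        int wordStart = i;
        while (wordStart > start && !Character.isWhitespace(text.charAt(wordStart - 1))) {
            wordStart--;
        }
        return !isAbbreviation(text.substring(wordStart, i + 1));
    }

    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (!endsSentence(text, start, i)) continue;
            String sentence = text.substring(start, i + 1).trim();
            if (!sentence.isEmpty()) sentences.add(sentence);
            start = i + 1;
        }
        String rest = text.substring(start).trim();
        if (!rest.isEmpty()) sentences.add(rest); // text without final punctuation
        return sentences;
    }

    static List<String> splitWords(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String chunk : sentence.trim().split("\\s+")) {
            if (chunk.isEmpty()) continue;
            int end = chunk.length();
            // Peel trailing punctuation off, but keep the period of an abbreviation.
            while (end > 0 && ".,;:!?".indexOf(chunk.charAt(end - 1)) >= 0
                    && !isAbbreviation(chunk.substring(0, end))) {
                end--;
            }
            if (end > 0) tokens.add(chunk.substring(0, end));
            for (int i = end; i < chunk.length(); i++) {
                tokens.add(String.valueOf(chunk.charAt(i))); // one token per punctuation mark
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Den Dr. Weber war den 12.06.2006 do. Hien ass midd.";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence + " -> " + splitWords(sentence));
        }
    }
}
```

A fixed abbreviation list is the simplest possible approach and will miss anything it doesn't know about. Ordinal dates written with a space after the period are one case this sketch doesn't handle at all.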

Luxembourgish seems to be particularly easy to check with computer software, so I hope to create a rock-solid product that not only helps native speakers but also lets non-native speakers better understand the peculiarities of the language.

The program will probably be licensed under the LGPL. It will be written in Java so that it can easily offer multiple interfaces: a simple text-only interface, an advanced graphical interface, an interface integrated into OpenOffice.org, and a web interface using JavaServer Pages.

Screenshot of the “proof of concept”-like program
