Urdu Stemmer-Rule Based
Stemming is the process of reducing the various inflected forms of a word to their stem.
e.g. In the English language:
1) The word act has inflected forms such as actor, acted and acting.
2) Words like fishing, fished and fisher can be reduced to the root word fish.
Similarly, in Urdu various possibilities have been identified and appropriate rules have been
developed:
Inflected Word    Root Word
لڑکیاں    لڑکی
بستیاں    بستی
گاڑیاں    گاڑی
کتابیں    کتاب
میلے    میلا
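The five pairs in the table suggest three suffix patterns. The sketch below is only an illustration inferred from those examples, not the actual rule set of our stemmer. The names SUFFIX_RULES and stem_urdu are hypothetical, and so is the minimum-length guard.

```python
# Hypothetical suffix rules inferred only from the five table examples.
# Each rule is (suffix to match, text that replaces it).
SUFFIX_RULES = [
    ("یاں", "ی"),  # لڑکیاں -> لڑکی
    ("یں", ""),    # کتابیں -> کتاب
    ("ے", "ا"),    # میلے -> میلا
]


def stem_urdu(word: str) -> str:
    """Apply the first matching suffix rule, or return the word unchanged."""
    for suffix, replacement in SUFFIX_RULES:
        # Assumed guard: keep at least two characters before the suffix
        # so that very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for inflected in ["لڑکیاں", "بستیاں", "گاڑیاں", "کتابیں", "میلے"]:
        print(inflected, stem_urdu(inflected))
```

Rules this small will over-stem many words that merely happen to end in the same letters, so a real rule set needs more conditions than are shown here.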
Approaches
Stemming algorithms fall into three categories: Rule Based, Statistical and Hybrid.
1) Rule Based approach - This approach applies a set of transformation rules to inflected words
in order to cut prefixes or suffixes.
E.g. if the word ends in 'ed', remove the 'ed'.
2) Statistical approach - The major drawback of the Rule Based approach is that it depends on a
maintained database. Statistical algorithms overcome this problem by finding the distribution of
root elements in the data itself, so there is no need to maintain such a database.
3) Hybrid approach - It is a combination of the Rule Based (affix removal) and Statistical
approaches.
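To make the Rule Based idea concrete, here is a minimal sketch of affix cutting for English, built around the 'ed' rule mentioned above. The affix lists, the min_stem limit and the function name strip_affixes are assumptions chosen for illustration, not part of any published stemmer.

```python
# Illustrative affix lists; a real rule-based stemmer would have many more
# rules and conditions than these.
PREFIXES = ("un",)
SUFFIXES = ("ing", "ed", "er")


def strip_affixes(word, prefixes=PREFIXES, suffixes=SUFFIXES, min_stem=3):
    """Cut at most one prefix and one suffix, keeping at least min_stem letters."""
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_stem:
            word = word[len(prefix):]
            break
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            word = word[: -len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ["fishing", "fished", "fisher", "acted", "unlocked", "red", "under"]:
        print(w, "->", strip_affixes(w))
```

With these toy rules fishing, fished and fisher all reduce to fish, while a short word like red is left alone by the length limit. A word like under is wrongly cut to der, which shows why blind rules need extra conditions or an exception list.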
Stemming is useful in Natural Language Processing problems such as search engines, word
processing and information retrieval. In this stemmer we have applied the Rule Based
approach, in which rules are applied to the various possible forms of inflected words to remove
suffixes or prefixes. For Urdu, the only stemmer available to us is Assas-Band, developed by
NUCES, Pakistan, which maintains an Affix Exception List and removes inflections according to
its algorithm.
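The idea of an exception list can be sketched in a few lines. This is a generic illustration only: how Assas-Band actually uses its Affix Exception List is not described in this post, and the single exception entry, the single rule and the name stem_with_exceptions are all hypothetical.

```python
# Hypothetical exception list: words that look inflected but must not be cut.
# The entry here (the common word for "no/not") is chosen for illustration.
EXCEPTIONS = {"نہیں"}

# One illustrative suffix rule: (suffix to match, text that replaces it).
SUFFIX_RULES = [("یں", "")]


def stem_with_exceptions(word, exceptions=EXCEPTIONS, rules=SUFFIX_RULES):
    """Return the word untouched if it is an exception, else apply the rules."""
    if word in exceptions:
        return word
    for suffix, replacement in rules:
        # Assumed guard: keep at least two characters before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for w in ["کتابیں", "نہیں"]:
        print(w, stem_with_exceptions(w))
```

Checking the exception list first keeps the rules themselves simple, at the cost of having to maintain the list by hand.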
For more details you may read our research paper.
To test or use our Urdu Stemmer, please follow this link.
