Urdu Named Entity Recognition (NER) / Named Entity Recognition System for Urdu

Named Entity Recognition (NER) is the task of finding named entities in text, such as person names, location names, brand names, abbreviations, dates, and times, and classifying them into predefined categories. NER plays a major role in Natural Language Processing (NLP) applications such as Information Extraction, Machine Translation, and Question Answering, so the accuracy of an NER system is very important. An NER system can also serve individual needs: for example, a company manager may want to know all the names mentioned in a specific document. We have used the Rule Based approach and developed a set of rules to extract the named entities from a given Urdu text.

Approaches to NER

1 Rule Based approach: Rules are developed to identify named entities (NEs) in text. Heuristic rules are used to assign tags, and these rules are language specific, so the developer needs good knowledge of the target language. Well-designed rules tend to yield good results, but developing this kind of system is time consuming.

2 Statistical approach: The Statistical approach is also known as the Machine Learning approach. It is a faster way to develop an NER system. The system is trained on an annotated training data set in a specified format. Accuracy depends on the training data, so the system is usually trained on a large set of annotated data. Machine Learning models such as HMM (Hidden Markov Model), CRF (Conditional Random Field), and MaxEnt (Maximum Entropy) are used for NER systems.

3 Hybrid system: A Hybrid system combines the Rule Based approach and the Statistical approach. To develop a Hybrid system we use statistical tools as well as linguistic rules. Combining both approaches can make a system more accurate and efficient.
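
The "annotated training data set in a specified format" mentioned under the Statistical approach is often laid out as one token per line with a BIO label (B = beginning of an entity, I = inside, O = outside). The post does not say which format is meant, so the sketch below is only an illustration of that common convention; the function name `to_bio` and the span layout are my own assumptions.

```python
from typing import List, Tuple

def to_bio(tokens: List[str],
           spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Convert entity spans to per-token BIO labels.

    Each span is (start, end_exclusive, label) in token indices.
    This layout is an assumption made for illustration.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return list(zip(tokens, tags))

if __name__ == "__main__":
    # Hypothetical example: a two-token person name followed by a city name.
    tokens = ["علامہ", "اقبال", "لاہور", "میں"]
    spans = [(0, 2, "PERSON"), (2, 3, "LOCATION")]
    for token, tag in to_bio(tokens, spans):
        print(token, tag)
```

A sequence model such as an HMM or CRF would then be trained to predict the label column from the token column.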

We have used the Rule Based approach:

Developing an NER system with the Rule Based approach is time consuming. The approach is suitable only when you know the target language well and have sufficient knowledge of its linguistic rules, such as its grammar. A system developed this way can yield good results when the rules are well designed. On the other hand, the Statistical approach offers many tools for developing an NER system, such as HMM, CRF, SVM, and MaxEnt, and with their help development is rapid compared to the Rule Based approach.
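
To give a feel for what language-specific rules can look like, here is a minimal sketch in Python. It is not the rule set of the actual system (which is written in C# and described in the paper); the word lists, the date pattern, and the function name `tag_entities` are all hypothetical and far smaller than anything usable in practice.

```python
import re
from typing import List, Tuple

# Hypothetical mini-resources; a real system needs much larger lists.
LOCATION_GAZETTEER = {"لاہور", "کراچی", "پاکستان"}
PERSON_CUES = {"ڈاکٹر", "جناب"}  # honorifics that often precede a name
# Simple numeric dates such as 12/05/2012 (an assumed pattern).
DATE_PATTERN = re.compile(r"^\d{1,2}[-/]\d{1,2}[-/]\d{2,4}$")

def tag_entities(text: str) -> List[Tuple[str, str]]:
    """Apply three toy rules to whitespace-separated tokens."""
    tokens = text.split()
    entities = []
    for i, token in enumerate(tokens):
        if token in LOCATION_GAZETTEER:
            # Rule 1: exact match against a list of known places.
            entities.append((token, "LOCATION"))
        elif DATE_PATTERN.match(token):
            # Rule 2: token looks like a numeric date.
            entities.append((token, "DATE"))
        elif i > 0 and tokens[i - 1] in PERSON_CUES:
            # Rule 3: the token right after an honorific is taken as a name.
            entities.append((token, "PERSON"))
    return entities

if __name__ == "__main__":
    # "Dr. Ahmad lives in Lahore."
    print(tag_entities("ڈاکٹر احمد لاہور میں رہتے ہیں"))
```

Even this toy version shows why the approach is slow to build: every rule and every list entry has to be written by someone who knows the language, and multi-word names, spelling variants, and ambiguous words each need further rules.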

To know more about this system, please see my research paper published at COLING 2012.

http://aclweb.org/anthology/C/C12/C12-1153.pdf

I developed this system in ASP.NET with C# using Visual Studio 2010. It is free to use; please check it out and give me your valuable feedback.

http://h2p.learnpunjabi.org/uner/uner.aspx
