The Hindi to Punjabi Machine Translation System was developed by Dr. Vishal Goyal and Dr. G.S. Lehal using a direct/rule-based approach. Several large lexical resources are used to map source-language words to target-language words.
In general, if two languages are structurally similar, particularly in lexical correspondences, morphology, and word order, the case for abstract syntactic analysis is less convincing. Because the present work deals with a pair of closely related languages, a direct translation system is the natural choice. The overall architecture shown below is adopted for the Hindi to Punjabi Machine Translation System. The system is divided into three stages: preprocessing, the translation engine, and post-processing. The steps of this architecture are described below.
Preprocessing
The preprocessing stage is a collection of operations applied to the input
data to make it processable by the translation engine. In our current work, we
perform the following preprocessing steps:
- Text Normalization
- Replacing Collocations
- Replacing Proper Nouns
Translation Engine
The translation engine translates each token obtained from the previous
stage. It uses various lexical resources to find a match for a given token
in the target language. The following describes how a token passes through
its modules.
- Analyzing the word for translation/transliteration: The token obtained in the previous stage is passed through the following checks.
- Identifying Titles: The token is checked to see whether it is a title such as प्रो (prō) or श्रीमती (shrīmtī). If it is, the token that follows it is transliterated rather than translated.
- Identifying Surnames: The token is checked to see whether it is a surname such as अग्रवाल (agrvāl) or ओबेरॉय (ōbērāy). If it is, the token that precedes it is transliterated rather than translated.
- Lexicon Lookup: If the token does not satisfy the above two checks, it is looked up in the lexicon for a direct word-to-word translation.
- Resolving Ambiguity: If the token is not present in the lexicon for direct translation, it is looked up in the database of ambiguous words. If the token is found to be ambiguous, the ambiguity is resolved with the help of n-gram language modeling. The system uses bigram and trigram databases, which contain one and two words respectively from the vicinity of an ambiguous word, along with the corresponding meaning for that particular context.
- Unknown Words: If all the above modules fail to analyze the token, it is treated as a foreign or unknown word. Such words first pass through a morphological analysis phase based on the inflection rules for Hindi words. The morphological generator produces the transliterated word using the inflectional rules and then checks the generated word against the Punjabi unigram database to confirm that it is a valid word. If the generated word is found among the Punjabi unigrams, it is used as the translation; otherwise the token is sent to the transliteration module. The transliteration module is a major module of the system and uses rules designed specifically with translation in mind.
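The module order described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The tiny tables (`TITLES`, `SURNAMES`, `LEXICON`, `AMBIGUOUS`, `BIGRAMS`) and all function names are hypothetical. The morphological analysis and Punjabi unigram check for unknown words are omitted. `transliterate` is a crude stand-in that shifts Devanagari code points into the parallel Gurmukhi Unicode block instead of applying the system's transliteration rules. Only a bigram context (the following word) is shown; a trigram lookup would work the same way with a two-word context.

```python
# Hypothetical, toy-sized stand-ins for the system's lexical resources.
TITLES = {"प्रो", "श्रीमती"}
SURNAMES = {"अग्रवाल", "ओबेरॉय"}
LEXICON = {"घर": "ਘਰ", "पानी": "ਪਾਣੀ"}      # direct word-to-word entries
AMBIGUOUS = {"और": "ਅਤੇ"}                   # ambiguous word -> default sense ("and")
BIGRAMS = {("और", "भी"): "ਹੋਰ"}             # (word, next word) -> sense in that context


def transliterate(token):
    """Crude placeholder: shift Devanagari code points (U+0900-U+097F)
    into the Gurmukhi block (U+0A00-U+0A7F). Real rules are far richer."""
    return "".join(
        chr(ord(ch) + 0x100) if 0x0900 <= ord(ch) <= 0x097F else ch
        for ch in token
    )


def translate_tokens(tokens):
    # Pass 1: mark neighbours of titles and surnames for transliteration.
    forced = set()
    for i, tok in enumerate(tokens):
        if tok in TITLES and i + 1 < len(tokens):
            forced.add(i + 1)      # token after a title
        if tok in SURNAMES and i > 0:
            forced.add(i - 1)      # token before a surname

    # Pass 2: lexicon, then ambiguity resolution, then transliteration.
    output = []
    for i, tok in enumerate(tokens):
        if i in forced:
            output.append(transliterate(tok))
        elif tok in LEXICON:
            output.append(LEXICON[tok])
        elif tok in AMBIGUOUS:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            output.append(BIGRAMS.get((tok, nxt), AMBIGUOUS[tok]))
        else:
            # Unknown word: morphology and unigram check omitted in this sketch.
            output.append(transliterate(tok))
    return output
```

For example, `translate_tokens(["और", "भी"])` picks the sense stored in `BIGRAMS` for the first token, while a token that follows a title is transliterated even when the lexicon contains it.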
Post Processing
After all the source text has been converted to target text, some grammatical errors remain in the output. The post-processing phase corrects them using rules that we formulated for this purpose and implemented with regular expressions and pattern matching.
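A rule table of this kind might look like the sketch below. The rules shown are illustrative assumptions only, since the system's actual correction rules are not listed here. The first rule drops the marker ਨੇ after the first-person pronoun ਮੈਂ, and the second collapses repeated spaces. The names `RULES` and `post_process` are hypothetical.

```python
import re

MAIN = "ਮੈਂ"   # first-person pronoun "I"
NE = "ਨੇ"      # ergative marker

# Hypothetical rule table: (compiled pattern, replacement), applied in order.
# Whitespace lookarounds are used instead of \b, which is unreliable
# next to Gurmukhi combining marks.
RULES = [
    (re.compile(rf"(?<!\S)({MAIN}) {NE}(?!\S)"), r"\1"),  # drop the marker after the pronoun
    (re.compile(r" {2,}"), " "),                          # collapse repeated spaces
]


def post_process(text):
    """Apply each correction rule in sequence to the translated text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Keeping the rules in a data table rather than in code means a new correction can be added without touching `post_process`.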
GUI Features of the System
- Text translation from Hindi to Punjabi
- Text transliteration from Hindi to Punjabi
- Translating Websites
- Sending email in Punjabi that was originally written in Hindi
The system has been rigorously evaluated: it scored 94% on the intelligibility test and 90.84% on the accuracy test.
Architecture of Hindi To Punjabi Machine Translation System
The system is freely available to use. Web link to access the Machine Translation System: h2p.learnpunjabi.org
