
Hindi to Punjabi Machine Translation System

The Hindi to Punjabi Machine Translation System was developed by Dr. Vishal Goyal and Dr. G.S. Lehal using a direct (rule-based) approach. Several large lexical resources are used to map source-language words to target-language words.

In general, if two languages are structurally similar, particularly in their lexical correspondences, morphology, and word order, the case for abstract syntactic analysis is less convincing. Since the present research work deals with a pair of closely related languages, a direct translation system is the obvious choice. The overall system architecture, shown below, is the one adopted for the Hindi to Punjabi Machine Translation System. The system is divided into three stages: Preprocessing, Translation Engine, and Post Processing. The steps of this architecture are described below.

PreProcessing  
The pre-processing stage is a collection of operations that are applied to the input
data to make it processable by the translation engine. In our current work, we
have performed the following pre-processing steps:
  • Text Normalization 
  • Replacing Collocations 
  • Replacing Proper Nouns 


Translation Engine 
The translation engine is responsible for translation of each token obtained 
from the previous step. It uses various lexical resources for finding the match 
of a given token in target language. Following is the description of how a 
token is passed through various modules.  
  • Analyzing the word for Translation/Transliteration: Each token obtained from the previous stage passes through the following steps.
    • Identifying Titles: The token is checked to see whether it is a title such as प्रो (prō) or श्रीमती (shrīmtī). If the current token is a title, the token that follows it is transliterated rather than translated.
    • Identifying Surnames: The token is checked to see whether it is a surname such as अग्रवाल (agrvāl) or ओबेरॉय (ōbērāy). If the current token is a surname, the token that precedes it is transliterated rather than translated.
    • Lexicon Lookup: If the token is neither a title nor a surname, it is looked up in the lexicon for a direct word-to-word translation.
    • Resolving Ambiguity: If the token is not present in the lexicon for direct translation, it is looked up in the database of ambiguous words. If the token is found to be ambiguous, the ambiguity is resolved with the help of n-gram language modeling. The system uses bigram and trigram databases, which store one and two words respectively from the vicinity of an ambiguous word, together with the corresponding meaning for that particular context.
    • Unknown Words: If all of the above modules fail to analyze the token, it is treated as a foreign or unknown word. Such words first pass through a morphological analysis phase based on the rules for inflections in Hindi words. The morphological generator generates the transliterated word using the inflectional rules and then checks the generated word against the Punjabi unigram database to confirm that it is a valid word. If the generated word is found among the Punjabi unigrams, it is accepted as the translation; otherwise the token is sent to the transliteration module. The transliteration module is a major module of the system and uses rules specifically designed from the translation point of view.
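The order in which a token falls through these modules can be illustrated with a minimal Python sketch. This is not the system's actual code: the function names (route_tokens, disambiguate), the generate callback standing in for the morphological generator, the in-memory sets and dictionaries, and the shape of the n-gram keys are all assumptions made for illustration. The sketch also assumes that title and surname tokens are themselves transliterated, which the description above does not state.

```python
def route_tokens(tokens, titles, surnames, lexicon, ambiguous,
                 punjabi_unigrams, generate):
    """Return, for each token, the name of the module that would handle it.

    All arguments are hypothetical stand-ins for the system's resources;
    `generate` plays the role of the morphological generator.
    """
    # First pass: a title forces transliteration of the NEXT token,
    # a surname forces transliteration of the PREVIOUS token.
    forced = set()
    for i, tok in enumerate(tokens):
        if tok in titles and i + 1 < len(tokens):
            forced.add(i + 1)
        if tok in surnames and i > 0:
            forced.add(i - 1)

    routes = []
    for i, tok in enumerate(tokens):
        # Assumption: titles and surnames are transliterated as well.
        if i in forced or tok in titles or tok in surnames:
            routes.append("transliterate")
        elif tok in lexicon:                        # direct word-to-word match
            routes.append("lexicon")
        elif tok in ambiguous:                      # resolved by n-gram context
            routes.append("disambiguate")
        elif generate(tok) in punjabi_unigrams:     # generated form is a known word
            routes.append("morphology")
        else:                                       # last resort
            routes.append("transliterate")
    return routes


def disambiguate(tokens, i, trigrams, bigrams, default=None):
    """Pick a meaning for the ambiguous token at index i from its context.

    Assumed key shapes: trigrams use (previous, word, next); bigrams use
    (previous, word) or (word, next). The wider context is tried first.
    """
    word = tokens[i]
    prev = tokens[i - 1] if i > 0 else None
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    if (prev, word, nxt) in trigrams:
        return trigrams[(prev, word, nxt)]
    for key in ((prev, word), (word, nxt)):
        if key in bigrams:
            return bigrams[key]
    return default
```

Trying the trigram context before the bigram context is a design choice of this sketch: a two-word context is more specific, so it is given priority when both are available.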


Post Processing 
After all of the source text has been converted to target text, some grammatical errors remain and need to be corrected. For this purpose, we have formulated rules for correcting these errors and implemented them using regular expressions and pattern matching.
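As a rough illustration of rule-based correction with regular expressions, the sketch below applies an ordered list of rewrite rules to the translated text. The rules shown are romanized placeholders invented for this example, not the rules used by the system, and the names RULES and post_process are hypothetical.

```python
import re

# Ordered (pattern, replacement) pairs. Both rules are illustrative
# placeholders in romanized form, not the system's actual grammar rules.
RULES = [
    # Collapse runs of spaces so later rules see single separators.
    (re.compile(r" {2,}"), " "),
    # Placeholder agreement fix: "da" directly before "kitab" becomes "di".
    (re.compile(r"\bda (?=kitab\b)"), "di "),
]


def post_process(text, rules=RULES):
    """Apply each rewrite rule in order and return the corrected text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text
```

The rules run in sequence, so their order matters: here the whitespace rule comes first so that the agreement rule can rely on single spaces between words.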


GUI Features of the System
  • Text translation from Hindi to Punjabi
  • Text transliteration from Hindi to Punjabi
  • Website translation
  • Sending email in Punjabi that was originally written in Hindi

The system has been rigorously evaluated: it scored 94% on the intelligibility test and 90.84% on the accuracy test.

Architecture of Hindi To Punjabi Machine Translation System

The system is freely available to use. Web link to access the Machine Translation System: h2p.learnpunjabi.org
