
Posts

Showing posts from April, 2014

AnaGram-Data Structure

An anagram is a rearrangement of the letters of a word or phrase that uses each letter exactly once. Ideally the anagram relates in some (perhaps humorous) way to the original word or phrase; such anagrams are described as cognate. The best anagrams are grammatically correct and use devices such as abbreviating "and" to 'n' only minimally. I have developed an anagram program in VS 2010 using VC++. The program reads one input string at a time and produces all anagrams of that string. The input string should contain only unique characters, with no repeated ones as in "caat". I developed the program using stack and queue data structures. It will give you a basic idea of anagrams, and you may modify it as per your requirements. #include #include #include #include #define MAX 10 using namespace std; typedef struct { char arr[20]; in...
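The idea of producing every anagram of a string of unique characters can be sketched as follows. This is not the program from the post (whose listing is cut off above); it is a minimal illustration that uses only a `std::queue`, and the function name `anagrams` is an assumption made for this example. Each queue entry holds the letters placed so far and the letters still unused.

```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Returns every arrangement of the characters of `input`.
// Assumes all characters are unique; repeated characters would
// simply produce duplicate entries in the result.
std::vector<std::string> anagrams(const std::string& input) {
    std::vector<std::string> results;
    // Each entry: (letters placed so far, letters not yet used).
    std::queue<std::pair<std::string, std::string>> pending;
    pending.push({"", input});

    while (!pending.empty()) {
        std::pair<std::string, std::string> state = pending.front();
        pending.pop();
        const std::string& prefix = state.first;
        const std::string& rest = state.second;

        if (rest.empty()) {
            // No letters left: the prefix is a complete anagram.
            results.push_back(prefix);
            continue;
        }
        // Extend the prefix with each unused letter in turn.
        for (std::size_t i = 0; i < rest.size(); ++i) {
            std::string remaining = rest;
            remaining.erase(i, 1);
            pending.push({prefix + rest[i], remaining});
        }
    }
    return results;
}
```

Calling `anagrams("cat")` yields six strings, one per ordering of the three letters. An input such as "caat" would yield 24 strings with duplicates among them, which is why the unique-character restriction matters.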

Urdu Stemmer - Rule Based

Stemming is the process of reducing inflected words to their stem or root. Many inflected forms can be reduced to the same stem. For example, in English: 1) "act" has inflected forms such as actor, acted and acting; 2) words like fishing, fished and fisher can be reduced to the root word fish. Similarly, various possibilities have been identified in Urdu and appropriate rules have been developed:
Inflected Word         Root Word
لڑکیاں                 لڑکی
بستیاں                 بستی
گاڑیاں                 گاڑی
کتابیں                 کتاب
میلے   ...
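A rule-based stemmer of this kind can be sketched as an ordered list of suffix rules. The two rules below are assumptions inferred only from the inflected/root pairs listed above; they are not the post's actual rule set, and the names `SuffixRule`, `stem` and `sampleUrduRules` are hypothetical. The sketch also assumes the compiler encodes narrow string literals as UTF-8.

```cpp
#include <string>
#include <vector>

// One rewrite rule: if a word ends with `suffix`, replace it with `replacement`.
struct SuffixRule {
    std::string suffix;
    std::string replacement;
};

bool endsWith(const std::string& word, const std::string& suffix) {
    return word.size() >= suffix.size() &&
           word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Applies the first matching rule; returns the word unchanged if none matches.
// A rule is skipped when the suffix would consume the entire word.
std::string stem(const std::string& word, const std::vector<SuffixRule>& rules) {
    for (const SuffixRule& rule : rules) {
        if (word.size() > rule.suffix.size() && endsWith(word, rule.suffix)) {
            return word.substr(0, word.size() - rule.suffix.size()) + rule.replacement;
        }
    }
    return word;
}

// Illustrative rules only (assumed from the example pairs, not from the post).
// Longer suffixes come first so they are tried before shorter ones.
std::vector<SuffixRule> sampleUrduRules() {
    return {
        {"\u06CC\u0627\u06BA", "\u06CC"},  // suffix yeh + alef + noon ghunna becomes yeh
        {"\u06CC\u06BA", ""},              // suffix yeh + noon ghunna is removed
    };
}
```

With these two rules, `stem` reduces the first four inflected words in the list to their roots and leaves a word with no matching suffix untouched. A real stemmer would need many more rules and an exception list.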

Font identifier and Unicode converter for Hindi

Fonts are used to represent text in documents. Fonts are mainly of two kinds: non-Unicode and Unicode. Complex scripts such as Hindi and other Asian languages are well represented in Unicode fonts. There are other ways to write these languages; for example, we can use ASCII/ISCII codes to represent the characters of Hindi, but the Hindi script has far more characters than English. Therefore, a combination of several ASCII/ISCII-encoded characters is needed to represent a single character of the Hindi script. One major problem with these ASCII-encoding-based fonts is that text cannot easily be transferred from one system to another: the receiving system must have the same fonts installed. There are hundreds of ASCII/ISCII-encoding-based fonts used to write Hindi text. New software systems are based on Unicode fonts. ...

Urdu Named Entity Recognition(NER) / Named Entity Recognition System for Urdu

Named Entity Recognition (NER) is the task of finding person names, location names, brand names, abbreviations, dates, times, etc. in text and classifying them into predefined categories. NER plays a major role in various Natural Language Processing (NLP) fields such as Information Extraction, Machine Translation and Question Answering, so the accurate working of an NER system is very important. We have used the rule-based approach and developed various rules to extract the named entities in a given Urdu text. An NER system can also serve personal needs; for example, a company manager may want to know all the names mentioned in a specific text document. Approaches to NER 1 Rule Based approach: Rules are developed to identify NEs in text. This approach takes a lot of development time and requires good knowledge of the target language. Heuristic-based rules are used to identify tags and these rules are language spec...

Tries Data Structure

·         The trie (pronounced "try" and derived from the word retrieval), also called a prefix tree (as it can be searched by prefixes), for a set of strings S is an ordered tree such that:
o    Each node but the root is labeled with a character
o    The children of a node are alphabetically ordered
·         Each node has up to R children, one for each possible character (R is the size of the alphabet).
·         All the descendants of a node share a common prefix, namely the string associated with that node; the root is associated with the empty string.
·         Values are normally not associated with every node, only with leaves and some inner nodes that correspond to keys of interest.
Example: car, card, carry, cart, cat, cel, celery, close, closely, closet, clue
Applications of Tries ...
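The properties above can be illustrated with a small trie. This is a minimal sketch, not code from the post: the names `TrieNode`, `Trie`, `insert`, `contains` and `keysWithPrefix` are assumptions made for the example, and a `std::map` is used for the children so that they stay alphabetically ordered without reserving R slots per node.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

struct TrieNode {
    bool isKey = false;  // true if the path from the root spells a stored key
    std::map<char, std::unique_ptr<TrieNode>> children;  // ordered by character
};

class Trie {
public:
    // Walks down from the root, creating missing nodes, and marks the last one.
    void insert(const std::string& key) {
        TrieNode* node = &root_;
        for (char c : key) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) child.reset(new TrieNode());
            node = child.get();
        }
        node->isKey = true;
    }

    // True only if `key` itself was inserted (a mere prefix does not count).
    bool contains(const std::string& key) const {
        const TrieNode* node = find(key);
        return node != nullptr && node->isKey;
    }

    // Returns all stored keys starting with `prefix`, in alphabetical order.
    std::vector<std::string> keysWithPrefix(const std::string& prefix) const {
        std::vector<std::string> out;
        const TrieNode* node = find(prefix);
        if (node != nullptr) {
            std::string current = prefix;
            collect(node, current, out);
        }
        return out;
    }

private:
    // Follows the characters of `s`; returns nullptr if the path does not exist.
    const TrieNode* find(const std::string& s) const {
        const TrieNode* node = &root_;
        for (char c : s) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node;
    }

    // Depth-first walk that records every key at or below `node`.
    static void collect(const TrieNode* node, std::string& current,
                        std::vector<std::string>& out) {
        if (node->isKey) out.push_back(current);
        for (const auto& entry : node->children) {
            current.push_back(entry.first);
            collect(entry.second.get(), current, out);
            current.pop_back();
        }
    }

    TrieNode root_;  // associated with the empty string
};
```

After inserting the example words, `keysWithPrefix("car")` returns car, card, carry and cart, while `contains("ca")` is false because "ca" is only an inner node and not a key of interest.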