The Road Less Taken

Computational Linguistics


Personal assistants like Siri, S Voice and Google Now, along with tools like Google Translate and sentiment analysers, are blurring the line between human and machine as machines learn to handle the intricacies of human language more intelligently than ever.

Natural languages are the languages human beings use to communicate with each other. A natural language is one that has developed through use rather than by prescription, and that a cognitively normal human infant is able to learn. Computational linguistics (CL) is a discipline between linguistics and computer science that is concerned with the computational aspects of the natural language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science that aims to build computational models of human cognition.

Computational linguists build systems that perform tasks such as speech recognition (e.g., Siri), speech synthesis, machine translation (e.g., Google Translate), grammar checking and text mining.

Computational linguistics originated with efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Translating between human languages requires understanding the grammar of both languages, including both morphology (the grammar of word forms) and syntax (the grammar of sentence structure). To understand syntax, one also has to understand the semantics and the lexicon (or 'vocabulary'), and even something of the pragmatics of language use.

This leads to one of the most significant problems in processing natural language: ambiguity. For instance, in “I saw the man in the park with the telescope.”, it is unclear whether I, the man, or the park has the telescope. If a fire inspector tells you, “There's a pile of inflammable trash next to your car. You are going to have to get rid of it.”, whether you interpret the word 'it' as referring to the pile of trash or to the car makes a dramatic difference to the action you take. Ambiguities like these are pervasive in spoken utterances and written texts. Most of them escape our notice because we are very good at resolving them using our knowledge of the world and of the context. Computer systems, however, have little knowledge of the world and make poor use of context. This is where Natural Language Processing (NLP) researchers come in: they design algorithms that parse human language expressions and extract anaphoric relations (which earlier expression a word such as 'it' refers to) using the lexicon of the language. These algorithms in turn lay the foundation for building speech recognition and sentiment analysis engines.
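To make the telescope example concrete, here is a minimal Python sketch that lists the structurally possible readings of that sentence. It is an illustration only, not how any particular parser works: the function name `attachments`, the hand-written list of heads and the "right frontier" rule (a phrase may only attach to a head that is still open on the right edge of the sentence) are all assumptions made for this example.

```python
def attachments(frontier, pps):
    """Yield every non-crossing way to attach each prepositional phrase (PP).

    frontier: heads still open for attachment, left to right (hypothetical input).
    pps: list of (preposition, noun phrase) pairs, in sentence order.
    """
    if not pps:
        yield []
        return
    (prep, noun), rest = pps[0], pps[1:]
    for k, head in enumerate(frontier):
        # Attaching to frontier[k] closes off every head to its right;
        # the PP's own noun becomes the newest open head.
        new_frontier = frontier[:k + 1] + [noun]
        for tail in attachments(new_frontier, rest):
            yield [(f"{prep} {noun}", head)] + tail


# "I saw the man in the park with the telescope."
HEADS = ["saw", "the man"]
PPS = [("in", "the park"), ("with", "the telescope")]
READINGS = list(attachments(HEADS, PPS))

for number, reading in enumerate(READINGS, start=1):
    description = "; ".join(f"'{pp}' modifies '{head}'" for pp, head in reading)
    print(f"Reading {number}: {description}")
```

Even this toy enumeration yields five distinct structures for a ten-word sentence, including the three telescope readings mentioned above (attached to 'saw', 'the man' or 'the park'). A practical system would still need world knowledge or statistics to choose among them.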

Mechanisms for comprehending natural language are modelled on a computer by implementing algorithms proposed by academia, using tools such as ScalaNLP, Snowball, OpenNLP and the Stanford Parser. The choice of tool depends on the programming language in use, which is usually one of C++, Python or Java.

In India, institutes offer CL courses either under the Department of Linguistics or under the Department of Computer Science. Courses such as the Post M.A. Diploma in Linguistics, the Advanced Diploma in Applied Linguistics, the M.Phil. in Linguistics and the M.S. in Computational Linguistics prepare the student for R&D jobs and higher research in the field. The University of Delhi (DU), the International Institute of Information Technology, Hyderabad (IIIT-H) and Jawaharlal Nehru University, Delhi (JNU) are a few of the schools offering such specialisation courses in India. Abroad, almost all the major universities offer a minor and a major in cognitive science; the University of Texas at Austin, Johns Hopkins University, the University of California, Berkeley and the University of South Florida are a few notable ones.

In industry, companies seeking NLP experts require linguists to have skills such as knowledge of one or more foreign languages, platform-relevant scripting or programming skills, familiarity with speech recognition and prior experience in a similar implementation. Graduates or postgraduates with experience in NLP, data mining, text analytics, information retrieval and unstructured data are hired as Natural Language Processing Analysts. In academia, an understanding of language modeling, machine learning and phrase structure parsing serves as a starting point for pursuing research and designing more accurate NLP algorithms, preferably ones that include some machine learning component so that the system improves with use.

Frederick Jelinek is a notable pioneer in the field, known for his “Probabilistic Information Theory: Discrete and Memoryless Models” and for his leadership role in IBM’s effort to solve the general dictation problem during the 1970s. Sanjeev Menon, CTO of the trending app Light, an Indian rival to Google Now, developed its answering engine, which is built on NLP, machine learning and man-machine hybrid technologies. Adam Cheyer is a co-founder of Siri and formerly a director of engineering in the iPhone group at Apple. Prior to Siri, he was a computer scientist and project director in SRI International's Artificial Intelligence Center, where he was the Chief Architect on the CALO project.

[Image: Siri vs Cortana. How Siri responds when asked about Cortana.]

In recent years, the demand for computational linguists has risen with the increase in language technology products on the Internet. Job offers come from developers who are improving Internet search engines with linguistic means, easing the user interface with lingubots, or integrating speech recognition with language processing techniques. There are strong open-source development projects as well. Companies like Microsoft, IBM, Google, Cycorp, Comverse, LingSoft, Sony and Samsung, along with many research labs under universities and governments, hire computational linguists to work on various aspects of speech technology: corpus development, language modeling, scripting and programming, phonetic transcription, grammar checking, and development of lexical resources. These companies may also hire linguists to localize products for sale in other countries. The salary offered to freshers is in line with industry standards (16-24 LPA, i.e. lakh rupees per annum). Applicants with a Master's degree are eligible for R&D positions and, with more experience, for roles as project leaders and managers. Though fairly young, this field is expanding fast, both in research and in prototyping. It has the potential to become the next big thing, seamlessly integrated into our lives as a virtual companion and assistant in the years to come.