WHAT IS AI AND WHAT CAN IT DO

Artificial Intelligence means different things to different people, but it is all about making machines intelligent. Recommendation systems on e-commerce sites, translation systems from Google, and self-driving cars all run on some form of AI, either partially or entirely. For us, it is a means to understand language, specifically through machine learning methods for natural languages.

Speech to Text

It can convert speech to text so that you can record your voice and send a text message or write an essay.

Text to Speech

It can generate speech from text so that you can listen to your favorite books while traveling on a bus.

Optical Character Recognition

It can look at an image and convert the text it contains into editable text.

Translation

It can translate between a number of languages.

AI tries to solve various kinds of problems. What we are doing is just one piece of it, and it is called NLP.

What people are saying

Data is the new oil.

- Clive Humby

AI is the new electricity.

- Andrew Ng

WHAT IS NLP

Natural Language Processing (NLP) is the field of AI that deals with systems that have to process or respond to languages used by humans. Simply put, NLP is how we make computers understand language and interact with us in natural language.

But language is a complex beast. Nobody fully understands how we humans understand language. How can we make computers understand it when we do not know how we do it ourselves?

To work around this paradox and make progress in baby steps, NLP researchers have come up with a list of simpler tasks to solve before we solve language in general. Text classification, translation, and named entity recognition are a few examples.

WHAT IS MACHINE LEARNING

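Imagine a playground where children are playing a game that you have never played or seen. Just by observing them for a while, you can understand the rules of the game. Machine learning works the same way: instead of being told the rules, a system works them out from the examples it observes.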

WHY DATA

Machine learning research and its applications have become a vital part of our everyday life. These systems are not programmed the way software used to be. They are trained on data: they learn on their own by reading large amounts of it.
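To make the difference between programming and training concrete, here is a minimal sketch in Python. The example sentences, the labels, and the function names train and classify are all invented for illustration; this is a toy word-counting approach, not how any production system works. Nobody writes a rule such as "goal means sports"; the program picks that up from the labelled examples.

```python
from collections import Counter

# Hypothetical labelled examples, invented for illustration only.
training_data = [
    ("the team won the match by six wickets", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the minister announced a new budget", "politics"),
    ("parliament passed the election bill", "politics"),
]

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    for text, label in examples:
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts

def classify(word_counts, text):
    """Pick the label whose training examples share the most words with `text`."""
    words = text.split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)

model = train(training_data)
print(classify(model, "the captain scored a goal"))
```

With more and better examples the same code makes better guesses, which is why data matters so much.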

WHY OPEN DATA

Existing datasets for Indian languages are either not available to the public or very tedious to obtain. They sit behind the walled gardens of corporations and universities. Even the few who gain access to a dataset are not allowed to share it with fellow researchers. We aim to collect datasets that focus on regional Indian languages and keep them in the public domain for everyone, forever. There are hundreds of regional languages and numerous dialects in India. The availability of datasets in the open domain encourages and helps you, me, and everyone to learn and build AI applications. It lets everyone focus on their actual work by easing access to data and preventing redundant data collection.

WHERE ARE WE RIGHT NOW

Our project, the vaaku2vec model, reads news articles again and again and tries to work out the sense of the words. It does this not just by reading the words, but by learning the order in which they appear, which words appear together, and other patterns: the rules of how words can be strung together. Humans understand these rules of language intuitively. For example, in the sentence

"I live in Tambaram and _______ to Adambakkam for work"

the blank can be filled in by anyone with a word such as commute, bike, or drive. Such a task would be Herculean for conventional systems and algorithms.
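The idea of learning from which words appear together can be illustrated with a small Python sketch. It is not the vaaku2vec model; it is a toy that only counts which word was seen between two neighbouring words. The corpus and the names train and fill_blank are made up for this example.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration; a real model reads far more text.
corpus = [
    "i live in tambaram and commute to adambakkam for work",
    "i live in guindy and drive to adyar for work",
    "we live in velachery and commute to guindy for work",
    "they live in adyar and bike to tambaram for work",
    "i live in porur and commute to guindy for college",
]

def train(sentences):
    """Count which word appears between each (previous, next) pair of words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, middle, nxt in zip(words, words[1:], words[2:]):
            counts[(prev, nxt)][middle] += 1
    return counts

def fill_blank(counts, prev, nxt, top_n=3):
    """Return the words most often seen between `prev` and `nxt`."""
    return [word for word, _ in counts[(prev, nxt)].most_common(top_n)]

model = train(corpus)
# The blank in "... and _______ to ..." sits between "and" and "to".
suggestions = fill_blank(model, "and", "to")
print(suggestions)
```

Even this crude counter suggests sensible words for the blank, because it has seen similar sentences before. A real language model goes much further by looking at longer contexts and generalising to sentences it has never seen.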

The Team

IndicNLP

AI for Indian languages