Challenges in Sentiment Analysis of Marathi Text

Asmita Dhokrat*  C. Namrata Mahender**
*-**Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, MS, India.

Abstract

Social media has grown tremendously over the last few years, and people use social networking sites such as Facebook, Instagram, and Twitter to share their opinions and emotions on social issues such as the CAA, the Delhi rape case, and elections. To express their views they communicate in their native language, which is why a lot of data is available for languages such as Hindi, Marathi, Tamil, and Telugu. A lot of work has been done on many Indian languages, but not on Marathi. This paper therefore discusses Marathi sentiment analysis and the challenges of data collection for it.

Keywords :

Introduction

Sentiment analysis, or opinion mining, refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract sentiments from text. It determines the attitude, judgment, or emotion of a speaker or writer toward a given topic. Sentiment analysis helps in recognizing the emotional tone of a statement, identifies its subject, and generally measures it in terms of polarity, i.e. positive, negative, or neutral.

Sentiment analysis is “to find the human behavior, what human can think, what is his reaction, the way of expression, emotions, feelings on any general topic”.

Because sentiment analysis finds people's opinions, it is also called Opinion Mining. Opinion Mining (OM) is an area of Text Mining that has recently received a lot of attention due to the amount of opinion information that resides in web documents. It concerns the identification of opinions in a text and their classification as positive, negative, or neutral. Opinion identification is more difficult than topic-based identification, and it cannot be based on just observing the presence of single words. More sophisticated methods need to be employed in order to differentiate between the subjective and objective opinion of a reviewer, or between the objective description of a movie and references to other people's comments (Chauchat et al., 2008).

1. Data Collection for Sentiment Analysis

For sentiment analysis, data is mostly collected through online or offline media, and its polarity is then calculated, which helps in finding the sentiment on a topic or concept.

There are two methods of data collection, online (social media) and offline, which are briefly discussed below.

1.1 Online

Social networking sites such as Facebook, Twitter, WhatsApp, and Instagram are available for data collection. People use them to chat and to share their emotions, updates, pictures, etc. They are also used to find contacts, to present views, to read reviews of a particular product before buying, to express feelings, opinions, and emotions on regional and national topics such as the CAA, the Disha case, the Nirbhaya case, the PMO, and the Surgical Strike, and to comment on videos or other social content.

For all these reasons, a huge amount of online data is available for finding the sentiment or opinion of people, which can be useful in many areas such as product marketing, product reviews, political influence, trend analysis, and many more.

1.2 Offline

For offline data, we can prepare general objective-type questions on current burning issues and ask people to answer them. With the help of this questionnaire, we can find the sentiment or opinion of people.

2. Previous Work on SA for Indian Languages

Fourteen earlier papers were studied to understand sentiment analysis for Indian languages. Table 1 presents this work on sentiment analysis for Indian languages.

Table 1. Work on SA for Indian Languages

3. Need for Sentiment Analysis for Indian Languages

We have noticed that people prefer to communicate in their native language, which is why a lot of language data is available for languages such as Hindi, Bengali, Malayalam, Telugu, and Kannada. This paper addresses sentiment analysis for non-English languages, especially Marathi.

In Maharashtra, people speak Marathi, and online discussion in the state takes place in Marathi. Marathi speakers living in other states also use the language. Marathi speakers take pride in speaking Marathi, which is why a lot of Marathi data is available online for research.

However, a review of past work shows that little work has been done on Marathi, and hence Marathi has been chosen for this work.

Sentiment analysis for Marathi is at a preliminary stage, and our focus is on extending research in Marathi. This paper therefore discusses the process of sentiment analysis on a Marathi corpus.

4. Proposed System

After data collection, preprocessing is done: the data is cleaned by removing special symbols, extra spaces, extra dots, and some garbage values. Figure 1 shows the flow of our proposed system.
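
The preprocessing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `preprocess`, the choice of which characters count as "special symbols", and the decision to keep basic sentence punctuation are all assumptions.

```python
import re

# Assumption: keep Devanagari (U+0900-U+097F, which includes the danda),
# the zero-width joiners used inside some Marathi conjuncts, ASCII letters
# and digits, whitespace, and basic sentence punctuation. Everything else
# is treated as a special symbol or garbage value.
_SYMBOLS = re.compile(r"[^\u0900-\u097F\u200C\u200D0-9A-Za-z\s.,?!]")
_EXTRA_DOTS = re.compile(r"\.{2,}")   # runs of two or more dots
_EXTRA_SPACE = re.compile(r"\s+")     # runs of whitespace


def preprocess(text: str) -> str:
    """Remove special symbols, extra dots, and extra spaces from a post."""
    text = _SYMBOLS.sub(" ", text)
    text = _EXTRA_DOTS.sub(".", text)
    text = _EXTRA_SPACE.sub(" ", text)
    return text.strip()
```

For example, `preprocess("हा   निर्णय चांगला आहे....  #CAA @@")` returns `"हा निर्णय चांगला आहे. CAA"`.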

Figure 1. Work on SA for Indian Languages

4.1 Data Collection

At present, no standard corpus is available for Marathi. The first stage is therefore online data collection from posts on specific domains (burning issues), such as the Nirbhaya and Kopardi cases, on which 200 posts were collected. On the CAA, 110 posts were collected, and on the PMO (Narendra Modi and elections), 100s of posts were collected. All posts were in Marathi.

4.2 Data Cleaning

In this step we remove unwanted characters, break attached words, correct spelling and grammar mistakes, and remove human expressions such as lol, hahaha, mu, etc.
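
One part of this step, removing chat expressions, might look like the sketch below. The pattern covers only Latin-script laughter tokens such as "lol" and "hahaha"; the pattern itself and the name `remove_expressions` are assumptions, and a real list would be built from the collected posts. Splitting attached words and correcting spelling or grammar are not covered here.

```python
import re

# Hypothetical pattern for chat expressions: "lol", "loll", "haha",
# "hahaha", "hahah", "hehe", and so on.
_EXPRESSION = re.compile(r"(?:lol+|(?:ha){2,}h?|(?:he){2,})", re.IGNORECASE)


def remove_expressions(text: str) -> str:
    """Drop whitespace-separated tokens that are only a chat expression."""
    kept = []
    for token in text.split():
        core = token.strip(".,!?")  # ignore punctuation stuck to the token
        if core and _EXPRESSION.fullmatch(core):
            continue
        kept.append(token)
    return " ".join(kept)
```

Matching whole tokens rather than substrings avoids deleting letters from inside ordinary words that happen to contain "ha" or "he".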

4.3 Feature Extraction

Here we remove stop words, punctuation marks, etc., and extract the exact words from the data for processing.
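
A minimal sketch of this step is given below. The stop-word set is a tiny illustrative sample of common Marathi function words, not a list taken from the paper, and `extract_words` is a hypothetical name.

```python
import string

# Assumption: a small sample of Marathi function words used as stop words.
STOP_WORDS = {"आहे", "आणि", "हा", "ही", "हे", "व", "पण"}

# ASCII punctuation plus the Devanagari danda.
PUNCTUATION = string.punctuation + "।"


def extract_words(text: str) -> list[str]:
    """Split on whitespace, strip punctuation, and drop stop words."""
    words = []
    for token in text.split():
        word = token.strip(PUNCTUATION)
        if word and word not in STOP_WORDS:
            words.append(word)
    return words
```

For example, `extract_words("हा निर्णय चांगला आहे.")` returns `["निर्णय", "चांगला"]`.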

4.4 POS Tagging

A tag is assigned to each word in a text corpus to indicate its part of speech and other grammatical categories such as tense and number.
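
The paper does not say which tagger is used. As a deliberately simple stand-in, the sketch below tags words by dictionary lookup; the lexicon entries, the tag names (loosely modelled on Universal POS tags), and the `UNK` fallback are all illustrative assumptions.

```python
# Hypothetical hand-built lexicon mapping Marathi words to POS tags.
POS_LEXICON = {
    "हा": "DET",       # "this"
    "निर्णय": "NOUN",   # "decision"
    "चांगला": "ADJ",    # "good"
    "आहे": "AUX",      # "is"
}


def pos_tag(words: list[str]) -> list[tuple[str, str]]:
    """Pair each word with its tag, or "UNK" if it is not in the lexicon."""
    return [(word, POS_LEXICON.get(word, "UNK")) for word in words]
```

A lookup tagger cannot resolve words whose part of speech depends on context, so a practical system would need a trained tagger or rules on top of it.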

4.5 Sentiment Classification

Opinions in the text are identified and labelled as positive, negative, or neutral, based on the data collected from posts.
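
One simple way to produce such labels is a lexicon-based count, sketched below. This is an assumption for illustration only; the paper does not specify the classification method. The word lists are tiny samples and `classify` is a hypothetical name.

```python
# Assumption: small sample lexicons of positive and negative Marathi words.
POSITIVE = {"चांगला", "छान", "उत्तम"}    # good, nice, excellent
NEGATIVE = {"वाईट", "चुकीचा", "भयानक"}   # bad, wrong, terrible


def classify(words: list[str]) -> str:
    """Label a list of words by counting positive and negative matches."""
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A count of this kind ignores negation and context, which is one reason single-word presence alone is not a reliable basis for opinion identification.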

4.6 Identify Polarity

Each sentence is identified on the basis of its polarity, such as positive or negative.

4.7 Result

Finally, the results of the study are produced.

5. Challenges in Marathi Sentiment Analysis

The following are some of the challenges we faced while collecting posts for our dataset.

Conclusion

This paper presents the work done on sentiment analysis of various Indian languages. According to previous studies, a lot of work has been done in languages such as Hindi, Telugu, and Malayalam, but not in Marathi. On social media, people communicate in their native languages, so a huge amount of data is available in various languages. Marathi speakers use Marathi online to share their views, opinions, and comments, so a large amount of data is available in Marathi as well, yet very little work has been done on it. There is therefore a lot of scope for work on Marathi. In this paper we discussed the proposed model for our research work and briefly explained its steps. We faced many challenges while collecting the data; in future we will try to overcome these challenges and enlarge our dataset.

References

[1]. Chauchat, A. S. J. H., Eric, L., Lumière, U., & Mendèsfrance, P. (2008). Opinion mining issues and agreement identification in forum texts. In Workshop, Data Mining Opinions (FODOP'08) Conjunction with the conference INFORSID (pp. 51-58).
[2]. Chaudhari, C. V., Khaire, A. V., Murtadak, R. R., & Sirsulla, K. S. (2017). Sentiment Analysis in Marathi using Marathi WordNet. Imp J Interdiscip Res, 3(4), 1253-1256.
[3]. Deshmukh, S., Patil, N., Rotiwar, S., & Nunes, S. (2017). Sentiment Analysis of Marathi Language. International Journal of Research Publications in Engineering and Technology, 3(6), 93-97.
[4]. Garapati, A., Bora, N., Balla, H., & Sai, M. (2019). SentiPhraseNet: An extended SentiWordNet approach for Telugu sentiment analysis. International Journal of Advance Research, Ideas and Innovations in Technology, 5(2), 433-436.
[5]. Ghosal, T., Das, S. K., & Bhattacharjee, S. (2015, December). Sentiment analysis on (bengali horoscope) corpus. In 2015, Annual IEEE India Conference (INDICON) (pp. 1-6). IEEE. https://doi.org/10.1109/INDICON.2015.7443551
[6]. Hasan, K. A., & Rahman, M. (2014, December). Sentiment detection from bangla text using contextual valency analysis. In 2014, 17th International Conference on Computer and Information Technology (ICCIT) (pp. 292-295). IEEE. https://doi.org/10.1109/ICCITechn.2014.7073151
[7]. Hegde, Y., & Padma, S. K. (2015, June). Sentiment analysis for Kannada using mobile product reviews: a case study. In 2015, IEEE International Advance Computing Conference (IACC) (pp. 822-827). IEEE. https://doi.org/10.1109/IADCC.2015.7154821
[8]. Jha, V., Manjunath, N., Shenoy, P. D., Venugopal, K. R., & Patnaik, L. M. (2015, July). Homs: Hindi opinion mining system. In 2015, IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS) (pp. 366-371). IEEE. https://doi.org/10.1109/ReTIS.2015.7232906
[9]. Mukku, S. S., & Mamidi, R. (2017, September). Actsa: Annotated corpus for telugu sentiment analysis. In Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems (pp. 54-58).
[10]. Mukku, S. S., Choudhary, N., & Mamidi, R. (2016). Enhanced Sentiment Classification of Telugu Text using ML Techniques. In 25th International Joint Conference on Artificial Intelligence, 29-34.
[11]. Nair, D. S., Jayan, J. P., Rajeev, R. R., & Sherly, E. (2014, September). SentiMa-sentiment extraction for Malayalam. In 2014, International Conference on Advances in Computing, Communications and Informatics (ICACCI) (pp. 1719-1723). IEEE. https://doi.org/10.1109/ICACCI.2014.6968548
[12]. Parupalli, S., Rao, V. A., & Mamidi, R. (2018). BCSAT: A benchmark corpus for sentiment analysis in telugu using word-level annotations. In 56th Annual Meeting of the Association for Computational Linguistics, ACL.
[13]. Pawar, S. V., & Mali, S. (2017). Sentiment Analysis in Marathi Language. International Journal on Recent and Innovation Trends in Computing and Communication, 5(8), 21-25.
[14]. Pratibha, G., Hegde, N., Reddy, Ch. A., & Maneesh, D. (2017). Parsing Sentiment in Telugu Language Sentences. International Journal of Creative Research Thoughts (IJCRT), 99-102.
[15]. Sitaram, D., Murthy, S., Ray, D., Sharma, D., & Dhar, K. (2015, July). Sentiment analysis of mixed language employing Hindi-English code switching. In 2015, International Conference on Machine Learning and Cybernetics (ICMLC) (Vol. 1, pp. 271-276). IEEE. https://doi.org/10.1109/ICMLC.2015.7340934