A Comparative Study of Machine Learning Algorithms using Feature Selection Methods for Movie Review Analysis

Rajwinder Kaur*, Prince Verma**
* M.Tech Scholar, Department of Computer Science and Engineering, CT Institute of Engineering Management and Technology, Punjab, India.
** Assistant Professor, Department of Computer Science and Engineering, CT Institute of Engineering Management and Technology, Punjab, India.
Periodicity: January - March 2017
DOI: https://doi.org/10.26634/jse.11.3.13620


Nowadays, the analysis of social sites such as movie review sites, Facebook, news feeds, and online shopping sites has become a broad area of research. Customers post a large number of reviews in the form of comments to express their feelings and opinions, whether positive, negative, or neutral, about a particular movie, product, picture, etc. Predicting the opinions expressed in user reviews on such websites is a complex decision-making process. These sites help people make decisions about products. This paper proposes a Random Forest classifier with an Information Gain based feature selection method for the classification of movie review datasets. The results show that the Information Gain method with the Random Forest classifier has better performance in terms of Accuracy, Precision, and Recall.
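As an illustration of the feature selection step named above, the following is a minimal sketch of Information Gain scoring for binary term-presence features. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `top_k_terms`), the toy reviews, and the choice of k are all hypothetical and assumed for the example. In a full pipeline, the selected terms would then form the input features of a Random Forest classifier.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information Gain of the binary feature 'term occurs in the review'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    total = len(labels)
    # Expected entropy of the labels after splitting on term presence.
    conditional = ((len(present) / total) * entropy(present)
                   + (len(absent) / total) * entropy(absent))
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Return the k terms with the highest Information Gain."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab,
                  key=lambda t: information_gain(docs, labels, t),
                  reverse=True)[:k]


# Hypothetical toy data: each review is a set of tokens with a polarity label.
docs = [
    {"great", "plot"},
    {"great", "acting"},
    {"boring", "plot"},
    {"boring", "acting"},
]
labels = ["pos", "pos", "neg", "neg"]

selected = top_k_terms(docs, labels, 2)
print(selected)
```

In this toy set, "great" and "boring" separate the two classes perfectly, so they score the maximum gain of 1 bit, while "plot" and "acting" appear equally in both classes and score 0.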


Sentiment Analysis, Feature Selection, SVM, Random Forest, Evaluation Measures.

How to Cite this Article?

Kaur, R., and Verma, P. (2017). A Comparative Study of Machine Learning Algorithms using Feature Selection Methods for Movie Review Analysis. i-manager’s Journal on Software Engineering, 11(3), 1-9. https://doi.org/10.26634/jse.11.3.13620


