Towards a New Lexicon-Based Features Vector for Sentiment Analysis: Application to Moroccan Arabic Tweets

The International Conference on Information, Communication & Cybersecurity

Published on November 10, 2021 by Moncef Garouani and Jamal Kharroubi

DOI: 10.1007/978-3-030-91738-8_7

Abstract

The emergence of Web 2.0 technology has generated a huge amount of raw data by enabling Internet users to post their opinions and reviews on the web. This data plays an important role in decision making for many people and organizations. One example of the valuable insights that can be extracted from users' posts is their opinions and sentiments regarding topics, events, services, products, etc. The English language has been the subject of extensive research on sentiment analysis. The proposed solutions are largely dominated by two main approaches: machine learning techniques and the lexical approach. This work focuses on the second one to analyze the sentiments expressed in Moroccan tweets written in Arabic, both Standard Arabic (SA) and Moroccan Dialect (MD), and proposes a new method for extracting features and representing data. The main idea of this method is to represent the text as a weight vector of sentiments. Due to the lack of resources (databases and lexicon dictionaries) for the Arabic language, especially for the Moroccan dialect, this work starts with the construction of a corpus of 18,000 valid tweets, drawn from 36,114 collected tweets, that are manually tagged and classified as MD or SA. It then describes the steps of the construction of the Moroccan Senti-lexicon, a dictionary of 30,000 words labeled as positive, negative, or neutral. The results of this study prove to be superior to those obtained by other comparable state-of-the-art approaches.
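
The idea of representing a text as a weight vector of sentiments can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so the version below simply assumes that each weight is the share of matched lexicon words carrying a given label. The lexicon `TOY_LEXICON`, its English placeholder entries, and the function `sentiment_weight_vector` are hypothetical names introduced for illustration only; they are not the Moroccan Senti-lexicon or the authors' feature extraction method.

```python
from collections import Counter

# Hypothetical toy lexicon (assumption): English placeholders standing in
# for entries of a sentiment lexicon labeled positive/negative/neutral.
TOY_LEXICON = {
    "good": "positive",
    "great": "positive",
    "bad": "negative",
    "slow": "negative",
    "today": "neutral",
}

LABELS = ("positive", "negative", "neutral")


def sentiment_weight_vector(text, lexicon=TOY_LEXICON):
    """Return (positive, negative, neutral) shares of the lexicon words in text.

    Assumed weighting: count of matched words per label divided by the
    total number of matched words. Texts with no match map to all zeros.
    """
    tokens = text.lower().split()  # naive whitespace tokenization
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(counts[label] / total for label in LABELS)


vector = sentiment_weight_vector("The service was good but slow today")
print(vector)
```

A vector of this kind could then be passed to a classifier or compared against thresholds. Real tweets would need Arabic-aware tokenization and normalization, which this sketch leaves out.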

Citation

@incollection{Garouani2022,
  doi = {10.1007/978-3-030-91738-8_7},
  url = {https://doi.org/10.1007/978-3-030-91738-8_7},
  year = {2022},
  publisher = {Springer International Publishing},
  pages = {67--76},
  author = {Moncef Garouani and Jamal Kharroubi},
  title = {Towards a~New Lexicon-Based Features Vector for~Sentiment Analysis: Application to~Moroccan Arabic Tweets},
  booktitle = {Advances in Information, Communication and Cybersecurity}
}