Structuring Arabic lexical and morphological resources using TEI: theory and practice

Angelo Mario Del Grosso; Ouafae Nahli

doi:10.57675/IMIST.PRSM/ijist-v5i3.191

Angelo Mario Del Grosso Institute for Computational Linguistics "A. Zampolli" ILC-CNR (Via Moruzzi 1, Pisa) http://orcid.org/0000-0002-4867-6304
Ouafae Nahli

Abstract

An Arabic word can be described in terms of its lexical and its morphological information. Lexical information, conveyed by the root, comprises both semantic meaning and syntactic properties (e.g. part of speech). Morphological information, encoded by patterns, serves to group words that share similar syntactic, inflectional and semantic behaviour. Lexical analysis and morphological analysis have been described separately since the very first studies of the Arabic language. Although several scholarly works present Arabic lexicon models that encode semantic meanings, a systematic description of word patterns is still largely lacking. In this work, we design an exhaustive resource consisting of two levels: lexical and morphological. The lexical level collects information extracted from the dictionary al=qāmūs al=muḥīṭ. The morphological level formalizes the patterns, which makes it possible to enrich word descriptions with additional semantic, morphosyntactic and inflectional information. To build our digital resource while taking into account the primary source, lexical requirements, and reusability, we followed the guidelines provided by the Text Encoding Initiative (TEI). We adopted the TEI module devoted to encoding digital dictionaries and lexicons to formally represent the medieval al=qāmūs al=muḥīṭ dictionary. Given the complexity of describing the morphological information carried by the patterns, we also used the TEI module devoted to encoding feature structures. With the resulting model, we can build an exhaustive resource composed of two components: the lexical block and the morphological block. These two components are distinct but complementary resources, in which lexical data is connected to morphological information. In addition, the morphological resource can be used as a stand-alone tool, allowing morphological analyzers to capture aspects of meaning that are not captured by current systems.
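
The two-block design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' encoding. The element names (entry, form, orth, gramGrp, gram, fs, f, string, symbol) come from the TEI dictionary and feature-structure modules, but the attribute values, the identifiers, the feature names, and the use of a corresp pointer to link an entry to its pattern are assumptions made for illustration. The sample word kātib ("writer", root k-t-b, pattern fāʿil) is a standard textbook example and is not taken from the article.

```python
import xml.etree.ElementTree as ET

# Clark notation for xml:id; ElementTree serializes it as xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build_pattern(pattern_id, template, pos, meaning):
    """Morphological block: one pattern as a TEI feature structure.

    The feature names (template, pos, meaning) are hypothetical.
    """
    fs = ET.Element("fs", {XML_ID: pattern_id, "type": "pattern"})
    f = ET.SubElement(fs, "f", {"name": "template"})
    ET.SubElement(f, "string").text = template
    f = ET.SubElement(fs, "f", {"name": "pos"})
    ET.SubElement(f, "symbol", {"value": pos})
    f = ET.SubElement(fs, "f", {"name": "meaning"})
    ET.SubElement(f, "string").text = meaning
    return fs


def build_entry(entry_id, root, lemma, pattern_id):
    """Lexical block: one dictionary entry pointing at its pattern.

    Linking through corresp on a gram element is an assumption.
    """
    entry = ET.Element("entry", {XML_ID: entry_id})
    root_form = ET.SubElement(entry, "form", {"type": "root"})
    ET.SubElement(root_form, "orth").text = root
    lemma_form = ET.SubElement(entry, "form", {"type": "lemma"})
    ET.SubElement(lemma_form, "orth").text = lemma
    gram_grp = ET.SubElement(entry, "gramGrp")
    ET.SubElement(gram_grp, "gram",
                  {"type": "pattern", "corresp": "#" + pattern_id})
    return entry


def pattern_features(entry, patterns):
    """Follow the entry's pointer and return the pattern's features."""
    ref = entry.find("gramGrp/gram[@type='pattern']").get("corresp")
    fs = patterns[ref.lstrip("#")]
    features = {}
    for f in fs.findall("f"):
        value = f[0]  # each feature holds exactly one value element here
        if value.tag == "symbol":
            features[f.get("name")] = value.get("value")
        else:
            features[f.get("name")] = value.text
    return features


pattern = build_pattern("pattern.faacil", "fāʿil",
                        "activeParticiple", "agent of the action")
entry = build_entry("entry.katib", "ktb", "kātib", "pattern.faacil")

print(ET.tostring(entry, encoding="unicode"))
print(ET.tostring(pattern, encoding="unicode"))
print(pattern_features(entry, {"pattern.faacil": pattern}))
```

Keeping the pattern in its own feature structure means that many entries can point at the same description, and that the morphological block can be loaded without the dictionary, which is the stand-alone use the abstract mentions.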

Published

Jan 3, 2022

How to Cite

DEL GROSSO, Angelo Mario; NAHLI, Ouafae. Structuring Arabic lexical and morphological resources using TEI: theory and practice. International Journal of Information Science and Technology, [S.l.], v. 5, n. 3, p. 3 - 14, jan. 2022. ISSN 2550-5114. Available at: <https://innove.org/ijist/index.php/ijist/article/view/191>. Date accessed: 25 june 2026. doi: http://dx.doi.org/10.57675/IMIST.PRSM/ijist-v5i3.191.

Issue

Vol 5 No 3 (2021)

Section

Research Challenges in Digitalization and Societal Transformation

The submitting author warrants that the submission is original and that she/he is the author of the submission together with the named co-authors; to the extent the submission incorporates text passages, figures, data or other material from the work of others, the submitting author has obtained any necessary permission.

Articles in this journal are published under the Creative Commons Attribution Licence (CC-BY). This licence gives readers legal certainty about what they may do with published articles, which supports wider dissemination and archiving and in turn makes publishing with this journal more valuable for you, the authors.

In order for iJIST to publish and disseminate research articles, we need publishing rights. This is determined by a publishing agreement between the author and iJIST.

By submitting an article, the author grants this journal the non-exclusive right to publish it. The author retains the copyright and the publishing rights for the article without any restrictions.
