A knowledge-based approach for keywords modeling into a semantic graph

By Oumayma Chergui, Ahlame Begdouri, Dominique Groux-Leclet


Web-based search for a specific problem usually returns long lists of results, which can take a long time to browse before the exact solution is found, if it is found at all. Community Question Answering systems, on the other hand, offer a more efficient alternative: users can ask the community directly, or the system can automatically retrieve similar questions that other users have already answered. Using external knowledge bases for such similarity measures is a growing field of research, owing to their rich content and semantic relations. Indeed, many research works base their semantic textual similarity measures on annotating texts with, or extracting specific knowledge from, an external knowledge base.

Our research aims to create a domain-specific semantic graph of keywords using data extracted from the DBpedia knowledge base. This keyword graph will later be used in a graph-based similarity approach within the archive of a Community Question Answering (CQA) system, in order to retrieve similar questions. In this paper, we define the structure of the semantic graph and propose a method for creating it automatically, supported by experimental results.
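The abstract does not specify the graph's structure or the similarity measure, so the following is only a minimal sketch of the general idea under loudly stated assumptions: keywords are DBpedia-style resource names, edges come from a few hand-written (subject, predicate, object) triples (illustrative, not fetched from DBpedia), and two questions are compared by an inverse shortest-path similarity averaged over best-matching keyword pairs. The names `TRIPLES`, `build_keyword_graph`, `keyword_similarity` and `question_similarity` are hypothetical and are not the authors' method.

```python
from collections import deque

# Hypothetical, hand-written triples in the style of DBpedia data
# (subject, predicate, object). Illustrative only; NOT queried from DBpedia.
TRIPLES = [
    ("Java_(programming_language)", "influencedBy", "C++"),
    ("Java_(programming_language)", "subject", "Object-oriented_programming"),
    ("C++", "subject", "Object-oriented_programming"),
    ("Python_(programming_language)", "subject", "Object-oriented_programming"),
    ("Python_(programming_language)", "subject", "Scripting_language"),
]


def build_keyword_graph(triples):
    """Undirected adjacency map: one node per resource, one edge per triple (predicate ignored)."""
    graph = {}
    for subj, _pred, obj in triples:
        graph.setdefault(subj, set()).add(obj)
        graph.setdefault(obj, set()).add(subj)
    return graph


def shortest_path_length(graph, source, target):
    """Breadth-first search over the keyword graph; returns None if no path exists."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def keyword_similarity(graph, a, b):
    """Assumed measure: path length d maps to 1 / (1 + d); no connecting path scores 0."""
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def question_similarity(graph, keywords_a, keywords_b):
    """Average best-match keyword similarity, computed in both directions and averaged."""
    if not keywords_a or not keywords_b:
        return 0.0

    def directed(xs, ys):
        return sum(max(keyword_similarity(graph, x, y) for y in ys) for x in xs) / len(xs)

    return (directed(keywords_a, keywords_b) + directed(keywords_b, keywords_a)) / 2


graph = build_keyword_graph(TRIPLES)
question_a = ["Java_(programming_language)", "Object-oriented_programming"]
question_b = ["Python_(programming_language)"]
print(round(question_similarity(graph, question_a, question_b), 4))  # prints 0.4583
```

Averaging best matches in both directions keeps the score symmetric, so the order in which two archived questions are compared does not matter; a real system would derive the triples from DBpedia (for example through its SPARQL endpoint) and might weight edges by predicate type.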


International Journal of Information Science and Technology (iJIST) – ISSN: 2550-5114