Master's Theses

Ontology-Based Model for Parsing Arabic Verbal Sentence

February 2019
By: Khaled M. Almunirawi

Parsing is necessary for determining the meaning, understanding the intentions of the speaker, and ruling out ambiguities. It is nearly impossible to misunderstand the meaning if the parsing is done correctly. Parsing an Arabic sentence is a difficult task due to the relatively free word order of Arabic, the length of sentences, and the omission of diacritics (vowels) in written Arabic.

In this research, we build a model to parse Arabic verbal sentences based on an Arabic grammar ontology that conceptualizes Arabic grammar. We have added classes and instances to the ontology to conceptualize the Arabic verbal sentence, and built Grammar Rules, Verb Properties, Parsing Classes, and Conjunction Checker components to form a verbal sentence knowledge base. The parsing model is supported by a morphological analyzer for syntactic analysis of the sentence and an Arabic synonym extractor for deriving synonyms.

We implemented the model and provided it with a user interface where the user can enter the sentence to be parsed, with options to add diacritics to the words partially or fully, and the possibility to remove ambiguity by choosing the most appropriate analysis from the lexicon results, and then obtain the parsing result.

To evaluate the model, we selected 22 Arabic verbal sentences from Arabic grammar books that cover all 43 possible forms of a verbal sentence, and performed several tests with these sentences, once with diacritics and once without them. The results show that the model parses sentences accurately, and that accuracy increases when diacritics are present, free word order is avoided, and the general pattern of the Arabic verbal sentence is followed.


Parallel Text Classification Applied to Large Scale Arabic Text 

September 2017
By: Bushra Omar Alqarout 

Arabic text classification is becoming a focus of research for many researchers interested in Arabic text mining, especially with the rapid growth of Arabic content on the web. In this research, Naïve Bayes (NB) and Logistic Regression (LR) are used for Arabic text classification in parallel. When these algorithms are used sequentially, they have high cost and low performance. Naïve Bayes requires heavy computation and time when applied to datasets that are large in size and feature dimensionality. Logistic regression, on the other hand, involves iterative computations that are costly in time and memory. Moreover, neither algorithm gives satisfying accuracy and efficiency, especially on large Arabic datasets, given that the complex morphology of Arabic adds to the computational cost. Therefore, to overcome these limitations, the algorithms must be redesigned and implemented in parallel.
In this research, we design and implement parallelized Naïve Bayes and Logistic Regression algorithms for large-scale Arabic text classification. Large-scale Arabic text corpora are collected and created. This is followed by the text preprocessing needed to put the text into an appropriate representation for classification, in two phases: sequential text preprocessing, and term weighting with TF-IDF in parallel. The parallelized NB and LR algorithms are designed on the MapReduce model and executed using Apache Spark in-memory processing for big data. Various experiments are conducted on a standalone machine and on computer clusters of 2, 4, 8, and 16 nodes, and the results are collected and analysed.
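The term-weighting phase can be sketched outside Spark: a map-like step computes per-document term frequencies, and a reduce-like step aggregates document frequencies before weighting. A minimal plain-Python illustration (the tokenized toy documents are hypothetical; the actual system runs this on Spark for big data):

```python
import math
from collections import Counter, defaultdict

def tf_idf(docs):
    """Compute TF-IDF weights for a corpus given as {doc_id: [tokens]}.

    "Map" step: per-document term frequencies.
    "Reduce" step: aggregate document frequencies, then weight each term.
    """
    n_docs = len(docs)
    # Map: term frequency per document
    tf = {d: Counter(tokens) for d, tokens in docs.items()}
    # Reduce: document frequency of each term across the corpus
    df = defaultdict(int)
    for counts in tf.values():
        for term in counts:
            df[term] += 1
    # Weight: tf * log(N / df)
    return {d: {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
            for d, counts in tf.items()}

# Hypothetical two-document toy corpus
weights = tf_idf({"d1": ["كتاب", "علم"], "d2": ["كتاب", "دين"]})
```

A term appearing in every document (here "كتاب") gets weight zero, while domain-specific terms receive positive weights, which is the behaviour the classifier relies on.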
We found that stemming reduces document sizes and affects classification accuracy, with root stemming giving more accurate results than light (light1) stemming. For fast results, NB is suitable and returns high accuracy, around 99%, for large-scale documents with high dimensionality. LR also gives accurate results but takes longer than NB; it gives 93% accuracy on the AlBokhary corpus, compared to 89% for NB on the same corpus.


Human Personality Derivation Using Ontology-Based Modern Physiognomy

August 2017

By: Almotasembelah Awaja

People depend on the first facial impression when dealing with each other, especially with strangers. Several psychological factors determine this first facial impression, and the Egyptian, Chinese, Greek, and Islamic civilizations all tried to find a relationship between facial features and personality traits.
Physiognomy arose in these civilizations; it is the inference of a person's abilities and character from the appearance of the body.
Contemporary approaches and techniques such as the Semantic Web and ontology engineering can be effective in representing, processing, and reasoning in the science of physiognomy. There have been various efforts in this direction that led to encouraging results but opened new issues that need further effort.
We build a semantically enriched system, based on an ontology, for deriving personality in the domain of modern physiognomy. In building the system, we create a knowledge base that includes the HPDPOnto ontology, a set of individuals, and a set of SWRL rules.
The accuracy of the approach, including the physiognomy ontology, is evaluated by measuring the correctness of the personality derivation results. The proposed system is evaluated using cases provided by a physiognomy expert. The results show that the system correctly derived 19 out of 21 cases, a correctness ratio of 90%.


Improving Dependency Parsing of Verbal Arabic Sentences Using Semantic Features

August 2016
By: Heyam Khamis Elnajjar

Parsing is one of the most interesting areas in Natural Language Processing, and Arabic parsing is part of this research area. Parsing describes a word in a sentence grammatically, identifying parts of speech, syntactic relations, and so on. In dependency parsing, syntactic structure consists of lexical items linked by binary asymmetric relations called dependencies. The dependency parsing community has, in the last few years, shown considerable interest in parsing Morphologically Rich Languages with Flexible Word Order (MOR-FWO).

The semantics of words plays a major role in understanding the meaning of an Arabic sentence in context, given characteristics of the language such as free word order and the frequent absence of diacritics and morphological marking. This points to the importance of using semantics to narrow the gap in syntax-based parsing of Arabic sentences and to improve machine-learning parsers, and therefore parsing models and applications.

We propose a dependency parsing approach for Modern Standard Arabic (MSA) verbal sentences that utilizes the information available in the lexical resource Arabic VerbNet to complement the morpho-syntactic information already available in the data. This complementary information is encoded as an additional semantic feature for data-driven parsing. We conduct a series of experiments on Arabic wire-news text in Arabic dependency parsing using MaltParser. With only 332 training sentences, we are able to build a dependency parser with state-of-the-art accuracy of 71.5% Labeled Attachment Score (LAS) and 77.5% Unlabeled Attachment Score (UAS), a 2% increase in total accuracy over the case without semantic features.
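One simple way to feed such a semantic feature to a data-driven parser is to extend the FEATS column of the CoNLL-X training data. The sketch below is a hypothetical illustration: the column layout follows CoNLL-X, but the `vnclass` feature name and the VerbNet lookup table are assumptions, not the thesis's actual encoding:

```python
def add_semantic_feature(conll_line, verb_classes):
    """Append a VerbNet-class feature to the FEATS column (index 5)
    of a tab-separated CoNLL-X token line, if the lemma is a known verb."""
    cols = conll_line.split("\t")
    lemma = cols[2]  # CoNLL-X: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, ...
    if lemma in verb_classes:
        feat = "vnclass=" + verb_classes[lemma]
        # FEATS uses "_" for empty, "|" to separate multiple features
        cols[5] = feat if cols[5] == "_" else cols[5] + "|" + feat
    return "\t".join(cols)

# Hypothetical token line and VerbNet class mapping
line = "1\tكتب\tكتب\tVBD\tVBD\t_\t0\tROOT\t_\t_"
out = add_semantic_feature(line, {"كتب": "write-1"})
```

MaltParser's feature model can then be configured to read this extended FEATS column during training and parsing.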


A Confidentiality Protection Approach Based on Three-Way Fragmentation for Cloud Out-Sourcing of Mobile Data

October 2015
By: Rana H. Al-Talaa

Despite the increasing reliance on smartphones and data-sensitive mobile applications, these devices have limited storage and processing capabilities, and they operate in unreliable environments that can lead to loss of valuable data if not properly managed. Mobile data outsourcing through the cloud allows users to send their potentially sensitive data to external servers that become responsible for its storage, management, and dissemination. However, such cloud outsourcing may violate privacy if the network or server cannot be fully trusted. While encrypting all data prior to sending appears to solve this problem, it is computationally intensive and infeasible on mobile devices.
In this thesis, we develop a confidentiality protection approach for cloud outsourcing of mobile data. It encrypts only part of the data, taking into account both the confidentiality constraints of the data being collected and the limitations of mobile devices. Our approach employs hybrid fragmentation (vertical and horizontal) to determine which parts need to be encrypted, which can be sent in the clear, and which should be stored on the owner's side.
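The vertical-fragmentation idea can be sketched as a greedy split: attributes named in singleton constraints are encrypted (or kept at the owner), and the remaining attributes are placed into two clear-text fragments so that no association constraint falls entirely within one fragment. The greedy strategy and attribute names below are illustrative, not the thesis algorithm itself:

```python
def fragment(attributes, constraints):
    """Greedy two-way vertical fragmentation.

    constraints: list of attribute sets that must not co-occur in the clear.
    Singleton constraints mark attributes as sensitive on their own.
    Returns (encrypted, fragment1, fragment2).
    """
    encrypted = {a for c in constraints if len(c) == 1 for a in c}
    f1, f2 = set(), set()

    def violates(frag, attr):
        # Would adding attr complete an association constraint in frag?
        return any(c <= frag | {attr} for c in constraints if len(c) > 1)

    for attr in attributes:
        if attr in encrypted:
            continue
        if not violates(f1, attr):
            f1.add(attr)
        elif not violates(f2, attr):
            f2.add(attr)
        else:
            encrypted.add(attr)  # no clear-text placement is safe
    return encrypted, f1, f2

# Illustrative schema: "ssn" is sensitive alone; name+diagnosis must be split
enc, f1, f2 = fragment(
    ["name", "dob", "diagnosis", "ssn"],
    [{"ssn"}, {"name", "diagnosis"}],
)
```

Horizontal fragmentation would additionally split rows (e.g. by record owner or sensitivity), which this sketch does not cover.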
We provide an implementation of the confidentiality protection approach. The implementation is based on the architecture of the approach and realizes the confidentiality constraints specified as a basis for the data partitioning.
We present an evaluation of the confidentiality approach to show its ability to satisfy the constraints imposed on the injury and martyr data used to fragment these data as defined by the approach. We perform a number of tests to check that the data is fragmented and uploaded to the cloud as required and within the required time. Additionally, we compare our approach to a similar one and show that ours outperforms it in satisfying the confidentiality constraints while taking less space and time.


Ontology-Based Arabic Documents Classification

March 2015
By: Mohammed M. Abu Jasser

Automatic document classification is an important task due to the rapid growth in the number of electronic documents. Classification aims to assign a document to a predefined category automatically based on its contents. In general, text classification plays an important role in information extraction and summarization, text retrieval, question answering, e-mail spam detection, web page content filtering, and automatic message routing.

Most existing methods and techniques in the field of document classification are keyword-based, without many intelligent features. Even ontology-based classification methods are largely limited to English.

In this research, we propose an approach to investigate the role of an ontology (an Arabic news domain ontology) in Arabic document classification. The results show that the proposed ontology-based approach improves the document classification process across the different evaluation criteria. Therefore, the use of an ontology contributes effectively to the classification of Arabic documents.


Large-Scale Arabic Text Classification Using MapReduce

February 2015
By: Maher M. Abushab

Text classification of large-scale real documents has become one of the core problems in text mining. For English and other languages, much text classification work has been done with high performance. However, the Arabic language still needs more attention and research, since it is morphologically rich and requires special processing.

Existing Arabic text classification approaches use techniques such as feature selection, data representation, feature extraction, and sequential algorithms. Few attempts have been made to classify large-scale Arabic text documents in a distributed manner.

In our research, we propose a parallel classification approach based on the Naïve Bayes algorithm for large volumes of Arabic text using MapReduce, with enhanced speedup and performance and preserved accuracy.

The experiments show that the parallel classification approach can process large volumes of Arabic text efficiently on a MapReduce cluster of 16 computers, significantly improving the speedup. The classification results also show that the proposed parallel classifier achieves accuracy, precision, recall, and F-measure above 97%.


The Role of Electronic Transactions in Developing Government Performance in Palestine (A Case Study of the Ministry of Telecommunications and Information Technology – Gaza Strip)

By: Ismail Jamal Hamada

This study aimed to identify the role of electronic transactions in developing government performance in terms of efficiency and effectiveness, transparency, and service quality. It also aimed to examine the current state of applying electronic transactions and the availability of the requirements for operating and applying them in the Ministry of Telecommunications and Information Technology, in addition to identifying the most prominent challenges and problems facing the Ministry and the means of overcoming them in order to complete the transition toward electronic transactions.

To achieve the objectives of the study, the researcher used the descriptive analytical method, which attempts to describe the phenomenon under study, analyze its data, and examine the relationships among its components, the opinions raised about it, the processes it involves, and the effects it produces. A comprehensive survey was used, given the small size of the study population and the ease of reaching the target group; the questionnaire was distributed to all 111 employees of the study population.

The study reached several findings, the most important being that the requirements for applying electronic transactions in the Ministry of Telecommunications and Information Technology are available, in terms of administrative requirements, technical infrastructure, financial resources, and human cadres qualified and trained in computerized applications and systems.

The study also showed that most of the Ministry's services are provided through electronic transactions, and that there is a positive relationship between applying electronic transactions and developing performance, through increased efficiency and effectiveness, enhanced transparency, and improved public service.

The study produced a number of recommendations, including: strengthening the employees' skills and capabilities needed to apply electronic transactions through a training plan; establishing appropriate mechanisms for publicizing electronic transactions through marketing and advertising channels; providing an electronic payment system; providing a code of good practice for all electronic applications and transactions, given its importance in guiding employees on how to use electronic transactions correctly; and enhancing information security by using appropriate techniques to ensure that electronic transactions are protected from forgery and tampering.


Automatic Ontology-Based Document Annotation for Arabic Information Retrieval

August 2013
By: Ashraf I. Kaloub

The rapid development of semantic search technology motivates building efficient and scalable document annotation and retrieval techniques. Most existing methods and techniques in the field of document annotation and retrieval target English documents. Although a growing amount of Arabic content is spreading over the internet and other resources, little work has been carried out on Arabic semantic search and on Arabic document annotation and retrieval.

In this research, we propose an approach for enhancing Arabic information retrieval that depends on an ontology for the document annotation process. The approach is evaluated using the two common evaluation criteria, precision and recall.
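The two criteria can be computed directly from the retrieved and relevant document sets; a minimal sketch with hypothetical document identifiers:

```python
def precision_recall(retrieved, relevant):
    """Precision = |retrieved ∩ relevant| / |retrieved|;
    recall = |retrieved ∩ relevant| / |relevant|."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result list against a hypothetical relevance judgment
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
# p = 2/4, r = 2/3
```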


An Ontology-Based Approach to Support the Process of Judging Hadith Isnad

March 2013
By: Yehya M. Dalloul

The two fundamental sources of Islamic legislation are the Qur'an and the Hadith. Hadiths, or prophetic traditions, are narrations originating from the sayings and actions of Prophet Muhammad (peace be upon him). The narrators transmitted the sayings of the Prophet through the Isnad (chain of narration), and because of its importance Muslims took a keen interest in the science of Isnad: it helps differentiate between accepted and rejected Hadith, in other words between Sahih and weak Hadith. Islamic scholars were the first to study Isnad rigorously to determine which narrators are trusted and which are not. In this research we aim to support this science. We build an ontology-based Isnad Judgment System (IJS) that automatically generates a suggested judgment of a Hadith Isnad, based on the rules that Hadith scholars follow. A prototype of the approach was implemented as a proof of concept of the requirements and to verify its accuracy. We evaluated the system against the judgments of the scholar Al-Albani and against a Hadith specialist; the accuracy of the system is 75% in the first evaluation and 81% in the second. These results indicate that the ontology supports the process of Isnad judgment. We also evaluated the ontology using a task-based framework, which indicates that the accuracy of using the IJS ontology is 100%.


A High Performance Parallel Classifier for Large-Scale Arabic Text

March 2013
By: Mohammed M. Abu Tair
Text classification has become one of the most important techniques in text mining. It is the process of classifying documents into predefined categories or classes based on their content. A number of machine learning algorithms have been introduced for automatic text classification. One of the common classification algorithms is k-Nearest Neighbor (k-NN), known as one of the best classifiers across languages, including Arabic, and included in numerous experiments as a basis for comparison. Furthermore, it is a simple algorithm and very easy to implement, since it does not require the training phase that most classification algorithms must have. However, k-NN has low efficiency, because it requires a large amount of computation to evaluate the similarity between a test document and every training document and to sort the similarities. This drawback makes it unsuitable for handling a large volume of text documents with high dimensionality, particularly in Arabic.
In our research, we develop a parallel classifier for large-scale Arabic text that achieves an enhanced level of speedup, scalability, and accuracy. The proposed parallel classifier is based on the sequential k-NN algorithm. We test it using the Open Source Arabic Corpus (OSAC), the largest freely available public Arabic corpus of text documents, and study its performance on a multicomputer cluster of 14 computers. We report both timing and classification results. These results indicate that the proposed parallel classifier has very good speedup and scalability and is capable of handling large document collections. The classification results also show that the proposed classifier achieves accuracy, precision, recall, and F-measure above 95%.
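The sequential kernel that the parallel classifier distributes over the cluster is a similarity-and-vote step; a minimal single-machine sketch (in practice the vectors would be TF-IDF representations of OSAC documents; the toy data and labels here are illustrative):

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(test_vec, training, k=3):
    """training: list of (label, vector) pairs. Majority vote over
    the k training documents most similar to test_vec."""
    ranked = sorted(training, key=lambda lv: cosine(test_vec, lv[1]),
                    reverse=True)
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]

# Illustrative labeled training vectors
train = [("sport", {"مباراة": 1.0, "هدف": 2.0}),
         ("sport", {"فريق": 1.5, "هدف": 1.0}),
         ("economy", {"سوق": 2.0, "سهم": 1.0})]
label = knn_classify({"هدف": 1.0, "مباراة": 0.5}, train, k=3)
```

In the parallel version, the similarity computations for the training documents are partitioned across the cluster nodes and only the local top-k candidates are merged for the final vote.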


Building and Evaluating a SOA-Based Model for Purchase Order Management in E-Commerce System

October 2012
By: Yousef M. Al-Ashqar 

E-Commerce systems are characterized by complex Web applications that use different operating systems and different technologies. One of the most popular E-Commerce applications, conducted both between businesses (B2B) and between a business and a consumer (B2C), is Purchase Order Management. It consists of components such as Sales, Shipping, and Billing.
Nowadays, Purchase Order Management components often use integration approaches that lack interoperability and manageability, resulting in customer dissatisfaction, wasted time, and excessive costs.
In this research, we build a model to overcome the shortcomings of current Purchase Order Management systems. The model is based on Service Oriented Architecture (SOA) principles, the Enterprise Service Bus (ESB), and Web services, which offer many advantages and help achieve the goals of interoperability and manageability. The proposed model is evaluated using a scenario-based software architecture method, and the evaluation shows that it achieves the quality attributes set as goals for the model: interoperability and manageability. A case study of the model is implemented as a proof of concept. A specific usage scenario for the model is discussed and further demonstrates that the model accomplishes its functionality and quality attributes.


Automatic Arabic Domain-Relevant Term Extraction

September 2012
By: Manar S. Fayyad
Term extraction from a text corpus is an important step in knowledge acquisition, and it is the first step in many Natural Language Processing (NLP) methods and computational linguistic systems. For the Arabic language, there is some work in the field of term extraction, and little of it tries to extract domain-relevant terms.
In this research, a model for automatic Arabic domain-relevant term extraction from a text corpus is proposed. The model uses a hybrid approach composed of linguistic and statistical methods to extract terms relevant to specific domains, based on a prevalence and tendency term ranking mechanism.
To realize the proposed model, a multi-domain corpus separated into 10 domains (Economics, History, Education and Family, Religion and Fatwas, Sport, Health, Astronomy, Law, Stories, and Cooking Recipes) was used. The corpus was preprocessed by removing non-Arabic letters, punctuation, diacritics, and stop words. A vector of candidate terms was then extracted using a sliding window of variable length, dropping windows that contain a stop word.
Candidate terms were ranked using the termhood method, a statistical measure of the distributional behavior of candidate terms within the domain and across the rest of the corpus.
The candidate terms were then distributed over the domains according to the highest-ranking result for each extracted term, constructing a domain-term matrix. This matrix was used in a simple classifier that classifies the test corpus. The final step yields a confusion matrix indicating that the domain-term matrix works as a strong classifier, achieving an accuracy rate of 100% for some domains and very good accuracy in others; the total accuracy of the classifier was 95%.
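The domain-versus-rest contrast behind termhood ranking can be illustrated by comparing a term's relative frequency inside the target domain with its relative frequency in the remainder of the corpus. The simple difference score below is an illustrative variant, not necessarily the exact measure used in the thesis:

```python
def termhood(term, domain_counts, other_counts):
    """Score a candidate term by how much more prevalent it is in the
    target domain than in the rest of the corpus (relative frequencies)."""
    d_total = sum(domain_counts.values()) or 1
    o_total = sum(other_counts.values()) or 1
    d_rel = domain_counts.get(term, 0) / d_total
    o_rel = other_counts.get(term, 0) / o_total
    return d_rel - o_rel  # > 0: the term tends toward the domain

# Hypothetical counts: "سهم" (share/stock) dominates the Economics domain
score = termhood("سهم", {"سهم": 8, "سوق": 2}, {"سهم": 1, "قصة": 9})
```

A term is then assigned to the domain where its score is highest, which is how the domain-term matrix is populated.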


A Web-Based Collaborative e-Learning Environment Based on a Model of Social Cognitive Development Theories
By: Najwa A. Baraka 
Putting powerful Web technologies such as Cloud Computing and Web 2.0 together in an e-learning environment maximizes the opportunity for learners to acquire knowledge and skills in an interactive, collaborative, and social manner, and decreases the technical effort and financial burden on educational institutions. This research proposes a collaborative e-learning model that consists of six levels and six tasks based on four social cognitive development theories: Connectivism, Social Cognitive Development, Social Interdependence, and Cognitive Elaboration Perspectives. The levels of the proposed model are: Networking, Contribution, Cognitive Disequilibrium, Origination of Social Interaction, Knowledge Evolving, and Cognitive Equilibrium. Its tasks are: Knowledge Feeding, Knowledge Self-Reflection, Knowledge Negotiation, Knowledge Elaboration, Knowledge Accommodation, and Knowledge Shifting. A rich Web-based collaborative e-learning environment called ShareSpace is developed as a realization of the proposed model. ShareSpace is evaluated against the proposed model, against a framework for evaluating computer-supported collaborative learning, and against an adaptable usability heuristic checklist for online courses. ShareSpace is an interactive and flexible social collaborative e-learning environment that can be utilized by educational institutions and contributes to the overall goal of the learning process: maximizing the learning outcome.


A SOA Based Framework for the Palestinian e-Government Integrated Central Database
By: Suhail M. Madoukh
The Integrated Central Database is one of the core components of the Palestinian e-Government Technical Framework. The current Integrated Central Database model lacks features such as interoperability, flexibility, and manageability. The purpose of this research is to propose a SOA-based solution for the Central Database that achieves these features. The research presents and analyses the current architecture and implementation of the Palestinian e-Government Central Database model, and proposes changing the current model into a Service Oriented Architecture (SOA) framework realized using an Enterprise Service Bus (ESB) and Web Services. The proposed framework offers database replication and connectivity functionalities for the central database. It is evaluated using a scenario-based software architecture evaluation method, and the evaluation shows that it achieves the quality attributes set as goals for the framework: interoperability, flexibility, and manageability. Moreover, a prototype of the framework is implemented and validates the framework's correctness. A specific usage scenario for the framework is discussed and further demonstrates that the framework accomplishes its functionality and quality attributes.