Evidence of Students’ Academic Performance at the Federal College of Education Asaba Nigeria: Mining Education Data

One main objective of higher education is to provide quality education to its students. One way to achieve the highest level of quality in the higher education system is to discover knowledge that supports prediction of student enrolment in a particular course, alienation of the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in students’ result sheets, and prediction of students’ performance. This knowledge is hidden in the educational data set and is extractable through data mining techniques. The present paper is designed to justify the capabilities of data mining techniques in the context of higher education by offering a data mining model for the higher education system. In this research, the classification task is used to evaluate students’ performance; among the many approaches available for data classification, the decision tree method is used here. In this way, we extract knowledge that describes students’ summative performance at the end of the semester and helps to identify dropouts and students who need special attention.


I. Introduction
The advent of data technology in various fields has led to massive volumes of data in various forms such as files, audio, video, images, and many new data formats [1][2]. Data from diverse applications require a sound method of extracting knowledge from large repositories for better decision-making [3]. Knowledge discovery aims to derive valuable, meaningful information from collections of data [4][5]. Data mining uses various methods and algorithms to extract various forms of knowledge. Data mining and knowledge discovery tools have since recorded tremendous success [6][7][8] and have become an essential facet of various organizations [9][10][11][12]. Data mining techniques draw on the fields of statistics, databases, machine learning, pattern recognition, artificial intelligence, and computation.
There is growing research interest in educational data mining. This recently evolving field concerns developing approaches that discover knowledge from data originating from educational environments [13][14][15][16]. Educational data mining uses techniques such as decision trees, neural networks, Naïve Bayes, and k-nearest neighbors [17]. These techniques reveal many kinds of knowledge, such as association rules, classifications, and clusterings [18][19]. The revealed knowledge is used to predict the enrolment of students in a particular course, alienation of the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in students' result sheets, prediction of students' performance, and so on [20][21][22][23].
The study uses data mining methodologies to investigate students' performance in various courses. Data mining offers many tasks that can be used to investigate student performance; here we use the classification task [24] and study students' performance using decision tree classification. Data such as class tests, attendance, assignment marks, and examination scores were collected and used to predict performance at the end of the semester.

II. Methods
Data mining is often used in the educational field to enhance our understanding of the learning process, focusing on identifying, extracting, and evaluating variables related to the learning process of students, as described by many scholars. Mining in an academic environment is called educational data mining. Data mining in education is a recent research field, and it is gaining popularity due to its potential for educational institutes. Shiokawa et al. [25] describe data mining as a process that allows users to analyze data from different dimensions, categorize it, and summarize the relationships identified during the mining process. A study on student performance was conducted by selecting 600 students from different colleges at Awadh University, Faizabad, India. Using Bayes classification on category, language, and background qualification, it was determined whether newcomer students would perform or not [26].
Ahmad et al. [27] conducted a study on student performance by selecting 300 students (225 males, 75 females) from a group of colleges affiliated with Punjab University of Pakistan. The hypothesis was framed as "Student's attitude towards attendance in class, hours spent in study on a daily basis after college, student's family income, student's mother's age, and mother's education are significantly related to student performance." Using simple linear regression analysis, it was found that factors such as the mother's education and the student's family income were highly correlated with the student's academic performance.
Brindlmayer [28] conducted a performance study on 400 students (200 boys and 200 girls) selected from the senior secondary school of Aligarh Muslim University, Aligarh, India, with the main objective of determining the prognostic value of various measures of cognition, personality, and demographic variables for success at the higher secondary level in the science stream. The selection was based on the cluster sampling technique, in which the whole population of interest was divided into groups or clusters, and a random sample of those clusters was selected for further analysis. It was found that girls with high socio-economic status had relatively higher academic achievement in the science stream, and boys with low socio-economic status generally had relatively higher academic achievement.
Nguyen et al. [29] gave a case study that used student data to analyze learning behavior, predict results, and warn students at risk before their final exams. They applied a decision tree model to predict the final grade of students who studied the C++ course at Yarmouk University, Jordan, in 2015. Three classification methods were used, namely ID3, C4.5, and Naïve Bayes. Their results indicated that the decision tree model gave better predictions than the others. Also, Nilam [30] conducted a study on student performance by selecting 60 students from a degree college of Awadh University in India. Using association rules, the study identified students' interest in the choice of class teaching language. The study also describes using the k-means clustering algorithm to predict students' learning activities. The knowledge generated by applying the data mining technique may be helpful for both teachers and students.
Chen [31], in his study on private tutoring and its implications, observed that the percentage of students receiving private tutoring in India was relatively higher than in Malaysia, Singapore, Japan, China, and Sri Lanka. It was also observed that academic performance improved with the intensity of private tutoring, and that this variation in intensity depends on a collective factor, namely socio-economic conditions. Haipinge et al. [32] conducted a study on student performance by selecting 300 students from 5 different degree colleges offering the BCA (Bachelor of Computer Application) course. Using the Bayesian classification method on 17 attributes [33][34][35][36], it was found that factors such as students' grades in the senior secondary exam, living location, medium of teaching, mother's qualification, students' other habits, family annual income, and family status were highly correlated with students' academic performance.
Data mining, also called Knowledge Discovery in Databases (KDD), refers to digging out [37][38][39] or "mining" knowledge from large amounts of data. Data mining techniques are used to operate on vast volumes of data to discover hidden patterns and relationships helpful in decision-making [40][41][42]. While data mining and knowledge discovery in databases are frequently treated as synonyms, data mining is actually one component of the knowledge discovery process [43][44][45]. The sequence of steps identified in extracting knowledge from data is shown in Figure 1.
Regression techniques can be adapted for prediction. Regression analysis can be used to model the relationship between one or more independent variables and dependent variables. In data mining, independent variables are attributes already known, and response variables are what we want to predict. Unfortunately, many real-world problems are not simple prediction problems [50][51][52][53]. Prediction is a declaration of something self-evident, a task that can be assumed as the basis for argument. Thus, more complex techniques (e.g., logistic regression, decision trees, or neural networks) are used to forecast future values. The same model type can often be used for both regression and classification. For instance, the CART (Classification and Regression Trees) decision tree algorithm can be used to build both classification trees (to classify categorical response variables) and regression trees (to forecast continuous response variables). Neural networks can also create both classification and regression models [21][54].
Classification is the most commonly applied data mining technique. It employs a set of pre-classified examples to develop a model that can classify the population of records at large. This approach frequently employs decision tree or neural network-based classification algorithms. The data classification process involves learning and classification. In learning, the classification algorithm analyzes the training data [55]. In classification, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data tuples [23]. The classifier-training algorithm uses the pre-classified examples to determine the parameters required for proper discrimination. The algorithm then encodes these parameters into a model called a classifier [26][56][57][58].
A decision tree is a tree-shaped structure that represents sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) [59][60][61].
Clustering can be described as the identification of similar classes of objects. Using clustering techniques, we can further identify dense and sparse regions in object space and discover overall distribution patterns and correlations among data attributes [62][63][64]. The classification approach can also be used to distinguish groups or classes of objects effectively, but it becomes costly, so clustering is often used as a preprocessing approach for attribute subset selection and classification [65][66][67].
Association and correlation analysis usually aims to find frequent itemsets among large data sets. Such findings help businesses make certain decisions, such as catalog design, cross-marketing, and customer shopping behavior analysis. Association rule algorithms need to be able to generate rules with confidence values less than one. Also, the number of possible association rules for a given dataset is usually very large, and many of the rules are of little (if any) value [54].
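The support and confidence of an association rule can be illustrated with a minimal Python sketch. The transactions below are hypothetical toy records, not the study's data, and the helper names `support` and `confidence` are introduced here only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical toy transactions: each set holds attribute=value items.
transactions = [
    {"ATT=Good", "ASS=Yes", "Pass"},
    {"ATT=Good", "ASS=No", "Pass"},
    {"ATT=Poor", "ASS=No", "Fail"},
    {"ATT=Good", "ASS=Yes", "Fail"},
]

rule_support = support(transactions, {"ATT=Good", "Pass"})
rule_confidence = confidence(transactions, {"ATT=Good"}, {"Pass"})
```

In this toy example the rule "ATT=Good implies Pass" holds for two of the three students with good attendance, so its confidence is below one, which is the typical situation the paragraph above refers to.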
A neural network is a set of connected input/output units in which every connection has a weight associated with it. During its training phase, the network learns by adjusting the weights so that it can predict the correct class labels of the input tuples. Neural networks can derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be noticed by humans or other computer techniques [25]. They are well suited to continuous-valued inputs and outputs. Neural networks are best at identifying patterns or trends in data and are well suited to prediction or forecasting needs [68][69][70][71].
The k-nearest neighbor technique classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k is greater than or equal to 1) [72][73].
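As an illustration of the k-nearest neighbor idea, the following minimal Python sketch classifies a query record by majority vote among its k closest historical records. It assumes Euclidean distance and numeric features; the toy records and the function name `knn_predict` are hypothetical and not taken from the study.

```python
import math
from collections import Counter


def knn_predict(history, query, k=3):
    """Return the majority label among the k records closest to `query`.

    `history` is a list of (feature_vector, label) pairs.
    """
    by_distance = sorted(history, key=lambda rec: math.dist(rec[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical historical records: (attendance %, test score) -> outcome.
history = [
    ((90, 75), "Pass"),
    ((85, 68), "Pass"),
    ((40, 30), "Fail"),
    ((55, 42), "Fail"),
    ((78, 66), "Pass"),
]

prediction = knn_predict(history, (80, 70), k=3)
```

Other distance measures or weighted votes could be substituted; the choice of k and of the distance function is a modelling decision, not something fixed by the technique.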

A. Technical Experimental Framework
Today, a student's academic performance is determined by internal assessment (formative tests) and end-of-semester (summative) examinations. The teacher administers the internal assessment based on students' performance in educational activities such as class tests, seminars, assignments, general proficiency, attendance, and lab work. The end-of-semester examination mark is what the student scores in the semester examination. Each student must obtain minimum marks in both the internal assessment and the end-of-semester examination to pass a semester.
The dataset used in this study was obtained from the Federal College of Education (Technical) Asaba, Delta State, by sampling the postgraduate diploma course PDE-Technical Education (Post Graduate Diploma in Education, Technical Education Option) from sessions 2017 to 2020. Initially, the size of the data is 50 records. In this step, data stored in several tables were joined into a single table, and errors were removed after the joining process.
In this step, only those fields required for data mining were selected. A few derived variables were selected, while some of the information for the variables was extracted from the database. Table 1 gives all the predictor and response variables derived from the database for reference.

B. The Proposed ID3 Decision Tree Classifier
A decision tree is a tree in which each branch node represents a choice between some alternatives and each leaf node represents a decision. Decision trees are commonly used for gaining information for decision-making. A decision tree starts with a root node on which users take action. From this node, each node is split recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible scenario of a decision and its outcome. Three widely used decision tree learning algorithms are ID3, ASSISTANT, and C4.5.
ID3 is a simple decision tree learning algorithm developed by Quinlan in 1986. The basic idea of the ID3 algorithm is to construct the decision tree by employing a top-down, greedy search through the given sets to test each attribute at every tree node. To select the attribute that is most useful for classifying a given set, we introduce a metric called information gain. To find an optimal way to classify a learning set, we need to minimize the number of questions asked (i.e., minimize the depth of the tree). Thus, we need some function that can measure which questions provide the most balanced splitting. The information gain metric is such a function.

C. Measuring Impurity
Given a data table containing attributes and the class of the attributes, we can measure the homogeneity (or heterogeneity) of the table based on the classes. We say a table is pure or homogeneous if it contains only a single class. If a data table contains several classes, we say that the table is impure or heterogeneous. There are several indices to measure the degree of impurity quantitatively. The most well-known are entropy, the Gini index, and classification error. Entropy is calculated as in (1), where $p_j$ is the probability (relative frequency) of class $j$ in the table.
$$\text{Entropy} = -\sum_{j} p_j \log_2 p_j \qquad (1)$$
The entropy of a pure table (consisting of a single class) is zero because the probability is 1 and log(1) = 0. Entropy reaches its maximum value when all classes in the table have equal probability. The Gini index is given in (2).
$$\text{Gini} = 1 - \sum_{j} p_j^2 \qquad (2)$$
The Gini index of a pure table consisting of a single class is zero because the probability is 1 and $1 - 1^2 = 0$. Like entropy, the Gini index reaches its maximum value when all classes in the table have equal probability. The classification error is given in (3).
$$\text{Classification error} = 1 - \max_{j}\{p_j\} \qquad (3)$$
Similar to entropy and the Gini index, the classification error of a pure table (consisting of a single class) is zero because the probability is 1 and $1 - \max\{1\} = 0$. The value of the classification error index always lies between 0 and 1. The maximum Gini index for a given number of classes is always equal to the maximum classification error because, for $n$ classes with equal probability $p = 1/n$, the maximum Gini index is $1 - n(1/n^2) = 1 - 1/n$, while the maximum classification error is also $1 - \max\{1/n\} = 1 - 1/n$.
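The three impurity indices can be checked numerically with a short Python sketch. It assumes base-2 logarithms for entropy and takes class probabilities as relative frequencies of the labels; the function names are illustrative only.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency of each class in a list of labels."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def entropy(labels):
    """Entropy in bits: -sum(p * log2(p)) over the classes present."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def gini(labels):
    """Gini index: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def classification_error(labels):
    """Classification error: 1 - max(p)."""
    return 1.0 - max(class_probabilities(labels))


pure_table = ["Credit"] * 5                             # a single class
uniform_table = ["Distinction", "Credit", "Merit", "Fail"]  # equal probabilities
```

For the pure table all three indices are zero. For the uniform four-class table the Gini index and the classification error both equal 1 - 1/4, which matches the argument above for n classes.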

D. Splitting Criteria
We use a measure called information gain to determine the best attribute for a particular node in the tree. The information gain, Gain(S, A), of an attribute A relative to a collection of examples S is defined as in (4).
$$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v) \qquad (4)$$
Here Values(A) is the set of all possible values for attribute A, and $S_v$ is the subset of S for which attribute A has value v (i.e., $S_v = \{s \in S \mid A(s) = v\}$). The first term in the equation for gain is simply the entropy of the original collection S, and the second term is the expected value of the entropy after S is partitioned using attribute A. The expected entropy is the sum of the entropies of each subset $S_v$, weighted by the fraction of examples $|S_v|/|S|$ that belong to $S_v$. Gain(S, A) is therefore the expected reduction in entropy caused by knowing the value of attribute A. Equations (5) and (6) are used for split information and gain ratio, where $S_1, \dots, S_c$ are the subsets that result from partitioning S by the $c$ values of attribute A.
$$\text{SplitInformation}(S, A) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|} \qquad (5)$$
$$\text{GainRatio}(S, A) = \frac{\text{Gain}(S, A)}{\text{SplitInformation}(S, A)} \qquad (6)$$
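The splitting measures can be sketched in Python as follows. The sketch assumes examples are stored as dictionaries, uses base-2 logarithms, and returns a gain ratio of zero when the split information is zero (an assumption made here to avoid division by zero). The rows are hypothetical toy data; only the attribute names ATT and ASS follow Table 1.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy in bits of a list of categorical values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def information_gain(rows, attribute, target):
    """Entropy(S) minus the weighted entropy of the subsets S_v."""
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[target] for row in rows]) - remainder


def split_information(rows, attribute):
    """Entropy of S with respect to the values of the attribute itself."""
    return entropy([row[attribute] for row in rows])


def gain_ratio(rows, attribute, target):
    split = split_information(rows, attribute)
    # Assumption: report 0.0 when the attribute takes a single value.
    return information_gain(rows, attribute, target) / split if split > 0 else 0.0


# Hypothetical toy examples (CLASS is an illustrative target name).
rows = [
    {"ATT": "Good", "ASS": "Yes", "CLASS": "Pass"},
    {"ATT": "Good", "ASS": "No", "CLASS": "Pass"},
    {"ATT": "Poor", "ASS": "Yes", "CLASS": "Fail"},
    {"ATT": "Poor", "ASS": "No", "CLASS": "Fail"},
]
```

In this toy table ATT separates the classes completely, so its gain equals the full entropy of the table, while ASS leaves the class mix unchanged and has zero gain.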
The process of selecting a new attribute and partitioning the training examples is now repeated for each non-terminal descendant node. Attributes that have been incorporated higher in the tree are excluded, so that any given attribute can appear at most once along any path through the tree. This process continues for each new leaf node until either of two conditions is met: every attribute has already been included along this path through the tree [44], or the training examples associated with this leaf node all have the same target attribute value (i.e., their entropy is zero) [51]. The ID3 algorithm framework is listed in Pseudocode 1.
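The recursive procedure described above might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Pseudocode 1: examples are dictionaries, the tree is a nested dictionary, ties in gain are broken by attribute order, and a leaf with no attributes left takes the majority class. The toy rows and the target name CLASS are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[target] for row in rows]) - remainder


def id3(rows, attributes, target):
    """Build a decision tree as nested dicts: {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    # Stop: all examples share one target value (entropy is zero).
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: every attribute already used on this path; take the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


def classify(tree, row):
    """Follow the branches that match `row` until a leaf label is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree


# Hypothetical toy examples; attribute names follow Table 1.
rows = [
    {"ATT": "Good", "ASS": "Yes", "CLASS": "Pass"},
    {"ATT": "Good", "ASS": "No", "CLASS": "Pass"},
    {"ATT": "Good", "ASS": "No", "CLASS": "Pass"},
    {"ATT": "Poor", "ASS": "Yes", "CLASS": "Pass"},
    {"ATT": "Poor", "ASS": "No", "CLASS": "Fail"},
    {"ATT": "Poor", "ASS": "No", "CLASS": "Fail"},
]
tree = id3(rows, ["ATT", "ASS"], "CLASS")
```

On these toy rows ATT has the larger gain and becomes the root; the "Poor" branch is then split on ASS. The sketch does not handle attribute values unseen during training, which a practical implementation would need to address.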

III. Result and Discussion
A dataset of 50 students was used in this study (Table 2), obtained from the Professional Diploma in Education, Federal College of Education (Technical) Asaba, PDE-Technical Option, from sessions 2017 to 2020 [74]. To compute the information gain of an attribute A relative to S, we first compute the entropy of S. Here, S is the set of all 50 examples, of which 12 are "Distinction", 15 are "Credit", 17 are "Merit", and 6 are "Fail", so the entropy of S is obtained from (1) using these class proportions. CGP has the highest gain; thus, it is used as the root node, as shown in Figure 2. Table 3 presents the gain values, Table 4 shows the split information values, and Table 5 presents the gain ratios. This process continues until all data are classified perfectly or no attributes remain. The knowledge represented by the decision tree can be extracted and expressed in the form of IF-THEN rules, as in Pseudocode 2.
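The entropy of S can be checked directly from the class counts reported above. The following Python sketch assumes base-2 logarithms; the function name `entropy_from_counts` is illustrative.

```python
import math

# Class counts for the 50 examples, as reported in the text.
class_counts = {"Distinction": 12, "Credit": 15, "Merit": 17, "Fail": 6}


def entropy_from_counts(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


entropy_s = entropy_from_counts(class_counts.values())
```

With these counts the entropy of S comes to about 1.91 bits, slightly below the maximum of 2 bits that four equally likely classes would give.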
One classification rule can be generated for each path from each terminal node to the root node. Pruning was carried out by removing nodes with fewer than the desired number of objects. IF-THEN rules may be easier to understand.
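Rule extraction from a tree can be sketched as below, with one rule per path between the root and a terminal node. The nested-dictionary tree layout and the toy tree are assumptions made for illustration; they are not the tree obtained in this study.

```python
def extract_rules(tree, conditions=()):
    """Return one IF-THEN rule string per root-to-leaf path."""
    if not isinstance(tree, dict):
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {antecedent} THEN class = {tree}"]
    rules = []
    attribute, branches = next(iter(tree.items()))
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules


# Hypothetical toy tree: {attribute: {value: subtree or class label}}.
toy_tree = {"ATT": {"Good": "Pass", "Poor": {"ASS": {"Yes": "Pass", "No": "Fail"}}}}
rules = extract_rules(toy_tree)
```

Each rule conjoins the attribute tests met along one path, so the number of rules equals the number of terminal nodes.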

Fig. 1. The steps of extracting knowledge from data

Numerous heuristic techniques, such as classification, clustering, regression, neural networks, association rules, decision trees, and genetic algorithms, have been successfully used for database knowledge discovery. These techniques and procedures in data mining need to be briefly mentioned to understand better [46][47][48][49].
$$\text{Entropy}(S) = -p_1 \log_2(p_1) - p_2 \log_2(p_2) - p_3 \log_2(p_3) - p_4 \log_2(p_4)$$
where $p_1, \dots, p_4$ are the proportions of the four classes in S.
$$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$

Fig. 2. The steps of extracting knowledge from data

Table 1. Student related variables

The domain values for some of the variables were defined for this investigation as follows:
• CGP: Cumulative grade point obtained at the end-of-semester examinations. CGP is split into four classes: Distinction >4.50, Credit <4.50 & >3.50, Merit <3.50 & >2.50, Fail <2.50.
• TP: Teaching practice performance obtained in the final semester. Teaching practice programs are organized to assess students' performance in teaching as a profession. Teaching practice is graded into four classes: "A" >70, "B" <70 & >60, "C" <60 & >50, "F" <50.
• ASS: Assignment performance. In each semester, two assignments are given to students by each teacher. Assignment performance is split into two classes: Yes (student submitted the assignment), No (student did not submit the assignment).
• GA: General aptitude. Like seminars, general proficiency tests are organized in each semester. The general proficiency test is split into two classes: Yes (student participated in the general proficiency test), No (student did not participate in the general proficiency test).
• ATT: Attendance of the student. A minimum of 70% attendance is compulsory to participate in the end-of-semester examination. However, in exceptional cases, low-attendance students also participate in the end-of-semester examination for genuine reasons. Attendance is split into three classes: Poor <60%, Average >60% & <80%, Good >80%.
• EP: Education project. The education project is split into two classes: Yes (student completed the education project), No (student did not complete the education project). The education project as a course carries a credit load with a grading system of "A" >70, "B" <70 & >60, "C" <60 & >50, and "F" <50.
• CGPA: Cumulative Grade Point Average obtained in the PDE session(s), declared as the response variable. It is split into four class values: Distinction >4.50, Credit <4.50 & >3.50, Merit <3.50 & >2.50, and Fail <2.50.
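The class boundaries listed above can be turned into a small discretization step, sketched here in Python. The text does not say which class a value exactly on a boundary (for example a CGP of 4.50) belongs to; the sketch assumes boundary values fall into the higher class. The function names are illustrative only.

```python
def cgp_class(cgp):
    """Map a cumulative grade point to one of the four classes."""
    # Assumption: boundary values (4.50, 3.50, 2.50) go to the higher class.
    if cgp >= 4.50:
        return "Distinction"
    if cgp >= 3.50:
        return "Credit"
    if cgp >= 2.50:
        return "Merit"
    return "Fail"


def score_grade(score):
    """Map a percentage score (teaching practice or project) to A, B, C, or F."""
    if score >= 70:
        return "A"
    if score >= 60:
        return "B"
    if score >= 50:
        return "C"
    return "F"


def attendance_class(percent):
    """Map an attendance percentage to Poor, Average, or Good."""
    if percent >= 80:
        return "Good"
    if percent >= 60:
        return "Average"
    return "Poor"


# Hypothetical raw values for one student.
example = (cgp_class(3.8), score_grade(65), attendance_class(55))
```

Discretizing the raw marks in this way gives the categorical attribute values that a decision tree learner such as ID3 operates on.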

Table 3. Gain values