

Download the credit_Dataset.arff dataset and load it into Weka.

(5 Points) When presented with a dataset, it is usually a good idea to visualise it first. Click on any of the scatter plots to open a new window showing the scatter plot for the two selected attributes. Try visualising a scatter plot of age and duration. You can click on any data point to display all its values. Do you notice anything unusual?

(5 Points) In the previous step you should have found a data point that seems to be corrupted, as some of its values are nonsensical. Even a single point like this can significantly affect the performance of a classifier. A good way to check this is to test the performance of each classifier before and after removing this data point. How do you think it would affect decision trees?

(10 Points) To remove this instance from the dataset we will use a filter. We want to remove all instances where the age of an applicant is lower than 0 years, as this suggests that the instance is corrupted. In the Preprocess tab, click Choose in the Filter pane and select filters > unsupervised > instance > RemoveWithValues. Click on the text of this filter to change its parameters. Set the attribute index to 13 (Age) and the split point to 0. Click OK to set the parameters and Apply to apply the filter to the data. Visualise the data again to verify that the invalid data point was removed.

(20 Points) On the Classify tab, select the Percentage split test option and change its value to 90%. This way, we will train the classifiers using 90% of the training data and evaluate their performance on the remaining 10%. First, train a decision tree classifier with default options: select classifiers > trees > J48 and click Start. J48 is the Weka implementation of the C4.5 algorithm, which uses the normalized information gain criterion to build a decision tree for classification.

(20 Points) After training the classifier, the full decision tree is output for your perusal; you may need to scroll up to see it.
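
To make the "normalized information gain" criterion mentioned for J48 more concrete, the following is a minimal Python sketch of the gain ratio for a single nominal attribute. It is not Weka's code: the function names and the toy attribute values are hypothetical, and the full C4.5 algorithm includes further steps (for example, threshold selection for numeric attributes and pruning) that are not shown here.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of a nominal attribute divided by its split information."""
    total = len(labels)

    # Group the class labels by the attribute value of each instance.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)

    # Expected entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder

    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: an attribute that separates the classes perfectly,
# and one that carries no information about the class.
labels = ["good", "good", "bad", "bad"]
informative = ["own", "own", "rent", "rent"]
uninformative = ["a", "b", "a", "b"]

print(gain_ratio(informative, labels))
print(gain_ratio(uninformative, labels))
```

Dividing by the split information penalises attributes with many distinct values, which would otherwise tend to look attractive under plain information gain.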

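
The filtering and percentage-split steps described in the tasks above can also be pictured outside the Weka GUI. The sketch below is plain Python, not the Weka API: the helper names, the dictionary-based rows and the toy values are all hypothetical. It assumes that, for a numeric attribute, the filter removes instances whose value is below the split point, and it uses a simple shuffle-and-cut split that will not necessarily reproduce Weka's exact partition.

```python
import random


def remove_with_values(rows, attribute, split_point):
    """Drop rows whose numeric attribute is below split_point (assumed filter behaviour)."""
    return [row for row in rows if row[attribute] >= split_point]


def percentage_split(rows, train_percent, seed=1):
    """Shuffle the rows and cut them into a training part and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data: ten plausible applicants plus one corrupted record.
rows = [{"age": 20 + i, "duration": 12} for i in range(10)]
rows.append({"age": -5, "duration": 24})

clean = remove_with_values(rows, "age", 0)
train, test = percentage_split(clean, 90)

print(len(rows), len(clean), len(train), len(test))
```

Running the classifier once on `rows` and once on `clean` mirrors the before-and-after comparison suggested in the task on the corrupted data point.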