en:Explorer - Classification (3.5.6)
Selecting a Classifier
At the top of the Classify section is the Classifier box. It contains a text field showing the name of the currently selected classifier and its options. Left-clicking the text field opens a GenericObjectEditor dialog box, the same as for filters, in which you can configure the options of the current classifier. Right-clicking (or Alt+Shift+left-clicking) lets you copy the setup string to the clipboard or display the properties in a GenericObjectEditor dialog box. The Choose button lets you select one of the classifiers available in WEKA.
Test Options
The result of applying the chosen classifier will be tested according to the options that are set by clicking in the Test options box. There are four test modes:
- Use training set. The classifier is evaluated on how well it predicts the class of the instances it was trained on.
- Supplied test set. The classifier is evaluated on how well it predicts the class of a set of instances loaded from a file. Clicking the Set... button brings up a dialog allowing you to choose the file to test on.
- Cross-validation. The classifier is evaluated by cross-validation, using the number of folds that are entered in the Folds text field.
- Percentage split. The classifier is evaluated on how well it predicts a certain percentage of the data which is held out for testing. The amount of data held out depends on the value entered in the % field.
Note: No matter which evaluation method is used, the model that is output is always the one built from all the training data.
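As a rough illustration of the cross-validation mode, the sketch below divides instance indices into folds so that every instance is tested exactly once. It is plain Python and independent of WEKA; the function name `cross_validation_folds` and its parameters are hypothetical, and it does not attempt to reproduce WEKA's own fold assignment.

```python
import random

def cross_validation_folds(n_instances, n_folds, seed=1):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Hypothetical helper for illustration only; not WEKA's implementation.
    """
    indices = list(range(n_instances))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    for fold in range(n_folds):
        # Every n_folds-th shuffled index goes into this fold's test set.
        test = indices[fold::n_folds]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

folds = list(cross_validation_folds(n_instances=10, n_folds=5))
for train, test in folds:
    print(len(train), "training instances,", len(test), "test instances")
```

Each instance appears in exactly one test set, so the per-fold results can be pooled into a single estimate over the whole data set.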
Further testing options can be set by clicking on the More options... button:
- Output model. The classification model on the full training set is output so that it can be viewed, visualized, etc. This option is selected by default.
- Output per-class stats. The precision/recall and true/false statistics for each class are output. This option is also selected by default.
- Output entropy evaluation measures. Entropy evaluation measures are included in the output. This option is not selected by default.
- Output confusion matrix. The confusion matrix of the classifier's predictions is included in the output. This option is selected by default.
- Store predictions for visualization. The classifier's predictions are remembered so that they can be visualized. This option is selected by default.
- Output predictions. The predictions on the evaluation data are output. Note that with cross-validation, the instance numbers do not correspond to the instances' positions in the data!
- Cost-sensitive evaluation. The errors are evaluated with respect to a cost matrix. The Set... button allows you to specify the cost matrix used.
- Random seed for xval / % Split. This specifies the random seed used when randomizing the data before it is divided up for evaluation purposes.
- Preserve order for % Split. This suppresses the randomization of the data before splitting into train and test set.
- Output source code. If the classifier can output the built model as Java source code, you can specify the class name here. The code will be printed in the Classifier output area.
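The interplay between the random seed and the preserve-order option for a percentage split can be sketched as follows. This is a minimal plain-Python illustration, not WEKA code: the function `percentage_split` is hypothetical, and both the reading of the percentage as the training portion and the rounding rule are assumptions.

```python
import random

def percentage_split(instances, train_percent, seed=1, preserve_order=False):
    """Split instances into (train, test) lists.

    Hypothetical helper; assumes train_percent is the share used for training.
    """
    data = list(instances)
    if not preserve_order:
        # Randomize before splitting; the seed makes the shuffle repeatable.
        random.Random(seed).shuffle(data)
    # Rounding convention is an assumption made for this sketch.
    n_train = round(len(data) * train_percent / 100)
    return data[:n_train], data[n_train:]

# With preserve_order=True the original order is kept and the seed is ignored.
train, test = percentage_split(range(10), 66, preserve_order=True)
print(train, test)
```

Using the same seed twice gives the same split, which is what makes results comparable across runs.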
The Class Attribute
The classifiers in WEKA are designed to be trained to predict a single 'class' attribute, which is the target for prediction. Some classifiers can only learn nominal classes; others can only learn numeric classes (regression problems); still others can learn both.
By default, the class is taken to be the last attribute in the data. If you want to train a classifier to predict a different attribute, click on the box below the Test options box to bring up a drop-down list of attributes to choose from.
Training a Classifier
Once the classifier, test options and class have all been set, the learning process is started by clicking on the Start button. While the classifier is busy being trained, the little bird moves around. You can stop the training process at any time by clicking on the Stop button.
When training is complete, several things happen. The Classifier output area to the right of the display is filled with text describing the results of training and testing. A new entry appears in the Result list box. We look at the result list below; but first we investigate the text that has been output.
The Classifier Output Text
The text in the Classifier output area has scroll bars allowing you to browse the results. Of course, you can also resize the Explorer window to get a larger display area. The output is split into several sections:
- Run information. A list of information giving the learning scheme options, relation name, instances, attributes and test mode that were involved in the process.
- Classifier model (full training set). A textual representation of the classification model that was produced on the full training data.
- The results of the chosen test mode are broken down as follows:
  - Summary. A list of statistics summarizing how accurately the classifier was able to predict the true class of the instances under the chosen test mode.
  - Detailed Accuracy By Class. A more detailed per-class breakdown of the classifier's prediction accuracy.
  - Confusion Matrix. Shows how many instances have been assigned to each class. Elements show the number of test examples whose actual class is the row and whose predicted class is the column.
- Source code (optional). This section lists the Java source code if you selected Output source code in the More options dialog.
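To make the relationship between the confusion matrix and the per-class figures concrete, here is a small plain-Python sketch using the same convention as above (rows are actual classes, columns are predicted classes). The function names and the toy labels are made up for illustration; this is not how WEKA computes its output.

```python
def confusion_matrix(actual, predicted, labels):
    """Count instances by (actual class, predicted class)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1  # row = actual, column = predicted
    return matrix

def per_class_stats(matrix, labels):
    """Derive precision and recall for each class from the matrix."""
    stats = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        actual_total = sum(matrix[i])                    # row sum
        predicted_total = sum(row[i] for row in matrix)  # column sum
        stats[label] = {
            "precision": true_pos / predicted_total if predicted_total else 0.0,
            "recall": true_pos / actual_total if actual_total else 0.0,
        }
    return stats

# Toy data, invented for this example.
labels = ["yes", "no"]
actual = ["yes", "yes", "yes", "no", "no", "no"]
predicted = ["yes", "yes", "no", "no", "no", "yes"]

matrix = confusion_matrix(actual, predicted, labels)
stats = per_class_stats(matrix, labels)
# Overall accuracy is the diagonal sum divided by the number of instances.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
print(matrix, stats, accuracy)
```

The diagonal holds the correctly classified instances; everything off the diagonal is an error of one class being mistaken for another.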
The Result List
After training several classifiers, the result list will contain several entries. Left-clicking the entries flicks back and forth between the various results that have been generated. Right-clicking an entry invokes a menu containing these items:
- View in main window. Shows the output in the main window (just like left-clicking the entry).
- View in separate window. Opens a new independent window for viewing the results.
- Save result buffer. Brings up a dialog allowing you to save a text file containing the textual output.
- Load model. Loads a pre-trained model object from a binary file.
- Save model. Saves a model object to a binary file. Objects are saved in Java 'serialized object' form.
- Re-evaluate model on current test set. Takes the model that has been built and tests its performance on the data set that has been specified with the Set... button under the Supplied test set option.
- Visualize classifier errors. Brings up a visualization window that plots the results of classification. Correctly classified instances are represented by crosses, whereas incorrectly classified ones show up as squares.
- Visualize tree or Visualize graph. Brings up a graphical representation of the structure of the classifier model, if possible (i.e. for decision trees or Bayesian networks). The graph visualization option only appears if a Bayesian network classifier has been built. In the tree visualizer, you can bring up a menu by right-clicking a blank area, pan around by dragging the mouse, and see the training instances at each node by clicking on it. CTRL-clicking zooms the view out, while SHIFT-dragging a box zooms the view in. The graph visualizer should be self-explanatory.
- Visualize margin curve. Generates a plot illustrating the prediction margin. The margin is defined as the difference between the probability predicted for the actual class and the highest probability predicted for the other classes. For example, boosting algorithms may achieve better performance on test data by increasing the margins on the training data.
- Visualize threshold curve. Generates a plot illustrating the tradeoffs in prediction that are obtained by varying the threshold value between classes. For example, with the default threshold value of 0.5, the predicted probability of 'positive' must be greater than 0.5 for the instance to be predicted as 'positive'. The plot can be used to visualize the precision/recall tradeoff, for ROC curve analysis (true positive rate vs false positive rate), and for other types of curves.
- Visualize cost curve. Generates a plot that gives an explicit representation of the expected cost, as described by Drummond and Holte (2000).
- Plugins. This menu item only appears if there are visualization plugins available (by default: none). More about these plugins can be found in the WekaWiki article Explorer visualization plugins.
Options are greyed out if they do not apply to the specific set of results.
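The quantities behind the margin curve and the threshold curve can be sketched in a few lines of plain Python. The code below follows the margin definition given above and computes ROC-style points (true positive rate and false positive rate) for a two-class problem. The function names and the sample numbers are invented for illustration, the use of `>=` at the threshold is a simplifying assumption, and WEKA's own plots may be computed differently.

```python
def margin(probabilities, actual_class):
    """Probability of the actual class minus the highest probability
    predicted for any other class (range -1 to 1)."""
    others = [p for c, p in probabilities.items() if c != actual_class]
    return probabilities[actual_class] - max(others)

def threshold_curve(scores, is_positive):
    """Return (threshold, true positive rate, false positive rate) tuples.

    scores[i] is the predicted probability of 'positive' for instance i.
    Assumes at least one positive and one negative instance.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    points = []
    # Use every distinct score as a candidate threshold, highest first.
    for t in sorted(set(scores), reverse=True):
        # Assumption for this sketch: score >= t counts as 'positive'.
        tp = sum(1 for s, pos in zip(scores, is_positive) if s >= t and pos)
        fp = sum(1 for s, pos in zip(scores, is_positive) if s >= t and not pos)
        points.append((t, tp / n_pos, fp / n_neg))
    return points

# Invented sample data.
print(margin({"a": 0.7, "b": 0.2, "c": 0.1}, "a"))
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
is_positive = [True, True, False, True, False]
for point in threshold_curve(scores, is_positive):
    print(point)
```

A positive margin means the instance was classified correctly with some room to spare; a negative margin means another class received a higher probability. Lowering the threshold moves along the curve toward a higher true positive rate at the price of more false positives.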
