en:Experimenter - Standard Experiments - Advanced (3.5.4)

Defining an Experiment

When the Experimenter is started in Advanced mode, the Setup tab is displayed. Click New to initialize an experiment with default parameters.

To define the dataset to be processed by a scheme, first select Use relative paths in the Datasets panel of the Setup tab and then click on Add new... to open a dialog window.

Double-click the data folder to view the available datasets, or navigate to another location. Select iris.arff and click Open to add the Iris dataset.

The dataset name is now displayed in the Datasets panel of the Setup tab.

Saving the Results of the Experiment

To specify where the results are to be sent, click on the InstancesResultListener entry in the Destination panel. The output file parameter is near the bottom of the window, beside the text outputFile. Click on this parameter to display a file selection window.

Type the name of the output file, click Select, and then click close (x). The file name is displayed in the outputFile panel. Click on OK to close the window.

The name of the results dataset is displayed in the Destination panel of the Setup tab.

Saving the Experiment Definition

The experiment definition can be saved at any time. Select Save... at the top of the Setup tab. To save it in binary format, type a file name with the extension .exp (or select an existing experiment definition file). Alternatively, choose Experiment configuration files (*.xml) from the file-type combo box to save it as XML. XML files are robust with respect to version changes.

The experiment can be restored by selecting Open in the Setup tab and then selecting Experiment1.exp in the dialog window.

Running an Experiment

To run the current experiment, click the Run tab at the top of the Experiment Environment window. The current experiment performs 10 randomized train and test runs on the Iris dataset, using 66% of the patterns for training and 34% for testing, and using the ZeroR scheme.
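The following Java sketch shows what one randomized train/test split of this kind amounts to. It is a simplified illustration, not Weka's RandomSplitResultProducer source. The class and method names are hypothetical. Two details are also assumptions: the training-set size is taken as trainPercent times the number of instances, rounded, and the data are shuffled with a per-run seed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of one randomized train/test split; not Weka's source code.
class RandomSplitSketch {

    // Training-set size for a given train percentage (rounding is an assumption).
    static int trainSize(int numInstances, double trainPercent) {
        return (int) Math.round(numInstances * trainPercent / 100.0);
    }

    // Shuffles the instance indices with a per-run seed, then cuts them into
    // a training part (first list) and a testing part (second list).
    static List<List<Integer>> split(int numInstances, double trainPercent, long runSeed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        // A different seed per run gives a different, but reproducible, split.
        Collections.shuffle(indices, new Random(runSeed));
        int cut = trainSize(numInstances, trainPercent);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, cut)));
        parts.add(new ArrayList<>(indices.subList(cut, numInstances)));
        return parts;
    }

    public static void main(String[] args) {
        // Iris has 150 instances; 10 runs with 66% of the instances used for training.
        for (int run = 1; run <= 10; run++) {
            List<List<Integer>> parts = split(150, 66.0, run);
            System.out.println("run " + run + ": train=" + parts.get(0).size()
                    + ", test=" + parts.get(1).size());
        }
    }
}
```

For the 150-instance Iris dataset, 66% gives 99 training and 51 test instances in every run. These match the Number_of_training_instances and Number_of_testing_instances values in the result file below.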

Click Start to run the experiment.

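If the experiment was defined correctly, three messages are displayed in the Log panel. The results of the experiment are saved to the dataset Experiment1.arff. The first few lines of this dataset are shown below. The header also lists weka.classifiers.trees.J48, so this particular file came from an experiment that included J48 in addition to ZeroR (see Adding Additional Schemes).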

 @relation InstanceResultListener

 @attribute Key_Dataset {iris}
 @attribute Key_Run {1,2,3,4,5,6,7,8,9,10}
 @attribute Key_Scheme {weka.classifiers.rules.ZeroR,weka.classifiers.trees.J48}
 @attribute Key_Scheme_options {,'-C 0.25 -M 2'}
 @attribute Key_Scheme_version_ID {48055541465867954,-217733168393644444}
 @attribute Date_time numeric
 @attribute Number_of_training_instances numeric
 @attribute Number_of_testing_instances numeric
 @attribute Number_correct numeric
 @attribute Number_incorrect numeric
 @attribute Number_unclassified numeric
 @attribute Percent_correct numeric
 @attribute Percent_incorrect numeric
 @attribute Percent_unclassified numeric
 @attribute Kappa_statistic numeric
 @attribute Mean_absolute_error numeric
 @attribute Root_mean_squared_error numeric
 @attribute Relative_absolute_error numeric
 @attribute Root_relative_squared_error numeric
 @attribute SF_prior_entropy numeric
 @attribute SF_scheme_entropy numeric
 @attribute SF_entropy_gain numeric
 @attribute SF_mean_prior_entropy numeric
 @attribute SF_mean_scheme_entropy numeric
 @attribute SF_mean_entropy_gain numeric
 @attribute KB_information numeric
 @attribute KB_mean_information numeric
 @attribute KB_relative_information numeric
 @attribute True_positive_rate numeric
 @attribute Num_true_positives numeric
 @attribute False_positive_rate numeric
 @attribute Num_false_positives numeric
 @attribute True_negative_rate numeric
 @attribute Num_true_negatives numeric
 @attribute False_negative_rate numeric
 @attribute Num_false_negatives numeric
 @attribute IR_precision numeric
 @attribute IR_recall numeric
 @attribute F_measure numeric
 @attribute Area_under_ROC numeric
 @attribute Time_training numeric
 @attribute Time_testing numeric
 @attribute Summary {'Number of leaves: 3\nSize of the tree: 5\n','Number of leaves: 5\nSize of the tree: 9\n','Number of leaves: 4\nSize of the tree: 7\n'}
 @attribute measureTreeSize numeric
 @attribute measureNumLeaves numeric
 @attribute measureNumRules numeric

 @data
 
 iris,1,weka.classifiers.rules.ZeroR,,48055541465867954,20051221.033,99,51,17,34,0,33.333333,66.666667,0,0,0.444444,0.471405,100,100,80.833088,80.833088,0,1.584963,1.584963,0,0,0,0,1,17,1,34,0,0,0,0,0.333333,1,0.5,0.5,0,0,?,?,?,?
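A results file like this can also be inspected outside the Experimenter. The Java sketch below pairs the @attribute names with the values of one @data row; ResultRowSketch is a hypothetical class, not part of Weka. It splits rows on plain commas, which is a simplifying assumption. That only works when no value is a quoted string containing a comma, as in the ZeroR row above. For real files, a proper ARFF parser such as Weka's own is the safer choice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps attribute names to the values of one @data row.
// Assumes no value contains a quoted comma (naive comma splitting).
class ResultRowSketch {

    static Map<String, String> parse(String arff, int rowIndex) {
        List<String> names = new ArrayList<>();
        List<String> rows = new ArrayList<>();
        boolean inData = false;
        for (String raw : arff.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            if (!inData && line.toLowerCase().startsWith("@attribute")) {
                // The attribute name is the second whitespace-separated token.
                names.add(line.split("\\s+")[1]);
            } else if (line.equalsIgnoreCase("@data")) {
                inData = true;
            } else if (inData) {
                rows.add(line);
            }
        }
        // Limit -1 keeps empty values, e.g. the empty Key_Scheme_options of ZeroR.
        String[] values = rows.get(rowIndex).split(",", -1);
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < names.size() && i < values.length; i++) {
            result.put(names.get(i), values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Reduced excerpt of the file above: only four attributes are kept.
        String arff = String.join("\n",
                "@relation InstanceResultListener",
                "@attribute Key_Dataset {iris}",
                "@attribute Key_Run {1,2,3,4,5,6,7,8,9,10}",
                "@attribute Number_correct numeric",
                "@attribute Percent_correct numeric",
                "@data",
                "iris,1,17,33.333333");
        System.out.println(parse(arff, 0).get("Percent_correct"));
    }
}
```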

Changing the Experiment Parameters

Changing the Classifier

The parameters of an experiment can be changed by clicking on the Result generator panel.

The RandomSplitResultProducer performs repeated train/test runs. The number of instances (expressed as a percentage) used for training is given in the trainPercent box. (The number of runs is specified in the Runs panel in the Setup tab.)

A small help file can be displayed by clicking More in the About panel.

Click on the splitEvaluator entry to display the SplitEvaluator properties.

Click on the classifier entry (ZeroR) to display the scheme properties.

This scheme has no modifiable properties (besides debug mode on/off), but most other schemes have properties that the user can modify. The Capabilities button opens a small dialog listing all the attribute and class types this classifier can handle. Click on the Choose button to select a different scheme; selecting the J48 decision-tree scheme, for example, displays the parameters available for J48. If desired, modify the parameters and then click OK to close the window.

The name of the new scheme is displayed in the Result generator panel.

Adding Additional Schemes

Additional schemes can be added in the Generator properties panel. To begin, change the drop-down list entry in the Generator properties panel from Disabled to Enabled.

Click Select property and expand splitEvaluator so that the classifier entry is visible in the property list; click Select.

The scheme name is displayed in the Generator properties panel.

To add another scheme, click on the Choose button to display the GenericObjectEditor window.

The Filter... button highlights the classifiers that can handle certain attribute and class types. The Remove filter button clears all selected capabilities and removes the highlighting again.

To change to a decision-tree scheme, select J48 (in subgroup trees).

The new scheme is displayed in the Generator properties panel. Click Add to add it to the list of schemes.

Now when the experiment is run, results are generated for both schemes.

To add additional schemes, repeat this process. To remove a scheme, select the scheme by clicking on it and then click Delete.
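Conceptually, a list of classifiers in the Generator properties panel makes the experiment evaluate every listed scheme on every dataset in every run. Each evaluation yields one result row, identified by its Key_Dataset, Key_Run, Key_Scheme and Key_Scheme_options values. The Java sketch below enumerates those keys. ResultKeySketch and ResultKey are hypothetical names, and the loop order is illustrative rather than Weka's actual iteration order. The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: expands datasets x runs x schemes into result-row keys.
// The loop order is illustrative only, not necessarily the order Weka uses.
class ResultKeySketch {

    record ResultKey(String dataset, int run, String scheme, String options) {}

    // Each scheme is given as {class name, option string}.
    static List<ResultKey> expand(List<String> datasets, int runs, List<String[]> schemes) {
        List<ResultKey> keys = new ArrayList<>();
        for (int run = 1; run <= runs; run++) {
            for (String dataset : datasets) {
                for (String[] scheme : schemes) {
                    keys.add(new ResultKey(dataset, run, scheme[0], scheme[1]));
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String[]> schemes = List.of(
                new String[] {"weka.classifiers.rules.ZeroR", ""},
                new String[] {"weka.classifiers.trees.J48", "-C 0.25 -M 2"});
        List<ResultKey> keys = expand(List.of("iris"), 10, schemes);
        // 1 dataset x 10 runs x 2 schemes = 20 result rows
        System.out.println(keys.size() + " result rows");
        System.out.println(keys.get(0));
    }
}
```

The number of rows grows linearly with each factor. Adding a second dataset (see Adding Additional Datasets) therefore doubles the number of result rows.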

Adding Additional Datasets

The scheme(s) may be run on any number of datasets at a time. Additional datasets are added by clicking Add new... in the Datasets panel. Datasets are deleted from the experiment by selecting the dataset and then clicking Delete Selected.

Raw Output

The raw output generated by a scheme during an experiment can be saved to a file and examined later. Open the ResultProducer window by clicking on the Result generator panel in the Setup tab.

Click on rawOutput and select True from the drop-down list. By default, the output is sent to the zip file splitEvaluatorOut.zip; the output file can be changed by clicking on the outputFile panel in the window. When the experiment is now run, the result of each processing run is archived in this file.
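Because the archive is an ordinary zip file, its contents can be listed and read with the standard java.util.zip API. This sketch (hypothetical class RawOutputListing) assumes the default file name splitEvaluatorOut.zip in the working directory and UTF-8 text. It deliberately assumes nothing about how the entries inside the archive are named, so list the names first and then read the entry of interest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for browsing the raw-output archive.
class RawOutputListing {

    // Returns the names of all entries in the zip file.
    static List<String> entryNames(String zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Reads one entry as text (UTF-8 is an assumption about the archived output).
    static String readEntry(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "splitEvaluatorOut.zip";
        for (String name : entryNames(path)) {
            System.out.println(name);
        }
    }
}
```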

The contents of the first run are:

ClassifierSplitEvaluator: weka.classifiers.trees.J48 -C 0.25 -M 2(version -217733168393644444)Classifier model: 
J48 pruned tree
------------------

petalwidth <= 0.6: Iris-setosa (33.0)
petalwidth > 0.6
|   petalwidth <= 1.5: Iris-versicolor (31.0/1.0)
|   petalwidth > 1.5: Iris-virginica (35.0/3.0) 

Number of Leaves  : 	3

Size of the tree : 	5 


Correctly Classified Instances          47               92.1569 %
Incorrectly Classified Instances         4                7.8431 %
Kappa statistic                          0.8824
Mean absolute error                      0.0723
Root mean squared error                  0.2191
Relative absolute error                 16.2754 %
Root relative squared error             46.4676 %
Total Number of Instances               51     
measureTreeSize : 5.0
measureNumLeaves : 3.0
measureNumRules : 3.0
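The measure... lines at the end of this output correspond to the measureTreeSize, measureNumLeaves and measureNumRules attributes of the results file. The percentage of correctly classified instances is the number correct divided by the number of test instances (47 of 51 here). The Java sketch below uses a hypothetical class, RawOutputMeasures, to extract the measure lines and recompute that percentage. It assumes each measure appears on its own line in the form name : value, as in the output above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for post-processing one archived raw-output text.
class RawOutputMeasures {

    // Collects lines such as "measureTreeSize : 5.0" into a name -> value map.
    static Map<String, Double> measures(String rawOutput) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (String line : rawOutput.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("measure")) {
                continue;
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String name = trimmed.substring(0, colon).trim();
            result.put(name, Double.parseDouble(trimmed.substring(colon + 1).trim()));
        }
        return result;
    }

    // Percentage of correctly classified test instances.
    static double percentCorrect(int correct, int total) {
        return 100.0 * correct / total;
    }

    public static void main(String[] args) {
        String excerpt = String.join("\n",
                "Total Number of Instances               51",
                "measureTreeSize : 5.0",
                "measureNumLeaves : 3.0",
                "measureNumRules : 3.0");
        System.out.println(measures(excerpt));
        // Locale.ROOT keeps the decimal point independent of the system locale.
        System.out.println(String.format(Locale.ROOT, "%.4f", percentCorrect(47, 51))); // 92.1569
    }
}
```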

Other Result Producers

Cross-Validation Result Producer

Note: a bug in ExperimenterDefaults is described on the Weka mailing list (https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/2005-October/005146.html).

To switch from random train/test experiments to cross-validation experiments, click on the Result generator entry. At the top of the window, click on the drop-down list and select CrossValidationResultProducer. The window now shows parameters specific to cross-validation, such as the number of partitions (folds). In this example, the experiment performs 10-fold cross-validation instead of random train/test splits.
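In k-fold cross-validation, the data are divided into k partitions. Each partition serves once as the test set, while the remaining k - 1 partitions form the training set. The Java sketch below (hypothetical class KFoldSketch) shows only this partitioning. It leaves out the shuffling and, for nominal classes, the stratification that a real implementation would normally apply, so it should not be read as Weka's exact fold assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of unstratified, unshuffled k-fold partitioning.
class KFoldSketch {

    // Returns the test-set indices of each fold; the training set of a fold
    // is every index that is not in that fold's test set.
    static List<List<Integer>> folds(int numInstances, int k) {
        List<List<Integer>> folds = new ArrayList<>();
        int base = numInstances / k;
        int remainder = numInstances % k;
        int start = 0;
        for (int fold = 0; fold < k; fold++) {
            // The first 'remainder' folds receive one extra instance.
            int size = base + (fold < remainder ? 1 : 0);
            List<Integer> test = new ArrayList<>();
            for (int i = start; i < start + size; i++) {
                test.add(i);
            }
            folds.add(test);
            start += size;
        }
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(150, 10);
        for (int f = 0; f < folds.size(); f++) {
            int testSize = folds.get(f).size();
            System.out.println("fold " + (f + 1) + ": test=" + testSize
                    + ", train=" + (150 - testSize));
        }
    }
}
```

For the 150-instance Iris dataset and 10 folds, each fold tests on 15 instances and trains on the other 135.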

The Result generator panel now indicates that cross-validation will be performed. Click on More to generate a brief description of the CrossValidationResultProducer.

As with the RandomSplitResultProducer, multiple schemes can be run during cross-validation by adding them to the Generator properties panel.

The number of runs is set to 1 in the Setup tab, so that only one run of cross-validation for each scheme and dataset is executed.

When this experiment is analysed, 30 result lines are processed (1 run times 10 folds times 3 schemes).

Averaging Result Producer

An alternative to the CrossValidationResultProducer is the AveragingResultProducer. This result producer averages a set of runs, which are typically cross-validation runs. To select it, click the Result generator panel and then choose AveragingResultProducer in the GenericObjectEditor.

As with the other result producers, a brief description can be displayed by clicking More.

Clicking the resultProducer panel opens the properties of the underlying result producer whose runs are averaged.

As with the other ResultProducers, additional schemes can be defined. When the AveragingResultProducer is used, the classifier property is located deeper in the Generator properties hierarchy.

In this experiment, the ZeroR, OneR, and J48 schemes are run 10 times with 10-fold cross-validation. Each run of 10 cross-validation folds is then averaged, producing one result line for each run (instead of one result line for each fold as in the previous example using the CrossValidationResultProducer) for a total of 30 result lines. If the raw output is saved, all 300 results are sent to the archive.
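The bookkeeping in this example can be checked with a little arithmetic: averaging collapses the 10 fold results of each run into one line. The Java sketch below uses a hypothetical class, AveragingSketch, to average the per-fold values of a single statistic and compute the line counts. It illustrates the idea only and does not reproduce how AveragingResultProducer groups fold results internally.

```java
import java.util.List;

// Hypothetical sketch of the averaging idea and the resulting line counts.
class AveragingSketch {

    // Mean of the per-fold values of one statistic (e.g. Percent_correct).
    static double average(List<Double> perFold) {
        double sum = 0.0;
        for (double v : perFold) {
            sum += v;
        }
        return sum / perFold.size();
    }

    // Result lines when the folds of each run are averaged into one line.
    static int averagedLines(int schemes, int datasets, int runs) {
        return schemes * datasets * runs;
    }

    // Individual fold results, one per scheme, dataset, run and fold.
    static int foldResults(int schemes, int datasets, int runs, int folds) {
        return schemes * datasets * runs * folds;
    }

    public static void main(String[] args) {
        // ZeroR, OneR and J48 on one dataset, 10 runs of 10-fold cross-validation.
        System.out.println(averagedLines(3, 1, 10));    // 30
        System.out.println(foldResults(3, 1, 10, 10));  // 300
        // Hypothetical per-fold values, for illustration only.
        System.out.println(average(List.of(80.0, 90.0, 100.0)));
    }
}
```

With runs set to 1 and no averaging, foldResults(3, 1, 1, 10) also gives the 30 lines of the earlier CrossValidationResultProducer example.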
