Experimenter - Standard Experiments - Advanced (3.5.4)
From WekaDoc
Defining an Experiment
When the Experimenter is started in Advanced mode, the Setup tab is displayed. Click New to initialize an experiment; this defines default parameters for it.
To define the dataset to be processed by a scheme, first select Use relative paths in the Datasets panel of the Setup tab and then click on Add new... to open a dialog window.
Double-click the data folder to view the available datasets, or navigate to another location. Select iris.arff and click Open to select the Iris dataset.
The dataset name is now displayed in the Datasets panel of the Setup tab.
Saving the Results of the Experiment
To specify the file to which the results are sent, click the InstancesResultListener entry in the Destination panel. The output file parameter is near the bottom of the window, beside the text outputFile. Click this parameter to display a file selection window.
Type the name of the output file, click Select, and then click close (x). The file name is displayed in the outputFile panel. Click on OK to close the window.
The output file name is displayed in the Destination panel of the Setup tab.
Saving the Experiment Definition
The experiment definition can be saved at any time. Select Save... at the top of the Setup tab. To save in binary format, type a file name with the extension .exp (or select the existing file if the experiment definition has been saved before). Alternatively, choose Experiment configuration files (*.xml) from the file-type combo box to save as XML; XML files are robust with respect to version changes.
The experiment can be restored by selecting Open in the Setup tab and then selecting Experiment1.exp in the dialog window.
Running an Experiment
To run the current experiment, click the Run tab at the top of the Experiment Environment window. The current experiment performs 10 randomized train and test runs on the Iris dataset, using 66% of the patterns for training and 34% for testing, and using the ZeroR scheme.
Click Start to run the experiment.
If the experiment was defined correctly, the three messages shown above will be displayed in the Log panel. The results of the experiment are saved to the file Experiment1.arff. The first few lines of this file are shown below.
@relation InstanceResultListener
@attribute Key_Dataset {iris}
@attribute Key_Run {1,2,3,4,5,6,7,8,9,10}
@attribute Key_Scheme {weka.classifiers.rules.ZeroR,weka.classifiers.trees.J48}
@attribute Key_Scheme_options {,'-C 0.25 -M 2'}
@attribute Key_Scheme_version_ID {48055541465867954,-217733168393644444}
@attribute Date_time numeric
@attribute Number_of_training_instances numeric
@attribute Number_of_testing_instances numeric
@attribute Number_correct numeric
@attribute Number_incorrect numeric
@attribute Number_unclassified numeric
@attribute Percent_correct numeric
@attribute Percent_incorrect numeric
@attribute Percent_unclassified numeric
@attribute Kappa_statistic numeric
@attribute Mean_absolute_error numeric
@attribute Root_mean_squared_error numeric
@attribute Relative_absolute_error numeric
@attribute Root_relative_squared_error numeric
@attribute SF_prior_entropy numeric
@attribute SF_scheme_entropy numeric
@attribute SF_entropy_gain numeric
@attribute SF_mean_prior_entropy numeric
@attribute SF_mean_scheme_entropy numeric
@attribute SF_mean_entropy_gain numeric
@attribute KB_information numeric
@attribute KB_mean_information numeric
@attribute KB_relative_information numeric
@attribute True_positive_rate numeric
@attribute Num_true_positives numeric
@attribute False_positive_rate numeric
@attribute Num_false_positives numeric
@attribute True_negative_rate numeric
@attribute Num_true_negatives numeric
@attribute False_negative_rate numeric
@attribute Num_false_negatives numeric
@attribute IR_precision numeric
@attribute IR_recall numeric
@attribute F_measure numeric
@attribute Area_under_ROC numeric
@attribute Time_training numeric
@attribute Time_testing numeric
@attribute Summary {'Number of leaves: 3\nSize of the tree: 5\n','Number of leaves: 5\nSize of the tree: 9\n','Number of leaves: 4\nSize of the tree: 7\n'}
@attribute measureTreeSize numeric
@attribute measureNumLeaves numeric
@attribute measureNumRules numeric
@data
iris,1,weka.classifiers.rules.ZeroR,,48055541465867954,20051221.033,99,51,17,34,0,33.333333,66.666667,0,0,0.444444,0.471405,100,100,80.833088,80.833088,0,1.584963,1.584963,0,0,0,0,1,17,1,34,0,0,0,0,0.333333,1,0.5,0.5,0,0,?,?,?,?
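As a worked check on the results file, the short Java program below recomputes five values of the ZeroR row from first principles. It is a sketch, not Weka's evaluation code: it assumes that ZeroR predicted the uniform distribution (1/3 per class) for every test instance and that the error measures are averaged over the three classes. Both assumptions are consistent with the numbers in the row.

```java
import java.util.Locale;

/**
 * Worked check of a few values in the ZeroR result row. Assumes (for
 * illustration) a uniform prediction of 1/3 per class for every test
 * instance and errors averaged over the classes.
 */
class ZeroRRowCheck {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Percent_correct = 100 * Number_correct / Number_of_testing_instances. */
    static double percentCorrect(int correct, int testing) {
        return 100.0 * correct / testing;
    }

    /** Absolute error of one predicted distribution, averaged over the classes. */
    static double meanAbsoluteError(double[] predicted, int actualClass) {
        double sum = 0.0;
        for (int c = 0; c < predicted.length; c++) {
            double target = (c == actualClass) ? 1.0 : 0.0;
            sum += Math.abs(target - predicted[c]);
        }
        return sum / predicted.length;
    }

    /** Square root of the squared error of one prediction, averaged over the classes. */
    static double rootMeanSquaredError(double[] predicted, int actualClass) {
        double sum = 0.0;
        for (int c = 0; c < predicted.length; c++) {
            double diff = ((c == actualClass) ? 1.0 : 0.0) - predicted[c];
            sum += diff * diff;
        }
        return Math.sqrt(sum / predicted.length);
    }

    public static void main(String[] args) {
        int correct = 17;   // Number_correct
        int testing = 51;   // Number_of_testing_instances
        double[] uniform = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        // Every test instance gets the same prediction, so by symmetry each has
        // the same error and the per-instance value equals the overall average.
        System.out.printf(Locale.ROOT, "Percent_correct         %.6f%n", percentCorrect(correct, testing));
        System.out.printf(Locale.ROOT, "Mean_absolute_error     %.6f%n", meanAbsoluteError(uniform, 0));
        System.out.printf(Locale.ROOT, "Root_mean_squared_error %.6f%n", rootMeanSquaredError(uniform, 0));
        System.out.printf(Locale.ROOT, "SF_mean_prior_entropy   %.6f%n", log2(3));
        System.out.printf(Locale.ROOT, "SF_prior_entropy        %.6f%n", testing * log2(3));
    }
}
```

Run as a single-file program (for example `java ZeroRRowCheck.java` on Java 11 or later), it prints 33.333333, 0.444444, 0.471405, 1.584963 and 80.833088, the same values that appear in the row for Percent_correct, Mean_absolute_error, Root_mean_squared_error, SF_mean_prior_entropy and SF_prior_entropy.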
Changing the Experiment Parameters
Changing the Classifier
The parameters of an experiment can be changed by clicking on the Result generator panel.
The RandomSplitResultProducer performs repeated train/test runs. The number of instances (expressed as a percentage) used for training is given in the trainPercent box. (The number of runs is specified in the Runs panel in the Setup tab.)
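The idea of repeated random train/test runs can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the RandomSplitResultProducer source: seeding the shuffle with the run number (so each run is reproducible) and rounding the training size with `Math.round` are choices made for the sketch. With the 150 Iris instances and trainPercent set to 66, it yields 99 training and 51 test instances, which matches the Number_of_training_instances and Number_of_testing_instances values in the results file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Sketch of repeated random train/test splitting. The per-run seed and the
 * rounding rule are assumptions made for illustration.
 */
class RepeatedRandomSplit {

    /** Returns a list holding the training indices and the test indices for one run. */
    static List<List<Integer>> split(int numInstances, double trainPercent, int run) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        // Same run number -> same shuffle, so a run can be repeated exactly.
        Collections.shuffle(indices, new Random(run));
        int trainSize = (int) Math.round(numInstances * trainPercent / 100.0);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(indices.subList(0, trainSize)));            // training part
        parts.add(new ArrayList<>(indices.subList(trainSize, numInstances))); // test part
        return parts;
    }

    public static void main(String[] args) {
        int runs = 10;             // as set in the Runs panel of the Setup tab
        double trainPercent = 66;  // as set in the trainPercent box
        for (int run = 1; run <= runs; run++) {
            List<List<Integer>> parts = split(150, trainPercent, run);
            System.out.println("run " + run + ": train=" + parts.get(0).size()
                    + ", test=" + parts.get(1).size());
        }
    }
}
```

Each run draws a different partition of the same dataset, which is what makes the spread of results across the 10 runs meaningful.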
A small help file can be displayed by clicking More in the About panel.
Click on the splitEvaluator entry to display the SplitEvaluator properties.
Click on the classifier entry (ZeroR) to display the scheme properties.
This scheme has no modifiable properties (besides debug mode on/off), but most other schemes have properties that can be modified by the user. The Capabilities button opens a small dialog listing all the attribute and class types this classifier can handle. Click the Choose button to select a different scheme. The window below shows the parameters available for the J48 decision-tree scheme. If desired, modify the parameters and then click OK to close the window.
The name of the new scheme is displayed in the Result generator panel.
Adding Additional Schemes
Additional schemes can be added in the Generator properties panel. To begin, change the drop-down list entry from Disabled to Enabled in the Generator properties panel.
Click Select property and expand splitEvaluator so that the classifier entry is visible in the property list; click Select.
The scheme name is displayed in the Generator properties panel.
To add another scheme, click on the Choose button to display the GenericObjectEditor window.
The Filter... button highlights the classifiers that can handle certain attribute and class types. The Remove filter button clears all selected capabilities and removes the highlighting.
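Conceptually, capability filtering is a subset test: a classifier is highlighted when the set of capabilities it declares contains every capability selected in the filter. The Java sketch below shows that logic only. The `Capability` enum, the scheme names `SchemeA` to `SchemeC` and their capability sets are hypothetical simplifications, not Weka's actual classes or data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of capability-based highlighting. */
class CapabilityFilter {

    /** Simplified, hypothetical capability flags. */
    enum Capability { NOMINAL_ATTRIBUTES, NUMERIC_ATTRIBUTES, NOMINAL_CLASS, NUMERIC_CLASS }

    /** Hypothetical schemes and the capabilities they declare. */
    static Map<String, Set<Capability>> demoSchemes() {
        Map<String, Set<Capability>> schemes = new LinkedHashMap<>();
        schemes.put("SchemeA", EnumSet.of(Capability.NOMINAL_ATTRIBUTES,
                Capability.NUMERIC_ATTRIBUTES, Capability.NOMINAL_CLASS));
        schemes.put("SchemeB", EnumSet.of(Capability.NUMERIC_ATTRIBUTES,
                Capability.NUMERIC_CLASS));
        schemes.put("SchemeC", EnumSet.allOf(Capability.class));
        return schemes;
    }

    /**
     * Names of the schemes that declare every required capability.
     * An empty requirement set (no filter active) highlights nothing.
     */
    static List<String> highlighted(Map<String, Set<Capability>> schemes,
                                    Set<Capability> required) {
        if (required.isEmpty()) {
            return Collections.emptyList();
        }
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Set<Capability>> entry : schemes.entrySet()) {
            if (entry.getValue().containsAll(required)) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Set<Capability> filter = EnumSet.of(Capability.NUMERIC_ATTRIBUTES, Capability.NOMINAL_CLASS);
        System.out.println(highlighted(demoSchemes(), filter));                         // [SchemeA, SchemeC]
        // Clearing the selected capabilities removes the highlighting again.
        System.out.println(highlighted(demoSchemes(), EnumSet.noneOf(Capability.class))); // []
    }
}
```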
To change to a decision-tree scheme, select J48 (in the trees subgroup).
The new scheme is now shown in the Generator properties panel. Click Add to add it to the list of schemes.
Now when the experiment is run, results are generated for both schemes.
To add further schemes, repeat this process. To remove a scheme, select it by clicking on it and then click Delete.
Adding Additional Datasets
The scheme(s) may be run on any number of datasets at a time. Additional datasets are added by clicking Add new... in the Datasets panel. Datasets are deleted from the experiment by selecting the dataset and then clicking Delete Selected.
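With several schemes and datasets, every combination of run, dataset and scheme produces its own result line, keyed by the Key_Run, Key_Dataset and Key_Scheme columns seen in the results file. The Java sketch below enumerates that grid to show how the number of result lines grows. The loop order (run, then dataset, then scheme) is an assumption for illustration; only the total count matters here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of the work an experiment enumerates: one evaluation, and one
 * result line, per (dataset, run, scheme). The loop order is assumed.
 */
class ExperimentGrid {

    static List<String> resultKeys(int runs, List<String> datasets, List<String> schemes) {
        List<String> keys = new ArrayList<>();
        for (int run = 1; run <= runs; run++) {
            for (String dataset : datasets) {
                for (String scheme : schemes) {
                    // Mirrors the Key_Dataset, Key_Run and Key_Scheme columns.
                    keys.add(dataset + "," + run + "," + scheme);
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = resultKeys(10,
                Arrays.asList("iris"),
                Arrays.asList("weka.classifiers.rules.ZeroR", "weka.classifiers.trees.J48"));
        // 10 runs x 1 dataset x 2 schemes
        System.out.println(keys.size() + " result lines");
        System.out.println(keys.get(0));
    }
}
```

Adding a second dataset to this example doubles the count to 40 result lines.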
Raw Output
The raw output generated by a scheme during an experiment can be saved to a file and examined later. Open the ResultProducer window by clicking the Result generator panel in the Setup tab.
Click on rawOutput and select the True entry from the drop-down list. By default, the output is sent to the zip file splitEvaluatorOut.zip. The output file can be changed by clicking on the outputFile panel in the window. Now when the experiment is run, the result of each processing run is archived, as shown below.
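To illustrate what "archived" means here, the Java sketch below stores one raw-output text per run as a separate entry of a zip archive and then lists the entries, using only `java.util.zip`. The entry names (`run1.txt`, `run2.txt`, ...) are hypothetical; they are not the layout Weka uses inside splitEvaluatorOut.zip, and the archive is built in memory rather than written to disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Sketch: one zip entry per run; entry naming is hypothetical. */
class RawOutputArchive {

    static byte[] archive(List<String> rawOutputs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (int run = 1; run <= rawOutputs.size(); run++) {
                zip.putNextEntry(new ZipEntry("run" + run + ".txt")); // hypothetical name
                zip.write(rawOutputs.get(run - 1).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } // closing the ZipOutputStream writes the zip's central directory
        return bytes.toByteArray();
    }

    static List<String> entryNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        List<String> outputs = new ArrayList<>();
        for (int run = 1; run <= 3; run++) {
            outputs.add("raw output of run " + run + "\n");
        }
        System.out.println(entryNames(archive(outputs))); // [run1.txt, run2.txt, run3.txt]
    }
}
```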
The contents of the first run are:
ClassifierSplitEvaluator: weka.classifiers.trees.J48 -C 0.25 -M 2(version -217733168393644444)Classifier model:
J48 pruned tree
------------------
petalwidth <= 0.6: Iris-setosa (33.0)
petalwidth > 0.6
| petalwidth <= 1.5: Iris-versicolor (31.0/1.0)
| petalwidth > 1.5: Iris-virginica (35.0/3.0)
Number of Leaves : 3
Size of the tree : 5
Correctly Classified Instances 47 92.1569 %
Incorrectly Classified Instances 4 7.8431 %
Kappa statistic 0.8824
Mean absolute error 0.0723
Root mean squared error 0.2191
Relative absolute error 16.2754 %
Root relative squared error 46.4676 %
Total Number of Instances 51
measureTreeSize : 5.0
measureNumLeaves : 3.0
measureNumRules : 3.0
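The pruned tree in this output is small enough to read directly. In J48's notation, "(33.0)" means 33 training instances reach that leaf, and "(31.0/1.0)" means 31 instances reach it, 1 of which is misclassified; the three leaf counts add up to 99, the size of the training set. The tree has 3 leaves and 2 decision nodes, hence "Size of the tree : 5". The Java sketch below is a hand translation of this particular tree into nested conditions, for illustration only; it is not code generated by Weka.

```java
/**
 * Hand translation of the J48 pruned tree from the first run's raw output.
 * Only petalwidth is tested, because it is the only attribute this tree uses.
 */
class IrisTreeFromRawOutput {

    static String classify(double petalWidth) {
        if (petalWidth <= 0.6) {
            return "Iris-setosa";       // leaf (33.0)
        } else if (petalWidth <= 1.5) {
            return "Iris-versicolor";   // leaf (31.0/1.0)
        } else {
            return "Iris-virginica";    // leaf (35.0/3.0)
        }
    }

    public static void main(String[] args) {
        for (double width : new double[] {0.2, 1.3, 2.1}) {
            System.out.println(width + " -> " + classify(width));
        }
    }
}
```

On the 51 test instances of this run, the tree classifies 47 correctly, and 47 / 51 = 92.1569 %, the figure reported above.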
Other Result Producers
Cross-Validation Result Producer
Note: there is a bug in ExperimenterDefaults; for details, see https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/2005-October/005146.html
To change from random train/test experiments to cross-validation experiments, click the Result generator entry. At the top of the window, click the drop-down list and select CrossValidationResultProducer. The window now contains parameters specific to cross-validation, such as the number of partitions (folds). In this example, the experiment performs 10-fold cross-validation instead of random train/test splits.
The Result generator panel now indicates that cross-validation will be performed. Click More to display a brief description of the CrossValidationResultProducer.
As with the RandomSplitResultProducer, multiple schemes can be run during cross-validation by adding them to the Generator properties panel.
The number of runs is set to 1 in the Setup tab, so that only one run of cross-validation for each scheme and dataset is executed.
When this experiment is analysed, the following results are generated. Note that 30 result lines (1 run × 10 folds × 3 schemes) are processed.
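The way k-fold cross-validation partitions the data, and where the count of 30 result lines comes from, can be sketched in Java. This is plain, unstratified k-fold partitioning with a fixed-seed shuffle, chosen for simplicity; it is not necessarily the exact procedure the CrossValidationResultProducer uses. Every instance lands in exactly one test fold, and the model for each fold is trained on the remaining k - 1 folds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch of plain (unstratified) k-fold partitioning; the seeding is assumed. */
class CrossValidationFolds {

    /** Returns, for each fold, the indices of its test instances. */
    static List<List<Integer>> testFolds(int numInstances, int numFolds, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal the shuffled indices out round-robin so fold sizes differ by at most one.
        for (int i = 0; i < indices.size(); i++) {
            folds.get(i % numFolds).add(indices.get(i));
        }
        return folds;
    }

    /** One result line per run, fold, scheme and dataset. */
    static int resultLines(int runs, int folds, int schemes, int datasets) {
        return runs * folds * schemes * datasets;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = testFolds(150, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + (f + 1) + ": " + folds.get(f).size() + " test instances");
        }
        // 1 run x 10 folds x 3 schemes x 1 dataset
        System.out.println(resultLines(1, 10, 3, 1) + " result lines");
    }
}
```

For Iris, each of the 10 folds holds 15 test instances.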
Averaging Result Producer
An alternative to the CrossValidationResultProducer is the AveragingResultProducer. This result producer takes the average of a set of runs, which are typically cross-validation runs. To use it, click the Result generator panel and choose AveragingResultProducer in the GenericObjectEditor.
The associated help file is shown below.
Clicking the resultProducer panel brings up the following window.
As with the other ResultProducers, additional schemes can be defined. When the AveragingResultProducer is used, the classifier property is located deeper in the Generator properties hierarchy.
In this experiment, the ZeroR, OneR, and J48 schemes are each run 10 times with 10-fold cross-validation. The results of the 10 folds in each run are then averaged, producing one result line per run (instead of one result line per fold, as in the previous example with the CrossValidationResultProducer), for a total of 30 result lines (3 schemes × 10 runs). If the raw output is saved, all 300 individual results (3 schemes × 10 runs × 10 folds) are sent to the archive.
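The effect of averaging on the result counts can be sketched in Java: the per-fold values of one run collapse into a single averaged value, so the number of result lines drops from schemes × runs × folds to schemes × runs. The per-fold accuracies in `main` are made-up numbers for illustration, not Weka output, and the helper names are hypothetical.

```java
/**
 * Sketch of averaging fold results into one value per run.
 * The per-fold accuracies used in main are made up for illustration.
 */
class FoldAveraging {

    /** Mean of the per-fold values of one run. */
    static double average(double[] perFoldValues) {
        double sum = 0.0;
        for (double value : perFoldValues) {
            sum += value;
        }
        return sum / perFoldValues.length;
    }

    /** One averaged line per scheme and run. */
    static int averagedLines(int schemes, int runs) {
        return schemes * runs;
    }

    /** Individual fold results, as kept in the raw-output archive. */
    static int rawResults(int schemes, int runs, int folds) {
        return schemes * runs * folds;
    }

    public static void main(String[] args) {
        double[] madeUpFoldAccuracies = {80, 90, 100, 90, 80, 90, 100, 90, 80, 100};
        System.out.println("averaged value for this run: " + average(madeUpFoldAccuracies)); // 90.0
        System.out.println("result lines: " + averagedLines(3, 10));   // 30
        System.out.println("raw results:  " + rawResults(3, 10, 10));  // 300
    }
}
```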
