en:Experimenter - Analysing Results (3.5.3)
From WekaDoc
Setup
Weka includes an experiment analyser that can be used to analyse the results of experiments that were sent to an InstancesResultListener. The experiment shown below uses 3 schemes, ZeroR, OneR, and J48, to classify the Iris data in an experiment using 10 train and test runs, with 66% of the data used for training and 34% used for testing.
En-Experimenter_Analyser01-353.png
After the experiment setup is complete, run the experiment. Then, to analyse the results, select the Analyse tab at the top of the Experiment Environment window.
Click on Experiment to analyse the results of the current experiment.
En-Experimenter_Analyser02-353.png
The number of result lines available (Got 30 results) is shown in the Source panel. This experiment consisted of 10 runs, for 3 schemes, for 1 dataset, for a total of 30 result lines. Results can also be loaded from an earlier experiment file by clicking File and loading the appropriate .arff results file. Similarly, results sent to a database (using the DatabaseResultListener) can be loaded from the database.
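The result count follows directly from the experiment setup, and the same setup fixes the size of each training and test set. The Java sketch below reproduces both numbers. It assumes that the training-set size is the train percentage of the instance count rounded to the nearest integer, with the remaining instances used for testing; the class and method names are hypothetical.

```java
// Sketch: expected size of the experiment described above.
// Assumption: training size = round(instances * trainPercent / 100),
// and all remaining instances form the test set.
class ExperimentSizes {
    /** One result line is produced per run, per scheme, per dataset. */
    static int resultLines(int runs, int schemes, int datasets) {
        return runs * schemes * datasets;
    }

    /** Number of training instances for a random percentage split. */
    static int trainSize(int numInstances, double trainPercent) {
        return (int) Math.round(numInstances * trainPercent / 100.0);
    }

    public static void main(String[] args) {
        int lines = resultLines(10, 3, 1);  // 10 runs x 3 schemes x 1 dataset
        int train = trainSize(150, 66.0);   // Iris has 150 instances
        int test = 150 - train;
        System.out.println("Result lines: " + lines);
        System.out.println("Train/test: " + train + "/" + test);
    }
}
```

Under this rounding assumption, the 66%/34% split of Iris gives 99 training and 51 test instances per run.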
Select the Percent_correct attribute from the Comparison field and click Perform test to generate a comparison of the 3 schemes.
En-Experimenter_Analyser03-353.png
The schemes used in the experiment are shown in the columns and the datasets used are shown in the rows.
The percentage correct for each of the 3 schemes is shown in each dataset row: 33.33% for ZeroR, 94.31% for OneR, and 94.90% for J48. The annotation v or * indicates that a specific result is statistically better (v) or worse (*) than the baseline scheme (in this case, ZeroR) at the specified significance level (currently 0.05). The results of both OneR and J48 are statistically better than the baseline established by ZeroR. At the bottom of every column except the first is a count (xx/yy/zz) of the number of datasets on which the scheme was better than (xx), the same as (yy), or worse than (zz) the baseline scheme. In this example there was only one dataset, and OneR was better than ZeroR once and never equivalent to or worse than ZeroR (1/0/0); J48 was also better than ZeroR on that dataset.
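The (xx/yy/zz) count is a simple tally over datasets. The following Java sketch shows the idea, assuming each dataset contributes exactly one outcome (better, same, or worse) from its significance test against the baseline. The enum and method names are illustrative, not part of Weka.

```java
import java.util.List;

// Sketch of the (xx/yy/zz) tally shown under each non-baseline column.
// Each list entry is one dataset's significance-test outcome vs. the baseline.
class WinTieLoss {
    enum Outcome { BETTER, SAME, WORSE } // shown as 'v', blank, '*' in the output

    /** Returns {better, same, worse} counts over all datasets. */
    static int[] tally(List<Outcome> perDataset) {
        int[] counts = new int[3];
        for (Outcome o : perDataset) {
            counts[o.ordinal()]++;
        }
        return counts;
    }

    static String format(int[] c) {
        return "(" + c[0] + "/" + c[1] + "/" + c[2] + ")";
    }

    public static void main(String[] args) {
        // One dataset (iris) on which OneR was significantly better than ZeroR.
        System.out.println(format(tally(List.of(Outcome.BETTER)))); // (1/0/0)
    }
}
```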
The standard deviation of the attribute being evaluated can be shown by selecting the Show std. deviations check box and clicking Perform test again. The value (10) at the beginning of the iris row is the number of estimates used to calculate the standard deviation (in this case, the number of runs).
En-Experimenter_Analyser04-353.png
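Each cell of this table summarises one scheme's scores over the runs. The Java sketch below computes a mean and standard deviation from per-run values. It assumes the sample standard deviation (divisor n - 1) is used, and the input values are made up for illustration, not taken from the experiment above.

```java
// Sketch: mean and standard deviation of one scheme's per-run scores.
// Assumption: the sample standard deviation (divisor n - 1) is used.
class RunStats {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    static double stdDev(double[] xs) {
        double m = mean(xs);
        double sq = 0;
        for (double x : xs) sq += (x - m) * (x - m);
        return Math.sqrt(sq / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical Percent_correct values for 10 runs (not the experiment's data).
        double[] pc = {94.1, 96.1, 92.2, 94.1, 96.1, 94.1, 92.2, 96.1, 94.1, 94.1};
        // The leading (n) mirrors the run count shown at the start of the row.
        System.out.printf("(%d) %.2f +- %.2f%n", pc.length, mean(pc), stdDev(pc));
    }
}
```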
Selecting Number_correct as the comparison field and clicking Perform test generates the average number of correctly classified instances (out of 51 test instances, i.e. 34% of the 150 instances in the Iris dataset).
En-Experimenter_Analyser05-353.png
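Because every run uses a test set of the same size, Number_correct and Percent_correct are related by a constant factor. The Java sketch below converts between them, assuming a test set of 51 instances (150 instances minus 99 used for training under the 66% split); the helper names are hypothetical.

```java
// Sketch: relation between Number_correct and Percent_correct,
// assuming a fixed test-set size in every run (here 51 instances).
class CorrectCounts {
    static double percentCorrect(double numberCorrect, double testSize) {
        return 100.0 * numberCorrect / testSize;
    }

    static double numberCorrect(double percentCorrect, double testSize) {
        return percentCorrect * testSize / 100.0;
    }

    public static void main(String[] args) {
        int testSize = 51;
        // A run with 48 of 51 test instances correct.
        System.out.printf("%.2f%%%n", percentCorrect(48, testSize));
        // OneR's average of 94.31% corresponds to roughly 48.1 correct instances.
        System.out.printf("%.1f%n", numberCorrect(94.31, testSize));
    }
}
```

Since the test size is constant, averaging either field over the runs gives consistent results under the same conversion.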
Clicking the Output format button opens a dialog that lets you choose the precision of the mean and standard deviation values, as well as the format of the output. Checking the Show Average check box adds a line to the output listing the average of each column. The Remove filter classnames check box removes the filter name and options from the names of processed datasets (filter names in Weka can be quite lengthy).
The following formats are supported:
- CSV
- GNUPlot
- HTML
- LaTeX
- Plain text (default)
- Significance only
En-Experimenter_Analyser_OutputFormat-353.png
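To illustrate how the precision setting and the Show Average option shape the output, the following Java sketch renders a small CSV-style table of means. It is a hypothetical renderer under simplifying assumptions (means only, no standard deviations or significance markers); the layout is illustrative and is not Weka's exact CSV format.

```java
import java.util.Locale;

// Sketch: how a mean precision setting and a "Show Average" option could shape
// a CSV-style result table. Illustrative layout only, not Weka's exact format.
class ResultTableCsv {
    static String render(String[] datasets, String[] schemes, double[][] means,
                         int meanPrecision, boolean showAverage) {
        StringBuilder sb = new StringBuilder("Dataset");
        for (String s : schemes) sb.append(',').append(s);
        sb.append('\n');
        String fmt = "%." + meanPrecision + "f";
        for (int r = 0; r < datasets.length; r++) {
            sb.append(datasets[r]);
            for (int c = 0; c < schemes.length; c++) {
                sb.append(',').append(String.format(Locale.ROOT, fmt, means[r][c]));
            }
            sb.append('\n');
        }
        if (showAverage) {
            // Extra line with the average of each column over all datasets.
            sb.append("Average");
            for (int c = 0; c < schemes.length; c++) {
                double sum = 0;
                for (int r = 0; r < datasets.length; r++) sum += means[r][c];
                sb.append(',').append(String.format(Locale.ROOT, fmt, sum / datasets.length));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] datasets = {"iris"};
        String[] schemes = {"ZeroR", "OneR", "J48"};
        double[][] means = {{33.33, 94.31, 94.90}};
        System.out.print(render(datasets, schemes, means, 2, true));
    }
}
```

With a single dataset, the average line simply repeats the iris row; it becomes useful once an experiment covers several datasets.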
Saving the Results
The information displayed in the Test output panel is controlled by the currently-selected entry in the Result list panel. Clicking on an entry causes the results corresponding to that entry to be displayed.
En-Experimenter_Analyser06-353.png
The results shown in the Test output panel can be saved to a file by clicking Save output. Only one set of results can be saved at a time, but all results can be collected in the same file by saving them one at a time and choosing the Append option instead of the Overwrite option for the second and subsequent saves.
En-Experimenter_Analyser07-353.png
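The difference between Overwrite and Append is the usual distinction between truncating and appending to a file. The Java sketch below mirrors that workflow with the standard `java.nio.file.Files` API; the file name and contents are hypothetical, and this is not how Weka itself implements the option.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of Overwrite vs. Append when saving several result sets to one file.
// File name and contents are hypothetical.
class SaveOutput {
    static void save(Path file, String text, boolean append) throws IOException {
        if (append) {
            // Append: keep existing content and add the new results at the end.
            Files.writeString(file, text, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } else {
            // Overwrite: discard any existing content first.
            Files.writeString(file, text, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("experiment-results", ".txt");
        save(out, "Percent_correct results\n", false); // first save: Overwrite
        save(out, "Number_correct results\n", true);   // later saves: Append
        System.out.print(Files.readString(out));
    }
}
```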
Changing the Baseline Scheme
The baseline scheme can be changed by clicking Select base... and then selecting the desired scheme. Selecting the OneR scheme causes the other schemes to be compared individually with the OneR scheme.
En-Experimenter_Analyser08-353.png
If the test is performed on the Percent_correct field with OneR as the base scheme, the system indicates that there is no statistical difference between the results for OneR and J48. There is, however, a statistically significant difference between OneR and ZeroR.
En-Experimenter_Analyser09-353.png
Statistical Significance
The term statistical significance used in the previous section refers to the result of a pair-wise comparison of schemes using either a standard T-Test or the corrected resampled T-Test. The latter test is the default, because the standard T-Test can generate too many significant differences due to dependencies in the estimates (in particular when anything other than one run of an x-fold cross-validation is used). For more information on the T-Test, consult the Weka text (Data Mining by I. Witten and E. Frank) or an introductory statistics text. As the significance level is decreased, the confidence in the conclusion increases.
In the current experiment, there is not a statistically significant difference between the OneR and J48 schemes.
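The effect of the correction can be seen by computing both statistics for the same paired per-run differences d_j (j = 1..k). The standard paired t statistic is mean(d) / sqrt(var(d) / k), while the corrected resampled t statistic replaces 1/k with 1/k + n2/n1, where n1 and n2 are the training and test set sizes. The Java sketch below assumes 10 runs, a two-tailed test at the 0.05 level with 9 degrees of freedom (critical value 2.262, hard-coded rather than computed), and made-up per-run scores; it is an illustration of the two formulas, not Weka's implementation.

```java
// Sketch: standard paired t-test vs. corrected resampled t-test on per-run scores.
// Assumptions: k paired runs, n1 training / n2 test instances per run, two-tailed
// test at 0.05 with k - 1 = 9 degrees of freedom (critical value hard-coded).
// Note: identical scores in every run give zero variance and an undefined t.
class CorrectedTTest {
    static final double T_CRIT_DF9_ALPHA05 = 2.262;

    /** Mean and sample variance (divisor k - 1) of the differences a[j] - b[j]. */
    static double[] meanAndVariance(double[] a, double[] b) {
        int k = a.length;
        double mean = 0;
        for (int j = 0; j < k; j++) mean += a[j] - b[j];
        mean /= k;
        double ss = 0;
        for (int j = 0; j < k; j++) {
            double d = a[j] - b[j] - mean;
            ss += d * d;
        }
        return new double[] {mean, ss / (k - 1)};
    }

    /** Standard paired t statistic: mean / sqrt(var / k). */
    static double pairedT(double[] a, double[] b) {
        double[] mv = meanAndVariance(a, b);
        return mv[0] / Math.sqrt(mv[1] / a.length);
    }

    /** Corrected resampled t statistic: mean / sqrt((1/k + n2/n1) * var). */
    static double correctedT(double[] a, double[] b, int n1, int n2) {
        double[] mv = meanAndVariance(a, b);
        return mv[0] / Math.sqrt((1.0 / a.length + (double) n2 / n1) * mv[1]);
    }

    public static void main(String[] args) {
        // Hypothetical per-run Percent_correct values for two schemes (10 runs).
        double[] j48  = {96.1, 94.1, 94.1, 96.1, 92.2, 96.1, 94.1, 96.1, 94.1, 96.1};
        double[] oneR = {94.1, 94.1, 92.2, 96.1, 92.2, 94.1, 94.1, 94.1, 94.1, 94.1};
        double t  = pairedT(j48, oneR);
        double tc = correctedT(j48, oneR, 99, 51);
        System.out.printf("paired t = %.3f, significant: %b%n",
                t, Math.abs(t) > T_CRIT_DF9_ALPHA05);
        System.out.printf("corrected t = %.3f, significant: %b%n",
                tc, Math.abs(tc) > T_CRIT_DF9_ALPHA05);
    }
}
```

With these made-up numbers the plain paired t statistic comes out at about 3.0, above 2.262, while the corrected statistic is about 1.2, below it. This is the kind of spurious significance the corrected test is meant to avoid when the training sets of different runs overlap.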
Summary Test
Selecting Summary from Test base and performing a test causes the following information to be generated.
En-Experimenter_Analyser10-353.png
In this experiment, the first row (- 1 1) indicates that column b (OneR) is better than row a (ZeroR) and that column c (J48) is also better than row a. The number in brackets represents the number of significant wins for the column with regard to the row. A 0 means that the scheme in the corresponding column did not score a single (significant) win with regard to the scheme in the row.
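The summary matrix can be thought of as a table of pairwise counts over datasets. The Java sketch below counts only significant wins, the quantity the text attributes to the bracketed numbers. Its input format is hypothetical: `better[d][i][j]` is true when scheme i was significantly better than scheme j on dataset d.

```java
// Sketch of the Summary matrix: entry (row, col) counts the datasets on which
// the column scheme was significantly better than the row scheme.
// Hypothetical input: better[d][i][j] is true when scheme i significantly
// beat scheme j on dataset d.
class SummaryMatrix {
    static int[][] summarize(boolean[][][] better) {
        int n = better[0].length;
        int[][] wins = new int[n][n]; // wins[row][col]
        for (boolean[][] dataset : better) {
            for (int row = 0; row < n; row++) {
                for (int col = 0; col < n; col++) {
                    if (row != col && dataset[col][row]) wins[row][col]++;
                }
            }
        }
        return wins;
    }

    public static void main(String[] args) {
        String[] names = {"a = ZeroR", "b = OneR", "c = J48"};
        // iris: OneR and J48 both significantly beat ZeroR; OneR vs. J48 is not significant.
        boolean[][][] better = {{
            {false, false, false},  // ZeroR beats nobody
            {true,  false, false},  // OneR beats ZeroR
            {true,  false, false},  // J48 beats ZeroR
        }};
        int[][] wins = summarize(better);
        for (int row = 0; row < wins.length; row++) {
            StringBuilder line = new StringBuilder();
            for (int col = 0; col < wins.length; col++) {
                line.append(row == col ? " -" : " " + wins[row][col]);
            }
            System.out.println(line + " | " + names[row]);
        }
    }
}
```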
Ranking Test
Selecting Ranking from Test base causes the following information to be generated.
En-Experimenter_Analyser11-353.png
The ranking test ranks the schemes according to the total wins (>) and losses (<) against the other schemes. The first column (>-<) is the difference between the number of wins and the number of losses.
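The ranking can be sketched in the same way: total each scheme's significant wins and losses against all other schemes, then sort by wins minus losses. The Java sketch below uses the same hypothetical `better[d][i][j]` input as a summary matrix would; schemes with equal scores keep their input order, which is a choice of this sketch rather than documented Weka behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the Ranking test: total significant wins (>) and losses (<) of each
// scheme against all others, ranked by wins minus losses (>-<).
// Hypothetical input: better[d][i][j] is true when scheme i significantly
// beat scheme j on dataset d.
class RankingTest {
    static final class Rank {
        final String name;
        final int wins;
        final int losses;

        Rank(String name, int wins, int losses) {
            this.name = name;
            this.wins = wins;
            this.losses = losses;
        }

        int diff() { return wins - losses; }
    }

    static List<Rank> rank(String[] names, boolean[][][] better) {
        List<Rank> ranks = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            int wins = 0, losses = 0;
            for (boolean[][] dataset : better) {
                for (int j = 0; j < names.length; j++) {
                    if (i == j) continue;
                    if (dataset[i][j]) wins++;
                    if (dataset[j][i]) losses++;
                }
            }
            ranks.add(new Rank(names[i], wins, losses));
        }
        // Highest (wins - losses) first; List.sort is stable, so ties keep input order.
        ranks.sort((x, y) -> Integer.compare(y.diff(), x.diff()));
        return ranks;
    }

    public static void main(String[] args) {
        String[] names = {"ZeroR", "OneR", "J48"};
        boolean[][][] better = {{
            {false, false, false},
            {true,  false, false},
            {true,  false, false},
        }};
        System.out.println(">-<  >  <  Scheme");
        for (Rank r : rank(names, better)) {
            System.out.printf("%3d %2d %2d  %s%n", r.diff(), r.wins, r.losses, r.name);
        }
    }
}
```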
