Experimenter - Analysing Results (3.5.3)

From WekaDoc

Setup

Weka includes an experiment analyser that can be used to analyse the results of experiments that were sent to an InstancesResultListener. The experiment shown below uses 3 schemes, ZeroR, OneR, and J48, to classify the Iris data in an experiment using 10 train and test runs, with 66% of the data used for training and 34% used for testing.

Missing image
En-Experimenter_Analyser01-353.png

After the experiment setup is complete, run the experiment. Then, to analyse the results, select the Analyse tab at the top of the Experiment Environment window.

Click on Experiment to analyse the results of the current experiment.

Missing image
En-Experimenter_Analyser02-353.png

The number of result lines available (Got 30 results) is shown in the Source panel. This experiment consisted of 10 runs, for 3 schemes, for 1 dataset, for a total of 30 result lines. Results can also be loaded from an earlier experiment file by clicking File and loading the appropriate .arff results file. Similarly, results sent to a database (using the DatabaseResultListener) can be loaded from the database.
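
The result count reported in the Source panel corresponds to the number of data rows in the results file. As a rough illustration, the sketch below counts the rows of an ARFF text using only the Java standard library. It assumes one instance per line after the @data marker and % comment lines. The class name ArffRowCounter and the sample attribute names and values are made up for this example and do not come from a real results file.

```java
final class ArffRowCounter {

    /** Counts instance rows after the @data marker, skipping blank lines and % comments. */
    static int countDataRows(String arffText) {
        boolean inData = false;
        int rows = 0;
        for (String line : arffText.split("\\R")) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("%")) {
                continue;
            }
            if (!inData) {
                // Header section: keep skipping until the @data marker is seen.
                inData = t.equalsIgnoreCase("@data");
                continue;
            }
            rows++;
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical excerpt; names and values are invented for illustration.
        String sample = String.join("\n",
            "% hypothetical excerpt",
            "@relation results",
            "@attribute Scheme string",
            "@attribute Run numeric",
            "@attribute Percent_correct numeric",
            "@data",
            "ZeroR,1,33.3",
            "OneR,1,94.1",
            "J48,1,96.0");
        System.out.println("Got " + countDataRows(sample) + " results");
    }
}
```

For the experiment described here, a complete results file would hold 10 x 3 x 1 = 30 such rows.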

Select the Percent_correct attribute from the Comparison field and click Perform test to generate a comparison of the 3 schemes.

Missing image
En-Experimenter_Analyser03-353.png

The schemes used in the experiment are shown in the columns and the datasets used are shown in the rows.

The percentage correct for each of the 3 schemes is shown in each dataset row: 33.33% for ZeroR, 94.31% for OneR, and 94.90% for J48. The annotation v or * indicates that a specific result is statistically better (v) or worse (*) than the baseline scheme (in this case, ZeroR) at the specified significance level (currently 0.05). The results of both OneR and J48 are statistically better than the baseline established by ZeroR. At the bottom of each column after the first is a count (xx/yy/zz) of the number of datasets on which the scheme was better than (xx), the same as (yy), or worse than (zz) the baseline scheme. In this example there was only one dataset: OneR was better than ZeroR once and never equivalent to or worse than it (1/0/0), and J48 was likewise better than ZeroR on that dataset (1/0/0).
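
The better/same/worse count at the bottom of a column can be pictured as a simple tally over datasets. The following is a minimal sketch of that bookkeeping, not Weka's own code: the class BaselineTally and the Outcome enum are hypothetical names, and the per-dataset outcomes are assumed to have been decided already by the significance test.

```java
final class BaselineTally {

    /** Result of comparing a scheme with the baseline on one dataset. */
    enum Outcome { BETTER, SAME, WORSE }

    /** Formats the counts in the order better/same/worse. */
    static String tally(Outcome[] perDataset) {
        int better = 0;
        int same = 0;
        int worse = 0;
        for (Outcome o : perDataset) {
            switch (o) {
                case BETTER: better++; break;
                case SAME:   same++;   break;
                case WORSE:  worse++;  break;
            }
        }
        return "(" + better + "/" + same + "/" + worse + ")";
    }

    public static void main(String[] args) {
        // One dataset (iris), on which OneR was significantly better than ZeroR.
        Outcome[] oneRVersusZeroR = { Outcome.BETTER };
        System.out.println(tally(oneRVersusZeroR)); // prints (1/0/0)
    }
}
```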

The standard deviation of the attribute being evaluated can be displayed by selecting the Show std. deviations check box and clicking Perform test again. The value (10) at the beginning of the iris row is the number of estimates used to calculate the standard deviation (the number of runs in this case).
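
To make the role of the 10 estimates concrete, the sketch below computes a mean and a standard deviation over one value per run. It assumes the sample form of the standard deviation (dividing by n - 1); whether the analyser uses exactly this form is an assumption here. The class name RunStatistics and the input values are hypothetical.

```java
final class RunStatistics {

    /** Arithmetic mean of the per-run estimates. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample standard deviation (n - 1 in the denominator); 0 for fewer than two values. */
    static double sampleStdDev(double[] values) {
        if (values.length < 2) {
            return 0.0;
        }
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return Math.sqrt(sumSq / (values.length - 1));
    }

    public static void main(String[] args) {
        // Invented Percent_correct values for 10 runs, for illustration only.
        double[] runs = { 94.1, 96.0, 92.2, 94.1, 98.0, 94.1, 92.2, 96.0, 94.1, 96.0 };
        System.out.printf("mean=%.2f stddev=%.2f (%d estimates)%n",
            mean(runs), sampleStdDev(runs), runs.length);
    }
}
```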

Missing image
En-Experimenter_Analyser04-353.png

Selecting Number_correct as the comparison field and clicking Perform test generates the average number correct (out of 51 test patterns, that is, 34% of the 150 patterns in the Iris dataset).

Missing image
En-Experimenter_Analyser05-353.png

Clicking the Output format button opens a dialog that lets you choose the precision for the mean and the std. deviations, as well as the format of the output. Checking the Show Average checkbox adds a line to the output listing the average of each column. The Remove filter classnames checkbox removes the filter name and options from the names of processed datasets (filter names in Weka can be quite lengthy).

The following formats are supported:

  • CSV
  • GNUPlot
  • HTML
  • LaTeX
  • Plain text (default)
  • Significance only

Missing image
En-Experimenter_Analyser_OutputFormat-353.png

Saving the Results

The information displayed in the Test output panel is controlled by the currently-selected entry in the Result list panel. Clicking on an entry causes the results corresponding to that entry to be displayed.

Missing image
En-Experimenter_Analyser06-353.png

The results shown in the Test output panel can be saved to a file by clicking Save output. Only one set of results can be saved at a time, but all results can be collected in the same file by saving them one at a time and choosing the Append option instead of the Overwrite option for the second and subsequent saves.

Missing image
En-Experimenter_Analyser07-353.png

Changing the Baseline Scheme

The baseline scheme can be changed by clicking Select base... and then selecting the desired scheme. Selecting the OneR scheme causes the other schemes to be compared individually with the OneR scheme.

Missing image
En-Experimenter_Analyser08-353.png

If the test is performed on the Percent_correct field with OneR as the base scheme, the system indicates that there is no statistically significant difference between the results for OneR and J48. There is, however, a statistically significant difference between OneR and ZeroR.

Missing image
En-Experimenter_Analyser09-353.png

Statistical Significance

The term statistical significance used in the previous section refers to the result of a pair-wise comparison of schemes using either a standard T-Test or the corrected resampled T-Test. The latter test is the default, because the standard T-Test can report too many significant differences when the estimates are not independent (in particular when anything other than one run of an x-fold cross-validation is used). For more information on the T-Test, consult the Weka text (Data Mining by I. Witten and E. Frank) or an introductory statistics text. Lowering the significance level makes the test stricter: fewer differences are reported as significant, but those that are reported can be held with greater confidence.
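
The difference between the two tests can be sketched numerically. The code below assumes the commonly published form of the corrected resampled statistic, t = mean(d) / sqrt((1/k + n_test/n_train) * var(d)), where d holds the k per-run differences between two schemes and var is the sample variance. Setting the ratio to 0 gives the standard paired statistic. The class and method names are hypothetical, and this is an illustration of the formula rather than the code Weka runs.

```java
final class CorrectedResampledTTest {

    /**
     * t statistic for paired per-run results a and b.
     * testTrainRatio is n_test / n_train; pass 0.0 for the standard paired T-Test.
     * If all differences are identical the variance is 0 and the result is infinite or NaN.
     */
    static double tStatistic(double[] a, double[] b, double testTrainRatio) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equally long arrays with at least 2 runs");
        }
        int k = a.length;
        double mean = 0.0;
        for (int i = 0; i < k; i++) {
            mean += a[i] - b[i];
        }
        mean /= k;
        double variance = 0.0;
        for (int i = 0; i < k; i++) {
            double dev = (a[i] - b[i]) - mean;
            variance += dev * dev;
        }
        variance /= (k - 1); // sample variance of the differences
        return mean / Math.sqrt((1.0 / k + testTrainRatio) * variance);
    }

    public static void main(String[] args) {
        // Invented per-run accuracies for two schemes over 4 runs.
        double[] schemeA = { 3, 5, 4, 6 };
        double[] schemeB = { 1, 2, 3, 2 };
        double ratio = 34.0 / 66.0; // 34% test, 66% train, as in this experiment
        System.out.println("standard  t = " + tStatistic(schemeA, schemeB, 0.0));
        System.out.println("corrected t = " + tStatistic(schemeA, schemeB, ratio));
    }
}
```

Because the correction enlarges the denominator, the corrected statistic is never larger in magnitude than the standard one for the same data, which is why it reports fewer significant differences.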

In the current experiment, there is no statistically significant difference between the OneR and J48 schemes.

Summary Test

Selecting Summary from Test base and performing a test causes the following information to be generated.

Missing image
En-Experimenter_Analyser10-353.png

In this experiment, the first row (- 1 1) indicates that column b (OneR) is better than row a (ZeroR) and that column c (J48) is also better than row a. The number in brackets represents the number of significant wins for the column with regard to the row. A 0 means that the scheme in the corresponding column did not score a single (significant) win with regard to the scheme in the row.

Ranking Test

Selecting Ranking from Test base causes the following information to be generated.

Missing image
En-Experimenter_Analyser11-353.png

The ranking test ranks the schemes according to the total wins (>) and losses (<) against the other schemes. The first column (>-<) is the difference between the number of wins and the number of losses.
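
One way to picture the ranking is to derive it from a matrix of significant wins, where entry [row][column] counts the wins of the column scheme over the row scheme. The sketch below does that under stated assumptions: the class SchemeRanking is a hypothetical name, and the example matrix uses the first row (- 1 1) described in the Summary Test section, with the remaining rows filled with zeros on the basis of the statement that OneR and J48 do not differ significantly.

```java
import java.util.ArrayList;
import java.util.List;

final class SchemeRanking {

    /** One line of the ranking: difference, wins, losses, and scheme name. */
    static final class Row {
        final String scheme;
        final int wins;
        final int losses;

        Row(String scheme, int wins, int losses) {
            this.scheme = scheme;
            this.wins = wins;
            this.losses = losses;
        }

        int difference() {
            return wins - losses;
        }

        @Override
        public String toString() {
            return difference() + " " + wins + " " + losses + " " + scheme;
        }
    }

    /** wins[row][col] = number of significant wins of scheme col over scheme row. */
    static List<Row> rank(String[] schemes, int[][] wins) {
        List<Row> rows = new ArrayList<>();
        for (int s = 0; s < schemes.length; s++) {
            int won = 0;
            int lost = 0;
            for (int other = 0; other < schemes.length; other++) {
                if (other == s) {
                    continue;
                }
                won += wins[other][s];  // column s beat row other
                lost += wins[s][other]; // column other beat row s
            }
            rows.add(new Row(schemes[s], won, lost));
        }
        // Sort by (wins - losses), largest first; the sort is stable for ties.
        rows.sort((x, y) -> Integer.compare(y.difference(), x.difference()));
        return rows;
    }

    public static void main(String[] args) {
        String[] schemes = { "ZeroR", "OneR", "J48" };
        int[][] wins = {
            { 0, 1, 1 }, // row a (ZeroR): beaten by OneR and by J48
            { 0, 0, 0 }, // row b (OneR): assumed, no significant wins against it
            { 0, 0, 0 }  // row c (J48): assumed, no significant wins against it
        };
        System.out.println(">-< > < scheme");
        for (Row r : rank(schemes, wins)) {
            System.out.println(r);
        }
    }
}
```

With this matrix, OneR and J48 each end up with a difference of 1 (one win, no losses) and ZeroR with -2 (no wins, two losses).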