Frequently Asked Questions
From WekaWiki
General
What's the difference between the book and developer versions?
The book version, as the name indicates, is tied to the Data Mining book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) by Ian H. Witten (http://www.cs.waikato.ac.nz/~ihw) and Eibe Frank (http://www.cs.waikato.ac.nz/~eibe). The API of this version of WEKA was frozen with the publication of the book in 2005 (actually a bit earlier, since publishing takes quite a while). It therefore only receives bug fixes, but no new features such as new classifiers or filters. The developer version then became the code branch for active development and has received numerous extensions and enhancements. Any contribution must be compatible with the developer version.
Where can I get old versions of Weka?
If you need a specific version of Weka, e.g., because a third-party tool requires it, go to Weka's project page (http://sourceforge.net/projects/weka/) on Sourceforge.net (http://sourceforge.net/). The download section (http://sourceforge.net/project/showfiles.php?group_id=5091) gives you access to all releases ever made.
Using Weka
How do I use Weka from command line?
The Weka Primer article and the How to run Weka schemes from commandline article both explain how to use Weka from the command line.
Can I check my CLASSPATH from within Weka?
Yes, you can. Just start up the SimpleCLI and issue the following command:
java weka.core.SystemInfo
Look for the property java.class.path, which lists the CLASSPATH Weka was started with.
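If you need the same information from your own Java program rather than the SimpleCLI, the class path is available as the standard system property java.class.path. A minimal sketch (the class name ClassPathCheck is hypothetical and no Weka classes are used) that prints each entry on its own line:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Lists the CLASSPATH entries of the running JVM, one per line.
// Reads the standard "java.class.path" system property; no Weka classes are used.
class ClassPathCheck {

    // Splits the class path at the platform's separator (":" on Unix, ";" on Windows).
    static List<String> entries() {
        String classPath = System.getProperty("java.class.path", "");
        return Arrays.asList(classPath.split(File.pathSeparator));
    }

    public static void main(String[] args) {
        for (String entry : entries()) {
            System.out.println(entry);
        }
    }
}
```

Running it with the same command line you use for Weka shows whether a jar (e.g., a JDBC driver) actually made it onto the class path.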
Can I check how much memory is available for Weka?
You can easily check how much memory Weka can use (this depends on the maximum heap size the Java Virtual Machine was started with):
- developer version
  - start the SimpleCLI
  - run the following command:
    java weka.core.SystemInfo
  - the property memory.max lists the maximum amount of memory available to Weka
- book version (and developer version)
  - start the Explorer
  - right-click in the log panel
  - select Memory information to output the information to the log
If you run into an OutOfMemoryException, you have to increase the maximum heap size. How much you can allocate depends heavily on the operating system and the underlying hardware (see the sections 32-Bit and 64-Bit of the Java Virtual Machine article). Also, have a look at the OutOfMemoryException section further down.
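Outside of Weka, the standard java.lang.Runtime class reports the same kind of figures. The sketch below assumes that memory.max corresponds to Runtime.maxMemory(), i.e., the limit set with -Xmx or the JVM default; the class name MemoryCheck is hypothetical.

```java
// Reports the JVM's heap figures in megabytes via java.lang.Runtime.
// Assumption: Runtime.maxMemory() is the figure SystemInfo shows as memory.max.
class MemoryCheck {
    static final long MB = 1024L * 1024L;

    // Maximum heap size the JVM will attempt to use (-Xmx or the JVM default).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    // Heap currently reserved by the JVM; it can grow up to maxHeapMb().
    static long totalHeapMb() {
        return Runtime.getRuntime().totalMemory() / MB;
    }

    // Unused part of the currently reserved heap.
    static long freeHeapMb() {
        return Runtime.getRuntime().freeMemory() / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:      " + maxHeapMb() + " MB");
        System.out.println("reserved heap: " + totalHeapMb() + " MB");
        System.out.println("free of that:  " + freeHeapMb() + " MB");
    }
}
```

Started with, e.g., java -Xmx512m MemoryCheck, the reported maximum should be close to 512 MB; JVMs often report slightly less than the -Xmx value.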
Where does Weka look for .props files?
Weka not only uses the .props files that are present in the jar archive, but also the ones in the user's home directory and the current directory, i.e., the one Weka was started from. For a complete overview, see the section Precedence in the Properties file article.
How do I use libsvm in Weka?
If you run the classifier weka.classifiers.functions.LibSVM and get the libsvm classes not in CLASSPATH! error message, the libsvm jar archive is missing from your current classpath. The LibSVM classifier is only a wrapper: it accesses the libsvm classes via Reflection, so it compiles without them, but they must be in the CLASSPATH at runtime. Check out the LibSVM article for details about how to use this classifier.
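The reflection mechanism behind this error can be illustrated with plain Java: a class can be looked up by name at runtime, and the lookup only succeeds if the corresponding jar is in the CLASSPATH. This is a sketch, not Weka's actual code; the class name libsvm.svm is an assumption about the contents of the libsvm jar, and ClassAvailability is a hypothetical name.

```java
// Checks at runtime whether a class can be loaded. A reflection-based wrapper
// compiles without the library, but this lookup fails if the jar is missing.
class ClassAvailability {

    // True if the named class can be found and loaded by the current class loader.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "libsvm.svm" is assumed to be the main class of the libsvm jar.
        String name = args.length > 0 ? args[0] : "libsvm.svm";
        System.out.println(name + (isAvailable(name) ? " found" : " not in CLASSPATH"));
    }
}
```

The same check works for other optional libraries, such as the snowball stemmers, if you pass the name of one of their classes.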
The snowball stemmers don't work, what am I doing wrong?
If you try to use the Snowball stemmers in the StringToWordVector filter and nothing happens except that the message Stemmer 'porter' unknown! appears in the console, then the snowball classes are not in your classpath. Check out the Stemmers article for how to add the snowball stemmers to Weka.
Can I make a screenshot of a plot or graph directly in Weka?
Yes, you can. The currently supported formats are BMP, EPS, JPEG and PNG. Just Alt+Shift+Left-Click on the plot or graph.
Can I change the colors (background, axes, etc.) of the plots in Weka?
Sure, this information is stored in the Visualize.props properties file:
- weka.gui.visualize.Plot2D.axisColour defines the color of the axes
- weka.gui.visualize.Plot2D.backgroundColour sets the background color
For more information, see the Properties file article (its Precedence section tells you where to place the .props file) and the Visualize.props article.
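The sketch below shows how a setting in one .props file can override the same key from another, using java.util.Properties. It assumes, as a simplification, that files are read in the order jar, home directory, current directory and that later files win; the Precedence section of the Properties file article describes Weka's actual rules. The class name LayeredProps and the colour values are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Merges several .props sources (given as file contents); a key defined in a
// later source overrides the same key from an earlier one. Illustrative only.
class LayeredProps {

    static Properties layer(String... sources) {
        Properties merged = new Properties();
        for (String source : sources) {
            Properties current = new Properties();
            try {
                current.load(new StringReader(source));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            merged.putAll(current); // later values replace earlier ones
        }
        return merged;
    }

    public static void main(String[] args) {
        // Colour values here are examples, not Weka's shipped defaults.
        String fromJar = "weka.gui.visualize.Plot2D.axisColour=green\n"
                + "weka.gui.visualize.Plot2D.backgroundColour=black\n";
        String fromHomeDir = "weka.gui.visualize.Plot2D.backgroundColour=white\n";
        Properties p = layer(fromJar, fromHomeDir);
        System.out.println(p.getProperty("weka.gui.visualize.Plot2D.backgroundColour")); // white
        System.out.println(p.getProperty("weka.gui.visualize.Plot2D.axisColour"));       // green
    }
}
```

Only the keys you want to change need to appear in your own copy of the file; everything else keeps the value from the earlier source.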
How do I connect to a database?
With a little setup, you can access databases via JDBC. You need the following:
- JDBC driver for the database you want to access in your CLASSPATH.
- A customized DatabaseUtils.props file. The following example files are located in the weka/experiment directory of the weka.jar archive:
  - HSQLDB - DatabaseUtils.props.hsql (>= 3.4.1/3.5.0)
  - MS SQL Server 2000 - DatabaseUtils.props.mssqlserver (>= 3.4.9/3.5.4)
  - MS SQL Server 2005 Express Edition - DatabaseUtils.props.mssqlserver2005 (> 3.4.10/3.5.5)
  - MySQL - DatabaseUtils.props.mysql (>= 3.4.9/3.5.4)
  - ODBC - DatabaseUtils.props.odbc (>= 3.4.9/3.5.4)
  - Oracle - DatabaseUtils.props.oracle (>= 3.4.9/3.5.4)
  - PostgreSQL - DatabaseUtils.props.postgresql (>= 3.4.9/3.5.4)
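A common stumbling block is a driver class named in the props file that is not in the CLASSPATH. The sketch below reads props content and tries to load each driver class. It uses the keys jdbcDriver and jdbcURL as found in the example DatabaseUtils.props files and assumes jdbcDriver may hold a comma-separated list; the driver class and URL in main are placeholders, and DbPropsCheck is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Checks whether the JDBC driver classes named in DatabaseUtils.props-style
// content can be loaded, i.e., whether the driver jar is in the CLASSPATH.
class DbPropsCheck {

    static Properties parse(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the driver classes that could not be loaded (empty list = all found).
    static List<String> missingDrivers(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String name : props.getProperty("jdbcDriver", "").split(",")) {
            String driver = name.trim();
            if (driver.isEmpty()) {
                continue;
            }
            try {
                Class.forName(driver);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(driver);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder values; take the real ones from the example file for your database.
        Properties props = parse("jdbcDriver=com.mysql.jdbc.Driver\n"
                + "jdbcURL=jdbc:mysql://localhost:3306/mydb\n");
        List<String> missing = missingDrivers(props);
        System.out.println(missing.isEmpty()
                ? "driver(s) found for " + props.getProperty("jdbcURL")
                : "not in CLASSPATH: " + missing);
    }
}
```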
For more details see the following articles:
How to resolve an OutOfMemoryException?
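When the Java Virtual Machine starts, you need to tell it the maximum amount of memory it may allocate (the maximum heap size, set with the -Xmx option, e.g., -Xmx1024m). For machine learning and data mining, the default is quite often not sufficient. Note that Java itself reports this condition as java.lang.OutOfMemoryError. See the Invocation section of the Java Virtual Machine article.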
How do I make predictions with a trained model?
Since Weka allows models to be saved (as Java binary serialized objects), one can use those models again to perform predictions. Check out the article Making predictions for more details.
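Because saved models are plain Java serialized objects, the save/load round trip can be sketched with java.io alone. ThresholdModel below is a hypothetical stand-in for a trained classifier and ModelIO is a hypothetical helper; Weka models are saved and restored in the same spirit, but see the Making predictions article for the actual Weka calls.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for a trained model: predicts class 1 when the
// input value exceeds a threshold learned during training.
class ThresholdModel implements Serializable {
    private static final long serialVersionUID = 1L;
    private final double threshold;

    ThresholdModel(double threshold) {
        this.threshold = threshold;
    }

    int predict(double x) {
        return x > threshold ? 1 : 0;
    }
}

// Saves and restores any Serializable model with standard Java serialization.
class ModelIO {

    static void save(Serializable model, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("model class not in CLASSPATH", e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "threshold.model");
        save(new ThresholdModel(0.5), file);                    // right after training
        ThresholdModel restored = (ThresholdModel) load(file);  // later, e.g., in another program
        System.out.println(restored.predict(0.7));              // prints 1
    }
}
```

Note that the model's class (and, for Weka models, a compatible Weka version) must be in the CLASSPATH when the model is loaded again.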
How can I track instances in Weka?
Weka doesn't support internal IDs for instances; you have to use ID attributes instead. See How do I use ID attributes?
How do I perform attribute selection?
Weka offers different approaches for performing attribute selection: 1. directly with the attribute selection classes, 2. with a meta-classifier, and 3. with a filter. Check out the Performing attribute selection article for more details and examples.
How do I generate compatible train and test sets that get processed with a filter?
Running a filter twice, once with the training set as input and a second time with the test set, will almost certainly create two incompatible files. Why is that? Every time you run a filter, it gets initialized based on its input data, and since training and test set differ, so does the output. You can avoid this with batch filtering, where the filter is initialized on the training set only and then applied unchanged to the test set. See the article on Batch filtering for more details.
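The idea behind batch filtering can be sketched without Weka. The hypothetical min-max scaler below (not Weka's Normalize filter) learns its range from the training data only and then applies that same range to the test data, so both outputs are on the same scale. Re-initializing it on the test data silently produces a different, incompatible scale.

```java
import java.util.Arrays;

// Hypothetical min-max scaler illustrating batch filtering: the range is
// learned once from the training data and reused unchanged for test data.
class MinMaxScaler {
    private double min;
    private double max;
    private boolean fitted;

    // Learns min and max from the training values only.
    MinMaxScaler fit(double[] train) {
        min = Arrays.stream(train).min()
                .orElseThrow(() -> new IllegalArgumentException("empty training set"));
        max = Arrays.stream(train).max().getAsDouble();
        fitted = true;
        return this;
    }

    // Scales values with the learned range; test values may fall outside [0, 1].
    double[] transform(double[] values) {
        if (!fitted) {
            throw new IllegalStateException("call fit() on the training data first");
        }
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scaled[i] = range == 0 ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] train = {0, 10};
        double[] test = {5, 20};
        MinMaxScaler scaler = new MinMaxScaler().fit(train); // initialize on training data
        System.out.println(Arrays.toString(scaler.transform(test))); // [0.5, 2.0]
        // Wrong: re-initializing on the test data yields an incompatible scale.
        System.out.println(Arrays.toString(new MinMaxScaler().fit(test).transform(test))); // [0.0, 1.0]
    }
}
```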
How do I perform clustering?
Weka offers clustering capabilities not only as standalone schemes, but also as filters and classifiers. Check out the article about Using cluster algorithms for more detailed information.
Where can I find information regarding ROC curves?
Just check out the ROC category (http://weka.sourceforge.net/wiki/index.php/Category:ROC), which lists all the articles covering the subject of ROC curves and AUC. These articles cover GUI handling as well as how to create ROC curves from code.
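As a quick illustration of what the AUC measures: it equals the fraction of (positive, negative) instance pairs in which the positive instance gets the higher score, with ties counting as half. The quadratic-time sketch below computes exactly that; it is not how Weka computes ROC areas (practical implementations sort the scores instead), and the class name AucSketch is hypothetical.

```java
// Computes the area under the ROC curve (AUC) as the fraction of
// (positive, negative) pairs ranked correctly by the scores; ties count 0.5.
class AucSketch {

    // scores[i]: predicted probability of the positive class for instance i
    // labels[i]: 1 for positive, 0 for negative
    static double auc(double[] scores, int[] labels) {
        double wins = 0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (labels[i] != 1) {
                continue;
            }
            for (int j = 0; j < scores.length; j++) {
                if (labels[j] != 0) {
                    continue;
                }
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative instance");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // 3 of the 4 positive/negative pairs are ranked correctly.
        System.out.println(auc(new double[] {0.9, 0.8, 0.3, 0.1}, new int[] {1, 0, 1, 0})); // 0.75
    }
}
```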
How do I generate Learning curves?
You can generate learning curves using the Advanced mode of the Experimenter. See the article Learning curves for more details.
Can I tune the parameters of a classifier?
Yes, you can do that with one of the following meta-classifiers:
- weka.classifiers.meta.CVParameterSelection
- weka.classifiers.meta.GridSearch (only developer version)
See the Javadoc of the respective classifier or the Optimizing parameters article for more information.
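CVParameterSelection selects parameter values by cross-validation. The principle is sketched below for the number of neighbours k of a tiny, hypothetical one-dimensional nearest-neighbour classifier: each candidate is scored by k-fold cross-validation and the best one is kept. This is not Weka's implementation; for simplicity, it assigns instance i to fold i % folds instead of using randomized, stratified folds.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of parameter selection by cross-validation for a 1-D k-NN classifier.
class CvParamSearch {

    // Majority vote among the k training points closest to `query`
    // (binary labels 0/1; ties go to class 0).
    static int knnPredict(double[] xs, int[] ys, List<Integer> trainIdx, double query, int k) {
        List<Integer> sorted = new ArrayList<>(trainIdx);
        sorted.sort((a, b) -> Double.compare(Math.abs(xs[a] - query), Math.abs(xs[b] - query)));
        int n = Math.min(k, sorted.size());
        int votes = 0;
        for (int i = 0; i < n; i++) {
            votes += ys[sorted.get(i)];
        }
        return 2 * votes > n ? 1 : 0;
    }

    // Accuracy estimated by `folds`-fold CV; instance i belongs to fold i % folds.
    static double cvAccuracy(double[] xs, int[] ys, int k, int folds) {
        int correct = 0;
        for (int f = 0; f < folds; f++) {
            List<Integer> train = new ArrayList<>();
            for (int i = 0; i < xs.length; i++) {
                if (i % folds != f) train.add(i);
            }
            for (int i = 0; i < xs.length; i++) {
                if (i % folds == f && knnPredict(xs, ys, train, xs[i], k) == ys[i]) correct++;
            }
        }
        return (double) correct / xs.length;
    }

    // Returns the candidate with the highest CV accuracy (earliest wins ties).
    static int selectK(double[] xs, int[] ys, int[] candidates, int folds) {
        int bestK = candidates[0];
        double bestAcc = -1;
        for (int k : candidates) {
            double acc = cvAccuracy(xs, ys, k, folds);
            if (acc > bestAcc) {
                bestAcc = acc;
                bestK = k;
            }
        }
        return bestK;
    }

    public static void main(String[] args) {
        double[] xs = {1, 2, 3, 4, 5, 11, 12, 13, 14, 15};
        int[] ys = {0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
        System.out.println("selected k = " + selectK(xs, ys, new int[] {9, 1}, 5)); // selected k = 1
    }
}
```

The same loop structure applies to any parameter; GridSearch extends the idea to two parameters at once.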
How do I use ID attributes?
See the section Instance ID in the Troubleshooting article for more information on how to use ID attributes in Weka.
How can I perform multi-instance learning in Weka?
The article Multi-instance classification explains which classifiers can perform multi-instance classification and what format the data has to be in for these classifiers.
How do I perform cost-sensitive classification?
The article Cost-sensitive classification lists further articles that cover this topic.
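One common way to make predictions cost-sensitive is to choose the class with the lowest expected cost, given the class probabilities from a classifier and a cost matrix. The sketch below shows only that arithmetic; the convention cost[actual][predicted] and the class name MinExpectedCost are choices made for this example, and the Cost-sensitive classification article describes what Weka actually offers.

```java
// Picks the class with minimum expected cost.
// cost[actual][predicted]: cost of predicting `predicted` when the true class is `actual`.
class MinExpectedCost {

    static int bestClass(double[] classProbs, double[][] cost) {
        int best = -1;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int predicted = 0; predicted < cost[0].length; predicted++) {
            // Expected cost = sum over actual classes of P(actual) * cost(actual, predicted).
            double expected = 0;
            for (int actual = 0; actual < classProbs.length; actual++) {
                expected += classProbs[actual] * cost[actual][predicted];
            }
            if (expected < bestCost) {
                bestCost = expected;
                best = predicted;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.8, 0.2};   // the classifier considers class 0 more likely
        double[][] cost = {{0, 1},     // actual 0: predicting 1 costs 1
                           {10, 0}};   // actual 1: predicting 0 costs 10
        // Expected costs: predict 0 -> 2.0, predict 1 -> 0.8
        System.out.println(bestClass(probs, cost)); // prints 1
    }
}
```

With equal misclassification costs the rule reduces to picking the most probable class.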
I have unbalanced data - now what?
You can either perform Cost-sensitive classification or resample your data to get a more balanced class distribution (see supervised Resample (http://weka.sourceforge.net/doc.dev/weka/filters/supervised/instance/Resample.html) filter).
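For the resampling route, the sketch below shows plain random oversampling: instances of smaller classes are drawn with replacement until every class is as large as the largest one. It illustrates the general idea only and is not how Weka's supervised Resample filter is implemented; the class and method names are hypothetical, and the method works on instance indices rather than on a dataset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Random oversampling: returns instance indices in which every class occurs
// as often as the largest class. Original instances are always kept.
class Oversampler {

    static List<Integer> balancedIndices(int[] labels, long seed) {
        // Group instance indices by class label (TreeMap keeps a stable class order).
        Map<Integer, List<Integer>> byClass = new TreeMap<>();
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], c -> new ArrayList<>()).add(i);
        }
        int target = 0;
        for (List<Integer> members : byClass.values()) {
            target = Math.max(target, members.size());
        }
        Random rnd = new Random(seed);
        List<Integer> result = new ArrayList<>();
        for (List<Integer> members : byClass.values()) {
            result.addAll(members);
            // Draw extra copies with replacement until the class reaches the target size.
            for (int n = members.size(); n < target; n++) {
                result.add(members.get(rnd.nextInt(members.size())));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 1};
        System.out.println(balancedIndices(labels, 1L)); // [0, 1, 2, 3, 4, 4, 4, 4]
    }
}
```

Oversampling only duplicates existing minority instances, so it should be applied to the training data only, never to the test data.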
Can I run an experiment using clusterers in the Experimenter?
Yes, see the section Running an Experiment Using Clusterers in the WekaDoc.
Developing with Weka
Where can I get Weka's source code?
Every Weka release comes with a jar archive called weka-src.jar (a jar archive is just a simple ZIP archive) that contains the complete sources. Alternatively, you can also get Weka's source code from CVS.
How do I compile Weka?
You can compile the source code simply with any (Sun-compliant) Java compiler, or use ant or an IDE. Check out the article about Compiling Weka, which contains links to further articles covering ant and IDEs (e.g., Eclipse, NetBeans or JBuilder).
What is CVS and what do I need to do to access it?
CVS (Concurrent Versions System) is the version control system that we use for Weka's source code. In order to access the CVS repository and retrieve the source code from there, you need a CVS client. Check out the article about CVS for HOWTOs on various clients. If you want specific versions of Weka, check out the CVS section on WekaDoc for the version you're interested in.
How do I get the latest bugfixes?
The article How to get the latest bugfixes explains it in detail (it's basically either obtaining the source code from CVS and compiling it yourself or getting a snapshot from the download section).
How do I use Weka's classes in my own code?
It's not that hard to use Weka classes in your own code. The following articles give a good overview of how to do that:
- Use Weka in your Java code
- Programmatic Use
- In general, the articles in the source code (http://weka.sourceforge.net/wiki/index.php/Category:Source_code) category.
Note: Weka is open-source software under the GNU General Public License (http://www.gnu.org/copyleft/gpl.html), which means that if you distribute code that uses Weka, it has to be licensed under the GPL as well.
How do I write a new classifier or filter?
Basically, a classifier needs to be derived from weka.classifiers.Classifier and a filter from weka.filters.Filter. But this is only part of the story. The following articles cover the development of new schemes in great detail:
If your scheme is outside the usual Weka packages, you need to make Weka aware of this package in order to be able to use it in the GUI as well. See How do I add a new classifier, filter, kernel, etc.? for more information about this.
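To give a feel for the shape of a scheme, here is a self-contained sketch of a majority-class learner in the spirit of ZeroR. The abstract class SimpleClassifier and its double[][]/int[] data representation are hypothetical stand-ins so that the example compiles without weka.jar; only the method names buildClassifier and classifyInstance mirror weka.classifiers.Classifier, whose real methods work on Instances and Instance objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a classifier base class; NOT Weka's API.
abstract class SimpleClassifier {
    // Learns from training data: one row of attribute values per instance, plus class labels.
    abstract void buildClassifier(double[][] data, int[] classLabels);

    // Returns the predicted class label for one instance.
    abstract int classifyInstance(double[] instance);
}

// Majority-class learner in the spirit of ZeroR: ignores the attributes and
// always predicts the most frequent class seen during training.
class MajorityClassifier extends SimpleClassifier {
    private int majority;
    private boolean trained;

    @Override
    void buildClassifier(double[][] data, int[] classLabels) {
        Map<Integer, Integer> counts = new HashMap<>();
        int bestCount = 0;
        for (int label : classLabels) {
            int count = counts.merge(label, 1, Integer::sum);
            if (count > bestCount) { // first class to reach the highest count wins
                bestCount = count;
                majority = label;
            }
        }
        trained = bestCount > 0;
    }

    @Override
    int classifyInstance(double[] instance) {
        if (!trained) {
            throw new IllegalStateException("buildClassifier() has not been called with data");
        }
        return majority;
    }

    public static void main(String[] args) {
        MajorityClassifier zeroR = new MajorityClassifier();
        zeroR.buildClassifier(new double[3][1], new int[] {1, 0, 1});
        System.out.println(zeroR.classifyInstance(new double[] {42})); // prints 1
    }
}
```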
How do I add a new classifier, filter, kernel, etc.?
As of Weka 3.4.4, the subclasses of all superclasses that can be edited in the GenericObjectEditor (e.g., subclasses of weka.classifiers.Classifier) are determined dynamically at runtime. Read here for more information.
Can I compile Weka into native code?
Yes, you have the following options:
- Excelsior JET (http://www.excelsior-usa.com/jet.html) - a commercial tool for compiling Java into native code (Windows/Linux)
- gcj (http://gcc.gnu.org/java/) - a free, cross-platform tool for compiling Java into native code
See the article Compiling Weka with gcj for more details.
Can I use Weka from C#?
Yes, you can. Read the Use Weka with the Microsoft .NET Framework article for more information. There is also a tutorial for IKVM available.
Can I use Weka from Python?
Yes and no. If you're starting from scratch, you might want to consider Jython (http://www.jython.org/), an implementation of Python (http://www.python.org/) that integrates seamlessly with Java. The drawback is that you can only use libraries that run on Jython, not C-based extensions like NumPy (http://numpy.scipy.org/) or SciPy (http://www.scipy.org/). The article Using Weka from Jython explains how to use Weka classes from Jython and how to implement a new classifier in Jython, with an example of ZeroR implemented in Jython.
An approach that makes use of the javax.script package (new in Java 6) is Jepp (http://jepp.sourceforge.net/), Java embedded Python. Jepp seems to have the same limitations as Jython, i.e., it cannot import SciPy or NumPy, but it can import pure Python libraries. The article Using Weka via Jepp contains more information and examples.
Another solution for accessing Java from within Python applications is JPype (http://jpype.sourceforge.net/), but it is not yet fully mature.
Serialization is nice, but what about generating actual Java code from Weka classes?
Some of Weka's schemes support the generation of Java source code based on their internal state. See the Generating source code from Weka classes article for more details.
How can I contribute to Weka?
Contributions (or links to them) can either be posted to the Weka Mailing List or sent to the Weka maintainer (normally also the admin of the Weka homepage). New classifiers (and schemes in general) have to meet two conditions: firstly, they have to be published in the proceedings of a renowned conference (e.g., ICML) or as an article in a respected journal (e.g., Machine Learning), and secondly, they have to outperform other standard schemes (e.g., J48/C4.5).
But please bear in mind that we don't have a lot of manpower: being the Weka maintainer is NOT a full-time position.
Windows
How do I modify the CLASSPATH?
- See the article CLASSPATH and check out the section Win32 (2k and XP) for changing the environment variable. This article explains how to add a MySQL jar to the variable.
- With version 3.5.4 or later, you can also just use the RunWeka.ini file to modify your CLASSPATH. For more details, see this (https://list.scms.waikato.ac.nz/mailman/htdig/wekalist/2007-March/009429.html) post.
How do I modify the RunWeka.bat file?
Check out the documentation for your version of Weka on the WekaDoc Wiki, since that file underwent several modifications over time. There's documentation about the CLASSPATH (under Technical documentation in the Appendix) which contains a section for RunWeka.bat.
Troubleshooting
OutOfMemoryException
See the OutOfMemoryException section in the troubleshooting article for more information.
More troubleshooting
If you have further problems with Weka, check out the troubleshooting article as well.
