en:Databases (3.5.4)

From WekaDoc

(Redirected from en:Databases (3.5.6))
Table of contents

CLASSPATH

See the CLASSPATH article for how to set up your CLASSPATH environment variable, in order to make the JDBC driver available for Weka.

Configuration files

Thanks to JDBC it is easy to connect to Databases that provide a JDBC driver. The base setup is loaded from a properties file named

DatabaseUtils.props

and located in your user.home or in the current directory. The default properties file is located in the weka.experiment package. It refers to a idb JDBC connection to a InstantDB database; InstantDB seems a [product (http://en.wikipedia.org/wiki/Lutris_Technologiesdead)] no supported anymore.

You can get sample properties file from the weka.jar or weka-src.jar jar-archive, both part of a normal Weka release. If you open up one of those files, you'll find the properties file in the sub-folder weka/experiment.

Weka already comes with example setup files for different databases:

  • HSQLDB (http://hsqldb.sourceforge.net/)
    DatabaseUtils.props.hsql
  • MS SQL Server (http://www.microsoft.com/sql/)
    DatabaseUtils.props.mssqlserver
  • MySQL (http://www.mysql.com/)
    DatabaseUtils.props.mysql
  • ODBC
    DatabaseUtils.props.odbc - making use of the JDBC-ODBC-bridge that comes with Sun's JDK. With this you can access, e.g., MS Access. For more information check out the Windows Databases article.
  • Oracle (http://www.oracle.com/)
    DatabaseUtils.props.oracle
  • PostgreSQL (http://www.postgresql.org/)
    DatabaseUtils.props.postgresql

How property files are processed, you can find here.

Setup

Under normal circumstances you only have to edit the following two properties:

  • jdbcDriver
    the classname of the JDBC driver, necessary to connect to your database, e.g.
    • MySQL
      org.gjt.mm.mysql.Driver
    • PostgreSQL
      org.postgresql.Driver
    • Oracle
      oracle.jdbc.driver.OracleDriver
  • jdbcURL
    the JDBC URL pointing to your database (can be still changed in the Experimenter/Explorer), e.g. for the database MyDatabase on the server server.my.domain:
    • MySQL
      jdbc:mysql://server.my.domain:3306/MyDatabase
    • PostgreSQL
      jdbc:postgresql://server.my.domain:5432/MyDatabase
      You can also specify user and password directly in the URL:
      jdbc:postgresql://server.my.domain:5432/MyDatabase?user=<...>&password=<...>
      where you have to replace the <...> with the correct values
    • Oracle (thin driver)
      jdbc:oracle:thin:@server.my.domain:1526:orcl
      (Note: @machineName:port:SID)
      for the Express Edition you can use
      jdbc:oracle:thin:@localhost:1521:XE

Missing Datatypes

Sometimes (e.g. with MySQL) it can happen that a column type cannot be interpreted. In that case it is necessary to map the name of the column type to the Java type it should be interpreted as. E.g. the MySQL type TEXT is returned as BLOB from the JDBC driver and has to be mapped to String (0 represents String - the mappings can be found in the comments of the properties file):

BLOB=0

Stored Procedures

Let's say you're tired of typing the same query over and over again. A good way to shorten that, is to create a stored procedure.

PostgreSQL 7.4.x

The following example creates a procedure called emplyoee_name" that returns the names of all the employees in table employee. Even though it doesn't make much sense to create a stored procedure for this query, nonetheless, it shows how to create and call stored procedures in PostgreSQL.

  • Create
CREATE OR REPLACE FUNCTION public.employee_name()
 RETURNS SETOF text AS 'select name from employee'
 LANGUAGE 'sql' VOLATILE;
  • SQL statement to call procedure
SELECT * FROM employee_name()
  • Retrieve data via InstanceQuery (http://weka.sourceforge.net/doc/weka/experiment/InstanceQuery.html)
java weka.experiment.InstanceQuery -Q "SELECT * FROM employee_name()" -U <user> -P <password>

Troubleshooting

  • In case you're experiencing problems connecting to your database, check out the WEKA Mailing List. It is possible that somebody else encountered the same problem as you and you'll find a post containing the solution to your problem.
  • Specific MS SQL Server 2000 Troubleshooting
  • The Added driver: ... output on the commandline does not mean that the actual class was found, but only that Weka will attempt to load the class later on in order to establish a database connection.
  • The error message No suitable driver can be caused by the following:
    • The JDBC driver you are attempting to load is not in the CLASSPATH (Note: using -jar in the java commandline overwrites the CLASSPATH environment variable!)
    • The JDBC driver class is misspelled in the jdbcDriver property
    • The jdbcURL property has a spelling error and tries to use a non-existing protocol

Links