weka.core
Class Stopwords

java.lang.Object
  extended by weka.core.Stopwords
All Implemented Interfaces:
RevisionHandler

public class Stopwords
extends java.lang.Object
implements RevisionHandler

Class that can test whether a given string is a stop word. Lowercases all words before the test.

The format for reading and writing is one word per line; lines starting with '#' are treated as comments and skipped.
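The following is a hypothetical illustration of this format, not Weka's own code. It parses a reader into a set of lowercased words. The class name StopwordFileFormat is made up. Trimming lines before the '#' check and ignoring blank lines are assumptions, because the description above does not cover them.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the documented stopword file format; not Weka's code.
class StopwordFileFormat {

  // One word per line; lines starting with '#' are comments and are skipped.
  // Assumptions: lines are trimmed before the '#' check, and blank lines are ignored.
  static Set<String> parse(BufferedReader reader) {
    return reader.lines()
        .map(String::trim)
        .filter(line -> !line.isEmpty() && !line.startsWith("#"))
        .map(line -> line.toLowerCase(Locale.ROOT)) // words are stored in lower case
        .collect(Collectors.toCollection(LinkedHashSet::new));
  }

  public static void main(String[] args) {
    String text = "# my stopwords\nThe\n  and \n\n# skipped\nof\n";
    Set<String> words = parse(new BufferedReader(new StringReader(text)));
    System.out.println(words); // prints [the, and, of]
  }
}
```

A LinkedHashSet keeps the words in file order, which makes the printed result predictable. Any Set would satisfy the lookup use case.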

The default stopwords are based on Rainbow.

When run from the command line (see main), the class accepts the following parameters:

-i file
loads the stopwords from the given file

-o file
saves the stopwords to the given file

-p
outputs the current stopwords on stdout

Any additional parameters are interpreted as words to test as stopwords.
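This hypothetical sketch shows one way such an argument list could be split into an input file, an output file, a print flag, and the remaining words to test. StopwordsCommandLine and its fields are illustrative names, not part of Weka. The sketch assumes the options may appear in any position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of splitting the documented -i/-o/-p options; not Weka's code.
class StopwordsCommandLine {
  String inputFile;   // value of -i, or null if absent
  String outputFile;  // value of -o, or null if absent
  boolean print;      // true if -p was given
  final List<String> wordsToTest = new ArrayList<>();

  static StopwordsCommandLine parse(String[] args) {
    StopwordsCommandLine cl = new StopwordsCommandLine();
    for (int i = 0; i < args.length; i++) {
      switch (args[i]) {
        case "-i":
          cl.inputFile = requireValue(args, ++i, "-i"); // consume the file name
          break;
        case "-o":
          cl.outputFile = requireValue(args, ++i, "-o"); // consume the file name
          break;
        case "-p":
          cl.print = true;
          break;
        default:
          // Anything that is not an option is treated as a word to test.
          cl.wordsToTest.add(args[i]);
      }
    }
    return cl;
  }

  private static String requireValue(String[] args, int index, String option) {
    if (index >= args.length) {
      throw new IllegalArgumentException(option + " requires a file name");
    }
    return args[index];
  }

  public static void main(String[] args) {
    StopwordsCommandLine cl =
        parse(new String[] {"-i", "stop.txt", "-p", "the", "weka"});
    System.out.println(cl.inputFile + " " + cl.print + " " + cl.wordsToTest);
    // prints: stop.txt true [the, weka]
  }
}
```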

Version:
$Revision: 8034 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Ashraf M. Kibriya (amk14@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)

Constructor Summary
Stopwords()
          Initializes the stopwords with the default list (based on Rainbow).
 
Method Summary
 void add(java.lang.String word)
          Adds the given word to the stopword list (the word is automatically trimmed and converted to lower case).
 void clear()
          Removes all stopwords.
 java.util.Enumeration elements()
          Returns a sorted enumeration over all stored stopwords.
 java.lang.String getRevision()
          Returns the revision string.
 boolean is(java.lang.String word)
          Returns true if the given string is a stop word.
static boolean isStopword(java.lang.String str)
          Returns true if the given string is a stop word according to the default list.
static void main(java.lang.String[] args)
          Command-line entry point; accepts the -i, -o and -p options and any words to test (see the method detail).

 void read(java.io.BufferedReader reader)
          Reads the stopwords from the given reader into this object.
 void read(java.io.File file)
          Reads the stopwords from the given file into this object.
 void read(java.lang.String filename)
          Reads the stopwords from the given file into this object.
 boolean remove(java.lang.String word)
          Removes the word from the stopword list.
 java.lang.String toString()
          Returns the current stopwords as a string.
 void write(java.io.BufferedWriter writer)
          Writes the current stopwords to the given writer.
 void write(java.io.File file)
          Writes the current stopwords to the given file
 void write(java.lang.String filename)
          Writes the current stopwords to the given file
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Stopwords

public Stopwords()
Initializes the stopwords with the default list (based on Rainbow).

Method Detail

clear

public void clear()
Removes all stopwords.


add

public void add(java.lang.String word)
Adds the given word to the stopword list. The word is automatically trimmed and converted to lower case.

Parameters:
word - the word to add
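Here is a minimal sketch of the normalization described for add(), assuming a HashSet as the backing store. NormalizingWordSet is a hypothetical class. Skipping words that are empty after trimming is an assumption that this Javadoc does not state.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of add() normalization; not Weka's code.
class NormalizingWordSet {
  private final Set<String> words = new HashSet<>();

  // Trims and lowercases before storing, so "The", " the " and "THE" collapse to one entry.
  void add(String word) {
    String normalized = word.trim().toLowerCase(Locale.ROOT);
    // Assumption: empty strings are not stored.
    if (!normalized.isEmpty()) {
      words.add(normalized);
    }
  }

  int size() {
    return words.size();
  }

  boolean contains(String normalizedWord) {
    return words.contains(normalizedWord);
  }

  public static void main(String[] args) {
    NormalizingWordSet set = new NormalizingWordSet();
    set.add("The");
    set.add("  the ");
    set.add("THE");
    System.out.println(set.size()); // prints 1
  }
}
```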

remove

public boolean remove(java.lang.String word)
Removes the word from the stopword list.

Parameters:
word - the word to remove
Returns:
true if the word was found in the list and then removed

is

public boolean is(java.lang.String word)
Returns true if the given string is a stop word.

Parameters:
word - the word to test
Returns:
true if the word is a stopword
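The class description says that words are lowercased before the test, and remove() reports whether the word was present. The hypothetical CaseInsensitiveStopwords class below sketches both behaviours with java.util.Set, whose remove method already returns that boolean. Normalizing the argument of remove() the same way as add() is an assumption.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of is() and remove(); not Weka's code.
class CaseInsensitiveStopwords {
  private final Set<String> words = new HashSet<>();

  CaseInsensitiveStopwords(String... initial) {
    for (String w : initial) {
      words.add(w.trim().toLowerCase(Locale.ROOT));
    }
  }

  // Lowercases the word before testing, as the class description states.
  boolean is(String word) {
    return words.contains(word.toLowerCase(Locale.ROOT));
  }

  // Set.remove returns true only if the element was present and has been removed.
  // Assumption: the argument is normalized like add() does.
  boolean remove(String word) {
    return words.remove(word.trim().toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    CaseInsensitiveStopwords sw = new CaseInsensitiveStopwords("the", "and");
    System.out.println(sw.is("The"));     // prints true
    System.out.println(sw.remove("AND")); // prints true
    System.out.println(sw.remove("and")); // prints false (already removed)
    System.out.println(sw.is("and"));     // prints false
  }
}
```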

elements

public java.util.Enumeration elements()
Returns a sorted enumeration over all stored stopwords

Returns:
the enumeration over all stopwords
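One straightforward way to produce a sorted Enumeration from an unordered set is to copy the set into a list, sort the list, and wrap it with Collections.enumeration. This sketch does not necessarily reflect how Weka does it. It also uses a generic Enumeration<String>, whereas the documented signature returns a raw java.util.Enumeration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a sorted elements() view; not Weka's code.
class SortedStopwordEnumeration {

  // Copies the unordered set into a list, sorts it in natural (lexicographic) order,
  // and wraps it in a legacy Enumeration.
  static Enumeration<String> sortedElements(Set<String> words) {
    List<String> sorted = new ArrayList<>(words);
    Collections.sort(sorted);
    return Collections.enumeration(sorted);
  }

  public static void main(String[] args) {
    Set<String> words = new HashSet<>(Arrays.asList("of", "and", "the", "a"));
    Enumeration<String> e = sortedElements(words);
    while (e.hasMoreElements()) {
      System.out.println(e.nextElement()); // prints a, and, of, the (one per line)
    }
  }
}
```

Sorting a copy leaves the stored set untouched, so later calls to add() or remove() do not affect an enumeration that is already in use.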

read

public void read(java.lang.String filename)
          throws java.lang.Exception
Reads the stopwords from the given file into this object.

Parameters:
filename - the file to read the stopwords from
Throws:
java.lang.Exception - if reading fails

read

public void read(java.io.File file)
          throws java.lang.Exception
Reads the stopwords from the given file into this object.

Parameters:
file - the file to read the stopwords from
Throws:
java.lang.Exception - if reading fails

read

public void read(java.io.BufferedReader reader)
          throws java.lang.Exception
Reads the stopwords from the given reader into this object. The reader is closed automatically.

Parameters:
reader - the reader to get the stopwords from
Throws:
java.lang.Exception - if reading fails
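The three read overloads suggest a delegation chain that goes from file name to File to BufferedReader. The hypothetical StopwordReaderOverloads class below sketches that chain. It uses try-with-resources so that the reader is closed once it has been consumed, which matches the note above. Clearing the existing words first is an assumption.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of how the read overloads could delegate; not Weka's code.
class StopwordReaderOverloads {
  final List<String> words = new ArrayList<>();

  // read(String) delegates to read(File).
  void read(String filename) throws IOException {
    read(new File(filename));
  }

  // read(File) delegates to read(BufferedReader).
  void read(File file) throws IOException {
    read(new BufferedReader(new FileReader(file)));
  }

  // Consumes the reader and then closes it, as documented for read(BufferedReader).
  void read(BufferedReader reader) throws IOException {
    try (BufferedReader in = reader) {
      words.clear(); // Assumption: reading replaces the current list.
      String line;
      while ((line = in.readLine()) != null) {
        line = line.trim();
        if (!line.isEmpty() && !line.startsWith("#")) {
          words.add(line.toLowerCase(Locale.ROOT));
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    StopwordReaderOverloads sw = new StopwordReaderOverloads();
    BufferedReader reader = new BufferedReader(new StringReader("# demo\nThe\nof\n"));
    sw.read(reader);
    System.out.println(sw.words); // prints [the, of]
    // The reader is closed now; calling reader.readLine() would throw an IOException.
  }
}
```

Because the reader is closed for the caller, it should not be reused after a call to read(BufferedReader).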

write

public void write(java.lang.String filename)
           throws java.lang.Exception
Writes the current stopwords to the given file

Parameters:
filename - the file to write the stopwords to
Throws:
java.lang.Exception - if writing fails

write

public void write(java.io.File file)
           throws java.lang.Exception
Writes the current stopwords to the given file

Parameters:
file - the file to write the stopwords to
Throws:
java.lang.Exception - if writing fails

write

public void write(java.io.BufferedWriter writer)
           throws java.lang.Exception
Writes the current stopwords to the given writer. The writer is closed automatically.

Parameters:
writer - the writer to write the stopwords to
Throws:
java.lang.Exception - if writing fails
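Here is a minimal sketch of writing in the documented one-word-per-line format and closing the writer afterwards. StopwordWriterSketch is hypothetical, and writing the words in sorted order is an assumption. Weka's actual output may differ in detail.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical sketch of write(BufferedWriter); not Weka's code.
class StopwordWriterSketch {

  // Writes one word per line and closes the writer afterwards, as the Javadoc describes.
  // Assumption: words are written in sorted order (a TreeSet sorts and de-duplicates them).
  static void write(Collection<String> words, BufferedWriter writer) throws IOException {
    try (BufferedWriter out = writer) {
      for (String word : new TreeSet<>(words)) {
        out.write(word);
        out.newLine(); // platform line separator
      }
    }
  }

  public static void main(String[] args) throws IOException {
    StringWriter buffer = new StringWriter();
    write(Arrays.asList("the", "and", "of"), new BufferedWriter(buffer));
    System.out.print(buffer); // prints and, of, the (one per line)
  }
}
```

The output contains only words, one per line, so it can be read back by any reader that follows the same format.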

toString

public java.lang.String toString()
Returns the current stopwords as a string.

Overrides:
toString in class java.lang.Object
Returns:
the current stopwords

isStopword

public static boolean isStopword(java.lang.String str)
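Returns true if the given string is a stop word. This static convenience method tests against the default (Rainbow-based) list rather than against a particular Stopwords instance.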

Parameters:
str - the word to test
Returns:
true if the word is a stopword
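Because isStopword is static, it cannot consult a particular Stopwords instance. One plausible design is a lazily created shared instance that holds the default list. The sketch below uses the initialization-on-demand holder idiom. DefaultStopwords and its five-word list are hypothetical stand-ins for the much longer Rainbow-based list, not Weka's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a static check backed by one shared default list; not Weka's code.
class DefaultStopwords {
  // Tiny stand-in for the real default list.
  private final Set<String> words =
      new HashSet<>(Arrays.asList("a", "an", "and", "of", "the"));

  boolean is(String word) {
    return words.contains(word.toLowerCase(Locale.ROOT));
  }

  // Initialization-on-demand holder: the JVM creates INSTANCE lazily and thread-safely
  // the first time Holder is accessed.
  private static final class Holder {
    static final DefaultStopwords INSTANCE = new DefaultStopwords();
  }

  static boolean isStopword(String str) {
    return Holder.INSTANCE.is(str);
  }

  public static void main(String[] args) {
    System.out.println(isStopword("The"));  // prints true
    System.out.println(isStopword("weka")); // prints false
  }
}
```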

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Returns:
the revision

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Accepts the following parameters:

-i file
loads the stopwords from the given file

-o file
saves the stopwords to the given file

-p
outputs the current stopwords on stdout

Any additional parameters are interpreted as words to test as stopwords.

Parameters:
args - command-line parameters
Throws:
java.lang.Exception - if something goes wrong