public class XMeans
extends weka.clusterers.RandomizableClusterer
implements weka.core.TechnicalInformationHandler
BibTeX:
@inproceedings{Pelleg2000,
  author = {Dan Pelleg and Andrew W. Moore},
  booktitle = {Seventeenth International Conference on Machine Learning},
  pages = {727-734},
  publisher = {Morgan Kaufmann},
  title = {X-means: Extending K-means with Efficient Estimation of the Number of Clusters},
  year = {2000}
}
Valid options are:
-I <num> maximum number of overall iterations (default 1).
-M <num> maximum number of iterations in the kMeans loop in the Improve-Parameter part (default 1000).
-J <num> maximum number of iterations in the kMeans loop for the split centroids in the Improve-Structure part (default 1000).
-L <num> minimum number of clusters (default 2).
-H <num> maximum number of clusters (default 4).
-B <value> distance value for binary attributes (default 1.0).
-use-kdtree Uses the KDTree internally (default no).
-K <KDTree class specification> Full class name of the KDTree class to use, followed by scheme options, e.g. "weka.core.neighboursearch.KDTree -P" (default no KDTree class used).
-C <value> cutoff factor, takes the given percentage of the split centroids if none of the children win (default 0.0).
-D <distance function class specification> Full class name of Distance function class to use, followed by scheme options. (default weka.core.EuclideanDistance).
-N <file name> file to read starting centers from (ARFF format).
-O <file name> file to write centers to (ARFF format).
-U <int> The debug level. (default 0)
-Y <file name> The debug vectors file.
-S <num> Random number seed. (default 10)
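The flags above are passed to the clusterer as a string array. The following is a minimal sketch of assembling such an array; `XMeansOptionsSketch` and `buildOptions` are hypothetical names introduced for illustration and are not part of Weka, and the range check is this sketch's own sanity check rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Weka: assembles an XMeans option array. */
class XMeansOptionsSketch {

    /** Builds an option array from a few of the flags documented above. */
    static String[] buildOptions(int minClusters, int maxClusters,
                                 int seed, boolean useKdTree) {
        // Sanity check of this sketch only; Weka's own validation may differ.
        if (maxClusters < minClusters) {
            throw new IllegalArgumentException("maxClusters < minClusters");
        }
        List<String> opts = new ArrayList<>();
        opts.add("-L");                          // minimum number of clusters
        opts.add(Integer.toString(minClusters));
        opts.add("-H");                          // maximum number of clusters
        opts.add(Integer.toString(maxClusters));
        opts.add("-S");                          // random number seed
        opts.add(Integer.toString(seed));
        if (useKdTree) {
            opts.add("-use-kdtree");             // flag that takes no value
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions(2, 6, 42, false);
        System.out.println(String.join(" ", options));
        // With Weka on the classpath, the array would then be handed over:
        //   weka.clusterers.XMeans xmeans = new weka.clusterers.XMeans();
        //   xmeans.setOptions(options);
        //   xmeans.buildClusterer(data);   // data: weka.core.Instances
    }
}
```

The commented calls use only the `setOptions(String[])` and `buildClusterer(Instances)` signatures listed on this page; flags that are left out keep their documented defaults.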
See Also: RandomizableClusterer, Serialized Form
Modifier and Type | Field and Description |
---|---|
static int | D_CONVCHCLOSER: have a closer look at converge children. |
static int | D_CURR: for current debug. |
static int | D_FOLLOWSPLIT: follows the splitting of the centers. |
static int | D_GENERAL: general debugging. |
static int | D_ITERCOUNT: follow iterations. |
static int | D_KDTREE: check on kdtree. |
static int | D_METH_MISUSE: functions were maybe misused. |
static int | D_PRINTCENTERS: print the centers. |
static int | D_RANDOMVECTOR: check on random vectors. |
boolean | m_CurrDebugFlag: Flag: I'm debugging. |
static int | R_HIGH: Index in ranges for HIGH. |
static int | R_LOW: Index in ranges for LOW. |
static int | R_WIDTH: Index in ranges for WIDTH. |
Constructor and Description |
---|
XMeans(): the default constructor. |
Modifier and Type | Method and Description |
---|---|
java.lang.String | binValueTipText(): Returns the tip text for this property. |
void | buildClusterer(weka.core.Instances data): Generates the X-Means clusterer. |
boolean | checkForNominalAttributes(weka.core.Instances data): Checks for nominal attributes in the dataset. |
int | clusterInstance(weka.core.Instance instance): Classifies a given instance. |
java.lang.String | cutOffFactorTipText(): Returns the tip text for this property. |
java.lang.String | debugLevelTipText(): Returns the tip text for this property. |
java.lang.String | debugVectorsFileTipText(): Returns the tip text for this property. |
java.lang.String | distanceFTipText(): Returns the tip text for this property. |
double | getBinValue(): Gets the value that represents true in a new numeric attribute. |
weka.core.Capabilities | getCapabilities(): Returns default capabilities of the clusterer. |
weka.core.Instances | getClusterCenters(): Returns the centers of the clusters as an Instances object. |
double | getCutOffFactor(): Gets the cutoff factor. |
int | getDebugLevel(): Gets the debug level. |
java.io.File | getDebugVectorsFile(): Gets the file name for a file that has the random vectors stored. |
weka.core.DistanceFunction | getDistanceF(): Gets the distance function. |
java.io.File | getInputCenterFile(): Gets the file to read the list of centers from. |
weka.core.neighboursearch.KDTree | getKDTree(): Gets the KDTree class. |
int | getMaxIterations(): Gets the maximum number of iterations. |
int | getMaxKMeans(): Gets the maximum number of iterations in KMeans. |
int | getMaxKMeansForChildren(): Gets the maximum number of KMeans iterations performed on the child centers. |
int | getMaxNumClusters(): Gets the maximum number of clusters to generate. |
int | getMinNumClusters(): Gets the minimum number of clusters to generate. |
weka.core.Instance | getNextDebugVectorsInstance(weka.core.Instances model): Reads an instance from the debug vectors file. |
java.lang.String[] | getOptions(): Gets the current settings of XMeans. |
java.io.File | getOutputCenterFile(): Gets the file to write the list of centers to. |
java.lang.String | getRevision(): Returns the revision string. |
weka.core.TechnicalInformation | getTechnicalInformation(): Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on. |
boolean | getUseKDTree(): Gets whether the KDTree is used or not. |
java.lang.String | globalInfo(): Returns a string describing this clusterer. |
void | initDebugVectorsInput(): Initialises the debug vector input. |
java.lang.String | inputCenterFileTipText(): Returns the tip text for this property. |
java.lang.String | KDTreeTipText(): Returns the tip text for this property. |
java.util.Enumeration<weka.core.Option> | listOptions(): Returns an enumeration describing the available options. |
static void | main(java.lang.String[] argv): Main method for testing this class. |
java.lang.String | maxIterationsTipText(): Returns the tip text for this property. |
java.lang.String | maxKMeansForChildrenTipText(): Returns the tip text for this property. |
java.lang.String | maxKMeansTipText(): Returns the tip text for this property. |
java.lang.String | maxNumClustersTipText(): Returns the tip text for this property. |
java.lang.String | minNumClustersTipText(): Returns the tip text for this property. |
int | numberOfClusters(): Returns the number of clusters. |
java.lang.String | outputCenterFileTipText(): Returns the tip text for this property. |
void | setBinValue(double value): Sets the distance value between true and false of binary attributes. |
void | setCutOffFactor(double i): Sets a new cutoff factor. |
void | setDebugLevel(int d): Sets the debug level. |
void | setDebugVectorsFile(java.io.File value): Sets the file that has the random vectors stored. |
void | setDistanceF(weka.core.DistanceFunction distanceF): Sets the distance function. |
void | setInputCenterFile(java.io.File value): Sets the file to read the list of centers from. |
void | setKDTree(weka.core.neighboursearch.KDTree k): Sets the KDTree class. |
void | setMaxIterations(int i): Sets the maximum number of iterations to perform. |
void | setMaxKMeans(int i): Sets the maximum number of iterations to perform in KMeans. |
void | setMaxKMeansForChildren(int i): Sets the maximum number of KMeans iterations performed on the child centers. |
void | setMaxNumClusters(int n): Sets the maximum number of clusters to generate. |
void | setMinNumClusters(int n): Sets the minimum number of clusters to generate. |
void | setOptions(java.lang.String[] options): Parses a given list of options. |
void | setOutputCenterFile(java.io.File value): Sets the file to write the list of centers to. |
void | setUseKDTree(boolean value): Sets whether to use the KDTree or not. |
java.lang.String | toString(): Returns a string describing this clusterer. |
java.lang.String | useKDTreeTipText(): Returns the tip text for this property. |
Methods inherited from class weka.clusterers.AbstractClusterer: debugTipText, distributionForInstance, doNotCheckCapabilitiesTipText, forName, getDebug, getDoNotCheckCapabilities, makeCopies, makeCopy, postExecution, preExecution, run, runClusterer, setDebug, setDoNotCheckCapabilities
public static int R_LOW
public static int R_HIGH
public static int R_WIDTH
public static int D_PRINTCENTERS
public static int D_FOLLOWSPLIT
public static int D_CONVCHCLOSER
public static int D_RANDOMVECTOR
public static int D_KDTREE
public static int D_ITERCOUNT
public static int D_METH_MISUSE
public static int D_CURR
public static int D_GENERAL
public boolean m_CurrDebugFlag
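The R_LOW, R_HIGH and R_WIDTH fields are described as indices into a ranges structure. As a rough sketch of how such indices are typically used, the example below computes a low, high and width entry per attribute. The index values 0, 1 and 2, the `[attribute][index]` array shape, and the class name `RangesSketch` are all assumptions made for illustration; this page does not state them.

```java
/** Illustrative only: per-attribute ranges indexed by LOW, HIGH and WIDTH. */
class RangesSketch {
    // Assumed index values; the real constants are R_LOW, R_HIGH and R_WIDTH.
    static final int LOW = 0;
    static final int HIGH = 1;
    static final int WIDTH = 2;

    /** Returns ranges[attribute][LOW|HIGH|WIDTH] for a non-empty, row-major data set. */
    static double[][] computeRanges(double[][] data) {
        int numAttributes = data[0].length;
        double[][] ranges = new double[numAttributes][3];
        for (int a = 0; a < numAttributes; a++) {
            ranges[a][LOW] = Double.POSITIVE_INFINITY;
            ranges[a][HIGH] = Double.NEGATIVE_INFINITY;
        }
        // Track the smallest and largest value seen for every attribute.
        for (double[] row : data) {
            for (int a = 0; a < numAttributes; a++) {
                ranges[a][LOW] = Math.min(ranges[a][LOW], row[a]);
                ranges[a][HIGH] = Math.max(ranges[a][HIGH], row[a]);
            }
        }
        for (int a = 0; a < numAttributes; a++) {
            ranges[a][WIDTH] = ranges[a][HIGH] - ranges[a][LOW];
        }
        return ranges;
    }

    public static void main(String[] args) {
        double[][] data = { {1.0, 10.0}, {4.0, 20.0}, {2.0, 15.0} };
        double[][] ranges = computeRanges(data);
        System.out.println(ranges[0][LOW] + " " + ranges[0][HIGH] + " " + ranges[0][WIDTH]);
    }
}
```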
public java.lang.String globalInfo()
public weka.core.TechnicalInformation getTechnicalInformation()
getTechnicalInformation
in interface weka.core.TechnicalInformationHandler
public weka.core.Capabilities getCapabilities()
getCapabilities
in interface weka.clusterers.Clusterer
getCapabilities
in interface weka.core.CapabilitiesHandler
getCapabilities
in class weka.clusterers.AbstractClusterer
public void buildClusterer(weka.core.Instances data) throws java.lang.Exception
buildClusterer
in interface weka.clusterers.Clusterer
buildClusterer
in class weka.clusterers.AbstractClusterer
data
- set of instances serving as training data
java.lang.Exception
- if the clusterer has not been generated successfully
public boolean checkForNominalAttributes(weka.core.Instances data)
data
- the data to check
public int clusterInstance(weka.core.Instance instance) throws java.lang.Exception
clusterInstance
in interface weka.clusterers.Clusterer
clusterInstance
in class weka.clusterers.AbstractClusterer
instance
- the instance to be assigned to a cluster
java.lang.Exception
- if instance could not be classified successfully
public int numberOfClusters()
numberOfClusters
in interface weka.clusterers.Clusterer
numberOfClusters
in class weka.clusterers.AbstractClusterer
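To make the relationship between clusterInstance and numberOfClusters concrete, here is a minimal sketch that assigns a point to the nearest of a set of centers. It assumes nearest-center assignment under Euclidean distance (the documented default distance function) and works on plain arrays; the real method takes a weka.core.Instance and uses the configured DistanceFunction. `NearestCenterSketch` is a hypothetical name.

```java
/** Illustrative only: nearest-center assignment on plain double arrays. */
class NearestCenterSketch {

    /** Squared Euclidean distance between two points of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Returns an index in [0, centers.length) of the center closest to the instance. */
    static int clusterInstance(double[] instance, double[][] centers) {
        if (centers.length == 0) {
            throw new IllegalStateException("no cluster centers available");
        }
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = squaredDistance(instance, centers[c]);
            if (d < bestDistance) {   // ties keep the lower index
                bestDistance = d;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = { {0.0, 0.0}, {10.0, 10.0} };
        System.out.println(clusterInstance(new double[] {1.0, 2.0}, centers));
        System.out.println(clusterInstance(new double[] {9.0, 8.0}, centers));
    }
}
```

The returned index is always smaller than the number of centers, which mirrors how a cluster index relates to the value reported by numberOfClusters().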
public java.util.Enumeration<weka.core.Option> listOptions()
listOptions
in interface weka.core.OptionHandler
listOptions
in class weka.clusterers.RandomizableClusterer
public java.lang.String minNumClustersTipText()
public void setMinNumClusters(int n)
n
- the minimum number of clusters to generate
public int getMinNumClusters()
public java.lang.String maxNumClustersTipText()
public void setMaxNumClusters(int n)
n
- the maximum number of clusters to generate
public int getMaxNumClusters()
public java.lang.String maxIterationsTipText()
public void setMaxIterations(int i) throws java.lang.Exception
i
- the number of iterations
java.lang.Exception
- if i is less than 1
public int getMaxIterations()
public java.lang.String maxKMeansTipText()
public void setMaxKMeans(int i)
i
- the number of iterations
public int getMaxKMeans()
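The role of an iteration cap such as the one set by setMaxKMeans (option -M) can be illustrated with a small k-means loop on one-dimensional data. This is a sketch under stated assumptions, not Weka's implementation: the stopping rule (no assignment changed) and the name `CappedKMeansSketch` are chosen for illustration only.

```java
import java.util.Arrays;

/** Illustrative only: 1-D k-means that stops at convergence or at an iteration cap. */
class CappedKMeansSketch {

    /** Updates centers in place and returns the number of passes actually run. */
    static int run(double[] data, double[] centers, int maxIterations) {
        int[] assignment = new int[data.length];
        Arrays.fill(assignment, -1);
        int iterations = 0;
        boolean changed = true;
        while (changed && iterations < maxIterations) {
            changed = false;
            iterations++;
            // Assignment step: attach every point to its nearest center.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < centers.length; c++) {
                    if (Math.abs(data[i] - centers[c]) < Math.abs(data[i] - centers[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            // Update step: move each center to the mean of its points.
            double[] sum = new double[centers.length];
            int[] count = new int[centers.length];
            for (int i = 0; i < data.length; i++) {
                sum[assignment[i]] += data[i];
                count[assignment[i]]++;
            }
            for (int c = 0; c < centers.length; c++) {
                if (count[c] > 0) {
                    centers[c] = sum[c] / count[c];
                }
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 10, 11, 12};
        double[] centers = {0.0, 5.0};
        int passes = run(data, centers, 1000);
        System.out.println(passes + " passes, centers " + Arrays.toString(centers));
    }
}
```

With a generous cap the loop ends when assignments stop changing; with a cap of 1 it returns after a single pass even though the centers have not settled.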
public java.lang.String maxKMeansForChildrenTipText()
public void setMaxKMeansForChildren(int i)
i
- the number of iterations
public int getMaxKMeansForChildren()
public java.lang.String cutOffFactorTipText()
public void setCutOffFactor(double i)
i
- the new cutoff factor
public double getCutOffFactor()
public java.lang.String binValueTipText()
public double getBinValue()
public void setBinValue(double value)
value
- the distance
public java.lang.String distanceFTipText()
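The bin value is described both as the value that represents true in a new numeric attribute and as the distance between true and false. One reading that satisfies both descriptions is sketched below: false is encoded as 0.0 and true as the bin value. This encoding and the name `BinaryValueSketch` are assumptions; the page does not spell out how Weka performs the conversion.

```java
/** Illustrative only: encoding a binary attribute so that true and false lie binValue apart. */
class BinaryValueSketch {

    /** Encodes false as 0.0 and true as binValue (assumed encoding). */
    static double encode(boolean flag, double binValue) {
        return flag ? binValue : 0.0;
    }

    /** Distance contribution of a single binary attribute after encoding. */
    static double distance(boolean a, boolean b, double binValue) {
        return Math.abs(encode(a, binValue) - encode(b, binValue));
    }

    public static void main(String[] args) {
        System.out.println(distance(true, false, 1.0));  // differing values
        System.out.println(distance(true, true, 1.0));   // equal values
    }
}
```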
public void setDistanceF(weka.core.DistanceFunction distanceF)
distanceF
- the distance function with all options set
public weka.core.DistanceFunction getDistanceF()
public java.lang.String debugVectorsFileTipText()
public void setDebugVectorsFile(java.io.File value)
value
- the file to read the random vectors from
public java.io.File getDebugVectorsFile()
public void initDebugVectorsInput() throws java.lang.Exception
java.lang.Exception
- if there is an error opening the debug input file.
public weka.core.Instance getNextDebugVectorsInstance(weka.core.Instances model) throws java.lang.Exception
model
- the data model for the instance.
java.lang.Exception
- if there are no debug vectors in m_DebugVectors.
public java.lang.String inputCenterFileTipText()
public void setInputCenterFile(java.io.File value)
value
- the file to read centers from
public java.io.File getInputCenterFile()
public java.lang.String outputCenterFileTipText()
public void setOutputCenterFile(java.io.File value)
value
- file to write centers to
public java.io.File getOutputCenterFile()
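The center files are documented as being in ARFF format. The sketch below renders a set of numeric centers as ARFF text using only the standard library. The relation name, the attribute names and the class `CentersArffSketch` are hypothetical, and names are assumed to contain no spaces or special characters (which would need quoting in ARFF); whether XMeans expects specific names in such a file is not stated on this page.

```java
/** Illustrative only: renders numeric cluster centers as ARFF text. */
class CentersArffSketch {

    /** Builds an ARFF document with one numeric attribute per dimension. */
    static String toArff(String relation, String[] attributeNames, double[][] centers) {
        StringBuilder sb = new StringBuilder();
        // Header section: relation name followed by attribute declarations.
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : attributeNames) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        // Data section: one comma-separated row per center.
        sb.append("\n@data\n");
        for (double[] center : centers) {
            for (int i = 0; i < center.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(center[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] centers = { {1.5, 2.0}, {10.0, 11.25} };
        System.out.print(toArff("centers", new String[] {"x", "y"}, centers));
    }
}
```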
public java.lang.String KDTreeTipText()
public void setKDTree(weka.core.neighboursearch.KDTree k)
k
- a KDTree object with all options set
public weka.core.neighboursearch.KDTree getKDTree()
public java.lang.String useKDTreeTipText()
public void setUseKDTree(boolean value)
value
- if true the KDTree is used
public boolean getUseKDTree()
public java.lang.String debugLevelTipText()
public void setDebugLevel(int d)
d
- the debug level
public int getDebugLevel()
public void setOptions(java.lang.String[] options) throws java.lang.Exception
Valid options are:
-I <num> maximum number of overall iterations (default 1).
-M <num> maximum number of iterations in the kMeans loop in the Improve-Parameter part (default 1000).
-J <num> maximum number of iterations in the kMeans loop for the split centroids in the Improve-Structure part (default 1000).
-L <num> minimum number of clusters (default 2).
-H <num> maximum number of clusters (default 4).
-B <value> distance value for binary attributes (default 1.0).
-use-kdtree Uses the KDTree internally (default no).
-K <KDTree class specification> Full class name of the KDTree class to use, followed by scheme options, e.g. "weka.core.neighboursearch.KDTree -P" (default no KDTree class used).
-C <value> cutoff factor, takes the given percentage of the split centroids if none of the children win (default 0.0).
-D <distance function class specification> Full class name of Distance function class to use, followed by scheme options. (default weka.core.EuclideanDistance).
-N <file name> file to read starting centers from (ARFF format).
-O <file name> file to write centers to (ARFF format).
-U <int> The debug level. (default 0)
-Y <file name> The debug vectors file.
-S <num> Random number seed. (default 10)
setOptions
in interface weka.core.OptionHandler
setOptions
in class weka.clusterers.RandomizableClusterer
options
- the list of options as an array of strings
java.lang.Exception
- if an option is not supported
public java.lang.String[] getOptions()
getOptions
in interface weka.core.OptionHandler
getOptions
in class weka.clusterers.RandomizableClusterer
public java.lang.String toString()
toString
in class java.lang.Object
public weka.core.Instances getClusterCenters()
public java.lang.String getRevision()
getRevision
in interface weka.core.RevisionHandler
getRevision
in class weka.clusterers.AbstractClusterer
public static void main(java.lang.String[] argv)
argv
- should contain options