org.simbrain.gauge.core
Class Dataset

java.lang.Object
  extended by org.simbrain.gauge.core.Dataset

public class Dataset
extends Object

Dataset represents a set of n-dimensional points. Both the low- and high-dimensional data of the current Projector are instances of this class. Dataset provides methods for working with such sets (e.g. opening a dataset, adding points, checking their integrity, finding the nearest neighbors of a point, and calculating interpoint distances). It is assumed that all points in a dataset have the same dimensionality.


Constructor Summary
Dataset()
          Default constructor for adding datasets.
Dataset(ArrayList data)
          Creates an instance of Dataset from an ArrayList of data.
Dataset(int ndims)
          Creates an instance of Dataset with the given dimension.
 
Method Summary
 void addPoint(double[] row)
          Add datapoint without checking whether it is unique or not.
 boolean addPoint(double[] row, double tolerance)
          Add a new datapoint to the dataset.
 void calculateDistances()
          Calculate interpoint distances.
 boolean checkConsistentDimensions()
          Check that all the vectors in the dataset have the same dimension.
 void clear()
          Clear all data, high and low dimensional.
 double getClosestDistance(double[] point)
          Returns the distance from a given point to the closest point in the dataset.
 int getClosestIndex(double[] point)
          Returns the index of the closest point.
 double getComponent(int datapointNumber, int dimension)
          Get a specific coordinate of a specific datapoint.
 double getCovariance(int i, int j)
          Returns the covariance of the ith component of the dataset with respect to the jth component.
 Jama.Matrix getCovarianceMatrix()
          Returns a covariance matrix for the dataset.
 ArrayList getDataset()
          Returns a reference to the dataset.
 int getDimensions()
          Returns the dimensionality of the points in the dataset.
 double getDistance(double[] point1, double[] point2)
          Returns the Euclidean distance between two points.
 double getDistance(int index1, int index2)
          Get the distance between two points.
 double[][] getDistances()
          Returns a matrix of interpoint distances, between the points in the dataset.
 double[][] getDoubles()
          Returns a matrix of doubles, one row for each datapoint, representing the dataset.
 String[][] getDoubleStrings()
          Returns a matrix of strings, one row for each datapoint, representing the dataset.
 int getKthNearestNeighbor(int k, double[] point)
          Returns the k'th nearest neighbor.
 int getKthVariantDimension(int k)
          Returns the k'th most variant dimension.
 double getMaximumDistance()
          Get the maximum interpoint distance between points in the dataset.
 double getMean(int d)
          Returns the mean of the dataset on a given dimension.
 double getMinimumDistance()
          Get the minimum interpoint distance between points in the dataset.
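 int getNumPoints()
          Returns the number of points in the dataset.
 ArrayList getPersistentData()
          Returns a form of the dataset usable by Castor for persistence.
 double[] getPoint(int i)
          Get a specified point in the dataset.
 double getSumDistances()
          Returns the sum of the distances between points in the dataset.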
 void init()
          Initialize the dataset, setting the main variables to the property values.
 void init(int dims, int numpoints)
          Re-initialize a dataset to a specific number of dimensions and number of points.
 void initCastor()
          Initializes Dataset from persistent data.
 void initPersistentData()
          Initializes persistent data.
 boolean isUniquePoint(double[] point, double tolerance)
          Check that a given point is "new", that is, that it is not already in the dataset.
 void perturbOverlappingPoints(double factor)
          Find repeated points and perturb them slightly so they don't overlap.
 void printDataset()
          Print out all points in the dataset. Useful for debugging.
 void randomize(int upperBound)
          Randomize dataset to a value between 0 and upperBound.
 void readData(File file)
          Read in stored dataset file.
 void resultsToMaple()
          Print out low-dimensional points so Maple can plot them. Only handles the case where the low dimension is 2.
 void saveData(File theFile)
          Save the current dataset to a stored file.
 void setComponent(int datapointNumber, int dimension, double newValue)
          Set a specific coordinate of a specific datapoint.
 void setDataset(ArrayList list)
          Sets the dataset.
 void setPersistentData(ArrayList theData)
          Sets data that is to be persistent.
 void setPoint(int i, double[] point)
          Set a specified point in the dataset.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Dataset

public Dataset()
Default constructor for adding datasets.


Dataset

public Dataset(ArrayList data)
Creates an instance of Dataset from an ArrayList of data.

Parameters:
data - ArrayList of data to be used for Dataset

Dataset

public Dataset(int ndims)
Creates an instance of Dataset with the given dimension.

Parameters:
ndims - dimension of dataset
Method Detail

init

public void init()
Initialize the dataset, setting the main variables to the property values. Assumes the dataset already exists, but that it has changed.


init

public void init(int dims,
                 int numpoints)
Re-initialize a dataset to a specific number of dimensions and number of points. Populates the dataset with stubs.

Parameters:
dims - Dimensions of the dataset
numpoints - Number of datapoints in the dataset

clear

public void clear()
Clear all data, high and low dimensional.


checkConsistentDimensions

public boolean checkConsistentDimensions()
Check that all the vectors in the dataset have the same dimension.

Returns:
true if all vectors in the dataset have the same dimension, false otherwise
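A minimal sketch of the consistency check, assuming the points are held as a List of double[] rows (the internal representation is not documented here). ConsistencySketch is a hypothetical name, not part of Simbrain.

```java
import java.util.List;

/** Hypothetical helper; not part of Simbrain's API. */
final class ConsistencySketch {

    /** True if every row has the same length as the first; an empty list is treated as consistent. */
    static boolean checkConsistentDimensions(List<double[]> points) {
        if (points.isEmpty()) {
            return true;
        }
        int dims = points.get(0).length;
        for (double[] row : points) {
            if (row.length != dims) {
                return false;
            }
        }
        return true;
    }
}
```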

randomize

public void randomize(int upperBound)
Randomize the dataset, setting each component to a value between 0 and upperBound.

Parameters:
upperBound - highest value to be used
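A sketch of randomize, assuming each component is drawn uniformly from [0, upperBound); whether the real method includes upperBound or uses a different distribution is not stated. RandomizeSketch and its explicit Random parameter are hypothetical, added so results can be reproduced with a fixed seed.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical helper; not part of Simbrain's API. */
final class RandomizeSketch {

    /** Overwrites every component with a uniform draw from [0, upperBound). */
    static void randomize(List<double[]> points, int upperBound, Random rng) {
        for (double[] row : points) {
            for (int d = 0; d < row.length; d++) {
                row[d] = rng.nextDouble() * upperBound;
            }
        }
    }
}
```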

calculateDistances

public void calculateDistances()
Calculate interpoint distances.
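One way to calculate the interpoint distances, assuming Euclidean distance and points stored as rows of a double[][]. Because distance is symmetric, only the upper triangle is computed and then mirrored, which matches the note on getDistances that the lower triangle duplicates the upper. DistanceMatrixSketch is a hypothetical name.

```java
/** Hypothetical helper; not part of Simbrain's API. */
final class DistanceMatrixSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Fills the upper triangle and mirrors it, so distances[i][j] == distances[j][i]. */
    static double[][] calculateDistances(double[][] points) {
        int n = points.length;
        double[][] distances = new double[n][n]; // diagonal stays 0.0
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double dist = euclidean(points[i], points[j]);
                distances[i][j] = dist;
                distances[j][i] = dist;
            }
        }
        return distances;
    }
}
```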


getMinimumDistance

public double getMinimumDistance()
Get the minimum interpoint distance between points in the dataset.

Returns:
minimum distance between any two points in the low-d dataset

getMaximumDistance

public double getMaximumDistance()
Get the maximum interpoint distance between points in the dataset.

Returns:
maximum distance between any two points in the low-d dataset
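Both extremes can be found by scanning each unordered pair of points once. A sketch assuming Euclidean distance and at least two points; ExtremeDistanceSketch is hypothetical, and the real methods may instead read from a precomputed distance matrix.

```java
/** Hypothetical helper; not part of Simbrain's API. */
final class ExtremeDistanceSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Returns {minimum, maximum} distance over all pairs i < j. */
    static double[] getMinMaxDistance(double[][] points) {
        if (points.length < 2) {
            throw new IllegalArgumentException("need at least two points");
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                double dist = euclidean(points[i], points[j]);
                min = Math.min(min, dist);
                max = Math.max(max, dist);
            }
        }
        return new double[] {min, max};
    }
}
```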

readData

public void readData(File file)
Read in stored dataset file.

Parameters:
file - the file to read in

saveData

public void saveData(File theFile)
Save the current dataset to a stored file.

Parameters:
theFile - the file where data should be saved

perturbOverlappingPoints

public void perturbOverlappingPoints(double factor)
Find repeated points and perturb them slightly so they don't overlap.

Parameters:
factor - Distance to perturb
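A sketch of the perturbation step, assuming "repeated" means exactly equal coordinates and that each component of a later duplicate is shifted by a uniform random offset in [-factor, factor). The exact perturbation rule used by Dataset is not documented; PerturbSketch and its Random parameter are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper; not part of Simbrain's API. */
final class PerturbSketch {

    static void perturbOverlappingPoints(double[][] points, double factor, Random rng) {
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                if (Arrays.equals(points[i], points[j])) {
                    // Shift every component of the later copy by an offset in [-factor, factor).
                    for (int d = 0; d < points[j].length; d++) {
                        points[j][d] += (2 * rng.nextDouble() - 1) * factor;
                    }
                }
            }
        }
    }
}
```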

resultsToMaple

public void resultsToMaple()
Print out low-dimensional points so Maple can plot them. Only handles the case where the low dimension is 2.


getPoint

public double[] getPoint(int i)
Get a specified point in the dataset.

Parameters:
i - index of the point to get
Returns:
the n-dimensional datapoint

setPoint

public void setPoint(int i,
                     double[] point)
Set a specified point in the dataset.

Parameters:
i - index of the point to set
point - the new n-dimensional point

getComponent

public double getComponent(int datapointNumber,
                           int dimension)
Get a specific coordinate of a specific datapoint. For example, the second component of the third datapoint in a 5-dimensional dataset with 50 points.

Parameters:
datapointNumber - index of the point to get
dimension - dimension of the desired component
Returns:
the value of the given component of the specified datapoint

setComponent

public void setComponent(int datapointNumber,
                         int dimension,
                         double newValue)
Set a specific coordinate of a specific datapoint. For example, the second component of the third datapoint in a 5-dimensional dataset with 50 points.

Parameters:
datapointNumber - index of the point to set
dimension - dimension of the component to set
newValue - the new value of that component of the specified datapoint

addPoint

public boolean addPoint(double[] row,
                        double tolerance)
Add a new datapoint to the dataset.

Parameters:
row - A point in the high dimensional space
tolerance - forwarded to isUniquePoint; if -1 then add point regardless of whether it is unique or not
Returns:
true if point added, false otherwise

addPoint

public void addPoint(double[] row)
Add datapoint without checking whether it is unique or not.

Parameters:
row - point to be added

isUniquePoint

public boolean isUniquePoint(double[] point,
                             double tolerance)
Check that a given point is "new", that is, that it is not already in the dataset.

Parameters:
point - the point to check
tolerance - distance within which a point is considered old, and outside of which it is considered new
Returns:
true if the point is new, false otherwise
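The interaction between addPoint and isUniquePoint can be sketched as follows. Assumptions: Euclidean distance, a distance equal to the tolerance counts as "old", and points kept in a List of double[]. UniquePointSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Simbrain's API. */
final class UniquePointSketch {
    final List<double[]> points = new ArrayList<>();

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** New if no stored point lies within tolerance of it. */
    boolean isUniquePoint(double[] point, double tolerance) {
        for (double[] existing : points) {
            if (euclidean(existing, point) <= tolerance) {
                return false;
            }
        }
        return true;
    }

    /** A tolerance of -1 skips the uniqueness check, as described for addPoint. */
    boolean addPoint(double[] row, double tolerance) {
        if (tolerance != -1 && !isUniquePoint(row, tolerance)) {
            return false;
        }
        points.add(row);
        return true;
    }
}
```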

getClosestDistance

public double getClosestDistance(double[] point)
Returns the distance from a given point to the closest point in the dataset.

Parameters:
point - the point to check
Returns:
the distance between this point and the closest other point in the dataset

getClosestIndex

public int getClosestIndex(double[] point)
Returns the index of the closest point.

Parameters:
point - the point to check
Returns:
the index of the point closest to this one in the dataset
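Both lookups reduce to a linear scan over the dataset. A sketch assuming Euclidean distance; if the query itself is stored in the dataset, this version returns it (distance 0), and whether the real methods skip it is not documented. ClosestPointSketch is hypothetical.

```java
/** Hypothetical helper; not part of Simbrain's API. */
final class ClosestPointSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Index of the stored point nearest to query, or -1 if there are no points. */
    static int getClosestIndex(double[][] points, double[] query) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < points.length; i++) {
            double dist = euclidean(points[i], query);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }

    /** Distance to the nearest stored point, or +infinity if there are no points. */
    static double getClosestDistance(double[][] points, double[] query) {
        int i = getClosestIndex(points, query);
        return i < 0 ? Double.POSITIVE_INFINITY : euclidean(points[i], query);
    }
}
```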

getKthNearestNeighbor

public int getKthNearestNeighbor(int k,
                                 double[] point)
Returns the k'th nearest neighbor.

Parameters:
k - which nearest neighbor (first, second, etc.) to find
point - the point whose neighbors are to be found
Returns:
index of the k'th nearest neighbor
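A sketch that sorts point indices by their distance to the query and picks the k'th, assuming k is 1-based (k = 1 is the nearest) and that a stored copy of the query counts as a neighbor. Sorting costs O(n log n); KthNeighborSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper; not part of Simbrain's API. */
final class KthNeighborSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    static int getKthNearestNeighbor(double[][] points, int k, double[] query) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        Integer[] order = new Integer[points.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Stable sort of indices by distance to the query; ties keep index order.
        Arrays.sort(order, Comparator.comparingDouble(i -> euclidean(points[i], query)));
        return order[k - 1];
    }
}
```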

getDistance

public double getDistance(int index1,
                          int index2)
Get the distance between two points.

Parameters:
index1 - index of point 1
index2 - index of point 2
Returns:
distance between points 1 and 2

getDistance

public double getDistance(double[] point1,
                          double[] point2)
Returns the Euclidean distance between two points.

Parameters:
point1 - the first point
point2 - the second point
Returns:
the Euclidean distance between points 1 and 2
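The Euclidean distance is the square root of the summed squared coordinate differences. A minimal sketch; EuclideanSketch is a hypothetical name, and the dimension check is an addition for safety rather than documented behavior.

```java
/** Hypothetical helper; not part of Simbrain's API. */
final class EuclideanSketch {

    /** sqrt(sum over d of (point1[d] - point2[d])^2); both points must have the same length. */
    static double getDistance(double[] point1, double[] point2) {
        if (point1.length != point2.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int d = 0; d < point1.length; d++) {
            double diff = point1[d] - point2[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

For example, the distance between (0, 0) and (3, 4) is 5.0.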

getDimensions

public int getDimensions()
Returns:
the dimensionality of the points in the dataset

getDistances

public double[][] getDistances()
Returns a matrix of interpoint distances between the points in the dataset. Note that the lower triangle duplicates the upper triangle (the matrix is symmetric).

Returns:
a matrix of interpoint distances

getNumPoints

public int getNumPoints()
Returns:
the number of points in the dataset

getSumDistances

public double getSumDistances()
Returns:
the sum of the distances between points in the dataset
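A sketch of the sum, assuming Euclidean distance and that each unordered pair of points is counted once (summing the full symmetric matrix would double the result). SumDistancesSketch is hypothetical.

```java
/** Hypothetical helper; not part of Simbrain's API. */
final class SumDistancesSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Sum of distances over all pairs i < j. */
    static double getSumDistances(double[][] points) {
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            for (int j = i + 1; j < points.length; j++) {
                total += euclidean(points[i], points[j]);
            }
        }
        return total;
    }
}
```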

getMean

public double getMean(int d)
Returns the mean of the dataset on a given dimension.

Parameters:
d - index of the dimension whose mean to get
Returns:
mean of dataset on dimension d
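The mean on dimension d is the average of that coordinate over all points. A minimal sketch over a double[][] of rows; MeanSketch is hypothetical.

```java
/** Hypothetical helper; not part of Simbrain's API. */
final class MeanSketch {

    static double getMean(double[][] points, int d) {
        if (points.length == 0) {
            throw new IllegalArgumentException("empty dataset");
        }
        double sum = 0.0;
        for (double[] p : points) {
            sum += p[d];
        }
        return sum / points.length;
    }
}
```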

getCovariance

public double getCovariance(int i,
                            int j)
Returns the covariance of the ith component of the dataset with respect to the jth component.

Parameters:
i - first dimension
j - second dimension
Returns:
covariance of i with respect to j
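A sketch of the covariance between dimensions i and j, assuming the population form that divides by n (the sample form divides by n - 1; which one Dataset uses is not stated). CovarianceSketch is hypothetical.

```java
/** Hypothetical helper; not part of Simbrain's API. */
final class CovarianceSketch {

    static double mean(double[][] points, int d) {
        double sum = 0.0;
        for (double[] p : points) {
            sum += p[d];
        }
        return sum / points.length;
    }

    /** Population covariance: (1/n) * sum of (p[i] - mean_i) * (p[j] - mean_j). */
    static double getCovariance(double[][] points, int i, int j) {
        double meanI = mean(points, i);
        double meanJ = mean(points, j);
        double sum = 0.0;
        for (double[] p : points) {
            sum += (p[i] - meanI) * (p[j] - meanJ);
        }
        return sum / points.length;
    }
}
```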

getCovarianceMatrix

public Jama.Matrix getCovarianceMatrix()
Returns a covariance matrix for the dataset.

Returns:
covariance matrix which describes how the data covary along each dimension
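The covariance matrix collects getCovariance(i, j) for every pair of dimensions, giving a symmetric d x d matrix. This sketch returns a plain double[][] under the same population-covariance assumption (dividing by n); the real method returns a Jama.Matrix, which could wrap such an array via Jama's Matrix(double[][]) constructor. CovarianceMatrixSketch is hypothetical.

```java
/** Hypothetical helper; not part of Simbrain's API. */
final class CovarianceMatrixSketch {

    static double[][] getCovarianceMatrix(double[][] points) {
        int n = points.length;
        int dims = points[0].length;
        double[] means = new double[dims];
        for (double[] p : points) {
            for (int d = 0; d < dims; d++) {
                means[d] += p[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            means[d] /= n;
        }
        double[][] cov = new double[dims][dims];
        for (int i = 0; i < dims; i++) {
            for (int j = i; j < dims; j++) {
                double sum = 0.0;
                for (double[] p : points) {
                    sum += (p[i] - means[i]) * (p[j] - means[j]);
                }
                cov[i][j] = sum / n;
                cov[j][i] = cov[i][j]; // covariance is symmetric
            }
        }
        return cov;
    }
}
```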

getKthVariantDimension

public int getKthVariantDimension(int k)
Returns the k'th most variant dimension. For example, the most variant dimension (k=1), or the least variant dimension (k=num_dimensions).

Parameters:
k - rank of the dimension by variance (1 for the most variant)
Returns:
the k'th most variant dimension
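A sketch that ranks dimensions by variance in descending order and returns the k'th, assuming k is 1-based as in the description (k = 1 for the most variant). Ties keep their original index order; KthVariantSketch is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of Simbrain's API. */
final class KthVariantSketch {

    static int getKthVariantDimension(double[][] points, int k) {
        int n = points.length;
        int dims = points[0].length;
        if (k < 1 || k > dims) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        double[] variance = new double[dims];
        for (int d = 0; d < dims; d++) {
            double mean = 0.0;
            for (double[] p : points) {
                mean += p[d];
            }
            mean /= n;
            double sq = 0.0;
            for (double[] p : points) {
                double diff = p[d] - mean;
                sq += diff * diff;
            }
            variance[d] = sq / n;
        }
        Integer[] order = new Integer[dims];
        for (int d = 0; d < dims; d++) {
            order[d] = d;
        }
        // Descending by variance; Arrays.sort on objects is stable.
        Arrays.sort(order, (a, b) -> Double.compare(variance[b], variance[a]));
        return order[k - 1];
    }
}
```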

getDataset

public ArrayList getDataset()
Returns:
a reference to the dataset

setDataset

public void setDataset(ArrayList list)
Parameters:
list - the dataset

printDataset

public void printDataset()
Print out all points in the dataset. Useful for debugging.


getDoubleStrings

public String[][] getDoubleStrings()
Returns a matrix of strings, one row for each datapoint, representing the dataset.

Returns:
a matrix of strings representing the dataset

getDoubles

public double[][] getDoubles()
Returns a matrix of doubles, one row for each datapoint, representing the dataset.

Returns:
a matrix of doubles representing the dataset

initPersistentData

public void initPersistentData()
Initializes persistent data.


getPersistentData

public ArrayList getPersistentData()
Returns:
a form of the dataset usable by Castor for persistence

setPersistentData

public void setPersistentData(ArrayList theData)
Sets data that is to be persistent.

Parameters:
theData - data to be made persistent

initCastor

public void initCastor()
Initializes Dataset from persistent data.