Data Class Reference

The Data object encapsulates a set of labeled data samples.

#include <Data.h>

Inherits NamedObject.

Inherited by Image.

Public Member Functions

 Data (const char *name, NameList *classNames, NameList *dimNames)
 Creates an empty, labeled Data object; for adding entries incrementally.
 Data (const char *name, int numEntries, NameList *classNames, NameList *dimNames)
 Creates an empty, labeled Data object based on the passed meta-data, and allocates the space for the data entries and the labels.
 Data (const char *name, istream &is, bool isAscii, bool readLabels)
 Creates a Data object, based on the meta-data in the file, and fills it from the passed file stream.
 Data (const char *name, istream &is, bool isAscii, int numEntries, NameList *classNames, NameList *dimNames, bool readDoubles=false)
 Creates a Data object based on the passed meta-data, and fills it from the passed file stream.
 Data (const char *name, int numEntries, NameList *dimNames)
 Creates an empty, unlabeled Data object based on the passed meta-data, and allocates the space for the data entries.
 Data (const char *name, NameList *dimNames)
 Creates an empty, unlabeled Data object; for adding entries incrementally, using addEntry (float *input).
 Data (const char *name, istream &is, NameList *dimNames, int numEntries, bool isAscii=true)
 Creates a Data object based on the passed meta-data, and fills it from the passed file stream.
 Data (const char *name, Data *src)
 Copy constructor - Makes a new copy of the passed data set.
void loadLabelsFromStreamAscii (istream &is)
 Loads a set of labels from the passed input stream.
void loadLabelsFromStreamBinary (istream &is, bool readInts)
 Reads a binary array of labels (there must be numEntries of them).
void loadFromStreamAscii (istream &is)
 Loads the data (no labels) from the passed input stream.
void loadFromStreamAscii (istream &is, int labelPosition)
 Loads data and labels from the passed input stream.
void loadFromStreamBinary (istream &is, bool readDoubles=false, bool readLabels=false)
 Reads a binary data matrix (as floats or doubles), optionally followed by a binary array of labels (as integers).
void setData (float *data)
 Given a Data object for which storage has been allocated, this method lets you pass in directly the matrix of floating point data values.
float * getData ()
 The converse of setData() - Returns a reference to the data array.
void normalize ()
 Rescales each entry, so that each dimension's extrema are scaled to the range [0, 1].
int getNumEntries ()
 Returns the number of samples in the data set.
NameList & getDimNames ()
 Returns the NameList describing the data set's features.
NameList & getClasses ()
 Returns the NameList describing the data set's class labels.
void setLabels (int *labels)
 Allows the setting of sample labels.
int getLabel (int entryIndex)
 Retrieves the label at a given position (only returns the first label).
LabelSet & getLabels (int entryIndex)
 Retrieves a set of labels for a given position.
float getEntry (int rowNum, int dimIndex)
 Returns the specified data entry.
void addEntry (float *inputs)
 For growing the data set incrementally; this method is for adding unlabeled samples.
void addEntry (float *inputs, int label)
 For growing the data set incrementally; this method is for adding labeled samples.
void setEntry (int index, float *inputs, int label)
 For explicit access to the data set, this method lets you add a labeled sample at a specific location.
void save (ostream &os, bool isAscii, bool writeData, bool writeHeader)
 Writes the data set to the passed output file stream.
void multiplyLabels (istream &labelDefsIs)
 Specifies a mapping to multiply the labels that are defined for each sample.
void deleteClass (int classIdx)
 Throws out all samples of the specified class, shrinking the data set accordingly.
void sortByLabel ()
 Re-sorts the data set by increasing label index.
void listEntries (ostream &os)
 Prints the contents of the data set to the passed stream.
void plot (ostream &ost, int dim1, int dim2, bool ignoreZero)
 Generates a character-graphics plot of the data set and sends it to the passed output stream.
void hist (ostream &s)
 Writes a histogram by class of the data set.

Friends

class Image
ostream & operator<< (ostream &s, Data &d)
 Prints the contents of the data set to the passed stream.

Detailed Description

The Data object encapsulates a set of labeled data samples.

It can be filled from passed arrays or from files. Data files can be specified in text or binary form. Data and associated labels can be loaded together, or separately.
Three types of meta-data describe the contents of the data set:


Constructor & Destructor Documentation

Data::Data (const char *name, NameList *classNames, NameList *dimNames)

Creates an empty, labeled Data object; for adding entries incrementally.

Parameters:
name The data set's name
classNames NameList describing the label names for this data set.
dimNames The NameList that describes the data set's features

Data::Data (const char *name, int numEntries, NameList *classNames, NameList *dimNames)

Creates an empty, labeled Data object based on the passed meta-data, and allocates the space for the data entries and the labels.

This constructor is meant to be used in conjunction with the setData() and setLabels() methods, which let you pass in the whole data set at once.

Parameters:
name The data set's name
numEntries The number of samples in the data set.
classNames NameList describing the label names for this data set.
dimNames The NameList that describes the data set's features.

Data::Data (const char *name, istream &is, bool isAscii, bool readLabels)

Creates a Data object, based on the meta-data in the file, and fills it from the passed file stream.

The data file may be text (ASCII) or binary, although binary loading through this constructor does not work yet (see the bug note below), and it may or may not contain label definitions. If text-based, the data file must start with a data header; if binary, the meta-data must be defined separately. This constructor only supports loading a single label per sample.

Parameters:
name The data set's name
is The input stream that refers to the data file.
isAscii True if the input data is ASCII, false if it is binary.
readLabels Whether the data include labels or not
Bug:
The binary loading condition doesn't work yet. To load binary data, first define an empty Data object, and then either pass the data using setData(), or load it from file using loadFromStreamBinary().

Data::Data (const char *name, istream &is, bool isAscii, int numEntries, NameList *classNames, NameList *dimNames, bool readDoubles=false)

Creates a Data object based on the passed meta-data, and fills it from the passed file stream.

The data file may be text (ASCII) or binary, and is labeled, i.e., label definitions are required. In ASCII files, the labels are assumed to be in position zero, whereas in binary data, they come in a second array, following the data points. If the data points are binary, their precision must be specified (i.e., C++ floats or doubles).

Parameters:
name The data set's name
is The input stream that refers to the data file.
isAscii Whether the data set is text-based, or binary.
numEntries The number of samples in the data set.
classNames NameList describing the label names for this data set.
dimNames The NameList that describes the data set's features.
readDoubles If true, the data are double precision; if false, single precision.

Data::Data (const char *name, int numEntries, NameList *dimNames)

Creates an empty, unlabeled Data object based on the passed meta-data, and allocates the space for the data entries.

This constructor is meant to be used in conjunction with the setData() method, which lets you pass in the whole data set at once.

Parameters:
name The data set's name
numEntries The number of samples in the data set.
dimNames The NameList that describes the data set's features.

Data::Data (const char *name, NameList *dimNames)

Creates an empty, unlabeled Data object; for adding entries incrementally, using addEntry (float *input).

Parameters:
name The data set's name
dimNames The NameList that describes the data set's features

Data::Data (const char *name, istream &is, NameList *dimNames, int numEntries, bool isAscii=true)

Creates a Data object based on the passed meta-data, and fills it from the passed file stream.

The data file may be text (ASCII) or binary, and is unlabeled, i.e., no label definitions are included. If the data are binary, they are single-precision floating point values (i.e., C++ floats).

Parameters:
name The data set's name
is The input stream that refers to the data file.
dimNames The NameList that describes the data set's features.
numEntries The number of samples in the data set.
isAscii Whether the data set is text-based, or binary.

Data::Data (const char *name, Data *src)

Copy constructor - Makes a new copy of the passed data set.

Using this is a simple way to split out the ground truth component of the data. New storage is allocated for the feature and label data, but the dimNames and classNames are copied by reference.

Parameters:
name The new data set's name
src The data set to be copied.


Member Function Documentation

void Data::addEntry (float *inputs, int label)

For growing the data set incrementally; this method is for adding labeled samples.

Parameters:
inputs An array of data points, one per feature dimension of the data set.
label A single label in the range defined for the data set.
Bug:
The two addEntry methods are hopelessly inefficient - in and labels should be replaced with vectors, rather than C arrays

void Data::addEntry (float *inputs)

For growing the data set incrementally; this method is for adding unlabeled samples.

Parameters:
inputs An array of data points, one per feature dimension of the data set.
Bug:
The two addEntry methods are hopelessly inefficient - in and labels should be replaced with vectors, rather than C arrays

void Data::deleteClass (int classIdx)

Throws out all samples of the specified class, shrinking the data set accordingly.

This is used for example to restrict the data to the ground truth samples only, assuming the unlabeled samples have a particular label.

Parameters:
classIdx The index of the class for which to discard entries.
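
A minimal sketch of the filtering step, assuming the samples are held in a row-major float array with a parallel array of single labels. The helper name dropClass and the std::vector storage are illustrative assumptions and say nothing about how Data stores its entries internally:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the Data class. Removes every sample whose
// label equals classIdx from a row-major matrix (numEntries x numDims) and
// from its parallel label array, keeping the remaining samples in order.
void dropClass(std::vector<float>& data, std::vector<int>& labels,
               std::size_t numDims, int classIdx) {
    assert(data.size() == labels.size() * numDims);
    std::size_t kept = 0;
    for (std::size_t r = 0; r < labels.size(); ++r) {
        if (labels[r] == classIdx) continue;  // discard this sample
        for (std::size_t d = 0; d < numDims; ++d)
            data[kept * numDims + d] = data[r * numDims + d];
        labels[kept] = labels[r];
        ++kept;
    }
    // Shrink both arrays to the number of surviving samples.
    data.resize(kept * numDims);
    labels.resize(kept);
}
```

Calling dropClass with classIdx 0 corresponds to the use case mentioned above, where class 0 marks the samples without ground truth.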

NameList& Data::getClasses () [inline]

Returns the NameList describing the data set's class labels.

float * Data::getData ()

The converse of setData() - Returns a reference to the data array.

NameList& Data::getDimNames () [inline]

Returns the NameList describing the data set's features.

float Data::getEntry (int rowNum, int colNum)

Returns the specified data entry.

Parameters:
rowNum The sample index, in [0, numEntries-1]
colNum The dimension index, in [0, numDims-1]
Returns:
The data value for that position.

int Data::getLabel (int entryIndex)

Retrieves the label at a given position (only returns the first label).

Parameters:
entryIndex The index of the sample for which the label is requested.
Returns:
The requested label index.

LabelSet & Data::getLabels (int entryIndex)

Retrieves a set of labels for a given position.

Parameters:
entryIndex The index of the sample for which the label set is requested.
Returns:
A LabelSet object holding the labels for the sample.

int Data::getNumEntries () [inline]

Returns the number of samples in the data set.

void Data::hist (ostream &ost)

Writes a histogram by class of the data set.

Parameters:
ost The stream to send the histogram to.
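
The per-class counting behind such a histogram could look like the sketch below. The helper names and the one-line-per-class output layout are assumptions made for illustration; the actual format written by hist() is not specified here:

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: counts how many samples carry each class index.
std::vector<std::size_t> countByClass(const std::vector<int>& labels,
                                      std::size_t numClasses) {
    std::vector<std::size_t> counts(numClasses, 0);
    for (int label : labels) {
        assert(label >= 0 && static_cast<std::size_t>(label) < numClasses);
        ++counts[static_cast<std::size_t>(label)];
    }
    return counts;
}

// Assumed layout: one "name count" line per class. This is not the
// documented output of Data::hist().
void writeClassHistogram(std::ostream& os,
                         const std::vector<std::string>& classNames,
                         const std::vector<std::size_t>& counts) {
    assert(classNames.size() == counts.size());
    for (std::size_t c = 0; c < counts.size(); ++c)
        os << classNames[c] << ' ' << counts[c] << '\n';
}
```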

void Data::listEntries (ostream &ost)

Prints the contents of the data set to the passed stream.

Parameters:
ost The stream to direct the output to.

void Data::loadFromStreamAscii (istream &is, int labelPosition)

Loads data and labels from the passed input stream.

The data file is assumed to have no header, so the meta-data should have been set during the constructor call.

Parameters:
is The input stream that refers to the data file.
labelPosition The position of the single label in a line of data. 0 for first place, 1 for second, etc., and -1 for last place.
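
To illustrate the labelPosition convention, here is a sketch of splitting one line of text into feature values and a label. The function splitLabeledLine is hypothetical, and it assumes that fields are whitespace-separated and that the label is written as an integer class index:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of the Data class. Splits one
// whitespace-separated line into numDims feature values and one label.
// labelPosition: 0 = first field, 1 = second, ..., -1 = last field.
// Returns false if the line does not hold exactly numDims + 1 numeric fields.
bool splitLabeledLine(const std::string& line, std::size_t numDims,
                      int labelPosition, std::vector<float>& features,
                      int& label) {
    std::istringstream in(line);
    std::vector<double> fields;
    double value;
    while (in >> value) fields.push_back(value);
    if (fields.size() != numDims + 1) return false;

    // -1 means "last place", which is index numDims in a line of numDims + 1 fields.
    const std::size_t labelIdx = (labelPosition < 0)
        ? numDims
        : static_cast<std::size_t>(labelPosition);
    if (labelIdx > numDims) return false;

    features.clear();
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i == labelIdx) label = static_cast<int>(fields[i]);
        else features.push_back(static_cast<float>(fields[i]));
    }
    return true;
}
```

With two feature dimensions, the line "2 0.5 1.5" and labelPosition 0 would give label 2 and features 0.5 and 1.5, while "0.5 1.5 3" with labelPosition -1 would give label 3 and the same features.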

void Data::loadFromStreamAscii (istream &is)

Loads the data (no labels) from the passed input stream.

The data file is assumed to have no header, so the meta-data should have been set during the constructor call. A heartbeat '.' is printed every 10,000 samples loaded.

Parameters:
is The input stream that refers to the data file.

void Data::loadFromStreamBinary (istream &is, bool readDoubles=false, bool readLabels=false)

Reads a binary data matrix (as floats or doubles), optionally followed by a binary array of labels (as integers).

The data file is assumed to have no header, so the meta-data should have been set during the constructor call.

Parameters:
is The input stream that refers to the data file.
readDoubles If true, the data are double precision, if false, they're floats.
readLabels If true, then an array of labels follows the data.
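
The file layout described above (a data matrix, optionally followed by a label array) can be read as in the sketch below. readBinaryMatrix is a hypothetical stand-alone function, and it assumes the values are stored in the machine's native byte order with no padding:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper, not part of the Data class. Reads numEntries * numDims
// raw floats or doubles, optionally followed by numEntries raw ints.
// Doubles are narrowed to float. Returns false on a short read.
bool readBinaryMatrix(std::istream& is, std::size_t numEntries,
                      std::size_t numDims, bool readDoubles, bool readLabels,
                      std::vector<float>& data, std::vector<int>& labels) {
    const std::size_t count = numEntries * numDims;
    data.assign(count, 0.0f);
    if (readDoubles) {
        std::vector<double> tmp(count);
        is.read(reinterpret_cast<char*>(tmp.data()),
                static_cast<std::streamsize>(count * sizeof(double)));
        if (!is) return false;
        for (std::size_t i = 0; i < count; ++i)
            data[i] = static_cast<float>(tmp[i]);
    } else {
        is.read(reinterpret_cast<char*>(data.data()),
                static_cast<std::streamsize>(count * sizeof(float)));
        if (!is) return false;
    }
    labels.clear();
    if (readLabels) {
        // The label array follows the whole data matrix.
        labels.assign(numEntries, 0);
        is.read(reinterpret_cast<char*>(labels.data()),
                static_cast<std::streamsize>(numEntries * sizeof(int)));
        if (!is) return false;
    }
    return true;
}
```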

void Data::loadLabelsFromStreamAscii (istream &is)

Loads a set of labels from the passed input stream.

The stream should have one line per data sample, i.e., there should be numEntries lines, and may have more than one label per line, separated by white-space. This is currently the most flexible approach to loading multiple-label sets.

Parameters:
is The input stream that refers to a file of labels.
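
A sketch of the one-line-per-sample format follows. The function readLabelLines is hypothetical, and it assumes that labels are written as integer class indices and that every sample has at least one label; neither assumption is confirmed by this reference:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of the Data class. Reads numEntries lines,
// each holding one or more whitespace-separated integer labels.
// Returns false if a line has no labels or the stream ends too early.
bool readLabelLines(std::istream& is, std::size_t numEntries,
                    std::vector<std::vector<int>>& labelSets) {
    labelSets.clear();
    std::string line;
    while (labelSets.size() < numEntries && std::getline(is, line)) {
        std::istringstream in(line);
        std::vector<int> labels;
        int label;
        while (in >> label) labels.push_back(label);
        if (labels.empty()) return false;  // assumed: at least one label per sample
        labelSets.push_back(labels);
    }
    return labelSets.size() == numEntries;
}
```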

void Data::loadLabelsFromStreamBinary (istream &is, bool readInts)

Reads a binary array of labels (there must be numEntries of them).

The labels may be specified as ints or shorts. Only one label per sample is currently supported.

Parameters:
is The input stream referring to a file of binary labels.
readInts If true, labels are stored as ints, else as shorts.
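
The int-versus-short distinction can be sketched as follows. readBinaryLabels is a hypothetical function that assumes native byte order and widens shorts to int after reading:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper, not part of the Data class. Reads numEntries labels
// stored as raw ints (readInts == true) or raw shorts (readInts == false).
// Returns false on a short read.
bool readBinaryLabels(std::istream& is, std::size_t numEntries, bool readInts,
                      std::vector<int>& labels) {
    labels.assign(numEntries, 0);
    if (readInts) {
        is.read(reinterpret_cast<char*>(labels.data()),
                static_cast<std::streamsize>(numEntries * sizeof(int)));
    } else {
        std::vector<short> tmp(numEntries);
        is.read(reinterpret_cast<char*>(tmp.data()),
                static_cast<std::streamsize>(numEntries * sizeof(short)));
        // Widen each short to int.
        for (std::size_t i = 0; i < numEntries; ++i) labels[i] = tmp[i];
    }
    return !is.fail();
}
```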

void Data::multiplyLabels (istream &labelDefsIs)

Specifies a mapping to multiply the labels that are defined for each sample.

The contents of the passed input stream define the mapping to use. The syntax is as follows (comments prefixed by '#' characters may be interspersed):

First, a NameList (e.g., "[ a b c ]") specifying the new set of labels. It should be a superset of the old list, with the old labels coming first, i.e., if the old list is [ a b c ], then the new list should be something like [ a b c d e f ].

Then, one line for each of the old labels. Each line starts with the old label, followed by the set of labels that should be substituted for it, presumably including the old label itself, though this isn't required.

As an example:
[ unlab. miss beach ocean ice river road park resid indst water openSp built natur manMd ]

unlab. unlab.
miss miss
beach beach openSp natur
ocean ocean water natur
ice ice water natur
river river water natur
road road manMd
park park openSp natur
resid resid built manMd
indst indst built manMd

After calling this function, a data set with single labels will have each of those labels replaced by the appropriate set of one or more labels, as specified by the mapping in the file.
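
The mapping syntax can be parsed along the lines of the sketch below. LabelMapping and parseLabelMapping are hypothetical stand-ins (the real class works with NameList and LabelSet objects), and the sketch assumes that a '#' comment runs to the end of its line and that the NameList fits on a single line:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the parsed mapping.
struct LabelMapping {
    std::vector<std::string> newNames;               // the expanded label list
    std::map<std::string, std::vector<int>> subst;   // old name -> new label indices
};

// Parses "[ a b c ... ]" followed by one "old new1 new2 ..." line per old label.
// Returns false on a malformed list or an unknown substitute label.
bool parseLabelMapping(std::istream& is, LabelMapping& out) {
    std::string line;
    bool haveList = false;
    while (std::getline(is, line)) {
        const std::size_t hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);  // strip comment
        std::istringstream in(line);
        std::vector<std::string> tokens;
        std::string tok;
        while (in >> tok) tokens.push_back(tok);
        if (tokens.empty()) continue;  // blank or comment-only line

        if (!haveList) {
            if (tokens.size() < 2 || tokens.front() != "[" || tokens.back() != "]")
                return false;
            out.newNames.assign(tokens.begin() + 1, tokens.end() - 1);
            haveList = true;
            continue;
        }
        // tokens[0] is the old label; the rest are its substitutes.
        std::vector<int> indices;
        for (std::size_t i = 1; i < tokens.size(); ++i) {
            int found = -1;
            for (std::size_t n = 0; n < out.newNames.size(); ++n)
                if (out.newNames[n] == tokens[i]) found = static_cast<int>(n);
            if (found < 0) return false;  // substitute missing from the new list
            indices.push_back(found);
        }
        out.subst[tokens[0]] = indices;
    }
    return haveList;
}
```

Applied to the example above, the entry for "beach" would hold the indices of beach, openSp and natur in the new list.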

void Data::normalize ()

Rescales each entry, so that each dimension's extrema are scaled to the range [0, 1].

The operation is performed one dimension at a time, i.e., each featural dimension is rescaled with respect to the extrema for that dimension, rather than with respect to the extrema of the entire data set.
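
Per-dimension rescaling of this kind can be sketched as below. normalizePerDimension is a hypothetical stand-alone function over a row-major array; how it treats a dimension whose values are all equal (mapping them to 0) is an assumption, since that case is not covered here:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the Data class. Rescales a row-major
// matrix (numEntries x numDims) in place so that, within each dimension,
// the minimum maps to 0 and the maximum maps to 1.
void normalizePerDimension(std::vector<float>& data, std::size_t numDims) {
    assert(numDims > 0 && data.size() % numDims == 0);
    const std::size_t numEntries = data.size() / numDims;
    if (numEntries == 0) return;
    for (std::size_t d = 0; d < numDims; ++d) {
        // Find this dimension's extrema across all samples.
        float lo = data[d];
        float hi = data[d];
        for (std::size_t r = 1; r < numEntries; ++r) {
            lo = std::min(lo, data[r * numDims + d]);
            hi = std::max(hi, data[r * numDims + d]);
        }
        const float range = hi - lo;
        for (std::size_t r = 0; r < numEntries; ++r) {
            float& v = data[r * numDims + d];
            // Assumption: a constant dimension maps to 0 to avoid dividing by zero.
            v = (range > 0.0f) ? (v - lo) / range : 0.0f;
        }
    }
}
```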

void Data::plot (ostream &ost, int dim1, int dim2, bool ignoreZero)

Generates a character-graphics plot of the data set and sends it to the passed output stream.

This is only useful for small data sets. The result is a scatter plot of the data in the two dimensions specified.

Parameters:
ost The output stream that should receive the plot output.
dim1 The dimension that defines the X axis of the plot.
dim2 The dimension that defines the Y axis of the plot.
ignoreZero If true, samples with class 0 aren't plotted. Class 0 is generally used to mean 'no label', so this omits samples without ground truth information from the plot.

void Data::save (ostream &os, bool isAscii, bool writeLabels, bool writeHeader)

Writes the data set to the passed output file stream.

If a binary save is requested, the writeLabels and writeHeader flags are ignored, and the data are written as a matrix of floats, with the sample index varying more slowly than the dimension index. If an ASCII save is requested, then the writeLabels and writeHeader flags control what is written.

Parameters:
os The stream to which to send the data set.
isAscii Whether to write an ASCII or binary output.
writeLabels If ASCII, whether to prefix each sample with the label.
writeHeader If ASCII, whether to prefix the data with a descriptive header.
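
The binary case described above amounts to dumping the float matrix with no header and no labels. A sketch under that reading, using a hypothetical writeBinaryMatrix function and native byte order:

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper, not part of the Data class. Writes a row-major float
// matrix (sample index varying most slowly) as raw bytes, with no header and
// no labels. Returns false if the stream reports an error.
bool writeBinaryMatrix(std::ostream& os, const std::vector<float>& data) {
    os.write(reinterpret_cast<const char*>(data.data()),
             static_cast<std::streamsize>(data.size() * sizeof(float)));
    return !os.fail();
}
```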

void Data::setData (float *data)

Given a Data object for which storage has been allocated, this method lets you pass in directly the matrix of floating point data values.

In the passed matrix (Two dimensions mapped to a 1-D C array), the sample index varies more slowly than the dimension index.

Parameters:
data The C++ array of floats containing the data (numEntries * numDims values).
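
The layout rule (sample index varying more slowly than the dimension index) is ordinary row-major order. A small sketch of the index arithmetic, where flatIndex is a hypothetical helper and not a member of Data:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: offset of entry (rowNum, colNum) in a 1-D array that
// stores numDims consecutive values per sample.
std::size_t flatIndex(std::size_t rowNum, std::size_t colNum,
                      std::size_t numDims) {
    assert(colNum < numDims);
    return rowNum * numDims + colNum;
}
```

For a data set with three dimensions, the values of sample 0 occupy offsets 0 to 2 and the values of sample 1 occupy offsets 3 to 5.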

void Data::setEntry (int index, float *inputs, int label)

For explicit access to the data set, this method lets you add a labeled sample at a specific location.

Parameters:
index The location of the new data point, in [0, numEntries-1].
inputs An array of data points, one per feature dimension of the data set.
label A single label in the range defined for the data set.

void Data::setLabels (int *labels)

Allows the setting of sample labels.

The passed array specifies an integer label for each sample. Multiple labels may be specified by calling this method multiple times, though in this case a label must be specified for each sample, so all samples must have the same number of labels.

Parameters:
labels An array of integers, specifying one label per data sample.
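
The repeated-call behaviour can be pictured with the sketch below, in which each call appends one more label to every sample. LabelStore is a hypothetical stand-in for the class's label storage, and the assumption that each call appends (rather than replaces) is taken from the description above:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-sample label sets held by a data set.
struct LabelStore {
    std::vector<std::vector<int>> perSample;

    explicit LabelStore(std::size_t numEntries) : perSample(numEntries) {}

    // Appends one label per sample; labels must point to numEntries ints.
    void appendLabels(const int* labels) {
        for (std::size_t i = 0; i < perSample.size(); ++i)
            perSample[i].push_back(labels[i]);
    }

    // Mirrors the "only returns the first label" behaviour of getLabel().
    int firstLabel(std::size_t entryIndex) const {
        assert(entryIndex < perSample.size() && !perSample[entryIndex].empty());
        return perSample[entryIndex].front();
    }
};
```

After two calls, every sample holds exactly two labels, which matches the constraint that all samples end up with the same number of labels.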

void Data::sortByLabel ()

Re-sorts the data set by increasing label index.
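
One way to reorder samples and labels together is to sort a permutation of sample indices and then gather the rows. The sketch below does this with a hypothetical sortRowsByLabel function; using a stable sort, so that samples with equal labels keep their relative order, is an assumption and not something this reference states:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper, not part of the Data class. Reorders a row-major
// matrix (numEntries x numDims) and its parallel label array by increasing label.
void sortRowsByLabel(std::vector<float>& data, std::vector<int>& labels,
                     std::size_t numDims) {
    assert(data.size() == labels.size() * numDims);
    std::vector<std::size_t> order(labels.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Assumption: ties keep their original relative order.
    std::stable_sort(order.begin(), order.end(),
                     [&labels](std::size_t a, std::size_t b) {
                         return labels[a] < labels[b];
                     });

    std::vector<float> sortedData;
    std::vector<int> sortedLabels;
    sortedData.reserve(data.size());
    sortedLabels.reserve(labels.size());
    for (std::size_t src : order) {
        const std::ptrdiff_t begin = static_cast<std::ptrdiff_t>(src * numDims);
        const std::ptrdiff_t end = static_cast<std::ptrdiff_t>((src + 1) * numDims);
        sortedData.insert(sortedData.end(), data.begin() + begin, data.begin() + end);
        sortedLabels.push_back(labels[src]);
    }
    data.swap(sortedData);
    labels.swap(sortedLabels);
}
```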


Friends And Related Function Documentation

friend class Image [friend]
 

ostream& operator<< (ostream &s, Data &d) [friend]

Prints the contents of the data set to the passed stream.


Generated on Tue Dec 13 11:00:27 2005 for Classer by  doxygen 1.4.3