#include <Data.h>
Inherits NamedObject.
Inherited by Image.
Public Member Functions | |
Data (const char *name, NameList *classNames, NameList *dimNames) | |
Creates an empty, labeled Data object; for adding entries incrementally. | |
Data (const char *name, int numEntries, NameList *classNames, NameList *dimNames) | |
Creates an empty, labeled Data object based on the passed meta-data, and allocates the space for the data entries and the labels. | |
Data (const char *name, istream &is, bool isAscii, bool readLabels) | |
Creates a Data object, based on the meta-data in the file, and fills it from the passed file stream. | |
Data (const char *name, istream &is, bool isAscii, int numEntries, NameList *classNames, NameList *dimNames, bool readDoubles=false) | |
Creates a Data object based on the passed meta-data, and fills it from the passed file stream. | |
Data (const char *name, int numEntries, NameList *dimNames) | |
Creates an empty, unlabeled Data object based on the passed meta-data, and allocates the space for the data entries. | |
Data (const char *name, NameList *dimNames) | |
Creates an empty, unlabeled Data object; for adding entries incrementally, using addEntry (float *input). | |
Data (const char *name, istream &is, NameList *dimNames, int numEntries, bool isAscii=true) | |
Creates a Data object based on the passed meta-data, and fills it from the passed file stream. | |
Data (const char *name, Data *src) | |
Copy constructor - Makes a new copy of the passed data set. | |
void | loadLabelsFromStreamAscii (istream &is) |
Loads a set of labels from the passed input stream. | |
void | loadLabelsFromStreamBinary (istream &is, bool readInts) |
Reads a binary array of labels (there must be numEntries of them). | |
void | loadFromStreamAscii (istream &is) |
Loads the data (no labels) from the passed input stream. | |
void | loadFromStreamAscii (istream &is, int labelPosition) |
Loads data and labels from the passed input stream. | |
void | loadFromStreamBinary (istream &is, bool readDoubles=false, bool readLabels=false) |
Reads a binary data matrix (as floats or doubles), optionally followed by a binary array of labels (as integers). | |
void | setData (float *data) |
Given a Data object for which storage has been allocated, this method lets you pass in directly the matrix of floating point data values. | |
float * | getData () |
The converse of setData() - Returns a reference to the data array. | |
void | normalize () |
Rescales each entry, so that each dimension's extrema are scaled to the range [0, 1]. | |
int | getNumEntries () |
Returns the number of samples in the data set. | |
NameList & | getDimNames () |
Returns the NameList describing the data set's features. | |
NameList & | getClasses () |
Returns the NameList describing the data set's class labels. | |
void | setLabels (int *labels) |
Allows the setting of sample labels. | |
int | getLabel (int entryIndex) |
Retrieves the label at a given position (only returns the first label). | |
LabelSet & | getLabels (int entryIndex) |
Retrieves a set of labels for a given position. | |
float | getEntry (int rowNum, int dimIndex) |
Returns the specified data entry. | |
void | addEntry (float *inputs) |
For growing the data set incrementally; this method is for adding unlabeled samples. | |
void | addEntry (float *inputs, int label) |
For growing the data set incrementally; this method is for adding labeled samples. | |
void | setEntry (int index, float *inputs, int label) |
For explicit access to the data set, this method lets you add a labeled sample at a specific location. | |
void | save (ostream &os, bool isAscii, bool writeData, bool writeHeader) |
Writes the data set to the passed output file stream. | |
void | multiplyLabels (istream &labelDefsIs) |
Specifies a mapping to multiply the labels that are defined for each sample. | |
void | deleteClass (int classIdx) |
Throws out all samples of the specified class, shrinking the data set accordingly. | |
void | sortByLabel () |
Re-sorts the data set by increasing label index. | |
void | listEntries (ostream &os) |
Prints the contents of the data set to the passed stream. | |
void | plot (ostream &ost, int dim1, int dim2, bool ignoreZero) |
Generates a character-graphics plot of the data set and sends it to the passed output stream. | |
void | hist (ostream &s) |
Writes a histogram by class of the data set. | |
Friends | |
class | Image |
ostream & | operator<< (ostream &s, Data &d) |
Prints the contents of the data set to the passed stream. |
It can be filled from passed arrays or from files. Data files can be specified in text or binary form. Data and associated labels can be loaded together, or separately.
Three types of meta-data describe the contents of the data set:
|
Creates an empty, labeled Data object; for adding entries incrementally.
|
|
Creates an empty, labeled Data object based on the passed meta-data, and allocates the space for the data entries and the labels. This constructor is meant to be used in conjunction with the setData() and setLabels() methods, which let you pass in the whole data set at once. |
|
Creates a Data object, based on the meta-data in the file, and fills it from the passed file stream. The data file may be text (ascii) or binary (no - see note), and may or may not contain label definitions. If text-based, the data file must start with a data header, and if binary, the meta-data must be defined separately. This command only supports the loading of single labels per data file.
|
|
Creates a Data object based on the passed meta-data, and fills it from the passed file stream. The data file may be text (ASCII) or binary, and is labeled, i.e., label definitions are required. In ASCII files, the labels are assumed to be in position zero, whereas in binary files they come in a second array, following the data points. If the data points are binary, their precision must be specified (i.e., C++ floats or doubles).
|
|
Creates an empty, unlabeled Data object based on the passed meta-data, and allocates the space for the data entries. This constructor is meant to be used in conjunction with the setData() method, which lets you pass in the whole data set at once.
|
|
Creates an empty, unlabeled Data object; for adding entries incrementally, using addEntry (float *input).
|
|
Creates a Data object based on the passed meta-data, and fills it from the passed file stream. The data file may be text (ASCII) or binary, and is unlabeled, i.e., no label definitions are included. If the data are binary, they are single-precision floating point values (i.e., C++ floats).
|
|
Copy constructor - Makes a new copy of the passed data set. Using this is a simple way to split out the ground truth component of the data. New storage is allocated for the feature and label data, but the dimNames and classNames are copied by reference.
|
|
For growing the data set incrementally; this method is for adding labeled samples.
|
|
For growing the data set incrementally; this method is for adding unlabeled samples.
|
|
Throws out all samples of the specified class, shrinking the data set accordingly. This is used for example to restrict the data to the ground truth samples only, assuming the unlabeled samples have a particular label.
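The filtering described above can be sketched outside the class as an in-place compaction of a row-major matrix and its label array. This is only an illustration under stated assumptions: LabeledMatrix and dropClass are hypothetical names, not part of Data, and each sample is assumed to carry a single label.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a labeled data set: a row-major matrix
// (numEntries x numDims) plus one integer label per sample.
struct LabeledMatrix {
    std::vector<float> values;
    std::vector<int> labels;
    std::size_t numDims;
};

// Removes every sample whose label equals classIdx and shrinks the storage.
void dropClass(LabeledMatrix& m, int classIdx) {
    assert(m.values.size() == m.labels.size() * m.numDims);
    std::size_t kept = 0;  // next free row slot
    for (std::size_t r = 0; r < m.labels.size(); ++r) {
        if (m.labels[r] == classIdx) continue;  // skip the unwanted class
        if (kept != r) {
            // Move the surviving row down over the gap.
            for (std::size_t d = 0; d < m.numDims; ++d) {
                m.values[kept * m.numDims + d] = m.values[r * m.numDims + d];
            }
            m.labels[kept] = m.labels[r];
        }
        ++kept;
    }
    m.labels.resize(kept);
    m.values.resize(kept * m.numDims);
}
```

If, say, unlabeled samples were stored under class index 0, calling dropClass(m, 0) would leave only the ground truth rows, in their original order.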
|
|
Returns the NameList describing the data set's class labels.
|
|
The converse of setData() - Returns a reference to the data array.
|
|
Returns the NameList describing the data set's features.
|
|
Returns the specified data entry.
|
|
Retrieves the label at a given position (only returns the first label).
|
|
Retrieves a set of labels for a given position.
|
|
Returns the number of samples in the data set.
|
|
Writes a histogram by class of the data set.
|
|
Prints the contents of the data set to the passed stream.
|
|
Loads data and labels from the passed input stream. The data file is assumed to have no header, so the meta-data should have been set during the constructor call.
|
|
Loads the data (no labels) from the passed input stream. The data file is assumed to have no header, so the meta-data should have been set during the constructor call. A heartbeat '.' is printed every 10,000 samples loaded.
|
|
Reads a binary data matrix (as floats or doubles), optionally followed by a binary array of labels (as integers). The data file is assumed to have no header, so the meta-data should have been set during the constructor call.
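As a rough illustration of the layout described above, the following sketch reads a headerless binary matrix followed by an optional label array. The names BinaryDataSketch and readBinaryMatrix are hypothetical, and the sketch assumes native byte order, a sample-major matrix, and doubles narrowed to float on load; none of these details are confirmed by the class documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical container for the result; not part of Data.
struct BinaryDataSketch {
    std::vector<float> values;  // numEntries x numDims, row-major
    std::vector<int> labels;    // one per sample, empty if not read
};

// Reads numEntries * numDims values (floats or doubles), then optionally
// numEntries int labels. The stream carries no header, so the sizes must
// be known in advance.
BinaryDataSketch readBinaryMatrix(std::istream& is, std::size_t numEntries,
                                  std::size_t numDims, bool readDoubles,
                                  bool readLabels) {
    BinaryDataSketch out;
    const std::size_t count = numEntries * numDims;
    out.values.resize(count);
    if (readDoubles) {
        std::vector<double> tmp(count);
        is.read(reinterpret_cast<char*>(tmp.data()),
                static_cast<std::streamsize>(count * sizeof(double)));
        for (std::size_t i = 0; i < count; ++i) {
            out.values[i] = static_cast<float>(tmp[i]);  // narrow to float
        }
    } else {
        is.read(reinterpret_cast<char*>(out.values.data()),
                static_cast<std::streamsize>(count * sizeof(float)));
    }
    if (readLabels) {
        out.labels.resize(numEntries);
        is.read(reinterpret_cast<char*>(out.labels.data()),
                static_cast<std::streamsize>(numEntries * sizeof(int)));
    }
    assert(is && "stream ended before all values were read");
    return out;
}
```

The function works with any std::istream, so it can be exercised with a std::istringstream in memory before being pointed at a file opened in binary mode.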
|
|
Loads a set of labels from the passed input stream. The stream should have one line per data sample, i.e., there should be numEntries lines, and may have more than one label per line, separated by white-space. This is currently the most flexible approach to loading multiple-label sets.
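The line-per-sample format described above could be parsed along the following lines. This is a sketch only: readLabelLines is a hypothetical helper, and it assumes the labels are written as integer indices, which the documentation does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads numEntries lines; each line holds one or more whitespace-separated
// integer labels for the corresponding sample.
std::vector<std::vector<int>> readLabelLines(std::istream& is,
                                             std::size_t numEntries) {
    std::vector<std::vector<int>> labels;
    labels.reserve(numEntries);
    std::string line;
    while (labels.size() < numEntries && std::getline(is, line)) {
        std::istringstream fields(line);
        std::vector<int> sampleLabels;
        int label;
        while (fields >> label) {
            sampleLabels.push_back(label);
        }
        labels.push_back(sampleLabels);
    }
    assert(labels.size() == numEntries && "expected one line per sample");
    return labels;
}
```

With the input "0\n1 2\n2 0 1\n" and numEntries set to 3, the second sample ends up with the two labels 1 and 2.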
|
|
Reads a binary array of labels (there must be numEntries of them). The labels may be specified as ints or shorts. Only one label per sample is currently supported.
|
|
Specifies a mapping to multiply the labels that are defined for each sample.
The contents of the passed input stream define the mapping to use. The syntax is as follows (comments prefixed by '#' characters may be interspersed). First comes a NameList (e.g., "[ a b c ]") specifying the new set of labels, which should be a superset of the old list, with the old labels coming first; i.e., if the old list is something like [ a b c ], then the new list should be something like [ a b c d e f ]. Then comes one line for each of the old labels: each line gives the old label first, followed by the set of labels that should be substituted for it, presumably including the old label, though this isn't required. As an example:
After calling this function, a data set with single labels will have each of those labels replaced by the appropriate set of one or more labels, as specified by the mapping in the file. |
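The effect of the mapping, once it has been parsed, can be sketched in memory as follows. This does not reproduce the file syntax or the class's own implementation: expandLabels and LabelSetSketch are hypothetical names, labels are assumed to be integer indices, and an old label with no mapping entry is assumed to keep itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a set of labels attached to one sample.
using LabelSetSketch = std::vector<int>;

// Replaces each single label with the set of labels the mapping assigns
// to it.
std::vector<LabelSetSketch> expandLabels(
        const std::vector<int>& singleLabels,
        const std::map<int, LabelSetSketch>& mapping) {
    std::vector<LabelSetSketch> out;
    out.reserve(singleLabels.size());
    for (int oldLabel : singleLabels) {
        const auto it = mapping.find(oldLabel);
        if (it != mapping.end()) {
            out.push_back(it->second);
        } else {
            // Assumption: an unmapped label is carried over unchanged.
            out.push_back(LabelSetSketch{oldLabel});
        }
    }
    return out;
}
```

For instance, with a mapping of 0 to {0, 3} and 1 to {1, 4, 5}, a sample labeled 1 would end up carrying the three labels 1, 4 and 5.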
|
Rescales each entry, so that each dimension's extrema are scaled to the range [0, 1]. The operation is performed one dimension at a time, i.e., each featural dimension is rescaled with respect to the extrema for that dimension, rather than with respect to the extrema of the entire data set. |
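The per-dimension rescaling can be illustrated with a small free function over a row-major float array. This is a minimal sketch, not the class's own code: normalizePerDimension is a hypothetical name, and mapping a constant dimension to 0 is an assumption made here to avoid dividing by zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Min-max rescales a row-major matrix (numEntries x numDims) in place,
// one dimension at a time, so each dimension spans [0, 1].
void normalizePerDimension(std::vector<float>& data, std::size_t numDims) {
    assert(numDims > 0 && data.size() % numDims == 0);
    const std::size_t numEntries = data.size() / numDims;
    if (numEntries == 0) return;
    for (std::size_t d = 0; d < numDims; ++d) {
        // Find the extrema of this dimension only.
        float lo = data[d];
        float hi = data[d];
        for (std::size_t r = 1; r < numEntries; ++r) {
            lo = std::min(lo, data[r * numDims + d]);
            hi = std::max(hi, data[r * numDims + d]);
        }
        const float range = hi - lo;
        for (std::size_t r = 0; r < numEntries; ++r) {
            float& v = data[r * numDims + d];
            // Assumption: a constant dimension is mapped to 0.
            v = (range > 0.0f) ? (v - lo) / range : 0.0f;
        }
    }
}
```

Because the extrema are computed per column, a dimension ranging over [10, 30] and one ranging over [0, 10] both end up spanning [0, 1], regardless of their original scales.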
|
Generates a character-graphics plot of the data set and sends it to the passed output stream. This is only useful for small data sets. The result is a scatter plot of the data in the two dimensions specified.
|
|
Writes the data set to the passed output file stream. If a binary save is requested, the writeLabels and writeHeader flags are ignored, and the data are written as a matrix of floats, with the sample index varying more slowly than the dimension index. If an ASCII save is requested, then the writeLabels and writeHeader flags control what is written.
|
|
Given a Data object for which storage has been allocated, this method lets you pass in the matrix of floating point data values directly. In the passed matrix (two dimensions mapped to a 1-D C array), the sample index varies more slowly than the dimension index, i.e., all values of one sample are stored contiguously.
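The layout described above is ordinary row-major order. The sketch below shows how a sample and dimension index map to an offset in the flat array, and how per-sample rows could be packed before being handed over; flatIndex and flatten are hypothetical helpers, not members of Data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of (rowNum, dimIndex) in a flat array where the sample index
// varies more slowly than the dimension index.
std::size_t flatIndex(std::size_t rowNum, std::size_t dimIndex,
                      std::size_t numDims) {
    assert(dimIndex < numDims);
    return rowNum * numDims + dimIndex;
}

// Packs one vector per sample into a single contiguous array.
std::vector<float> flatten(const std::vector<std::vector<float>>& rows,
                           std::size_t numDims) {
    std::vector<float> flat;
    flat.reserve(rows.size() * numDims);
    for (const auto& row : rows) {
        assert(row.size() == numDims);
        flat.insert(flat.end(), row.begin(), row.end());
    }
    return flat;
}
```

A buffer built this way has the shape setData(float *data) expects, with the pointer obtained from flat.data(). Whether the class copies the buffer or keeps the pointer is not stated here, so the buffer's lifetime should be handled with care.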
|
|
For explicit access to the data set, this method lets you add a labeled sample at a specific location.
|
|
Allows the setting of sample labels. The passed array specifies an integer label for each sample. Multiple labels may be specified by calling this method multiple times, though in this case a label must be specified for each sample, so all samples must have the same number of labels.
|
|
Re-sorts the data set by increasing label index.
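One way to reorder samples by label is to sort an index permutation and then gather the rows, as sketched below. sortRowsByLabel is a hypothetical helper, and the use of a stable sort (samples with equal labels keep their relative order) is an assumption; the documentation does not say how ties are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Reorders a row-major matrix and its labels by increasing label.
void sortRowsByLabel(std::vector<float>& values, std::vector<int>& labels,
                     std::size_t numDims) {
    assert(values.size() == labels.size() * numDims);
    // Sort row indices rather than the rows themselves.
    std::vector<std::size_t> order(labels.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&labels](std::size_t a, std::size_t b) {
                         return labels[a] < labels[b];
                     });
    // Gather rows and labels in the sorted order.
    std::vector<float> sortedValues;
    std::vector<int> sortedLabels;
    sortedValues.reserve(values.size());
    sortedLabels.reserve(labels.size());
    for (std::size_t src : order) {
        sortedLabels.push_back(labels[src]);
        for (std::size_t d = 0; d < numDims; ++d) {
            sortedValues.push_back(values[src * numDims + d]);
        }
    }
    values.swap(sortedValues);
    labels.swap(sortedLabels);
}
```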
|
|
|
|
Prints the contents of the data set to the passed stream.
|