Module type Oc45.S

module type S = sig .. end
Output signature of the functor Oc45.Make.


Data types

exception InvalidArgument of string
Raised when trying to construct a data set with wrong arguments.
type feature = int 
A feature id.
exception BadContinuity of feature
Raised when appending a data vector with a continuous feature instead of a discrete one, or vice versa.
exception DiscreteFeatOutOfBounds of feature * int
DiscreteFeatOutOfBounds (feat, class) is raised when trying to classify a data vector whose discrete feature feat is equal to class, while the tree was created assuming that the values for this feature would remain < class. This usually means that the value is very rare and was not encountered in the training set, so the limit inferred for the maximal value of this feature is not high enough. You then have to set it manually, using Oc45.S.setFeatureMax.
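As a minimal sketch of how a caller might handle this exception, the functor below wraps classify so that an out-of-bounds discrete value becomes an Error instead of propagating. CLASSIFY is a local copy of the relevant part of this signature; the names CLASSIFY, Safe and classify_result are hypothetical and not part of Oc45.

```ocaml
(* Local copy of the parts of Oc45.S used here (an assumption of this sketch). *)
module type CLASSIFY = sig
  type feature = int
  type category = int
  type data
  type decisionTree
  exception DiscreteFeatOutOfBounds of feature * int
  val classify : decisionTree -> data -> category
end

(* Hypothetical helper: works with any module providing the items above. *)
module Safe (T : CLASSIFY) = struct
  (* Returns Ok category, or Error with a message naming the feature
     and the offending value carried by the exception. *)
  let classify_result tree v =
    try Ok (T.classify tree v)
    with T.DiscreteFeatOutOfBounds (feat, value) ->
      Error
        (Printf.sprintf
           "feature %d: value %d is above the bound known to the tree"
           feat value)
end
```

The error message only reports the problem; raising the bound is still done on the training set with setFeatureMax, as described above.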
type category = int 
A category (i.e., classification) id.
type contData 
The type of continuous data values, defined through Oc45.Make.
type dataVal = 
| Discrete of int
| Continuous of contData
A value of a feature field.
type data = dataVal array 
A feature/data value association table.
type trainVal = {
   data : data; (*
Associates each feature id to its value. If the feature is continuous, it may take any value; if the feature is discrete, it must be an integer in a range 0..N inclusive for a bound N inferred as the maximum of the given data. You can also set this bound manually with Oc45.S.setFeatureMax.
*)
   category : category; (*
The category to which this data vector belongs.
*)
}
A value used to train the algorithm.
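The following sketch illustrates how a training value could be written down. It redeclares the data types locally and fixes contData to float purely for illustration (in this signature contData is abstract). The sample values and the helper matches_continuity are hypothetical.

```ocaml
(* Local copies of the documented types; contData = float is an
   assumption made only for this illustration. *)
type contData = float
type category = int
type dataVal = Discrete of int | Continuous of contData
type data = dataVal array
type trainVal = { data : data; category : category }

(* Hypothetical sample: feature 0 is a continuous measurement,
   feature 1 is a discrete flag (0 = no, 1 = yes). *)
let sample : trainVal =
  { data = [| Continuous 21.5; Discrete 1 |]; category = 0 }

(* Checks a data vector against a continuity array of the kind given
   to emptyTrainSet: true means continuous, false means discrete. *)
let matches_continuity (cont : bool array) (d : data) : bool =
  Array.length cont = Array.length d
  && List.for_all2
       (fun c v ->
         match v with
         | Continuous _ -> c
         | Discrete _ -> not c)
       (Array.to_list cont) (Array.to_list d)
```

A check of this kind mirrors the condition under which BadContinuity is documented to be raised, so it can be used to validate input before adding it to a training set.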
type trainSet 
Generated by Oc45.S.emptyTrainSet, represents a training set for the algorithm.
type decisionTree 
Output of Oc45.S.c45.

Main functions

val c45 : trainSet -> decisionTree
Generates a decision tree from a training set.
val classify : decisionTree -> data -> category
Classifies a data vector, given a decision tree.
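The two functions above can be combined as in the following sketch, which builds a training set, generates a tree and classifies a vector. It is written as a functor over TREE, a local copy of the relevant part of this signature, so it makes no assumption about how Oc45.Make is applied. The names TREE, Demo and predict, and the sample data, are hypothetical.

```ocaml
(* Local copy of the parts of Oc45.S used below (an assumption of this sketch). *)
module type TREE = sig
  type category = int
  type contData
  type dataVal = Discrete of int | Continuous of contData
  type data = dataVal array
  type trainVal = { data : data; category : category }
  type trainSet
  type decisionTree
  val emptyTrainSet : int -> int -> bool array -> trainSet
  val addDataList : trainVal list -> trainSet -> trainSet
  val c45 : trainSet -> decisionTree
  val classify : decisionTree -> data -> category
end

(* Works with any implementation whose continuous values are floats. *)
module Demo (T : TREE with type contData = float) = struct
  open T

  (* Hypothetical data: feature 0 is continuous, feature 1 is discrete. *)
  let samples = [
    { data = [| Continuous 1.0; Discrete 0 |]; category = 0 };
    { data = [| Continuous 2.0; Discrete 0 |]; category = 0 };
    { data = [| Continuous 8.0; Discrete 1 |]; category = 1 };
    { data = [| Continuous 9.0; Discrete 1 |]; category = 1 };
  ]

  (* 2 features, 2 categories; the tree is built once, when the
     functor is applied. *)
  let tree =
    emptyTrainSet 2 2 [| true; false |]
    |> addDataList samples
    |> c45

  (* Classifies a new vector with the tree built above. *)
  let predict (v : data) : category = classify tree v
end
```

Applying Demo to a module that satisfies this signature with contData = float gives a predict function for new data vectors.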

Training set manipulation

val emptyTrainSet : int -> int -> bool array -> trainSet
emptyTrainSet nbFeatures nbCategories featContinuity creates an empty training set with nbFeatures features and nbCategories categories. The array featContinuity must have nbFeatures elements, each of which is true if the corresponding feature is continuous (that is, may take any value) or false if the feature is discrete and takes its values in a restricted set (e.g., "Yes"/"No").

Raises Oc45.S.InvalidArgument if the length of featContinuity is not nbFeatures.
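One way to avoid the length mismatch is to derive nbFeatures from the array itself. The sketch below shows that, together with a variant that turns the documented exception into an option. SET is a local copy of the relevant part of this signature; SET, Build, make and make_opt are hypothetical names.

```ocaml
(* Local copy of the parts of Oc45.S used here (an assumption of this sketch). *)
module type SET = sig
  type trainSet
  exception InvalidArgument of string
  val emptyTrainSet : int -> int -> bool array -> trainSet
end

module Build (T : SET) = struct
  (* nbFeatures is taken from the array, so both always agree. *)
  let make ~nbCategories (featContinuity : bool array) : T.trainSet =
    T.emptyTrainSet (Array.length featContinuity) nbCategories featContinuity

  (* Same call with explicit sizes; InvalidArgument becomes None. *)
  let make_opt nbFeatures nbCategories featContinuity : T.trainSet option =
    try Some (T.emptyTrainSet nbFeatures nbCategories featContinuity)
    with T.InvalidArgument _ -> None
end
```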

val addData : trainVal -> trainSet -> trainSet
Adds the given value to the training set.
val addDataList : trainVal list -> trainSet -> trainSet
Adds a list of data vectors to the given training set.
val getSet : trainSet -> trainVal list
Extracts the data vector list from a training set.
val setFeatureMax : int -> int -> trainSet -> unit
setFeatureMax feat maxVal trainSet sets the maximum value the discrete feature feat may take. A discrete value is represented by an integer between 0 and maxVal (inclusive).

In most cases you will not need to call this function: the bound is set automatically to the maximum value found in the data you gave. Set it manually when the feature may take larger values that are not represented in the training set.
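A possible helper around setFeatureMax is sketched below: it raises the bound of a discrete feature to a given value but never lowers it. It assumes that the array returned by getFeatureMax is indexed by feature id, which this page does not state explicitly. BOUNDS, Widen and ensureAtLeast are hypothetical names.

```ocaml
(* Local copy of the parts of Oc45.S used here (an assumption of this sketch). *)
module type BOUNDS = sig
  type trainSet
  val setFeatureMax : int -> int -> trainSet -> unit
  val getFeatureMax : trainSet -> int array
end

module Widen (T : BOUNDS) = struct
  (* Raises the bound of discrete feature feat to maxVal if the current
     bound is smaller; otherwise leaves the training set untouched.
     Assumes getFeatureMax is indexed by feature id. *)
  let ensureAtLeast feat maxVal set =
    let current = (T.getFeatureMax set).(feat) in
    if current < maxVal then T.setFeatureMax feat maxVal set
end
```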

val getNbFeatures : trainSet -> int
Returns the number of features.
val getFeatureMax : trainSet -> int array
Returns the feature bound array, see Oc45.S.setFeatureMax.
val getFeatContinuity : trainSet -> bool array
Returns the feature continuity array, see Oc45.S.emptyTrainSet.
val getNbCategories : trainSet -> int
Returns the number of categories.
val getSetSize : trainSet -> int
Returns the number of training cases in a given training set.

Pretty-printing

val toDot : Format.formatter ->
(Format.formatter -> contData -> unit) -> decisionTree -> unit
Pretty-prints the given decision tree as a Dot file to the given formatter, using the second argument as a pretty-printer for the Oc45.S.contData type (i.e., the type of continuous data values).
val toDotStdout : (Format.formatter -> contData -> unit) -> decisionTree -> unit
Same as Oc45.S.toDot, but prints directly to stdout.
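As an illustration of the formatter argument, the sketch below renders a tree to a string by passing toDot a formatter backed by a Buffer. It assumes continuous values are floats and uses Format.pp_print_float as their printer. DOT, Export and to_string are hypothetical names; DOT is a local copy of the relevant part of this signature.

```ocaml
(* Local copy of the parts of Oc45.S used here (an assumption of this sketch). *)
module type DOT = sig
  type contData
  type decisionTree
  val toDot :
    Format.formatter ->
    (Format.formatter -> contData -> unit) -> decisionTree -> unit
end

module Export (T : DOT with type contData = float) = struct
  (* Collects the Dot output in a buffer and returns it as a string. *)
  let to_string (tree : T.decisionTree) : string =
    let buf = Buffer.create 256 in
    let fmt = Format.formatter_of_buffer buf in
    T.toDot fmt Format.pp_print_float tree;
    (* Flush so that pending output reaches the buffer. *)
    Format.pp_print_flush fmt ();
    Buffer.contents buf
end
```

The same pattern applies to a file by replacing the buffer formatter with one obtained from Format.formatter_of_out_channel.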