module Make:
Builds a module of type Oc45.S with the given comparable type as
its continuous data value.
E.g., to obtain a module of type Oc45.S working on floats, you can define
module FloatOc45 = Oc45.Make(struct type t = float let compare = compare end)
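The one-line definition above can also be written with a named argument module, which makes the two required components (a type t and a comparison function) easier to see. This is a minimal sketch: it assumes the oc45 library is linked, and that Make exposes contData as equal to the argument's type t, so that a float can be wrapped directly. FloatComparable and height are illustrative names only.

```ocaml
(* Assumes the oc45 library is installed and linked, exposing module Oc45. *)

(* The functor argument: a type t together with a total order on it. *)
module FloatComparable = struct
  type t = float
  (* The right-hand side refers to Stdlib's polymorphic compare. *)
  let compare (a : float) (b : float) = compare a b
end

module FloatOc45 = Oc45.Make (FloatComparable)

(* Assumption: contData = FloatComparable.t is visible here, so a float
   can be used as a continuous value without conversion. *)
let height : FloatOc45.dataVal = FloatOc45.Continuous 1.82
```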
exception InvalidArgument of string
type feature = int
exception BadContinuity of feature
exception DiscreteFeatOutOfBounds of feature * int
DiscreteFeatOutOfBounds feat class is raised when trying to
classify a data vector whose discrete feature feat is equal to class,
although the tree was created assuming that the values for this feature
would remain < class.
This usually means that the value is rare and was not encountered in
the training set, so the limit inferred for the maximal value of this
feature is not high enough. You then have to set it manually, using
Oc45.S.setFeatureMax.
type category = int
type contData
The type of continuous data values, as supplied to Oc45.Make.
type dataVal =
  | Discrete of int
  | Continuous of contData
type data = dataVal array
type trainVal = {
  data : data;
    (* Associates each feature id to its value. If the feature is
       continuous, it may take any value; if the feature is discrete,
       it must be an integer in the range 0..N inclusive, for a bound N
       inferred as the maximum of the given data. You can also set
       this bound manually with Oc45.S.setFeatureMax. *)
  category : category;
    (* The category to which this data vector belongs. *)
}
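As an illustration of the record described above, a training value pairs one dataVal per feature with a category number. The sketch below assumes the oc45 library is linked, that Make exposes contData = float, and that the record fields are named data and category as listed; the feature meanings and numbers are made up.

```ocaml
(* Assumes the oc45 library is linked and that Make exposes contData = float. *)
module FloatOc45 = Oc45.Make (struct
  type t = float
  let compare (a : float) (b : float) = compare a b
end)

(* Hypothetical encoding: feature 0 is a temperature (continuous),
   feature 1 is "windy" (discrete: 0 = no, 1 = yes); category 1 = "go out". *)
let sample : FloatOc45.trainVal =
  let open FloatOc45 in
  { data = [| Continuous 21.5; Discrete 0 |]; category = 1 }
```

The position of a value in the array is its feature id, so every training value in a set needs the same array length and the same continuous/discrete layout.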
type trainSet
Created by Oc45.S.emptyTrainSet, represents a training set
for the algorithm.
type decisionTree
Built by Oc45.S.c45.
val c45 : trainSet -> decisionTree
val classify : decisionTree -> data -> category
val emptyTrainSet : int -> int -> bool array -> trainSet
emptyTrainSet nbFeatures nbCategories featContinuity creates an
empty train set with nbFeatures features and nbCategories
categories. The array featContinuity must have nbFeatures
elements, with a true value if
the corresponding feature is continuous (that is, may take any value) or
false if the feature is discrete in a restricted set (e.g., "Yes"/"No").
Raises Oc45.S.InvalidArgument if featContinuity does not have a length
of nbFeatures.
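The following sketch shows emptyTrainSet with a matching and a mismatching featContinuity array. It assumes the oc45 library is linked and that the exception is reachable as FloatOc45.InvalidArgument in the module produced by Make; ts and rejected are illustrative names.

```ocaml
(* Assumes the oc45 library is linked and that Make exposes contData = float. *)
module FloatOc45 = Oc45.Make (struct
  type t = float
  let compare (a : float) (b : float) = compare a b
end)

(* Two features (the first continuous, the second discrete), two categories. *)
let ts = FloatOc45.emptyTrainSet 2 2 [| true; false |]

(* Three features announced, but only two continuity flags given:
   per the description above, this raises InvalidArgument. *)
let rejected =
  try
    ignore (FloatOc45.emptyTrainSet 3 2 [| true; false |]);
    false
  with FloatOc45.InvalidArgument _ -> true
```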
val addData : trainVal -> trainSet -> trainSet
val addDataList : trainVal list -> trainSet -> trainSet
val getSet : trainSet -> trainVal list
val setFeatureMax : int -> int -> trainSet -> unit
setFeatureMax feat maxVal trainSet sets the maximum value the
discrete feature feat may take. A discrete value is represented by
an integer between 0 and maxVal (inclusive).
In most cases, you won't have to call this function: the bound is
automatically set to the maximum value found in the data you gave. You can
still set it manually when the feature may take larger values that are not
represented in the training data.
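A possible use of setFeatureMax is sketched below: the training data only contains the values 0 and 1 for a discrete feature that may also take the value 2. The sketch assumes the oc45 library is linked and that setFeatureMax updates the given train set in place (its return type is unit). classify_opt is a hypothetical helper, not part of the library, that converts the DiscreteFeatOutOfBounds exception into an option.

```ocaml
(* Assumes the oc45 library is linked and that Make exposes contData = float. *)
module FloatOc45 = Oc45.Make (struct
  type t = float
  let compare (a : float) (b : float) = compare a b
end)

(* One discrete feature and two categories; only values 0 and 1 are seen. *)
let ts =
  let open FloatOc45 in
  emptyTrainSet 1 2 [| false |]
  |> addDataList
       [ { data = [| Discrete 0 |]; category = 0 };
         { data = [| Discrete 1 |]; category = 1 } ]

(* Value 2 is legal for this feature but absent from the data above,
   so widen the bound for feature 0 by hand before building a tree. *)
let () = FloatOc45.setFeatureMax 0 2 ts

(* Hypothetical helper: report an out-of-bounds discrete value as None. *)
let classify_opt tree vector =
  try Some (FloatOc45.classify tree vector)
  with FloatOc45.DiscreteFeatOutOfBounds (_feat, _value) -> None
```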
val getNbFeatures : trainSet -> int
val getFeatureMax : trainSet -> int array
See Oc45.S.setFeatureMax.
val getFeatContinuity : trainSet -> bool array
See Oc45.S.emptyTrainSet.
val getNbCategories : trainSet -> int
val getSetSize : trainSet -> int
val toDot : Format.formatter ->
(Format.formatter -> contData -> unit) -> decisionTree -> unit
The function argument is a printer for the Oc45.S.contData type (i.e., the type of a continuous data).
val toDotStdout : (Format.formatter -> contData -> unit) -> decisionTree -> unit
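Putting the functions above together, a complete session could look like the sketch below: build a train set, run c45, classify a new vector, and print the tree. It assumes the oc45 library is linked and that Make exposes contData = float. The data set is made up, no particular classification result is claimed, and toDotStdout is presumed (from its name) to print a Graphviz dot description on standard output.

```ocaml
(* Assumes the oc45 library is linked and that Make exposes contData = float. *)
module FloatOc45 = Oc45.Make (struct
  type t = float
  let compare (a : float) (b : float) = compare a b
end)

(* Feature 0: temperature (continuous). Feature 1: windy (0 = no, 1 = yes).
   Categories: 0 = "stay in", 1 = "go out". All values are made up. *)
let ts =
  let open FloatOc45 in
  emptyTrainSet 2 2 [| true; false |]
  |> addDataList
       [ { data = [| Continuous 25.0; Discrete 0 |]; category = 1 };
         { data = [| Continuous 22.0; Discrete 0 |]; category = 1 };
         { data = [| Continuous 5.0; Discrete 1 |]; category = 0 };
         { data = [| Continuous 8.0; Discrete 1 |]; category = 0 } ]

(* Build the decision tree from the training set. *)
let tree = FloatOc45.c45 ts

(* Classify a vector that was not part of the training data. *)
let predicted =
  FloatOc45.classify tree FloatOc45.[| Continuous 24.0; Discrete 0 |]

let () =
  Format.printf "predicted category: %d@." predicted;
  (* The printer tells the library how to display a contData value. *)
  FloatOc45.toDotStdout (fun fmt x -> Format.fprintf fmt "%g" x) tree
```

toDot takes the same printer and tree, plus the Format.formatter to write to, which allows sending the output to a file or a buffer instead of standard output.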