package splits
Type Members
-
class
CategoricalSplit extends Split
Split based on inclusion in a set
-
class
NoSplit extends Split
If no split was found
-
class
RealSplit extends Split
Split based on a real value in the index position
-
trait
Split extends Serializable
Splits are used by decision trees to partition the input space
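The one-line descriptions above can be made concrete with a small sketch. Everything here is an assumption for illustration: the names `SketchSplit`, `SketchRealSplit`, `SketchCategoricalSplit`, `SketchNoSplit`, the `turnLeft` method, and the "at or below the pivot goes left" convention are hypothetical stand-ins, not the signatures of the classes listed on this page.

```scala
object SplitSketch {
  // Hypothetical stand-in for a split: decides which branch an input row takes.
  sealed trait SketchSplit extends Serializable {
    def turnLeft(input: Vector[Any]): Boolean
  }

  // Split on a real value at position `index`: go left when the value is <= pivot.
  // (The <= convention is an assumption of this sketch.)
  final case class SketchRealSplit(index: Int, pivot: Double) extends SketchSplit {
    def turnLeft(input: Vector[Any]): Boolean = input(index) match {
      case x: Double => x <= pivot
      case other =>
        throw new IllegalArgumentException(s"Expected a Double at index $index, got $other")
    }
  }

  // Split on set inclusion: go left when the value at `index` is in `includeSet`.
  final case class SketchCategoricalSplit(index: Int, includeSet: Set[Any]) extends SketchSplit {
    def turnLeft(input: Vector[Any]): Boolean = includeSet.contains(input(index))
  }

  // Placeholder for "no split was found"; sending every row right is an arbitrary choice here.
  case object SketchNoSplit extends SketchSplit {
    def turnLeft(input: Vector[Any]): Boolean = false
  }
}
```

With these definitions, `SplitSketch.SketchRealSplit(0, 2.5).turnLeft(Vector(1.0, "a"))` evaluates to `true`, since the value at index 0 does not exceed the pivot.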
Value Members
-
object
ClassificationSplitter
Find the best split for classification problems.
Created by maxhutch on 12/2/16.
-
object
MultiTaskSplitter
Created by maxhutch on 11/29/16.
-
object
RegressionSplitter
Find the best split for regression problems.
The best split is the one that reduces the total weighted variance:
  totalVariance = N_left * \sigma_left^2 + N_right * \sigma_right^2
which, in scala-ish, would be:
  totalVariance = leftWeight * (leftSquareSum / leftWeight - (leftSum / leftWeight)^2)
                + rightWeight * (rightSquareSum / rightWeight - (rightSum / rightWeight)^2)
Because we are comparing them, we can subtract off leftSquareSum + rightSquareSum, which yields the following simple expression after some simplification:
  totalVariance = -leftSum * leftSum / leftWeight - Math.pow(totalSum - leftSum, 2) / (totalWeight - leftWeight)
which depends only on updates to leftSum and leftWeight (since totalSum and totalWeight are constant).
Created by maxhutch on 11/29/16.
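The simplified expression for RegressionSplitter lends itself to a single pass over the data sorted by one feature, updating only leftSum and leftWeight. The following is a minimal sketch of that idea, not the library's implementation: the object name `RegressionSplitSketch`, the `bestSplit` function, the `(feature, label, weight)` tuple layout, and the choice of the midpoint as pivot are all assumptions, and leftSum is taken to be the weight-scaled label sum. Weights are assumed strictly positive.

```scala
object RegressionSplitSketch {
  /** Hypothetical helper: scan one real-valued feature for the best threshold.
    *
    * @param data rows of (feature, label, weight); weights assumed > 0
    * @return Some((pivot, score)) for the lowest score, or None if no threshold separates the rows
    */
  def bestSplit(data: Seq[(Double, Double, Double)]): Option[(Double, Double)] = {
    val sorted = data.sortBy(_._1)
    // These totals are constant across all candidate splits.
    val totalSum = sorted.map { case (_, y, w) => y * w }.sum
    val totalWeight = sorted.map(_._3).sum

    var leftSum = 0.0
    var leftWeight = 0.0
    var best: Option[(Double, Double)] = None

    // Move one row at a time from the right partition to the left partition.
    for (i <- 0 until sorted.length - 1) {
      val (x, y, w) = sorted(i)
      leftSum += y * w
      leftWeight += w
      val nextX = sorted(i + 1)._1
      // Only place a threshold between distinct feature values.
      if (nextX > x) {
        val score = -leftSum * leftSum / leftWeight -
          math.pow(totalSum - leftSum, 2) / (totalWeight - leftWeight)
        if (best.forall(_._2 > score)) best = Some(((x + nextX) / 2.0, score))
      }
    }
    best
  }
}
```

For four unit-weight rows with features 1, 2, 3, 4 and labels 0, 0, 10, 10, the candidate scores are about -133.3, -200, and -133.3, so the sketch picks the pivot 2.5, which separates the two label groups exactly.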