object RegressionSplitter
Find the best split for regression problems.
The best split is the one that minimizes the total weighted variance:

    totalVariance = N_{left} * \sigma_{left}^2 + N_{right} * \sigma_{right}^2

which, in Scala-ish notation, is:

    totalVariance = leftWeight * (leftSquareSum / leftWeight - Math.pow(leftSum / leftWeight, 2)) + rightWeight * (rightSquareSum / rightWeight - Math.pow(rightSum / rightWeight, 2))

Because we only compare candidate splits against each other, we can subtract leftSquareSum + rightSquareSum, which is the same constant for every split. After simplification this yields:

    totalVariance = -leftSum * leftSum / leftWeight - Math.pow(totalSum - leftSum, 2) / (totalWeight - leftWeight)

This expression depends only on updates to leftSum and leftWeight, since totalSum and totalWeight are constant.
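The simplification above can be checked numerically. The following is a minimal sketch, not the library's code: the object name, the helper names, and the made-up (label, weight) rows are all assumptions for illustration. It evaluates both the full weighted variance and the reduced expression at every split position and records which position each one prefers.

```scala
object VarianceObjectiveSketch {
  /** Full weighted variance of a left/right partition, computed from sums. */
  def totalVariance(leftSum: Double, leftSquareSum: Double, leftWeight: Double,
                    rightSum: Double, rightSquareSum: Double, rightWeight: Double): Double =
    leftWeight * (leftSquareSum / leftWeight - math.pow(leftSum / leftWeight, 2)) +
      rightWeight * (rightSquareSum / rightWeight - math.pow(rightSum / rightWeight, 2))

  /** Reduced objective: the full expression minus the constant total square sum. */
  def reducedObjective(leftSum: Double, leftWeight: Double,
                       totalSum: Double, totalWeight: Double): Double =
    -leftSum * leftSum / leftWeight -
      math.pow(totalSum - leftSum, 2) / (totalWeight - leftWeight)

  // Weighted sums over (label, weight) rows.
  def sum(rows: Seq[(Double, Double)]): Double = rows.map { case (y, w) => y * w }.sum
  def squareSum(rows: Seq[(Double, Double)]): Double = rows.map { case (y, w) => y * y * w }.sum
  def weight(rows: Seq[(Double, Double)]): Double = rows.map(_._2).sum

  // Hypothetical (label, weight) rows, already ordered by the feature being split on.
  val data: Vector[(Double, Double)] =
    Vector((1.0, 1.0), (2.0, 1.0), (10.0, 2.0), (11.0, 1.0))
  val totalSum: Double = sum(data)
  val totalSquareSum: Double = squareSum(data)
  val totalWeight: Double = weight(data)

  // (split position, full objective, reduced objective); the first i rows go left.
  val scores: Vector[(Int, Double, Double)] = (1 until data.length).toVector.map { i =>
    val (left, right) = data.splitAt(i)
    val full = totalVariance(sum(left), squareSum(left), weight(left),
                             sum(right), squareSum(right), weight(right))
    (i, full, reducedObjective(sum(left), weight(left), totalSum, totalWeight))
  }

  val bestByFull: Int = scores.minBy(_._2)._1
  val bestByReduced: Int = scores.minBy(_._3)._1
}
```

In this sketch the two objectives differ by totalSquareSum at every position, so they select the same split.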
Created by maxhutch on 11/29/16.
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
getBestCategoricalSplit(data: Seq[(Vector[AnyVal], Double, Double)], calculator: VarianceCalculator, index: Int, minCount: Int): (CategoricalSplit, Double)
Find the best categorical split.
- data
to split
- index
of the feature to split on
- returns
the best split of this feature
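One common way to search categorical splits for regression is to order the categories by their weighted mean label and then scan the prefixes of that ordering. The sketch below illustrates that technique under stated assumptions: it is not confirmed to be what getBestCategoricalSplit does, it represents rows as hypothetical (category, label, weight) triples instead of the real feature vectors, and it ignores minCount and the VarianceCalculator.

```scala
object CategoricalSplitSketch {
  /** rows: (category, label, weight). Returns (categories sent left, reduced objective). */
  def bestCategoricalSplit(rows: Seq[(Char, Double, Double)]): Option[(Set[Char], Double)] = {
    // Per-category (weighted label sum, weight), ordered by weighted mean label.
    val groups = rows.groupBy(_._1).toVector.map { case (c, rs) =>
      (c, rs.map(r => r._2 * r._3).sum, rs.map(_._3).sum)
    }.sortBy { case (_, s, w) => s / w }

    val totalSum = groups.map(_._2).sum
    val totalWeight = groups.map(_._3).sum

    // Running (left categories, leftSum, leftWeight) for every proper, non-empty prefix.
    val prefixes = groups.dropRight(1).scanLeft((Set.empty[Char], 0.0, 0.0)) {
      case ((cs, s, w), (c, gs, gw)) => (cs + c, s + gs, w + gw)
    }.tail

    // Score each prefix with the reduced variance objective; lower is better.
    val scored = prefixes.map { case (cs, s, w) =>
      (cs, -s * s / w - math.pow(totalSum - s, 2) / (totalWeight - w))
    }
    if (scored.isEmpty) None else Some(scored.minBy(_._2))
  }
}
```

With fewer than two distinct categories there is nothing to split, so the sketch returns None.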
-
def
getBestRealSplit(data: Seq[(Vector[AnyVal], Double, Double)], calculator: VarianceCalculator, index: Int, minCount: Int): (RealSplit, Double)
Find the best split on a continuous variable
- data
to split
- index
of the feature to split on
- returns
the best split of this feature
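A single pass over the rows, sorted by the chosen feature, is enough to evaluate every threshold, because the reduced objective needs only the running leftSum and leftWeight. The sketch below is an illustration, not the library's implementation. It assumes each row is (features, label, weight), that the feature at index holds a Double, that minCount bounds the number of rows on each side, and that the pivot is the midpoint between neighbouring values. The object name and the return type are hypothetical.

```scala
object RealSplitSketch {
  // Assumed layout: (features, label, weight).
  type Row = (Vector[AnyVal], Double, Double)

  /** Returns (pivot, reduced objective) for the best threshold on feature `index`, if any. */
  def bestRealSplit(data: Seq[Row], index: Int, minCount: Int): Option[(Double, Double)] = {
    val sorted = data.toVector
      .map { case (x, y, w) => (x(index).asInstanceOf[Double], y, w) }
      .sortBy(_._1)
    val totalSum = sorted.map { case (_, y, w) => y * w }.sum
    val totalWeight = sorted.map(_._3).sum

    var leftSum = 0.0
    var leftWeight = 0.0
    var best: Option[(Double, Double)] = None
    for (j <- 0 until sorted.length - 1) {
      // Move row j to the left side by updating the running sums.
      val (x, y, w) = sorted(j)
      leftSum += y * w
      leftWeight += w
      val nextX = sorted(j + 1)._1
      val leftCount = j + 1
      val rightCount = sorted.length - leftCount
      // Only split between distinct values, and keep enough rows on each side.
      if (nextX > x && leftCount >= minCount && rightCount >= minCount) {
        val score = -leftSum * leftSum / leftWeight -
          math.pow(totalSum - leftSum, 2) / (totalWeight - leftWeight)
        if (best.forall(_._2 > score)) best = Some(((x + nextX) / 2.0, score))
      }
    }
    best
  }
}
```

Returning an Option makes the "no admissible split" case explicit, for example when minCount is too large for the data.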
-
def
getBestSplit(data: Seq[(Vector[AnyVal], Double, Double)], numFeatures: Int, minInstances: Int): (Split, Double)
Get the best split, considering numFeatures randomly chosen features (sampled without replacement).
- data
to split
- numFeatures
to consider, randomly
- returns
a split object that optimally divides data
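Sampling features without replacement and then keeping the lowest-scoring candidate can be sketched as follows. This is a minimal illustration under assumptions: the object, the function names, and the splitOn callback (standing in for a per-feature search that returns a pivot and a score) are hypothetical and are not part of the documented API.

```scala
import scala.util.Random

object FeatureSubsetSketch {
  /** Draw `numFeatures` distinct feature indices out of `totalFeatures`. */
  def sampleFeatureIndices(totalFeatures: Int, numFeatures: Int, rng: Random): Vector[Int] =
    rng.shuffle((0 until totalFeatures).toVector).take(numFeatures)

  /**
   * Evaluate a per-feature split search on each sampled index and keep the
   * candidate with the lowest score. Returns (feature index, pivot, score).
   */
  def bestOverFeatures(indices: Seq[Int],
                       splitOn: Int => Option[(Double, Double)]): Option[(Int, Double, Double)] = {
    val candidates = indices.flatMap { i =>
      splitOn(i).map { case (pivot, score) => (i, pivot, score) }
    }
    if (candidates.isEmpty) None else Some(candidates.minBy(_._3))
  }
}
```

Shuffling the full index range and taking a prefix guarantees that no feature is considered twice.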
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )