case class RegressionSplitter(randomizePivotLocation: Boolean = false, rng: Random = Random) extends Splitter[Double] with Product with Serializable
Find the best split for regression problems.
The best split is the one that minimizes the total weighted variance:
    totalVariance = N_{left} \sigma_{left}^2 + N_{right} \sigma_{right}^2
which, in Scala-ish notation, is:
    totalVariance = leftWeight * (leftSquareSum / leftWeight - Math.pow(leftSum / leftWeight, 2)) + rightWeight * (rightSquareSum / rightWeight - Math.pow(rightSum / rightWeight, 2))
Because we only compare candidate splits, we can subtract the constant leftSquareSum + rightSquareSum. Substituting rightSum = totalSum - leftSum and rightWeight = totalWeight - leftWeight then yields the simple expression:
    totalVariance = -leftSum * leftSum / leftWeight - Math.pow(totalSum - leftSum, 2) / (totalWeight - leftWeight)
This depends only on updates to leftSum and leftWeight (since totalSum and totalWeight are constant).
Created by maxhutch on 11/29/16.
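The simplified expression in the class description can be illustrated with a small standalone sketch. It is not the library's implementation: the object VarianceScanSketch, its methods score and bestPivot, the (feature value, label, weight) row layout, and the midpoint pivot are all assumptions made for illustration, and weights are assumed to be strictly positive.

```scala
object VarianceScanSketch {
  /** Simplified split score from the class description; lower is better. */
  def score(leftSum: Double, leftWeight: Double, totalSum: Double, totalWeight: Double): Double =
    -leftSum * leftSum / leftWeight -
      Math.pow(totalSum - leftSum, 2) / (totalWeight - leftWeight)

  /**
    * Scan the candidate pivots of a single real-valued feature.
    * Each row is a hypothetical (feature value, label, weight) triple with weight > 0.
    * Returns the best (pivot, score) pair, or None if all feature values are equal.
    */
  def bestPivot(rows: Seq[(Double, Double, Double)]): Option[(Double, Double)] = {
    val sorted = rows.sortBy(_._1)
    // Weighted totals are computed once; they are constant during the scan.
    val totalSum = sorted.map { case (_, y, w) => y * w }.sum
    val totalWeight = sorted.map(_._3).sum

    var leftSum = 0.0
    var leftWeight = 0.0
    var best: Option[(Double, Double)] = None

    // Move one row at a time from the right partition to the left one.
    sorted.zip(sorted.drop(1)).foreach { case ((x, y, w), (xNext, _, _)) =>
      leftSum += y * w
      leftWeight += w
      // A pivot can only sit between two distinct feature values.
      if (xNext > x) {
        val s = score(leftSum, leftWeight, totalSum, totalWeight)
        // Assumption: place the pivot at the midpoint of the two values.
        if (best.forall(_._2 > s)) best = Some(((x + xNext) / 2.0, s))
      }
    }
    best
  }
}
```

Each step of the scan updates only leftSum and leftWeight, so evaluating all pivots of a sorted feature takes one pass over the data.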
Instance Constructors
- new RegressionSplitter(randomizePivotLocation: Boolean = false, rng: Random = Random)
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def getBestCategoricalSplit(data: Seq[(Vector[AnyVal], Double, Double)], calculator: VarianceCalculator, index: Int, minCount: Int): (CategoricalSplit, Double)
Find the best categorical split.
- data: the data to split
- index: index of the feature to split on
- returns: the best split of this feature
- def getBestSplit(data: Seq[(Vector[AnyVal], Double, Double)], numFeatures: Int, minInstances: Int): (Split, Double)
Get the best split, considering numFeatures random features (sampled without replacement).
- data: the data to split
- numFeatures: number of features to consider, chosen at random
- returns: a split object that optimally divides data
- Definition Classes
- RegressionSplitter → Splitter
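Considering numFeatures random features without replacement can be sketched as follows. This is only an illustration under stated assumptions, not the body of getBestSplit: the object FeatureSubsetSketch, its methods chooseFeatures and bestFeature, and the scoreFeature callback are hypothetical names, and a lower score is assumed to be better.

```scala
import scala.util.Random

object FeatureSubsetSketch {
  /** Draw numFeatures distinct feature indices from 0 until totalFeatures. */
  def chooseFeatures(totalFeatures: Int, numFeatures: Int, rng: Random): Seq[Int] =
    // Shuffling and truncating samples without replacement.
    rng.shuffle((0 until totalFeatures).toVector).take(numFeatures)

  /**
    * Score each sampled feature with a hypothetical per-feature scorer and
    * keep the (index, score) pair with the lowest score.
    * A scorer returns None when the feature admits no valid split.
    */
  def bestFeature(indices: Seq[Int], scoreFeature: Int => Option[Double]): Option[(Int, Double)] =
    indices
      .flatMap(i => scoreFeature(i).map(s => (i, s)))
      .sortBy(_._2)
      .headOption
}
```

Passing the random number generator explicitly, as the constructor's rng parameter does, keeps the sampled feature subset reproducible for a fixed seed.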
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def productElementNames: Iterator[String]
- Definition Classes
- Product
- val randomizePivotLocation: Boolean
- val rng: Random
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()