
case class BoltzmannSplitter(temperature: Double, rng: Random = Random) extends Splitter[Double] with Product with Serializable

Find a split for a regression problem

The splits are picked with a probability that is related to the reduction in variance: P(split) ~ exp[ - {remaining variance} / ({temperature} * {total variance}) ]. Recall that the "variance" here is weighted by the sample size, so it is really the sum of the squared differences from the mean on each side of the split. This is analogous to simulated annealing and Metropolis-Hastings.
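The selection rule above can be sketched as follows. This is a minimal illustration and not the library's implementation: the object BoltzmannSketch and its methods probabilities and sample are hypothetical names, and the candidate splits are represented only by their remaining (sample-size-weighted) variances.

```scala
import scala.util.Random

// Hypothetical helper, for illustration only; not part of the documented API.
object BoltzmannSketch {

  /** Normalized selection probabilities, P(split) ~ exp(-remaining / (temperature * totalVariance)). */
  def probabilities(remaining: Seq[Double], totalVariance: Double, temperature: Double): Seq[Double] = {
    require(remaining.nonEmpty, "need at least one candidate split")
    require(temperature > 0.0 && totalVariance > 0.0, "temperature and total variance must be positive")
    // Shift by the smallest remaining variance so the largest exponent is 0.
    // The shift cancels when normalizing and avoids underflow at low temperature.
    val base = remaining.reduce((a, b) => math.min(a, b))
    val raw = remaining.map(r => math.exp(-(r - base) / (temperature * totalVariance)))
    val z = raw.sum
    raw.map(_ / z)
  }

  /** Draw the index of one candidate according to the given probabilities. */
  def sample(probs: Seq[Double], rng: Random): Int = {
    val u = rng.nextDouble()
    val cumulative = probs.scanLeft(0.0)(_ + _).tail
    val idx = cumulative.indexWhere(c => u < c)
    // Guard against rounding leaving the final cumulative value slightly below 1.
    if (idx >= 0) idx else probs.length - 1
  }
}
```

For example, with remaining variances 2.0, 2.1 and 8.0, a total variance of 10.0 and a temperature of 0.1, the probabilities come out at roughly 0.52, 0.47 and 0.001: the two nearly equivalent splits are chosen about equally often, while the clearly worse one is almost never picked. Lowering the temperature concentrates the probability on the best split.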

The motivation here is to reduce the correlation between trees by making random choices among splits that are almost as good as the strictly optimal one. Reducing the correlation between trees reduces the variance of an ensemble method (e.g. random forests): the variance both decreases more quickly with the tree count and reaches a lower floor. In this paragraph, "variance" is used in the sense of the "bias-variance trade-off".

Division by the local total variance makes the splitting behavior invariant to data size and to the scale of the labels. That means, however, that you can't set the temperature based on a known absolute noise scale. For that, you'd want to divide by the total weight rather than the total variance.
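The invariance claim can be checked with a small sketch. It assumes unweighted samples and uses hypothetical names (NormalizationSketch, sumSq, ratio); it only computes the ratio {remaining variance} / {total variance} that appears in the exponent, not the splitter itself.

```scala
// Hypothetical helper, for illustration only; assumes every sample has weight 1.
object NormalizationSketch {

  /** Sum of squared deviations from the mean, i.e. the variance times the sample count. */
  def sumSq(ys: Seq[Double]): Double =
    if (ys.isEmpty) 0.0
    else {
      val mean = ys.sum / ys.size
      ys.map(y => (y - mean) * (y - mean)).sum
    }

  /** Remaining over total sum of squares when the first k labels go left and the rest go right. */
  def ratio(ys: Seq[Double], k: Int): Double = {
    val total = sumSq(ys)
    require(total > 0.0, "labels must not all be equal")
    val (left, right) = ys.splitAt(k)
    (sumSq(left) + sumSq(right)) / total
  }
}
```

Multiplying every label by a constant c scales both the numerator and the denominator by c squared, and duplicating every sample doubles both, so the ratio (and therefore the split probability at a fixed temperature) is unchanged. Dividing by the total weight instead would leave a quantity with the units of the squared label, which is what would let the temperature be tied to an absolute noise scale.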

TODO: allow the rescaling to happen based on the total weight instead of the total variance, as an option

Created by maxhutch on 11/29/16.

temperature

used to control how sensitive the probability of a split is to its change in variance. The temperature can be thought of as a hyperparameter.

Linear Supertypes
Serializable, Product, Equals, Splitter[Double], AnyRef, Any

Instance Constructors

  1. new BoltzmannSplitter(temperature: Double, rng: Random = Random)

    temperature

    used to control how sensitive the probability of a split is to its change in variance. The temperature can be thought of as a hyperparameter.

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  8. def getBestSplit(data: Seq[(Vector[AnyVal], Double, Double)], numFeatures: Int, minInstances: Int): (Split, Double)

    Get a split probabilistically, considering numFeatures random features (without replacement) and ensuring that each resulting partition has at least minInstances instances in it

    data

    to split

    numFeatures

    to consider, randomly

    minInstances

    minimum instances permitted in a post-split partition

    returns

    a split object that optimally divides data

    Definition Classes
    BoltzmannSplitter → Splitter
  9. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  10. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  11. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  13. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  14. def productElementNames: Iterator[String]
    Definition Classes
    Product
  15. val rng: Random
  16. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  17. val temperature: Double
  18. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  19. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  20. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
