Class StratifiedKFoldSplitter

java.lang.Object
org.neo4j.gds.ml.splitting.StratifiedKFoldSplitter

public class StratifiedKFoldSplitter extends Object
Splits a ReadOnlyHugeLongArray of node ids into k splits (returned as TrainingExamplesSplit instances), each of which contains a train set and a test set. Conceptually, the nodes are first divided into k buckets of nearly equal size. For each split, one bucket is taken as the test set and the remaining k - 1 buckets are concatenated into the train set, so every node appears in exactly one test set. The split is stratified: if each node is regarded as having the class that targets assigns to its id, then for each distinct class, every bucket contains roughly the same number of nodes of that class.
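The stratification step can be illustrated with a minimal sketch. It is not the GDS implementation: it assumes plain `long[]` ids instead of `ReadOnlyHugeLongArray`, substitutes `java.util.function.LongUnaryOperator` for the Eclipse Collections `LongToLongFunction`, and the class name `StratifiedBuckets` is hypothetical. Ids are grouped by class, shuffled within each class, and dealt round-robin into k buckets, so for every class the per-bucket counts differ by at most one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;
import java.util.TreeMap;
import java.util.function.LongUnaryOperator;

// Hypothetical sketch of stratified bucket assignment; not the GDS code.
public class StratifiedBuckets {

    /** Assigns every id to one of k buckets, balancing each class across buckets. */
    public static List<List<Long>> assign(long[] ids, LongUnaryOperator targets, int k, Optional<Long> randomSeed) {
        if (k < 2) {
            throw new IllegalArgumentException("k must be at least 2");
        }
        // Seeded Random when a seed is given, otherwise nondeterministic.
        Random random = randomSeed.isPresent() ? new Random(randomSeed.get()) : new Random();

        // Group ids by class; TreeMap iterates the classes in sorted order.
        Map<Long, List<Long>> byClass = new TreeMap<>();
        for (long id : ids) {
            byClass.computeIfAbsent(targets.applyAsLong(id), c -> new ArrayList<>()).add(id);
        }

        List<List<Long>> buckets = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            buckets.add(new ArrayList<>());
        }

        // Shuffle within each class, then deal its members round-robin.
        // Carrying the offset across classes also keeps total bucket sizes within one of each other.
        int next = 0;
        for (List<Long> members : byClass.values()) {
            Collections.shuffle(members, random);
            for (long id : members) {
                buckets.get(next).add(id);
                next = (next + 1) % k;
            }
        }
        return buckets;
    }
}
```

Dealing consecutively within a class is what gives the stratification guarantee; shuffling first keeps the assignment random rather than dependent on the input order.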
  • Constructor Details

    • StratifiedKFoldSplitter

      public StratifiedKFoldSplitter(int k, org.neo4j.gds.core.utils.paged.ReadOnlyHugeLongArray ids, org.eclipse.collections.api.block.function.primitive.LongToLongFunction targets, Optional<Long> randomSeed, SortedSet<Long> distinctInternalTargets)
  • Method Details

    • memoryEstimationForNodeSet

      public static org.neo4j.gds.core.utils.mem.MemoryEstimation memoryEstimationForNodeSet(int k, double trainFraction)
    • memoryEstimation

      public static org.neo4j.gds.core.utils.mem.MemoryEstimation memoryEstimation(int k, ToLongFunction<org.neo4j.gds.core.GraphDimensions> idsSetSizeExtractor)
    • splits

      public List<TrainingExamplesSplit> splits()
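Following the class description, fold i uses bucket i as its test set and the concatenation of the other buckets as its train set. A minimal sketch of that assembly, assuming the buckets are already available as `long[][]`; the `KFoldAssembly` class and its `Fold` holder are hypothetical stand-ins, not the real `TrainingExamplesSplit` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of building k train/test folds from k buckets.
public class KFoldAssembly {

    /** Hypothetical stand-in for TrainingExamplesSplit: one train set and one test set. */
    public static final class Fold {
        public final long[] train;
        public final long[] test;

        public Fold(long[] train, long[] test) {
            this.train = train;
            this.test = test;
        }
    }

    /** Fold i tests on bucket i and trains on all other buckets, concatenated in order. */
    public static List<Fold> splits(long[][] buckets) {
        int total = 0;
        for (long[] bucket : buckets) {
            total += bucket.length;
        }

        List<Fold> folds = new ArrayList<>(buckets.length);
        for (int i = 0; i < buckets.length; i++) {
            long[] test = buckets[i].clone();
            long[] train = new long[total - test.length];
            int offset = 0;
            for (int j = 0; j < buckets.length; j++) {
                if (j == i) {
                    continue; // bucket i is held out as the test set
                }
                System.arraycopy(buckets[j], 0, train, offset, buckets[j].length);
                offset += buckets[j].length;
            }
            folds.add(new Fold(train, test));
        }
        return folds;
    }
}
```

For buckets `{0, 3}`, `{1, 4}`, `{2, 5}`, the first fold tests on `{0, 3}` and trains on `{1, 4, 2, 5}`.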