feat: minIter and checkConvergnceEachK arguments for Pregel and all the Pregel-based algorithms #833

@SemyonSinchenko

Description

Is your feature request related to a problem? Please describe.
At the moment Pregel offers only two approaches:

  • check adaptive convergence on every Pregel superstep
  • do not check it at all and rely fully on maxIter

Describe the solution you would like
Two new options:

  1. minIter -- if minIter > 0, then skip the convergence / earlyStopping checks during the first minIter iterations of Pregel
  2. checkConvergnceEachK -- if specified, check convergence / early stopping on every K-th iteration instead of on every iteration
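
A minimal sketch of how the two options could gate the convergence check in a driver loop. This is not the GraphFrames implementation: `run_supersteps`, `step`, `has_converged` and `check_every_k` are hypothetical names, and the schedule (no checks during the first `min_iter` supersteps, then one check every `check_every_k` supersteps) is an assumption about the proposal.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def run_supersteps(
    state: S,
    step: Callable[[S], S],
    has_converged: Callable[[S, S], bool],
    max_iter: int,
    min_iter: int = 0,
    check_every_k: int = 1,
) -> Tuple[S, int, int]:
    """Return (final state, supersteps run, convergence checks run)."""
    if check_every_k < 1:
        raise ValueError("check_every_k must be >= 1")
    supersteps = 0
    checks = 0
    for i in range(1, max_iter + 1):
        new_state = step(state)
        supersteps = i
        # Assumed schedule: skip checks for the first min_iter supersteps,
        # then check once every check_every_k supersteps.
        due = i > min_iter and (i - min_iter) % check_every_k == 0
        if due:
            checks += 1  # in Pregel this would be the extra action
            if has_converged(state, new_state):
                return new_state, supersteps, checks
        state = new_state
    return state, supersteps, checks


# Toy "graph": the state stops changing after 5 supersteps.
toy_step = lambda x: min(x + 1, 5)
unchanged = lambda old, new: old == new

print(run_supersteps(0, toy_step, unchanged, max_iter=20))                   # (5, 6, 6)
print(run_supersteps(0, toy_step, unchanged, max_iter=20, min_iter=6))       # (5, 7, 1)
print(run_supersteps(0, toy_step, unchanged, max_iter=20, check_every_k=2))  # (5, 6, 3)
```

In this toy run, `min_iter=6` costs one extra superstep and drops five checks, and `check_every_k=2` halves the number of checks; real numbers depend on the algorithm and on the final semantics of the options.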

Component

  • Scala Core Internal
  • Scala API
  • Spark Connect Plugin
  • Infrastructure
  • PySpark Classic
  • PySpark Connect

Additional context
In most scenarios users know an approximate minIter: someone computing shortest paths typically has a good idea of the approximate number of hops. Setting minIter higher than necessary is not a big problem in practice: for the price of a couple of additional iterations that change nothing in the result, users get significantly faster iterations overall. For example, for a graph whose shortest paths are 5 hops long, setting minIter to 6 adds one additional iteration while eliminating an action in 5 iterations!

The same is true for checkConvergnceEachK. For example, PageRank (which we have not implemented in DF Pregel yet) may be significantly faster with checkConvergnceEachK=2: for the price of at most one additional superstep, we remove the isEmpty action in half of the iterations. If convergence takes 10 rounds, that is 1 additional iteration while removing the action in 5!
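
The trade-off can be put into rough numbers. The sketch below is a back-of-the-envelope model, not a measurement: `check_cost` and `converged_at` are hypothetical names, it ignores maxIter, and it assumes that once a check would succeed it keeps succeeding, that no checks run during the first `min_iter` supersteps, and that afterwards one check runs every `check_every_k` supersteps. Exact counts depend on how supersteps are numbered and on the final check schedule, which the proposal does not pin down.

```python
import math
from typing import Tuple


def check_cost(
    converged_at: int, min_iter: int = 0, check_every_k: int = 1
) -> Tuple[int, int]:
    """Return (superstep at which the run stops, number of checks run).

    converged_at is the first superstep at which a convergence check
    would succeed if it were performed (an assumption of this model).
    """
    # Supersteps between the end of the skipped prefix and convergence;
    # at least one check is always needed to detect convergence.
    remaining = max(converged_at - min_iter, 1)
    checks = math.ceil(remaining / check_every_k)
    stop = min_iter + checks * check_every_k
    return stop, checks


print(check_cost(6))                    # (6, 6)   check on every superstep
print(check_cost(6, min_iter=6))        # (7, 1)   one more superstep, five fewer checks
print(check_cost(10))                   # (10, 10)
print(check_cost(10, check_every_k=2))  # (10, 5)
print(check_cost(9, check_every_k=2))   # (10, 5)  at most one extra superstep
```

Under these assumptions, the extra work is bounded by `check_every_k - 1` supersteps, while the number of checks shrinks by roughly a factor of `check_every_k`.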

We cannot change the defaults until the 1.0 release (it would be a breaking change), but we can add loud language to the documentation encouraging users to change these values, and we can tune these two options for our own benchmarks.

Are you planning on creating a PR?

  • I'm willing to make a pull-request
