Roadmap / Future Plans

Help is very welcome for any of the below:

  • Automated optimisation/tuning of parameters such as block_size for a given input; this can be made algorithm-agnostic.

    • Maybe something like AK.@tune reduce(f, src, init=init, block_size=$block_size) block_size=(64, 128, 256, 512, 1024). Macro wizards, help!

    • Or make it more general, e.g.:

    AK.@tune begin
        reduce(f, src, init=init,
               block_size=$block_size,
               switch_below=$switch_below)
        block_size=(64, 128, 256, 512, 1024)
        switch_below=(1, 10, 100, 1000, 10000)
    end

  • Add performant multithreaded Julia implementations to all algorithms; for example, foreachindex has one, but any does not.

  • Is there a way to expose the warp size from the backends? It would be useful in reductions.

  • Define default init values for commonly used reductions? Or just expose higher-level functions such as sum, minimum, etc.?

  • Add a performance regression runner.

  • Other ideas? Post an issue, or open a discussion on the Julia Discourse.
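The tuning idea from the first item could be prototyped without any macro as a plain brute-force grid search: run a closure once for every combination of candidate keyword values, time it, and keep the fastest. This is only a sketch under stated assumptions: tune and chunked_sum are hypothetical names, nothing here uses AcceleratedKernels internals, and chunked_sum is a serial CPU stand-in whose block_size and switch_below keywords merely mimic the parameters being tuned.

```julia
# Hypothetical helper (not part of AK): brute-force grid search over keyword values.
# Each combination is run once as a warm-up (so JIT compilation is not timed),
# then timed once with @elapsed; the fastest combination is returned.
function tune(f; params...)
    names = Tuple(keys(params))       # e.g. (:block_size, :switch_below)
    grids = Tuple(values(params))     # e.g. ((64, 128, ...), (1, 10, ...))
    best, best_time = nothing, Inf
    for combo in Iterators.product(grids...)
        kwargs = NamedTuple{names}(combo)
        f(; kwargs...)                # warm-up run
        t = @elapsed f(; kwargs...)
        if t < best_time
            best, best_time = kwargs, t
        end
    end
    return best, best_time
end

# Hypothetical stand-in workload: a serial chunked sum, not an AK kernel.
function chunked_sum(src; block_size=256, switch_below=1)
    n = length(src)
    n < switch_below && return sum(src)   # small inputs: plain sum
    s = zero(eltype(src))
    for i in 1:block_size:n
        s += sum(view(src, i:min(i + block_size - 1, n)))
    end
    return s
end

src = rand(Float32, 1_000_000)
best, t = tune((; kw...) -> chunked_sum(src; kw...);
               block_size=(64, 128, 256, 512, 1024),
               switch_below=(1, 10, 100, 1000, 10000))
println("fastest: ", best, " (", t, " s)")
```

A single timing per combination is noisy; a real @tune would likely want repeated measurements, and because GPU kernel launches are typically asynchronous, a GPU version would need to synchronise the backend before stopping the clock.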