Accumulate / Prefix Sum / Scan

Compute accumulated running totals along a sequence by applying a binary operator to all elements up to the current one; often used in GPU programming as a first step in finding / extracting subsets of data.

  • accumulate! (in-place), accumulate (allocating); inclusive or exclusive.

  • Other names: prefix sum, thrust::scan, cumulative sum; inclusive (or exclusive) if the first element is included in the accumulation (or not).

Function signature:

accumulate!(op, v::AbstractGPUVector; init, inclusive::Bool=true,
            block_size::Int=128,
            temp_v::Union{Nothing, AbstractGPUVector}=nothing,
            temp_flags::Union{Nothing, AbstractGPUVector}=nothing)
accumulate(op, v::AbstractGPUVector; init, inclusive::Bool=true,
           block_size::Int=128,
           temp_v::Union{Nothing, AbstractGPUVector}=nothing,
           temp_flags::Union{Nothing, AbstractGPUVector}=nothing)

Example computing an inclusive prefix sum (the typical GPU "scan"):

import AcceleratedKernels as AK
using oneAPI

v = oneAPI.ones(Int32, 100_000)
AK.accumulate!(+, v, init=0)

The temporaries temp_v and temp_flags should both have at least (length(v) + 2 * block_size - 1) ÷ (2 * block_size) elements; eltype(v) === eltype(temp_v); the elements in temp_flags can be any integers, but Int8 is used by default to reduce memory usage.