Accumulate / Prefix Sum / Scan
Compute accumulated running totals along a sequence by applying a binary operator to all elements up to the current one; often used in GPU programming as a first step in finding / extracting subsets of data.
accumulate!
(in-place),accumulate
(allocating); inclusive or exclusive.Other names: prefix sum,
thrust::scan
, cumulative sum; inclusive (or exclusive) if the first element is included in the accumulation (or not).
Function signature:
accumulate!(op, v::AbstractGPUVector; init, inclusive::Bool=true,
block_size::Int=128,
temp_v::Union{Nothing, AbstractGPUVector}=nothing,
temp_flags::Union{Nothing, AbstractGPUVector}=nothing)
accumulate(op, v::AbstractGPUVector; init, inclusive::Bool=true,
block_size::Int=128,
temp_v::Union{Nothing, AbstractGPUVector}=nothing,
temp_flags::Union{Nothing, AbstractGPUVector}=nothing)
Example computing an inclusive prefix sum (the typical GPU "scan"):
import AcceleratedKernels as AK
using oneAPI
v = oneAPI.ones(Int32, 100_000)
AK.accumulate!(+, v, init=0)
The temporaries temp_v
and temp_flags
should both have at least (length(v) + 2 * block_size - 1) ÷ (2 * block_size)
elements; eltype(v) === eltype(temp_v)
; the elements in temp_flags
can be any integers, but Int8
is used by default to reduce memory usage.