sort
and friends
Sorting algorithms with similar interface and default settings as the Julia Base ones, on GPUs:
sort!
(in-place),sort
(out-of-place)sortperm!
,sortperm
Other names:
sort
,sort_team
,sort_team_by_key
,stable_sort
or variations in Kokkos, RAJA, Thrust that I know of.
Function signature:
sort!(v::AbstractGPUVector;
lt=isless, by=identity, rev::Bool=false, order::Base.Order.Ordering=Base.Order.Forward,
block_size::Int=128, temp::Union{Nothing, AbstractGPUVector}=nothing)
sortperm!(ix::AbstractGPUVector, v::AbstractGPUVector;
lt=isless, by=identity, rev::Bool=false, order::Base.Order.Ordering=Base.Order.Forward,
block_size::Int=128, temp::Union{Nothing, AbstractGPUVector}=nothing)
Specific implementations that the interfaces above forward to:
merge_sort!
(in-place),merge_sort
(out-of-place) - sort arbitrary objects with custom comparisons.merge_sort_by_key!
,merge_sort_by_key
- sort a vector of keys along with a "payload", a vector of corresponding values.merge_sortperm!
,merge_sortperm
,merge_sortperm_lowmem!
,merge_sortperm_lowmem
- compute a sorting index permutation.
Function signature:
merge_sort!(v::AbstractGPUVector;
lt=(<), by=identity, rev::Bool=false, order::Ordering=Forward,
block_size::Int=128, temp::Union{Nothing, AbstractGPUVector}=nothing)
merge_sort_by_key!(keys::AbstractGPUVector, values::AbstractGPUVector;
lt=(<), by=identity, rev::Bool=false, order::Ordering=Forward,
block_size::Int=128,
temp_keys::Union{Nothing, AbstractGPUVector}=nothing,
temp_values::Union{Nothing, AbstractGPUVector}=nothing)
merge_sortperm!(ix::AbstractGPUVector, v::AbstractGPUVector;
lt=(<), by=identity, rev::Bool=false, order::Ordering=Forward,
inplace::Bool=false, block_size::Int=128,
temp_ix::Union{Nothing, AbstractGPUVector}=nothing,
temp_v::Union{Nothing, AbstractGPUVector}=nothing)
merge_sortperm_lowmem!(ix::AbstractGPUVector, v::AbstractGPUVector;
lt=(<), by=identity, rev::Bool=false, order::Ordering=Forward,
block_size::Int=128,
temp::Union{Nothing, AbstractGPUVector}=nothing)
Example:
import AcceleratedKernels as AK
using AMDGPU
v = ROCArray(rand(Int32, 100_000))
AK.sort!(v)
As GPU memory is more expensive, all functions in AcceleratedKernels.jl expose any temporary arrays they will use (the temp
argument); you can supply your own buffers to make the algorithms not allocate additional GPU storage, e.g.:
v = ROCArray(rand(Float32, 100_000))
temp = similar(v)
AK.sort!(v, temp=temp)