Multithreaded Task Partitioning
AcceleratedKernels.TaskPartitioner
— Type
struct TaskPartitioner
Partition num_elems elements (jobs) over at most max_tasks tasks, with at least min_elems elements per task.
Methods
TaskPartitioner(num_elems, max_tasks=Threads.nthreads(), min_elems=1)
Fields
num_elems::Int64
max_tasks::Int64
min_elems::Int64
num_tasks::Int64
task_istarts::Vector{Int64}
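One way to arrive at the ranges shown in the examples is to cap the task count so that every task can hold min_elems elements, split num_elems evenly, and hand any remainder out one element at a time to the first tasks. The sketch below does this in plain Julia. split_ranges is a hypothetical helper, not part of AcceleratedKernels, and the splitting rule is an assumption that happens to reproduce the documented outputs, not a copy of the library's code.

```julia
# Hypothetical helper (not the AcceleratedKernels implementation).
# Assumed rule: num_tasks = min(max_tasks, num_elems ÷ min_elems), at least 1;
# leftover elements go one each to the first tasks.
function split_ranges(num_elems::Integer, max_tasks::Integer, min_elems::Integer=1)
    num_elems >= 0 || throw(ArgumentError("num_elems must be non-negative"))
    max_tasks > 0 || throw(ArgumentError("max_tasks must be positive"))
    min_elems > 0 || throw(ArgumentError("min_elems must be positive"))

    # Cap the task count so each task can receive at least min_elems elements
    num_tasks = max(1, min(max_tasks, num_elems ÷ min_elems))

    # Every task gets `base` elements; the first `extra` tasks get one more
    base, extra = divrem(num_elems, num_tasks)

    ranges = Vector{UnitRange{Int}}(undef, num_tasks)
    istart = 1
    for i in 1:num_tasks
        len = base + (i <= extra ? 1 : 0)
        ranges[i] = istart:(istart + len - 1)
        istart += len
    end
    return ranges
end

split_ranges(10, 4)     # returns [1:3, 4:6, 7:8, 9:10]
split_ranges(20, 6, 5)  # returns [1:5, 6:10, 11:15, 16:20]
```

Handing the remainder to the first tasks keeps task sizes within one element of each other, so no single task becomes a straggler.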
Examples
using AcceleratedKernels: TaskPartitioner
# Divide 10 elements between 4 tasks
tp = TaskPartitioner(10, 4)
for i in 1:tp.num_tasks
@show tp[i]
end
# output
tp[i] = 1:3
tp[i] = 4:6
tp[i] = 7:8
tp[i] = 9:10
using AcceleratedKernels: TaskPartitioner
# Divide 20 elements between at most 6 tasks, with at least 5 elements per task.
# Only 4 tasks are needed, since each must hold at least 5 of the 20 elements
tp = TaskPartitioner(20, 6, 5)
for i in 1:tp.num_tasks
@show tp[i]
end
# output
tp[i] = 1:5
tp[i] = 6:10
tp[i] = 11:15
tp[i] = 16:20
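The ranges above can presumably be rebuilt from the task_istarts field. The sketch below assumes that task_istarts[i] holds the first index handled by task i and that the last task runs through num_elems; task_range is a hypothetical name for illustration, not the library's indexing method.

```julia
# Hypothetical helper: the i-th task's range from its start indices.
# Assumption: task_istarts[i] is the first index of task i, and the
# last task ends at num_elems.
function task_range(task_istarts::AbstractVector{<:Integer}, num_elems::Integer, i::Integer)
    num_tasks = length(task_istarts)
    1 <= i <= num_tasks || throw(BoundsError(task_istarts, i))
    istart = task_istarts[i]
    # Each task stops just before the next one starts
    istop = i == num_tasks ? num_elems : task_istarts[i + 1] - 1
    return istart:istop
end

# Start indices matching the first example, TaskPartitioner(10, 4)
istarts = [1, 4, 7, 9]
for i in eachindex(istarts)
    @show task_range(istarts, 10, i)
end
```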
AcceleratedKernels.task_partition
— Function
task_partition(f, num_elems, max_tasks=Threads.nthreads(), min_elems=1)
task_partition(f, tp::TaskPartitioner)
Partition num_elems jobs across at most max_tasks parallel tasks, with at least min_elems jobs per task, calling f(start_index:end_index) for each task, where the indices are between 1 and num_elems.
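The scheduling idea described above can be sketched with Base's Threads.@spawn: spawn one task per range, then wait on each one so the call returns only after every chunk has finished. This is an assumed illustration rather than the library's code; run_partitioned is hypothetical and takes precomputed ranges instead of a TaskPartitioner.

```julia
# Hypothetical sketch: run f once per range on Julia's task scheduler.
# Assumption: a single range runs inline; otherwise one task per range is
# spawned and all are awaited before returning.
function run_partitioned(f, ranges::AbstractVector{<:UnitRange})
    if length(ranges) == 1
        f(ranges[1])   # no spawning overhead for a single chunk
        return nothing
    end
    tasks = [Threads.@spawn f(r) for r in ranges]
    foreach(wait, tasks)   # block until all chunks finish; throws TaskFailedException on error
    return nothing
end

# Each index belongs to exactly one range, so these writes never race
visited = zeros(Int, 10)
run_partitioned([1:3, 4:6, 7:8, 9:10]) do irange
    for i in irange
        visited[i] += 1
    end
end
```

Because the ranges are disjoint, f may write to index-specific slots of a shared array without locks; shared accumulators still need synchronization.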
Examples
A toy example showing outputs:
num_elems = 4
task_partition(println, num_elems)
# Output with at least 4 threads; the order may differ between runs due to thread scheduling
1:1
4:4
2:2
3:3
This function is probably most useful with a do-block, e.g.:
task_partition(4) do irange
some_long_computation(param1, param2, irange)
end
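A common shape for the do-block body is a reduction: each task sums its own irange into a local variable and adds that partial result to a shared Threads.Atomic once, so tasks do not contend on every element. To keep the sketch runnable with Base alone, toy_task_partition is a hypothetical stand-in with the same f-first calling convention as task_partition. It chunks with Iterators.partition, so its ranges can differ from those TaskPartitioner produces (for 10 elements and 4 tasks it gives 1:3, 4:6, 7:9, 10:10).

```julia
# Hypothetical stand-in with task_partition's calling convention, Base only.
function toy_task_partition(f, num_elems::Integer, max_tasks::Integer=Threads.nthreads())
    num_elems == 0 && return nothing
    chunk = cld(num_elems, max_tasks)   # elements per chunk, rounded up
    tasks = [Threads.@spawn f(irange) for irange in Iterators.partition(1:num_elems, chunk)]
    foreach(wait, tasks)                # return only after every chunk is done
    return nothing
end

data = collect(1:1000)
total = Threads.Atomic{Int}(0)

toy_task_partition(length(data), 4) do irange
    local_sum = 0
    for i in irange
        local_sum += data[i]                 # accumulate privately, no contention
    end
    Threads.atomic_add!(total, local_sum)    # combine once per task
end

total[]  # 500500, the sum of 1:1000
```

With the real function, the same body would go inside task_partition(length(data)) do irange ... end.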