Struct sql_ops::partition::gpu_radix_partition::GpuRadixPartitioner
pub struct GpuRadixPartitioner { /* private fields */ }
Implementations
impl GpuRadixPartitioner
pub fn new(
prefix_sum_algorithm: GpuHistogramAlgorithm,
partition_algorithm: GpuRadixPartitionAlgorithm,
radix_bits: RadixBits,
grid_size: &GridSize,
block_size: &BlockSize,
dmem_buffer_bytes: usize
) -> Result<Self>
Creates a new GPU radix partitioner.
pub fn prefix_sum<T: DeviceCopy + GpuRadixPartitionable>(
&mut self,
pass: RadixPass,
partition_attr: LaunchableSlice<'_, T>,
partition_offsets: &mut PartitionOffsets<Tuple<T, T>>,
stream: &Stream
) -> Result<()>
Computes the prefix sum.
The prefix sum performs a scan over all partitioning keys. It first computes a histogram of the keys, and then computes the prefix sum from this histogram.
The prefix sum serves two main purposes:
- The prefix sums are used in partition as offsets in an array for the output partitions.
- The prefix sum can also be used to detect skew in the data.
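The two steps and both uses can be illustrated with a minimal CPU-side sketch. This is not the GPU kernel used by the crate. The function names (`radix_of`, `histogram`, `exclusive_prefix_sum`, `skew_factor`), the `u64` key type, and the `shift` parameter are assumptions made for illustration, and `radix_bits` is assumed to be smaller than 64.

```rust
/// Extracts the partition ID of a key: `radix_bits` bits starting at bit `shift`.
/// Hypothetical helper; assumes `radix_bits < 64`.
fn radix_of(key: u64, shift: u32, radix_bits: u32) -> usize {
    let mask = (1u64 << radix_bits) - 1;
    ((key >> shift) & mask) as usize
}

/// Step 1: counts how many keys fall into each of the `2^radix_bits` partitions.
fn histogram(keys: &[u64], shift: u32, radix_bits: u32) -> Vec<usize> {
    let mut counts = vec![0usize; 1usize << radix_bits];
    for &key in keys {
        counts[radix_of(key, shift, radix_bits)] += 1;
    }
    counts
}

/// Step 2: exclusive prefix sum over the histogram.
/// Entry `i` is the output index at which partition `i` begins.
fn exclusive_prefix_sum(counts: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(counts.len());
    let mut running = 0usize;
    for &count in counts {
        offsets.push(running);
        running += count;
    }
    offsets
}

/// One possible skew indicator: the largest partition relative to the
/// average partition size. A value of 1.0 means perfectly balanced.
fn skew_factor(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    if total == 0 {
        return 1.0;
    }
    let largest = counts.iter().copied().max().unwrap_or(0);
    largest as f64 * counts.len() as f64 / total as f64
}
```

With `radix_bits = 2` and the keys `[0, 1, 2, 3, 4, 5, 6, 7, 3, 3]`, the histogram is `[2, 2, 2, 4]` and the offsets are `[0, 2, 4, 6]`. The last partition holds 1.6 times the average number of tuples.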
Parallelism
The function is internally parallelized by the GPU. The function is not thread-safe for multiple callers.
pub fn prefix_sum_and_copy_with_payload<T: DeviceCopy + GpuRadixPartitionable>(
&mut self,
pass: RadixPass,
src_partition_attr: LaunchableSlice<'_, T>,
src_payload_attr: LaunchableSlice<'_, T>,
dst_partition_attr: LaunchableMutSlice<'_, T>,
dst_payload_attr: LaunchableMutSlice<'_, T>,
partition_offsets: &mut PartitionOffsets<Tuple<T, T>>,
stream: &Stream
) -> Result<()>
Computes the prefix sum on a partitioned relation, and copies the data.
The typical partitioning workflow first calls prefix_sum, and then calls partition. However, if performed over an interconnect, this workflow transfers the data twice.
With prefix_sum_and_copy_with_payload, the data can be copied to GPU memory and thus the data transfer occurs only once.
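The idea of reading the source only once can be sketched on the CPU as a single loop that copies each tuple and updates the histogram at the same time. This is a simplified illustration, not the crate's kernel. The function name `histogram_and_copy`, the `u64` element type, and the use of the lowest `radix_bits` bits of the key are assumptions.

```rust
/// Reads the source columns once: copies keys and payloads to the
/// destination and counts keys per partition in the same loop.
/// Hypothetical sketch; assumes `radix_bits < 64`.
fn histogram_and_copy(
    src_keys: &[u64],
    src_payloads: &[u64],
    dst_keys: &mut [u64],
    dst_payloads: &mut [u64],
    radix_bits: u32,
) -> Vec<usize> {
    assert_eq!(src_keys.len(), src_payloads.len());
    assert!(dst_keys.len() >= src_keys.len());
    assert!(dst_payloads.len() >= src_payloads.len());

    let mask = (1u64 << radix_bits) - 1;
    let mut counts = vec![0usize; 1usize << radix_bits];
    for i in 0..src_keys.len() {
        // Each source element is touched exactly once.
        dst_keys[i] = src_keys[i];
        dst_payloads[i] = src_payloads[i];
        counts[(src_keys[i] & mask) as usize] += 1;
    }
    counts
}
```

A later partitioning step can then read from the destination copy, so the source does not have to be read a second time.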
Parallelism
The function is internally parallelized by the GPU. The function is not thread-safe for multiple callers.
Limitations
Currently only the Contiguous histogram algorithm is supported.
The reason is that prefix_sum_and_copy_with_payload is typically used for small relations that fit into GPU memory. Thus the next step in the workflow is a SQL operator (e.g., join), which only takes a contiguous relation as input.
pub fn prefix_sum_and_transform<T: DeviceCopy + GpuRadixPartitionable>(
&mut self,
pass: RadixPass,
partition_id: u32,
src_relation: &PartitionedRelation<Tuple<T, T>>,
dst_partition_attr: LaunchableMutSlice<'_, T>,
dst_payload_attr: LaunchableMutSlice<'_, T>,
partition_offsets: &mut PartitionOffsets<Tuple<T, T>>,
stream: &Stream
) -> Result<()>
Computes the prefix sum on a partitioned relation, and transforms to a columnar format.
Layout transformation
Multi-pass partitioning requires the data in a columnar format. However, the first partitioning pass stores each partition in a row format.
prefix_sum_and_transform transforms the row format into a column format, in addition to computing the prefix sum.
Chunk concatenation
The transform concatenates chunked partitions into contiguous partitions. partition and SQL operators (e.g., join) expect contiguous input. This design reduces the number of operator variants required from (layouts * operators) to (layouts + operators).
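Both the layout transformation and the chunk concatenation can be sketched on the CPU as follows. The `RowTuple` struct and the `transform_to_columns` function are hypothetical names introduced for this example. The sketch assumes that one partition is stored as several chunks of key/value rows.

```rust
/// A key/value pair stored in row format (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct RowTuple {
    key: u64,
    value: u64,
}

/// Concatenates the chunks of one partition and splits the rows into
/// two contiguous columns: one for the keys and one for the values.
fn transform_to_columns(chunks: &[Vec<RowTuple>]) -> (Vec<u64>, Vec<u64>) {
    let total: usize = chunks.iter().map(|chunk| chunk.len()).sum();
    let mut keys = Vec::with_capacity(total);
    let mut values = Vec::with_capacity(total);
    for chunk in chunks {
        for tuple in chunk {
            keys.push(tuple.key);
            values.push(tuple.value);
        }
    }
    (keys, values)
}
```

After the transform, a downstream operator only needs to handle the contiguous columnar layout, regardless of how the chunks were produced.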
Parallelism
The function is internally parallelized by the GPU. The function is not thread-safe for multiple callers.
Limitations
Currently only the Contiguous histogram algorithm is supported. See prefix_sum_and_copy_with_payload for details.
pub fn preallocate_partition_state<T: GpuRadixPartitionable>(
&mut self,
pass: RadixPass
) -> Result<()>
Preallocates the internal state of partition.
Some partitioning variants use GPU memory buffers to hold internal state (e.g., HSSWWC). By default, partition allocates this state lazily and caches it internally between function calls.
preallocate_partition_state allows eager allocation of the state for optimization purposes. Specifically, it’s sometimes possible to overlap the memory allocation with prefix_sum computation.
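The lazy-versus-eager pattern can be sketched with a plain host-side buffer. `LazyPartitioner`, its fields, and its methods are hypothetical stand-ins, and a `Vec<u8>` takes the place of the GPU memory buffer.

```rust
/// Hypothetical stand-in for a partitioner that caches an internal buffer.
struct LazyPartitioner {
    buffer_len: usize,
    state: Option<Vec<u8>>,
    /// Counts allocations, only to make the caching observable.
    allocations: usize,
}

impl LazyPartitioner {
    fn new(buffer_len: usize) -> Self {
        Self { buffer_len, state: None, allocations: 0 }
    }

    /// Eagerly allocates the state. Does nothing if it is already cached.
    fn preallocate_state(&mut self) {
        if self.state.is_none() {
            self.state = Some(vec![0u8; self.buffer_len]);
            self.allocations += 1;
        }
    }

    /// Allocates the state on first use and reuses it on later calls.
    /// Returns the size of the cached buffer.
    fn partition(&mut self) -> usize {
        self.preallocate_state();
        self.state.as_ref().map_or(0, |buffer| buffer.len())
    }
}
```

Calling `preallocate_state` before the first `partition` call moves the allocation cost out of `partition`, which is what makes it possible to overlap the allocation with other work.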
pub fn partition<T: DeviceCopy + GpuRadixPartitionable>(
&mut self,
pass: RadixPass,
partition_attr: LaunchableSlice<'_, T>,
payload_attr: LaunchableSlice<'_, T>,
partition_offsets: &mut PartitionOffsets<Tuple<T, T>>,
partitioned_relation: &mut PartitionedRelation<Tuple<T, T>>,
stream: &Stream
) -> Result<()>
Radix-partitions a relation by its key attribute.
See the module-level documentation for details on the algorithm.
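As a rough orientation, a sequential CPU version of radix partitioning is sketched below. It is not the algorithm described in the module-level documentation, which runs on the GPU and has several variants. The function name `radix_partition`, the `u64` element type, and the use of the lowest `radix_bits` bits are assumptions. The `offsets` argument is assumed to be an exclusive prefix sum over the per-partition key counts.

```rust
/// Scatters each key/payload pair to the output range of its partition.
/// `offsets[i]` must be the start index of partition `i`.
/// Hypothetical sketch; assumes `radix_bits < 64`.
fn radix_partition(
    keys: &[u64],
    payloads: &[u64],
    offsets: &[usize],
    radix_bits: u32,
) -> (Vec<u64>, Vec<u64>) {
    assert_eq!(keys.len(), payloads.len());
    assert_eq!(offsets.len(), 1usize << radix_bits);

    let mask = (1u64 << radix_bits) - 1;
    // Working copy of the offsets: one write cursor per partition.
    let mut cursors = offsets.to_vec();
    let mut out_keys = vec![0u64; keys.len()];
    let mut out_payloads = vec![0u64; payloads.len()];

    for (&key, &payload) in keys.iter().zip(payloads) {
        let partition = (key & mask) as usize;
        let dst = cursors[partition];
        out_keys[dst] = key;
        out_payloads[dst] = payload;
        cursors[partition] += 1;
    }
    (out_keys, out_payloads)
}
```

For the keys `[3, 0, 1, 2, 1]` with `radix_bits = 2`, the offsets are `[0, 1, 3, 4]` and the partitioned keys are `[0, 1, 1, 2, 3]`. Each payload moves together with its key.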
Post-conditions
partition_offsets becomes uninitialized due to a memory swap. However, it can be reused for prefix_sum.
Trait Implementations
Auto Trait Implementations
impl RefUnwindSafe for GpuRadixPartitioner
impl Send for GpuRadixPartitioner
impl Sync for GpuRadixPartitioner
impl Unpin for GpuRadixPartitioner
impl UnwindSafe for GpuRadixPartitioner
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
pub fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.