Struct sql_ops::partition::gpu_radix_partition::GpuRadixPartitioner

pub struct GpuRadixPartitioner { /* private fields */ }

Implementations

impl GpuRadixPartitioner

pub fn new(
 prefix_sum_algorithm: GpuHistogramAlgorithm,
 partition_algorithm: GpuRadixPartitionAlgorithm,
 radix_bits: RadixBits,
 grid_size: &GridSize,
 block_size: &BlockSize,
 dmem_buffer_bytes: usize
) -> Result<Self>

Creates a new GPU radix partitioner.

pub fn prefix_sum<T: DeviceCopy + GpuRadixPartitionable>(
 &mut self,
 pass: RadixPass,
 partition_attr: LaunchableSlice<'_, T>,
 partition_offsets: &mut PartitionOffsets<Tuple<T, T>>,
 stream: &Stream
) -> Result<()>

Computes the prefix sum.

The function scans all partitioning keys to build a histogram of partition sizes, and then computes the prefix sum over that histogram.

The prefix sum serves two main purposes:

The prefix sums are used in partition as offsets in an array for the output partitions.
The prefix sum can also be used to detect skew in the data.
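
The two purposes above can be illustrated with a minimal CPU sketch. It is not the GPU implementation of this crate: the function names (radix_of, histogram, exclusive_prefix_sum, skew_factor) and the u32 key type are assumptions made for illustration, and radix_bits is assumed to be smaller than 32.

```rust
/// Extracts the partition index from a key: `radix_bits` bits starting at bit `shift`.
/// Assumes `radix_bits < 32` (illustrative restriction of this sketch).
fn radix_of(key: u32, shift: u32, radix_bits: u32) -> usize {
    ((key >> shift) & ((1u32 << radix_bits) - 1)) as usize
}

/// Counts how many keys fall into each of the 2^radix_bits partitions.
fn histogram(keys: &[u32], shift: u32, radix_bits: u32) -> Vec<usize> {
    let mut hist = vec![0usize; 1 << radix_bits];
    for &key in keys {
        hist[radix_of(key, shift, radix_bits)] += 1;
    }
    hist
}

/// Exclusive prefix sum: entry `i` is the output offset at which partition `i` begins.
fn exclusive_prefix_sum(hist: &[usize]) -> Vec<usize> {
    let mut running = 0usize;
    hist.iter()
        .map(|&count| {
            let offset = running;
            running += count;
            offset
        })
        .collect()
}

/// A simple skew indicator: size of the largest partition divided by the
/// average partition size. A value near 1.0 suggests a uniform distribution.
fn skew_factor(hist: &[usize]) -> f64 {
    let total: usize = hist.iter().sum();
    if total == 0 {
        return 1.0;
    }
    let largest = hist.iter().copied().max().unwrap_or(0);
    largest as f64 * hist.len() as f64 / total as f64
}
```

For the keys 0, 1, 2, 3, 4, 4, 4, 4 with two radix bits, the histogram is 5, 1, 1, 1, so the partitions begin at offsets 0, 5, 6, and 7, and the first partition is 2.5 times larger than the average.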

Parallelism

The function is internally parallelized by the GPU. The function is not thread-safe for multiple callers.
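
Because the methods take &mut self, safe Rust already rejects concurrent calls on the same instance at compile time. If one partitioner must be shared between threads, one option is to serialize access with a mutex. The sketch below uses a hypothetical stand-in type (StandInPartitioner), not the real GpuRadixPartitioner, and only illustrates the locking pattern.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a partitioner whose methods take `&mut self`.
struct StandInPartitioner {
    calls: usize,
}

impl StandInPartitioner {
    /// Placeholder for a call that must not run concurrently with itself.
    fn prefix_sum(&mut self) {
        self.calls += 1;
    }
}

/// Shares one partitioner between `num_threads` threads; the mutex ensures
/// that only one caller at a time holds the `&mut` access.
fn run_from_threads(num_threads: usize) -> usize {
    let shared = Arc::new(Mutex::new(StandInPartitioner { calls: 0 }));

    let handles: Vec<_> = (0..num_threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                shared.lock().expect("mutex poisoned").prefix_sum();
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    // Bind the result first so the lock guard is released before returning.
    let calls = shared.lock().expect("mutex poisoned").calls;
    calls
}
```

An alternative with less contention is to create one partitioner per thread, at the cost of one set of internal buffers per instance.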

pub fn prefix_sum_and_copy_with_payload<T: DeviceCopy + GpuRadixPartitionable>(
 &mut self,
 pass: RadixPass,
 src_partition_attr: LaunchableSlice<'_, T>,
 src_payload_attr: LaunchableSlice<'_, T>,
 dst_partition_attr: LaunchableMutSlice<'_, T>,
 dst_payload_attr: LaunchableMutSlice<'_, T>,
 partition_offsets: &mut PartitionOffsets<Tuple<T, T>>,
 stream: &Stream
) -> Result<()>

Computes the prefix sum on a partitioned relation, and copies the data.

The typical partitioning workflow first calls prefix_sum, and then calls partition. However, if performed over an interconnect, this workflow transfers the data twice.

With prefix_sum_and_copy_with_payload, the data is copied to GPU memory while the prefix sum is computed, so the data transfer occurs only once.
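
The idea of fusing the copy with the histogram pass can be sketched on the CPU as follows. This is an illustration under assumptions, not the crate's kernel: the function name prefix_sum_and_copy, the u32 element type, and the use of the lowest radix_bits bits of the key are all hypothetical.

```rust
/// Reads each source tuple exactly once: the tuple is copied to the
/// destination and counted in the histogram in the same pass.
/// Returns the exclusive prefix sum of the histogram.
/// Assumes `radix_bits < 32` (illustrative restriction of this sketch).
fn prefix_sum_and_copy(
    src_keys: &[u32],
    src_payloads: &[u32],
    dst_keys: &mut [u32],
    dst_payloads: &mut [u32],
    radix_bits: u32,
) -> Vec<usize> {
    assert_eq!(src_keys.len(), src_payloads.len());
    assert_eq!(src_keys.len(), dst_keys.len());
    assert_eq!(src_keys.len(), dst_payloads.len());

    let mask = (1u32 << radix_bits) - 1;
    let mut hist = vec![0usize; 1 << radix_bits];

    for i in 0..src_keys.len() {
        // Single read of the source; on a GPU this would be the only
        // transfer of the tuple across the interconnect.
        let key = src_keys[i];
        dst_keys[i] = key;
        dst_payloads[i] = src_payloads[i];
        hist[(key & mask) as usize] += 1;
    }

    // Turn the histogram into partition start offsets.
    let mut running = 0usize;
    for slot in hist.iter_mut() {
        let count = *slot;
        *slot = running;
        running += count;
    }
    hist
}
```

A later partitioning step can then read the copied destination arrays instead of the original source.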

Parallelism

The function is internally parallelized by the GPU. The function is not thread-safe for multiple callers.

Limitations

Currently, only the Contiguous histogram algorithm is supported. The reason is that prefix_sum_and_copy_with_payload is typically used for small relations that fit into GPU memory. Thus, the next step in the workflow is a SQL operator (e.g., join), which only takes a contiguous relation as input.

pub fn prefix_sum_and_transform<T: DeviceCopy + GpuRadixPartitionable>(
 &mut self,
 pass: RadixPass,
 partition_id: u32,
 src_relation: &PartitionedRelation<Tuple<T, T>>,
 dst_partition_attr: LaunchableMutSlice<'_, T>,
 dst_payload_attr: LaunchableMutSlice<'_, T>,
 partition_offsets: &mut PartitionOffsets<Tuple<T, T>>,
 stream: &Stream
) -> Result<()>

Computes the prefix sum on a partitioned relation, and transforms to a columnar format.

Layout transformation

Multi-pass partitioning requires the data in a columnar format. However, the first partitioning pass stores each partition in a row format.

prefix_sum_and_transform transforms the row format into a column format, in addition to computing the prefix sum.
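
As a rough illustration of the layout change, the following CPU sketch splits an array of rows into one array per attribute. The Tuple struct here is a hypothetical stand-in defined only for this example, and rows_to_columns is an assumed name; neither is taken from the crate.

```rust
/// Hypothetical row type for this sketch: one key and one payload value.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Tuple<K, V> {
    key: K,
    value: V,
}

/// Converts a row format (array of tuples) into a columnar format
/// (one array of keys and one array of payload values).
fn rows_to_columns<K: Copy, V: Copy>(rows: &[Tuple<K, V>]) -> (Vec<K>, Vec<V>) {
    rows.iter().map(|t| (t.key, t.value)).unzip()
}
```

The order of the tuples is preserved, so position i in the key column and position i in the payload column still belong to the same tuple.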

Chunk concatenation

The transform concatenates chunked partitions into contiguous partitions. partition and SQL operators (e.g., join) expect contiguous input. This design reduces the number of operator variants required from (layouts * operators) to (layouts + operators).
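
A minimal sketch of chunk concatenation, assuming that a chunked partition can be modeled as a list of separately stored chunks (the function name concat_chunks and this representation are assumptions for illustration):

```rust
/// Concatenates the chunks of one partition into a single contiguous array,
/// preserving the order of the chunks and of the elements within each chunk.
fn concat_chunks<T: Copy>(chunks: &[Vec<T>]) -> Vec<T> {
    let total: usize = chunks.iter().map(Vec::len).sum();
    let mut contiguous = Vec::with_capacity(total);
    for chunk in chunks {
        contiguous.extend_from_slice(chunk);
    }
    contiguous
}
```

With such a step in place, downstream operators only need to handle the contiguous layout. For example, with 2 layouts and 3 operators, this requires 2 + 3 = 5 components rather than 2 * 3 = 6 operator variants.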

Parallelism

The function is internally parallelized by the GPU. The function is not thread-safe for multiple callers.

Limitations

Currently only the Contiguous histogram algorithm is supported. See prefix_sum_and_copy_with_payload for details.

pub fn preallocate_partition_state<T: GpuRadixPartitionable>(
&mut self,
pass: RadixPass
) -> Result<()>

Preallocates the internal state of partition.

Some partitioning variants use GPU memory buffers to hold internal state (e.g., HSSWWC). This state is allocated lazily by partition and cached internally between function calls.

preallocate_partition_state allows eager allocation of the state for optimization purposes. Specifically, it’s sometimes possible to overlap the memory allocation with prefix_sum computation.
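
The lazy-versus-eager pattern described above can be sketched with an optional cached buffer. The type PartitionState and its fields are hypothetical, and a Vec<u8> stands in for a GPU memory buffer.

```rust
/// Hypothetical holder of cached internal state; `Vec<u8>` stands in for a
/// GPU memory buffer.
struct PartitionState {
    buffer: Option<Vec<u8>>,
    buffer_bytes: usize,
    /// Counts allocations, only to make the caching visible in this sketch.
    allocations: usize,
}

impl PartitionState {
    fn new(buffer_bytes: usize) -> Self {
        Self { buffer: None, buffer_bytes, allocations: 0 }
    }

    /// Allocates the buffer on first use and reuses it afterwards.
    fn ensure_allocated(&mut self) -> &mut Vec<u8> {
        if self.buffer.is_none() {
            self.allocations += 1;
            self.buffer = Some(vec![0u8; self.buffer_bytes]);
        }
        self.buffer.as_mut().expect("buffer was just allocated")
    }

    /// Eager allocation: pays the allocation cost before `partition` runs.
    fn preallocate(&mut self) {
        self.ensure_allocated();
    }

    /// Lazy allocation: allocates only if no earlier call has done so.
    /// Returns the buffer size as a placeholder for real work.
    fn partition(&mut self) -> usize {
        self.ensure_allocated().len()
    }
}
```

Calling preallocate before partition moves the allocation out of the partitioning call. Calling partition alone still works, because it allocates on demand.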

pub fn partition<T: DeviceCopy + GpuRadixPartitionable>(
 &mut self,
 pass: RadixPass,
 partition_attr: LaunchableSlice<'_, T>,
 payload_attr: LaunchableSlice<'_, T>,
 partition_offsets: &mut PartitionOffsets<Tuple<T, T>>,
 partitioned_relation: &mut PartitionedRelation<Tuple<T, T>>,
 stream: &Stream
) -> Result<()>

Radix-partitions a relation by its key attribute.

See the module-level documentation for details on the algorithm.

Post-conditions

partition_offsets becomes uninitialized due to a memory swap. However, it can be reused for prefix_sum.
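
For orientation, a single-threaded CPU sketch of the scatter step of radix partitioning is shown below. It is not the GPU algorithm from the module-level documentation: the function name scatter, the u32 types, and the use of the lowest radix_bits bits are assumptions. In this sketch the offsets are advanced as write cursors, which is a different mechanism from the memory swap mentioned above, but it likewise means that the caller should not rely on the offsets afterwards.

```rust
/// Scatters each tuple to the next free slot of its partition.
/// `offsets` must hold the exclusive prefix sum of the partition sizes on
/// entry; on return each entry points one past the end of its partition.
/// Assumes `radix_bits < 32` (illustrative restriction of this sketch).
fn scatter(
    keys: &[u32],
    payloads: &[u32],
    offsets: &mut [usize],
    radix_bits: u32,
) -> (Vec<u32>, Vec<u32>) {
    assert_eq!(keys.len(), payloads.len());
    assert_eq!(offsets.len(), 1usize << radix_bits);

    let mask = (1u32 << radix_bits) - 1;
    let mut out_keys = vec![0u32; keys.len()];
    let mut out_payloads = vec![0u32; keys.len()];

    for (&key, &payload) in keys.iter().zip(payloads) {
        let partition = (key & mask) as usize;
        let slot = offsets[partition];
        out_keys[slot] = key;
        out_payloads[slot] = payload;
        // Advance the write cursor of this partition.
        offsets[partition] += 1;
    }
    (out_keys, out_payloads)
}
```

For the keys 3, 0, 1, 3, 0 with two radix bits, the initial offsets are 0, 2, 3, 3, and the partitioned key column is 0, 0, 1, 3, 3.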

Trait Implementations

impl Debug for GpuRadixPartitioner

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations

impl RefUnwindSafe for GpuRadixPartitioner

impl Send for GpuRadixPartitioner

impl Sync for GpuRadixPartitioner

impl Unpin for GpuRadixPartitioner

impl UnwindSafe for GpuRadixPartitioner

Blanket Implementations

impl<T> Any for T where
T: 'static + ?Sized,

pub fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where
T: ?Sized,

pub fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where
T: ?Sized,

pub fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

pub fn from(t: T) -> T

impl<T, U> Into<U> for T where
U: From<T>,

pub fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

pub const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

pub unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes an object with the given initializer.

pub unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

pub unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

pub unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T where
U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

pub fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T where
U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

pub fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.