Enum sql_ops::partition::gpu_radix_partition::GpuRadixPartitionAlgorithm

source · [−]

pub enum GpuRadixPartitionAlgorithm {
    NC,
    LASWWC,
    SSWWCv2,
    SSWWCv2G,
    HSSWWCv4,
}

Expand description

Specifies the radix partition algorithm.

Variants

`NC`

Non-caching radix partition.

This is a standard, parallel radix partition algorithm.

`LASWWC`

Radix partitioning with look-ahead software write combining.

This algorithm reorders tuples in shared memory before writing them out to device memory. The purpose is to coalesce as many writes as possible, which can lead to higher throughput.

This algorithm was first described by Stehle and Jacobsen in “A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs”. It is also implemented by Sioulas et al. for “Hardware-conscious Hash-Joins on GPUs”, although the authors do not mention or cite it in the paper.

`SSWWCv2`

Radix partitioning with shared software write combining, version 2.

In version 1, a warp can block all other warps by holding locks on more than one buffer (i.e., leader candidates).

Version 2 tries to avoid blocking other warps by releasing all locks except one (i.e., the leader’s buffer lock).

`SSWWCv2G`

Radix partitioning with shared software write combining, version 2G.

Version 2G is exactly the same algorithm as version 2, but stores the SWWC buffers in device memory instead of in shared memory. The fanout can thus be scaled higher.

Ideally, the buffers are retained in the GPU L2 cache. Storing buffers in the L1 cache is infeasible for the targeted use-case of high fanouts, as the buffers are too large. To avoid polluting the L2 cache, reading input and writing output uses streaming load and store instructions.

`HSSWWCv4`

Radix partitioning with hierarchical shared software write combining, version 4.

Version 4 performs the buffer flush from dmem to memory asynchronously with double-buffering.

Double-buffering means that there are fanout + #warps dmem buffers. Thus each warp owns a spare buffer. When the dmem buffer of a partition is full, the warp that holds the lock exchanges the full dmem buffer for its empty spare buffer, and releases the lock. Only then does the warp flush the dmem buffer to CPU memory.

Double-buffering enables all warps to always make progress during the dmem flush, because there is always a (partially-) empty dmem buffer available.

Enum sql_ops::partition::gpu_radix_partition::GpuRadixPartitionAlgorithm

Variants

NC

LASWWC

SSWWCv2

SSWWCv2G

HSSWWCv4

Trait Implementations

impl Clone for GpuRadixPartitionAlgorithm

fn clone(&self) -> GpuRadixPartitionAlgorithm

fn clone_from(&mut self, source: &Self)

impl Debug for GpuRadixPartitionAlgorithm

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Copy for GpuRadixPartitionAlgorithm

Auto Trait Implementations

impl RefUnwindSafe for GpuRadixPartitionAlgorithm

impl Send for GpuRadixPartitionAlgorithm

impl Sync for GpuRadixPartitionAlgorithm

impl Unpin for GpuRadixPartitionAlgorithm

impl UnwindSafe for GpuRadixPartitionAlgorithm

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

pub fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

pub fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

pub fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

pub fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

pub fn into(self) -> U

impl<T> Pointable for T

pub const ALIGN: usize

type Init = T

pub unsafe fn init(init: <T as Pointable>::Init) -> usize

pub unsafe fn deref<'a>(ptr: usize) -> &'a T

pub unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

pub unsafe fn drop(ptr: usize)

impl<T> ToOwned for T where T: Clone,

type Owned = T

pub fn to_owned(&self) -> T

pub fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

pub fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

pub fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

`NC`

`LASWWC`

`SSWWCv2`

`SSWWCv2G`

`HSSWWCv4`

impl<T> Any for T where
T: 'static + ?Sized,

impl<T> Borrow<T> for T where
T: ?Sized,

impl<T> BorrowMut<T> for T where
T: ?Sized,

impl<T, U> Into<U> for T where
U: From<T>,

impl<T> ToOwned for T where
T: Clone,

impl<T, U> TryFrom<U> for T where
U: Into<T>,

impl<T, U> TryInto<U> for T where
U: TryFrom<T>,