pub struct GpuRadixPartitioner { /* private fields */ }

Implementations

Creates a new GPU radix partitioner.

Computes the prefix sum.

The prefix sum performs a scan over all partitioning keys. It first computes a histogram that counts the number of keys in each partition, and then computes the prefix sum from this histogram.
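A minimal CPU sketch of the histogram and prefix-sum steps follows. It is sequential rather than GPU-parallel, and it assumes 64-bit keys whose partition is given by the low radix_bits bits of the key. The helpers partition_of, histogram, and exclusive_prefix_sum are hypothetical and are not this crate's API.

```rust
/// Hypothetical partitioning function (assumption): the partition of a key is
/// given by its low `radix_bits` bits. Requires `radix_bits < 64`.
fn partition_of(key: u64, radix_bits: u32) -> usize {
    (key & ((1u64 << radix_bits) - 1)) as usize
}

/// Counts how many keys fall into each of the 2^radix_bits partitions.
fn histogram(keys: &[u64], radix_bits: u32) -> Vec<usize> {
    let mut hist = vec![0usize; 1usize << radix_bits];
    for &key in keys {
        hist[partition_of(key, radix_bits)] += 1;
    }
    hist
}

/// Exclusive prefix sum: offsets[i] is the total count of partitions 0..i,
/// i.e., the start position of partition i in the output array.
fn exclusive_prefix_sum(hist: &[usize]) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(hist.len());
    let mut sum = 0;
    for &count in hist {
        offsets.push(sum);
        sum += count;
    }
    offsets
}
```

For the keys [5, 2, 7, 4, 1, 6] and radix_bits = 2, the histogram is [1, 2, 2, 1] and the offsets are [0, 1, 3, 5].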

The prefix sum serves two main purposes:

  1. partition uses the prefix sums as the offsets of the output partitions within the output array.
  2. The prefix sum can also be used to detect skew in the data, because it reveals the size of each partition.

Parallelism

The function is internally parallelized by the GPU. It is not safe to call concurrently from multiple threads.
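To illustrate the skew-detection use of the prefix sum, the sketch below recovers each partition's size from consecutive offsets and compares the largest partition against the average size. It assumes exclusive prefix-sum offsets (the start position of each partition); the max-versus-average rule, the factor threshold, and both helper names are illustrative assumptions, not the crate's skew heuristic.

```rust
/// Recovers partition sizes from exclusive prefix-sum offsets.
/// The last partition ends at `total_len`.
fn partition_sizes(offsets: &[usize], total_len: usize) -> Vec<usize> {
    offsets
        .iter()
        .enumerate()
        .map(|(i, &start)| {
            let end = offsets.get(i + 1).copied().unwrap_or(total_len);
            end - start
        })
        .collect()
}

/// Hypothetical skew check (assumption): the data counts as skewed if the
/// largest partition exceeds `factor` times the average partition size.
fn is_skewed(offsets: &[usize], total_len: usize, factor: f64) -> bool {
    let sizes = partition_sizes(offsets, total_len);
    if sizes.is_empty() || total_len == 0 {
        return false;
    }
    let max = *sizes.iter().max().unwrap() as f64;
    let avg = total_len as f64 / sizes.len() as f64;
    max > factor * avg
}
```

For example, offsets [0, 0, 0, 0] over 8 tuples mean that all tuples landed in the last partition, which this rule flags as skewed for factor = 2.0.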

Computes the prefix sum on a partitioned relation, and copies the data.

The typical partitioning workflow first calls prefix_sum, and then calls partition. However, if the input is read over an interconnect, this workflow transfers the data twice: once for prefix_sum and once for partition.

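prefix_sum_and_copy_with_payload instead copies the data into GPU memory while computing the prefix sum. The subsequent partitioning step can then read the data from GPU memory, so the transfer over the interconnect occurs only once.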
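The single-transfer idea can be sketched on the CPU as one pass that reads each tuple exactly once, copies it into a local buffer (standing in for GPU memory), and updates the histogram at the same time. The function fused_prefix_sum_and_copy, its result struct, and the low-bits partitioning function are assumptions for illustration; the real method's signature is not shown on this page.

```rust
/// Hypothetical result of the fused pass: prefix-sum offsets plus a local copy
/// of the input (the local Vecs stand in for GPU memory).
struct PrefixSumAndCopy {
    offsets: Vec<usize>,
    local_keys: Vec<u64>,
    local_payloads: Vec<u64>,
}

/// Reads each (key, payload) pair from the "remote" input once, copying it
/// locally while counting keys per partition (assumption: low `radix_bits`
/// bits select the partition; requires `radix_bits < 64`).
fn fused_prefix_sum_and_copy(
    remote_keys: &[u64],
    remote_payloads: &[u64],
    radix_bits: u32,
) -> PrefixSumAndCopy {
    assert_eq!(remote_keys.len(), remote_payloads.len());
    let mask = (1u64 << radix_bits) - 1;
    let mut hist = vec![0usize; 1usize << radix_bits];
    let mut local_keys = Vec::with_capacity(remote_keys.len());
    let mut local_payloads = Vec::with_capacity(remote_payloads.len());

    // Single pass over the remote data: histogram and copy together.
    for (&key, &payload) in remote_keys.iter().zip(remote_payloads) {
        hist[(key & mask) as usize] += 1;
        local_keys.push(key);
        local_payloads.push(payload);
    }

    // Exclusive prefix sum over the histogram gives each partition's start offset.
    let mut offsets: Vec<usize> = Vec::with_capacity(hist.len());
    let mut sum = 0;
    for &count in &hist {
        offsets.push(sum);
        sum += count;
    }

    PrefixSumAndCopy { offsets, local_keys, local_payloads }
}
```

A subsequent partitioning step would then read local_keys and local_payloads instead of the remote input.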

Parallelism

The function is internally parallelized by the GPU. It is not safe to call concurrently from multiple threads.

Limitations

Currently, only the Contiguous histogram algorithm is supported. The reason is that prefix_sum_and_copy_with_payload is typically used for small relations that fit into GPU memory. Thus, the next step in the workflow is a SQL operator (e.g., a join), which only accepts a contiguous relation as input.

Computes the prefix sum on a partitioned relation, and transforms the relation into a columnar format.

Layout transformation

Multi-pass partitioning requires the data in a columnar format. However, the first partitioning pass stores each partition in a row format.

In addition to computing the prefix sum, prefix_sum_and_transform transforms the row format into a columnar format.
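A minimal sketch of the combined transform follows: it splits row-format (key, payload) tuples into separate key and payload columns, and in the same loop builds the histogram and prefix sum for the next partitioning pass. The helper name, the ignore_bits parameter (which skips the key bits already consumed by the first pass), and the tuple layout are assumptions for illustration.

```rust
/// Hypothetical second-pass helper: converts row-format tuples into key and
/// payload columns and computes the exclusive prefix sum for the next pass.
/// Assumption: the next pass partitions on `radix_bits` bits above `ignore_bits`.
fn transform_and_prefix_sum(
    rows: &[(u64, u64)],
    ignore_bits: u32,
    radix_bits: u32,
) -> (Vec<u64>, Vec<u64>, Vec<usize>) {
    let mask = (1u64 << radix_bits) - 1;
    let mut keys = Vec::with_capacity(rows.len());
    let mut payloads = Vec::with_capacity(rows.len());
    let mut hist = vec![0usize; 1usize << radix_bits];

    for &(key, payload) in rows {
        // Row format -> columnar format.
        keys.push(key);
        payloads.push(payload);
        // Histogram for the next pass, using the next group of key bits.
        hist[((key >> ignore_bits) & mask) as usize] += 1;
    }

    // Exclusive prefix sum over the histogram.
    let mut offsets = Vec::with_capacity(hist.len());
    let mut sum = 0;
    for &count in &hist {
        offsets.push(sum);
        sum += count;
    }
    (keys, payloads, offsets)
}
```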

Chunk concatenation

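The transform also concatenates chunked partitions into contiguous partitions, because partition and SQL operators (e.g., join) expect contiguous input. This design reduces the number of operator variants required from (layouts * operators) to (layouts + operators): instead of implementing every operator for every input layout, each layout needs only one transform into the contiguous layout, and each operator needs only one implementation for contiguous input.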
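The concatenation step can be sketched as follows, assuming (hypothetically) that chunks[c][p] holds chunk c's piece of partition p and that every chunk holds the same number of partitions. Summing the piece sizes per partition and taking an exclusive prefix sum yields each partition's start offset in the contiguous output. concatenate_chunks is a hypothetical helper, not the crate's API.

```rust
/// Hypothetical helper: concatenates chunked partitions into one contiguous
/// array in which all pieces of a partition are adjacent. Returns the array
/// and the start offset of each partition.
fn concatenate_chunks(chunks: &[Vec<Vec<u64>>], num_partitions: usize) -> (Vec<u64>, Vec<usize>) {
    // Total size of each partition across all chunks.
    let mut sizes = vec![0usize; num_partitions];
    for chunk in chunks {
        assert_eq!(chunk.len(), num_partitions, "every chunk must hold all partitions");
        for (p, piece) in chunk.iter().enumerate() {
            sizes[p] += piece.len();
        }
    }

    // Exclusive prefix sum gives each partition's start offset.
    let mut offsets = Vec::with_capacity(num_partitions);
    let mut total = 0;
    for &size in &sizes {
        offsets.push(total);
        total += size;
    }

    // Append each partition's pieces in chunk order, partition by partition.
    let mut output = Vec::with_capacity(total);
    for p in 0..num_partitions {
        for chunk in chunks {
            output.extend_from_slice(&chunk[p]);
        }
    }
    (output, offsets)
}
```

For two chunks [[1], [2, 3]] and [[4, 5], [6]], partition 0 becomes [1, 4, 5] at offset 0 and partition 1 becomes [2, 3, 6] at offset 3.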

Parallelism

The function is internally parallelized by the GPU. It is not safe to call concurrently from multiple threads.

Limitations

Currently only the Contiguous histogram algorithm is supported. See prefix_sum_and_copy_with_payload for details.

Preallocates the internal state of partition.

Some partitioning variants (e.g., HSSWWC) use GPU memory buffers to hold internal state. partition allocates this state lazily on first use and caches it internally between calls.

preallocate_partition_state allows eager allocation of the state for optimization purposes. Specifically, the memory allocation can sometimes be overlapped with the prefix_sum computation.
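The lazy-versus-eager allocation pattern can be sketched with a cached Option buffer. Partitioner, its fields, and the allocations counter are hypothetical stand-ins (the counter exists only to show that the buffer is allocated once); a GPU implementation would allocate device memory rather than a Vec, and the placeholder loop in partition is not real partitioning work.

```rust
/// Hypothetical stand-in for a partitioner whose partitioning step needs a
/// scratch buffer. Counts allocations for illustration only.
struct Partitioner {
    state: Option<Vec<u64>>,
    state_len: usize,
    allocations: usize,
}

impl Partitioner {
    fn new(state_len: usize) -> Self {
        Self { state: None, state_len, allocations: 0 }
    }

    /// Allocates the state if it does not exist yet; otherwise returns the cached buffer.
    fn get_or_alloc_state(&mut self) -> &mut Vec<u64> {
        if self.state.is_none() {
            self.state = Some(vec![0; self.state_len]);
            self.allocations += 1;
        }
        self.state.as_mut().unwrap()
    }

    /// Eager allocation, e.g. issued while the prefix sum is still being computed.
    fn preallocate_partition_state(&mut self) {
        self.get_or_alloc_state();
    }

    /// Lazy allocation: the first call allocates, later calls reuse the cache.
    fn partition(&mut self, keys: &[u64]) {
        let state = self.get_or_alloc_state();
        // Placeholder work: a real partitioner would stage tuples in `state`.
        for (slot, &key) in state.iter_mut().zip(keys) {
            *slot = key;
        }
    }
}
```

Calling preallocate_partition_state before partition moves the one-time allocation out of partition; in either order, later calls reuse the cached buffer.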

Radix-partitions a relation by its key attribute.

See the module-level documentation for details on the algorithm.
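As a minimal sequential sketch of the scatter step, each tuple is written to its partition's current write cursor, which starts at the partition's prefix-sum offset. This sketch ignores the GPU-specific partitioning variants (e.g., HSSWWC) and assumes that a key's partition is given by its low radix_bits bits; radix_partition is a hypothetical helper, not the crate's partition method.

```rust
/// Hypothetical helper: scatters keys and payloads into their partitions,
/// using exclusive prefix-sum `offsets` as initial write cursors.
/// Assumption: the low `radix_bits` bits of a key select its partition.
fn radix_partition(
    keys: &[u64],
    payloads: &[u64],
    offsets: &[usize],
    radix_bits: u32,
) -> (Vec<u64>, Vec<u64>) {
    assert_eq!(keys.len(), payloads.len());
    assert_eq!(offsets.len(), 1usize << radix_bits);
    let mask = (1u64 << radix_bits) - 1;

    // Copy the offsets so the caller's prefix sum stays intact.
    let mut cursors = offsets.to_vec();
    let mut out_keys = vec![0u64; keys.len()];
    let mut out_payloads = vec![0u64; payloads.len()];

    for (&key, &payload) in keys.iter().zip(payloads) {
        let p = (key & mask) as usize;
        out_keys[cursors[p]] = key;
        out_payloads[cursors[p]] = payload;
        cursors[p] += 1; // advance this partition's write position
    }
    (out_keys, out_payloads)
}
```

With the offsets [0, 1, 3, 5] for the keys [5, 2, 7, 4, 1, 6] and radix_bits = 2, the output keys are [4, 5, 1, 2, 6, 7].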

Post-conditions
  • partition_offsets becomes uninitialized due to a memory swap. However, it can be reused in a subsequent prefix_sum call.

Trait Implementations

Formats the value using the given formatter.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.

Immutably borrows from an owned value.

Mutably borrows from an owned value.

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The alignment of pointer.

The type for initializers.

Initializes a with the given initializer.

Dereferences the given pointer.

Mutably dereferences the given pointer.

Drops the object pointed to by the given pointer.

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.