
Radix join operators for GPUs.

The radix join joins two relations. Instead of materializing the join result, the current implementation produces a set of aggregates, i.e., one aggregate per thread. These must be summed up by the caller, e.g., on the CPU.

Specifically, the current join implementation performs:

SELECT SUM(s.value)
FROM r
JOIN s ON r.key = s.key

Note that r.value is inserted into the hash table in order to read all four columns. However, this isn't strictly necessary to answer the query correctly.
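The caller-side reduction can be sketched as follows. This is an illustrative CPU-side snippet, not part of the crate's API; `finalize_aggregate` and `partial_sums` are hypothetical names standing in for the per-thread aggregates copied back from GPU memory.

```rust
// Hypothetical caller-side reduction: the join leaves one partial
// aggregate per GPU thread, and the caller sums them up on the CPU.
fn finalize_aggregate(partial_sums: &[i64]) -> i64 {
    partial_sums.iter().sum()
}

fn main() {
    // Stand-in for per-thread aggregates copied back from the GPU.
    let partial_sums = vec![10, 0, 32, 0];
    let total = finalize_aggregate(&partial_sums);
    println!("SUM(s.value) = {}", total);
}
```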

Hashing schemes

Perfect hashing (i.e., join key as array index) and bucket chaining are implemented. Linear probing is currently not implemented. Bucket chaining was used because the radix joins by Balkesen et al. and Sioulas et al. also use this scheme.

According to our measurements, the hashing scheme doesn’t affect the join performance (see the Triton join paper). This might change if a higher fanout is required to fit the hash table into shared memory, e.g., due to the load factor of linear probing.
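To illustrate the bucket-chaining scheme, here is a CPU-side Rust sketch of its data layout: a `heads` array with one slot per bucket, and per-entry `next` links forming the chains. This models the structure only; the real hash table lives in GPU shared memory and is built by CUDA threads, and all names here are illustrative, not the crate's API.

```rust
// CPU sketch of a bucket-chaining hash table: heads[b] holds the index
// of the most recently inserted entry in bucket b, and next[i] links
// entry i to the previous entry in the same bucket.
const EMPTY: usize = usize::MAX;

struct ChainedTable {
    heads: Vec<usize>, // one slot per bucket
    next: Vec<usize>,  // chain links, one per inserted tuple
    keys: Vec<i64>,
    values: Vec<i64>,
}

impl ChainedTable {
    fn new(num_buckets: usize) -> Self {
        Self {
            heads: vec![EMPTY; num_buckets],
            next: Vec::new(),
            keys: Vec::new(),
            values: Vec::new(),
        }
    }

    fn bucket(&self, key: i64) -> usize {
        (key as usize) % self.heads.len()
    }

    // Build phase: prepend the tuple to its bucket's chain.
    fn insert(&mut self, key: i64, value: i64) {
        let b = self.bucket(key);
        self.next.push(self.heads[b]);
        self.heads[b] = self.keys.len();
        self.keys.push(key);
        self.values.push(value);
    }

    // Probe phase: walk the chain of the probe key's bucket.
    fn probe(&self, key: i64) -> Option<i64> {
        let mut i = self.heads[self.bucket(key)];
        while i != EMPTY {
            if self.keys[i] == key {
                return Some(self.values[i]);
            }
            i = self.next[i];
        }
        None
    }
}

fn main() {
    let mut table = ChainedTable::new(4);
    table.insert(1, 100);
    table.insert(5, 500); // collides with key 1 in bucket 1
    println!("probe(5) = {:?}", table.probe(5));
}
```

Prepending to the chain head needs only one atomic exchange per insert in a parallel setting, which is one reason bucket chaining suits GPU kernels.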

Skew handling

One call to CudaRadixJoin::join processes all partitions. Before the join starts, we calculate how large each partition is and assign partitions evenly among the thread blocks, using a greedy algorithm. Then, each thread block processes the partitions assigned to it in parallel.

This assignment method thus handles skew, as long as none of the hash tables exceeds the shared memory capacity.
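A greedy assignment of this kind can be sketched as a longest-processing-time-first heuristic: each partition, largest first, goes to the currently least-loaded thread block. This is a plausible reading of "assign partitions evenly using a greedy algorithm", not the crate's actual code; the function and parameter names are hypothetical.

```rust
// Greedy (LPT-style) assignment sketch: give each partition, in
// decreasing size order, to the thread block with the smallest load.
fn assign_partitions(partition_sizes: &[usize], num_blocks: usize) -> Vec<Vec<usize>> {
    let mut order: Vec<usize> = (0..partition_sizes.len()).collect();
    order.sort_by(|&a, &b| partition_sizes[b].cmp(&partition_sizes[a]));

    let mut assignment = vec![Vec::new(); num_blocks];
    let mut load = vec![0usize; num_blocks];
    for p in order {
        // Pick the thread block with the smallest total load so far.
        let (block, _) = load.iter().enumerate().min_by_key(|&(_, &l)| l).unwrap();
        assignment[block].push(p);
        load[block] += partition_sizes[p];
    }
    assignment
}

fn main() {
    // One large partition plus several small ones: the heuristic
    // balances the total tuple count across thread blocks.
    let assignment = assign_partitions(&[10, 1, 1, 8], 2);
    println!("assignment: {:?}", assignment);
}
```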

Handling a high degree of skew would require dynamic recursive partitioning: if a partition exceeds the shared memory capacity, it would be recursively partitioned until all subpartitions fit into shared memory. Alternatively, (parts of) the hash table could be spilled to GPU memory.

Structs

GPU radix join implementation in CUDA.

Traits

Specifies that the implementing type can be used as a join key in CudaRadixJoin.