pub struct CudaUnifiedIterator2<'a, R: Copy + DeviceCopy, S: Copy + DeviceCopy> { /* private fields */ }
CUDA iterator for two mutable unified memory inputs.
Prefetches data from main memory to device memory at chunk-sized granularity.
Preconditions
All inputs are required to have the same length.
Thread safety
The iterator uses only one CPU thread, so thread-safety concerns apply only
to the CUDA kernel. See the fold() documentation for details.
Implementations
impl<'a, R: Copy + DeviceCopy, S: Copy + DeviceCopy> CudaUnifiedIterator2<'a, R, S>
pub fn fold<F>(&mut self, f: F) -> Result<CudaTransferStrategyMeasurement> where
F: FnMut((LaunchableSlice<'_, R>, LaunchableSlice<'_, S>), &Stream) -> Result<()>,
Apply a GPU function that produces a single, final value.
The closure f takes two arguments: a data value and a CUDA stream. In the
case of CudaUnifiedIterator2, the data value is a two-tuple of launchable
slices. The slices are guaranteed to have the same length.
The function passed to fold() is meant to launch a CUDA kernel function
on the given CUDA stream.
In contrast to the fold() method of Rust’s standard library iterators, the
state in this iterator is implicit in GPU memory.
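The calling pattern described above can be illustrated with a CPU-only sketch. It does not use the real crate or CUDA: `MockStream`, `ChunkedPairIter`, and `dot_product` are hypothetical names invented for this example, plain slices stand in for `LaunchableSlice`, and a captured variable stands in for state that would live in GPU memory. The return value (a chunk count) is likewise an assumption and is unrelated to `CudaTransferStrategyMeasurement`.

```rust
/// Hypothetical stand-in for a CUDA stream; it only carries an id here.
pub struct MockStream {
    pub id: usize,
}

/// Hypothetical stand-in for the iterator: walks two equally long
/// slices in chunk-sized pieces.
pub struct ChunkedPairIter<'a, R, S> {
    left: &'a mut [R],
    right: &'a mut [S],
    chunk_len: usize,
}

impl<'a, R, S> ChunkedPairIter<'a, R, S> {
    /// Enforces the precondition that both inputs have the same length.
    pub fn new(
        left: &'a mut [R],
        right: &'a mut [S],
        chunk_len: usize,
    ) -> Result<Self, String> {
        if left.len() != right.len() {
            return Err("inputs must have the same length".to_string());
        }
        if chunk_len == 0 {
            return Err("chunk_len must be non-zero".to_string());
        }
        Ok(Self { left, right, chunk_len })
    }

    /// Calls `f` once per chunk pair, alternating between two mock
    /// streams. Returns the number of chunks processed.
    pub fn fold<F>(&mut self, mut f: F) -> Result<usize, String>
    where
        F: FnMut((&mut [R], &mut [S]), &MockStream) -> Result<(), String>,
    {
        let streams = [MockStream { id: 0 }, MockStream { id: 1 }];
        let mut chunks = 0;
        let pairs = self
            .left
            .chunks_mut(self.chunk_len)
            .zip(self.right.chunks_mut(self.chunk_len));
        for (i, (l, r)) in pairs.enumerate() {
            f((l, r), &streams[i % streams.len()])?;
            chunks += 1;
        }
        Ok(chunks)
    }
}

/// Usage sketch: the closure receives a two-tuple of equally long slices
/// plus a stream, and the result accumulates outside the iterator.
pub fn dot_product(
    left: &mut [u64],
    right: &mut [u64],
    chunk_len: usize,
) -> Result<(u64, usize), String> {
    // Stand-in for state that would be implicit in GPU memory.
    let mut acc = 0u64;
    let mut iter = ChunkedPairIter::new(left, right, chunk_len)?;
    let chunks = iter.fold(|(l, r), _stream| {
        // A real closure would launch a CUDA kernel on `_stream` here.
        acc += l.iter().zip(r.iter()).map(|(a, b)| a * b).sum::<u64>();
        Ok(())
    })?;
    Ok((acc, chunks))
}
```

Note that the fold itself returns no computed value in this sketch; the caller reads the result from the external accumulator, which mirrors the implicit-state design described above.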
Thread safety
Prefetching and kernel execution are asynchronous operations. They are performed on two or more CUDA streams to achieve parallelism, i.e., prefetching and execution overlap.
However, Rust cannot guarantee thread-safety of CUDA kernels. Thus, the user must ensure that the CUDA kernels are safe to execute on multiple CUDA streams, e.g. by using atomic operations when accessing device memory.
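As a rough CPU-side analogy for the atomic-operations advice, the sketch below lets several threads (standing in for CUDA streams) add partial sums into one shared accumulator. The function name `concurrent_sum` is hypothetical, and a real kernel would use device-side atomics such as CUDA's `atomicAdd` rather than Rust's `AtomicU64`; the sketch only shows why unsynchronized concurrent updates to shared state need an atomic read-modify-write.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Sums `data` by letting one thread per chunk (standing in for work on
/// separate CUDA streams) add its partial sum into a shared accumulator.
pub fn concurrent_sum(data: &[u64], chunk_len: usize) -> u64 {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    // Shared state that every concurrent worker updates.
    let total = AtomicU64::new(0);
    thread::scope(|scope| {
        for chunk in data.chunks(chunk_len) {
            let total = &total;
            scope.spawn(move || {
                let partial: u64 = chunk.iter().sum();
                // Atomic read-modify-write: concurrent updates cannot be lost.
                total.fetch_add(partial, Ordering::Relaxed);
            });
        }
        // All spawned threads are joined when the scope ends.
    });
    total.load(Ordering::Relaxed)
}
```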
Trait Implementations
Auto Trait Implementations
impl<'a, R, S> RefUnwindSafe for CudaUnifiedIterator2<'a, R, S> where
R: RefUnwindSafe,
S: RefUnwindSafe,
impl<'a, R, S> Send for CudaUnifiedIterator2<'a, R, S>
impl<'a, R, S> !Sync for CudaUnifiedIterator2<'a, R, S>
impl<'a, R, S> Unpin for CudaUnifiedIterator2<'a, R, S>
impl<'a, R, S> !UnwindSafe for CudaUnifiedIterator2<'a, R, S>
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
pub fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.