Kokkos Node API and Local Linear Algebra Kernels Version of the Day
"First" pass of applying KokkosNodeTsqr's implicit Q factor.
|ApplyFirstPass (const ApplyType &applyType, const ConstMatView< LocalOrdinal, Scalar > &Q, const std::vector< std::vector< Scalar > > &tauArrays, const std::vector< MatView< LocalOrdinal, Scalar > > &topBlocks, const MatView< LocalOrdinal, Scalar > &C, const CacheBlockingStrategy< LocalOrdinal, Scalar > &strategy, const int numPartitions, const bool explicitQ=false, const bool contiguousCacheBlocks=false)|
|void||execute (const int partitionIndex)|
|First pass of applying intranode TSQR's implicit Q factor. |
"First" pass of applying KokkosNodeTsqr's implicit Q factor.
We call this ApplyFirstPass as a reminder that this algorithm has the same form as FactorFirstPass and uses the results of the latter, even though ApplyFirstPass is really the last pass of applying the implicit Q factor.
|TSQR::details::ApplyFirstPass< LocalOrdinal, Scalar >::ApplyFirstPass||(||const ApplyType &||applyType,|
|const ConstMatView< LocalOrdinal, Scalar > &||Q,|
|const std::vector< std::vector< Scalar > > &||tauArrays,|
|const std::vector< MatView< LocalOrdinal, Scalar > > &||topBlocks,|
|const MatView< LocalOrdinal, Scalar > &||C,|
|const CacheBlockingStrategy< LocalOrdinal, Scalar > &||strategy,|
|const int||numPartitions,|
|const bool||explicitQ = false,|
|const bool||contiguousCacheBlocks = false||)|
|applyType||[in] Whether we are applying Q, Q^T, or Q^H.|
|Q||[in] View of (part of) the implicitly stored Q factor, as computed by FactorFirstPass. (The other part is tauArrays.)|
|tauArrays||[in] The "TAU" arrays (implicit factorization results) for each cache block, as written by the factorization. (TAU is what LAPACK's QR factorization routines call this array; see the LAPACK documentation for an explanation.) Indexed by the cache block index; one TAU array per cache block.|
|strategy||[in] Cache blocking strategy to use.|
|numPartitions||[in] Number of partitions (positive integer), and therefore the maximum parallelism available to the algorithm. Oversubscribing processors is OK, but should not be done to excess. This is an int, and not a LocalOrdinal, because it is the argument to Kokkos' parallel_for.|
|contiguousCacheBlocks||[in] Whether the cache blocks of Q and C are stored contiguously.|
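As a rough illustration of what a TAU entry means, the sketch below applies a single Householder reflector H = I - tau * v * v^T to a vector, using the LAPACK convention that the first entry of v is an implicit 1. The function name `applyReflector` is hypothetical and is not part of TSQR or LAPACK. It handles real scalars only and is not how ApplyFirstPass itself is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TSQR): apply H = I - tau * v * v^T
// to the vector c in place. Real scalars only.
// Following the LAPACK storage convention, v[0] is treated as 1
// regardless of the value actually stored there.
void applyReflector(const std::vector<double>& v, const double tau,
                    std::vector<double>& c) {
  assert(!v.empty() && v.size() == c.size());

  // dot = v^T c, with the implicit leading 1 in v.
  double dot = c[0];
  for (std::size_t i = 1; i < v.size(); ++i) {
    dot += v[i] * c[i];
  }

  // c := c - tau * (v^T c) * v
  const double scale = tau * dot;
  c[0] -= scale;
  for (std::size_t i = 1; i < v.size(); ++i) {
    c[i] -= scale * v[i];
  }
}
```

A full Q factor is a product of such reflectors, one per TAU entry, which is why one TAU array per cache block is enough to apply that block's part of Q.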
|void TSQR::details::ApplyFirstPass< LocalOrdinal, Scalar >::execute||(||const int||partitionIndex||)||
First pass of applying intranode TSQR's implicit Q factor.
Invoked by Kokkos' parallel_for template method. This routine parallelizes over contiguous partitions of the C matrix. Each partition in turn contains cache blocks. We take care not to break up the cache blocks among partitions; this ensures that the cache blocking scheme is the same as SequentialTsqr uses. (However, the implicit Q factor is not compatible with that of SequentialTsqr.)
|partitionIndex||[in] Zero-based index of the partition which this instance of ApplyFirstPass is currently processing. If greater than or equal to the number of partitions, this routine does nothing.|
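The partitioning described above can be sketched as follows. This is a minimal illustration, assuming cache blocks are split as evenly as possible into contiguous ranges. The helper name `cacheBlockRange` is hypothetical, and the actual KokkosNodeTsqr partitioning code may differ. The sketch shows only that whole cache blocks are assigned to partitions and that an out-of-range partition index yields no work.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical helper (not part of TSQR): return the half-open range
// [first, second) of cache block indices assigned to one partition.
// Whole cache blocks are assigned, so no cache block is ever split
// between two partitions.
std::pair<int, int> cacheBlockRange(const int numCacheBlocks,
                                    const int numPartitions,
                                    const int partitionIndex) {
  assert(numCacheBlocks >= 0 && numPartitions > 0 && partitionIndex >= 0);

  // Mirror the documented behavior of execute(): an index at or past
  // the number of partitions gets an empty range, so there is no work.
  if (partitionIndex >= numPartitions) {
    return {0, 0};
  }

  // The first `rem` partitions get one extra cache block each.
  const int quot = numCacheBlocks / numPartitions;
  const int rem = numCacheBlocks % numPartitions;
  const int first = partitionIndex * quot + std::min(partitionIndex, rem);
  const int count = quot + (partitionIndex < rem ? 1 : 0);
  return {first, first + count};
}
```

With 10 cache blocks and 4 partitions, this assigns the ranges [0, 3), [3, 6), [6, 8), and [8, 10). A parallel loop body would then walk only the cache blocks in its own range.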