Kokkos Node API and Local Linear Algebra Kernels Version of the Day
First pass of KokkosNodeTsqr's factorization.
|FactorFirstPass (const MatView< LocalOrdinal, Scalar > &A, std::vector< std::vector< Scalar > > &tauArrays, std::vector< MatView< LocalOrdinal, Scalar > > &topBlocks, const CacheBlockingStrategy< LocalOrdinal, Scalar > &strategy, const int numPartitions, const bool contiguousCacheBlocks=false)|
|void||execute (const int partitionIndex)|
|First pass of intranode TSQR factorization. |
|TSQR::details::FactorFirstPass< LocalOrdinal, Scalar >::FactorFirstPass||(||const MatView< LocalOrdinal, Scalar > &||A,|
|std::vector< std::vector< Scalar > > &||tauArrays,|
|std::vector< MatView< LocalOrdinal, Scalar > > &||topBlocks,|
|const CacheBlockingStrategy< LocalOrdinal, Scalar > &||strategy,|
|const int||numPartitions,|
|const bool||contiguousCacheBlocks = false||)|
|A||[in/out] On input: View of the matrix to factor. On output: (Part of) the implicitly stored Q factor. (The other part is tauArrays.)|
|tauArrays||[out] Where to write the "TAU" arrays (implicit factorization results) for each cache block. (TAU is what LAPACK's QR factorization routines call this array; see the LAPACK documentation for an explanation.) Indexed by the cache block index; one TAU array per cache block.|
|topBlocks||[out] Where execute() saves a view of the top block of each partition. Used by the next factorization pass.|
|strategy||[in] Cache blocking strategy to use.|
|numPartitions||[in] Number of partitions (positive integer), and therefore the maximum parallelism available to the algorithm. Oversubscribing processors is OK, but should not be done to excess. This is an int, and not a LocalOrdinal, because it is the argument to Kokkos' parallel_for.|
|contiguousCacheBlocks||[in] Whether the cache blocks of A are stored contiguously.|
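The constructor stores its arguments, and the parallel_for then calls execute() once per partition index. The sketch below illustrates that functor pattern only. `serialParallelFor`, `RecordPartitions`, and `runSketch` are hypothetical names invented for this illustration; they are not part of TSQR or the Kokkos Node API, and the stand-in loop runs serially where a real node may run the calls concurrently.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a node's parallel_for: calls op.execute(i)
// for each i in [begin, end). This version is serial.
template <class Op>
void serialParallelFor(const int begin, const int end, Op& op) {
  for (int i = begin; i < end; ++i) {
    op.execute(i);
  }
}

// Hypothetical functor with the same shape as FactorFirstPass:
// state is captured in the constructor, and execute() does the
// work for one partition.
class RecordPartitions {
public:
  RecordPartitions(std::vector<int>& visited, const int numPartitions)
    : visited_(visited), numPartitions_(numPartitions) {
    assert(numPartitions > 0);
  }

  void execute(const int partitionIndex) {
    // Indices >= numPartitions are ignored, matching the documented
    // behavior of execute(). The negative check is an added assumption.
    if (partitionIndex < 0 || partitionIndex >= numPartitions_) {
      return;
    }
    visited_[partitionIndex] += 1; // placeholder for the real factorization
  }

private:
  std::vector<int>& visited_;
  int numPartitions_;
};

// Runs the functor over numPartitions + 1 indices (one too many on
// purpose) and returns how often each partition was visited.
std::vector<int> runSketch(const int numPartitions) {
  std::vector<int> visited(numPartitions, 0);
  RecordPartitions op(visited, numPartitions);
  serialParallelFor(0, numPartitions + 1, op);
  return visited;
}
```

Calling `runSketch(3)` visits partitions 0, 1, and 2 once each; the extra index 3 falls through the guard and does nothing.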
|void TSQR::details::FactorFirstPass< LocalOrdinal, Scalar >::execute||(||const int||partitionIndex||)||
First pass of intranode TSQR factorization.
Invoked by Kokkos' parallel_for template method. This routine parallelizes over contiguous partitions of the matrix. Each partition in turn contains cache blocks. Partitions do not break up cache blocks. (This ensures that the cache blocking scheme is the same as that used by SequentialTsqr, as long as the cache blocking strategies are the same. However, the implicit Q factor is not compatible with that of SequentialTsqr.)
This method also saves a view of the top block of the partition in the topBlocks_ array. This is useful for the next factorization pass.
|partitionIndex||[in] Zero-based index of the partition. If greater than or equal to the number of partitions, this routine does nothing.|
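One way to assign whole cache blocks to contiguous partitions, so that no partition splits a cache block, is sketched below. This is an assumption for illustration, not necessarily the scheme TSQR uses. `cacheBlockRange` is a hypothetical helper, and spreading the remainder over the first partitions is just one reasonable choice.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical helper (not part of TSQR): returns the half-open range
// [first, last) of cache block indices owned by one partition.
// Partitions are contiguous and contain only whole cache blocks.
std::pair<int, int>
cacheBlockRange(const int numCacheBlocks,
                const int numPartitions,
                const int partitionIndex) {
  assert(numCacheBlocks >= 0 && numPartitions > 0);

  // Out-of-range partitions own nothing, so the caller does no work.
  if (partitionIndex < 0 || partitionIndex >= numPartitions) {
    return std::make_pair(0, 0);
  }

  const int base = numCacheBlocks / numPartitions;  // blocks per partition
  const int extra = numCacheBlocks % numPartitions; // leftover blocks

  // The first `extra` partitions each take one leftover block.
  const int first = partitionIndex * base + std::min(partitionIndex, extra);
  const int count = base + (partitionIndex < extra ? 1 : 0);
  return std::make_pair(first, first + count);
}
```

With 10 cache blocks and 3 partitions, this gives the ranges [0, 4), [4, 7), and [7, 10). When there are more partitions than cache blocks, the trailing partitions receive empty ranges.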