Kokkos Node API and Local Linear Algebra Kernels Version of the Day
TSQR::details::FactorFirstPass< LocalOrdinal, Scalar > Class Template Reference

First pass of KokkosNodeTsqr's factorization.

#include <Tsqr_KokkosNodeTsqr.hpp>

Public Member Functions

 FactorFirstPass (const MatView< LocalOrdinal, Scalar > &A, std::vector< std::vector< Scalar > > &tauArrays, std::vector< MatView< LocalOrdinal, Scalar > > &topBlocks, const CacheBlockingStrategy< LocalOrdinal, Scalar > &strategy, const int numPartitions, const bool contiguousCacheBlocks=false)
void execute (const int partitionIndex)
 First pass of intranode TSQR factorization.

Detailed Description

template<class LocalOrdinal, class Scalar>
class TSQR::details::FactorFirstPass< LocalOrdinal, Scalar >

First pass of KokkosNodeTsqr's factorization.

Author: Mark Hoemmen

Definition at line 184 of file Tsqr_KokkosNodeTsqr.hpp.

Constructor & Destructor Documentation

template<class LocalOrdinal , class Scalar >
TSQR::details::FactorFirstPass< LocalOrdinal, Scalar >::FactorFirstPass ( const MatView< LocalOrdinal, Scalar > &  A,
std::vector< std::vector< Scalar > > &  tauArrays,
std::vector< MatView< LocalOrdinal, Scalar > > &  topBlocks,
const CacheBlockingStrategy< LocalOrdinal, Scalar > &  strategy,
const int  numPartitions,
const bool  contiguousCacheBlocks = false 
) [inline]


Parameters:
 A [in/out]  On input: view of the matrix to factor. On output: part of the implicitly stored Q factor. (The other part is tauArrays.)
 tauArrays [out]  Where to write the "TAU" arrays (implicit factorization results) for each cache block. (TAU is what LAPACK's QR factorization routines call this array; see the LAPACK documentation for an explanation.) Indexed by the cache block index; one TAU array per cache block.
 strategy [in]  Cache blocking strategy to use.
 numPartitions [in]  Number of partitions (a positive integer), and therefore the maximum parallelism available to the algorithm. Oversubscribing processors is OK, but should not be done to excess. This is an int, and not a LocalOrdinal, because it is the argument to Kokkos' parallel_for.
 contiguousCacheBlocks [in]  Whether the cache blocks of A are stored contiguously.

Definition at line 339 of file Tsqr_KokkosNodeTsqr.hpp.
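
To make the role of a TAU array concrete, here is a minimal sketch of how one Householder reflector and its scalar tau can be computed in the LAPACK convention, H = I - tau * v * v^T with v[0] = 1 implied. The function name householderReflector is hypothetical and is not part of TSQR; it handles real scalars only and omits the scaling safeguards that production code uses against overflow and underflow. This page does not confirm which kernels the cache-block factorizations actually call, so treat it as an illustration of the convention only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of TSQR.
// On input, x holds one column. On output, x[0] holds beta (the single
// nonzero entry of H*x) and x[1..] hold v[1..]; v[0] = 1 is implied.
// Returns tau, so that H = I - tau * v * v^T and H*x = beta * e_1.
double householderReflector (std::vector<double>& x)
{
  if (x.empty ()) {
    return 0.0;
  }
  double tailNormSquared = 0.0;
  for (std::size_t i = 1; i < x.size (); ++i) {
    tailNormSquared += x[i] * x[i];
  }
  if (tailNormSquared == 0.0) {
    return 0.0; // Nothing to eliminate: H is the identity.
  }
  const double alpha = x[0];
  const double norm = std::sqrt (alpha * alpha + tailNormSquared);
  // Give beta the opposite sign of alpha to avoid cancellation below.
  const double beta = (alpha >= 0.0) ? -norm : norm;
  const double tau = (beta - alpha) / beta;
  const double scale = 1.0 / (alpha - beta);
  for (std::size_t i = 1; i < x.size (); ++i) {
    x[i] *= scale;
  }
  x[0] = beta;
  return tau;
}
```

A QR factorization of a block with n columns produces n such scalars, one per reflector, which is why each cache block needs its own TAU array.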

Member Function Documentation

template<class LocalOrdinal , class Scalar >
void TSQR::details::FactorFirstPass< LocalOrdinal, Scalar >::execute ( const int  partitionIndex) [inline]

First pass of intranode TSQR factorization.

Invoked by Kokkos' parallel_for template method. This routine parallelizes over contiguous partitions of the matrix. Each partition in turn contains cache blocks. Partitions do not break up cache blocks. (This ensures that the cache blocking scheme is the same as that used by SequentialTsqr, as long as the cache blocking strategies are the same. However, the implicit Q factor is not compatible with that of SequentialTsqr.)

This method also saves a view of the top block of the partition in the topBlocks_ array. This is useful for the next factorization pass.
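
The rule that partitions never break up cache blocks can be sketched as follows. The helper partitionCacheBlocks is hypothetical and is not the code in Tsqr_KokkosNodeTsqr.hpp; it assumes the cache blocks are numbered 0 through numCacheBlocks - 1 and simply hands each partition a contiguous range of whole cache blocks, balancing the counts as evenly as possible.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper, not part of TSQR.
// Splits cache block indices 0 .. numCacheBlocks-1 into numPartitions
// contiguous half-open ranges [first, last). Every range holds whole
// cache blocks, so no cache block is split between two partitions.
std::vector<std::pair<int, int> >
partitionCacheBlocks (const int numCacheBlocks, const int numPartitions)
{
  std::vector<std::pair<int, int> > ranges;
  if (numCacheBlocks < 0 || numPartitions <= 0) {
    return ranges; // Invalid input: return no ranges.
  }
  const int quotient = numCacheBlocks / numPartitions;
  const int remainder = numCacheBlocks % numPartitions;
  int first = 0;
  for (int p = 0; p < numPartitions; ++p) {
    // The first `remainder` partitions get one extra cache block.
    const int count = quotient + (p < remainder ? 1 : 0);
    ranges.push_back (std::make_pair (first, first + count));
    first += count;
  }
  return ranges;
}
```

With 10 cache blocks and 4 partitions this yields the ranges [0,3), [3,6), [6,8), and [8,10). If there are more partitions than cache blocks, the trailing ranges are empty.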

Parameters:
 partitionIndex [in]  Zero-based index of the partition. If it is greater than or equal to the number of partitions, this routine does nothing.

Warning: This routine almost certainly won't work in CUDA, and even if it does, it won't be efficient. For this reason, we have not added the KERNEL_PREFIX method prefix. If you are interested in a GPU TSQR routine, please contact the author of this code (Mark Hoemmen <mhoemme@sandia.gov>) to discuss the possibilities.

Note: Unlike typical Kokkos work-data pairs (WDPs) passed into parallel_for, this one is not declared inline. The method is heavyweight enough that an inline declaration is unlikely to improve performance.
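
The work-data pair pattern, including the rule that an out-of-range partition index is a no-op, can be sketched as below. Both SquarePartitionIndex and serialParallelFor are hypothetical names used only for illustration: the first is a toy WDP, and the second is a serial stand-in for a node's parallel_for, whose real interface may differ from what is assumed here.

```cpp
#include <vector>

// Hypothetical toy work-data pair, not part of TSQR or Kokkos.
// The data are the members; the work is the execute() method.
class SquarePartitionIndex {
public:
  // results must have at least numPartitions entries.
  SquarePartitionIndex (std::vector<int>& results, const int numPartitions) :
    results_ (results), numPartitions_ (numPartitions)
  {}

  // Called once per index. Out-of-range indices do nothing, mirroring
  // the behavior documented for FactorFirstPass::execute.
  void execute (const int partitionIndex) {
    if (partitionIndex < 0 || partitionIndex >= numPartitions_) {
      return;
    }
    results_[partitionIndex] = partitionIndex * partitionIndex;
  }

private:
  std::vector<int>& results_;
  int numPartitions_;
};

// Serial stand-in for a parallel_for (an assumption, not the Kokkos API):
// invokes wdp.execute(i) for every i in [begin, end).
template<class WDP>
void serialParallelFor (const int begin, const int end, WDP wdp)
{
  for (int i = begin; i < end; ++i) {
    wdp.execute (i);
  }
}
```

Calling serialParallelFor (0, 5, wdp) on a WDP built with three partitions fills only the first three results; indices 3 and 4 fall through the guard. That guard is what makes oversubscribing the index range harmless.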

Definition at line 393 of file Tsqr_KokkosNodeTsqr.hpp.

The documentation for this class was generated from the following file:
 Tsqr_KokkosNodeTsqr.hpp