Kokkos Node API and Local Linear Algebra Kernels Version of the Day
TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType > Class Template Reference

Common interface and functionality for intranode TSQR.

#include <Tsqr_NodeTsqr.hpp>

Inheritance diagram for TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >: (image not reproduced)


Public Member Functions

 NodeTsqr ()
 Constructor.
virtual ~NodeTsqr ()
 Virtual destructor, for memory safety of derived classes.
virtual size_t TEUCHOS_DEPRECATED cache_block_size () const =0
 Cache size hint (in bytes) used for the factorization. Deprecated; use cache_size_hint() instead.
virtual size_t cache_size_hint () const =0
 Cache size hint (in bytes) used for the factorization.
virtual std::string description () const
 One-line description of this object.
virtual factor_output_type factor (const Ordinal nrows, const Ordinal ncols, Scalar A[], const Ordinal lda, Scalar R[], const Ordinal ldr, const bool contiguousCacheBlocks) const =0
 Compute the QR factorization of A.
virtual void apply (const ApplyType &applyType, const Ordinal nrows, const Ordinal ncols_Q, const Scalar Q[], const Ordinal ldq, const FactorOutputType &factorOutput, const Ordinal ncols_C, Scalar C[], const Ordinal ldc, const bool contiguousCacheBlocks) const =0
 Apply the implicit Q factor from factor() to C.
virtual void explicit_Q (const Ordinal nrows, const Ordinal ncols_Q, const Scalar Q[], const Ordinal ldq, const factor_output_type &factorOutput, const Ordinal ncols_C, Scalar C[], const Ordinal ldc, const bool contiguousCacheBlocks) const =0
 Compute the explicit Q factor from the result of factor().
virtual void cache_block (const Ordinal nrows, const Ordinal ncols, Scalar A_out[], const Scalar A_in[], const Ordinal lda_in) const =0
 Cache block A_in into A_out.
virtual void un_cache_block (const Ordinal nrows, const Ordinal ncols, Scalar A_out[], const Ordinal lda_out, const Scalar A_in[]) const =0
 Un-cache-block A_in into A_out.
virtual void Q_times_B (const Ordinal nrows, const Ordinal ncols, Scalar Q[], const Ordinal ldq, const Scalar B[], const Ordinal ldb, const bool contiguousCacheBlocks) const =0
 Compute Q*B.
virtual void fill_with_zeros (const Ordinal nrows, const Ordinal ncols, Scalar A[], const Ordinal lda, const bool contiguousCacheBlocks) const =0
 Fill the nrows by ncols matrix A with zeros.
template<class MatrixViewType >
MatrixViewType top_block (const MatrixViewType &C, const bool contiguous_cache_blocks) const
 Return view of topmost cache block of C.
virtual bool QR_produces_R_factor_with_nonnegative_diagonal () const =0
 Does factor() compute R with nonnegative diagonal?
Ordinal reveal_R_rank (const Ordinal ncols, Scalar R[], const Ordinal ldr, Scalar U[], const Ordinal ldu, const typename Teuchos::ScalarTraits< Scalar >::magnitudeType tol) const
 Reveal rank of TSQR's R factor.
Ordinal reveal_rank (const Ordinal nrows, const Ordinal ncols, Scalar Q[], const Ordinal ldq, Scalar R[], const Ordinal ldr, const typename Teuchos::ScalarTraits< Scalar >::magnitudeType tol, const bool contiguousCacheBlocks) const
 Compute rank-revealing decomposition.

Protected Member Functions

virtual ConstMatView< Ordinal, Scalar > const_top_block (const ConstMatView< Ordinal, Scalar > &C, const bool contiguousCacheBlocks) const =0
 Return view of topmost cache block of C.

Detailed Description

template<class Ordinal, class Scalar, class FactorOutputType>
class TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >

Common interface and functionality for intranode TSQR.

NodeTsqr provides a generic interface for TSQR operations within a node ("intranode"). It also implements rank-revealing functionality used by all intranode TSQR implementations.

Template Parameters:
Ordinal: The (local) ordinal type; the type of indices into a matrix on a node.
Scalar: The type of elements stored in the matrix.
FactorOutputType: The type returned by factor().

We template on FactorOutputType for compile-time polymorphism. This lets subclasses define the factor() method without constraining them to inherit their particular FactorOutputType from a common abstract base class. FactorOutputType is meant to be either a simple composition of std::pair and std::vector, or a simple struct. Its contents are specific to each intranode TSQR implementation and are not intended to be polymorphic, so it would not make sense for all the different FactorOutputType types to inherit from a common base class.

Templating on FactorOutputType means that we can't use run-time polymorphism to swap between NodeTsqr subclasses, since the latter are really subclasses of different NodeTsqr instantiations (i.e., different FactorOutputType types). However, inheriting from different specializations of NodeTsqr does enforce correct compile-time polymorphism in a syntactic way. It also avoids repeated code for common functionality. Full run-time polymorphism of different NodeTsqr subclasses would not be useful. This is because ultimately each subclass is bound to a Kokkos Node type, and those only use compile-time polymorphism.
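The design described above can be illustrated with a heavily simplified, hypothetical sketch. MiniNodeTsqr, MiniSequentialTsqr, and MiniParallelTsqr are invented names, not TSQR classes, and their factor() methods do no real work. The point is only that each subclass derives from a different instantiation of the base template and returns its own plain output type, while shared functionality still lives once in the base.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for NodeTsqr: the base is templated on the type that
// factor() returns, so subclasses need not share a polymorphic output type.
template <class Ordinal, class Scalar, class FactorOutputType>
class MiniNodeTsqr {
 public:
  typedef FactorOutputType factor_output_type;
  virtual ~MiniNodeTsqr() {}
  virtual factor_output_type factor(Ordinal nrows, Ordinal ncols) const = 0;
  // Common functionality is written once, in the base class.
  std::string kind() const { return "intranode TSQR"; }
};

// One (hypothetical) implementation returns a plain std::vector ...
template <class Ordinal, class Scalar>
class MiniSequentialTsqr
    : public MiniNodeTsqr<Ordinal, Scalar, std::vector<Scalar> > {
 public:
  std::vector<Scalar> factor(Ordinal /* nrows */, Ordinal ncols) const override {
    return std::vector<Scalar>(static_cast<std::size_t>(ncols), Scalar(1));
  }
};

// ... another returns a std::pair. Because the two classes derive from
// different base instantiations, they cannot be swapped at run time.
template <class Ordinal, class Scalar>
class MiniParallelTsqr
    : public MiniNodeTsqr<Ordinal, Scalar,
                          std::pair<Ordinal, std::vector<Scalar> > > {
 public:
  explicit MiniParallelTsqr(Ordinal numPartitions)
      : numPartitions_(numPartitions) {}
  std::pair<Ordinal, std::vector<Scalar> >
  factor(Ordinal /* nrows */, Ordinal ncols) const override {
    return std::make_pair(
        numPartitions_,
        std::vector<Scalar>(static_cast<std::size_t>(ncols), Scalar(0)));
  }

 private:
  Ordinal numPartitions_;
};
```

Each subclass can still be used through a reference to its own base instantiation (for example, a const MiniNodeTsqr<int, double, std::vector<double> >&), which is the "syntactic" compile-time polymorphism the text refers to.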

Definition at line 82 of file Tsqr_NodeTsqr.hpp.


Constructor & Destructor Documentation

template<class Ordinal, class Scalar, class FactorOutputType>
TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::NodeTsqr ( ) [inline]

Constructor.

Definition at line 89 of file Tsqr_NodeTsqr.hpp.

template<class Ordinal, class Scalar, class FactorOutputType>
virtual TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::~NodeTsqr ( ) [inline, virtual]

Virtual destructor, for memory safety of derived classes.

Definition at line 92 of file Tsqr_NodeTsqr.hpp.


Member Function Documentation

template<class Ordinal, class Scalar, class FactorOutputType>
virtual size_t TEUCHOS_DEPRECATED TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::cache_block_size ( ) const [pure virtual]

Cache size hint (in bytes) used for the factorization.

This method is deprecated, because the name is misleading. Please call cache_size_hint() instead.

Implemented in TSQR::KokkosNodeTsqr< LocalOrdinal, Scalar, NodeType >, and TSQR::SequentialTsqr< LocalOrdinal, Scalar >.

template<class Ordinal, class Scalar, class FactorOutputType>
virtual size_t TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::cache_size_hint ( ) const [pure virtual]

Cache size hint (in bytes) used for the factorization.

Implemented in TSQR::KokkosNodeTsqr< LocalOrdinal, Scalar, NodeType >, and TSQR::SequentialTsqr< LocalOrdinal, Scalar >.

template<class Ordinal, class Scalar, class FactorOutputType>
virtual std::string TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::description ( ) const [inline, virtual]

One-line description of this object.

This implements Teuchos::Describable::description(). Subclasses should override this to provide a more specific description of their implementation. Subclasses may also implement Teuchos::Describable::describe(), which for this class has a simple default implementation that calls description() with appropriate indenting.

Reimplemented from Teuchos::Describable.

Reimplemented in TSQR::KokkosNodeTsqr< LocalOrdinal, Scalar, NodeType >, and TSQR::SequentialTsqr< LocalOrdinal, Scalar >.

Definition at line 111 of file Tsqr_NodeTsqr.hpp.

template<class Ordinal, class Scalar, class FactorOutputType>
virtual factor_output_type TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::factor ( const Ordinal  nrows,
const Ordinal  ncols,
Scalar  A[],
const Ordinal  lda,
Scalar  R[],
const Ordinal  ldr,
const bool  contiguousCacheBlocks 
) const [pure virtual]

Compute the QR factorization of A.

The resulting Q factor is stored implicitly in two parts. The first part is stored in place in the A matrix, and thus overwrites the input matrix. The second part is stored in the returned factor_output_type object. Both parts must be passed into apply() or explicit_Q().

Parameters:
nrows[in] Number of rows in the matrix A to factor.
ncols[in] Number of columns in the matrix A to factor.
A[in/out] On input: the matrix to factor. It is stored either in column-major order with leading dimension (a.k.a. stride) lda, or with contiguous cache blocks (if contiguousCacheBlocks is true) according to the prevailing cache blocking strategy. Use the cache_block() method to convert a matrix in column-major order to the latter format, and the un_cache_block() method to convert it back. On output: part of the implicit representation of the Q factor. (The returned object is the other part of that representation.)
lda[in] Leading dimension (a.k.a. stride) of the matrix A to factor.
R[out] The ncols x ncols R factor.
ldr[in] Leading dimension (a.k.a. stride) of the R factor.
contiguousCacheBlocks[in] Whether the cache blocks of A are stored contiguously. If you don't know what this means, put "false" here.
Returns:
Part of the implicit representation of the Q factor. The other part is the A matrix on output.

Implemented in TSQR::KokkosNodeTsqr< LocalOrdinal, Scalar, NodeType >, and TSQR::SequentialTsqr< LocalOrdinal, Scalar >.
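To make the two-part implicit representation of Q concrete, here is a minimal sketch of an unblocked Householder QR in the style of LAPACK's _GEQRF. It is not the TSQR algorithm, and the function name householder_qr, the use of double, and the plain column-major layout (no cache blocking) are all assumptions. The Householder vectors overwrite A below its diagonal, and the returned tau scalars play the role of the factor_output_type object.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for factor(): unblocked Householder QR of the
// nrows x ncols column-major matrix A (leading dimension lda), nrows >= ncols.
// On output, A holds the Householder vectors below its diagonal (one part of
// the implicit Q), R (leading dimension ldr) holds the ncols x ncols upper
// triangular factor, and the returned tau scalars are the other part of Q.
std::vector<double> householder_qr(int nrows, int ncols, double A[], int lda,
                                   double R[], int ldr) {
  assert(nrows >= ncols && lda >= nrows && ldr >= ncols);
  std::vector<double> tau(static_cast<std::size_t>(ncols), 0.0);
  for (int j = 0; j < ncols; ++j) {
    double* x = A + j + j * lda;  // x = A(j:nrows-1, j)
    const int m = nrows - j;
    double sigma = 0.0;  // squared norm of x(1:m-1)
    for (int i = 1; i < m; ++i) sigma += x[i] * x[i];
    if (sigma == 0.0) continue;  // nothing to eliminate: H_j = I, tau = 0
    const double alpha = x[0];
    const double normx = std::sqrt(alpha * alpha + sigma);
    const double beta = (alpha >= 0.0) ? -normx : normx;
    tau[j] = (beta - alpha) / beta;
    for (int i = 1; i < m; ++i) x[i] /= (alpha - beta);  // v(0) = 1 implicitly
    x[0] = beta;  // diagonal entry of R
    // Apply H_j = I - tau_j v v^T to the trailing columns A(j:, j+1:).
    for (int k = j + 1; k < ncols; ++k) {
      double* c = A + j + k * lda;
      double w = c[0];  // w = v^T c
      for (int i = 1; i < m; ++i) w += x[i] * c[i];
      w *= tau[j];
      c[0] -= w;
      for (int i = 1; i < m; ++i) c[i] -= w * x[i];
    }
  }
  // Copy the upper triangle of A into R and zero R's strictly lower part.
  for (int k = 0; k < ncols; ++k)
    for (int i = 0; i < ncols; ++i)
      R[i + k * ldr] = (i <= k) ? A[i + k * lda] : 0.0;
  return tau;
}
```

Since Q has orthonormal columns, R^T R equals A^T A, which gives a quick correctness check that does not require forming Q explicitly. Padding rows between nrows and lda are never touched.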

template<class Ordinal, class Scalar, class FactorOutputType>
virtual void TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::apply ( const ApplyType &  applyType,
const Ordinal  nrows,
const Ordinal  ncols_Q,
const Scalar  Q[],
const Ordinal  ldq,
const FactorOutputType &  factorOutput,
const Ordinal  ncols_C,
Scalar  C[],
const Ordinal  ldc,
const bool  contiguousCacheBlocks 
) const [pure virtual]

Apply the implicit Q factor from factor() to C.

Parameters:
applyType[in] Whether to apply Q, Q^T, or Q^H to C.
nrows[in] Number of rows in Q and C.
ncols_Q[in] Number of columns in Q.
Q[in] Part of the implicit representation of the Q factor; the A matrix output of factor(). See the factor() documentation for details.
ldq[in] Leading dimension (a.k.a. stride) of Q, if Q is stored in column-major order (not contiguously cache blocked).
factorOutput[in] Return value of factor(), corresponding to Q.
ncols_C[in] Number of columns in the matrix C. This may be different from the number of columns in Q. There is no restriction on this value, but we optimize performance for the case ncols_C == ncols_Q.
C[in/out] On input: Matrix to which to apply the Q factor. On output: Result of applying the Q factor (or Q^T, or Q^H, depending on applyType) to C.
ldc[in] Leading dimension (a.k.a. stride) of C, if C is stored in column-major order (not contiguously cache blocked).
contiguousCacheBlocks[in] Whether the cache blocks of Q and C are stored contiguously. If you don't know what this means, put "false" here.

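For the plain column-major case, applying an implicit Q stored as Householder vectors plus tau scalars (the _GEQRF-style layout) amounts to applying the reflectors H_j = I - tau_j v_j v_j^T in one order for Q^T and in the reverse order for Q. The sketch below assumes a real Scalar (so Q^H == Q^T) and no cache blocking; apply_householder and the ApplyKind enum are hypothetical stand-ins for apply() and ApplyType.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ApplyType (real arithmetic only: Q^H == Q^T).
enum class ApplyKind { NoTranspose, Transpose };

// Hypothetical sketch of apply(): Q holds Householder vectors below its
// diagonal (with an implicit unit diagonal), tau holds their scaling factors.
// Q = H_0 H_1 ... H_{k-1}, so Q^T C applies H_0 first, while Q C applies
// H_{k-1} first. C (nrows x ncols_C, leading dimension ldc) is overwritten.
void apply_householder(ApplyKind kind, int nrows, int ncols_Q, const double Q[],
                       int ldq, const std::vector<double>& tau, int ncols_C,
                       double C[], int ldc) {
  assert(ldq >= nrows && ldc >= nrows);
  assert(tau.size() == static_cast<std::size_t>(ncols_Q));
  for (int step = 0; step < ncols_Q; ++step) {
    const int j = (kind == ApplyKind::Transpose) ? step : ncols_Q - 1 - step;
    const double* v = Q + j + j * ldq;  // v[0] is an R entry; v(0) = 1 is implied
    const int m = nrows - j;
    for (int k = 0; k < ncols_C; ++k) {
      double* c = C + j + k * ldc;  // rows j..nrows-1 of column k
      double w = c[0];              // w = v^T c
      for (int i = 1; i < m; ++i) w += v[i] * c[i];
      w *= tau[j];
      c[0] -= w;
      for (int i = 1; i < m; ++i) c[i] -= w * v[i];
    }
  }
}
```

With tau_j = 2 / (v_j^T v_j), each H_j is orthogonal, so applying Q and then Q^T returns the original C; that round trip is a convenient sanity check.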
template<class Ordinal, class Scalar, class FactorOutputType>
virtual void TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::explicit_Q ( const Ordinal  nrows,
const Ordinal  ncols_Q,
const Scalar  Q[],
const Ordinal  ldq,
const factor_output_type &  factorOutput,
const Ordinal  ncols_C,
Scalar  C[],
const Ordinal  ldc,
const bool  contiguousCacheBlocks 
) const [pure virtual]

Compute the explicit Q factor from the result of factor().

This is equivalent to calling apply() on the first ncols_C columns of the identity matrix (suitably cache-blocked, if applicable).

Parameters:
nrows[in] Number of rows in Q and C.
ncols_Q[in] Number of columns in Q.
Q[in] Part of the implicit representation of the Q factor; the A matrix output of factor(). See the factor() documentation for details.
ldq[in] Leading dimension (a.k.a. stride) of Q, if Q is stored in column-major order (not contiguously cache blocked).
factorOutput[in] Return value of factor(), corresponding to Q.
ncols_C[in] Number of columns in the matrix C. This may be different from the number of columns in Q, in which case the first ncols_C columns of the Q factor are computed. There is no restriction on this value, but we optimize performance for the case ncols_C == ncols_Q.
C[out] The first ncols_C columns of the Q factor.
ldc[in] Leading dimension (a.k.a. stride) of C, if C is stored in column-major order (not contiguously cache blocked).
contiguousCacheBlocks[in] Whether the cache blocks of Q and C are stored contiguously. If you don't know what this means, put "false" here.

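The equivalence stated above (explicit_Q() is apply() on the first ncols_C columns of the identity) can be sketched directly. This is a hypothetical illustration for the plain column-major, real-arithmetic case with _GEQRF-style Householder vectors and tau scalars; explicit_q is an invented name, not the TSQR implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of explicit_Q(): fill C (nrows x ncols_C, leading
// dimension ldc) with the first ncols_C columns of the identity, then apply
// Q = H_0 H_1 ... H_{k-1} to it, i.e. apply the reflectors in reverse order.
void explicit_q(int nrows, int ncols_Q, const double Q[], int ldq,
                const std::vector<double>& tau, int ncols_C, double C[],
                int ldc) {
  assert(ldq >= nrows && ldc >= nrows && ncols_C <= nrows);
  assert(tau.size() == static_cast<std::size_t>(ncols_Q));
  for (int k = 0; k < ncols_C; ++k)
    for (int i = 0; i < nrows; ++i) C[i + k * ldc] = (i == k) ? 1.0 : 0.0;
  for (int j = ncols_Q - 1; j >= 0; --j) {
    const double* v = Q + j + j * ldq;  // v(0) = 1 implicitly; v[0] is ignored
    const int m = nrows - j;
    for (int k = 0; k < ncols_C; ++k) {
      double* c = C + j + k * ldc;
      double w = c[0];  // w = v^T c
      for (int i = 1; i < m; ++i) w += v[i] * c[i];
      w *= tau[j];
      c[0] -= w;
      for (int i = 1; i < m; ++i) c[i] -= w * v[i];
    }
  }
}
```

The columns produced this way are orthonormal whenever each reflector is orthogonal, so checking that C^T C is (numerically) the identity is a simple test of an implementation.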
template<class Ordinal, class Scalar, class FactorOutputType>
virtual void TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::cache_block ( const Ordinal  nrows,
const Ordinal  ncols,
Scalar  A_out[],
const Scalar  A_in[],
const Ordinal  lda_in 
) const [pure virtual]

Cache block A_in into A_out.

Parameters:
nrows[in] Number of rows in A_in and A_out.
ncols[in] Number of columns in A_in and A_out.
A_out[out] Result of cache-blocking A_in.
A_in[in] Matrix to cache block, stored in column-major order with leading dimension lda_in.
lda_in[in] Leading dimension of A_in. (See the LAPACK documentation for a definition of "leading dimension.") lda_in >= nrows.

Implemented in TSQR::KokkosNodeTsqr< LocalOrdinal, Scalar, NodeType >, and TSQR::SequentialTsqr< LocalOrdinal, Scalar >.
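The exact cache blocking strategy is implementation specific. As a hypothetical illustration only, the sketch below uses a simple row-blocking scheme: the rows are split into consecutive blocks of at most rows_per_block rows (an invented parameter; real implementations derive block sizes from their cache size hint), and each block is stored contiguously in column-major order with a leading dimension equal to its own row count.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical cache_block(): copy the nrows x ncols column-major matrix A_in
// (leading dimension lda_in) into A_out, one row block after another. Each
// block is stored contiguously, column-major, with leading dimension equal to
// its own row count, so A_out holds exactly nrows * ncols entries.
void cache_block(int nrows, int ncols, double A_out[], const double A_in[],
                 int lda_in, int rows_per_block) {
  assert(lda_in >= nrows && rows_per_block > 0);
  double* out = A_out;
  for (int start = 0; start < nrows; start += rows_per_block) {
    const int m = std::min(rows_per_block, nrows - start);  // rows in this block
    for (int j = 0; j < ncols; ++j)
      for (int i = 0; i < m; ++i)
        out[i + j * m] = A_in[(start + i) + j * lda_in];
    out += m * ncols;  // the next block starts right after this one
  }
}
```

For a 5 x 2 matrix with rows_per_block = 2, this layout stores rows 0-1, then rows 2-3, then row 4, each block column by column.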

template<class Ordinal, class Scalar, class FactorOutputType>
virtual void TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::un_cache_block ( const Ordinal  nrows,
const Ordinal  ncols,
Scalar  A_out[],
const Ordinal  lda_out,
const Scalar  A_in[] 
) const [pure virtual]

Un-cache-block A_in into A_out.

A_in is a matrix produced by cache_block(). It is organized as contiguously stored cache blocks. This method reorganizes A_in into A_out as an ordinary matrix stored in column-major order with leading dimension lda_out.

Parameters:
nrows[in] Number of rows in A_in and A_out.
ncols[in] Number of columns in A_in and A_out.
A_out[out] Result of un-cache-blocking A_in. Matrix stored in column-major order with leading dimension lda_out.
lda_out[in] Leading dimension of A_out. (See the LAPACK documentation for a definition of "leading dimension.") lda_out >= nrows.
A_in[in] Matrix to un-cache-block.

Implemented in TSQR::KokkosNodeTsqr< LocalOrdinal, Scalar, NodeType >, and TSQR::SequentialTsqr< LocalOrdinal, Scalar >.
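Under the same hypothetical row-blocking scheme (consecutive blocks of at most rows_per_block rows, each packed column-major with its own row count as leading dimension), un-cache-blocking is the inverse copy. The function name and the rows_per_block parameter are assumptions made for illustration, not part of the NodeTsqr interface.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical un_cache_block(): unpack the contiguously stored row blocks of
// A_in into the ordinary column-major matrix A_out (leading dimension lda_out).
// Rows nrows..lda_out-1 of A_out (padding) are left untouched.
void un_cache_block(int nrows, int ncols, double A_out[], int lda_out,
                    const double A_in[], int rows_per_block) {
  assert(lda_out >= nrows && rows_per_block > 0);
  const double* in = A_in;
  for (int start = 0; start < nrows; start += rows_per_block) {
    const int m = std::min(rows_per_block, nrows - start);  // rows in this block
    for (int j = 0; j < ncols; ++j)
      for (int i = 0; i < m; ++i)
        A_out[(start + i) + j * lda_out] = in[i + j * m];
    in += m * ncols;  // advance to the next packed block
  }
}
```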

template<class Ordinal, class Scalar, class FactorOutputType>
virtual void TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::Q_times_B ( const Ordinal  nrows,
const Ordinal  ncols,
Scalar  Q[],
const Ordinal  ldq,
const Scalar  B[],
const Ordinal  ldb,
const bool  contiguousCacheBlocks 
) const [pure virtual]

Compute Q*B.

Compute the matrix-matrix product Q*B, where Q is nrows by ncols and B is ncols by ncols, respecting the cache blocking of Q.

Implemented in TSQR::KokkosNodeTsqr< LocalOrdinal, Scalar, NodeType >, and TSQR::SequentialTsqr< LocalOrdinal, Scalar >.
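A minimal sketch of the product for a plain column-major Q follows. It assumes (this page does not say so explicitly) that the result overwrites Q, since Q is the non-const argument; q_times_b is a hypothetical name. Row i of Q*B depends only on row i of Q, so the update can proceed row by row with an ncols-length scratch buffer; the same row independence is what lets an implementation work one cache block at a time.

```cpp
#include <cassert>
#include <vector>

// Hypothetical Q_times_B for a column-major Q (not cache blocked), assuming
// the product Q*B overwrites Q. B is ncols x ncols with leading dimension ldb.
void q_times_b(int nrows, int ncols, double Q[], int ldq, const double B[],
               int ldb) {
  assert(ldq >= nrows && ldb >= ncols);
  std::vector<double> row(ncols);  // scratch for one row of Q*B
  for (int i = 0; i < nrows; ++i) {
    for (int j = 0; j < ncols; ++j) {
      double sum = 0.0;
      for (int k = 0; k < ncols; ++k) sum += Q[i + k * ldq] * B[k + j * ldb];
      row[j] = sum;
    }
    for (int j = 0; j < ncols; ++j) Q[i + j * ldq] = row[j];
  }
}
```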

template<class Ordinal, class Scalar, class FactorOutputType>
virtual void TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::fill_with_zeros ( const Ordinal  nrows,
const Ordinal  ncols,
Scalar  A[],
const Ordinal  lda,
const bool  contiguousCacheBlocks 
) const [pure virtual]

Fill the nrows by ncols matrix A with zeros.

Fill the matrix A with zeros, in a way that respects the cache blocking scheme.

Parameters:
nrows[in] Number of rows in A
ncols[in] Number of columns in A
A[out] nrows by ncols column-major-order dense matrix with leading dimension lda
lda[in] Leading dimension of A: lda >= nrows
contiguousCacheBlocks[in] Whether the cache blocks in A are stored contiguously.

Implemented in TSQR::KokkosNodeTsqr< LocalOrdinal, Scalar, NodeType >, and TSQR::SequentialTsqr< LocalOrdinal, Scalar >.
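"Respecting the cache blocking scheme" matters mainly in two ways, which the hypothetical sketch below illustrates: for an ordinary column-major matrix, only the nrows x ncols entries are zeroed and any padding rows between nrows and lda are left alone; for contiguous cache blocks, it assumes a packed layout (each block's leading dimension equals its row count), so the data occupies exactly nrows * ncols consecutive entries and lda is not used.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical fill_with_zeros(). The contiguous case assumes packed cache
// blocks with no gaps between or inside them.
void fill_with_zeros(int nrows, int ncols, double A[], int lda,
                     bool contiguousCacheBlocks) {
  if (contiguousCacheBlocks) {
    std::fill(A, A + nrows * ncols, 0.0);
    return;
  }
  assert(lda >= nrows);
  for (int j = 0; j < ncols; ++j)
    std::fill(A + j * lda, A + j * lda + nrows, 0.0);  // padding rows untouched
}
```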

template<class Ordinal, class Scalar, class FactorOutputType>
virtual ConstMatView<Ordinal, Scalar> TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::const_top_block ( const ConstMatView< Ordinal, Scalar > &  C,
const bool  contiguousCacheBlocks 
) const [protected, pure virtual]

Return view of topmost cache block of C.

Parameters:
C[in] Matrix (view), supporting the usual nrows(), ncols(), get(), lda() interface.
contiguousCacheBlocks[in] Whether the cache blocks in C are stored contiguously.

Return a view of the topmost cache block (on this node) of the given matrix C. This is not necessarily square, though it must have at least as many rows as columns. For a square ncols by ncols block, as needed by Tsqr::apply(), do as follows:

 ConstMatView< Ordinal, Scalar > top = this->top_block (C, contiguousCacheBlocks);
 ConstMatView< Ordinal, Scalar > square (C.ncols(), C.ncols(), top.get(), top.lda());

Implemented in TSQR::KokkosNodeTsqr< LocalOrdinal, Scalar, NodeType >, and TSQR::SequentialTsqr< LocalOrdinal, Scalar >.

template<class Ordinal, class Scalar, class FactorOutputType>
template<class MatrixViewType >
MatrixViewType TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::top_block ( const MatrixViewType &  C,
const bool  contiguous_cache_blocks 
) const [inline]

Return view of topmost cache block of C.

Parameters:
C[in] View of a matrix C.
contiguous_cache_blocks[in] Whether the cache blocks in C are stored contiguously.

Return a view of the topmost cache block (on this node) of the given matrix C. This is not necessarily square, though it must have at least as many rows as columns. For a view of the first C.ncols() rows of that block, which methods like Tsqr::apply() need, do the following:

 MatrixViewType top = this->top_block (C, contiguous_cache_blocks);
 MatrixViewType square (C.ncols(), C.ncols(), top.get(), top.lda());

Models for MatrixViewType are MatView and ConstMatView. MatrixViewType must have member functions nrows(), ncols(), get(), and lda(), and its constructor must take the same four arguments as the constructor of ConstMatView.

Definition at line 356 of file Tsqr_NodeTsqr.hpp.
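The requirements on MatrixViewType and the behavior of top_block() can be illustrated with a hypothetical sketch. SimpleView is an invented minimal view type with the four required members and the four-argument constructor; the top_block here assumes the row-blocking scheme in which every block has rows_per_block rows (the last may have fewer) and, when stored contiguously, a leading dimension equal to its own row count. Real NodeTsqr subclasses choose their own blocking.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical minimal matrix view modeling the required interface.
// Scalar may be const-qualified to model a read-only view.
template <class Scalar>
class SimpleView {
 public:
  SimpleView(int nrows, int ncols, Scalar* A, int lda)
      : nrows_(nrows), ncols_(ncols), A_(A), lda_(lda) {}
  int nrows() const { return nrows_; }
  int ncols() const { return ncols_; }
  Scalar* get() const { return A_; }
  int lda() const { return lda_; }

 private:
  int nrows_, ncols_;
  Scalar* A_;
  int lda_;
};

// Hypothetical top_block() under the row-blocking scheme described above.
template <class MatrixViewType>
MatrixViewType top_block(const MatrixViewType& C, bool contiguous_cache_blocks,
                         int rows_per_block) {
  const int m = std::min(rows_per_block, C.nrows());
  assert(m >= C.ncols());  // the top block needs at least as many rows as columns
  // Packed blocks use their own height as stride; otherwise keep C's stride.
  const int ld = contiguous_cache_blocks ? m : C.lda();
  return MatrixViewType(m, C.ncols(), C.get(), ld);
}
```

Because the view type is a template parameter, the same function returns a mutable view for SimpleView<double> and a read-only view for SimpleView<const double>, mirroring the MatView / ConstMatView pair.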

template<class Ordinal, class Scalar, class FactorOutputType>
virtual bool TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::QR_produces_R_factor_with_nonnegative_diagonal ( ) const [pure virtual]

Does factor() compute R with nonnegative diagonal?

When using a QR factorization to orthogonalize a block of vectors, computing an R factor with nonnegative diagonal ensures that, in exact arithmetic, the results of the orthogonalization (the orthogonalized vectors Q and their coefficients R) are the same as those produced by Gram-Schmidt orthogonalization.

This distinction is important because LAPACK's QR factorization (_GEQRF) may (and does, in practice) compute an R factor with negative diagonal entries.

Implemented in TSQR::KokkosNodeTsqr< LocalOrdinal, Scalar, NodeType >, and TSQR::SequentialTsqr< LocalOrdinal, Scalar >.
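One common way to obtain a nonnegative diagonal, shown here only as a hypothetical post-processing sketch and not necessarily what the TSQR implementations do, is to flip signs. With D = diag(sign(R(k,k))), D*D = I, so (Q*D)*(D*R) = Q*R: negating column k of an explicit Q together with row k of R leaves the product unchanged. The function name is invented.

```cpp
#include <cassert>

// Hypothetical helper: given an explicit Q (nrows x ncols, column-major) and
// an upper triangular R (ncols x ncols), make R's diagonal nonnegative by
// negating row k of R and column k of Q whenever R(k,k) < 0.
void make_R_diagonal_nonnegative(int nrows, int ncols, double Q[], int ldq,
                                 double R[], int ldr) {
  assert(ldq >= nrows && ldr >= ncols);
  for (int k = 0; k < ncols; ++k) {
    if (R[k + k * ldr] >= 0.0) continue;
    // Row k of an upper triangular R is nonzero only in columns k..ncols-1.
    for (int j = k; j < ncols; ++j) R[k + j * ldr] = -R[k + j * ldr];
    for (int i = 0; i < nrows; ++i) Q[i + k * ldq] = -Q[i + k * ldq];
  }
}
```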

template<class Ordinal, class Scalar, class FactorOutputType >
Ordinal TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::reveal_R_rank ( const Ordinal  ncols,
Scalar  R[],
const Ordinal  ldr,
Scalar  U[],
const Ordinal  ldu,
const typename Teuchos::ScalarTraits< Scalar >::magnitudeType  tol 
) const

Reveal rank of TSQR's R factor.

Compute the singular value decomposition (SVD) $R = U \Sigma V^*$. The SVD is computed out of place, so computing it does not modify R. Use the resulting singular values to compute the numerical rank of R with respect to the relative tolerance tol. If R has full rank, return without modifying R. Otherwise, overwrite R with $\Sigma \cdot V^*$.

Parameters:
ncols[in] Number of (rows and) columns in R.
R[in/out] ncols x ncols upper triangular matrix, stored in column-major order with leading dimension ldr.
ldr[in] Leading dimension of the matrix R.
U[out] Left singular vectors of the matrix R; an ncols x ncols matrix with leading dimension ldu.
ldu[in] Leading dimension of the matrix U.
tol[in] Numerical rank tolerance, relative to the largest nonzero singular value of R.
Returns:
Numerical rank of R: 0 <= rank <= ncols.

Definition at line 450 of file Tsqr_NodeTsqr.hpp.
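The rank decision itself depends only on the singular values. The sketch below is a hypothetical helper that assumes the singular values have already been computed (for example by an SVD routine) and counts those strictly greater than tol times the largest one; whether the real implementation uses a strict or non-strict comparison is an assumption here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: numerical rank from the (nonnegative) singular values
// of R with a tolerance relative to the largest singular value.
int numerical_rank(const std::vector<double>& sigma, double tol) {
  assert(tol >= 0.0);
  if (sigma.empty()) return 0;
  const double sigma_max = *std::max_element(sigma.begin(), sigma.end());
  if (sigma_max == 0.0) return 0;  // R == 0 has rank 0
  int rank = 0;
  for (double s : sigma)
    if (s > tol * sigma_max) ++rank;  // strict comparison is an assumption
  return rank;
}
```

For singular values {10, 1, 1e-12} and tol = 1e-8, the threshold is 1e-7, so the numerical rank is 2.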

template<class Ordinal, class Scalar, class FactorOutputType >
Ordinal TSQR::NodeTsqr< Ordinal, Scalar, FactorOutputType >::reveal_rank ( const Ordinal  nrows,
const Ordinal  ncols,
Scalar  Q[],
const Ordinal  ldq,
Scalar  R[],
const Ordinal  ldr,
const typename Teuchos::ScalarTraits< Scalar >::magnitudeType  tol,
const bool  contiguousCacheBlocks 
) const

Compute rank-revealing decomposition.

Using the R factor from factor() and the explicit Q factor from explicit_Q(), compute the SVD of R ($R = U \Sigma V^*$). If R has full rank (with respect to the relative tolerance tol), leave Q and R unchanged. Otherwise, compute $Q := Q \cdot U$ and $R := \Sigma V^*$ in place (the latter may no longer be upper triangular).

Returns:
Rank $r$ of R: $0 \leq r \leq \mathrm{ncols}$.

Definition at line 618 of file Tsqr_NodeTsqr.hpp.
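A short derivation shows why this update changes neither the product $QR$ nor the orthonormality of Q's columns. It assumes Q has orthonormal columns, $\sigma_1 \ge \dots \ge \sigma_{\mathrm{ncols}}$ are the singular values of R with right singular vectors $v_k$, and the rank $r$ counts the singular values above $\mathrm{tol}\cdot\sigma_1$; the last line says that the trailing rows of the new R are small.

```latex
\begin{align*}
Q R &= Q \,(U \Sigma V^*) = (Q U)\,(\Sigma V^*), \\
(Q U)^* (Q U) &= U^* (Q^* Q)\, U = U^* U = I, \\
\bigl\| (\Sigma V^*)_{k,:} \bigr\|_2 &= \sigma_k \, \| v_k \|_2 = \sigma_k
  \le \mathrm{tol}\cdot\sigma_1 \quad \text{for } k > r .
\end{align*}
```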


The documentation for this class was generated from the following file: Tsqr_NodeTsqr.hpp