Kokkos Node API and Local Linear Algebra Kernels Version of the Day
Public Types | Public Member Functions
TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType > Class Template Reference

Parallel Tall Skinny QR (TSQR) factorization. More...

#include <Tsqr.hpp>

List of all members.

Public Types

typedef dist_tsqr_type::rank_type rank_type
 "Rank" here means MPI rank, not linear algebra rank.
typedef std::pair< NodeOutput,
DistOutput > 
FactorOutput
 Return value of factor().

Public Member Functions

 Tsqr (const node_tsqr_ptr &nodeTsqr, const dist_tsqr_ptr &distTsqr)
 Constructor.
size_t cache_size_hint () const
 Cache size hint in bytes used by the intranode part of TSQR.
size_t TEUCHOS_DEPRECATED cache_block_size () const
 Cache size hint in bytes used by the intranode part of TSQR.
bool QR_produces_R_factor_with_nonnegative_diagonal () const
 Does the R factor have a nonnegative diagonal?
template<class NodeType >
void factorExplicit (Kokkos::MultiVector< Scalar, NodeType > &A, Kokkos::MultiVector< Scalar, NodeType > &Q, Teuchos::SerialDenseMatrix< LocalOrdinal, Scalar > &R, const bool contiguousCacheBlocks)
 Compute QR factorization with explicit Q factor.
FactorOutput factor (const LocalOrdinal nrows_local, const LocalOrdinal ncols, Scalar A_local[], const LocalOrdinal lda_local, Scalar R[], const LocalOrdinal ldr, const bool contiguousCacheBlocks=false)
 Compute QR factorization of the global dense matrix A.
void apply (const std::string &op, const LocalOrdinal nrows_local, const LocalOrdinal ncols_Q, const Scalar Q_local[], const LocalOrdinal ldq_local, const FactorOutput &factor_output, const LocalOrdinal ncols_C, Scalar C_local[], const LocalOrdinal ldc_local, const bool contiguousCacheBlocks=false)
 Apply Q factor to the global dense matrix C.
void explicit_Q (const LocalOrdinal nrows_local, const LocalOrdinal ncols_Q_in, const Scalar Q_local_in[], const LocalOrdinal ldq_local_in, const FactorOutput &factorOutput, const LocalOrdinal ncols_Q_out, Scalar Q_local_out[], const LocalOrdinal ldq_local_out, const bool contiguousCacheBlocks=false)
 Compute the explicit Q factor from factor()
void Q_times_B (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, const Scalar B[], const LocalOrdinal ldb, const bool contiguousCacheBlocks=false) const
 Compute Q*B.
LocalOrdinal reveal_R_rank (const LocalOrdinal ncols, Scalar R[], const LocalOrdinal ldr, Scalar U[], const LocalOrdinal ldu, const magnitude_type &tol) const
 Reveal the rank of the R factor, using the SVD.
template<class NodeType >
LocalOrdinal revealRank (Kokkos::MultiVector< Scalar, NodeType > &Q, Teuchos::SerialDenseMatrix< LocalOrdinal, Scalar > &R, const magnitude_type &tol, const bool contiguousCacheBlocks=false) const
 Rank-revealing decomposition.
void cache_block (const LocalOrdinal nrows_local, const LocalOrdinal ncols, Scalar A_local_out[], const Scalar A_local_in[], const LocalOrdinal lda_local_in) const
 Cache-block A_in into A_out.
void un_cache_block (const LocalOrdinal nrows_local, const LocalOrdinal ncols, Scalar A_local_out[], const LocalOrdinal lda_local_out, const Scalar A_local_in[]) const
 Un-cache-block A_in into A_out.

Detailed Description

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
class TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >

Parallel Tall Skinny QR (TSQR) factorization.

This class computes the parallel Tall Skinny QR (TSQR) factorization of a matrix distributed in block rows across one or more MPI processes. The parallel critical path length for TSQR is independent of the number of columns in the matrix, unlike ScaLAPACK's comparable QR factorization (P_GEQR2), Modified Gram-Schmidt, or Classical Gram-Schmidt.

Template Parameters:
LocalOrdinalIndex type that can address all elements of a matrix, when treated as a 1-D array. That is, for A[i + LDA*j], the index i + LDA*j must fit in a LocalOrdinal.
ScalarThe type of the matrix entries.
NodeTsqrTypeThe intranode (single-node) part of TSQR. Defaults to SequentialTsqr, which provides a sequential cache-blocked implementation. Any class implementing the same compile-time interface is valid. We provide NodeTsqr as an archetype of the "NodeTsqrType" concept, but it is not necessary that NodeTsqrType derive from that abstract base class. Inheriting from NodeTsqr is useful, though, because it provides default implementations of some routines that are not performance-critical.
Note:
TSQR only needs to know about the local ordinal type (used to index matrix entries on a single node), not about the global ordinal type (used to index matrix entries globally, i.e., over all nodes). For some distributed linear algebra libraries, such as Epetra, the local and global ordinal types are the same (int, in the case of Epetra). For other distributed linear algebra libraries, such as Tpetra, the local and global ordinal types may be different.

Definition at line 88 of file Tsqr.hpp.


Member Typedef Documentation

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::rank_type

"Rank" here means MPI rank, not linear algebra rank.

Definition at line 105 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::FactorOutput

Return value of factor().

Part of the implicit representation of the Q factor returned by factor(). The other part of that representation is stored in the A matrix on output.

Definition at line 116 of file Tsqr.hpp.


Constructor & Destructor Documentation

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::Tsqr ( const node_tsqr_ptr nodeTsqr,
const dist_tsqr_ptr distTsqr 
) [inline]

Constructor.

Parameters:
nodeTsqr[in/out] Previously initialized NodeTsqrType object. This takes care of the intranode part of TSQR.
distTsqr[in/out] Previously initialized DistTsqrType object. This takes care of the internode part of TSQR.

Definition at line 125 of file Tsqr.hpp.


Member Function Documentation

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
size_t TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::cache_size_hint ( ) const [inline]

Cache size hint in bytes used by the intranode part of TSQR.

This value may differ from the cache size hint given to the constructor of the NodeTsqrType object, since that constructor input is merely a suggestion.

Definition at line 136 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
size_t TEUCHOS_DEPRECATED TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::cache_block_size ( ) const [inline]

Cache size hint in bytes used by the intranode part of TSQR.

This method is deprecated; please call cache_size_hint() instead.

Definition at line 142 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
bool TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::QR_produces_R_factor_with_nonnegative_diagonal ( ) const [inline]

Does the R factor have a nonnegative diagonal?

Tsqr implements a QR factorization (of a distributed matrix). Some, but not all, QR factorizations produce an R factor whose diagonal may include negative entries. This Boolean tells you whether Tsqr promises to compute an R factor whose diagonal entries are all nonnegative.

Definition at line 154 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
template<class NodeType >
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::factorExplicit ( Kokkos::MultiVector< Scalar, NodeType > &  A,
Kokkos::MultiVector< Scalar, NodeType > &  Q,
Teuchos::SerialDenseMatrix< LocalOrdinal, Scalar > &  R,
const bool  contiguousCacheBlocks 
) [inline]

Compute QR factorization with explicit Q factor.

This method computes the "thin" QR factorization, like Matlab's [Q,R] = qr(A,0), of a matrix A with more rows than columns. Like Matlab, it computes the Q factor "explicitly," that is, as a matrix represented in the same format as in the input matrix A (rather than the implicit representation returned by factor()).

Calling this method may be faster than calling factor() and explicit_Q() in sequence, if you know that you only want the explicit version of the Q factor. This method is especially intended for orthogonalizing the columns of a Tpetra::MultiVector. It can also be used for an Epetra_MultiVector, if you put each node's data in a Kokkos::MultiVector first. (This does not require copying.)

Parameters:
A[in/out] On input: my node's part of the matrix to factor; the matrix is distributed over the participating processors. If contiguousCacheBlocks is false, my node's part of the matrix is stored in column-major order. Otherwise, it is stored in the contiguously cache-blocked format that would be computed by cache_block(). On output: overwritten with garbage (part of the implicit Q factor, which is not useful in this case).
Q[out] On output: my node's part of the explicit Q factor, stored in the same format as the input matrix A.
R[out] On output: the R factor, which is square with the same number of rows and columns as the number of columns in A. The R factor is replicated on all nodes.
contiguousCacheBlocks[in] Whether cache blocks of A (on input) and Q (on output) are stored contiguously. If you don't know what this means, set it to false.

Definition at line 201 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
FactorOutput TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::factor ( const LocalOrdinal  nrows_local,
const LocalOrdinal  ncols,
Scalar  A_local[],
const LocalOrdinal  lda_local,
Scalar  R[],
const LocalOrdinal  ldr,
const bool  contiguousCacheBlocks = false 
) [inline]

Compute QR factorization of the global dense matrix A.

Compute the QR factorization of the tall and skinny dense matrix A. The matrix A is distributed in a row block layout over all the MPI processes. A_local contains the matrix data for this process.

Parameters:
nrows_local[in] Number of rows of this node's local component (A_local) of the matrix. May differ on different nodes. Precondition: nrows_local >= ncols.
ncols[in] Number of columns in the matrix to factor. Should be the same on all nodes. Precondition: nrows_local >= ncols.
A_local[in,out] On input, this node's local component of the matrix, stored as a general dense matrix in column-major order. On output, overwritten with an implicit representation of the Q factor.
lda_local[in] Leading dimension of A_local. Precondition: lda_local >= nrows_local.
R[out] The final R factor of the QR factorization of the global matrix A. An ncols by ncols upper triangular matrix with leading dimension ldr.
ldr[in] Leading dimension of the matrix R.
contiguousCacheBlocks[in] Whether or not cache blocks of A_local are stored contiguously. The default value of false means that A_local uses ordinary column-major (Fortran-style) order. Otherwise, the details of the format depend on the specific NodeTsqrType. Tsqr's cache_block() and un_cache_block() methods may be used to convert between cache-blocked and non-cache-blocked (column-major order) formats.
Returns:
Part of the representation of the implicitly stored Q factor. It should be passed into apply() or explicit_Q() as the "factorOutput" parameter. The other part of the implicitly stored Q factor is stored in A_local (the input is overwritten). Both parts go together.

Definition at line 354 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::apply ( const std::string &  op,
const LocalOrdinal  nrows_local,
const LocalOrdinal  ncols_Q,
const Scalar  Q_local[],
const LocalOrdinal  ldq_local,
const FactorOutput factor_output,
const LocalOrdinal  ncols_C,
Scalar  C_local[],
const LocalOrdinal  ldc_local,
const bool  contiguousCacheBlocks = false 
) [inline]

Apply Q factor to the global dense matrix C.

Apply the Q factor (computed by factor() and represented implicitly) to the global dense matrix C, consisting of all nodes' C_local matrices stacked on top of each other.

Parameters:
[in]If"N", compute Q*C. If "T", compute Q^T * C. If "H" or "C", compute Q^H * C. (The last option may not be implemented in all cases.)
nrows_local[in] Number of rows of this node's local component (C_local) of the matrix C. Should be the same on this node as the nrows_local argument with which factor() was called Precondition: nrows_local >= ncols.
ncols_Q[in] Number of columns in Q. Should be the same on all nodes. Precondition: nrows_local >= ncols_Q.
Q_local[in] Same as A_local output of factor()
ldq_local[in] Same as lda_local of factor()
factor_output[in] Return value of factor()
ncols_C[in] Number of columns in C. Should be the same on all nodes. Precondition: nrows_local >= ncols_C.
C_local[in,out] On input, this node's local component of the matrix C, stored as a general dense matrix in column-major order. On output, overwritten with this node's component of op(Q)*C, where op(Q) = Q, Q^T, or Q^H.
ldc_local[in] Leading dimension of C_local. Precondition: ldc_local >= nrows_local.
contiguousCacheBlocks[in] Whether or not the cache blocks of Q and C are stored contiguously.

Definition at line 411 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::explicit_Q ( const LocalOrdinal  nrows_local,
const LocalOrdinal  ncols_Q_in,
const Scalar  Q_local_in[],
const LocalOrdinal  ldq_local_in,
const FactorOutput factorOutput,
const LocalOrdinal  ncols_Q_out,
Scalar  Q_local_out[],
const LocalOrdinal  ldq_local_out,
const bool  contiguousCacheBlocks = false 
) [inline]

Compute the explicit Q factor from factor()

Compute the explicit version of the Q factor computed by factor() and represented implicitly (via Q_local_in and factor_output).

Parameters:
nrows_local[in] Number of rows of this node's local component (Q_local_in) of the matrix Q_local_in. Also, the number of rows of this node's local component (Q_local_out) of the output matrix. Should be the same on this node as the nrows_local argument with which factor() was called. Precondition: nrows_local >= ncols_Q_in.
ncols_Q_in[in] Number of columns in the original matrix A, whose explicit Q factor we are computing. Should be the same on all nodes. Precondition: nrows_local >= ncols_Q_in.
Q_local_in[in] Same as A_local output of factor().
ldq_local_in[in] Same as lda_local of factor()
factorOutput[in] Return value of factor().
ncols_Q_out[in] Number of columns of the explicit Q factor to compute. Should be the same on all nodes.
Q_local_out[out] This node's component of the Q factor (in explicit form).
ldq_local_out[in] Leading dimension of Q_local_out.
contiguousCacheBlocks[in] Whether or not cache blocks in Q_local_in and Q_local_out are stored contiguously.

Definition at line 524 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::Q_times_B ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  Q[],
const LocalOrdinal  ldq,
const Scalar  B[],
const LocalOrdinal  ldb,
const bool  contiguousCacheBlocks = false 
) const [inline]

Compute Q*B.

Compute matrix-matrix product Q*B, where Q is nrows by ncols and B is ncols by ncols. Respect cache blocks of Q.

Definition at line 565 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
LocalOrdinal TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::reveal_R_rank ( const LocalOrdinal  ncols,
Scalar  R[],
const LocalOrdinal  ldr,
Scalar  U[],
const LocalOrdinal  ldu,
const magnitude_type &  tol 
) const [inline]

Reveal the rank of the R factor, using the SVD.

Compute the singular value decomposition (SVD) of the R factor: $R = U \Sigma V^*$, not in place. Use the resulting singular values to compute the numerical rank of R, with respect to the relative tolerance tol. If R is full rank, return without modifying R. If R is not full rank, overwrite R with $\Sigma \cdot V^*$.

Returns:
Numerical rank of R: 0 <= rank <= ncols.

Definition at line 592 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
template<class NodeType >
LocalOrdinal TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::revealRank ( Kokkos::MultiVector< Scalar, NodeType > &  Q,
Teuchos::SerialDenseMatrix< LocalOrdinal, Scalar > &  R,
const magnitude_type &  tol,
const bool  contiguousCacheBlocks = false 
) const [inline]

Rank-revealing decomposition.

Using the R factor and explicit Q factor from factorExplicit(), compute the singular value decomposition (SVD) of R ( $R = U \Sigma V^*$). If R is full rank (with respect to the given relative tolerance tol), don't change Q or R. Otherwise, compute $Q := Q \cdot U$ and $R := \Sigma V^*$ in place (the latter may be no longer upper triangular).

Parameters:
Q[in/out] On input: explicit Q factor computed by factorExplicit(). (Must be an orthogonal resp. unitary matrix.) On output: If R is of full numerical rank with respect to the tolerance tol, Q is unmodified. Otherwise, Q is updated so that the first rank columns of Q are a basis for the column space of A (the original matrix whose QR factorization was computed by factorExplicit()). The remaining columns of Q are a basis for the null space of A.
R[in/out] On input: ncols by ncols upper triangular matrix with leading dimension ldr >= ncols. On output: if input is full rank, R is unchanged on output. Otherwise, if $R = U \Sigma V^*$ is the SVD of R, on output R is overwritten with $ V^*$. This is also an ncols by ncols matrix, but may not necessarily be upper triangular.
tol[in] Relative tolerance for computing the numerical rank of the matrix R.
contiguousCacheBlocks[in] Whether or not the cache blocks of Q are stored contiguously. Defaults to false, which means that Q uses the ordinary column-major layout on each MPI process.
Returns:
Rank $r$ of R: $ 0 \leq r \leq ncols$.

Definition at line 649 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::cache_block ( const LocalOrdinal  nrows_local,
const LocalOrdinal  ncols,
Scalar  A_local_out[],
const Scalar  A_local_in[],
const LocalOrdinal  lda_local_in 
) const [inline]

Cache-block A_in into A_out.

Cache-block the given A_in matrix, writing the results to A_out.

Definition at line 699 of file Tsqr.hpp.

template<class LocalOrdinal , class Scalar , class NodeTsqrType = SequentialTsqr<LocalOrdinal, Scalar>>
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType >::un_cache_block ( const LocalOrdinal  nrows_local,
const LocalOrdinal  ncols,
Scalar  A_local_out[],
const LocalOrdinal  lda_local_out,
const Scalar  A_local_in[] 
) const [inline]

Un-cache-block A_in into A_out.

"Un"-cache-block the given A_in matrix, writing the results to A_out.

Definition at line 715 of file Tsqr.hpp.


The documentation for this class was generated from the following file:
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends