Anasazi Version of the Day
TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType > Class Template Reference

Parallel Tall Skinny QR (TSQR) factorization.

#include <Tsqr.hpp>

Public Member Functions

 Tsqr (const node_tsqr_ptr &nodeTsqr, const dist_tsqr_ptr &distTsqr)
size_t cache_block_size () const
 Cache block size in bytes.
bool QR_produces_R_factor_with_nonnegative_diagonal () const
FactorOutput factor (const LocalOrdinal nrows_local, const LocalOrdinal ncols, Scalar A_local[], const LocalOrdinal lda_local, Scalar R[], const LocalOrdinal ldr, const bool contiguousCacheBlocks=false)
 Compute QR factorization of the global dense matrix A.
void apply (const std::string &op, const LocalOrdinal nrows_local, const LocalOrdinal ncols_Q, const Scalar Q_local[], const LocalOrdinal ldq_local, const FactorOutput &factor_output, const LocalOrdinal ncols_C, Scalar C_local[], const LocalOrdinal ldc_local, const bool contiguousCacheBlocks=false)
 Apply Q factor to the global dense matrix C.
void explicit_Q (const LocalOrdinal nrows_local, const LocalOrdinal ncols_Q_in, const Scalar Q_local_in[], const LocalOrdinal ldq_local_in, const FactorOutput &factorOutput, const LocalOrdinal ncols_Q_out, Scalar Q_local_out[], const LocalOrdinal ldq_local_out, const bool contiguousCacheBlocks=false)
 Compute the explicit Q factor from factor().
void Q_times_B (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, const Scalar B[], const LocalOrdinal ldb, const bool contiguousCacheBlocks=false) const
 Compute Q*B.
LocalOrdinal reveal_R_rank (const LocalOrdinal ncols, Scalar R[], const LocalOrdinal ldr, Scalar U[], const LocalOrdinal ldu, const magnitude_type tol) const
LocalOrdinal reveal_rank (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, Scalar R[], const LocalOrdinal ldr, const magnitude_type tol, const bool contiguousCacheBlocks=false) const
 Rank-revealing decomposition.
void cache_block (const LocalOrdinal nrows_local, const LocalOrdinal ncols, Scalar A_local_out[], const Scalar A_local_in[], const LocalOrdinal lda_local_in) const
 Cache-block A_in into A_out.
void un_cache_block (const LocalOrdinal nrows_local, const LocalOrdinal ncols, Scalar A_local_out[], const LocalOrdinal lda_local_out, const Scalar A_local_in[]) const
 Un-cache-block A_in into A_out.

Detailed Description

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
class TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >

Parallel Tall Skinny QR (TSQR) factorization.

Parallel Tall Skinny QR (TSQR) factorization of a matrix distributed in block rows across one or more MPI processes. The parallel critical path length for TSQR is independent of the number of columns in the matrix, unlike ScaLAPACK's comparable QR factorization (P_GEQR2), Modified Gram-Schmidt, or Classical Gram-Schmidt.
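
To illustrate the idea behind the critical-path claim, here is a minimal sequential sketch of TSQR with two row blocks. It is not the library's implementation: the names qr_R_factor, row_block and tsqr_R_factor are hypothetical, the matrices use tight column-major storage, and the Q factor is discarded. Each block is factored independently, and only the small ncols by ncols R factors are combined in the reduction step.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // column-major, leading dimension == row count

// Householder QR of an m-by-n matrix A (precondition: m >= n).
// Returns only the n-by-n upper triangular R factor.
Matrix qr_R_factor(Matrix A, std::size_t m, std::size_t n) {
  for (std::size_t k = 0; k < n; ++k) {
    double norm = 0.0;
    for (std::size_t i = k; i < m; ++i) norm += A[i + m * k] * A[i + m * k];
    norm = std::sqrt(norm);
    if (norm == 0.0) continue;  // column k is zero from row k down
    // Reflector v = x - alpha * e1; the sign of alpha avoids cancellation.
    const double alpha = (A[k + m * k] > 0.0) ? -norm : norm;
    std::vector<double> v(m - k);
    for (std::size_t i = k; i < m; ++i) v[i - k] = A[i + m * k];
    v[0] -= alpha;
    double vtv = 0.0;
    for (double x : v) vtv += x * x;
    // Apply H = I - 2 v v^T / (v^T v) to columns k..n-1.
    for (std::size_t j = k; j < n; ++j) {
      double dot = 0.0;
      for (std::size_t i = k; i < m; ++i) dot += v[i - k] * A[i + m * j];
      const double scale = 2.0 * dot / vtv;
      for (std::size_t i = k; i < m; ++i) A[i + m * j] -= scale * v[i - k];
    }
  }
  Matrix R(n * n, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i <= j; ++i) R[i + n * j] = A[i + m * j];
  return R;
}

// Copy rows [r0, r1) of the m-by-n matrix A into a new (r1 - r0)-by-n matrix.
Matrix row_block(const Matrix& A, std::size_t m, std::size_t n,
                 std::size_t r0, std::size_t r1) {
  const std::size_t mb = r1 - r0;
  Matrix B(mb * n);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < mb; ++i) B[i + mb * j] = A[(r0 + i) + m * j];
  return B;
}

// Two-block TSQR (precondition: m / 2 >= n). Each block stands in for the
// local matrix of one process; the final QR works on a 2n-by-n matrix only.
Matrix tsqr_R_factor(const Matrix& A, std::size_t m, std::size_t n) {
  const std::size_t half = m / 2;
  const Matrix R0 = qr_R_factor(row_block(A, m, n, 0, half), half, n);
  const Matrix R1 = qr_R_factor(row_block(A, m, n, half, m), m - half, n);
  Matrix stacked(2 * n * n, 0.0);
  for (std::size_t j = 0; j < n; ++j) {
    for (std::size_t i = 0; i < n; ++i) {
      stacked[i + 2 * n * j] = R0[i + n * j];
      stacked[n + i + 2 * n * j] = R1[i + n * j];
    }
  }
  return qr_R_factor(stacked, 2 * n, n);
}
```

For a full-rank input, the R factor from tsqr_R_factor agrees with the one from a direct call to qr_R_factor up to the signs of its rows. With more than two blocks, the same combine step can be arranged as a reduction tree.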

LocalOrdinal: index type that can address all elements of a matrix (when treated as a 1-D array, so for A[i + LDA*j], the number i + LDA*j must fit in a LocalOrdinal).

Scalar: the type of the matrix entries.

NodeTsqrType: the intranode (single-node) part of Tsqr. Defaults to sequential cache-blocked TSQR. Any class implementing the same compile-time interface is valid. We provide NodeTsqr.hpp as an archetype of the "NodeTsqrType" concept, but it is not necessary that NodeTsqrType derive from that abstract base class.

DistTsqrType: the internode (across nodes) part of Tsqr. Any class implementing the same compile-time interface as the default template parameter class is valid.

Note:
TSQR only needs to know about the local ordinal type (used to index matrix entries on a single node), not about the global ordinal type (used to index matrix entries globally, i.e., over all nodes).
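
The requirement on LocalOrdinal can be checked before choosing an index type. The sketch below is an illustration only; column_major_offset and offsets_fit are hypothetical helpers, not part of TSQR. It computes the column-major offset i + LDA*j and tests whether the largest such offset is representable in a given LocalOrdinal.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Column-major offset of entry (i, j) for leading dimension lda.
template <class LocalOrdinal>
LocalOrdinal column_major_offset(LocalOrdinal i, LocalOrdinal j, LocalOrdinal lda) {
  return i + lda * j;
}

// Hypothetical helper: true if the largest offset of an nrows-by-ncols
// matrix, (nrows - 1) + lda * (ncols - 1), fits in LocalOrdinal.
template <class LocalOrdinal>
bool offsets_fit(std::uint64_t nrows, std::uint64_t ncols, std::uint64_t lda) {
  if (nrows == 0 || ncols == 0) return true;  // empty matrix: nothing to index
  const std::uint64_t max_ordinal =
      static_cast<std::uint64_t>(std::numeric_limits<LocalOrdinal>::max());
  const std::uint64_t last_row = nrows - 1;
  if (last_row > max_ordinal) return false;
  // Compare by division so that lda * (ncols - 1) is never formed and
  // cannot overflow.
  const std::uint64_t room = max_ordinal - last_row;
  return ncols == 1 || lda <= room / (ncols - 1);
}
```

For example, a local block with 300 million rows and 10 columns needs offsets beyond the range of a 32-bit signed integer, so a 64-bit LocalOrdinal would be required in that case.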

Definition at line 81 of file Tsqr.hpp.


Constructor & Destructor Documentation

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::Tsqr ( const node_tsqr_ptr & nodeTsqr,
const dist_tsqr_ptr & distTsqr
) [inline]

Constructor

Parameters:
nodeTsqr [in,out] Previously initialized NodeTsqrType object. This takes care of the intranode part of TSQR.
distTsqr [in,out] Previously initialized DistTsqrType object. This takes care of the internode part of TSQR.

Definition at line 108 of file Tsqr.hpp.


Member Function Documentation

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
size_t TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::cache_block_size ( ) const [inline]

Cache block size in bytes.

Cache block size (in bytes) used by the underlying intranode TSQR implementation.

Note:
This value may differ from the cache block size given to the constructor of the NodeTsqrType object, since that constructor input is merely a suggestion.

Definition at line 120 of file Tsqr.hpp.

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
bool TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::QR_produces_R_factor_with_nonnegative_diagonal ( ) const [inline]

Whether or not all diagonal entries of the R factor computed by the QR factorization are guaranteed to be nonnegative.

Note:
This property holds if all QR factorization steps (both intranode and internode) produce an R factor with a nonnegative diagonal.
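
When this method returns false, a caller who needs a nonnegative diagonal can normalize an explicit factorization afterwards. The following is a minimal sketch of that idea, assuming real (double) entries and tight column-major storage; make_R_diagonal_nonnegative is a hypothetical helper, not a TSQR function. Negating row k of R together with column k of Q leaves the product Q*R unchanged.

```cpp
#include <cstddef>
#include <vector>

// Q is m-by-n and R is n-by-n upper triangular, both column-major with
// leading dimensions m and n. Flips signs so that every diagonal entry of
// R is nonnegative while Q*R stays the same.
void make_R_diagonal_nonnegative(std::vector<double>& Q, std::size_t m,
                                 std::vector<double>& R, std::size_t n) {
  for (std::size_t k = 0; k < n; ++k) {
    if (R[k + n * k] < 0.0) {
      // Negate row k of R (entries left of the diagonal are zero).
      for (std::size_t j = k; j < n; ++j) R[k + n * j] = -R[k + n * j];
      // Negate column k of Q to compensate.
      for (std::size_t i = 0; i < m; ++i) Q[i + m * k] = -Q[i + m * k];
    }
  }
}
```

This only applies to an explicit Q factor, such as the output of explicit_Q(); the implicit representation returned by factor() cannot be adjusted this way.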

Definition at line 128 of file Tsqr.hpp.

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
FactorOutput TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::factor ( const LocalOrdinal  nrows_local,
const LocalOrdinal  ncols,
Scalar  A_local[],
const LocalOrdinal  lda_local,
Scalar  R[],
const LocalOrdinal  ldr,
const bool  contiguousCacheBlocks = false 
) [inline]

Compute QR factorization of the global dense matrix A.

Compute the QR factorization of the tall and skinny dense matrix A. The matrix A is distributed in a row block layout over all the MPI processes. A_local contains the matrix data for this process.

Parameters:
nrows_local [in] Number of rows of this node's local component (A_local) of the matrix. May differ on different nodes. Precondition: nrows_local >= ncols.
ncols [in] Number of columns in the matrix to factor. Should be the same on all nodes. Precondition: nrows_local >= ncols.
A_local [in,out] On input, this node's local component of the matrix, stored as a general dense matrix in column-major order. On output, overwritten with an implicit representation of the Q factor.
lda_local [in] Leading dimension of A_local. Precondition: lda_local >= nrows_local.
R [out] The final R factor of the QR factorization of the global matrix A. An ncols by ncols upper triangular matrix with leading dimension ldr.
ldr [in] Leading dimension of the matrix R.
contiguousCacheBlocks [in] Whether or not cache blocks of A_local are stored contiguously. The default value of false means that A_local uses ordinary column-major (Fortran-style) order. Otherwise, the details of the format depend on the specific NodeTsqrType. Tsqr's cache_block() and un_cache_block() methods may be used to convert between cache-blocked and non-cache-blocked (column-major order) formats.
Returns:
Part of the representation of the implicitly stored Q factor. Pass it to apply() as the factor_output parameter, or to explicit_Q() as the factorOutput parameter. The other part of the implicitly stored Q factor is stored in A_local (the input is overwritten). Both parts are needed together.

Definition at line 217 of file Tsqr.hpp.

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::apply ( const std::string &  op,
const LocalOrdinal  nrows_local,
const LocalOrdinal  ncols_Q,
const Scalar  Q_local[],
const LocalOrdinal  ldq_local,
const FactorOutput &  factor_output,
const LocalOrdinal  ncols_C,
Scalar  C_local[],
const LocalOrdinal  ldc_local,
const bool  contiguousCacheBlocks = false 
) [inline]

Apply Q factor to the global dense matrix C.

Apply the Q factor (computed by factor() and represented implicitly) to the global dense matrix C, consisting of all nodes' C_local matrices stacked on top of each other.

Parameters:
op [in] If "N", compute Q*C. If "T", compute Q^T * C. If "H" or "C", compute Q^H * C. (The last option may not be implemented in all cases.)
nrows_local [in] Number of rows of this node's local component (C_local) of the matrix C. Should be the same on this node as the nrows_local argument with which factor() was called. Precondition: nrows_local >= ncols_Q.
ncols_Q [in] Number of columns in Q. Should be the same on all nodes. Precondition: nrows_local >= ncols_Q.
Q_local [in] Same as the A_local output of factor().
ldq_local [in] Same as lda_local of factor().
factor_output [in] Return value of factor().
ncols_C [in] Number of columns in C. Should be the same on all nodes. Precondition: nrows_local >= ncols_C.
C_local [in,out] On input, this node's local component of the matrix C, stored as a general dense matrix in column-major order. On output, overwritten with this node's component of op(Q)*C, where op(Q) = Q, Q^T, or Q^H.
ldc_local [in] Leading dimension of C_local. Precondition: ldc_local >= nrows_local.
contiguousCacheBlocks [in] Whether or not the cache blocks of Q and C are stored contiguously.
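
The accepted values of op can be validated up front. The sketch below maps the documented strings to an enumeration and rejects anything else. QApplyMode and parse_op are hypothetical names for illustration, and throwing std::invalid_argument on an unknown string is an assumption, not documented behavior of apply().

```cpp
#include <stdexcept>
#include <string>

// Hypothetical enumeration for this sketch; not a TSQR type.
enum class QApplyMode { NoTranspose, Transpose, ConjugateTranspose };

// Map the op string of apply() to a mode: "N" -> Q, "T" -> Q^T,
// "H" or "C" -> Q^H. Any other string is rejected.
QApplyMode parse_op(const std::string& op) {
  if (op == "N") return QApplyMode::NoTranspose;
  if (op == "T") return QApplyMode::Transpose;
  if (op == "H" || op == "C") return QApplyMode::ConjugateTranspose;
  throw std::invalid_argument("op must be \"N\", \"T\", \"H\", or \"C\"; got \"" +
                              op + "\"");
}
```

For real Scalar types, Q^T and Q^H coincide, so the distinction between "T" and "H" only matters for complex entries.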

Definition at line 274 of file Tsqr.hpp.

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::explicit_Q ( const LocalOrdinal  nrows_local,
const LocalOrdinal  ncols_Q_in,
const Scalar  Q_local_in[],
const LocalOrdinal  ldq_local_in,
const FactorOutput &  factorOutput,
const LocalOrdinal  ncols_Q_out,
Scalar  Q_local_out[],
const LocalOrdinal  ldq_local_out,
const bool  contiguousCacheBlocks = false 
) [inline]

Compute the explicit Q factor from factor().

Compute the explicit version of the Q factor computed by factor() and represented implicitly (via Q_local_in and factorOutput).

Parameters:
nrows_local [in] Number of rows of this node's local component (Q_local_in) of the implicitly stored Q factor. Also the number of rows of this node's local component (Q_local_out) of the output matrix. Should be the same on this node as the nrows_local argument with which factor() was called. Precondition: nrows_local >= ncols_Q_in.
ncols_Q_in [in] Number of columns in the original matrix A, whose explicit Q factor we are computing. Should be the same on all nodes. Precondition: nrows_local >= ncols_Q_in.
Q_local_in [in] Same as the A_local output of factor().
ldq_local_in [in] Same as lda_local of factor().
factorOutput [in] Return value of factor().
ncols_Q_out [in] Number of columns of the explicit Q factor to compute. Should be the same on all nodes.
Q_local_out [out] This node's component of the Q factor (in explicit form).
ldq_local_out [in] Leading dimension of Q_local_out.
contiguousCacheBlocks [in] Whether or not cache blocks in Q_local_in and Q_local_out are stored contiguously.

Definition at line 387 of file Tsqr.hpp.

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::Q_times_B ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  Q[],
const LocalOrdinal  ldq,
const Scalar  B[],
const LocalOrdinal  ldb,
const bool  contiguousCacheBlocks = false 
) const [inline]

Compute Q*B.

Compute matrix-matrix product Q*B, where Q is nrows by ncols and B is ncols by ncols. Respect cache blocks of Q.
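
As a rough illustration of the arithmetic, the sketch below computes the product for ordinary column-major storage. It assumes the result overwrites Q, which the signature suggests (Q is non-const, B is const) but this page does not state. The name q_times_b is hypothetical, and no cache blocking is attempted.

```cpp
#include <cstddef>
#include <vector>

// Overwrite the nrows-by-ncols matrix Q (column-major, leading dimension
// ldq) with Q*B, where B is ncols-by-ncols (column-major, leading
// dimension ldb). Works one row of Q at a time.
void q_times_b(std::size_t nrows, std::size_t ncols, double Q[], std::size_t ldq,
               const double B[], std::size_t ldb) {
  std::vector<double> row(ncols);
  for (std::size_t i = 0; i < nrows; ++i) {
    // Save row i of Q, because it is overwritten as results are stored.
    for (std::size_t k = 0; k < ncols; ++k) row[k] = Q[i + ldq * k];
    for (std::size_t j = 0; j < ncols; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < ncols; ++k) sum += row[k] * B[k + ldb * j];
      Q[i + ldq * j] = sum;
    }
  }
}
```

Because each output row depends only on the matching input row, the same loop can be run independently on each cache block of Q, which is one way to respect the blocking.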

Definition at line 427 of file Tsqr.hpp.

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
LocalOrdinal TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::reveal_R_rank ( const LocalOrdinal  ncols,
Scalar  R[],
const LocalOrdinal  ldr,
Scalar  U[],
const LocalOrdinal  ldu,
const magnitude_type  tol 
) const [inline]

Compute SVD $R = U \Sigma V^*$, not in place. Use the resulting singular values to compute the numerical rank of R, with respect to the relative tolerance tol. If R is full rank, return without modifying R. If R is not full rank, overwrite R with $\Sigma \cdot V^*$.

Returns:
Numerical rank of R: 0 <= rank <= ncols.
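
The rank decision can be sketched as a count over the singular values. This illustration assumes that "relative tolerance" means relative to the largest singular value and that the comparison is strict; neither detail is confirmed on this page, and numerical_rank is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// sigma holds the singular values of R (nonnegative, in any order).
// Returns the number of singular values larger than tol * sigma_max.
std::size_t numerical_rank(const std::vector<double>& sigma, double tol) {
  if (sigma.empty()) return 0;
  const double sigma_max = *std::max_element(sigma.begin(), sigma.end());
  std::size_t rank = 0;
  for (double s : sigma) {
    if (s > tol * sigma_max) ++rank;
  }
  return rank;
}
```

With this rule a zero matrix has rank 0, and the result always lies between 0 and the number of singular values, matching the documented range 0 <= rank <= ncols.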

Definition at line 451 of file Tsqr.hpp.

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
LocalOrdinal TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::reveal_rank ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  Q[],
const LocalOrdinal  ldq,
Scalar  R[],
const LocalOrdinal  ldr,
const magnitude_type  tol,
const bool  contiguousCacheBlocks = false 
) const [inline]

Rank-revealing decomposition.

Using the R factor from factor() and the explicit Q factor from explicit_Q(), compute the SVD of R ($R = U \Sigma V^*$). If R is full rank (with respect to the given relative tolerance tol), leave Q and R unchanged. Otherwise, compute $Q := Q \cdot U$ and $R := \Sigma V^*$ in place (the latter may no longer be upper triangular).

Parameters:
R [in,out] On input: an ncols by ncols upper triangular matrix with leading dimension ldr >= ncols. On output: unchanged if the input is full rank. Otherwise, if $R = U \Sigma V^*$ is the SVD of R, R is overwritten with $\Sigma V^*$. This is also an ncols by ncols matrix, but it is not necessarily upper triangular.
Returns:
Rank $r$ of R: $0 \leq r \leq \mathrm{ncols}$.
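
A short derivation shows why the rank-deficient update keeps a valid factorization. It assumes that Q has orthonormal columns on input and that $U$ is the unitary factor from the SVD of R:

\[
A = Q R = Q \, (U \Sigma V^*) = (Q U) \, (\Sigma V^*),
\qquad
(Q U)^* (Q U) = U^* (Q^* Q) \, U = U^* U = I .
\]

So after $Q := Q \cdot U$ and $R := \Sigma V^*$, the product $Q R$ still equals $A$ and Q still has orthonormal columns. If the singular values are sorted in decreasing order (the usual SVD convention, assumed here), the last $\mathrm{ncols} - r$ rows of $\Sigma V^*$ are scaled by the singular values that fall below the tolerance, and the first $r$ columns of the updated Q span the numerical range of $A$.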

Definition at line 486 of file Tsqr.hpp.

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::cache_block ( const LocalOrdinal  nrows_local,
const LocalOrdinal  ncols,
Scalar  A_local_out[],
const Scalar  A_local_in[],
const LocalOrdinal  lda_local_in 
) const [inline]

Cache-block A_in into A_out.

Copy the column-major matrix A_local_in (leading dimension lda_local_in) into A_local_out in cache-blocked format.

Definition at line 533 of file Tsqr.hpp.

template<class LocalOrdinal, class Scalar, class NodeTsqrType = SequentialTsqr< LocalOrdinal, Scalar >, class DistTsqrType = DistTsqr< LocalOrdinal, Scalar >>
void TSQR::Tsqr< LocalOrdinal, Scalar, NodeTsqrType, DistTsqrType >::un_cache_block ( const LocalOrdinal  nrows_local,
const LocalOrdinal  ncols,
Scalar  A_local_out[],
const LocalOrdinal  lda_local_out,
const Scalar  A_local_in[] 
) const [inline]

Un-cache-block A_in into A_out.

Convert the cache-blocked matrix A_local_in back to ordinary column-major order, writing the result to A_local_out (leading dimension lda_local_out).

Definition at line 549 of file Tsqr.hpp.
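
The round trip between cache_block() and un_cache_block() can be illustrated with one possible layout. The actual format depends on the NodeTsqrType, so the layout here is purely an assumption: the matrix is split into blocks of block_rows consecutive rows, and each block is stored contiguously in column-major order, one block after another. The function names and the block_rows parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Copy the column-major matrix A_in (leading dimension lda_in) into A_out
// in the assumed cache-blocked layout. Precondition: block_rows > 0.
void cache_block_sketch(std::size_t nrows, std::size_t ncols, double A_out[],
                        const double A_in[], std::size_t lda_in,
                        std::size_t block_rows) {
  std::size_t out = 0;
  for (std::size_t r0 = 0; r0 < nrows; r0 += block_rows) {
    const std::size_t r1 = std::min(nrows, r0 + block_rows);
    for (std::size_t j = 0; j < ncols; ++j)
      for (std::size_t i = r0; i < r1; ++i) A_out[out++] = A_in[i + lda_in * j];
  }
}

// Inverse of cache_block_sketch: read blocks from A_in in the same order and
// write column-major output with leading dimension lda_out.
void un_cache_block_sketch(std::size_t nrows, std::size_t ncols, double A_out[],
                           std::size_t lda_out, const double A_in[],
                           std::size_t block_rows) {
  std::size_t in = 0;
  for (std::size_t r0 = 0; r0 < nrows; r0 += block_rows) {
    const std::size_t r1 = std::min(nrows, r0 + block_rows);
    for (std::size_t j = 0; j < ncols; ++j)
      for (std::size_t i = r0; i < r1; ++i) A_out[i + lda_out * j] = A_in[in++];
  }
}
```

In this layout each block occupies one contiguous range of memory, so a block that fits in cache can be processed without touching the rest of the matrix.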


The documentation for this class was generated from the following file: Tsqr.hpp