Kokkos Node API and Local Linear Algebra Kernels Version of the Day
TSQR::SequentialTsqr< LocalOrdinal, Scalar > Class Template Reference

Sequential cache-blocked TSQR factorization.

#include <Tsqr_SequentialTsqr.hpp>

Public Member Functions

 SequentialTsqr (const size_t cacheSizeHint=0, const size_t sizeOfScalar=sizeof(Scalar))
 The standard constructor.
 SequentialTsqr (const CacheBlockingStrategy< LocalOrdinal, Scalar > &strategy)
 Alternate constructor for a given cache blocking strategy.
std::string description () const
 One-line description of this object.
bool QR_produces_R_factor_with_nonnegative_diagonal () const
 Does factor() compute R with nonnegative diagonal?
size_t cache_size_hint () const
 Cache size hint (in bytes) used for the factorization.
size_t TEUCHOS_DEPRECATED cache_block_size () const
 Cache size hint (in bytes) used for the factorization.
FactorOutput factor (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar A[], const LocalOrdinal lda, const bool contiguous_cache_blocks) const
 Compute QR factorization (implicitly stored Q factor) of A.
void extract_R (const LocalOrdinal nrows, const LocalOrdinal ncols, const Scalar A[], const LocalOrdinal lda, Scalar R[], const LocalOrdinal ldr, const bool contiguous_cache_blocks) const
 Extract R factor from factor() results.
FactorOutput factor (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar A[], const LocalOrdinal lda, Scalar R[], const LocalOrdinal ldr, const bool contiguous_cache_blocks) const
 Compute the QR factorization of the matrix A.
LocalOrdinal factor_num_cache_blocks (const LocalOrdinal nrows, const LocalOrdinal ncols, const Scalar A[], const LocalOrdinal lda, const bool contiguous_cache_blocks) const
 The number of cache blocks that factor() would use.
void apply (const ApplyType &apply_type, const LocalOrdinal nrows, const LocalOrdinal ncols_Q, const Scalar Q[], const LocalOrdinal ldq, const FactorOutput &factor_output, const LocalOrdinal ncols_C, Scalar C[], const LocalOrdinal ldc, const bool contiguous_cache_blocks) const
 Apply the implicit Q factor to the matrix C.
void explicit_Q (const LocalOrdinal nrows, const LocalOrdinal ncols_Q, const Scalar Q[], const LocalOrdinal ldq, const FactorOutput &factor_output, const LocalOrdinal ncols_C, Scalar C[], const LocalOrdinal ldc, const bool contiguous_cache_blocks) const
 Compute the explicit Q factor from the result of factor().
void Q_times_B (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, const Scalar B[], const LocalOrdinal ldb, const bool contiguous_cache_blocks) const
 Compute Q := Q*B.
void cache_block (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar A_out[], const Scalar A_in[], const LocalOrdinal lda_in) const
 Cache block A_in into A_out.
void un_cache_block (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar A_out[], const LocalOrdinal lda_out, const Scalar A_in[]) const
 Un-cache-block A_in into A_out.
void fill_with_zeros (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar A[], const LocalOrdinal lda, const bool contiguous_cache_blocks) const
 Fill the nrows by ncols matrix A with zeros.
virtual void apply (const ApplyType &applyType, const LocalOrdinal nrows, const LocalOrdinal ncols_Q, const Scalar Q[], const LocalOrdinal ldq, const std::vector< std::vector< Scalar > > &factorOutput, const LocalOrdinal ncols_C, Scalar C[], const LocalOrdinal ldc, const bool contiguousCacheBlocks) const =0
 Apply the implicit Q factor from factor() to C.
virtual void explicit_Q (const LocalOrdinal nrows, const LocalOrdinal ncols_Q, const Scalar Q[], const LocalOrdinal ldq, const factor_output_type &factorOutput, const LocalOrdinal ncols_C, Scalar C[], const LocalOrdinal ldc, const bool contiguousCacheBlocks) const =0
 Compute the explicit Q factor from the result of factor().
MatrixViewType top_block (const MatrixViewType &C, const bool contiguous_cache_blocks) const
 Return view of topmost cache block of C.
LocalOrdinal reveal_R_rank (const LocalOrdinal ncols, Scalar R[], const LocalOrdinal ldr, Scalar U[], const LocalOrdinal ldu, const typename Teuchos::ScalarTraits< Scalar >::magnitudeType tol) const
 Reveal rank of TSQR's R factor.
LocalOrdinal reveal_rank (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, Scalar R[], const LocalOrdinal ldr, const typename Teuchos::ScalarTraits< Scalar >::magnitudeType tol, const bool contiguousCacheBlocks) const
 Compute rank-revealing decomposition.

Protected Member Functions

ConstMatView< LocalOrdinal, Scalar > const_top_block (const ConstMatView< LocalOrdinal, Scalar > &C, const bool contiguous_cache_blocks) const
 Return the topmost cache block of the matrix C.

Detailed Description

template<class LocalOrdinal, class Scalar>
class TSQR::SequentialTsqr< LocalOrdinal, Scalar >

Sequential cache-blocked TSQR factorization.

Author:
Mark Hoemmen

TSQR (Tall Skinny QR) is a collection of different algorithms for computing the QR factorization of a "tall and skinny" matrix (with many more rows than columns). We use it in Trilinos as an orthogonalization method for Epetra_MultiVector and Tpetra::MultiVector. (In this context, TSQR is provided as an "OrthoManager" in Anasazi and Belos; you do not have to use it directly.) For details, see e.g., our 2008 University of California Berkeley technical report (Demmel, Grigori, Hoemmen, and Langou), or our Supercomputing 2009 paper (Demmel, Hoemmen, Mohiyuddin, and Yelick).

SequentialTsqr implements the "sequential TSQR" algorithm of the aforementioned 2008 technical report. It breaks up the matrix by rows into "cache blocks," and iterates over consecutive cache blocks. The input matrix may be in either the conventional LAPACK-style column-major layout, or in a "cache-blocked" layout. We provide conversion routines between these two formats. Users should not attempt to construct a matrix in the latter format themselves. In our experience, the performance difference between the two formats is not significant, but this may be different on different architectures.
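The row-blocked iteration described above can be illustrated with a minimal sketch. It is not the library's implementation: householder_R, sequential_tsqr_R, and the rows_per_block parameter are hypothetical names, the matrix is assumed to be plain column-major with leading dimension nrows, and the Householder reflectors are discarded, so only the R factor is produced. The first block is factored on its own; each later block is stacked beneath the current R and the stack is factored again.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reduce the m x n column-major matrix M (leading dimension m, m >= n) to
// upper triangular form in place using Householder reflections.
// The reflectors are discarded; a real implementation would keep them as
// the implicit representation of the Q factor.
void householder_R(std::vector<double>& M, std::size_t m, std::size_t n) {
  assert(m >= n && M.size() == m * n);
  std::vector<double> v;
  for (std::size_t k = 0; k < n; ++k) {
    // v = x - alpha*e1, where x = M(k:m-1, k) and alpha = -sign(x0)*||x||.
    v.assign(m - k, 0.0);
    double norm2 = 0.0;
    for (std::size_t i = k; i < m; ++i) {
      v[i - k] = M[i + k * m];
      norm2 += v[i - k] * v[i - k];
    }
    if (norm2 == 0.0) continue;  // column is already zero
    const double alpha = (v[0] >= 0.0) ? -std::sqrt(norm2) : std::sqrt(norm2);
    v[0] -= alpha;
    double vtv = 0.0;
    for (double vi : v) vtv += vi * vi;
    // Apply H = I - 2 v v^T / (v^T v) to columns k..n-1.
    for (std::size_t j = k; j < n; ++j) {
      double dot = 0.0;
      for (std::size_t i = k; i < m; ++i) dot += v[i - k] * M[i + j * m];
      const double s = 2.0 * dot / vtv;
      for (std::size_t i = k; i < m; ++i) M[i + j * m] -= s * v[i - k];
    }
  }
}

// R factor of the nrows x ncols column-major matrix A (leading dimension
// nrows), computed one row block at a time.
std::vector<double> sequential_tsqr_R(const std::vector<double>& A,
                                      std::size_t nrows, std::size_t ncols,
                                      std::size_t rows_per_block) {
  assert(ncols > 0 && nrows >= ncols && rows_per_block >= ncols);
  std::vector<double> R(ncols * ncols, 0.0);
  bool have_R = false;
  for (std::size_t r0 = 0; r0 < nrows; r0 += rows_per_block) {
    const std::size_t br = std::min(rows_per_block, nrows - r0);
    const std::size_t top = have_R ? ncols : 0;  // rows taken by the current R
    const std::size_t m = top + br;
    // Work matrix: current R (if any) stacked on top of the next row block.
    std::vector<double> W(m * ncols);
    for (std::size_t j = 0; j < ncols; ++j) {
      for (std::size_t i = 0; i < top; ++i) W[i + j * m] = R[i + j * ncols];
      for (std::size_t i = 0; i < br; ++i) W[top + i + j * m] = A[r0 + i + j * nrows];
    }
    householder_R(W, m, ncols);
    // Keep only the upper triangle as the updated R.
    for (std::size_t j = 0; j < ncols; ++j)
      for (std::size_t i = 0; i <= j; ++i) R[i + j * ncols] = W[i + j * m];
    have_R = true;
  }
  return R;
}
```

Whatever block size is chosen, the resulting R should satisfy R^T R = A^T A up to rounding error, which is a convenient check for a sketch like this one.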

SequentialTsqr is designed to be used as the "intranode TSQR" part of the full TSQR implementation in Tsqr. The Tsqr class can use any of various intranode TSQR implementations. SequentialTsqr is an appropriate choice when running in MPI-only mode. Other intranode TSQR implementations, such as TbbTsqr, are appropriate for hybrid parallelism (MPI + threads).

SequentialTsqr is unlikely to benefit from a multithreaded BLAS implementation. In fact, implementations of LAPACK's QR factorization generally do not show performance benefits from multithreading when factoring tall skinny matrices. (See our Supercomputing 2009 paper and my IPDPS 2011 paper.) This is why we built other intranode TSQR factorizations that do effectively exploit thread-level parallelism, such as TbbTsqr.

Note:
To implementers: SequentialTsqr cannot currently be a Teuchos::ParameterListAcceptorDefaultBase, because the latter uses RCP, and RCPs (more specifically, their reference counts) are not currently thread safe. TbbTsqr uses SequentialTsqr in parallel to implement each thread's cache-blocked TSQR. This can be fixed as soon as RCPs are made thread safe.

Definition at line 106 of file Tsqr_SequentialTsqr.hpp.


Constructor & Destructor Documentation

template<class LocalOrdinal, class Scalar>
TSQR::SequentialTsqr< LocalOrdinal, Scalar >::SequentialTsqr ( const size_t  cacheSizeHint = 0,
const size_t  sizeOfScalar = sizeof(Scalar) 
) [inline]

The standard constructor.

Parameters:
cacheSizeHint[in] Cache size hint in bytes to use in the sequential TSQR factorization. If 0, the implementation will pick a reasonable size. Good nondefault choices are the amount of per-CPU highest-level private cache, or the amount of lowest-level shared cache divided by the number of CPU cores sharing it. We recommend experimenting to find the best value. Too large a value is worse than too small a value, though an excessively small value will result in extra computation and may also cause a slowdown.
sizeOfScalar[in] The number of bytes required to store a Scalar value. This is used to compute the dimensions of cache blocks. If sizeof(Scalar) correctly reports the size of the representation of Scalar in memory, you can use the default. The default is correct for float, double, and any of various fixed-length structs (like double-double and quad-double). It should also work for std::complex<T> where T is anything in the previous sentence's list. It does not work for arbitrary-precision types whose storage is dynamically allocated, even if the amount of storage is a constant. In the latter case, you should specify a nondefault value.
Note:
sizeOfScalar affects performance, not correctness (more or less -- it should never be zero, for example). It's OK for it to be a slight overestimate. Being much too big may affect performance by underutilizing the cache. Being too small may also affect performance by thrashing the cache.
If Scalar is an arbitrary-precision type whose representation length can change at runtime, you should construct a new SequentialTsqr object whenever the representation length changes.
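
To make the role of the two constructor arguments concrete, here is a minimal sketch of how a cache size hint and a scalar size could translate into a cache block height. The function rows_per_cache_block and its heuristic are assumptions for illustration only; the library's CacheBlockingStrategy may compute block dimensions differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical heuristic (not the library's CacheBlockingStrategy):
// pick the number of rows per cache block so that one block with ncols
// columns occupies at most cacheSizeHint bytes, but never use fewer
// than ncols rows, so that each block is at least square.
std::size_t rows_per_cache_block(std::size_t cacheSizeHint,
                                 std::size_t sizeOfScalar,
                                 std::size_t ncols) {
  assert(sizeOfScalar > 0 && ncols > 0);
  const std::size_t scalarsInCache = cacheSizeHint / sizeOfScalar;
  return std::max(scalarsInCache / ncols, ncols);
}
```

Under this heuristic, overestimating sizeOfScalar shrinks the blocks (underusing the cache), while underestimating it makes the blocks larger than the cache can hold, which matches the performance remarks in the note above.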

Definition at line 252 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
TSQR::SequentialTsqr< LocalOrdinal, Scalar >::SequentialTsqr ( const CacheBlockingStrategy< LocalOrdinal, Scalar > &  strategy) [inline]

Alternate constructor for a given cache blocking strategy.

The cache blocking strategy stores the same information as would be passed into the standard constructor: the cache block size, and the size of the Scalar type.

Parameters:
strategy[in] Cache blocking strategy to use (copied).

Definition at line 265 of file Tsqr_SequentialTsqr.hpp.


Member Function Documentation

template<class LocalOrdinal, class Scalar>
std::string TSQR::SequentialTsqr< LocalOrdinal, Scalar >::description ( ) const [inline, virtual]

One-line description of this object.

This implements Teuchos::Describable::description(). For now, SequentialTsqr uses the default implementation of Teuchos::Describable::describe().

Reimplemented from TSQR::NodeTsqr< LocalOrdinal, Scalar, std::vector< std::vector< Scalar > > >.

Definition at line 274 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
bool TSQR::SequentialTsqr< LocalOrdinal, Scalar >::QR_produces_R_factor_with_nonnegative_diagonal ( ) const [inline, virtual]

Does factor() compute R with nonnegative diagonal?

See the NodeTsqr documentation for details.

Implements TSQR::NodeTsqr< LocalOrdinal, Scalar, std::vector< std::vector< Scalar > > >.

Definition at line 285 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
size_t TSQR::SequentialTsqr< LocalOrdinal, Scalar >::cache_size_hint ( ) const [inline, virtual]

Cache size hint (in bytes) used for the factorization.

This may differ from the cache size hint argument passed to the constructor. SequentialTsqr treats that argument as a hint, not a command.

Implements TSQR::NodeTsqr< LocalOrdinal, Scalar, std::vector< std::vector< Scalar > > >.

Definition at line 295 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
size_t TEUCHOS_DEPRECATED TSQR::SequentialTsqr< LocalOrdinal, Scalar >::cache_block_size ( ) const [inline, virtual]

Cache size hint (in bytes) used for the factorization.

This method is deprecated, because the name is misleading. Please call cache_size_hint() instead.

Implements TSQR::NodeTsqr< LocalOrdinal, Scalar, std::vector< std::vector< Scalar > > >.

Definition at line 303 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
FactorOutput TSQR::SequentialTsqr< LocalOrdinal, Scalar >::factor ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  A[],
const LocalOrdinal  lda,
const bool  contiguous_cache_blocks 
) const [inline]

Compute QR factorization (implicitly stored Q factor) of A.

Compute the QR factorization in place of the nrows by ncols matrix A, with nrows >= ncols. The matrix A is stored either in column-major order (the default) or with contiguous column-major cache blocks, with leading dimension lda >= nrows. Write the resulting R factor to the top block of A (in place). (You can get a view of this via the top_block() method.) Everything below the upper triangle of A is overwritten with part of the implicit representation of the Q factor. The other part of that representation is returned.

Parameters:
nrows[in] Number of rows in the matrix A.
ncols[in] Number of columns in the matrix A.
A[in/out] On input: the nrows by ncols matrix to factor. On output: part of the representation of the implicitly stored Q factor.
lda[in] Leading dimension of A, if A is stored in column-major order. Otherwise its value doesn't matter.
contiguous_cache_blocks[in] Whether the matrix A is stored in a contiguously cache-blocked format.
Returns:
Part of the representation of the implicitly stored Q factor. The complete representation includes A (on output). The FactorOutput and A go together.

Definition at line 333 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
void TSQR::SequentialTsqr< LocalOrdinal, Scalar >::extract_R ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
const Scalar  A[],
const LocalOrdinal  lda,
Scalar  R[],
const LocalOrdinal  ldr,
const bool  contiguous_cache_blocks 
) const [inline]

Extract R factor from factor() results.

The five-argument version of factor() leaves the R factor in place in the matrix A. This method copies the R factor out of A into a separate matrix R in column-major order (regardless of whether A was stored with contiguous cache blocks).
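
For the column-major case, the copy can be pictured as follows. This is a minimal sketch with a hypothetical helper name (copy_upper_triangle), assuming A is stored in plain column-major order; with contiguous cache blocks the top block would have its own leading dimension, which this sketch does not handle.

```cpp
#include <cassert>
#include <cstddef>

// Copy the upper triangle of the top ncols x ncols part of the column-major
// matrix A (leading dimension lda) into the column-major matrix R (leading
// dimension ldr). Entries of R below the diagonal are set to zero, since
// in A those positions hold part of the implicit Q factor.
void copy_upper_triangle(std::size_t ncols,
                         const double A[], std::size_t lda,
                         double R[], std::size_t ldr) {
  assert(lda >= ncols && ldr >= ncols);
  for (std::size_t j = 0; j < ncols; ++j)
    for (std::size_t i = 0; i < ncols; ++i)
      R[i + j * ldr] = (i <= j) ? A[i + j * lda] : 0.0;
}
```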

Definition at line 380 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
FactorOutput TSQR::SequentialTsqr< LocalOrdinal, Scalar >::factor ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  A[],
const LocalOrdinal  lda,
Scalar  R[],
const LocalOrdinal  ldr,
const bool  contiguous_cache_blocks 
) const [inline, virtual]

Compute the QR factorization of the matrix A.

See the NodeTsqr documentation for details. This version of factor() is more useful than the five-argument version, when using SequentialTsqr as the intranode TSQR implementation in Tsqr. The five-argument version is more useful when using SequentialTsqr inside of another intranode TSQR implementation, such as TbbTsqr.

Implements TSQR::NodeTsqr< LocalOrdinal, Scalar, std::vector< std::vector< Scalar > > >.

Definition at line 409 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
LocalOrdinal TSQR::SequentialTsqr< LocalOrdinal, Scalar >::factor_num_cache_blocks ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
const Scalar  A[],
const LocalOrdinal  lda,
const bool  contiguous_cache_blocks 
) const [inline]

The number of cache blocks that factor() would use.

The factor() method breaks the input matrix A into one or more cache blocks. This method reports how many cache blocks factor() would use, without actually factoring the matrix.

Parameters:
nrows[in] Number of rows in the matrix A.
ncols[in] Number of columns in the matrix A.
A[in] The matrix A. If contiguous_cache_blocks is false, A is stored in column-major order; otherwise, A is stored with contiguous cache blocks (as the cache_block() method would do).
lda[in] If the matrix A is stored in column-major order: the leading dimension (a.k.a. stride) of A. Otherwise, the value of this parameter doesn't matter.
contiguous_cache_blocks[in] Whether the cache blocks in the matrix A are stored contiguously.
Returns:
Number of cache blocks in the matrix A: a positive integer.
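
As a rough illustration of the count, assume every cache block has the same height rows_per_block except possibly the last. Both that assumption and the helper name num_cache_blocks are hypothetical; the library may treat a short trailing block differently, so call factor_num_cache_blocks() for the actual value.

```cpp
#include <cassert>
#include <cstddef>

// Number of row blocks when nrows rows are split into blocks of
// rows_per_block rows each, with a possibly shorter final block
// (ceiling division).
std::size_t num_cache_blocks(std::size_t nrows, std::size_t rows_per_block) {
  assert(nrows > 0 && rows_per_block > 0);
  return (nrows + rows_per_block - 1) / rows_per_block;
}
```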

Definition at line 477 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
void TSQR::SequentialTsqr< LocalOrdinal, Scalar >::apply ( const ApplyType &  apply_type,
const LocalOrdinal  nrows,
const LocalOrdinal  ncols_Q,
const Scalar  Q[],
const LocalOrdinal  ldq,
const FactorOutput &  factor_output,
const LocalOrdinal  ncols_C,
Scalar  C[],
const LocalOrdinal  ldc,
const bool  contiguous_cache_blocks 
) const [inline]

Apply the implicit Q factor to the matrix C.

See the NodeTsqr documentation for details.

Definition at line 505 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
void TSQR::SequentialTsqr< LocalOrdinal, Scalar >::explicit_Q ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols_Q,
const Scalar  Q[],
const LocalOrdinal  ldq,
const FactorOutput &  factor_output,
const LocalOrdinal  ncols_C,
Scalar  C[],
const LocalOrdinal  ldc,
const bool  contiguous_cache_blocks 
) const [inline]

Compute the explicit Q factor from the result of factor().

See the NodeTsqr documentation for details.

Definition at line 602 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
void TSQR::SequentialTsqr< LocalOrdinal, Scalar >::Q_times_B ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  Q[],
const LocalOrdinal  ldq,
const Scalar  B[],
const LocalOrdinal  ldb,
const bool  contiguous_cache_blocks 
) const [inline, virtual]

Compute Q := Q*B.

See the NodeTsqr documentation for details.

Implements TSQR::NodeTsqr< LocalOrdinal, Scalar, std::vector< std::vector< Scalar > > >.

Definition at line 638 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
void TSQR::SequentialTsqr< LocalOrdinal, Scalar >::cache_block ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  A_out[],
const Scalar  A_in[],
const LocalOrdinal  lda_in 
) const [inline, virtual]

Cache block A_in into A_out.

Parameters:
nrows[in] Number of rows in A_in and A_out.
ncols[in] Number of columns in A_in and A_out.
A_out[out] Result of cache-blocking A_in.
A_in[in] Matrix to cache block, stored in column-major order with leading dimension lda_in.
lda_in[in] Leading dimension of A_in. (See the LAPACK documentation for a definition of "leading dimension.") lda_in >= nrows.
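
The following minimal sketch shows one way to picture the conversion. It assumes that the cache-blocked layout stores each block of consecutive rows contiguously, in column-major order, with the block's own row count as its leading dimension. The function name cache_block_sketch and the explicit rows_per_block argument are hypothetical; the real method takes the block size from the object's cache blocking strategy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Copy the column-major matrix A_in (leading dimension lda_in) into A_out
// as a sequence of row blocks. Each block is written contiguously in
// column-major order, so A_out needs exactly nrows*ncols entries.
void cache_block_sketch(std::size_t nrows, std::size_t ncols,
                        double A_out[], const double A_in[],
                        std::size_t lda_in, std::size_t rows_per_block) {
  assert(lda_in >= nrows && rows_per_block > 0);
  std::size_t out = 0;  // next write position in A_out
  for (std::size_t r0 = 0; r0 < nrows; r0 += rows_per_block) {
    const std::size_t br = std::min(rows_per_block, nrows - r0);
    for (std::size_t j = 0; j < ncols; ++j)
      for (std::size_t i = 0; i < br; ++i)
        A_out[out++] = A_in[r0 + i + j * lda_in];
  }
}
```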

Implements TSQR::NodeTsqr< LocalOrdinal, Scalar, std::vector< std::vector< Scalar > > >.

Definition at line 692 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
void TSQR::SequentialTsqr< LocalOrdinal, Scalar >::un_cache_block ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  A_out[],
const LocalOrdinal  lda_out,
const Scalar  A_in[] 
) const [inline, virtual]

Un-cache-block A_in into A_out.

A_in is a matrix produced by cache_block(). It is organized as contiguously stored cache blocks. This method reorganizes A_in into A_out as an ordinary matrix stored in column-major order with leading dimension lda_out.

Parameters:
nrows[in] Number of rows in A_in and A_out.
ncols[in] Number of columns in A_in and A_out.
A_out[out] Result of un-cache-blocking A_in. Matrix stored in column-major order with leading dimension lda_out.
lda_out[in] Leading dimension of A_out. (See the LAPACK documentation for a definition of "leading dimension.") lda_out >= nrows.
A_in[in] Matrix to un-cache-block.
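
The reverse conversion can be sketched in the same spirit. The layout assumption is that A_in consists of row blocks stored one after another, each in column-major order with its own row count as leading dimension. The name un_cache_block_sketch and the rows_per_block argument are hypothetical and not part of the library's interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Read the row blocks of A_in in storage order and scatter them into the
// column-major matrix A_out (leading dimension lda_out). Padding rows of
// A_out (rows nrows..lda_out-1) are left untouched.
void un_cache_block_sketch(std::size_t nrows, std::size_t ncols,
                           double A_out[], std::size_t lda_out,
                           const double A_in[], std::size_t rows_per_block) {
  assert(lda_out >= nrows && rows_per_block > 0);
  std::size_t in = 0;  // next read position in A_in
  for (std::size_t r0 = 0; r0 < nrows; r0 += rows_per_block) {
    const std::size_t br = std::min(rows_per_block, nrows - r0);
    for (std::size_t j = 0; j < ncols; ++j)
      for (std::size_t i = 0; i < br; ++i)
        A_out[r0 + i + j * lda_out] = A_in[in++];
  }
}
```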

Implements TSQR::NodeTsqr< LocalOrdinal, Scalar, std::vector< std::vector< Scalar > > >.

Definition at line 719 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
void TSQR::SequentialTsqr< LocalOrdinal, Scalar >::fill_with_zeros ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  A[],
const LocalOrdinal  lda,
const bool  contiguous_cache_blocks 
) const [inline, virtual]

Fill the nrows by ncols matrix A with zeros.

Fill the matrix A with zeros, in a way that respects the cache blocking scheme.

Parameters:
nrows[in] Number of rows in A
ncols[in] Number of columns in A
A[out] nrows by ncols column-major-order dense matrix with leading dimension lda
lda[in] Leading dimension of A: lda >= nrows
contiguous_cache_blocks[in] Whether the cache blocks in A are stored contiguously.
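
For the column-major case, "respecting the layout" mostly means honoring the leading dimension. The sketch below uses a hypothetical helper, fill_with_zeros_colmajor, which zeros only the nrows entries of each column and leaves any padding rows alone. Whether the library touches padding is not stated here, so treat that detail as an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Zero the nrows x ncols column-major matrix A with leading dimension lda.
// Only the first nrows entries of each column are written; entries in
// rows nrows..lda-1 (padding) keep their previous values.
void fill_with_zeros_colmajor(std::size_t nrows, std::size_t ncols,
                              double A[], std::size_t lda) {
  assert(lda >= nrows);
  for (std::size_t j = 0; j < ncols; ++j)
    std::fill(A + j * lda, A + j * lda + nrows, 0.0);
}
```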

Implements TSQR::NodeTsqr< LocalOrdinal, Scalar, std::vector< std::vector< Scalar > > >.

Definition at line 742 of file Tsqr_SequentialTsqr.hpp.

template<class LocalOrdinal, class Scalar>
ConstMatView<LocalOrdinal, Scalar> TSQR::SequentialTsqr< LocalOrdinal, Scalar >::const_top_block ( const ConstMatView< LocalOrdinal, Scalar > &  C,
const bool  contiguous_cache_blocks 
) const [inline, protected, virtual]

Return the topmost cache block of the matrix C.

NodeTsqr's top_block() method must be implemented using subclasses' const_top_block() method, since top_block() is a template method and template methods cannot be virtual.

Parameters:
C[in] View of a matrix, with at least as many rows as columns.
contiguous_cache_blocks[in] Whether the cache blocks of C are stored contiguously.
Returns:
View of the topmost cache block of the matrix C.

Implements TSQR::NodeTsqr< LocalOrdinal, Scalar, std::vector< std::vector< Scalar > > >.

Definition at line 767 of file Tsqr_SequentialTsqr.hpp.

virtual void TSQR::NodeTsqr< LocalOrdinal , Scalar, std::vector< std::vector< Scalar > > >::apply ( const ApplyType applyType,
const LocalOrdinal  nrows,
const LocalOrdinal  ncols_Q,
const Scalar  Q[],
const LocalOrdinal  ldq,
const std::vector< std::vector< Scalar > > &  factorOutput,
const LocalOrdinal  ncols_C,
Scalar  C[],
const LocalOrdinal  ldc,
const bool  contiguousCacheBlocks 
) const [pure virtual, inherited]

Apply the implicit Q factor from factor() to C.

Parameters:
applyType[in] Whether to apply Q, Q^T, or Q^H to C.
nrows[in] Number of rows in Q and C.
ncols_Q[in] Number of columns in Q.
Q[in] Part of the implicit representation of the Q factor; the A matrix output of factor(). See the factor() documentation for details.
ldq[in] Leading dimension (a.k.a. stride) of Q, if Q is stored in column-major order (not contiguously cache blocked).
factorOutput[in] Return value of factor(), corresponding to Q.
ncols_C[in] Number of columns in the matrix C. This may be different than the number of columns in Q. There is no restriction on this value, but we optimize performance for the case ncols_C == ncols_Q.
C[in/out] On input: Matrix to which to apply the Q factor. On output: Result of applying the Q factor (or Q^T, or Q^H, depending on applyType) to C.
ldc[in] leading dimension (a.k.a. stride) of C, if C is stored in column-major order (not contiguously cache blocked).
contiguousCacheBlocks[in] Whether the cache blocks of Q and C are stored contiguously. If you don't know what this means, put "false" here.

virtual void TSQR::NodeTsqr< LocalOrdinal , Scalar, std::vector< std::vector< Scalar > > >::explicit_Q ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols_Q,
const Scalar  Q[],
const LocalOrdinal  ldq,
const factor_output_type &  factorOutput,
const LocalOrdinal  ncols_C,
Scalar  C[],
const LocalOrdinal  ldc,
const bool  contiguousCacheBlocks 
) const [pure virtual, inherited]

Compute the explicit Q factor from the result of factor().

This is equivalent to calling apply() on the first ncols_C columns of the identity matrix (suitably cache-blocked, if applicable).

Parameters:
nrows[in] Number of rows in Q and C.
ncols_Q[in] Number of columns in Q.
Q[in] Part of the implicit representation of the Q factor; the A matrix output of factor(). See the factor() documentation for details.
ldq[in] Leading dimension (a.k.a. stride) of Q, if Q is stored in column-major order (not contiguously cache blocked).
factorOutput[in] Return value of factor(), corresponding to Q.
ncols_C[in] Number of columns in the matrix C. This may be different than the number of columns in Q, in which case that number of columns of the Q factor will be computed. There is no restriction on this value, but we optimize performance for the case ncols_C == ncols_Q.
C[out] The first ncols_C columns of the Q factor.
ldc[in] leading dimension (a.k.a. stride) of C, if C is stored in column-major order (not contiguously cache blocked).
contiguousCacheBlocks[in] Whether the cache blocks of Q and C are stored contiguously. If you don't know what this means, put "false" here.
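
The equivalence between explicit_Q() and applying Q to identity columns can be seen in a small sketch. To keep it short, a single Householder reflector H = I - 2 v v^T / (v^T v) stands in for the implicit Q factor; that substitution, and the names apply_reflector and explicit_reflector_columns, are assumptions made for illustration and are not the library's representation.

```cpp
#include <cassert>
#include <cstddef>

// Apply H = I - 2 v v^T / (v^T v) to the nrows x ncols column-major
// matrix C (leading dimension ldc), in place.
void apply_reflector(std::size_t nrows, std::size_t ncols,
                     const double v[], double C[], std::size_t ldc) {
  double vtv = 0.0;
  for (std::size_t i = 0; i < nrows; ++i) vtv += v[i] * v[i];
  assert(vtv > 0.0 && ldc >= nrows);
  for (std::size_t j = 0; j < ncols; ++j) {
    double dot = 0.0;
    for (std::size_t i = 0; i < nrows; ++i) dot += v[i] * C[i + j * ldc];
    const double s = 2.0 * dot / vtv;
    for (std::size_t i = 0; i < nrows; ++i) C[i + j * ldc] -= s * v[i];
  }
}

// Explicit first ncols_C columns of H: overwrite C with the first ncols_C
// columns of the identity matrix, then apply H to C.
void explicit_reflector_columns(std::size_t nrows, std::size_t ncols_C,
                                const double v[], double C[],
                                std::size_t ldc) {
  assert(ncols_C <= nrows && ldc >= nrows);
  for (std::size_t j = 0; j < ncols_C; ++j)
    for (std::size_t i = 0; i < nrows; ++i)
      C[i + j * ldc] = (i == j) ? 1.0 : 0.0;
  apply_reflector(nrows, ncols_C, v, C, ldc);
}
```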

MatrixViewType TSQR::NodeTsqr< LocalOrdinal , Scalar, std::vector< std::vector< Scalar > > >::top_block ( const MatrixViewType &  C,
const bool  contiguous_cache_blocks 
) const [inline, inherited]

Return view of topmost cache block of C.

Parameters:
C[in] View of a matrix C.
contiguous_cache_blocks[in] Whether the cache blocks in C are stored contiguously.

Return a view of the topmost cache block (on this node) of the given matrix C. This is not necessarily square, though it must have at least as many rows as columns. For a view of the first C.ncols() rows of that block, which methods like Tsqr::apply() need, do the following:

 MatrixViewType top = this->top_block (C, contig);
 MatView<Ordinal, Scalar> square (ncols, ncols, top.get(), top.lda());

Models for MatrixViewType are MatView and ConstMatView. MatrixViewType must have member functions nrows(), ncols(), get(), and lda(), and its constructor must take the same four arguments as the constructor of ConstMatView.
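
As an illustration of these requirements, here is a minimal sketch of a type that satisfies them, together with a generic helper that builds the square view from the snippet above. SimpleMatView and top_square are hypothetical names, not part of the library; the only assumption is the four-argument constructor order (rows, columns, pointer, leading dimension) shown in that snippet.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal model of MatrixViewType: a non-owning view of a
// column-major matrix. It is not the library's MatView.
template <class Ordinal, class Scalar>
class SimpleMatView {
public:
  SimpleMatView(Ordinal nrows, Ordinal ncols, Scalar* data, Ordinal lda)
      : nrows_(nrows), ncols_(ncols), data_(data), lda_(lda) {
    assert(lda >= nrows);
  }
  Ordinal nrows() const { return nrows_; }
  Ordinal ncols() const { return ncols_; }
  Scalar* get() const { return data_; }
  Ordinal lda() const { return lda_; }

private:
  Ordinal nrows_, ncols_;
  Scalar* data_;
  Ordinal lda_;
};

// View of the first ncols() rows of a top block. Works for any type with
// the four member functions and the four-argument constructor.
template <class View>
View top_square(const View& top) {
  assert(top.nrows() >= top.ncols());
  return View(top.ncols(), top.ncols(), top.get(), top.lda());
}
```

The square view shares storage with the original block and keeps its leading dimension, so element (i, j) is still found at get()[i + j*lda()].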

Definition at line 356 of file Tsqr_NodeTsqr.hpp.

LocalOrdinal TSQR::NodeTsqr< LocalOrdinal , Scalar, std::vector< std::vector< Scalar > > >::reveal_R_rank ( const LocalOrdinal  ncols,
Scalar  R[],
const LocalOrdinal  ldr,
Scalar  U[],
const LocalOrdinal  ldu,
const typename Teuchos::ScalarTraits< Scalar >::magnitudeType  tol 
) const [inherited]

Reveal rank of TSQR's R factor.

Compute the singular value decomposition (SVD) $R = U \Sigma V^*$. This is done not in place, so that the original R is not affected. Use the resulting singular values to compute the numerical rank of R, with respect to the relative tolerance tol. If R is full rank, return without modifying R. If R is not full rank, overwrite R with $\Sigma \cdot V^*$.

Parameters:
ncols[in] Number of (rows and) columns in R.
R[in/out] ncols x ncols upper triangular matrix, stored in column-major order with leading dimension ldr.
ldr[in] Leading dimension of the matrix R.
U[out] Left singular vectors of the matrix R; an ncols x ncols matrix with leading dimension ldu.
ldu[in] Leading dimension of the matrix U.
tol[in] Numerical rank tolerance; relative to the largest nonzero singular value of R.
Returns:
Numerical rank of R: 0 <= rank <= ncols.
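
The rank decision itself can be sketched independently of the SVD. The helper below, numerical_rank, is a hypothetical name. It counts the singular values that exceed tol times the largest one; the use of a strict comparison is an assumption, since the text above does not say how ties are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Numerical rank from a list of (nonnegative) singular values: the number
// of values strictly greater than tol times the largest singular value.
std::size_t numerical_rank(const std::vector<double>& sigma, double tol) {
  assert(tol >= 0.0);
  if (sigma.empty()) return 0;
  const double sigma_max = *std::max_element(sigma.begin(), sigma.end());
  if (sigma_max <= 0.0) return 0;  // zero matrix has rank 0
  std::size_t rank = 0;
  for (double s : sigma)
    if (s > tol * sigma_max) ++rank;
  return rank;
}
```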

LocalOrdinal TSQR::NodeTsqr< LocalOrdinal , Scalar, std::vector< std::vector< Scalar > > >::reveal_rank ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  Q[],
const LocalOrdinal  ldq,
Scalar  R[],
const LocalOrdinal  ldr,
const typename Teuchos::ScalarTraits< Scalar >::magnitudeType  tol,
const bool  contiguousCacheBlocks 
) const [inherited]

Compute rank-revealing decomposition.

Using the R factor from factor() and the explicit Q factor from explicit_Q(), compute the SVD of R ( $R = U \Sigma V^*$). If R is full rank (with respect to the given relative tolerance tol), don't change Q or R. Otherwise, compute $Q := Q \cdot U$ and $R := \Sigma V^*$ in place (the latter may no longer be upper triangular).

Returns:
Rank $r$ of R: $ 0 \leq r \leq ncols$.
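
The in-place update Q := Q*U used in the rank-deficient case (the same kind of update that Q_times_B() performs) can be sketched for a plain column-major Q. The function q_times_u_in_place is a hypothetical name and the row-at-a-time approach is just one simple way to do it; the library may organize the work by cache block instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Overwrite the nrows x ncols column-major matrix Q (leading dimension ldq)
// with Q*U, where U is ncols x ncols, column-major, leading dimension ldu.
// Each row of the product depends only on the same row of Q, so one
// temporary row is enough to update Q in place.
void q_times_u_in_place(std::size_t nrows, std::size_t ncols,
                        double Q[], std::size_t ldq,
                        const double U[], std::size_t ldu) {
  assert(ldq >= nrows && ldu >= ncols);
  std::vector<double> row(ncols);
  for (std::size_t i = 0; i < nrows; ++i) {
    for (std::size_t j = 0; j < ncols; ++j) {
      double sum = 0.0;
      for (std::size_t k = 0; k < ncols; ++k)
        sum += Q[i + k * ldq] * U[k + j * ldu];
      row[j] = sum;
    }
    for (std::size_t j = 0; j < ncols; ++j) Q[i + j * ldq] = row[j];
  }
}
```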

The documentation for this class was generated from the following file:
 Tsqr_SequentialTsqr.hpp