Kokkos Node API and Local Linear Algebra Kernels Version of the Day
Public Types | Public Member Functions | Static Public Member Functions
TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType > Class Template Reference

Intranode TSQR, parallelized with Intel TBB. More...

#include <TbbTsqr.hpp>

Inheritance diagram for TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >:
Inheritance graph
[legend]

List of all members.

Public Types

typedef TbbParallelTsqr
< LocalOrdinal, Scalar,
TimerType >::FactorOutput 
FactorOutput
 Type of partial output of TBB TSQR.

Public Member Functions

size_t ncores () const
 (Max) number of cores used for the factorization.
size_t cache_size_hint () const
 Cache size hint (in bytes) used for the factorization.
size_t TEUCHOS_DEPRECATED cache_block_size () const
 Cache size hint (in bytes) used for the factorization.
 TbbTsqr (const size_t numCores, const size_t cacheSizeHint=0)
 Constructor; sets up tuning parameters.
std::string description () const
 One-line description of this object.
FactorOutput factor (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar A[], const LocalOrdinal lda, Scalar R[], const LocalOrdinal ldr, const bool contiguous_cache_blocks) const
 Compute QR factorization of the dense matrix A.
void apply (const ApplyType &apply_type, const LocalOrdinal nrows, const LocalOrdinal ncols_Q, const Scalar Q[], const LocalOrdinal ldq, const FactorOutput &factor_output, const LocalOrdinal ncols_C, Scalar C[], const LocalOrdinal ldc, const bool contiguous_cache_blocks) const
 Apply Q factor to the global dense matrix C.
void explicit_Q (const LocalOrdinal nrows, const LocalOrdinal ncols_Q_in, const Scalar Q_in[], const LocalOrdinal ldq_in, const FactorOutput &factor_output, const LocalOrdinal ncols_Q_out, Scalar Q_out[], const LocalOrdinal ldq_out, const bool contiguous_cache_blocks) const
 Compute the explicit Q factor from factor()
void Q_times_B (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, const Scalar B[], const LocalOrdinal ldb, const bool contiguous_cache_blocks) const
 Compute Q*B.
LocalOrdinal reveal_R_rank (const LocalOrdinal ncols, Scalar R[], const LocalOrdinal ldr, Scalar U[], const LocalOrdinal ldu, const magnitude_type tol) const
LocalOrdinal reveal_rank (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, Scalar R[], const LocalOrdinal ldr, const magnitude_type tol, const bool contiguous_cache_blocks) const
 Rank-revealing decomposition.

Static Public Member Functions

static bool QR_produces_R_factor_with_nonnegative_diagonal ()

Detailed Description

template<class LocalOrdinal, class Scalar, class TimerType = Teuchos::Time>
class TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >

Intranode TSQR, parallelized with Intel TBB.

TSQR factorization for a dense, tall and skinny matrix stored on a single node. Parallelized using Intel's Threading Building Blocks.

Note:
TSQR only needs to know about the local ordinal type (LocalOrdinal), not about the global ordinal type. TimerType may be any class with the same interface as TrivialTimer; it times the divide-and-conquer base cases (the operations on each CPU core within the thread-parallel implementation).

Definition at line 65 of file TbbTsqr.hpp.


Member Typedef Documentation

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::FactorOutput

Type of partial output of TBB TSQR.

If you don't have TBB available, you can test this class by substituting in "typename TbbRecursiveTsqr<LocalOrdinal, Scalar>::FactorOutput" for the typedef's definition. If you do this, you should also change the type of impl_ above.

Definition at line 96 of file TbbTsqr.hpp.


Constructor & Destructor Documentation

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::TbbTsqr ( const size_t  numCores,
const size_t  cacheSizeHint = 0 
) [inline]

Constructor; sets up tuning parameters.

Parameters:
numCores[in] Maximum number of processing cores to use when factoring the matrix. Fewer cores may be used if the matrix is not big enough to justify their use.
cacheSizeHint[in] Cache block size hint (in bytes) to use in the sequential part of TSQR. If zero or not specified, a reasonable default is used. If each CPU core has a private cache, that cache's size (minus a little wiggle room) would be the appropriate value for this parameter. Set to zero for the implementation to choose a reasonable default.

Definition at line 125 of file TbbTsqr.hpp.


Member Function Documentation

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
size_t TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::ncores ( ) const [inline]

(Max) number of cores used for the factorization.

Definition at line 99 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
size_t TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::cache_size_hint ( ) const [inline]

Cache size hint (in bytes) used for the factorization.

Definition at line 102 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
size_t TEUCHOS_DEPRECATED TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::cache_block_size ( ) const [inline]

Cache size hint (in bytes) used for the factorization.

This method is deprecated, because the name is misleading. Please call cache_size_hint() instead.

Definition at line 108 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
static bool TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::QR_produces_R_factor_with_nonnegative_diagonal ( ) [inline, static]

Whether or not this QR factorization produces an R factor with all nonnegative diagonal entries.

Definition at line 137 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
std::string TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::description ( ) const [inline, virtual]

One-line description of this object.

This implements Teuchos::Describable::description(). For now, SequentialTsqr uses the default implementation of Teuchos::Describable::describe().

Reimplemented from Teuchos::Describable.

Definition at line 147 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
FactorOutput TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::factor ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  A[],
const LocalOrdinal  lda,
Scalar  R[],
const LocalOrdinal  ldr,
const bool  contiguous_cache_blocks 
) const [inline]

Compute QR factorization of the dense matrix A.

Compute the QR factorization of the dense matrix A.

Parameters:
nrows[in] Number of rows of A. Precondition: nrows >= ncols.
ncols[in] Number of columns of A. Precondition: nrows >= ncols.
A[in,out] On input, the matrix to factor, stored as a general dense matrix in column-major order. On output, overwritten with an implicit representation of the Q factor.
lda[in] Leading dimension of A. Precondition: lda >= nrows.
R[out] The final R factor of the QR factorization of the matrix A. An ncols by ncols upper triangular matrix stored in column-major order, with leading dimension ldr.
ldr[in] Leading dimension of the matrix R.
b_contiguous_cache_blocks[in] Whether cache blocks are stored contiguously in the input matrix A and the output matrix Q (of explicit_Q()). If not and you want them to be, you should use the cache_block() method to copy them into that format. You may use the un_cache_block() method to copy them out of that format into the usual column-oriented format.
Returns:
FactorOutput struct, which together with the data in A form an implicit representation of the Q factor. They should be passed into the apply() and explicit_Q() functions as the "factor_output" parameter.

Definition at line 240 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
void TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::apply ( const ApplyType apply_type,
const LocalOrdinal  nrows,
const LocalOrdinal  ncols_Q,
const Scalar  Q[],
const LocalOrdinal  ldq,
const FactorOutput factor_output,
const LocalOrdinal  ncols_C,
Scalar  C[],
const LocalOrdinal  ldc,
const bool  contiguous_cache_blocks 
) const [inline]

Apply Q factor to the global dense matrix C.

Apply the Q factor (computed by factor() and represented implicitly) to the dense matrix C.

Parameters:
apply_type[in] Whether to compute Q*C, Q^T * C, or Q^H * C.
nrows[in] Number of rows of the matrix C and the matrix Q. Precondition: nrows >= ncols_Q, ncols_C.
ncols_Q[in] Number of columns of Q
Q[in] Same as the "A" output of factor()
ldq[in] Same as the "lda" input of factor()
factor_output[in] Return value of factor()
ncols_C[in] Number of columns in C. Precondition: nrows_local >= ncols_C.
C[in,out] On input, the matrix C, stored as a general dense matrix in column-major order. On output, overwritten with op(Q)*C, where op(Q) = Q or Q^T.
ldc[in] Leading dimension of C. Precondition: ldc_local >= nrows_local. Not applicable if C is cache-blocked in place.
contiguous_cache_blocks[in] Whether or not cache blocks of Q and C are stored contiguously (default: false).

Definition at line 287 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
void TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::explicit_Q ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols_Q_in,
const Scalar  Q_in[],
const LocalOrdinal  ldq_in,
const FactorOutput factor_output,
const LocalOrdinal  ncols_Q_out,
Scalar  Q_out[],
const LocalOrdinal  ldq_out,
const bool  contiguous_cache_blocks 
) const [inline]

Compute the explicit Q factor from factor()

Compute the explicit version of the Q factor computed by factor() and represented implicitly (via Q_in and factor_output).

Parameters:
nrows[in] Number of rows of the matrix Q_in. Also, the number of rows of the output matrix Q_out. Precondition: nrows >= ncols_Q_in.
ncols_Q_in[in] Number of columns in the original matrix A, whose explicit Q factor we are computing. Precondition: nrows >= ncols_Q_in.
Q_local_in[in] Same as A output of factor().
ldq_local_in[in] Same as lda input of factor()
ncols_Q_out[in] Number of columns of the explicit Q factor to compute.
Q_out[out] The explicit representation of the Q factor.
ldq_out[in] Leading dimension of Q_out.
factor_output[in] Return value of factor().

Definition at line 331 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
void TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::Q_times_B ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  Q[],
const LocalOrdinal  ldq,
const Scalar  B[],
const LocalOrdinal  ldb,
const bool  contiguous_cache_blocks 
) const [inline]

Compute Q*B.

Compute matrix-matrix product Q*B, where Q is nrows by ncols and B is ncols by ncols. Respect cache blocks of Q.

Definition at line 352 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
LocalOrdinal TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::reveal_R_rank ( const LocalOrdinal  ncols,
Scalar  R[],
const LocalOrdinal  ldr,
Scalar  U[],
const LocalOrdinal  ldu,
const magnitude_type  tol 
) const [inline]

Compute SVD $R = U \Sigma V^*$, not in place. Use the resulting singular values to compute the numerical rank of R, with respect to the relative tolerance tol. If R is full rank, return without modifying R. If R is not full rank, overwrite R with $\Sigma \cdot V^*$.

Returns:
Numerical rank of R: 0 <= rank <= ncols.

Definition at line 371 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
LocalOrdinal TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::reveal_rank ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  Q[],
const LocalOrdinal  ldq,
Scalar  R[],
const LocalOrdinal  ldr,
const magnitude_type  tol,
const bool  contiguous_cache_blocks 
) const [inline]

Rank-revealing decomposition.

Using the R factor from factor() and the explicit Q factor from explicit_Q(), compute the SVD of R ( $R = U \Sigma V^*$). R. If R is full rank (with respect to the given relative tolerance tol), don't change Q or R. Otherwise, compute $Q := Q \cdot U$ and $R := \Sigma V^*$ in place (the latter may be no longer upper triangular).

Returns:
Rank $r$ of R: $ 0 \leq r \leq ncols$.

Definition at line 393 of file TbbTsqr.hpp.


The documentation for this class was generated from the following file:
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends