Kokkos Node API and Local Linear Algebra Kernels Version of the Day
Public Types | Public Member Functions | Static Public Member Functions
TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType > Class Template Reference

Intranode TSQR, parallelized with Intel TBB. More...

#include <TbbTsqr.hpp>

Inheritance diagram for TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >:
Inheritance graph
[legend]

List of all members.

Public Types

typedef TbbParallelTsqr
< LocalOrdinal, Scalar,
TimerType >::FactorOutput 
FactorOutput
 Type of partial output of TBB TSQR.

Public Member Functions

 TbbTsqr (const size_t numCores, const size_t cacheSizeHint=0)
 Constructor.
 TbbTsqr (const Teuchos::RCP< Teuchos::ParameterList > &plist)
 Constructor (that takes a parameter list).
 TbbTsqr ()
 Constructor (that uses default parameters).
size_t ntasks () const
 Number of tasks that TSQR will use to solve the problem.
size_t TEUCHOS_DEPRECATED ncores () const
 Number of tasks that TSQR will use to solve the problem.
size_t cache_size_hint () const
 Cache size hint (in bytes) used for the factorization.
size_t TEUCHOS_DEPRECATED cache_block_size () const
 Cache size hint (in bytes) used for the factorization.
bool ready () const
 Whether this object is ready to perform computations.
std::string description () const
 One-line description of this object.
FactorOutput factor (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar A[], const LocalOrdinal lda, Scalar R[], const LocalOrdinal ldr, const bool contiguous_cache_blocks) const
 Compute QR factorization of the dense matrix A.
void apply (const ApplyType &apply_type, const LocalOrdinal nrows, const LocalOrdinal ncols_Q, const Scalar Q[], const LocalOrdinal ldq, const FactorOutput &factor_output, const LocalOrdinal ncols_C, Scalar C[], const LocalOrdinal ldc, const bool contiguous_cache_blocks) const
 Apply Q factor to the global dense matrix C.
void explicit_Q (const LocalOrdinal nrows, const LocalOrdinal ncols_Q_in, const Scalar Q_in[], const LocalOrdinal ldq_in, const FactorOutput &factor_output, const LocalOrdinal ncols_Q_out, Scalar Q_out[], const LocalOrdinal ldq_out, const bool contiguous_cache_blocks) const
 Compute the explicit Q factor from factor()
void Q_times_B (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, const Scalar B[], const LocalOrdinal ldb, const bool contiguous_cache_blocks) const
 Compute Q*B.
LocalOrdinal reveal_R_rank (const LocalOrdinal ncols, Scalar R[], const LocalOrdinal ldr, Scalar U[], const LocalOrdinal ldu, const magnitude_type tol) const
LocalOrdinal reveal_rank (const LocalOrdinal nrows, const LocalOrdinal ncols, Scalar Q[], const LocalOrdinal ldq, Scalar R[], const LocalOrdinal ldr, const magnitude_type tol, const bool contiguous_cache_blocks) const
 Rank-revealing decomposition.

Static Public Member Functions

static bool QR_produces_R_factor_with_nonnegative_diagonal ()

Detailed Description

template<class LocalOrdinal, class Scalar, class TimerType = Teuchos::Time>
class TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >

Intranode TSQR, parallelized with Intel TBB.

TSQR factorization for a dense, tall and skinny matrix stored on a single node. Parallelized using Intel's Threading Building Blocks.

Note:
TSQR only needs to know about the local ordinal type (LocalOrdinal), not about the global ordinal type. TimerType may be any class with the same interface as TrivialTimer; it times the divide-and-conquer base cases (the operations on each CPU core within the thread-parallel implementation).

Definition at line 78 of file TbbTsqr.hpp.


Member Typedef Documentation

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::FactorOutput

Type of partial output of TBB TSQR.

If you don't have TBB available, you can test this class by substituting in "typename TbbRecursiveTsqr<LocalOrdinal, Scalar>::FactorOutput" for the typedef's definition. If you do this, you should also change the type of impl_ above.

Definition at line 122 of file TbbTsqr.hpp.


Constructor & Destructor Documentation

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::TbbTsqr ( const size_t  numCores,
const size_t  cacheSizeHint = 0 
) [inline]

Constructor.

Parameters:
numCores[in] Maximum number of processing cores to use when factoring the matrix. Fewer cores may be used if the matrix is not big enough to justify their use.
cacheSizeHint[in] Cache block size hint (in bytes) to use in the sequential part of TSQR. If zero or not specified, a reasonable default is used. If each CPU core has a private cache, that cache's size (minus a little wiggle room) would be the appropriate value for this parameter. Set to zero for the implementation to choose a reasonable default.

Definition at line 137 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::TbbTsqr ( const Teuchos::RCP< Teuchos::ParameterList > &  plist) [inline]

Constructor (that takes a parameter list).

Parameters:
plist[in/out] On input: list of TbbTsqr parameters. On output: missing parameters are filled in with default values.

For a list of accepted parameters and thei documentation, see the parameter list returned by getValidParameters().

Definition at line 155 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::TbbTsqr ( ) [inline]

Constructor (that uses default parameters).

Parameters:
plist[in/out] On input: list of TbbTsqr parameters. On output: missing parameters are filled in with default values.

For a list of accepted parameters and thei documentation, see the parameter list returned by getValidParameters().

Definition at line 172 of file TbbTsqr.hpp.


Member Function Documentation

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
size_t TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::ntasks ( ) const [inline]

Number of tasks that TSQR will use to solve the problem.

This is the number of subproblems into which to divide the main problem, in order to solve it in parallel.

Definition at line 197 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
size_t TEUCHOS_DEPRECATED TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::ncores ( ) const [inline]

Number of tasks that TSQR will use to solve the problem.

This is the number of subproblems into which to divide the main problem, in order to solve it in parallel.

This method is deprecated, because the name is misleading. Please call ntasks() instead.

Definition at line 206 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
size_t TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::cache_size_hint ( ) const [inline]

Cache size hint (in bytes) used for the factorization.

Definition at line 209 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
size_t TEUCHOS_DEPRECATED TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::cache_block_size ( ) const [inline]

Cache size hint (in bytes) used for the factorization.

This method is deprecated, because the name is misleading. Please call cache_size_hint() instead.

Definition at line 215 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
static bool TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::QR_produces_R_factor_with_nonnegative_diagonal ( ) [inline, static]

Whether or not this QR factorization produces an R factor with all nonnegative diagonal entries.

Definition at line 221 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
bool TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::ready ( ) const [inline]

Whether this object is ready to perform computations.

Definition at line 227 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
std::string TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::description ( ) const [inline, virtual]

One-line description of this object.

This implements Teuchos::Describable::description(). For now, SequentialTsqr uses the default implementation of Teuchos::Describable::describe().

Reimplemented from Teuchos::Describable.

Definition at line 236 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
FactorOutput TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::factor ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  A[],
const LocalOrdinal  lda,
Scalar  R[],
const LocalOrdinal  ldr,
const bool  contiguous_cache_blocks 
) const [inline]

Compute QR factorization of the dense matrix A.

Compute the QR factorization of the dense matrix A.

Parameters:
nrows[in] Number of rows of A. Precondition: nrows >= ncols.
ncols[in] Number of columns of A. Precondition: nrows >= ncols.
A[in,out] On input, the matrix to factor, stored as a general dense matrix in column-major order. On output, overwritten with an implicit representation of the Q factor.
lda[in] Leading dimension of A. Precondition: lda >= nrows.
R[out] The final R factor of the QR factorization of the matrix A. An ncols by ncols upper triangular matrix stored in column-major order, with leading dimension ldr.
ldr[in] Leading dimension of the matrix R.
b_contiguous_cache_blocks[in] Whether cache blocks are stored contiguously in the input matrix A and the output matrix Q (of explicit_Q()). If not and you want them to be, you should use the cache_block() method to copy them into that format. You may use the un_cache_block() method to copy them out of that format into the usual column-oriented format.
Returns:
FactorOutput struct, which together with the data in A form an implicit representation of the Q factor. They should be passed into the apply() and explicit_Q() functions as the "factor_output" parameter.

Definition at line 329 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
void TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::apply ( const ApplyType apply_type,
const LocalOrdinal  nrows,
const LocalOrdinal  ncols_Q,
const Scalar  Q[],
const LocalOrdinal  ldq,
const FactorOutput factor_output,
const LocalOrdinal  ncols_C,
Scalar  C[],
const LocalOrdinal  ldc,
const bool  contiguous_cache_blocks 
) const [inline]

Apply Q factor to the global dense matrix C.

Apply the Q factor (computed by factor() and represented implicitly) to the dense matrix C.

Parameters:
apply_type[in] Whether to compute Q*C, Q^T * C, or Q^H * C.
nrows[in] Number of rows of the matrix C and the matrix Q. Precondition: nrows >= ncols_Q, ncols_C.
ncols_Q[in] Number of columns of Q
Q[in] Same as the "A" output of factor()
ldq[in] Same as the "lda" input of factor()
factor_output[in] Return value of factor()
ncols_C[in] Number of columns in C. Precondition: nrows_local >= ncols_C.
C[in,out] On input, the matrix C, stored as a general dense matrix in column-major order. On output, overwritten with op(Q)*C, where op(Q) = Q or Q^T.
ldc[in] Leading dimension of C. Precondition: ldc_local >= nrows_local. Not applicable if C is cache-blocked in place.
contiguous_cache_blocks[in] Whether or not cache blocks of Q and C are stored contiguously (default: false).

Definition at line 376 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
void TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::explicit_Q ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols_Q_in,
const Scalar  Q_in[],
const LocalOrdinal  ldq_in,
const FactorOutput factor_output,
const LocalOrdinal  ncols_Q_out,
Scalar  Q_out[],
const LocalOrdinal  ldq_out,
const bool  contiguous_cache_blocks 
) const [inline]

Compute the explicit Q factor from factor()

Compute the explicit version of the Q factor computed by factor() and represented implicitly (via Q_in and factor_output).

Parameters:
nrows[in] Number of rows of the matrix Q_in. Also, the number of rows of the output matrix Q_out. Precondition: nrows >= ncols_Q_in.
ncols_Q_in[in] Number of columns in the original matrix A, whose explicit Q factor we are computing. Precondition: nrows >= ncols_Q_in.
Q_local_in[in] Same as A output of factor().
ldq_local_in[in] Same as lda input of factor()
ncols_Q_out[in] Number of columns of the explicit Q factor to compute.
Q_out[out] The explicit representation of the Q factor.
ldq_out[in] Leading dimension of Q_out.
factor_output[in] Return value of factor().

Definition at line 420 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
void TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::Q_times_B ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  Q[],
const LocalOrdinal  ldq,
const Scalar  B[],
const LocalOrdinal  ldb,
const bool  contiguous_cache_blocks 
) const [inline]

Compute Q*B.

Compute matrix-matrix product Q*B, where Q is nrows by ncols and B is ncols by ncols. Respect cache blocks of Q.

Definition at line 441 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
LocalOrdinal TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::reveal_R_rank ( const LocalOrdinal  ncols,
Scalar  R[],
const LocalOrdinal  ldr,
Scalar  U[],
const LocalOrdinal  ldu,
const magnitude_type  tol 
) const [inline]

Compute SVD $R = U \Sigma V^*$, not in place. Use the resulting singular values to compute the numerical rank of R, with respect to the relative tolerance tol. If R is full rank, return without modifying R. If R is not full rank, overwrite R with $\Sigma \cdot V^*$.

Returns:
Numerical rank of R: 0 <= rank <= ncols.

Definition at line 460 of file TbbTsqr.hpp.

template<class LocalOrdinal , class Scalar , class TimerType = Teuchos::Time>
LocalOrdinal TSQR::TBB::TbbTsqr< LocalOrdinal, Scalar, TimerType >::reveal_rank ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
Scalar  Q[],
const LocalOrdinal  ldq,
Scalar  R[],
const LocalOrdinal  ldr,
const magnitude_type  tol,
const bool  contiguous_cache_blocks 
) const [inline]

Rank-revealing decomposition.

Using the R factor from factor() and the explicit Q factor from explicit_Q(), compute the SVD of R ( $R = U \Sigma V^*$). R. If R is full rank (with respect to the given relative tolerance tol), don't change Q or R. Otherwise, compute $Q := Q \cdot U$ and $R := \Sigma V^*$ in place (the latter may be no longer upper triangular).

Returns:
Rank $r$ of R: $ 0 \leq r \leq ncols$.

Definition at line 482 of file TbbTsqr.hpp.


The documentation for this class was generated from the following file:
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends