Kokkos Node API and Local Linear Algebra Kernels Version of the Day
TSQR Namespace Reference

Implementation of the Tall Skinny QR (TSQR) factorization.

Namespaces

namespace  Test
 Accuracy and performance tests for TSQR.


Classes

class  ApplyType
 NoTranspose, Transpose, or ConjugateTranspose.
class  BLAS
 Wrappers for BLAS routines used by the Tall Skinny QR factorization.
class  CacheBlocker
 Break a tall skinny matrix by rows into cache blocks.
class  CacheBlockRangeIterator
 Bidirectional iterator over a contiguous range of cache blocks.
class  CacheBlockRange
 Collection of cache blocks with a contiguous range of indices.
class  CacheBlockingStrategy
 Tells CacheBlocker how to block up a tall skinny matrix.
class  Combine
 TSQR's six computational kernels.
class  CombineDefault
 Default copy-in, copy-out implementation of TSQR::Combine.
class  CombineFortran
 Interface to Fortran 9x back end of TSQR::Combine.
class  CombineNative
 Interface to C++ back end of TSQR::Combine.
class  CombineNative< Ordinal, Scalar, false >
class  CombineNative< Ordinal, Scalar, true >
class  KokkosNodeTsqrFactorOutput
 Part of KokkosNodeTsqr's implicit Q representation.
class  KokkosNodeTsqr
 Intranode TSQR parallelized using the Kokkos Node API.
class  Matrix
 A column-oriented dense matrix.
class  MatView
class  ConstMatView
class  NodeTsqr
 Common interface and functionality for intranode TSQR.
class  NodeTsqrFactory
 Factory for creating an instance of the right NodeTsqr subclass.
class  ScalarTraits
 Map from Scalar type to its arithmetic properties.
class  SequentialCholeskyQR
 Cache-blocked sequential implementation of CholeskyQR.
class  SequentialTsqr
 Sequential cache-blocked TSQR factorization.
class  StatTimeMonitor
 Like Teuchos::TimeMonitor, but collects running stats.
class  TimeStats
 Collect running statistics.
class  TrivialTimer
 Satisfies TimerType concept trivially.
class  ScalarPrinter
 Print a Scalar value to the given output stream.
class  Tsqr
 Parallel Tall Skinny QR (TSQR) factorization.
class  DistTsqr
 Internode part of TSQR.
class  DistTsqrHelper
 Implementation of the internode part of TSQR.
class  DistTsqrRB
 Reduce-and-Broadcast (RB) version of DistTsqr.
class  GlobalSummer
class  MessengerBase
class  MGS
 Distributed-memory parallel implementation of Modified Gram-Schmidt.
class  RMessenger
 Send, receive, and broadcast square R factors.
class  TeuchosMessenger
 Communication object for TSQR.
class  TrivialMessenger
 Noncommunicating "communication" object for TSQR.

Functions

template<class Ordinal , class Scalar >
std::vector< typename ScalarTraits< Scalar >::magnitude_type > local_verify (const Ordinal nrows, const Ordinal ncols, const Scalar *const A, const Ordinal lda, const Scalar *const Q, const Ordinal ldq, const Scalar *const R, const Ordinal ldr)
TimeStats globalTimeStats (const Teuchos::RCP< MessengerBase< double > > &comm, const TimeStats &localStats)
template<class MatrixViewType , class ConstMatrixViewType >
void scatterStack (const ConstMatrixViewType &R_stack, MatrixViewType &R_local, const Teuchos::RCP< MessengerBase< typename MatrixViewType::scalar_type > > &messenger)
 Distribute a stack of R factors.

Detailed Description

Implementation of the Tall Skinny QR (TSQR) factorization.

This namespace contains a full hybrid-parallel (MPI + Kokkos) implementation of the Tall Skinny QR (TSQR) factorization. The following paper describes the implementation:

Mark Hoemmen. "A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method." IEEE International Parallel and Distributed Processing Symposium (IPDPS), April 2011.

For further details, see the following:

Marghoob Mohiyuddin, Mark Hoemmen, James Demmel, and Kathy Yelick. "Minimizing Communication in Sparse Matrix Solvers." In Proceedings of Supercomputing 2009, November 2009.

James Demmel, Laura Grigori, Mark Frederick Hoemmen, and Julien Langou. "Communication-optimal parallel and sequential QR and LU factorizations." Technical report, UCB/EECS-2008-89, August 2008.


Function Documentation

template<class Ordinal , class Scalar >
std::vector< typename ScalarTraits< Scalar >::magnitude_type >
TSQR::local_verify (const Ordinal nrows, const Ordinal ncols,
                    const Scalar *const A, const Ordinal lda,
                    const Scalar *const Q, const Ordinal ldq,
                    const Scalar *const R, const Ordinal ldr)

Test the accuracy of the computed QR factorization of the matrix A.

Parameters:
nrows [in] Number of rows in the A and Q matrices; nrows >= ncols >= 1
ncols [in] Number of columns in the A, Q, and R matrices; nrows >= ncols >= 1
A [in] Column-oriented nrows by ncols matrix with leading dimension lda
lda [in] Leading dimension of the matrix A; lda >= nrows
Q [in] Column-oriented nrows by ncols matrix with leading dimension ldq; computed Q factor of A
ldq [in] Leading dimension of the matrix Q; ldq >= nrows
R [in] Column-oriented upper triangular ncols by ncols matrix with leading dimension ldr; computed R factor of A
ldr [in] Leading dimension of the matrix R; ldr >= ncols
Returns:
A vector of three norms, in this order: $\| A - Q R \|_F$, $\| I - Q^* Q \|_F$, and $\|A\|_F$. The first is the residual of the QR factorization, the second measures how far the computed Q factor is from having orthonormal columns, and the third is the scaling factor needed for the relative residual $\| A - Q R \|_F / \|A\|_F$. All three are Frobenius norms (the square root of the sum of the squared magnitudes of the matrix entries).
Note:
The return type uses the elaborate "magnitude_type" construction because this function returns norms, and norms are always real-valued, even when Scalar is complex. We could simply return a Scalar with its imaginary part set to zero, but it seems more sensible to enforce the norm's real-valuedness in the type system. Besides, one could imagine more elaborate Scalar types (like rational functions, which do form a field) that have different plausible definitions of magnitude; this is not just a problem for complex numbers (which are isomorphic to pairs of real numbers).

Definition at line 312 of file Tsqr_LocalVerify.hpp.

TimeStats TSQR::globalTimeStats (const Teuchos::RCP< MessengerBase< double > > & comm,
                                 const TimeStats & localStats)

Produce global time statistics out of all the local ones.

Parameters:
comm [in] Encapsulation of the interprocess communicator
localStats [in] Local (to this process) time statistics
Returns:
Global (over all processes) time statistics

Definition at line 53 of file Tsqr_GlobalTimeStats.cpp.

template<class MatrixViewType , class ConstMatrixViewType >
void TSQR::scatterStack (const ConstMatrixViewType & R_stack,
                         MatrixViewType & R_local,
                         const Teuchos::RCP< MessengerBase< typename MatrixViewType::scalar_type > > & messenger)

Distribute a stack of R factors.

Parameters:
R_stack [in] nprocs*ncols by ncols stack of square upper triangular matrices. The whole stack is stored in column-major order.
R_local [out] ncols by ncols upper triangular matrix, stored in column-major order (in unpacked form).
messenger [in/out] Object that handles communication

Definition at line 190 of file Tsqr_RMessenger.hpp.
