Kokkos Node API and Local Linear Algebra Kernels Version of the Day
Implementation of the Tall Skinny QR (TSQR) factorization.
Namespaces | |
| namespace | Test |
| Accuracy and performance tests for TSQR. | |
Classes | |
| class | ApplyType |
| NoTranspose, Transpose, or ConjugateTranspose. | |
| class | BLAS |
| Wrappers for BLAS routines used by the Tall Skinny QR factorization. | |
| class | CacheBlocker |
| Break a tall skinny matrix by rows into cache blocks. | |
| class | CacheBlockRangeIterator |
| Bidirectional iterator over a contiguous range of cache blocks. | |
| class | CacheBlockRange |
| Collection of cache blocks with a contiguous range of indices. | |
| class | CacheBlockingStrategy |
| Tells CacheBlocker how to block up a tall skinny matrix. | |
| class | Combine |
| TSQR's six computational kernels. | |
| class | CombineDefault |
| Default copy-in, copy-out implementation of TSQR::Combine. | |
| class | CombineFortran |
| Interface to Fortran 9x back end of TSQR::Combine. | |
| class | CombineNative |
| Interface to C++ back end of TSQR::Combine. | |
| class | CombineNative< Ordinal, Scalar, false > |
| class | CombineNative< Ordinal, Scalar, true > |
| class | KokkosNodeTsqrFactorOutput |
| Part of KokkosNodeTsqr's implicit Q representation. | |
| class | KokkosNodeTsqr |
| Intranode TSQR parallelized using the Kokkos Node API. | |
| class | Matrix |
| A column-oriented dense matrix. | |
| class | MatView |
| class | ConstMatView |
| class | NodeTsqr |
| Common interface and functionality for intranode TSQR. | |
| class | NodeTsqrFactory |
| Factory for creating an instance of the right NodeTsqr subclass. | |
| class | ScalarTraits |
| Map from Scalar type to its arithmetic properties. | |
| class | SequentialCholeskyQR |
| Cache-blocked sequential implementation of CholeskyQR. | |
| class | SequentialTsqr |
| Sequential cache-blocked TSQR factorization. | |
| class | StatTimeMonitor |
| Like Teuchos::TimeMonitor, but collects running stats. | |
| class | TimeStats |
| Collect running statistics. | |
| class | TrivialTimer |
| Satisfies TimerType concept trivially. | |
| class | ScalarPrinter |
| Print a Scalar value to the given output stream. | |
| class | Tsqr |
| Parallel Tall Skinny QR (TSQR) factorization. | |
| class | DistTsqr |
| Internode part of TSQR. | |
| class | DistTsqrHelper |
| Implementation of the internode part of TSQR. | |
| class | DistTsqrRB |
| Reduce-and-Broadcast (RB) version of DistTsqr. | |
| class | GlobalSummer |
| class | MessengerBase |
| class | MGS |
| Distributed-memory parallel implementation of Modified Gram-Schmidt. | |
| class | RMessenger |
| Send, receive, and broadcast square R factors. | |
| class | TeuchosMessenger |
| Communication object for TSQR. | |
| class | TrivialMessenger |
| Noncommunicating "communication" object for TSQR. | |
Functions | |
| template<class Ordinal , class Scalar > | |
| std::vector< typename ScalarTraits< Scalar > ::magnitude_type > | local_verify (const Ordinal nrows, const Ordinal ncols, const Scalar *const A, const Ordinal lda, const Scalar *const Q, const Ordinal ldq, const Scalar *const R, const Ordinal ldr) |
| TimeStats | globalTimeStats (const Teuchos::RCP< MessengerBase< double > > &comm, const TimeStats &localStats) |
| template<class MatrixViewType , class ConstMatrixViewType > | |
| void | scatterStack (const ConstMatrixViewType &R_stack, MatrixViewType &R_local, const Teuchos::RCP< MessengerBase< typename MatrixViewType::scalar_type > > &messenger) |
| Distribute a stack of R factors. | |
Implementation of the Tall Skinny QR (TSQR) factorization.
This namespace contains a full hybrid-parallel (MPI + Kokkos) implementation of the Tall Skinny QR (TSQR) factorization. The following paper describes the implementation:
Mark Hoemmen. "A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method." IEEE International Parallel and Distributed Processing Symposium (IPDPS), April 2011.
For further details, see the following:
Marghoob Mohiyuddin, Mark Hoemmen, James Demmel, and Kathy Yelick. "Minimizing Communication in Sparse Matrix Solvers." In Proceedings of Supercomputing 2009, November 2009.
James Demmel, Laura Grigori, Mark Frederick Hoemmen, and Julien Langou. "Communication-optimal parallel and sequential QR and LU factorizations." Technical report, UCB/EECS-2008-89, August 2008.
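For orientation, the basic TSQR reduction can be written out for a matrix split into two row blocks. This is the standard textbook formulation, given here as a sketch and not quoted from this namespace's sources; the symbols A_0, A_1, Q_{01} are notation introduced only for this illustration.

```latex
A = \begin{pmatrix} A_0 \\ A_1 \end{pmatrix},
\qquad
A_0 = Q_0 R_0, \quad A_1 = Q_1 R_1,
\qquad
\begin{pmatrix} R_0 \\ R_1 \end{pmatrix} = Q_{01} R,
\qquad
A = \begin{pmatrix} Q_0 & 0 \\ 0 & Q_1 \end{pmatrix} Q_{01} R .
```

Each block is factored independently, and only the small ncols by ncols R factors need to be combined, which is what keeps communication low. The Q factor is typically kept in this implicit, factored form.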
| template<class Ordinal , class Scalar > | |
| std::vector< typename ScalarTraits< Scalar >::magnitude_type > | TSQR::local_verify (const Ordinal nrows, const Ordinal ncols, const Scalar *const A, const Ordinal lda, const Scalar *const Q, const Ordinal ldq, const Scalar *const R, const Ordinal ldr) |
Test the accuracy of the computed QR factorization of the matrix A.
| nrows | [in] Number of rows in the A and Q matrices; nrows >= ncols >= 1 |
| ncols | [in] Number of columns in the A, Q, and R matrices; nrows >= ncols >= 1 |
| A | [in] Column-oriented nrows by ncols matrix with leading dimension lda |
| lda | [in] Leading dimension of the matrix A; lda >= nrows |
| Q | [in] Column-oriented nrows by ncols matrix with leading dimension ldq; computed Q factor of A |
| ldq | [in] Leading dimension of the matrix Q; ldq >= nrows |
| R | [in] Column-oriented upper triangular ncols by ncols matrix with leading dimension ldr; computed R factor of A |
| ldr | [in] Leading dimension of the matrix R; ldr >= ncols |
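The parameter conventions above (column-major storage, separate leading dimensions) can be illustrated with a small accuracy check. The sketch below is not the code in Tsqr_LocalVerify.hpp: `check_qr` and `QrErrors` are hypothetical names, it handles only real `double` data, and the choice of metrics (residual, orthogonality, and the norm of A, all in the Frobenius norm) is an assumption about what a QR accuracy test would commonly report.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical result type (not part of TSQR): common QR accuracy metrics.
struct QrErrors {
  double residual;       // ||A - Q*R||_F
  double orthogonality;  // ||I - Q^T*Q||_F
  double normA;          // ||A||_F
};

// Hypothetical helper: all matrices are column-major, so element (i,j)
// of a matrix M with leading dimension ldm lives at M[i + j*ldm].
QrErrors check_qr(std::size_t nrows, std::size_t ncols,
                  const double* A, std::size_t lda,
                  const double* Q, std::size_t ldq,
                  const double* R, std::size_t ldr) {
  assert(nrows >= ncols && ncols >= 1);
  assert(lda >= nrows && ldq >= nrows && ldr >= ncols);

  double resid2 = 0.0, orth2 = 0.0, normA2 = 0.0;

  // Accumulate ||A - Q*R||_F^2 and ||A||_F^2.
  for (std::size_t j = 0; j < ncols; ++j) {
    for (std::size_t i = 0; i < nrows; ++i) {
      // R is upper triangular, so only rows k <= j of column j contribute.
      double qr = 0.0;
      for (std::size_t k = 0; k <= j; ++k) {
        qr += Q[i + k * ldq] * R[k + j * ldr];
      }
      const double a = A[i + j * lda];
      resid2 += (a - qr) * (a - qr);
      normA2 += a * a;
    }
  }

  // Accumulate ||I - Q^T*Q||_F^2 over the ncols by ncols Gram matrix.
  for (std::size_t j = 0; j < ncols; ++j) {
    for (std::size_t i = 0; i < ncols; ++i) {
      double dot = 0.0;
      for (std::size_t k = 0; k < nrows; ++k) {
        dot += Q[k + i * ldq] * Q[k + j * ldq];
      }
      const double d = (i == j ? 1.0 : 0.0) - dot;
      orth2 += d * d;
    }
  }

  return {std::sqrt(resid2), std::sqrt(orth2), std::sqrt(normA2)};
}
```

A caller would usually compare `residual / normA` and `orthogonality` against a small multiple of machine epsilon instead of testing for exact zeros.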
Definition at line 312 of file Tsqr_LocalVerify.hpp.
| TimeStats | TSQR::globalTimeStats (const Teuchos::RCP< MessengerBase< double > > &comm, const TimeStats &localStats) |
Produce global time statistics out of all the local ones.
| comm | [in] Encapsulation of the interprocess communicator |
| localStats | [in] Local (to this process) time statistics |
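As a rough picture of what combining per-process statistics involves, the sketch below merges a list of local records into one global record. It is an illustration only: this page does not show the fields of TimeStats, so `SimpleTimeStats` and `combine_stats` are hypothetical, and a plain loop over a vector stands in for the reductions that a MessengerBase would perform across processes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for TimeStats; the real class's fields are not
// documented on this page.
struct SimpleTimeStats {
  double min;    // shortest observed time
  double max;    // longest observed time
  double total;  // sum of all observed times
  int count;     // number of observations
};

// Hypothetical helper: merge one record per process into a global record.
// In a distributed run, each of these would be a min, max, or sum reduction.
SimpleTimeStats combine_stats(const std::vector<SimpleTimeStats>& perProcess) {
  assert(!perProcess.empty());
  SimpleTimeStats global = perProcess.front();
  for (std::size_t p = 1; p < perProcess.size(); ++p) {
    global.min = std::min(global.min, perProcess[p].min);
    global.max = std::max(global.max, perProcess[p].max);
    global.total += perProcess[p].total;
    global.count += perProcess[p].count;
  }
  return global;
}
```

The global mean follows as `total / count`, which is why the sketch carries the sum and the count instead of per-process means.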
Definition at line 53 of file Tsqr_GlobalTimeStats.cpp.
| template<class MatrixViewType , class ConstMatrixViewType > | |
| void | TSQR::scatterStack (const ConstMatrixViewType &R_stack, MatrixViewType &R_local, const Teuchos::RCP< MessengerBase< typename MatrixViewType::scalar_type > > &messenger) |
Distribute a stack of R factors.
| R_stack | [in] nprocs*ncols by ncols stack of square upper triangular matrices. The whole stack is stored in column-major order. |
| R_local | [out] ncols by ncols upper triangular matrix, stored in column-major order (in unpacked form). |
| messenger | [in/out] Object that handles communication |
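The storage layout described for R_stack and R_local can be made concrete with a little index arithmetic. The sketch below is not the scatterStack implementation and performs no communication: `extract_block` is a hypothetical helper that copies the block belonging to process p out of the stack, assuming the stack's leading dimension equals its row count nprocs*ncols.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy the p-th ncols by ncols upper triangular block
// out of a column-major (nprocs*ncols) by ncols stack of R factors.
std::vector<double> extract_block(const std::vector<double>& R_stack,
                                  std::size_t nprocs, std::size_t ncols,
                                  std::size_t p) {
  assert(p < nprocs);
  assert(R_stack.size() == nprocs * ncols * ncols);

  // Assumption: the stack is stored contiguously, so its leading
  // dimension is its number of rows.
  const std::size_t ldStack = nprocs * ncols;

  // Unpacked form: a full ncols by ncols array whose strictly lower
  // triangle is left as zeros.
  std::vector<double> R_local(ncols * ncols, 0.0);
  for (std::size_t j = 0; j < ncols; ++j) {
    for (std::size_t i = 0; i <= j; ++i) {
      // Row i of block p is row p*ncols + i of the whole stack.
      R_local[i + j * ncols] = R_stack[(p * ncols + i) + j * ldStack];
    }
  }
  return R_local;
}
```

Because the whole stack is column-major, each block is not contiguous in memory: its columns are separated by a stride of nprocs*ncols entries, which is why a per-block copy like this is needed before a block can be sent as a single buffer.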
Definition at line 190 of file Tsqr_RMessenger.hpp.