Kokkos Node API and Local Linear Algebra Kernels Version of the Day
Public Member Functions
TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar > Class Template Reference

Tells CacheBlocker how to block up a tall skinny matrix. More...

#include <Tsqr_CacheBlockingStrategy.hpp>

Inheritance diagram for TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >:
Inheritance graph
[legend]

List of all members.

Public Member Functions

 CacheBlockingStrategy (const size_t cacheSizeHint=0, const size_t sizeOfScalar=sizeof(Scalar))
 Constructor.
 CacheBlockingStrategy (const CacheBlockingStrategy &rhs)
 Copy constructor.
CacheBlockingStrategyoperator= (const CacheBlockingStrategy &rhs)
 Assignment operator.
size_t TEUCHOS_DEPRECATED cache_block_size () const
 The cache size hint in bytes.
size_t cache_size_hint () const
 The cache size hint in bytes.
size_t size_of_scalar () const
 Size of a Scalar object in bytes.
bool operator== (const CacheBlockingStrategy &rhs) const
 True if and only if the two strategies are the same.
bool operator!= (const CacheBlockingStrategy &rhs) const
 True if and only if the two strategies are not the same.
LocalOrdinal cache_block_offset (const LocalOrdinal index, const LocalOrdinal nrows, const LocalOrdinal ncols, const LocalOrdinal nrows_cache_block, const bool contiguous_cache_blocks) const
 Pointer offset for the cache block with the given index.
LocalOrdinal cache_block_stride (const LocalOrdinal index, const LocalOrdinal nrows, const LocalOrdinal ncols, const LocalOrdinal lda, const LocalOrdinal nrows_cache_block, const bool contiguous_cache_blocks) const
 Leading dimension (a.k.a. stride) of the cache block.
std::pair< LocalOrdinal,
LocalOrdinal > 
cache_block (const LocalOrdinal index, const LocalOrdinal nrows, const LocalOrdinal ncols, const LocalOrdinal nrows_cache_block) const
 Start and size of cache block number index.
std::vector< LocalOrdinal > cache_block_details (const LocalOrdinal index, const LocalOrdinal nrows, const LocalOrdinal ncols, const LocalOrdinal lda, const LocalOrdinal nrows_cache_block, const bool contiguous_cache_blocks) const
 Complete description of the cache block.
LocalOrdinal num_cache_blocks (const LocalOrdinal nrows, const LocalOrdinal ncols, const LocalOrdinal nrows_cache_block) const
 Total number of cache blocks in the matrix.
LocalOrdinal top_block_split_nrows (const LocalOrdinal nrows, const LocalOrdinal ncols, const LocalOrdinal nrows_cache_block) const
 Number of rows in the top cache block.
LocalOrdinal bottom_block_split_nrows (const LocalOrdinal nrows, const LocalOrdinal ncols, const LocalOrdinal nrows_cache_block) const
 Number of rows in the bottom cache block.
size_t default_cache_size_hint (const size_t suggested_cache_size, const size_t sizeOfScalar) const
 Default or revised cache size hint in bytes.
LocalOrdinal cache_block_num_rows (const LocalOrdinal ncols) const
 "Typical" number of rows per cache block.

Detailed Description

template<class LocalOrdinal, class Scalar>
class TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >

Tells CacheBlocker how to block up a tall skinny matrix.

Author:
Mark Hoemmen

This "strategy object" helps CacheBlocker decide how to block up a given tall skinny matrix by row into cache blocks. It knows how to find the location (row index) and number of rows of any cache block in the matrix. You can use this either for parallelization (e.g., partitioning the matrix among processors in a way that respects cache blocks) or for SequentialTsqr (whose factor() routine iterates top-down through cache blocks, and whose apply() and explicit_Q() routines iterate bottom-up through cache blocks).

The cache blocking strategy is formulated in terms of SequentialTsqr. All intranode parallel TSQR implementations use either SequentialTsqr or an algorithm like it, so this formulation is general enough for our needs.

Definition at line 73 of file Tsqr_CacheBlockingStrategy.hpp.


Constructor & Destructor Documentation

template<class LocalOrdinal, class Scalar>
TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::CacheBlockingStrategy ( const size_t  cacheSizeHint = 0,
const size_t  sizeOfScalar = sizeof(Scalar) 
) [inline]

Constructor.

The cache blocking strategy asks for a cache size hint. The appropriate cache level to use depends on the bandwidth of each cache level, whether it is shared among cores, and other hardware-specific features that are hard for us to model or measure. In general, though, if each CPU core has its own L2 cache, it would be appropriate to use that cache's size.

The cache size need not be exact, but picking too small or too large a value will make TSQR slower. In practice, TSQR is not sensitive to this value. The sizeOfScalar parameter affects performance, not correctness (more or less -- it should never be zero, for example). It's OK for it to be a slight overestimate. Being much too big may affect performance in the same way as an excessively small cache size hint, and being much too small may affect performance in the same way as an excessively big cache size hint.

Parameters:
cacheSizeHint[in] Cache size hint in bytes. This is used to pick the number of rows in a cache block. If zero, we guess a reasonable value. This is a hint only, not a command; the strategy may revise this, but it will not change the revised value (that is, cache_size_hint() is a constant for this instance's lifetime).
sizeOfScalar[in] The number of bytes required to store a Scalar value. This is used to compute the dimensions of cache blocks. If sizeof(Scalar) correctly reports the size of the representation of Scalar in memory, you can use the default. The default is correct for float, double, and any of various fixed-length structs (like double-double and quad-double). It should also work for std::complex<T> where T is anything in the previous sentence's list. It does <it>not</it> work for arbitrary-precision types whose storage is dynamically allocated, even if the amount of storage is a constant. In the latter case, you should specify a nondefault value.
Note:
If Scalar is an arbitrary-precision type whose representation length can change at runtime, you should construct a new CacheBlockingStrategy object whenever the representation length changes.

Definition at line 118 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::CacheBlockingStrategy ( const CacheBlockingStrategy< LocalOrdinal, Scalar > &  rhs) [inline]

Copy constructor.

Definition at line 125 of file Tsqr_CacheBlockingStrategy.hpp.


Member Function Documentation

template<class LocalOrdinal, class Scalar>
CacheBlockingStrategy& TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::operator= ( const CacheBlockingStrategy< LocalOrdinal, Scalar > &  rhs) [inline]

Assignment operator.

Definition at line 131 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
size_t TEUCHOS_DEPRECATED TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::cache_block_size ( ) const [inline]

The cache size hint in bytes.

Same as cache_size_hint(). This method is deprecated; use cache_size_hint() instead (which returns the same thing.)

Definition at line 141 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
size_t TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::cache_size_hint ( ) const [inline]

The cache size hint in bytes.

This may not necessarily equal the suggested cache size (input to the constructor). We treat that as a hint rather than a command.

Note:
It may make sense to vary the cache size hint at run time, for automatic performance tuning (trying to guess the optimal value) or adaptivity to varying load. However, the cache blocking strategy object must not change the cache size hint during the object's lifetime, since that would prevent correct manipulation of matrices with contiguously stored cache blocks. This is because cache block parameters depend on the cache size hint. Thus, client code may assume that this method always returns the same value for the lifetime of the strategy object.

Definition at line 161 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
size_t TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::size_of_scalar ( ) const [inline]

Size of a Scalar object in bytes.

See the constructor documentation for an explanation of why this may not necessarily be sizeof(Scalar). It should be in most cases, however.

Definition at line 168 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
bool TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::operator== ( const CacheBlockingStrategy< LocalOrdinal, Scalar > &  rhs) const [inline]

True if and only if the two strategies are the same.

Definition at line 171 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
bool TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::operator!= ( const CacheBlockingStrategy< LocalOrdinal, Scalar > &  rhs) const [inline]

True if and only if the two strategies are not the same.

Definition at line 177 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
LocalOrdinal TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::cache_block_offset ( const LocalOrdinal  index,
const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
const LocalOrdinal  nrows_cache_block,
const bool  contiguous_cache_blocks 
) const [inline]

Pointer offset for the cache block with the given index.

The pointer offset depends on whether cache blocks are stored contiguously in the matrix. If the cache block index is out of range, the returned result is undefined.

Parameters:
index[in] Zero-based index of the cache block.
nrows[in] Number of rows in the matrix.
ncols[in] Number of columns in the matrix.
nrows_cache_block[in] The value returned by cache_block_num_rows().
contiguous_cache_blocks[in] Whether the cache blocks in the matrix are stored contiguously.

Definition at line 196 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
LocalOrdinal TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::cache_block_stride ( const LocalOrdinal  index,
const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
const LocalOrdinal  lda,
const LocalOrdinal  nrows_cache_block,
const bool  contiguous_cache_blocks 
) const [inline]

Leading dimension (a.k.a. stride) of the cache block.

If cache blocks are stored contiguously, their leading dimension may vary. Otherwise, their leading dimension is just that of the whole matrix.

Parameters:
index[in] Zero-based index of the cache block.
nrows[in] Number of rows in the matrix.
ncols[in] Number of columns in the matrix.
lda[in] Leading dimension (a.k.a. stride) of the matrix.
nrows_cache_block[in] The value returned by cache_block_num_rows().
contiguous_cache_blocks[in] Whether the cache blocks in the matrix are stored contiguously.

Definition at line 228 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
std::pair<LocalOrdinal, LocalOrdinal> TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::cache_block ( const LocalOrdinal  index,
const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
const LocalOrdinal  nrows_cache_block 
) const [inline]

Start and size of cache block number index.

Parameters:
index[in] Zero-based index of the cache block.
nrows[in] Number of rows in the whole matrix.
ncols[in] Number of columns in the whole matrix.
nrows_cache_block[in] The value returned by cache_block_num_rows().
Returns:
If the input index is in range: The starting row index (zero-based) of the cache block, and the number of rows in the cache block. If the input index is out of range, then (nrows, 0).

Definition at line 258 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
std::vector<LocalOrdinal> TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::cache_block_details ( const LocalOrdinal  index,
const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
const LocalOrdinal  lda,
const LocalOrdinal  nrows_cache_block,
const bool  contiguous_cache_blocks 
) const [inline]

Complete description of the cache block.

"Complete" means that it includes the location as well as the layout of the cache block. This lets you construct a view (e.g., MatView) of the cache block right away, given a view of the whole matrix.

Parameters:
index[in] Zero-based index of the cache block.
nrows[in] Number of rows in the whole matrix.
ncols[in] Number of columns in the whole matrix.
lda[in] Leading dimension (a.k.a. stride) of the whole matrix.
nrows_cache_block[in] The value returned by cache_block_num_rows().
contiguous_cache_blocks[in] Whether the cache blocks in the matrix are stored contiguously.
Returns:
Four LocalOrdinals: The starting row index, the number of rows in the cache block, the pointer offset of the cache block, and leading dimension of the cache block.
Note:
This method has an $O(1)$ cost, so that parallelization by calling this method repeatedly for a sequence of cache block indices is not expensive.

Definition at line 333 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
LocalOrdinal TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::num_cache_blocks ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
const LocalOrdinal  nrows_cache_block 
) const [inline]

Total number of cache blocks in the matrix.

The input matrix is nrows by ncols. The suggested number of rows per cache block is nrows_cache_block, but some cache blocks may have more or less rows. However, no cache block may have fewer than ncols rows.

Parameters:
nrows[in] Number of rows in the matrix.
ncols[in] Number of columns in the matrix.
nrows_cache_block[in] The value returned by cache_block_num_rows().
Returns:
Total number of cache blocks in the matrix.

Definition at line 373 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
LocalOrdinal TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::top_block_split_nrows ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
const LocalOrdinal  nrows_cache_block 
) const [inline]

Number of rows in the top cache block.

If we partition the nrows by ncols matrix A into [A_top; A_bot] with A_top being a cache block and A_bot being the rest of the matrix, return the number of rows that A_top should have.

Parameters:
nrows[in] "Current" number of rows in the matrix. We write "current" because this method is meant to be called recursively over the "rest" of the matrix, until the "rest" of the matrix has no more rows (is "empty").
ncols[in] Number of columns in the matrix.
nrows_cache_block[in] The value returned by cache_block_num_rows().
Returns:
# of rows in top cache block A_top

Definition at line 413 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
LocalOrdinal TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::bottom_block_split_nrows ( const LocalOrdinal  nrows,
const LocalOrdinal  ncols,
const LocalOrdinal  nrows_cache_block 
) const [inline]

Number of rows in the bottom cache block.

If we partition the nrows by ncols matrix A into [A_top; A_bot] with A_bot being a cache block and A_top being the rest of the matrix, return the number of rows that A_bot should have.

Parameters:
nrows[in] "Current" number of rows in the matrix. We write "current" because this method is meant to be called recursively over the "rest" of the matrix, until the "rest" of the matrix has no more rows (is "empty").
ncols[in] Number of columns in the matrix.
nrows_cache_block[in] The value returned by cache_block_num_rows().
Returns:
# of rows in top cache block A_bot

Definition at line 447 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
size_t TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::default_cache_size_hint ( const size_t  suggested_cache_size,
const size_t  sizeOfScalar 
) const [inline]

Default or revised cache size hint in bytes.

If the input is zero, return a default cache size in bytes. Otherwise, revise the given suggestion based on the size of the Scalar type. The result need not equal the suggested cache size, even if the latter is nonzero. Call cache_size_hint() after calling this method, in order to get the actual cache size that the cache blocking strategy will use.

Parameters:
suggested_cache_size[in] Suggested size of the cache in bytes. A hint, not a command.
sizeOfScalar[in] Size of the Scalar type in bytes.
Returns:
Default or revised cache size hint in bytes.

Definition at line 488 of file Tsqr_CacheBlockingStrategy.hpp.

template<class LocalOrdinal, class Scalar>
LocalOrdinal TSQR::CacheBlockingStrategy< LocalOrdinal, Scalar >::cache_block_num_rows ( const LocalOrdinal  ncols) const [inline]

"Typical" number of rows per cache block.

This is a function of the cache block size and the number of columns in the matrix. Not all cache blocks (in particular, the last one) will have this number of rows, but "most" will (hence "typical"). The returned value applies to SequentialTsqr::factor(), SequentialTsqr::apply(), and SequentialTsqr::explicit_Q(). In particular, we choose the number of rows per cache block so that when applying the implicitly stored Q factor (returned by factor()) to a matrix C with the same number of columns as Q was on input to factor(), then two cache blocks (one of Q and the other of C) will fit in cache. This is the typical case when using TSQR to orthogonalize vectors.

Parameters:
ncols[in] Number of columns in the matrix whose QR factorization is to be computed using an intranode TSQR implementation.

Definition at line 540 of file Tsqr_CacheBlockingStrategy.hpp.


The documentation for this class was generated from the following file:
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends