# Trilinos Release 11.4 Release Notes #
The Trilinos Project is an effort to develop algorithms and enabling
technologies within an object-oriented software framework for the solution of
large-scale, complex multi-physics engineering and scientific problems.
The Trilinos 11.4 general release contains 54 packages: Amesos, Amesos2,
Anasazi, AztecOO, Belos, CTrilinos, Didasko, Epetra, EpetraExt, FEI,
ForTrilinos, Galeri, GlobiPack, Ifpack, Ifpack2, Intrepid, Isorropia, Kokkos,
Komplex, LOCA, Mesquite, ML, Moertel, MOOCHO, NOX, Optika, OptiPack, Pamgen,
Phalanx, Piro, Pliris, PyTrilinos, RTOp, Rythmos, Sacado, SEACAS, Shards,
ShyLU, STK, Stokhos, Stratimikos, Sundance, Teko, Teuchos, ThreadPool, Thyra,
Tpetra, TriKota, TrilinosCouplings, Trios, Triutils, Xpetra, Zoltan, Zoltan2.
Framework Release Notes:
- The following packages have been switched to BSD-compatible licenses:
Didasko, Ifpack, Ifpack2, Moertel, Stokhos, Stratimikos
- This release includes 11 modules or classes of the Epetra package.
- This package is still in its experimental stage and is only supported on AIX.
- Sample configure scripts are provided in
Trilinos/sampleScripts/aix-fortrilinos-mpif90 for serial and MPI builds
- Because of the object-oriented features used, it requires an XL Fortran
compiler v13.1. The source code can be compiled using the xlf compiler
- Required compiler flags for Fortran include:
-qfixed=72 -qxlines: deals with older Fortran source code in other
Trilinos packages. These flags are used for MPI
builds and must be specified in the configure
-qxlf2003=polymorphic: allows for the use of polymorphism in the source
-qxlf2003=autorealloc: allows the compiler to automatically reallocate the
left hand side with the shape of the right hand side
when using allocatable variables in an assignment.
-qfree=f90: informs the compiler that the source code is free
form and conforms to Fortran 90.
These flags (-qfree=f90 -qxlf2003=polymorphic -qxlf2003=autorealloc) are
hardcoded in Trilinos/packages/ForTrilinos/CMakeLists.txt
- Required compiler flags for xlc++ include:
-qrtti=all: this flag should be included in the configure
- The project is primarily user-driven, so new interfaces are developed at the
request of Trilinos users.
- Relaxation: Use precomputed offsets to extract diagonal
As of this release, Tpetra::CrsMatrix has the ability to precompute
offsets of diagonal entries, and use them to accelerate extracting a
copy of the diagonal. Relaxation now exploits this feature to speed up
compute() (which extracts a copy of the diagonal of the input matrix).
The optimization only occurs if the input matrix is a CrsMatrix (not
just a RowMatrix) and if it has a const ("static") graph. The latter
is necessary so that we know that the structure can't change between
calls to compute(). (Otherwise we would have to recompute the offsets
each time, which would be no more efficient than what it was doing before.)
- Non-backwards compatible change: Default Kokkos/Tpetra Node type is now
Kokkos::SerialNode. User expectation seems to be that the default behavior of
Tpetra is MPI-only. These users therefore experienced unexpected
performance when the default node was threaded, as was previously the case if
any of the threading libraries (pthreads, TBB, OpenMP) were enabled.
Therefore, after some discussion among Kokkos/Tpetra developers, it was
decided to change the default Kokkos node (and therefore, the default node
used by Tpetra objects) to Kokkos::SerialNode. This can be overridden at
configure time by specifying the following option to CMake when configuring
where node_type is one of the official Kokkos nodes:
Kokkos::SerialNode (current default)
- Added polygon support to allow reading and writing of vtk files containing
polygons and smoothing of meshes containing polygons using the Laplacian
- Rewrote the ShapeImprover wrapper to determine whether the mesh to be
optimized is tangled. If it is tangled, the wrapper now uses a non-barrier
metric; if it is not tangled, it uses a barrier metric.
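The selection logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: it treats a 2D triangle mesh as tangled when any element has non-positive signed area, which is a common inversion test but is not confirmed here as the check the wrapper performs. The names `Point2`, `Triangle`, `isTangled`, `MetricKind`, and `chooseMetric` are hypothetical and are not part of the Mesquite API.

```cpp
#include <array>
#include <vector>

// Hypothetical types for illustration; not Mesquite classes.
struct Point2 { double x, y; };
using Triangle = std::array<Point2, 3>;

// Signed area is positive for counterclockwise vertex ordering.
inline double signedArea(const Triangle& t) {
  return 0.5 * ((t[1].x - t[0].x) * (t[2].y - t[0].y) -
                (t[2].x - t[0].x) * (t[1].y - t[0].y));
}

// Assumption of this sketch: a mesh is "tangled" if any element is
// inverted or degenerate (signed area <= 0).
inline bool isTangled(const std::vector<Triangle>& mesh) {
  for (const Triangle& t : mesh) {
    if (signedArea(t) <= 0.0) return true;
  }
  return false;
}

enum class MetricKind { Barrier, NonBarrier };

// Tangled meshes get a non-barrier metric; untangled meshes get a barrier metric.
inline MetricKind chooseMetric(const std::vector<Triangle>& mesh) {
  return isTangled(mesh) ? MetricKind::NonBarrier : MetricKind::Barrier;
}
```

A barrier metric is undefined or unbounded on inverted elements, which is why the sketch only selects it when every element has positive area.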
- Created a new directory structure underneath meshFiles/3D/vtk and
meshFiles/2D/vtk that arranges the mesh files into subdirectories
based on element type and whether they are tangled or untangled.
- Created new class MeshDomainAssoc to formally associate a Mesh instance
with a Domain instance to verify that the mesh and domain are compatible.
- Productionized the NonGradient solver.
- Added new classes TMetricBarrier and TMetricNonBarrier to TMetric class to
provide a clear division between the barrier and non-barrier target metric
- Added new classes AWMetricBarrier and AWMetricNonBarrier to AWMetric class
for same reason as the TMetric classes.
- Added a new error code "BARRIER_VIOLATED" to the MsgError class that is
issued when a barrier violation is encountered when using a barrier target
- Added warning when MaxTemplate is used with any solver other than
- Made a number of changes to the Quality Summary output to improve
readability and provide additional information.
- Updated the NumPy interface to properly deal with deprecated
code. If PyTrilinos is compiled against an older NumPy, it still works,
but if compiled against newer versions of NumPy, the deprecated
code is avoided, as are the warnings.
- Added optional automatic global reductions of pass/fail to Teuchos Unit
Test Harness: Prior to this feature addition, only the result on the root
process of a parallel unit test would determine pass/fail, even if tests on
other processes failed. This makes it easier to write parallel unit tests
and results in more robust test code. For a discussion, see Trilinos issue
#5909. An example can be found in
the CMakeLists.txt file for how that test is run). NOTE: By default, no
global reductions of pass/fail are done, so as to maintain perfect backward
compatibility.
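The difference between the root-only result and a globally reduced result can be illustrated without MPI. The sketch below simulates the per-process results as a vector of 0/1 flags and combines them with a minimum, which is equivalent to a logical AND. The function names are hypothetical and are not part of the Teuchos Unit Test Harness; real parallel code would perform the reduction with an all-reduce over the communicator.

```cpp
#include <algorithm>
#include <vector>

// Each entry of localPassed is the 0/1 result reported by one simulated
// process; entry 0 plays the role of the root process.

// Behavior without the reduction: only the root's result counts.
inline bool rootOnlyPassed(const std::vector<int>& localPassed) {
  return localPassed.empty() || localPassed.front() == 1;
}

// Behavior with the reduction: the test passes only if every process passed.
// Taking the minimum of 0/1 flags is the same as a logical AND.
inline bool globallyPassed(const std::vector<int>& localPassed) {
  if (localPassed.empty()) return true;  // no processes, no failures
  return *std::min_element(localPassed.begin(), localPassed.end()) == 1;
}
```

With results `{1, 1, 0, 1}`, the root-only rule reports a pass even though the third process failed, while the reduced result reports a failure.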
- Added new feature to TimeMonitor: You may now enable or disable a timer
(instance of Time) by name. Disabled timers ignore start() and stop()
calls; calling these methods on a disabled timer does not change its elapsed
time or call count. Thus, TimeMonitor's constructor and destructor have no
effect on disabled timers. However, the disabled timers still exist, and
TimeMonitor's summarize() and report() class methods will print statistics
for disabled timers (using their elapsed times and call counts while
enabled). Enabling a timer does not reset its elapsed time or call count.
This feature is useful if you want to time only certain invocations of a
particular function that has an internal timer, without modifying the
function's source code. For an example, see
packages/teuchos/comm/test/Time/TimeMonitor_UnitTests.cpp, line 175
("TimeMonitor, enableTimer" unit test).
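The enable/disable semantics described above can be sketched with a small stand-alone class. This is a hypothetical illustration, not the Teuchos::Time or TimeMonitor interface: `SketchTimer` and its methods are invented names, the caller passes the current time explicitly so the behavior is deterministic, and the choice to discard an interval in progress when the timer is disabled is an assumption of the sketch.

```cpp
#include <cstdint>

// Hypothetical timer with an enabled flag; not the Teuchos::Time API.
class SketchTimer {
public:
  void start(std::int64_t now) {
    if (!enabled_ || running_) return;  // a disabled timer ignores start()
    startTime_ = now;
    running_ = true;
  }
  void stop(std::int64_t now) {
    if (!enabled_ || !running_) return;  // a disabled timer ignores stop()
    elapsed_ += now - startTime_;
    ++numCalls_;
    running_ = false;
  }
  // Assumption of this sketch: disabling discards any interval in progress.
  void disable() { enabled_ = false; running_ = false; }
  // Enabling does not reset the elapsed time or the call count.
  void enable() { enabled_ = true; }

  std::int64_t elapsed() const { return elapsed_; }
  int numCalls() const { return numCalls_; }

private:
  bool enabled_ = true;
  bool running_ = false;
  std::int64_t startTime_ = 0;
  std::int64_t elapsed_ = 0;
  int numCalls_ = 0;
};
```

Statistics accumulated while the timer was enabled remain available after it is disabled, which matches the reporting behavior described above.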
- Fixed explicit template instantiation system in the generation of
Thyra_XXX.hpp files to *not* include Thyra_XXX_def.hpp when explicit
instantiation is turned on. The refactoring of Thyra to use subpackages some
time ago broke the generation of Thyra_XXX.hpp files in that they were
always including Thyra_XXX_def.hpp files. That was bad because it increased
compile time for client code and allowed other includes to get pulled in
silently. Now client code that includes Thyra_XXX.hpp when explicit
instantiation is turned on will *not* get the include of
Thyra_XXX_def.hpp. This might break some downstream client code that was
not properly including the necessary header files and was accidentally
getting them from the Thyra_XXX_def.hpp files that were being silently
included. However, this technically does not break backward compatibility
since client code should have been including the right headers all along.
For example, when GCC cleaned up their standard C++ header files this
required existing C++ code to add a bunch of missing includes that should
have been there the whole time.
- Performance improvements to fillComplete (CrsGraph and CrsMatrix)
- Performance improvements to Map's global-to-local index conversions
- MPI performance optimizations
Methods that perform communication between (MPI) processes do less
communication than before. This should improve performance,
especially for large process counts, of the following operations:
- Creating a Map
- Creating an Import or Export communication plan
- Executing an Import or Export (e.g., in a distributed sparse
matrix-vector multiply, or in global finite element assembly)
- Calling fillComplete() on a CrsGraph or CrsMatrix
- Restrict a Map's communicator to processes with nonzero elements,
and apply the result to a distributed object
Map now has two new methods. The first, removeEmptyProcesses(),
returns a new Map with a new communicator, which contains only those
processes which have a nonzero number of entries in the original Map.
The second method, replaceCommWithSubset(), returns a new Map whose
communicator is an arbitrary subset of processes of the original Map's
communicator. Distributed objects (subclasses of DistObject) also
have a new removeEmptyProcessesInPlace() method, for applying in place
the new Map created by calling removeEmptyProcesses() on the original
Map over which the object was distributed.
These methods are especially useful for algebraic multigrid. At
coarser levels of the multigrid hierarchy, it is helpful for
performance to "rebalance" the matrices at those levels, so that a
subset of processes share the elements. This leaves the remaining
processes without any elements. Excluding them from the communicator
reduces the cost of all-reduces and other communication operations
necessary for creating the coarser levels of the hierarchy.
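The effect of excluding empty processes can be illustrated with plain data, without MPI or Tpetra. The sketch below takes the number of entries each process owns and computes each process's rank in the reduced communicator, or -1 if the process is excluded. The function name is hypothetical and this is not how Map implements removeEmptyProcesses(); it only shows which processes remain and how they would be renumbered if the original ordering is preserved, which is an assumption of the sketch.

```cpp
#include <cstddef>
#include <vector>

// For each old rank, return its rank in the reduced communicator,
// or -1 if that process owns no entries and is therefore excluded.
inline std::vector<int> ranksAfterRemovingEmpty(
    const std::vector<std::size_t>& numLocalEntries) {
  std::vector<int> newRank(numLocalEntries.size(), -1);
  int next = 0;
  for (std::size_t r = 0; r < numLocalEntries.size(); ++r) {
    if (numLocalEntries[r] > 0) {
      newRank[r] = next++;  // surviving processes keep their relative order
    }
  }
  return newRank;
}
```

For entry counts `{4, 0, 3, 0}`, only two of the four processes remain, so collectives on the reduced communicator involve half as many participants.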
- CrsMatrix: Native SOR and Gauss-Seidel kernels
These kernels improve the performance of Ifpack2 and MueLu.
Gauss-Seidel is a special case of SOR (Successive Over-Relaxation).
See the documentation of Ifpack2::Relaxation for details on the
algorithm, which is actually a "hybrid" of Jacobi between MPI
processes, and SOR (or Gauss-Seidel) within an MPI process. The
kernels also include the "symmetric" variant (forward and backward
sweeps) of SOR and Gauss-Seidel.
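A minimal sketch of the within-process part of such a sweep, on a matrix stored in compressed sparse row (CSR) form, is shown here. It is not the Tpetra kernel: `CsrMatrix` and the sweep functions are hypothetical names, the sketch assumes every row stores a nonzero diagonal entry, and it omits the Jacobi-style coupling between MPI processes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR storage for the rows owned by one process (hypothetical type).
struct CsrMatrix {
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<std::size_t> colInd;  // column index of each stored entry
  std::vector<double> values;       // value of each stored entry
};

// Relax row i of A x = b. Assumes row i stores a nonzero diagonal entry.
inline void sorUpdateRow(const CsrMatrix& A, const std::vector<double>& b,
                         std::vector<double>& x, double omega, std::size_t i) {
  double offDiagSum = 0.0;
  double diag = 0.0;
  for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
    if (A.colInd[k] == i) diag += A.values[k];
    else offDiagSum += A.values[k] * x[A.colInd[k]];
  }
  assert(diag != 0.0);
  x[i] = (1.0 - omega) * x[i] + omega * (b[i] - offDiagSum) / diag;
}

// Forward sweep; omega == 1 gives Gauss-Seidel.
inline void sorSweepForward(const CsrMatrix& A, const std::vector<double>& b,
                            std::vector<double>& x, double omega) {
  const std::size_t n = A.rowPtr.size() - 1;
  for (std::size_t i = 0; i < n; ++i) sorUpdateRow(A, b, x, omega, i);
}

// Backward sweep visits the rows in reverse order.
inline void sorSweepBackward(const CsrMatrix& A, const std::vector<double>& b,
                             std::vector<double>& x, double omega) {
  const std::size_t n = A.rowPtr.size() - 1;
  for (std::size_t i = n; i-- > 0;) sorUpdateRow(A, b, x, omega, i);
}

// "Symmetric" variant: a forward sweep followed by a backward sweep.
inline void sorSweepSymmetric(const CsrMatrix& A, const std::vector<double>& b,
                              std::vector<double>& x, double omega) {
  sorSweepForward(A, b, x, omega);
  sorSweepBackward(A, b, x, omega);
}
```

Because each row update reads the most recent values of `x`, the result depends on the sweep direction, which is why the symmetric variant applies both directions.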
- CrsMatrix: Precompute and reuse offsets of diagonal entries
The (existing) one-argument version of CrsMatrix's getLocalDiagCopy()
method requires the following operations per row:
1. Convert current local row index to global, using the row Map
2. Convert global index to local column index, using the column Map
3. Search the row for that local column index
Precomputing the offsets of diagonal entries and reusing them skips
all these steps. CrsMatrix has a new method getLocalDiagOffsets() to
precompute the offsets, and a two-argument version of
getLocalDiagCopy() that uses the precomputed offsets. The precomputed
offsets are not meant to be used in any way other than to be given to
the two-argument version of getLocalDiagCopy(). They must be
recomputed whenever the structure of the sparse matrix changes (by
calling insertGlobalValues() or insertLocalValues()) or is optimized
(e.g., by calling fillComplete() for the first time).
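The idea of precomputing and reusing diagonal offsets can be sketched on a stand-alone CSR matrix. This is an illustration only, not the CrsMatrix implementation: `LocalCsr`, `computeDiagOffsets`, and `diagFromOffsets` are hypothetical names, and the sketch assumes a single process where local row and column indices coincide, so the Map conversions in steps 1 and 2 are not modeled.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Minimal CSR storage (hypothetical type, not a Tpetra class).
struct LocalCsr {
  std::vector<std::size_t> rowPtr;  // size numRows + 1
  std::vector<std::size_t> colInd;  // column index of each stored entry
  std::vector<double> values;       // value of each stored entry
};

// Marker for rows that store no diagonal entry.
const std::size_t kNoDiagonal = std::numeric_limits<std::size_t>::max();

// Search each row once and record where its diagonal entry sits,
// as an offset from the start of the row.
inline std::vector<std::size_t> computeDiagOffsets(const LocalCsr& A) {
  const std::size_t n = A.rowPtr.size() - 1;
  std::vector<std::size_t> offsets(n, kNoDiagonal);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
      if (A.colInd[k] == i) {
        offsets[i] = k - A.rowPtr[i];
        break;
      }
    }
  }
  return offsets;
}

// Extract the diagonal without searching; rows with no stored diagonal give 0.
// The offsets are valid only while the sparsity structure is unchanged.
inline std::vector<double> diagFromOffsets(
    const LocalCsr& A, const std::vector<std::size_t>& offsets) {
  std::vector<double> diag(offsets.size(), 0.0);
  for (std::size_t i = 0; i < offsets.size(); ++i) {
    if (offsets[i] != kNoDiagonal) diag[i] = A.values[A.rowPtr[i] + offsets[i]];
  }
  return diag;
}
```

The offsets remain usable after the matrix values change, as long as the structure does not, so the search cost is paid once rather than on every extraction.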
- CrsGraph,CrsMatrix: Added "No Nonlocal Changes" parameter to
The fillComplete() method accepts an optional ParameterList which
controls the behavior of fillComplete(), as opposed to behavior of the
object in general. "No Nonlocal Changes" is a bool parameter which is
false by default. Its value must be the same on all processes in the
graph or matrix's communicator. If the parameter is true, the caller
asserts that no entries were inserted in nonowned rows. This lets
fillComplete() skip the global communication that checks whether any
processes inserted any entries in nonowned rows.
- Default Kokkos/Tpetra Node type is now Kokkos::SerialNode
NOTE: This change breaks backwards compatibility.
Users expect that Tpetra by default uses "MPI only" for parallelism,
rather than "MPI plus threads." These users were therefore
experiencing unexpected performance issues when the default Kokkos
Node type was threaded, as was the case if Trilinos' support for any of
the threading libraries (Pthreads, TBB, OpenMP) was enabled. Trilinos
detects and enables support for Pthreads automatically on many
platforms. Therefore, after some discussion among Kokkos and Tpetra
developers, we decided to change the default Kokkos Node type (and
therefore, the default Node used by Tpetra objects) to
Kokkos::SerialNode. This can be overridden at configure time by
specifying the following option to CMake when configuring Trilinos:
where any of the official Kokkos Node types, such as the
- Kokkos::SerialNode (current default)