Zoltan2
Developers' Notes

Contents


Organization of an algorithm

Algorithms in Zoltan2 are namespace methods. They are called by a Zoltan2::Problem. The Problem passes the following information to the algorithm:

The Zoltan2::Environment contains the user's parameter settings, and methods for handling exceptions, profiling and status messages.

Parallel communication is performed with the Teuchos::Comm object. Probably the easiest way to do this is by using the helper functions in Teuchos_CommHelpers.hpp.

The Environment also has a communicator, which is the communicator for the entire application. The communicator passed in as the 2nd argument of the algorithm is the communicator for the problem, which may be different if the problem is being solved over a subset of the entire application.

The Zoltan2::Model is created by the Problem based on the type of input provided by the user and information in the user's parameters. The Model may represent one of the following objects:

A hypergraph model will be added eventually.

The Zoltan2::Solution is created by the Problem. When passed to the algorithm, it may already contain some information about the type of solution desired. In particular, in the case of partitioning, the subclass Zoltan2::PartitioningSolution contains the global number of parts to be created, the number of parts to be created for each process, and the relative part sizes.

The algorithm writes the computed answer back to the Solution object. In particular, a partitioning algorithm writes back the assignment of part numbers to global identifiers. The user can obtain the solution by way of the method Zoltan2::PartitioningProblem::getSolution().

template <typename Adapter>
void myAlgorithm(
  const RCP<const Environment> &env,
  const RCP<Comm<int> > &problemComm,
  const RCP<const GraphModel<typename Adapter::base_adapter_t> > &graph,
  RCP<PartitioningSolution<Adapter> > &solution
)
{
  typedef typename Adapter::lno_t lno_t;     // local ids
  typedef typename Adapter::gno_t gno_t;     // global ids
  typedef typename Adapter::scalar_t scalar_t;   // scalars

  env->debug(DETAILED_STATUS, string("Entering myAlgorithm");

  const Teuchos::ParameterList &pl = env->getParameters();

  size_t numGlobalParts = solution->getTargetGlobalNumberOfParts();
  size_t numLocalParts = solution->getLocalNumberOfParts();

  size_t numGlobalVtx = graph->getGlobalNumVertices();
  size_t numLocalVtx = graph->getLocalNumVertices();

  gno_t *globalNumbers=NULL;
  partId_t *partNumbers=NULL;

  // ... (Allocate storage for solution, and solve the problem.)

  ArrayRCP<const gno_t> gnos = arcp(globalNumbers, 0, numGnos, true);
  ArrayRCP<const partId_t> gnoPart = arcp(partNumbers, 0, numGnos, true);

  solution->setParts(gnos, gnoPart);

  env->debug(DETAILED_STATUS, string("Exiting myAlgorithm");
}

The Environment

The Zoltan2::Environment object is created by the Zoltan2::Problem. It is passed to almost every method in the library. It serves the following roles:

The algorithm uses the Environment to obtain algorithm-specific parameters that may have been set by the user. It also uses Environment methods to print out warnings and status information, handle exceptions, time algorithm steps, and display information about the amount of memory in use.


Global identifiers and data types

Zoltan2 is a package in the Trilinos framework and it uses the linear algebra classes in the templated Tpetra package. The data types permitted for global identifiers in Tpetra are those for which specializations of the Teuchos::OrdinalTraits exist. In particular we use int or long internally for the global identifiers of linear algebra objects.

For historical reasons, Zoltan2 allows the user a broader range of data types for their identifiers. In particular, any data type for which a specialization of Zoltan2::IdentifierTraits is defined is allowed. For example, a global identifier could be a std::pair<int, int> to represent matrix non-zeros (i, j) . Or the global identifier could be an arbitrary string.

If the user's global identifiers are not Teuchos Ordinals, the Zoltan2::Model maps them to new global identifiers which are Teuchos Ordinals.

By convention we use these typedef names:

It is obviously more efficient if Zoltan2 can use the user's global identifiers without having to map them. We have to map them only if:

The Zoltan2::Solution object uses mapping information in the Zoltan2::Model to map the internal gno_t ids back to the user's gid_t ids if necessary. In this way, the algorithm can work with gno_t ids and supply these to the Solution, but the user can retrieve gid_t ids.

The complexity of mapping between user global identifiers and internal global identifiers is encapsulated in the Zoltan2::Model.


Naming

All Zoltan2 symbols are in the namespace Zoltan2.

We follow the Trilinos naming conventions. They are detailed in a document in directory Trilinos/doc/TrilinosCodingDocGuidelines.

Templated code is in .hpp files and non-templated code is in .cpp files.


Error handling

We check for three types of errors:

In addition, a check can be local or global. In a global check, all processes report an error if any process has an error. Local checks are non-synchronizing.

Each check also has a level of complexity. Some can be done with one or two comparisons and some require more work. Also, there are some checks we only want to do in a special debug mode.

The user specifies the level of error checking to do with the error_check_level parameter. It can be set to one of these strings:

The default is basic_assertions. If the user requests complex_assertions, the lengthier checks will be done as well. If the users requests debug_mode_assertions , all checking will be done.

If the user sets error_check_level to no_assertions then none of the checks will be done. In addition, if Zoltan2 is compiled with the Z2_OMIT_ALL_ERROR_CHECKING flag, the error check methods do nothing. So in this case there is not even a comparison of the check level to the requested check level.

The methods that perform the tests are in the Zoltan2::Environment. They are:

The Zoltan2::AssertionLevel enumerator defines the levels that are provided in the calls.


Status and debugging output

The following parameters govern behavior of messages from Zoltan2:

The Zoltan2::Environment::debug() message is used for printing out debug and status information and warnings. The call should specify the Zoltan2::MessageOutputLevel (the verbosity of the message).

The Zoltan2::Environment::getDebugLevel() call should be checked before creating a lengthy message. This will ensure that work is not done for a detailed message if the users debug_level parameter did not request it.


Profiling

The Zoltan2::Environment object which is passed to almost every function has methods that can be used to time sections of code and (on Linux systems) to print out the memory in use by the process. The behavior of these methods depends on the how the user set these parameters:

The memory count is read from /proc/{pid}/statm and output to the designated file or stream while the application is running. The timers are managed by a Teuchos::TimeMonitor object in the Problem. The timer summary is output to the designated file or stream and timers are reset to zero when the application calls Zoltan2::Problem::printTimers().

For timers, the concept of timer_type refers to the concept of macro_timers and micro_timers . In general, the micro timers time subsets of actions that are timed by the macro timers. At run-time you can choose which timers to activate. For developers, there are also test_timers These are timers you may add while doing development or doing performance studies. The expectation is that you will remove them when done.


Use of Teuchos smart pointers

Smart pointers remove the need for the programmer to keep track of when memory must be freed. Teucho smart pointers in addition can do array bounds checking when compiled with the cmake configuration flag Teuchos_ENABLE_ABC (automatic bounds check).

We use the following Teuchos memory management classes in Zoltan2:

Teuchos::RCP and Teuchos::ArrayRCP can also be used for non-persisting data if you want an easy way to ensure that the memory will be freed upon leaving the scope in which the memory was allocated.

In addition, Teuchos::Tuple can be used for small fixed size arrays. Teuchos::ArrayView is used for function arguments that were passed as Teuchos::ArrayRCP (and will not be saved by the function) or Teuchos::Array. (If the function is a class method that will hold on to the array in the class instance, then the function argument should be a reference to a Teuchos::ArrayRCP so it will be counted.)

Using Teuchos memory management classes can increase the time to write code, and make code a little harder to read, but once the code compiles it is very unlikely that a memory error will be found later on.

A detailed report on Teuchos memory management classes can be found in the directory Trilinos/doc/TeuchosMemoryManagementSAND.


How to add a parameter

Zoltan2 uses Teuchos::ParameterList and Teuchos::ParameterEntryValidator. To add another parameter you add it to XML definition of the ParameterList in zoltan2/data/parameters.xml.

At compile time, the program zoltan2/util/xmlToHeaderDefinition.cpp creates the header file {zoltan2_binary_directory}/src/Zoltan2_XML_Parameters.hpp. This header file contains a macro that has the XML string defining the parameters. It is used at run-time in the namespace method Zoltan2::createAllParameters().

When building the Zoltan2 documentation with zoltan2/doc/build_docs, the script zoltan2/scripts/xml2dox.py creates the Doxygen page in zoltan2/doc/parameters.dox. A link appears at the Related Pages tab.

The XML formats for the validator are documented here:

The XML format for the parameter is:

<Parameter
  id="unique number for this parameter" 
  name="name of this parameter" 
  type="data type of value used to set parameter" 
  validatorId="id number for validator for this parameter" 
  value="any valid value for this parameter"
  docString="description of parameter that will appear in documentation"
  />

The docString in the Parameter element does not need to include information (like a list of valid strings) that is included in the Validator element. When parameter documentation is generated, either by Teuchos at run-time or by a script in doc/build_docs, it includes the validator information.


Documenting an algorithm

Each algorithm that can be chosen by a user with the algorithm parameter should be completely documented for users with a file in zoltan2/doc/algorithms. Follow the pattern of the other documentation files in that directory. Then add a link to your page on the introductory page at zoltan2/doc/index.dox.


Example

The block algorithm in Zoltan2_AlgBlock.hpp is a simple example of a Zoltan2 algorithm.