Amesos_Merikos.h
This file is out of date.  It has not been refactored to use Amesos_Status.

/* 
Task list:
    Amesos_Merikos.h
    Amesos_Merikos.cpp
      Partition the matrix - store L as L^T?
      Build tree
      Initial data redistribution
      Change row and column ownership (pass them up to the parent)
    Amesos_Component_Solver.h
    Amesos_BTF.h 
    Amesos_BTF.cpp

  Communications issues/challenges:
  **  Redistributing the original matrix to the arrowhead form that we
      need.  Options:
      1)  Two matrices:  L^T and U
      2)  One matrix:  U | L^T 
      3)  An intermediate "fat" matrix - the way that I did ScaLAPACK
      (A single-import redistribution sketch follows this list.)

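      A minimal sketch, assuming the target row map is already known:  a
      single Epetra_Import moves the matrix to the new distribution.
      RedistributeToArrowhead and ArrowheadMap are hypothetical names,
      used only for illustration.

        #include "Epetra_CrsMatrix.h"
        #include "Epetra_Import.h"
        #include "Epetra_Map.h"

        // Redistribute A so that its rows follow ArrowheadMap.
        Epetra_CrsMatrix* RedistributeToArrowhead(const Epetra_CrsMatrix& A,
                                                  const Epetra_Map& ArrowheadMap) {
          Epetra_Import importer(ArrowheadMap, A.RowMap());
          Epetra_CrsMatrix* Anew = new Epetra_CrsMatrix(Copy, ArrowheadMap, 0);
          Anew->Import(A, importer, Insert);    // one communication phase
          Anew->FillComplete(A.DomainMap(), A.RangeMap());
          return Anew;
        }
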
  **  Adding the Schur complements SC_0 and SC_1 to the original 
      trailing matrix A_2, owned by the parent.  Options:
      1)  Precompute the final size and shape of A_2 + SC_0 + SC_1
          (a sketch of this option follows below).
      2)  Perform A_2 + SC_0 + SC_1 in an empty matrix of size n by n 
          and extract the non-empty rows.
      CHALLENGES:
        A)  Only process 0/1 knows the size and map for SC_0/SC_1.
        B)  It would be nice to allow SC_0 and SC_1 to be sent as soon as 
            they are available.
        C)  It would be nice to have just one copy of the matrix on the
            parent.  Hence, it would be nice to know the shape of 
            A_2 + SC_0 + SC_1 in advance. 
        D)  An import would do the trick provided that we know both maps
            in advance.  But neither map is available to us in advance. 
            The original map (which has SC_0 on process 0 and SC_1 on 
            process 1) is not known.
      QUESTION:
        Should the maps be in some global address space or should they be 
        in a local address space?  I'd like to keep them in the global 
        address space as long as possible, but we can't do the import of 
        A_2 + SC_0 + SC_1 in a global address space, because that would 
        require a map that changes at each step.

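      A minimal sketch of option 1, assuming the parent has already built
      A2sum, a fill-completed matrix whose graph holds the union of the
      patterns of A_2, SC_0 and SC_1:  each child then ships its Schur
      complement with one Export, and the Add combine mode sums the
      overlapping entries.  AddSchurComplement is a hypothetical name.

        #include "Epetra_CrsMatrix.h"
        #include "Epetra_Export.h"

        // Accumulate a child's Schur complement into the parent's matrix.
        void AddSchurComplement(const Epetra_CrsMatrix& SC,  // child's SC_0 or SC_1
                                Epetra_CrsMatrix& A2sum) {   // holds A_2 + SC_0 + SC_1
          Epetra_Export exporter(SC.RowMap(), A2sum.RowMap());
          A2sum.Export(SC, exporter, Add);   // sums into existing entries
        }

      This addresses challenge B (each child can export as soon as its
      Schur complement is ready) but not challenge A:  the precomputed
      graph still has to come from somewhere.
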
  **  Redistributing the right-hand-side vector, b.
      If we create a map that reflects the post-pivoting reality, assigning
      each row of U and each column of L to the process that owns the diagonal 
      entry, we can redistribute the right-hand-side vector, b, to the 
      processes where the values of b will first be used, in a single, 
      efficient import operation (see the sketch below).  

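      A minimal sketch, assuming PostPivotMap (a hypothetical name) is the
      map described above:

        #include "Epetra_Import.h"
        #include "Epetra_Map.h"
        #include "Epetra_Vector.h"

        // Move b to the processes that will consume its entries first.
        Epetra_Vector* RedistributeB(const Epetra_Vector& b,
                                     const Epetra_Map& PostPivotMap) {
          Epetra_Vector* bnew = new Epetra_Vector(PostPivotMap);
          Epetra_Import importer(PostPivotMap, b.Map());
          bnew->Import(b, importer, Insert);   // the single import
          return bnew;
        }
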
Observations:
1)  Although this algorithm is recursive, a non-recursive implementation 
    might be cleaner.  If it is done recursively, it should be done in place,
    i.e. any data movement of the matrix itself should have been done in 
    advance.
2)  There are two choices for the basic paradigm for parallelism.  Consider 
    a two-level bisection of the matrix, yielding seven tasks or diagonal 
    blocks:  T0, T1, T01, T2, T3, T23 and T0123.  In both paradigms, 
    T0, T1, T2 and T3 would each be handled by a single process.  Up until 
    now, we have been assuming that T01 would be handled by processes 0 
    and/or 1 while T23 would be handled by processes 2 and/or 3.  The other 
    option is to arrange the tasks/diagonal blocks as follows:  T0, T1, T2, 
    T3, T01, T23, T0123 and treat the last three blocks, T01, T23 and T0123, 
    as a single block to be handled by all four processes.  This second 
    paradigm requires an additional synchronization, but may allow a better 
    partitioning of the remaining matrix because the resulting Schur 
    complement is then known.  This improved partitioning will also improve 
    the refactorization (i.e. pivotless factorization).  The second paradigm 
    may also allow for better load balancing.  For example, after using 
    recursive minimum-degree bisection (or any other scheme) to partition 
    the matrix, one could run a peephole optimization pass that would look 
    for individual blocks that could be moved from the largest partition to 
    a smaller one.  Finally, if it is clear that a given partition is going 
    to be the slowest, it might make sense to shift some rows/columns off of 
    it into the splitter just for load balancing.  
3)  It seems possible that Merikos would be cleaner code if rows
    which are shared by multiple processes were split such that each row
    resides entirely within a given process.

4)  Support for pivotless refactorization is important.
5)  There is no mention of the required row and column permutations.
6)  Amesos_Merikos only needs to support the Amesos_Component interface if
    it will call itself recursively on the subblocks.  
7)  Perhaps Amesos_Component.h should be an added interface.  Instead 
    of replacing Amesos_BaseSolver.h, maybe it should add functionality 
    to it.   
*/
// @HEADER
// ***********************************************************************
// 
//                Amesos: Direct Sparse Solver Package
//                 Copyright (2004) Sandia Corporation
// 
// Under terms of Contract DE-AC04-94AL85000, there is a non-exclusive
// license for use of this work by or on behalf of the U.S. Government.
// 
// This library is free software; you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as
// published by the Free Software Foundation; either version 2.1 of the
// License, or (at your option) any later version.
//  
// This library is distributed in the hope that it will be useful, but
// WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
// Lesser General Public License for more details.
//  
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
// USA
// Questions? Contact Michael A. Heroux (maherou@sandia.gov) 
// 
// ***********************************************************************
// @HEADER

#ifndef _AMESOS_MERIKOS_H_
#define _AMESOS_MERIKOS_H_

#include "Amesos_ConfigDefs.h"
#include "Amesos_BaseSolver.h"
#include "Epetra_LinearProblem.h"
#include "Epetra_Time.h"
#ifdef EPETRA_MPI
#include "Epetra_MpiComm.h"
#else
#include "Epetra_Comm.h"
#endif
#include "Epetra_CrsGraph.h"

//!  Amesos_Merikos:  A parallel direct solver that partitions the matrix
//!  into arrowhead form, factors the diagonal blocks in parallel, and
//!  passes Schur complement updates up the elimination tree (see the
//!  design notes above; this class is not yet complete).

class Amesos_Merikos: public Amesos_BaseSolver { 

public: 

  //! Amesos_Merikos Constructor.
  Amesos_Merikos(const Epetra_LinearProblem& LinearProblem);

  //! Amesos_Merikos Destructor.
  ~Amesos_Merikos(void);

  //! Redistributes the matrix to the arrowhead form that the factorization needs.
  int RedistributeA();
  //! Converts the redistributed matrix to ScaLAPACK format.
  int ConvertToScalapack();
  //! Factors the redistributed matrix.
  int PerformNumericFactorization();

  //! Performs the symbolic factorization.
  int SymbolicFactorization();

  //! Performs the numeric factorization, computing L and U.
  int NumericFactorization();

  //! Solves L X = B.
  int LSolve();

  //! Solves U X = B.
  int USolve();

  //! Solves A X = B.
  int Solve();

  //! Returns the Epetra_LinearProblem.
  const Epetra_LinearProblem *GetProblem() const { return(Problem_); };

  //! Returns true if the matrix shape is one that Merikos can handle.
  bool MatrixShapeOK() const;

  //! If set true, X will be set to the solution of A^T X = B, not A X = B.
  int SetUseTranspose(bool UseTranspose) {UseTranspose_ = UseTranspose; return(0);};

  //! Returns the current UseTranspose setting.
  bool UseTranspose() const {return(UseTranspose_);};

  //! Returns a pointer to the Epetra_Comm communicator associated with this matrix.
  const Epetra_Comm & Comm() const {return(GetProblem()->GetOperator()->Comm());};

  //! Updates internal variables from the given parameter list.
  int SetParameters( Teuchos::ParameterList &ParameterList );

  //! Returns the number of symbolic factorizations performed by this object.
  int NumSymbolicFact() const { return( NumSymbolicFact_ ); }

  //! Returns the number of numeric factorizations performed by this object.
  int NumNumericFact() const { return( NumNumericFact_ ); }

  //! Returns the number of solves performed by this object.
  int NumSolve() const { return( NumSolve_ ); }

  //! Prints timing information.
  void PrintTiming();

  //! Prints status information.
  void PrintStatus();

 protected:

  bool UseTranspose_;
  const Epetra_LinearProblem * Problem_;

  Epetra_CrsMatrix *L;
  Epetra_CrsMatrix *U;

  bool PrintTiming_;
  bool PrintStatus_;
  bool ComputeVectorNorms_;
  bool ComputeTrueResidual_;

  int verbose_;
  int debug_;

  // Timing internals, copied from MUMPS:
  double ConTime_;                        // time to convert to MERIKOS format
  double SymTime_;                        // time for symbolic factorization
  double NumTime_;                        // time for numeric factorization
  double SolTime_;                        // time for solution
  double VecTime_;                        // time to redistribute vectors
  double MatTime_;                        // time to redistribute matrix

  int NumSymbolicFact_;
  int NumNumericFact_;
  int NumSolve_;

  Epetra_Time * Time_;

  //
  //  These allow us to use the ScaLAPACK-based Merikos code.
  //
  Epetra_Map *ScaLAPACK1DMap_;            //  Points to a 1D map matching a ScaLAPACK 1D
                                          //  blocked (not block-cyclic) distribution
  Epetra_CrsMatrix *ScaLAPACK1DMatrix_;   //  Points to a matrix in a ScaLAPACK 1D
                                          //  blocked (not block-cyclic) distribution
  Epetra_Map *VectorMap_;                 //  Points to a map for the vectors X and B
  std::vector<double> DenseA_;            //  The data in ScaLAPACK 1D blocked format
  std::vector<int> Ipiv_;                 //  ScaLAPACK pivot information
  int NumOurRows_;
  int NumOurColumns_;

  //
  //  Control of the data distribution
  //
  bool TwoD_distribution_;  // true if the 2D data distribution is used
  int grid_nb_;             // row and column blocking factor (used only in the 2D distribution)
  int mypcol_;              // process column in the ScaLAPACK 2D grid
  int myprow_;              // process row in the ScaLAPACK 2D grid
  Epetra_CrsMatrix* FatOut_;

  //
  //  Blocking factors (for both the 1D and 2D data distributions)
  //
  int nb_;
  int lda_;

  int iam_;
  int nprow_;
  int npcol_;
  int NumGlobalElements_;
  int m_per_p_;

};  // End of  class Amesos_Merikos  
#endif /* _AMESOS_MERIKOS_H_ */
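
/*
  Example (illustrative only):  the standard Amesos_BaseSolver call
  sequence, which Amesos_Merikos is intended to support.  SolveWithMerikos
  is a hypothetical helper; A, x, and b are caller-supplied Epetra objects.

    #include "Amesos_Merikos.h"
    #include "Epetra_LinearProblem.h"

    void SolveWithMerikos(Epetra_RowMatrix& A, Epetra_MultiVector& x,
                          Epetra_MultiVector& b) {
      Epetra_LinearProblem problem(&A, &x, &b);
      Amesos_Merikos solver(problem);

      solver.SymbolicFactorization();   // analyze the pattern, build the tree
      solver.NumericFactorization();    // compute L and U
      solver.Solve();                   // forward and back solves
    }
*/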