GSS (GRUS SPARSE SOLVER)
GSS is a novel sparse solver that can solve systems with a million unknowns in about 1 minute, even on a PC.
The high performance and generality of GSS have been verified by many commercial users and large test sets.
Solve SPD systems with a million unknowns in seconds!
Faster than other sparse solvers!
Stable for ill-conditioned matrices.
High Performance
In most cases, 1 minute is enough for the numerical factorization of a matrix with 1,000,000 unknowns.
The forward/backward substitution takes only seconds.
For unsymmetric matrices, the numerical factorization is nearly 3 times faster than PARDISO (details in Experimental Results).
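The timings above reflect the usual two-phase use of a sparse direct solver: an expensive numerical factorization done once, followed by cheap forward/backward substitutions for each right-hand side. This page does not document the GSS programming interface, so the sketch below uses SciPy's `scipy.sparse.linalg.splu` (SuperLU, not GSS) as a stand-in. The model problem `poisson_2d` and the helper `solve_many` are hypothetical names chosen for illustration.

```python
import time

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def poisson_2d(m):
    """Hypothetical model problem: 5-point Laplacian on an m-by-m grid (n = m*m, SPD)."""
    t = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
    eye = sp.identity(m)
    return (sp.kron(eye, t) + sp.kron(t, eye)).tocsc()


def solve_many(a, rhs_list):
    """Factor `a` once and reuse the factors for every right-hand side.

    Returns the solutions plus the wall-clock time of each phase.
    """
    t0 = time.perf_counter()
    lu = splu(a)                              # numerical factorization: done once
    t1 = time.perf_counter()
    xs = [lu.solve(b) for b in rhs_list]      # forward/backward substitution per RHS
    t2 = time.perf_counter()
    return xs, t1 - t0, t2 - t1


if __name__ == "__main__":
    a = poisson_2d(200)                       # 40,000 unknowns
    rng = np.random.default_rng(0)
    rhs = [rng.standard_normal(a.shape[0]) for _ in range(5)]
    xs, t_factor, t_solve = solve_many(a, rhs)
    worst = max(np.linalg.norm(a @ x - b) / np.linalg.norm(b) for x, b in zip(xs, rhs))
    print(f"factorization {t_factor:.3f} s, {len(rhs)} solves {t_solve:.3f} s, "
          f"max relative residual {worst:.1e}")
```

Because the factors are reused, adding more right-hand sides raises only the substitution time, which is why the page quotes factorization in minutes but substitution in seconds.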
Robust
Handles matrices with high condition numbers or unusual sparsity patterns. Some ill-conditioned matrices can only be solved by GSS.
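One standard way a direct solver recovers accuracy from imperfect factors is iterative refinement, which the History section lists among the GSS modules (together with static pivoting, which can perturb the factors). The sketch below is the textbook technique, not GSS's implementation: it makes the factors deliberately inexact by computing them in float32 with SciPy's `splu`, then corrects the solution using residuals computed in float64. The function name `refine_solve` and its parameters are hypothetical. Refinement of this kind converges only when cond(A) times the float32 unit roundoff (about 6e-8) is well below 1.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def refine_solve(a, b, max_iter=10, rtol=1e-12):
    """Mixed-precision iterative refinement (a sketch, not GSS's own code).

    The LU factors are computed in float32; residuals are computed in float64,
    and every correction reuses the same float32 factors.
    Returns the solution and the number of correction steps taken.
    """
    a64 = sp.csc_matrix(a, dtype=np.float64)
    lu32 = splu(a64.astype(np.float32))                     # inexact factorization
    x = lu32.solve(b.astype(np.float32)).astype(np.float64)
    bnorm = np.linalg.norm(b)
    for step in range(max_iter):
        r = b - a64 @ x                                     # residual in double precision
        if np.linalg.norm(r) <= rtol * bnorm:
            return x, step
        x += lu32.solve(r.astype(np.float32))               # correction from the same factors
    return x, max_iter


if __name__ == "__main__":
    n = 1000
    a = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
    b = np.ones(n)
    x, steps = refine_solve(a, b)
    print(steps, np.linalg.norm(a @ x - b) / np.linalg.norm(b))
```

Each correction costs only one sparse matrix-vector product and one forward/backward substitution, so refinement is cheap compared with refactoring.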
CPU/GPU hybrid computing
GSS is the first sparse solver that supports the Nvidia CUDA platform.
Good scalability
Supports up to 64 CPUs.
IN-CORE, OUT-CORE and Hybrid-core mode
Handles matrices that need more than 4 GB of memory on 32-bit platforms.
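The 4 GB figure is the size of a 32-bit address space (2^32 bytes); in practice a 32-bit process usually gets less, often 2 to 3 GB. The back-of-the-envelope sketch below estimates when the factors alone no longer fit in core. It assumes, purely for illustration, 8 bytes per double-precision factor entry plus one 4-byte row index, and ignores column pointers and workspace; the constants and function names are hypothetical, not GSS parameters.

```python
BYTES_PER_VALUE = 8          # assumption: double-precision factor entries
BYTES_PER_INDEX = 4          # assumption: one 32-bit row index per stored entry
ADDRESS_SPACE_32BIT = 2**32  # 4 GiB, the upper bound for a 32-bit process


def factor_bytes(nnz_factor):
    """Rough storage for the L and U factors, ignoring column pointers and workspace."""
    return nnz_factor * (BYTES_PER_VALUE + BYTES_PER_INDEX)


def fits_in_core(nnz_factor, budget=ADDRESS_SPACE_32BIT):
    """True if the factors alone fit in the given memory budget (bytes)."""
    return factor_bytes(nnz_factor) <= budget


if __name__ == "__main__":
    # Largest number of factor entries that fits in 4 GiB: 357913941
    print(ADDRESS_SPACE_32BIT // (BYTES_PER_VALUE + BYTES_PER_INDEX))
    # 300 million entries fit, 400 million do not: prints True False
    print(fits_in_core(300_000_000), fits_in_core(400_000_000))
```

Under these assumptions, any factorization with more than about 358 million nonzeros has to be held partly out of core on a 32-bit platform.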
High Generality
Supports both 32-bit and 64-bit operating systems.
Supports both Linux and Windows.
Easy to use
Multiple user modes.
Supports user-defined modules.
More than 10 parameters, all with default values.
The performance of GSS has been verified by many commercial users and by test sets with thousands of matrices.
For more details, please contact
MAIL: gsp@grusoft.com
Phone: (+86) 013501997193
QQ: 304718494
The test matrices (Tables 1-3) are all from the UF Sparse Matrix Collection, which was also used in the UMFPACK paper.
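Table 4 lists the numerical factorization times (in seconds). Summed over all 35 test matrices, GSS 2.2 needs 25.66 s versus 72.38 s for PARDISO in MKL 11, so it is nearly 3 times faster overall, although PARDISO is faster on a few small matrices (GOODWIN, RIM, GRAHAM1 and EX40).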
The test CPU is an Intel Q8200 with 8 GB of memory, running 64-bit Windows Vista.
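The summary figures can be checked directly from Table 4. The short script below transcribes the table's (PARDISO, GSS) factorization times and recomputes the totals, the GSS/PARDISO ratio and the speed-up per matrix set; the names `TABLE4`, `totals` and `slower_for_gss` are illustrative only.

```python
# Numerical factorization times in seconds from Table 4: (PARDISO, GSS).
TABLE4 = {
    "symmetric": {
        "TORSO2": (1.09, 0.83), "OLAFU": (0.34, 0.22), "VENKAT01": (0.59, 0.41),
        "AF23560": (0.75, 0.42), "RAEFSKY3": (0.50, 0.34), "ZHAO1": (0.48, 0.23),
        "ZHAO2": (1.40, 0.23), "EX11": (0.70, 0.44), "RAEFSKY4": (0.81, 0.58),
        "WANG4": (0.76, 0.39), "XENON1": (1.95, 1.14), "CAGE10": (1.78, 0.42),
        "STOMACH": (10.5, 4.10),
    },
    "2-by-2": {
        "GOODWIN": (0.06, 0.14), "EPB2": (0.22, 0.13), "GARON2": (0.17, 0.09),
        "RIM": (0.23, 0.69), "HEART2": (0.11, 0.08), "EPB3": (0.34, 0.23),
        "RMA10": (0.62, 0.39), "HEART1": (0.33, 0.19), "PSMIGR_1": (4.96, 0.81),
    },
    "unsymmetric": {
        "ONETONE2": (0.20, 0.17), "GRAHAM1": (0.09, 0.28), "LHR34C": (0.44, 0.25),
        "E40R0100": (0.12, 0.12), "LHR71C": (0.91, 0.52), "EX40": (0.25, 0.27),
        "ONETONE1": (0.84, 0.34), "AV41092": (2.42, 0.72), "TWOTONE": (8.36, 0.91),
        "PSMIGR_2": (5.90, 1.01), "BBMAT": (8.37, 2.03), "G7JAC200SC": (13.5, 4.98),
        "MARK3JAC140SC": (2.29, 1.56),
    },
}


def totals(sets):
    """Total PARDISO and GSS factorization time over the given matrix sets."""
    pardiso = sum(p for s in sets for p, _ in TABLE4[s].values())
    gss = sum(g for s in sets for _, g in TABLE4[s].values())
    return pardiso, gss


def slower_for_gss():
    """Matrices where GSS took longer than PARDISO (GSS/PARDISO ratio > 1)."""
    return [name for s in TABLE4.values() for name, (p, g) in s.items() if g > p]


if __name__ == "__main__":
    groups = [("symmetric", ["symmetric"]), ("2-by-2", ["2-by-2"]),
              ("unsymmetric", ["unsymmetric"]), ("all", list(TABLE4))]
    for label, sets in groups:
        p, g = totals(sets)
        print(f"{label:12s} PARDISO {p:6.2f} s  GSS {g:6.2f} s  "
              f"GSS/PARDISO {g / p:.2f}  speed-up {p / g:.2f}")
    print("GSS slower on:", ", ".join(slower_for_gss()))
```

Run as a script, this gives total speed-ups of about 2.2 (symmetric), 2.6 (2-by-2), 3.3 (unsymmetric) and 2.8 (all 35 matrices), and lists GOODWIN, RIM, GRAHAM1 and EX40 as the matrices where GSS is slower.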
Table 1 Symmetric set
Group | Name | n | Nonzeros (thousands) | Sym. | Description
NORRIS | TORSO2 | 115967 | 1033.5 | 0.992 | 2D human torso, electro-phys finite-diff
SIMON | OLAFU | 16146 | 1015.2 | 1.000 | Structure problem
SIMON | VENKAT01 | 62424 | 1717.8 | 1.000 | Unstructured 2D Euler problem
BAI | AF23560 | 23560 | 460.6 | 0.944 | Airfoil
SIMON | RAEFSKY3 | 21200 | 1488.8 | 1.000 | Fluid-structure, turbulence
ZHAO | ZHAO1 | 33861 | 166.5 | 0.922 | Electromagnetics
ZHAO | ZHAO2 | 33861 | 166.5 | 0.922 | Electromagnetics
FIDAP | EX11 | 16614 | 1096.9 | 1.000 | 3D fluid flow, cylinder and plate
SIMON | RAEFSKY4 | 19779 | 1316.8 | 1.000 | Container buckling problem
WANG | WANG4 | 26086 | 177.2 | 1.000 | 3D MOSFET semiconductor
RONIS | XENON1 | 48600 | 1181.1 | 1.000 | Zeolite, sodalite crystals
VANHEUKELUM | CAGE10 | 11397 | 150.6 | 1.000 | DNA electrophoresis
NORRIS | STOMACH | 213360 | 3021.6 | 0.848 | 3D electro-physical, human duodenum
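The Sym. column in Tables 1-3 is not defined on this page. Assuming it is the pattern-symmetry measure used in the UMFPACK paper (the fraction of off-diagonal nonzeros a_ij whose transposed position a_ji is also nonzero), it can be computed as in this sketch. The function names and the coordinate-list input format are illustrative choices, not part of GSS.

```python
def pattern_symmetry(entries):
    """Fraction of off-diagonal nonzeros (i, j) whose mirror (j, i) is also nonzero.

    `entries` is an iterable of (row, col) positions of the nonzeros.
    A matrix with no off-diagonal nonzeros is treated as fully symmetric (1.0).
    """
    off = {(i, j) for i, j in entries if i != j}   # ignore the diagonal
    if not off:
        return 1.0
    matched = sum(1 for i, j in off if (j, i) in off)
    return matched / len(off)


def nonzero_positions(rows):
    """(row, col) positions of the nonzeros in a small dense matrix (list of lists)."""
    return [(i, j) for i, row in enumerate(rows) for j, v in enumerate(row) if v != 0]


if __name__ == "__main__":
    a = [[4, 1, 0],
         [2, 4, 3],
         [0, 0, 4]]
    # Off-diagonal nonzeros: (0,1), (1,0), (1,2); only the first two are matched.
    print(pattern_symmetry(nonzero_positions(a)))
```

Under this definition a value of 1.000 means the nonzero pattern is exactly symmetric (the numerical values may still differ), while values near 0, such as LHR34C or AV41092 in Table 3, mean almost no off-diagonal entry has a partner across the diagonal.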
Table 2 2-by-2 set
Group | Name | n | Nonzeros (thousands) | Sym. | Description
GOODWIN | GOODWIN | 7320 | 324.8 | 0.635 | Fluid mechanics, finite-element
AVEROUS | EPB2 | 25228 | 175.0 | 0.670 | Plate-fin heat exchanger
GARON | GARON2 | 13535 | 373.2 | 0.999 | 2D finite-element, Navier-Stokes
GOODWIN | RIM | 22560 | 1015.0 | 0.639 | Fluid mechanics, finite-element
NORRIS | HEART2 | 2339 | 680.3 | 1.000 | Quasi-static FEM, human heart
AVEROUS | EPB3 | 84617 | 463.6 | 0.667 | Plate-fin heat exchanger
BOVA | RMA10 | 46835 | 2329.1 | 1.000 | 3D model of Charleston Harbor
NORRIS | HEART1 | 3557 | 1385.3 | 1.000 | Quasi-static FEM, human heart
HB | PSMIGR_1 | 3140 | 543.2 | 0.479 | Population migration
Table 3 Unsymmetric set
Group | Name | n | Nonzeros (thousands) | Sym. | Description
AT&T | ONETONE2 | 36057 | 222.6 | 0.116 | Harmonic balance method
GRAHAM | GRAHAM1 | 9035 | 335.5 | 0.718 | Navier-Stokes, finite-element
MALLYA | LHR34C | 35152 | 764.0 | 0.002 | Light hydrocarbon recovery
SHEN | E40R0100 | 17281 | 553.6 | 0.308 |
MALLYA | LHR71C | 70304 | 1528.1 | 0.002 | Light hydrocarbon recovery
FIDAP | EX40 | 7740 | 456.2 | 1.000 | Navier-Stokes, FEM (3D)
AT&T | ONETONE1 | 36057 | 335.6 | 0.076 | Harmonic balance method
VAVASIS | AV41092 | 41092 | 1683.9 | 0.001 | Unstructured finite-element
AT&T | TWOTONE | 120750 | 1206.3 | 0.246 | Harmonic balance method
HB | PSMIGR_2 | 3140 | 540.0 | 0.479 | Population migration
SIMON | BBMAT | 38744 | 1771.7 | 0.529 | 2D airfoil, turbulence
HOLLINGER | G7JAC200SC | 59310 | 717.6 | 0.025 | Economic modeling
HOLLINGER | MARK3JAC140SC | 64089 | 376.4 | 0.061 | Economic modeling
Table 4 Comparative testing between GSS and PARDISO (numerical factorization time in seconds)
Set | Matrix | PARDISO | GSS | GSS/PARDISO
symmetric | TORSO2 | 1.09 | 0.83 | 0.76
symmetric | OLAFU | 0.34 | 0.22 | 0.65
symmetric | VENKAT01 | 0.59 | 0.41 | 0.69
symmetric | AF23560 | 0.75 | 0.42 | 0.56
symmetric | RAEFSKY3 | 0.5 | 0.34 | 0.68
symmetric | ZHAO1 | 0.48 | 0.23 | 0.48
symmetric | ZHAO2 | 1.4 | 0.23 | 0.16
symmetric | EX11 | 0.7 | 0.44 | 0.63
symmetric | RAEFSKY4 | 0.81 | 0.58 | 0.72
symmetric | WANG4 | 0.76 | 0.39 | 0.51
symmetric | XENON1 | 1.95 | 1.14 | 0.58
symmetric | CAGE10 | 1.78 | 0.42 | 0.24
symmetric | STOMACH | 10.5 | 4.1 | 0.39
2-by-2 | GOODWIN | 0.06 | 0.14 | 2.33
2-by-2 | EPB2 | 0.22 | 0.13 | 0.59
2-by-2 | GARON2 | 0.17 | 0.09 | 0.53
2-by-2 | RIM | 0.23 | 0.69 | 3.00
2-by-2 | HEART2 | 0.11 | 0.08 | 0.73
2-by-2 | EPB3 | 0.34 | 0.23 | 0.68
2-by-2 | RMA10 | 0.62 | 0.39 | 0.63
2-by-2 | HEART1 | 0.33 | 0.19 | 0.58
2-by-2 | PSMIGR_1 | 4.96 | 0.81 | 0.16
unsymmetric | ONETONE2 | 0.2 | 0.17 | 0.85
unsymmetric | GRAHAM1 | 0.09 | 0.28 | 3.11
unsymmetric | LHR34C | 0.44 | 0.25 | 0.57
unsymmetric | E40R0100 | 0.12 | 0.12 | 1.00
unsymmetric | LHR71C | 0.91 | 0.52 | 0.57
unsymmetric | EX40 | 0.25 | 0.27 | 1.08
unsymmetric | ONETONE1 | 0.84 | 0.34 | 0.40
unsymmetric | AV41092 | 2.42 | 0.72 | 0.30
unsymmetric | TWOTONE | 8.36 | 0.91 | 0.11
unsymmetric | PSMIGR_2 | 5.9 | 1.01 | 0.17
unsymmetric | BBMAT | 8.37 | 2.03 | 0.24
unsymmetric | G7JAC200SC | 13.5 | 4.98 | 0.37
unsymmetric | MARK3JAC140SC | 2.29 | 1.56 | 0.68
Sum | all 35 matrices | 72.38 | 25.66 | 0.35
History
4/18/2012 GSS 2.3 released.
1. Improved numerical factorization; faster than GSS 2.2.
2. Requires less memory than GSS 2.2.
3. Fixed some bugs.
9/19/2009 GSS 2.2 released.
1. Improved numerical factorization for SPD matrices, which is now 20% faster than PARDISO.
2. Improved CPU/GPU hybrid computing. The best speed-up, for 500,000 unknowns, is 7.
3. GSS_spd is free.
7/31/2007 GSS 2.1 released.
1. Support for Nvidia CUDA.
2. Improved out-of-core module.
3. Improved memory module of LDLT.
12/25/2007 GSS 2.0 released.
1. Added new balance module.
2. Added LU partial-updating module.
3. Improved out-of-core, in-core and hybrid-core modules.
4. Added hybrid multifrontal/frontal module.
5. Improved iterative refinement module, with better estimation of the condition number.
11/25/2005 GSS 1.2 released.
1. Parallel version released.
2. Support for Intel Hyper-Threading.
3. Improved numerical factorization for symmetric matrices.
4. Improved static pivoting.
5. Added iterative refinement module.
9/12/2005 GSS 1.1 released.
1. Added quotient graph model for symbolic factorization.
2. Improved diagonal reordering module.
3. Improved numerical factorization for unsymmetric matrices.
4. Added scaling module.
5. More experimental results.
7/20/2005 GSS 1.0 released.
© Copyright 2002-2012 GRUSOFT. All Rights Reserved.