GSS (GRUS SPARSE SOLVER) is a novel sparse solver that can solve systems with a million unknowns in 1 minute, even on a PC.
The high performance and generality of GSS have been verified by many commercial users and large test sets.
Free for symmetric positive definite matrices
Solves SPD matrices with a million unknowns in seconds.
Faster than PARDISO.
High Performance
In most cases, 1 minute is enough for the numerical factorization of a matrix with 1,000,000 unknowns.
Forward/backward substitution takes only seconds.
For unsymmetric matrices, GSS is nearly 3 times faster than PARDISO in numerical factorization (details in Experimental Results).
Robust
Handles matrices with high condition numbers or unusual patterns. Some ill-conditioned matrices can only be solved by GSS.
CPU/GPU hybrid computing
GSS is the first sparse solver that supports the Nvidia CUDA platform.
Good scalability
Supports up to 64 CPUs.
In-core, out-of-core and hybrid-core modes
Handles matrices that need more than 4 GB of memory on 32-bit platforms.
High Generality
Supports both 32-bit and 64-bit operating systems.
Supports both Linux and Windows.
Easy to use
Multiple user modes.
Supports user-defined modules.
More than 10 parameters, each with a default value.
The performance of GSS has been verified by many commercial users and testing sets with thousands of matrices.
Try the free trial version for up to 100,000 unknowns
1. Download the application form.
2. Fill in the application form and send it to gsp@grusoft.com.
For more details, please contact
MAIL: gsp@grusoft.com
MSN: gsp.cys@gmail.com
QQ: 304718494
The test matrices (Tables 1-3) are all from the UF Sparse Matrix Collection and were also used in the UMFPACK paper.
Table 4 lists the numerical factorization times (in seconds).
Over the whole test set, GSS 2.2 is nearly 3 times faster than PARDISO in MKL 11 (25.66 s in total versus 72.38 s).
The test machine has an Intel Q8200 CPU and 8 GB of memory and runs 64-bit Windows Vista.
Table 1: symmetric set
Group | Name | n | Nonzeros (in 1000's) | Sym. | Description
NORRIS | TORSO2 | 115967 | 1033.5 | 0.992 | 2D human torso, electro-phys finite-diff
SIMON | OLAFU | 16146 | 1015.2 | 1.000 | Structure problem
SIMON | VENKAT01 | 62424 | 1717.8 | 1.000 | Unstructured 2D Euler problem
BAI | AF23560 | 23560 | 460.6 | 0.944 | Airfoil
SIMON | RAEFSKY3 | 21200 | 1488.8 | 1.000 | Fluid-structure, turbulence
ZHAO | ZHAO1 | 33861 | 166.5 | 0.922 | Electromagnetics
ZHAO | ZHAO2 | 33861 | 166.5 | 0.922 | Electromagnetics
FIDAP | EX11 | 16614 | 1096.9 | 1.000 | 3D fluid flow, cylinder and plate
SIMON | RAEFSKY4 | 19779 | 1316.8 | 1.000 | Container buckling problem
WANG | WANG4 | 26086 | 177.2 | 1.000 | 3D MOSFET semiconductor
RONIS | XENON1 | 48600 | 1181.1 | 1.000 | Zeolite, sodalite crystals
VANHEUKELUM | CAGE10 | 11397 | 150.6 | 1.000 | DNA electrophoresis
NORRIS | STOMACH | 213360 | 3021.6 | 0.848 | 3D electro-physical, human duodenum
Table 2: 2-by-2 set
Group | Name | n | Nonzeros (in 1000's) | Sym. | Description
GOODWIN | GOODWIN | 7320 | 324.8 | 0.635 | Fluid mechanics, finite-element
AVEROUS | EPB2 | 25228 | 175.0 | 0.670 | Plate-fin heat exchanger
GARON | GARON2 | 13535 | 373.2 | 0.999 | 2D finite-element, Navier-Stokes
GOODWIN | RIM | 22560 | 1015.0 | 0.639 | Fluid mechanics, finite-element
NORRIS | HEART2 | 2339 | 680.3 | 1.000 | Quasi-static FEM, human heart
AVEROUS | EPB3 | 84617 | 463.6 | 0.667 | Plate-fin heat exchanger
BOVA | RMA10 | 46835 | 2329.1 | 1.000 | 3D model of Charleston Harbor
NORRIS | HEART1 | 3557 | 1385.3 | 1.000 | Quasi-static FEM, human heart
HB | PSMIGR_1 | 3140 | 543.2 | 0.479 | Population migration
Table 3: unsymmetric set
Group | Name | n | Nonzeros (in 1000's) | Sym. | Description
AT&T | ONETONE2 | 36057 | 222.6 | 0.116 | Harmonic balance method
GRAHAM | GRAHAM1 | 9035 | 335.5 | 0.718 | Navier-Stokes, finite-element
MALLYA | LHR34C | 35152 | 764.0 | 0.002 | Light hydrocarbon recovery
SHEN | E40R0100 | 17281 | 553.6 | 0.308 |
MALLYA | LHR71C | 70304 | 1528.1 | 0.002 | Light hydrocarbon recovery
FIDAP | EX40 | 7740 | 456.2 | 1.000 | Navier-Stokes, FEM (3D)
AT&T | ONETONE1 | 36057 | 335.6 | 0.076 | Harmonic balance method
VAVASIS | AV41092 | 41092 | 1683.9 | 0.001 | Unstructured finite-element
AT&T | TWOTONE | 120750 | 1206.3 | 0.246 | Harmonic balance method
HB | PSMIGR_2 | 3140 | 540.0 | 0.479 | Population migration
SIMON | BBMAT | 38744 | 1771.7 | 0.529 | 2D airfoil, turbulence
HOLLINGER | G7JAC200SC | 59310 | 717.6 | 0.025 | Economic modeling
HOLLINGER | MARK3JAC140SC | 64089 | 376.4 | 0.061 | Economic modeling
Table 4: comparative testing between GSS and PARDISO (numerical factorization time in seconds)
Set | Matrix | PARDISO (s) | GSS (s) | GSS/PARDISO
symmetric | TORSO2 | 1.09 | 0.83 | 0.76
symmetric | OLAFU | 0.34 | 0.22 | 0.65
symmetric | VENKAT01 | 0.59 | 0.41 | 0.69
symmetric | AF23560 | 0.75 | 0.42 | 0.56
symmetric | RAEFSKY3 | 0.5 | 0.34 | 0.68
symmetric | ZHAO1 | 0.48 | 0.23 | 0.48
symmetric | ZHAO2 | 1.4 | 0.23 | 0.16
symmetric | EX11 | 0.7 | 0.44 | 0.63
symmetric | RAEFSKY4 | 0.81 | 0.58 | 0.72
symmetric | WANG4 | 0.76 | 0.39 | 0.51
symmetric | XENON1 | 1.95 | 1.14 | 0.58
symmetric | CAGE10 | 1.78 | 0.42 | 0.24
symmetric | STOMACH | 10.5 | 4.1 | 0.39
2-by-2 | GOODWIN | 0.06 | 0.14 | 2.33
2-by-2 | EPB2 | 0.22 | 0.13 | 0.59
2-by-2 | GARON2 | 0.17 | 0.09 | 0.53
2-by-2 | RIM | 0.23 | 0.69 | 3.00
2-by-2 | HEART2 | 0.11 | 0.08 | 0.73
2-by-2 | EPB3 | 0.34 | 0.23 | 0.68
2-by-2 | RMA10 | 0.62 | 0.39 | 0.63
2-by-2 | HEART1 | 0.33 | 0.19 | 0.58
2-by-2 | PSMIGR_1 | 4.96 | 0.81 | 0.16
unsymmetric | ONETONE2 | 0.2 | 0.17 | 0.85
unsymmetric | GRAHAM1 | 0.09 | 0.28 | 3.11
unsymmetric | LHR34C | 0.44 | 0.25 | 0.57
unsymmetric | E40R0100 | 0.12 | 0.12 | 1.00
unsymmetric | LHR71C | 0.91 | 0.52 | 0.57
unsymmetric | EX40 | 0.25 | 0.27 | 1.08
unsymmetric | ONETONE1 | 0.84 | 0.34 | 0.40
unsymmetric | AV41092 | 2.42 | 0.72 | 0.30
unsymmetric | TWOTONE | 8.36 | 0.91 | 0.11
unsymmetric | PSMIGR_2 | 5.9 | 1.01 | 0.17
unsymmetric | BBMAT | 8.37 | 2.03 | 0.24
unsymmetric | G7JAC200SC | 13.5 | 4.98 | 0.37
unsymmetric | MARK3JAC140SC | 2.29 | 1.56 | 0.68
sum | | 72.38 | 25.66 | 0.35
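The totals and the "nearly 3 times faster" figure can be recomputed from the Table 4 timings. The following is a minimal sketch in Python; the names TIMES, totals and speedup are illustrative only and are not part of GSS or PARDISO. The numbers are copied from Table 4 as (PARDISO seconds, GSS seconds) pairs.

```python
# Numerical factorization times in seconds, copied from Table 4.
# Each entry is (PARDISO, GSS). The names below are illustrative, not a GSS API.
TIMES = {
    "symmetric": [
        (1.09, 0.83), (0.34, 0.22), (0.59, 0.41), (0.75, 0.42), (0.5, 0.34),
        (0.48, 0.23), (1.4, 0.23), (0.7, 0.44), (0.81, 0.58), (0.76, 0.39),
        (1.95, 1.14), (1.78, 0.42), (10.5, 4.1),
    ],
    "2-by-2": [
        (0.06, 0.14), (0.22, 0.13), (0.17, 0.09), (0.23, 0.69), (0.11, 0.08),
        (0.34, 0.23), (0.62, 0.39), (0.33, 0.19), (4.96, 0.81),
    ],
    "unsymmetric": [
        (0.2, 0.17), (0.09, 0.28), (0.44, 0.25), (0.12, 0.12), (0.91, 0.52),
        (0.25, 0.27), (0.84, 0.34), (2.42, 0.72), (8.36, 0.91), (5.9, 1.01),
        (8.37, 2.03), (13.5, 4.98), (2.29, 1.56),
    ],
}


def totals(rows):
    """Return (total PARDISO time, total GSS time) for a list of rows."""
    pardiso = sum(p for p, _ in rows)
    gss = sum(g for _, g in rows)
    return pardiso, gss


def speedup(rows):
    """Total PARDISO time divided by total GSS time (higher favors GSS)."""
    pardiso, gss = totals(rows)
    return pardiso / gss


all_rows = [row for rows in TIMES.values() for row in rows]
total_pardiso, total_gss = totals(all_rows)
overall_speedup = total_pardiso / total_gss

for name, rows in TIMES.items():
    print(f"{name:12s} speed-up: {speedup(rows):.2f}")
print(f"total PARDISO {total_pardiso:.2f} s, total GSS {total_gss:.2f} s, "
      f"speed-up {overall_speedup:.2f}")
```

The speed-up here is based on summed times, so the largest matrices dominate it. Individual rows still vary widely: for GOODWIN, RIM, GRAHAM1 and EX40, PARDISO is the faster of the two.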
History
9/19/2009 GSS 2.2 released.
1. Improved numerical factorization for SPD matrices, which is 20% faster than PARDISO.
2. Improved CPU/GPU hybrid computing. The best speed-up for 500,000 unknowns is 7 times.
3. GSS_spd is free.
7/31/2007 GSS 2.1 released.
1. Added support for Nvidia CUDA.
2. Improved out-of-core module.
3. Improved memory module of LDLT.
12/25/2007 GSS 2.0 released.
1. Added new balance module.
2. Added LU-partial-updating module.
3. Improved out-of-core, in-core and hybrid-core modules.
4. Added hybrid multifrontal/frontal module.
5. Improved iterative refinement module, with a better estimate of the condition number.
11/25/2005 GSS 1.2 released.
1. Parallel version released.
2. Added support for Intel Hyper-Threading.
3. Improved numerical factorization for symmetric matrices.
4. Improved static pivoting.
5. Added iterative refinement module.
9/12/2005 GSS 1.1 released.
1. Added quotient graph model for symbolic factorization.
2. Improved reordering module for diagonals.
3. Improved numerical factorization for unsymmetric matrices.
4. Added scaling module.
5. More experimental results.
7/20/2005 GSS 1.0 released.