
SP Parallel Programming Workshop
IBM SP Hardware/Software Overview
Table of Contents
- IBM's POWER Architectures
- Scalable Parallel Strategy
- SP Hardware
- SP Frames
- Processor Nodes
- High Performance Switch
- System Connectivity
- Control Workstation
- File/Install Servers
- SP Software
- AIX Operating System
- System Administration
- Parallel Environment
- LoadLeveler
- SP Applications
- Performance Benchmarks
- References, Acknowledgements, WWW Resources
IBM's POWER Architectures
- In 1990, IBM announced the RISC System/6000 (RS/6000) family of
superscalar workstations and servers based upon IBM's POWER architecture
- POWER = Performance Optimized With Enhanced RISC
- RISC = Reduced Instruction Set Computer
- Superscalar = simultaneous execution of multiple instructions.
- RS/6000 Cluster: a networked configuration of multiple RS/6000 machines.
Clusters provided the initial basis for RS/6000-based parallel, distributed
computing.
- IBM's first scalable POWERparallel system was the SP1. It included:
- POWER (RS/6000) processor architecture
- Rack configuration - multiple machines in a frame
- High performance inter-processor communications
- System software for managing multiple machines
- Parallel Environment software for parallel program developers
- Continued improvements in the POWER processor architecture led to
the POWER2 processor. Improvements in the SP1 led to the SP2,
which incorporated the POWER2 processor.
- POWER2 processor features include:
- Multi-chip CMOS (Complementary Metal-Oxide Semiconductor) processor
implementation consisting of:
- ICU - Instruction Cache Unit (32 KB)
- DCU - Data Cache Unit (64 to 256 KB)
- FPUs - Dual Floating Point Units
- FXUs - Dual Fixed Point Units
- SCU - Storage Control Unit
- Clock speed = 66.5 MHz (77 MHz clock available 8/95)
- Combined floating point multiply-add (FMA) instruction, which counts as
two floating point operations. Each FPU can complete one FMA per cycle,
so the two FPUs give a peak MFLOPS rate equal to four times the MHz rate
(4 × 66.5 MHz = 266 MFLOPS).
- Execute up to 6 instructions per clock cycle (two fixed point,
two floating point, one branch, one condition register). This equals 8
operations if both floating point units are executing the combined
multiply-add instruction.
- Zero-cycle branches - instruction path determined in advance by
the instruction cache unit
- Special instructions for increasing performance - for example:
- Quad-word (128-bit) floating point storage references, for both loads
and stores. For example, a quad load places two adjacent double
precision floating point numbers into two adjacent floating point
registers with one instruction.
- Square root instruction (performed in hardware)
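The peak-rate arithmetic above can be checked with a short calculation. This is a minimal sketch, assuming only the figures quoted in this section: two FPUs, each completing one multiply-add (two floating point operations) per cycle, and the 6-instruction issue mix. The function names `peak_mflops` and `ops_per_cycle` are hypothetical, chosen for illustration.

```python
# Sketch: POWER2 peak floating point rate and operations per cycle,
# using the figures quoted in this section (two FPUs, FMA = 2 flops).

FPUS = 2           # dual floating point units
FLOPS_PER_FMA = 2  # one multiply plus one add


def peak_mflops(clock_mhz: float, fpus: int = FPUS) -> float:
    """Peak MFLOPS if every FPU completes one FMA on every clock cycle."""
    return clock_mhz * fpus * FLOPS_PER_FMA


def ops_per_cycle(fpus_doing_fma: int = FPUS) -> int:
    """Operations per cycle for the maximum 6-instruction issue mix.

    The mix is 2 fixed point + 2 floating point + 1 branch + 1 condition
    register instruction; each floating point instruction that is an FMA
    performs two operations instead of one.
    """
    instructions = 2 + 2 + 1 + 1
    return instructions + fpus_doing_fma


if __name__ == "__main__":
    for mhz in (66.5, 77.0):
        print(f"{mhz} MHz: {peak_mflops(mhz):.0f} peak MFLOPS")
    print("Operations per cycle with both FPUs executing FMA:", ops_per_cycle())
```

At 66.5 MHz this gives 266 peak MFLOPS, the figure listed for every node type in the processor comparison table; the 77 MHz clock would give 308.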
- [Figure: the POWER2 processor complex]
- Additional information about the RS/6000 systems and detailed
technical descriptions of the POWER2 processor complex can be
found at:
Scalable Parallel Strategy
- Based upon proven RS/6000 technology and architected for growth
- Consistent software environment from workstations to
massively parallel systems, based upon open standards -
"Palmtop to teraFLOPS" strategy
- Flexible architecture: provide system options that allow both
technical and commercial customers to design and change their own
computing environments
- Single-point-of-control system management capability that
balances serial and parallel, batch and interactive applications
- Make available a full range of serial and parallel applications
- Present an affordable entry with growth according to needs
SP Hardware
SP Frames
- An SP system is composed of 1 or more SP frames each containing
multiple processor nodes
- Frame characteristics:
- 2-16 nodes per frame, depending on the mix of wide and thin nodes
(up to 16 thin nodes or 8 wide nodes)
- High Performance Switch (optional)
- Redundant frame power
- Concurrent maintenance
- Can mix different machine types (wide/thin node) within a frame
- For larger systems (over 80 nodes), frames are used to house
intermediate High Performance Switch hardware components
- Photos of an SP frame (cover opened)
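The per-frame limits above (16 thin nodes or 8 wide nodes) imply that a wide node takes the space of two thin nodes. The sketch below turns that into a frame count for a given node mix. It is a simplification under stated assumptions: a frame is treated as 16 thin-node-sized slots, any mix is assumed to fit as long as the slot total is not exceeded, and switch-only frames and other packaging rules are ignored. The names `frames_needed`, `SLOTS_PER_FRAME`, and `SLOTS_PER_NODE` are hypothetical.

```python
import math

# ASSUMPTIONS (simplified): a frame holds 16 thin-node-sized slots, a wide node
# uses two slots, and any mix fits as long as the slot total is not exceeded.
# Switch-only frames and other packaging rules are ignored.
SLOTS_PER_FRAME = 16
SLOTS_PER_NODE = {"thin": 1, "wide": 2}


def frames_needed(thin: int = 0, wide: int = 0) -> int:
    """Minimum number of processor frames for a given mix of thin and wide nodes."""
    if thin < 0 or wide < 0:
        raise ValueError("node counts must be non-negative")
    slots = thin * SLOTS_PER_NODE["thin"] + wide * SLOTS_PER_NODE["wide"]
    return math.ceil(slots / SLOTS_PER_FRAME)


if __name__ == "__main__":
    print(frames_needed(thin=64))          # 64 thin nodes
    print(frames_needed(thin=10, wide=4))  # 10 thin + 4 wide = 18 slots
```

Under these assumptions, 64 thin nodes need 4 frames, and 10 thin plus 4 wide nodes (18 slots) need 2.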
Processor Nodes
- Different models, types and memory/disk configurations of SP nodes
are available.
SP Processor Comparisons

| Processor Type              | Thin      | Thin 2    | Wide      | Wide       |
|-----------------------------|-----------|-----------|-----------|------------|
| Nodes per frame             | 16        | 16        | 8         | 8          |
| Clock speed                 | 66 MHz    | 66 MHz    | 66 MHz    | 66 MHz     |
| Peak MFLOPS                 | 266       | 266       | 266       | 266        |
| Instruction cache           | 32 KB     | 32 KB     | 32 KB     | 32 KB      |
| Memory cards                | 2         | 2         | 2         | 4 or 8     |
| Memory                      | 64-512 MB | 64-512 MB | 64-512 MB | 64-2048 MB |
| Data cache                  | 64 KB     | 128 KB    | 128 KB    | 256 KB     |
| Memory to data cache bus    | 64 bit    | 128 bit   | 128 bit   | 256 bit    |
| Data cache to processor bus | 128 bit   | 256 bit   | 256 bit   | 256 bit    |
| Disk                        | 1-9 GB    | 1-9 GB    | 1-18 GB   | 1-18 GB    |
| Microchannel adapter slots  | 4         | 4         | 8         | 8          |
| L2 cache                    | 0-1 MB    | 0-2 MB    | n/a       | n/a        |
- Although all of these nodes have the same clock speed, actual performance
also depends on the data cache size and on the widths of the
memory-to-data-cache and data-cache-to-processor buses.
- A 77 MHz wide node became available 8/95.
- The IBM reference document "Processor Node Comparison" details the
different node implementations and compares their relative performance.
It is available from the MHPCC anonymous ftp server (ftp.mhpcc.edu)
in pub/IBM.PostScript.Manuals/sp2nodes.ps.Z
- Photo of the inside of a thin node
available here.
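To see why bus width matters even at a fixed clock, note that one transfer on a bus moves its width in bits divided by 8 bytes. The sketch below applies that to the bus widths in the node comparison table. The clock rate used for the buses is an illustrative assumption (one transfer per 66.5 MHz processor cycle), so the results are theoretical upper bounds, not measured or IBM-quoted bandwidths. The dictionary key for the second wide column is a hypothetical label based on its 256 KB data cache.

```python
# Sketch: theoretical bytes per cycle for the bus widths in the node table.
# ASSUMPTION (illustrative only): one bus transfer per 66.5 MHz processor cycle.

CLOCK_MHZ = 66.5

# (memory-to-data-cache, data-cache-to-processor) widths in bits, per node column.
# The second wide column is labeled by its 256 KB data cache (hypothetical label).
BUS_WIDTHS = {
    "Thin": (64, 128),
    "Thin 2": (128, 256),
    "Wide": (128, 256),
    "Wide, 256 KB cache": (256, 256),
}


def bytes_per_cycle(width_bits: int) -> int:
    """Bytes moved by one transfer on a bus of the given width."""
    return width_bits // 8


def peak_mb_per_s(width_bits: int, clock_mhz: float = CLOCK_MHZ) -> float:
    """Upper bound in MB/sec (10**6 bytes) at one transfer per cycle."""
    return bytes_per_cycle(width_bits) * clock_mhz


if __name__ == "__main__":
    for node, (mem_bus, cpu_bus) in BUS_WIDTHS.items():
        print(f"{node:20s} memory->cache {peak_mb_per_s(mem_bus):7.0f} MB/s, "
              f"cache->CPU {peak_mb_per_s(cpu_bus):7.0f} MB/s")
```

The point of the comparison is the ratio: under the same clock, the 256-bit memory bus of the 256 KB wide node can move four times as much data per cycle as the 64-bit bus of the thin node.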
High Performance Switch
- Provides the INTERNAL message passing fabric that connects all
of the SP processors together.
- Topology
- SP2 networks are bidirectional multistage interconnection networks (MINs)
- Bi-directional, any-to-any internode connection - allows all
processors to send messages simultaneously
- Multistage Interconnection: on larger systems
(over 80 nodes/5 frames), additional intermediate switches are
added as the system is scaled upward.
- [Figure: sample 64-node switch configuration]
- Additional switch configuration diagrams:
- SP High Performance Switch network characteristics
- Packet-switched network (versus circuit-switched)
- Support for multi-user environment - multiple jobs may run
simultaneously over the switch (one user does not monopolize switch)
- Path redundancy - multiple routings between any two nodes. Permits
routes to be generated even when there are faulty components in
the system.
- Error detection
- Architected for expansion to 1000s of ports
- Protocols
- IP (Internet Protocol) - default; permits shared usage of
HPS-2 adapter by multiple processes.
- US CSS (User Space Communication Subsystem) - intended for
parallel applications that require maximum communications
performance. Only one process per node may use US communications.
- Performance
- Peak bi-directional bandwidth between any two nodes: 40 MB/sec
- Hardware latency: 500 ns up to 80 nodes, 875 ns for systems with
up to 512 nodes
- Message passing performance (IBM):

| Protocol | Node Type     | Latency    | Pt-to-Pt Bandwidth |
|----------|---------------|------------|--------------------|
| IP       | Thin 66 MHz   | 312.1 usec | 9.9 MB/sec         |
| IP       | Thin-2 66 MHz | 270.4 usec | 12.0 MB/sec        |
| IP       | Wide 66 MHz   | 268.8 usec | 12.1 MB/sec        |
| US       | Thin 66 MHz   | 40.0 usec  | 35.4 MB/sec        |
| US       | Thin-2 66 MHz | 39.0 usec  | 35.7 MB/sec        |
| US       | Wide 66 MHz   | 39.2 usec  | 35.6 MB/sec        |
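One common way to read the latency and bandwidth columns together is the linear model time(n) = latency + n / bandwidth, with the related half-performance length n_1/2 = latency × bandwidth (the message size at which half the peak bandwidth is reached). This is the standard textbook (Hockney) model, not something taken from the IBM measurements, so treat the numbers as rough estimates. The sketch uses the thin 66 MHz rows of the table and takes 1 MB as 10^6 bytes; all function names are hypothetical.

```python
# Sketch: linear latency/bandwidth model, time(n) = latency + n / bandwidth.
# Figures are the thin 66 MHz rows of the IBM table; 1 MB is taken as 10**6 bytes,
# so a bandwidth in MB/sec is numerically equal to bytes per microsecond.

PROTOCOLS = {
    # name: (latency in usec, point-to-point bandwidth in MB/sec)
    "IP": (312.1, 9.9),
    "US": (40.0, 35.4),
}


def transfer_time_us(n_bytes: float, latency_us: float, bandwidth_mb_s: float) -> float:
    """Estimated time in microseconds to send one n-byte message."""
    return latency_us + n_bytes / bandwidth_mb_s


def effective_bandwidth_mb_s(n_bytes: float, latency_us: float, bandwidth_mb_s: float) -> float:
    """Achieved bandwidth for an n-byte message under the same model."""
    return n_bytes / transfer_time_us(n_bytes, latency_us, bandwidth_mb_s)


def half_performance_length(latency_us: float, bandwidth_mb_s: float) -> float:
    """Message size (bytes) at which half the peak bandwidth is achieved (n_1/2)."""
    return latency_us * bandwidth_mb_s


if __name__ == "__main__":
    for name, (lat, bw) in PROTOCOLS.items():
        print(f"{name}: n_1/2 = {half_performance_length(lat, bw):.0f} bytes")
        for n in (8, 4096, 10**6):
            print(f"  {n:>8} bytes: {transfer_time_us(n, lat, bw):10.1f} usec, "
                  f"{effective_bandwidth_mb_s(n, lat, bw):5.1f} MB/sec")
```

Under this model, US messages reach half their peak bandwidth at about 1.4 KB (40 × 35.4 = 1416 bytes), versus about 3.1 KB for IP (312.1 × 9.9 ≈ 3090 bytes). Comparing the 40 usec US latency with the 500 ns hardware latency quoted above suggests that most of the measured latency is spent outside the switch hardware itself.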
- Two basic hardware elements
- Switch board: one switch board per SP frame. It contains 8 logical
switch chips, implemented with 16 physical chips for reliability.
The 8 logical chips are wired as a bidirectional 4-way to 4-way crossbar.
- Communications adapter - one HPS-2 adapter per SP node. Occupies
one microchannel adapter slot.
- Photo of the High Performance Switch
hardware provided here
- HPS Adapter-2 features
- Incorporates an Intel i860 XR 64-bit microprocessor with 8 MB of
DRAM for communications coprocessing, which offloads work from the node CPU.
- Error checking - message CRC generation and checking
- Switch failure recovery
- Uses multiplexing to permit simultaneous IP and US communications
on same node. However, only one US protocol job is permitted
to use the adapter at a time.
- Continuing improvement and growth in switch
fabric design
- Additional technical information about the High Performance Switch:
System Connectivity
- SP nodes are equipped with microchannel adapter (MCA) slots to permit
a variety of I/O and network interfaces.
- Wide nodes have 8 MCA slots, thin nodes have 4.
- One Ethernet LAN connection is standard
- Supported adapters:
- Ethernet
- FDDI
- SCSI
- BMCA
- FCS
- HIPPI
- Token Ring
- ESCON
- ATM
- High Performance Switch
- Sample connectivity graphic available
here
Control Workstation
- Serves as the single point of control for System Support Programs
used by System Administrators for system monitoring, maintenance and
control.
- Separate machine - not part of the SP frame
- Must be a RISC System/6000
- Connects to each frame with
- RS-232 control line
- external ethernet LAN
- Acts as install server for other SP nodes
- May also act as a file server
File/Install Servers
- Typically, the full AIX operating system and IBM software are installed
on all SP2 nodes.
- Upgrades to operating system or IBM software can first be installed
on control workstation and then propagated to "install" nodes.
Install nodes can, in turn, propagate upgrades to other nodes.
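The two-level upgrade path described above (control workstation to install nodes, then install nodes to the remaining nodes) can be sketched as a simple fan-out plan. The node names, the number of install nodes, and the round-robin assignment are all hypothetical; they illustrate the idea and are not how the SP System Support Programs actually choose which install node serves which node.

```python
from typing import Dict, List


def propagation_plan(nodes: List[str], install_nodes: List[str]) -> Dict[str, List[str]]:
    """Plan a two-level software upgrade.

    Step 1: the control workstation upgrades every install node.
    Step 2: each install node upgrades a share of the remaining nodes.
    The round-robin split is a hypothetical policy chosen for illustration.
    """
    if not install_nodes:
        raise ValueError("need at least one install node")
    plan: Dict[str, List[str]] = {server: [] for server in install_nodes}
    remaining = [n for n in nodes if n not in install_nodes]
    for i, node in enumerate(remaining):
        plan[install_nodes[i % len(install_nodes)]].append(node)
    return plan


if __name__ == "__main__":
    # Hypothetical 16-node system with two install nodes.
    all_nodes = [f"node{i:02d}" for i in range(1, 17)]
    for server, clients in propagation_plan(all_nodes, ["node01", "node09"]).items():
        print(f"control workstation -> {server} -> {', '.join(clients)}")
```

Round-robin keeps the number of nodes served by each install node within one of the others; a site might instead have each install node serve the nodes in its own frame.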
- SP nodes can be attached to auxiliary disk to act as file servers.
Wide nodes are usually used if this is desired.
- Other network connected machines can be used to serve files to SP nodes
over the network.
- A simple SP2 configuration appears here
SP Software
System Administration
- SP system administration software includes a full set of tools used
primarily by a System Administrator with "root" authority to perform
the following tasks:
- Install and configure the system
- High Performance Switch installation and maintenance
- Software installation
- Manage user accounts
- Manage file collections
- Manage print and mail services
- Manage automounter
- Create diskless clients
- Monitor and control system hardware. Examples of several
monitoring tool displays:
- Includes both proprietary and publicly available code
Parallel Environment
- The Parallel Environment (PE) is the SP software environment designed
for the development and execution of parallel Fortran, C and C++ programs.
- PE consists of the following components:
- Parallel Operating Environment (POE)
- Execution environment for user parallel tasks
- Interface to compilers
- Initializes parallel environment
- Partition Manager allocates nodes for the parallel tasks
- Copies executables from the initiating node to each node in partition
- Loads executable on each node in partition
- Sets up standard in/out for nodes in partition
- Parallel Message Passing Libraries (MPL and MPI)
- Point to Point Message Passing Library
- Collective Communications Library (CCL)
- Visualization Tool (VT)
- Program Visualization
- Performance Monitoring
- Parallel Debuggers
- Program Marker Array
- System Status Array
- Parallel Profiler (prof)
- See the POE Tutorial and POE Exercises for detailed
information.
LoadLeveler
- A batch job scheduling application
- Schedules either serial or parallel jobs
- Provides accounting information
- Provides a graphical user interface for job submission and monitoring
- Can be configured to interface with the NQS (Network Queuing System)
- Runs on the SP as well as on RS/6000, Sun, and Silicon Graphics workstations
- Additional information:
SP Applications
- Over 10,000 serial AIX software applications are currently available.
- Detailed information for serial AIX software products can be accessed on
IBM's WWW server at:
- Serial application areas include:
Technical
Architecture, Engineering, Construction
Geographic Systems/Mapping
Physical Sciences
Scientific/Engineering
Mechanical Design and Analysis
Fluids, CFD, Flow Analysis
Solids and Drafting
Structural Analysis, CSM
General Business
Accounting
Legal Services
Real Estate
Distribution: Wholesale/Retail
Industrial/Manufacturing
Financial Services
Health Care
Insurance
Other Commercial Applications
Education
Government/Public Management
Graphics
Imaging and Document Management
Mass Media/Communications
Organizations/Fund Raising
Public Utilities
Transportation
Cross Industry
Artificial Intelligence
Database
Decision Support
Information Management
Integrated Workgroup
Publishing
Personal Information Managers
Project Management
Networks and Communications
Development Tools
Compilers/IDEs
Cross Platform Development Tools
Device Driver
Editors
Fourth Generation Languages
On-Line Transaction Processing Systems
Other Database Utility
Software Design Tools
Software Testing
- Parallel application software for the SP includes a number of
packages either already available or committed for availability.
- Detailed information for parallel SP software products can be
accessed on IBM's WWW server at:
- IBM was awarded a $7 million ARPA grant to deliver parallel computing
applications to U.S. industry. Read the details on IBM's WWW
server at:
http://www.rs6000.ibm.com/parallel/news/stories/arpa.html
- Parallel SP application software (available or committed) includes:
Chemical and Pharmaceutical.
AMBER 4.0 AMPAC
BATCHMIN 4.0 BIGSTRN-3
CHARMm 22 DGEOM
DISCOVER 2.9 DISCOVER 2.9.5
DMOL 2.3 GAMESS(USA)
GAUSSIAN 94 HONDO 8.4
MOPAC 7.0 MULLEKIN
NCSA Disco SIMBIG
SPARTAN 3.0 WESDYN
XPLOR 3.1
Electronic Analysis.
ASX-P ATHENA
BEBOP DAVINCI
FAIM MCP2D
MCP3D MEDICI
NIAGRA PISCES-MP
PROCLAT PROCPHASE
PWORDS SMART-SPICE
STRIDE THUNDER
VWF X-Wire
Engineering Analysis and Research.
ABAQUS Airplane
CFDS-FLOW3D DPAM/Cutter
FIRE FLOW3P
FLO67P Forge 3
GTNS2D LS-DYNA3D V930MP
MARC(linear) MARC(non-linear)
MSC/Eng. Application Code NEKTON
NPARC3D V1.0 PAM-CRASH
PCG Solver POLYFLOW
PVSOLVE RADIOSS
RAMPANT RAYON
SAMCEF-Structures SESTRA
SPECTRUM STAR-CD
SYSNOISE (Acoustic)
Petroleum Exploration and Production.
COMP III COMP4
FOCUS 2D/3D ECLIPSE Family
GEOVECTEUR MIGPACK
MORE OMEGA
PROMAX SeisUP
SimBest THERM
VIP-EXECUTIVE
Commercial and Database.
ACUCOBOL-85 ADABAS C
ADSM/6000 ADVANTAGE/2000 CA-Unicenter
CA/OpenIngres CICS/6000
DB2/6000 DB2/6000 Parallel Ed.
DCE (Except DCE Manager) EDA/SQL
ENCINA EpochBackup Client
Epsilon Fed Fin Sys (FFS/6000)
FOCUS HACMP 2.1 & HACMP 3.1
IDIS INFORMIX (OnLine DB)
INFORMIX (DSA/xmp) Interactive Analysis System
Job Scheduler for AIX Lawson Software
LEGEND NATURAL
NetWorker Neural Network Utilities
NSL Unitree onGO
OpenWorkFlow ORACLE 7.0
ORACLE 7.1 ORACLE Coop. Apps. Rel 10.4
PeopleSoft HRMS PeopleSoft Financials
PeopleTools Performance Toolbox
Prevail/XP-Jobtrac Remote Prevail/XP-Manager for RS/6000
Prevail/XP-PCS for RS/6000 PSF/6000
Pwrbnch COBOL Pwrbnch C++
Pwrbnch FORTRAN QUANTUM LEAP
Red Brick Warehouse Red Brick Warehouse VPT
REELlibrarian for AIX/6000 SAP R3
SAP R3 Parallel DB Support SAS System
SequeLink SNAP
SYBASE System 10 SYBASE Navigation Server
System Management Template TME
Triton TUXEDO
Universal OLAS Vertical Market Solutions
WorkFlow Template
Performance Benchmarks
- NAS Parallel Benchmarks: Numerical Aerodynamic Simulation benchmark
results maintained by the NASA Ames Research Center. For complete
information, access the NASA Ames WWW server at:
http://www.nas.nasa.gov/NAS/NPB
The results below were taken from the April 1995 report.
- The IBM reference document "Performance Measurements" details a number
of SP benchmarks. It is available from the MHPCC anonymous ftp server
(ftp.mhpcc.edu) in pub/IBM.PostScript.Manuals/sp2perf.ps.Z
Additional Information on the WWW
References and Acknowledgements
- "IBM AIX Parallel Environment Operation and Use, Release 2.0", IBM
Corporation.
- "IBM 9076 Scalable POWERparallel Systems Administration Guide", IBM
Corporation.
- "IBM RS/6000 Scalable POWERparallel Systems Processor Node Comparison", IBM
Corporation, June 19, 1995.
- "IBM RS/6000 Scalable POWERparallel Systems Performance Measurements", IBM
Corporation, June 19, 1995.
- We gratefully acknowledge the IBM Corporation for
providing much of the original material included in this document.
© Copyright 1994
Maui High Performance Computing Center. All rights reserved.
Documents located on the Maui High Performance Computing Center's WWW server
are copyrighted by the MHPCC. Educational institutions are encouraged to
reproduce and distribute these materials for educational use as long as
credit and notification are provided. Please retain this copyright notice
and include this statement with any copies that you make. Also, the MHPCC
requests that you send notification of their use to help@mail.mhpcc.edu.
Commercial use of these materials is prohibited without prior written
permission.
26 December 1996 editor@mhpcc.edu