
SP Parallel Programming Workshop
Mass Storage System - NSL UniTree
- A mass storage system provides reliable hierarchical file storage space
that is automatically managed, has a large file capacity, and is easily
accessible by the user.
- Many different mass storage system configurations are possible.
- We are initially using National Storage Laboratory's UniTree software
(NSL UniTree) with a variety of storage media.
- We will be transitioning to the High Performance Storage System (HPSS)
later this year.
Overview:
What is NSL UniTree?
NSL UniTree is the Hierarchical Storage Management system (HSM) which
concurrently manages the different layers and performance levels of our
storage media.
- A variety of storage media provides a mix of short-term (disk) and
long-term (tape) storage.
- Long-term storage space is virtually unlimited.
- Provides transparent access to files, regardless of their physical
location on the media.
- Works in conjunction with AIX
- UniTree directories look just like UNIX directories
- Access is available via several interfaces.
- Files are stored and retrieved as unchanged byte streams.
Overview:
Our Current Configuration
- A HIPPI-attached Maximum Strategies Gen5 disk array, with a maximum
storage capacity of 376 GB of RAID disk, serves as our first storage layer.
- Currently, we have two 80 GB RAID disk facilities, of which we are
using 120 GB
- Fastest access occurs over HIPPI to memory on a HIPPI node (45 MB/sec)
- There are only 8 HIPPI nodes on our SP2
- The IBM 3494 Tape Library Dataserver, which incorporates the IBM High
Performance Tape Subsystem (Tape Robot), acts as our second storage layer.
- IBM 3590 tape drives and tapes are used on this system (NTP)
- NTP tapes have a 10-20 GB capacity, depending on the compression
- Files staged to or from the 3590 drives average 10 MB/sec (assuming 12 GB
of data per tape)
- Current maximum capacity is 25 Terabytes
- An RS/6000 990 acts as the storage server. It is connected to the tape
archive by SCSI, to the disk cache by ethernet and HIPPI and to the SP2 by
ethernet and HIPPI.
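The figures quoted for this configuration allow some quick estimates. The
sketch below is only back-of-the-envelope arithmetic: it reads the capacities
as gigabytes and the rates as megabytes per second, assumes decimal units
(1 GB = 1000 MB, 1 TB = 1000 GB), and assumes the quoted rates are sustained.
The function names are illustrative.

```python
def transfer_seconds(size_mb, rate_mb_per_sec):
    """Time to move size_mb megabytes at a sustained rate."""
    return size_mb / rate_mb_per_sec


def tapes_needed(total_gb, gb_per_tape):
    """Whole tapes required to hold total_gb (ceiling division)."""
    return -(-total_gb // gb_per_tape)


# One full 12 GB tape staged at the quoted 10 MB/sec average.
full_tape_minutes = transfer_seconds(12 * 1000, 10) / 60

# 1 GB moved over HIPPI to memory at the quoted 45 MB/sec.
hippi_seconds = transfer_seconds(1000, 45)

# Tapes needed to hold the 25 TB maximum at 12 GB per tape.
tapes_for_25_tb = tapes_needed(25 * 1000, 12)

print(f"full tape: {full_tape_minutes:.0f} min")
print(f"1 GB over HIPPI: {hippi_seconds:.1f} s")
print(f"tapes for 25 TB: {tapes_for_25_tb}")
```

In other words, reading a whole tape takes on the order of 20 minutes, which
is why access to files that live only on tape shows a noticeable lag.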
Files are migrated from disk to tape depending on several factors:
- the amount of disk space available
- the size of the files
- the age of the files
(UniTree tends to lean toward age, when choosing files to migrate or purge)
Special Features:
Multiple Storage Hierarchies
The ability to define multiple storage hierarchies allows centers to
configure different classes of storage devices, which provide for storage
management at different performance levels.
- A medium performance level could be met by a local SCSI disk cache
that migrates to a SCSI tape robot.
- We have a high performance level, with a high speed disk cache that
migrates to the 990 over HIPPI, then writes to tape over SCSI (tape robot).
- We are also testing a SCSI disk cache, which would improve our NFS
performance.
Special Features:
Automatic File Management
Files and their locations in the storage hierarchies are automatically
managed. The files are regularly moved within the hierarchy, through the
migration, purging, and caching processes.
- The RAID disk cache, the top level of the hierarchy, is reserved
for files which have most recently been accessed.
- Migration is the process of copying files from the top level down to
another level (tape). (At this point, files exist on both levels)
- Purging occurs when files on the disk cache exceed a preset limit of
available space. (Only files which have been migrated are eligible
for purging)
UniTree has a link to all files no matter which storage level they are on.
- A file residing on the top level gives the user full access to it
(no response lag).
- A file residing on the lower level may have to be copied (cached) back
to the highest level in the hierarchy when accessed (some response lag).
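The three processes described above (migration, purging, caching) can be
pictured with a toy model. This is a sketch for intuition only and is not how
UniTree is implemented: the class, its method names, and the size units are
all made up for illustration, and "oldest first" here simply means the order
in which files were written.

```python
class ToyHierarchy:
    """Two-level store: a small disk cache in front of unlimited tape."""

    def __init__(self, disk_limit):
        self.disk_limit = disk_limit  # purge threshold, arbitrary size units
        self.disk = {}                # name -> size, kept in write order
        self.tape = {}                # name -> size

    def write(self, name, size):
        """New or modified data always lands on the disk cache."""
        self.disk[name] = size
        self.tape.pop(name, None)     # any older tape copy is now stale

    def migrate(self):
        """Copy every disk file to tape; the disk copy stays in place."""
        self.tape.update(self.disk)

    def purge(self):
        """Drop migrated files, oldest first, until usage fits the limit."""
        for name in list(self.disk):
            if sum(self.disk.values()) <= self.disk_limit:
                break
            if name in self.tape:     # only migrated files are eligible
                del self.disk[name]

    def read(self, name):
        """Report where the file was found, caching it to disk if needed."""
        if name in self.disk:
            return "disk"
        self.disk[name] = self.tape[name]  # caching: tape -> disk
        return "tape"


h = ToyHierarchy(disk_limit=100)
h.write("a", 60)
h.write("b", 60)
h.purge()                       # nothing migrated yet, so nothing is purged
before_migration = sorted(h.disk)
h.migrate()                     # both files now exist on disk and tape
h.purge()                       # "a" is dropped from disk to get under 100
after_purge = sorted(h.disk)
first_read = h.read("a")        # served from tape (lag), then cached
second_read = h.read("a")       # served from disk (no lag)
print(before_migration, after_purge, first_read, second_read)
```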
Special Features:
Large Files
- UniTree imposes no limits on the size of a file; however, the length of a
file name is limited to 256 characters.
- AIX limits the size of a file to 2 Gigabytes but has no restrictions on
the length of a file name.
- Keep the number of files in a directory below 500. The UniTree nameserver
does not handle large directories well.
- Be cautious when transferring files close to these limits.
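A small pre-transfer check can catch files that run into these limits. The
sketch below is an illustration only: it assumes "2 Gigabytes" means 2 GiB,
that the 256-character limit applies to the file name itself rather than the
full path, and the function and constant names are made up.

```python
AIX_MAX_FILE_BYTES = 2 * 1024**3   # assumption: "2 Gigabytes" read as 2 GiB
UNITREE_MAX_NAME_CHARS = 256       # file name limit quoted above
DIRECTORY_GUIDELINE = 500          # keep directories below this many files


def transfer_warnings(name, size_bytes, files_already_in_dir):
    """Return a list of warnings; an empty list means no concerns."""
    warnings = []
    if size_bytes >= AIX_MAX_FILE_BYTES:
        warnings.append("file is at or over the AIX size limit")
    if len(name) > UNITREE_MAX_NAME_CHARS:
        warnings.append("file name exceeds the UniTree length limit")
    if files_already_in_dir + 1 >= DIRECTORY_GUIDELINE:
        warnings.append("target directory would reach 500 files")
    return warnings


for warning in transfer_warnings("x" * 300, 3 * 1024**3, 499):
    print("WARNING:", warning)
```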
Special Features:
Trash Cans
Protection from accidental file deletion is offered through the use of trash
cans.
- Each UniTree user has a directory named ".trash" in their UniTree home
directory.
- Files deleted from UniTree are moved into the ".trash" directory.
- To avoid naming conflicts, files are given extensions:
- the date the file was removed
- a global counter that distinguishes between multiple files with the same
name deleted simultaneously from different directories.
- Files remain in the trash can for 30 minutes and are then deleted.
- Files deleted from the trash can are NOT recoverable.
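The naming scheme can be pictured as follows. This is a sketch only: the
source says the extensions carry the removal date and a global counter, but
the exact layout used by UniTree is not given, so the "name.YYYYMMDD.counter"
form and the function names below are hypothetical.

```python
import itertools
from datetime import datetime, timedelta

TRASH_LIFETIME = timedelta(minutes=30)   # retention period quoted above
_counter = itertools.count(1)            # stands in for the global counter


def trash_name(filename, removed_at):
    """Build a unique trash name; this layout is hypothetical."""
    return f"{filename}.{removed_at:%Y%m%d}.{next(_counter)}"


def is_expired(removed_at, now):
    """True once a file has been in the trash for the full lifetime."""
    return now - removed_at >= TRASH_LIFETIME


removed = datetime(1996, 7, 3, 9, 0)
# Two files with the same name, deleted from different directories.
first = trash_name("results.dat", removed)
second = trash_name("results.dat", removed)
print(first, second)
print(is_expired(removed, removed + timedelta(minutes=29)))
print(is_expired(removed, removed + timedelta(minutes=30)))
```

The practical point is the short window: a file deleted by mistake has to be
moved back out of ".trash" within 30 minutes.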
Special Features:
Multiple Copies
Keeping multiple copies of files on different tape volumes minimizes the risk
of lost data due to media failure.
- The default number of copies is 0
- The maximum allowable number of copies is 3
- You can specify the number of copies to be made for each transfer session
- Copies are created during migration of a file to tape
- Currently, there is no method to query or modify the number of copies
on a file-by-file basis.
Special Features:
File Families
Files may be segregated into families.
- Files in a family are migrated to storage on tapes that belong exclusively
to that family.
- Family designation reduces overall access times for files typically used
as a group.
- Once a file is part of a family, it may not be changed to another family.
- By default, all files are assigned to the "common family" with a family ID
of zero.
User Interfaces
UniTree may be accessed through several different interfaces. Each has
various capabilities and limitations. Try them all and choose the one(s)
that fit your needs. You may find a mix suits you best.
- Standard FTP
File Transfer Protocol (FTP) is an industry standard used to transfer
files to and from computer systems.
- Washington University in St. Louis public domain FTP daemon (WU-FTP).
- Provides an extended set of commands and capabilities.
- Allows use of UniTree commands from inside FTP with the "quote" command.
There are two FTP servers on site. You can access your files through either
of them:
akamai.mhpcc.edu
(ethernet - internet accessible)
wikiwiki.mhpcc.edu
( HIPPI - local to SP2 only)
FTP to the server of your choice (use "binary" mode for best performance
and to avoid data corruption):
ftp akamai.mhpcc.edu
ftp wikiwiki.mhpcc.edu
Enter your SP2 login ID and password, then use FTP commands to transfer files
to or from your location.
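The same binary-mode transfers can be scripted. The sketch below uses
Python's standard ftplib module as one possible client; it is not part of the
UniTree software, the helper names are illustrative, and the userid and
password in the comment are placeholders. storbinary and retrbinary always
transfer in binary (image) mode, which matches the advice above.

```python
from ftplib import FTP


def connect(host, user, password):
    """Open an FTP session and log in with the SP2 login ID and password."""
    ftp = FTP(host)
    ftp.login(user, password)
    return ftp


def put_binary(ftp, local_path, remote_name):
    """Upload one file in binary mode."""
    with open(local_path, "rb") as source:
        ftp.storbinary(f"STOR {remote_name}", source)


def get_binary(ftp, remote_name, local_path):
    """Download one file in binary mode."""
    with open(local_path, "wb") as target:
        ftp.retrbinary(f"RETR {remote_name}", target.write)


# Example session (not run here; credentials are placeholders):
#   ftp = connect("akamai.mhpcc.edu", "myuserid", "mypassword")
#   put_binary(ftp, "results.tar", "results.tar")
#   get_binary(ftp, "results.tar", "results_copy.tar")
#   ftp.quit()
```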
- UTI
UTI provides a user-friendly interactive or batch interface to NSL UniTree.
- Developed by the Oak Ridge National Laboratory's Center for Computational Sciences
- Performance roughly equivalent to FTP
- Based upon LIBNSL client interface (IBM)
- Can be used in scripts, pipelines, and from the command line
- Superset of standard FTP command set
- Unix-style commands to manipulate files and directories
- Allows multiple working directories
- Automatic authentication using "magic biscuits"
- Multiple and Conditional commands, Wildcards, Recursion
- IN, OUT, and LOG commands allow you to read commands from a file, write
listable output to a file, or create audit trails
- Interrupt handling ^C, FIND, CHMOD, automatic renaming (backup) of
existing files, auto-retry if server is down
The UTI utility stores and retrieves complete files; access to partial files
is currently not possible.
UTI is located in /usr/local/bin; to execute it enter:
uti (start uti interactively)
uti [options] command(s) (execute all commands & exit)
(Extensive online help is available as are man pages.)
- LIBNSL
LIBNSL provides a library of functions, similar to those in the C library,
that allow access to UniTree data directly from a program.
- Performance is roughly equivalent to FTP and UTI
- Specify SP2 switch or ethernet when linking
- Provides access to UniTree via Remote Procedure Calls (RPC) for control
messages
- Function names are prepended with "nsl"
- Library resides in /usr/nslu/lib/libnsl.a
Sample source code is available in
/s/local/nslu/clnt/demo/example/
A README file, a brief description of each utility, and example code to
perform basic filesystem operations are available in
/s/local/nslu/clnt/demo/basic/
Edit the makefile for the platform you are using; use the code as an example
for building other applications. ALL DISCLAIMERS APPLY.
NOTE: There are NO man pages.
- RUCP
RUCP is similar to the Remote Copy (RCP) utility.
- Moves one file at a time into or out of UniTree
- Transfer rates -
- non HIPPI nodes (8-14 MB/sec)
- HIPPI nodes (17 MB/sec)
- Good for LoadLeveler - no passwords
- Written and developed at the MHPCC
- Man pages available in /usr/local/man/
(Add this to your MANPATH environment variable)
- LIBHSM
HSM file distribution utilities.
- moves multiple files between /localscratch and UniTree with good
performance.
- hsmscatter(), hsmscatter_bytaskid(), and hsmgather()
- written and developed at the MHPCC
- located in /usr/local/lib/libhsm.a
- documented in /usr/local/man/man1/libhsm.1
- NFS
NFS is an industry standard that provides interconnection of file systems
between independent computers.
- With NFS, our local machine (SP2) has the ability to share files and
directories with UniTree
- This makes UniTree appear to be part of our SP2 file system
- UniTree home directories are transparently mounted via NFS and are visible
from any SP2 node
- Directories are located in /s/nslu/
- Your directory name is your SP2 userid
- Remote UniTree files look just like local files
- Hierarchy levels are indistinguishable
- Access speed will indicate the level
If the link between machines goes down, you will see:
NFS server not responding, will try again
When the link is re-established, you will see:
NFS server okay
- Use ^C to kill incomplete or hung processes
- NFS is over ethernet - speed is limited
- Performance Comparison
Results assume I/O is to locally mounted disk. Transfers
to and from memory are faster. The libhsm figures refer
to an aggregate speed across multiple transfers.
UNITREE PERFORMANCE RANGES BY ACCESS METHOD
+-----------+------------------+------------+---------+
|Access     | Platform         | Protocol   | MB/sec  |
+-----------+------------------+------------+---------+
|FTP(binary)| Non-SP2 host     | A          | 0.2-1.2 |
|           | SP2 Compute Node | B          | 2-6     |
|           | SP2 I/O Node     | C          | 6-12    |
+-----------+------------------+------------+---------+
|UTI        | Non-SP2 host     | A          | 0.2-1.2 |
|           | SP2 Compute Node | B          | 2-6     |
|           | SP2 I/O Node     | E          | 6-12    |
+-----------+------------------+------------+---------+
|libhsm     | Non-SP2 host     | N/A        | N/A     |
|           | SP2 Compute Node | D          | 15-28   |
|           | SP2 I/O Node     | D          | 15-28   |
+-----------+------------------+------------+---------+
|rucp       | Non-SP2 host     | N/A        | N/A     |
|           | SP2 Compute Node | B          | 6-12    |
|           | SP2 I/O Node     | E          | 12-17   |
+-----------+------------------+------------+---------+
|libnsl     | Non-SP2 host     | A          | 0.2-1.2 |
|           | SP2 Compute Node | D          | 6-12    |
|           | SP2 I/O Node     | E          | 12-17   |
+-----------+------------------+------------+---------+
|NFS        | Non-SP2 host     | F          | 0.2-0.5 |
|           | SP2 Compute Node | F          | 0.2-0.5 |
|           | SP2 I/O Node     | F          | 0.2-0.5 |
+-----------+------------------+------------+---------+
PROTOCOLS:
A - TCP/IP across ethernet
B - TCP/IP across SP2 switch and IPI-3 over HIPPI
C - TCP/IP and IPI-3 over HIPPI
D - TCP/IP across SP2 switch, IPI-3 over HIPPI & native SP2 switch
E - IPI-3 across HIPPI
F - UDP across ethernet
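To turn the ranges in the table into expected wait times, divide the file
size by the low and high ends of the range. The sketch below does this for
the SP2 compute node rows only; the numbers are copied from the table, the
names are illustrative, and the results are rough bounds rather than
guarantees (the libhsm range is an aggregate across multiple transfers).

```python
# MB/sec ranges from the table, SP2 compute node rows only.
COMPUTE_NODE_MB_PER_SEC = {
    "ftp": (2, 6),
    "uti": (2, 6),
    "libhsm": (15, 28),   # aggregate across multiple transfers
    "rucp": (6, 12),
    "libnsl": (6, 12),
    "nfs": (0.2, 0.5),
}


def transfer_time_range(size_mb, method):
    """Return (best, worst) seconds to move size_mb megabytes."""
    slow, fast = COMPUTE_NODE_MB_PER_SEC[method]
    return size_mb / fast, size_mb / slow


# Compare the methods for a 1200 MB file, fastest upper bound first.
by_speed = sorted(COMPUTE_NODE_MB_PER_SEC,
                  key=lambda m: COMPUTE_NODE_MB_PER_SEC[m][1],
                  reverse=True)
for method in by_speed:
    best, worst = transfer_time_range(1200, method)
    print(f"{method:7s} {best:8.0f} to {worst:8.0f} seconds")
```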
Recommendations:
- Accessing files via the HIPPI FTP server (wikiwiki) offers the best
FTP performance.
- Accessing files by the SP2 switch and over HIPPI (rucp, uti) offers the
fastest performance.
- NFS access to files is the last choice when performance is a concern.
Summary:
- Method A - protocol for FTP and internet access
- Method B - protocol for rucp and uti
- Method C - protocol for libnsl
- Method D - protocol for libhsm with us (user space)
- Method E - protocol for LL (HIPPI nodes only)
- Method F - protocol for NFS
UniTree at the MHPCC
- All MHPCC users are automatically given a UniTree account; login ID and
password are the same as the ones used on the SP2.
- Special UniTree only accounts are not available.
- New accounts and password changes are propagated to UniTree at
approximately 11:00 HST each evening.
- The UniTree Mass Storage System is NOT backed up. YOU are responsible
for backing up critical files!
- You must remove all your files from UniTree before your account expires
or becomes inactive. If any files remain in UniTree after your userid expires,
they will be deleted and no backups will be kept.
- Users must not store files unrelated to their MHPCC projects on this
storage system. They must review their files periodically and remove those that
are no longer needed.
- Files are migrated from disk to tape almost immediately.
- UniTree purge -
- considers the "least recently accessed" files first as candidates for
deletion,
- goes by modification age as the last access,
- does NOT consider a read access as a modification.
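The purge rules above amount to an ordering on files. The sketch below is an
illustration of that ordering only, not UniTree's actual algorithm: the
record layout and function name are made up, and it combines the rule that
only migrated files may be purged with the rule that age is measured from the
last modification, not the last read.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    modified: float    # modification time, seconds since the epoch
    last_read: float   # kept only to show that purge ignores it
    migrated: bool     # True once a copy exists on tape


def purge_order(files):
    """Migrated files only, least recently modified first."""
    eligible = [f for f in files if f.migrated]
    return sorted(eligible, key=lambda f: f.modified)


files = [
    FileRecord("new.dat", modified=800, last_read=800, migrated=True),
    FileRecord("old_but_read_today.dat", modified=100, last_read=900,
               migrated=True),
    FileRecord("not_on_tape.dat", modified=50, last_read=50, migrated=False),
]
candidates = [f.name for f in purge_order(files)]
print(candidates)
```

Note that the file read most recently is still the first purge candidate,
because reading a file does not reset its age.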
Tips and Tricks
- UniTree handles a few large files better than a lot of small files.
Use the tar command to consolidate files before transfer.
- If you tar up files to put in UniTree, make sure none of them have
read-only permissions (mode 400).
- If you attempt to untar a mode 400 file in your nslu directory,
tar will give a permissions error and stop.
- Untar the file somewhere else, extract the offender, change its
permissions and rearchive the file.
- UniTree is not fixing this bug.
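One way to avoid the mode 400 problem is to fix the permissions as the
archive is written, so the files on disk stay untouched. The sketch below
uses Python's standard tarfile module instead of the tar command; the
function names are illustrative. The filter sets the owner write bit on every
archive member, so a mode 400 file is stored as mode 600.

```python
import stat
import tarfile


def add_owner_write(info):
    """tarfile filter: make sure the owner write bit is set on a member."""
    info.mode |= stat.S_IWUSR
    return info


def make_archive(archive_path, paths):
    """Write an uncompressed tar file with no owner-read-only members."""
    with tarfile.open(archive_path, "w") as tar:
        for path in paths:
            # The filter is applied to each member, including the
            # contents of directories added recursively.
            tar.add(path, filter=add_owner_write)


# Example (paths are placeholders):
#   make_archive("run42.tar", ["output", "restart.dat"])
```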
- Use the "stage" utility to stage data to the UniTree disk cache
- Pre-stage all data that a job needs to avoid inefficiencies from staging
data on an as-needed basis
- Worst case - a separate tape mount and search is done for each file requested even though all the files were adjacent to each other on the same tape
Example:
looping through a list of files and reading from each one in turn, when
all of them have to be read from tape
Result:
a separate tape mount will always occur for each file, because UniTree will
not get the next request until the previous file request completes.
- Entire directories can be staged and recursion through subdirectories is
supported
- NFS pathnames and wildcards are also supported
- "stage" is located in /s/local/bin and the man page is available in
/s/local/man/man1
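A job script can collect everything it will read and issue one staging
request before the processing loop starts. The sketch below only builds the
command line and does not run it. It assumes that "stage" accepts pathnames
as plain arguments, which the notes above suggest but do not spell out, so
check the man page for the real syntax. The function name and the example
userid are hypothetical.

```python
STAGE = "/s/local/bin/stage"   # location given above


def build_stage_command(paths):
    """Return an argument list naming each path once, in first-seen order."""
    unique = list(dict.fromkeys(paths))
    return [STAGE, *unique]


# Inputs a job expects to read; "myuserid" is a placeholder.
inputs = [
    "/s/nslu/myuserid/run1/input.dat",
    "/s/nslu/myuserid/run1/mesh.dat",
    "/s/nslu/myuserid/run1/input.dat",   # listed twice by mistake
]
command = build_stage_command(inputs)
print(" ".join(command))
```

Issuing the request once, up front, gives UniTree the whole list at the same
time instead of one file request per loop iteration.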
HPSS - The Future of Mass Storage for Supercomputers
References, Acknowledgements, WWW Resources
References
- "NSL UniTree Users Guide", MHPCC Version Release 2.1, IBM. PostScript
document available by anonymous ftp from
ftp://ftp.mhpcc.edu/pub/UniTree/userguide.ps
- "UTI (UniTree Interface) Reference Manual", Michael K. Gleicher,
Center for Computational Sciences, Oak Ridge National Laboratory.
PostScript documentation available by anonymous ftp from
ftp://ftp.mhpcc.edu/pub/UniTree/utiref.ps
- "NSL UniTree Programmer's Guide", Release 2.1.1, IBM.
Acknowledgements
- We gratefully acknowledge Mike Gleicher from the Center for Computational
Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, for installing UTI
at the MHPCC. In addition to allowing the use of his presentation
materials, Mike continues to graciously assist and support our UniTree and
HPSS endeavors.
- We gratefully acknowledge Terry Tyler from the IBM Government Systems
Division, MHPCC, Maui, HI, for allowing us the use of his HPSS
presentation materials. Terry is the single point of contact for
installing, testing and supporting our HPSS system.
© Copyright 1996
Maui High Performance Computing Center. All rights reserved.
Documents located on the Maui High Performance Computing Center's WWW server
are copyrighted by the MHPCC. Educational institutions are encouraged to
reproduce and distribute these materials for educational use as long as
credit and notification are provided. Please retain this copyright notice
and include this statement with any copies that you make. Also, the MHPCC
requests that you send notification of their use to help@mail.mhpcc.edu.
Commercial use of these materials is prohibited without prior written
permission.
Written: Deidre Ashley ashley@mhpcc.edu
Revised: 03 July 1996 blaise@mhpcc.edu