1
Storage Area Network Usage

A UNIX SysAdmin’s View of How A SAN Works
2
Disk Storage
  • Embedded
    • Internal Disks within the System Chassis
  • Directly Attached
    • External Chassis of Disks connected to a Server via a Cable
  • Directly Attached Shared
    • External Chassis connected to more than one Server via a Cable
  • Networked Storage
    • NAS
    • SAN
    • others
3
Disk Storage – 2000-2004
4
Deficiencies of Direct Connect Storage
  • Single System Bears Entire Cost of Storage
    • Small Server in an EMC Shop
    • Large Server cannot easily share its unused storage
  • Manageability
    • Fragmented and Isolated
  • Scalability
    • Limited
    • What happens when you run out of peripheral bus slots?
  • Availability
    • “SCSI Bus Reset”
    • Failover is a complicated add-on, if available at all
5
DASD
  • Direct Access Storage Device
    • They still call it this in an IBM Mainframe Shop
  • Basic Limits of Disk Storage Recognized
    • Latency
      • Rotation Speed of the disk
    • Seek Time
      • Radial Movement of the Read/Write Heads
    • Buffer Sizes
      • Stop sending me data, I can’t write fast enough!
6
SCSI
  • SCSI – Small Computer System Interface
    • From Shugart’s 1979 SASI implementation
      • SASI: Shugart Associates System Interface
  • Both Hardware and I/O Protocol Standards
    • Both have evolved over time
    • Hardware is source of most limitations
    • I/O Protocol has long-term potential
7
SCSI - Pro
  • Device Independence
    • Mix and match device types on the bus
    • Disk, Tape, Scanners, etc…
  • Overlapping I/O Capability
    • Multiple read & write commands can be outstanding simultaneously
  • Ubiquitous
8
SCSI - Con
  • Distance vs. Speed
    • Double the Signaling Rate
      • Speed: 40, 80, 160, 320 MBps
    • Halve the Cable Length Limits
  • Device Count: 16 Maximum
    • Low Voltage Differential (LVD) Ultra3 SCSI can support only 16 devices on a 12 meter cable at 160 MBps
  • Server Access to Data Resources
    • Hardware changes are disruptive
9
SCSI – Overcoming the Con
  • New Hardware & Signaling Platforms
  • SCSI-3 Introduces Serial SCSI Support
    • Fibre Channel
    • Serial Storage Architecture (SSA)
      • Primarily an IBM implementation
    • FireWire (IEEE 1394 – Apple fixes SCSI)
      • Attractive in consumer market
  • Retains SCSI I/O Protocol
10
Scaling SCSI Devices
  • Increase Controller Count within Server
    • Increasing Burden To CPU
      • Device Overhead
      • Bus Controllers can be saturated
    • You can run out of slots
    • Many Queues, Many Devices
      • Queuing Theory 101 (check-out line) - undesirable
11
Scaling SCSI Devices
  • Use Dedicated External Device Controller
    • Hides Individual Devices
      • Provide One Large Virtual Resource
    • Offloads Device Overhead
    • One Queue, Many Devices - good
    • Cost and Benefit
      • Still borne by one system
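The check-out line comparison on the two Scaling SCSI Devices slides can be put into numbers with standard queueing formulas. The sketch below is an illustration only and is not from the original slides: it assumes Poisson arrivals and exponential service times (the M/M/1 and M/M/c models), and the function names and rates are hypothetical.

```python
from math import factorial

def wait_separate_queues(arrival_rate, service_rate, devices):
    """Mean queueing delay when requests are split evenly across
    `devices` independent queues (each behaves as an M/M/1 queue)."""
    lam = arrival_rate / devices            # arrivals seen by each queue
    if lam >= service_rate:
        raise ValueError("unstable: each queue is saturated")
    rho = lam / service_rate                # utilization of each device
    return rho / (service_rate - lam)

def wait_shared_queue(arrival_rate, service_rate, devices):
    """Mean queueing delay for one shared queue feeding `devices`
    servers (M/M/c, using the Erlang C formula)."""
    a = arrival_rate / service_rate         # offered load in Erlangs
    if a >= devices:
        raise ValueError("unstable: offered load exceeds capacity")
    tail = (a ** devices / factorial(devices)) * devices / (devices - a)
    head = sum(a ** k / factorial(k) for k in range(devices))
    prob_wait = tail / (head + tail)        # chance a request has to queue
    return prob_wait / (devices * service_rate - arrival_rate)

# Hypothetical load: 4 devices, each 80% utilized.
many_queues = wait_separate_queues(3.2, 1.0, 4)
one_queue = wait_shared_queue(3.2, 1.0, 4)
```

Under these assumptions the single shared queue always waits less than the separate queues at the same utilization, which is the argument for placing one controller in front of many devices.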
12
RAID
  • Redundant Array of Inexpensive Disks
  • Combine multiple disks into a single virtual device
  • How this is implemented determines different strengths
    • Storage Capacity
    • Speed
      • Fast Read or Fast Write
    • Resilience in the face of device failure
13
RAID Functions
  • Striping
    • Write consecutive logical byte/blocks on consecutive physical disks
  • Mirroring
    • Write the same block on two or more physical disks
  • Parity Calculation
    • Given N disks, N-1 consecutive blocks are data blocks, Nth block is for parity
    • When any of the N-1 data blocks are altered, N-2 XOR calculations are performed on these N-1 blocks
    • The Data Block(s) and Parity Block are written
    • Destroy one of these N blocks, and that block can be reconstructed using N-2 XOR calculations on the remaining N-1 blocks
    • Destroy two or more blocks – reconstruction is not possible
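The parity steps above can be sketched in a few lines. This is a minimal illustration, assuming byte-wise XOR over equal-length blocks; the block contents and the helper names xor_blocks and rebuild are hypothetical and do not come from any particular RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together.
    Combining k blocks takes k-1 XOR operations per byte position."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def rebuild(stripe, lost_index):
    """Reconstruct the block at lost_index from the N-1 surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

# N = 4 disks: three data blocks plus one parity block.
data = [b"SAN1", b"SAN2", b"SAN3"]
stripe = data + [xor_blocks(data)]      # the Nth block holds the parity

recovered = rebuild(stripe, 1)          # pretend disk 1 was destroyed
```

Losing the parity block is handled the same way: XOR-ing the three data blocks regenerates it. With two blocks missing there is one equation and two unknowns, which is why reconstruction fails.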
14
RAID Function – Pro & Con
  • Striping
    • Pro: Increases Spindle Count for Increased Throughput
    • Con: Does not provide redundancy
  • Mirroring
    • Pro: Provides Redundancy without Parity Calculation
    • Con: Requires at least 100% disk resource overhead
  • Parity Calculation
    • Pro: Cuts Disk Resource Overhead to 1/N
    • Con: Parity calculation is expensive
      • N-2 XOR calculations are required
      • Any of the N-1 data blocks not already in cache must be read first
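The capacity trade-off in the pro and con list can be expressed as the fraction of raw disk space left for data. A minimal sketch, assuming identical disks; the function name and scheme labels are hypothetical.

```python
def usable_fraction(scheme, disks, copies=2):
    """Fraction of raw capacity available for data under each RAID function."""
    if scheme == "stripe":
        return 1.0                      # no redundancy, no overhead
    if scheme == "mirror":
        return 1.0 / copies             # every block is stored `copies` times
    if scheme == "parity":
        return (disks - 1) / disks      # one block in N holds parity
    raise ValueError(f"unknown scheme: {scheme}")
```

For example, five disks with parity leave 4/5 of the raw space for data, while a two-way mirror leaves half, which is the "at least 100% overhead" on the slide.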
15
RAID Types
  • RAID 0
    • Stripe with No Parity
  • RAID 1
    • Mirror two or more disks
  • RAID 0+1
    • Stripe on Inside, Mirror on Outside
  • RAID 1+0
    • Mirrors on Inside, Stripe on Outside
  • RAID 3
    • Synchronous, Subdivided Block Access; Dedicated Parity Drive
  • RAID 4
    • Independent, Whole Block Access; Dedicated Parity Drive
  • RAID 5
    • Like RAID 4, but Parity striped across multiple drives
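The practical difference between RAID 1+0 and RAID 0+1 shows up when two disks fail. The sketch below counts the survivable two-disk failures for each layout. It assumes the simplest form of each: two-way mirrors, and for 0+1 a stripe set that goes offline entirely when any one member fails. The disk numbering and function names are hypothetical.

```python
from itertools import combinations

def survives_raid10(failed, disks):
    """RAID 1+0: disks (0,1), (2,3), ... are mirror pairs striped together.
    Data is lost only when both members of the same pair fail."""
    return not any({2 * p, 2 * p + 1} <= failed for p in range(disks // 2))

def survives_raid01(failed, disks):
    """RAID 0+1: the first half of the disks is one stripe set and the
    second half is its mirror. Data is lost when both sets are broken."""
    half = disks // 2
    set_a_broken = any(d < half for d in failed)
    set_b_broken = any(d >= half for d in failed)
    return not (set_a_broken and set_b_broken)

def count_survivable(survives, disks, failures=2):
    """Count the failure combinations the layout survives."""
    return sum(survives(set(combo), disks)
               for combo in combinations(range(disks), failures))

raid10_ok = count_survivable(survives_raid10, 6)   # 12 of 15 combinations
raid01_ok = count_survivable(survives_raid01, 6)   # 6 of 15 combinations
```

With six disks, 1+0 survives 12 of the 15 possible two-disk failures and 0+1 survives 6 under these assumptions.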
16
RAID 0 RAID 1
17
RAID 3 RAID 5
18
RAID 1+0 RAID 0+1
19
Breaking the Direct Connection
  • Now you have high performance RAID
    • The storage bottleneck has been reduced
    • You’ve invested $$$ to do it
    • How do you extend this advantage to N servers without spending N x $$$?
  • How about using existing networks?
20
How to Provide Data Over IP
  • NFS (or CIFS) over a TCP/IP Network
    • This is Network Attached Storage (NAS)
    • Overcomes some distance problems
    • Full Filesystem Semantics are Lacking
      • …such as file locking
    • Speed and Latency are problems
    • Security and Integrity are problems as well
  • IP encapsulation of I/O Protocols
    • Not yet established in the marketplace
    • Current speed & security issues
21
NAS and SAN
  • NAS – Network Attached Storage
    • File-oriented access
    • Multiple Clients, Shared Access to Data
  • SAN – Storage Area Network
    • Block-oriented access
    • Single Server, Exclusive Access to Data
22
NAS: Network Attached Storage
  • File Objects and Filesystems
    • OS Dependent
    • OS Access & Authentication
  • Possible Multiple Writers
    • Require locking protocols
  • Network Protocol: e.g., IP
  • “Front-end” Network
23
SAN: Storage Area Network
  • Block Oriented Access To Data
  • Device-like Object is presented
  • Unique Writer
  • I/O Protocol: SCSI, HIPPI, IPI
  • “Back-end” Network
24
 
25
A Storage Area Network
  • Storage
    • StorageWorks MA8000 (24), EVA (2)
    • HDS is 2nd Approved Storage Vendor
      • 9980 Enterprise Storage Array – EMC class storage
  • Switches
    • Brocade 12000 (8), 3800 (20), & 2800 (34)
      • 3900’s are being deployed – 32 port
  • UNIX Servers on the SAN
    • Solaris (56), IRIX (5), HP-UX (5), Tru64 (1)
  • Storage Volume Connected to UNIX Servers
    • 13000 GB as of May, 2003
  • Windows Servers
    • Windows 2000 (74), NT 4.0 (16)
26
SAN Implementations
  • FibreChannel
    • FC Signalling Carrying SCSI Commands & Data
    • Non-Ethernet Network Infrastructure
  • iSCSI
    • SCSI Encapsulated By IP
    • Ethernet Infrastructure
  • FCIP – FibreChannel over IP
    • FibreChannel Encapsulated by IP
    • Extending FibreChannel over WAN Distances
    • Future Bridge between Ethernet & FibreChannel
    • iFCP - another gateway implementation
27
NAS & SAN in the Data Center
28
FCIP In The Data Center
29
FibreChannel
  • How SCSI Limitations are Addressed
    • Speed
    • Distance
    • Device Count
    • Access
30
FibreChannel – Speed
  • 266 Mbps – ten years ago
  • 1063 Mbps – common in 1998
  • 2125 Mbps – available today
  • 4 Gbps – near future products
    • Backward compatible to 1 & 2 Gbps
  • 10 Gbps – 2005?
    • Not backward Compatible with 1/2/4Gbps
    • But 10 Gig Ethernet will compete
    • Remember FDDI & ATM
31
Why I/O Protocols are Coming to IP
  • IP Networking is ubiquitous
  • Gigabit ethernet is here
    • 10Gbps ethernet is just becoming available
  • Don’t have to invest in a second network
    • Just upgrade the one you have :-)
  • IP & Ethernet software is well understood
    • Existing talent pool for vendors to leverage
      • Developers, not end-user Network Engineers
32
FibreChannel – Distance
  • 1063 Mbps
    • 175 m (62.5 µm – multi-mode)
    • 500 m (50.0 µm – multi-mode)
    • 10 km (9 µm – single-mode)
  • 2125 Mbps
    • 500 m (50.0 µm – multi-mode)
    • 2 km (9 µm – single-mode)
33
FibreChannel – A Network
  • Layer 1 – Physical (Media: fiber, copper)
    • Fibre: 62.5, 50.0, & 9.0 µm
    • Copper: Cat6, Twinax, Coax, other
  • Layer 2 – Data Link (Network Interface & MAC)
    • WWPN: World Wide Port Name
    • WWNN: World Wide Node Name
      • In a single port node, usually WWPN = WWNN
    • 64-bit device address
    • Comparable to 48-bit Ethernet device addresses
  • Layer 3 – Network (IP & SCSI)
    • 24-bit fabric address
    • Comparable to an IP address
34
FibreChannel Terminology: Port Types
  • N_Port
      • Node port – Computer, Disk, or Storage Node
  • F_Port
      • Fabric port – Found only on a Switch
  • E_Port
      • Expansion Port – Switch to Switch port
  • NL_Port
      • Node port with Arbitrated Loop Capabilities
  • FL_Port
      • Fabric port with Arbitrated Loop Capabilities
  • G_Port
      • Generic Switch Port: Can act as any of F_Port, E_Port, or FL_Port
35
 
36
FibreChannel - Topology
  • Point-to-Point
  • Arbitrated Loop
  • Fabric
37
FibreChannel – Point-to-point
  • Direct Connection of Server and Storage Node
  • Two N_Ports and One Link
38
FibreChannel - Arbitrated Loop
  • Up to 126 Devices in a Loop via NL_Ports
  • Token-access, Polled Environment (like FDDI)
  • Wait For Access Increases with Device Count
39
FibreChannel - Fabric
  • Arbitrary Topology
  • Requires At Least One Switch
  • Up to 15 million ports can be concurrently logged in with the 24-bit address ID.
  • Dedicated Circuits between Servers & Storage
    • via Switches
  • Interoperability Issues Increase With Scale
40
FibreChannel – Device Count
  • 126 devices in Arbitrated Loop
  • 15 Million in a fabric (24-bit addresses)
    • Bit 0-7: Port or Arbitrated Loop addr
    • Bit 8-15: Area, identifies FL_Port
    • Bit 16-23: Domain, address of switch
      • 239 of 256 addresses available
    • 256 x 256 x 239 = 15,663,104
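The bit layout above maps directly onto shifts and masks. A minimal sketch, assuming the address is held as a plain integer; the function name and the sample address 0x010200 are hypothetical.

```python
USABLE_DOMAINS = 239                      # of the 256 possible domain values

def split_fabric_address(address):
    """Split a 24-bit fabric address into its three 8-bit fields."""
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("fabric address must fit in 24 bits")
    return {
        "domain": (address >> 16) & 0xFF,  # bits 16-23: switch
        "area": (address >> 8) & 0xFF,     # bits 8-15: identifies the FL_Port
        "port": address & 0xFF,            # bits 0-7: port or loop address
    }

MAX_FABRIC_ADDRESSES = 256 * 256 * USABLE_DOMAINS   # 15,663,104

fields = split_fabric_address(0x010200)   # domain 1, area 2, port 0
```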
41
FibreChannel Definitions
  • WWPN
  • Zone & Zoning
  • LUN
  • LUN Masking
42
FibreChannel - WWPN
  • World-Wide Port Name
  • A unique 64-bit hardware address for each FibreChannel Device
  • Analogous to a 48-bit ethernet hardware address
  • WWNN - World-Wide Node Name
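A 64-bit WWPN is commonly displayed as eight colon-separated hex bytes, much like an Ethernet MAC address with two extra bytes. The helpers below are a sketch of that notation; the function names and the sample value are hypothetical.

```python
def format_wwn(value):
    """Render a 64-bit WWPN or WWNN as eight colon-separated hex bytes."""
    if not 0 <= value < 2 ** 64:
        raise ValueError("a WWN must fit in 64 bits")
    return ":".join(f"{byte:02x}" for byte in value.to_bytes(8, "big"))

def parse_wwn(text):
    """Convert the colon-separated form back to an integer."""
    parts = text.split(":")
    if len(parts) != 8 or any(len(p) != 2 for p in parts):
        raise ValueError("expected eight two-digit hex bytes")
    return int("".join(parts), 16)

sample = format_wwn(0x10000000C9123456)   # "10:00:00:00:c9:12:34:56"
```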
43
FibreChannel – Zone & Zoning
  • Switch-Based Access Control
  • Analogous to an Ethernet Broadcast Domain
  • Soft Zone
    • Zoning based on WWPN of Nodes Connected
    • Preferred
  • Hard Zone
    • Zoning Based on Port Number on Switch
      • to which the Nodes are Connected
44
FibreChannel - LUN
  • Logical Unit
  • Storage Node Allocates Storage and Assigns a LUN
  • Appears to the server as a unique device (disk)
45
FibreChannel – LUN Masking
  • Storage Node Based Access Control List (ACL)
  • LUNs and Server Connections (WWPNs) listed in the ACL are allowed to see each other.
  • LUNs are Masked from Servers not in the ACL
46
LUN Security
  • Host Software
  • HBA-based
    • firmware or driver configuration
  • Zoning
  • LUN Masking
47
LUN Security
  • Host-based & HBA
    • Both these methods rely on correct security implemented at the edges
    • Most difficult to manage due to large numbers and types of servers
    • Storage Managers may not be Server Managers
    • Don’t trust the consumer to manage resources
      • Trusting the fox to guard the hen house
48
LUN Security
  • Zoning
    • An access control list
    • Establishes a conduit
      • A circuit will be constructed through this
    • Allows only selected Servers to see a Storage Node
    • Lessons learned
      • Implement in parallel with LUN Masking
      • Segregate OS types into different Zones
      • Always Promptly Remove Entries For Retired Servers
49
LUN Security
  • LUN Masking
    • The Storage Node’s Access Control List
      • Sees the Server’s WWPN
      • Masks all LUNs not allocated to that server
      • Allows the Server to see only its assigned LUNs
    • Implement in parallel with Fabric Zoning
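Running zoning and LUN masking in parallel means a server must pass two independent checks before it sees a LUN. The sketch below models both as simple lookup tables. It is not any vendor's configuration format, and every zone name, WWPN, and LUN number in it is hypothetical.

```python
SOLARIS_HBA = "10:00:00:00:c9:aa:00:01"
WINDOWS_HBA = "10:00:00:00:c9:bb:00:02"
ARRAY_PORT = "50:00:1f:e1:00:00:00:01"

# Fabric zoning: each zone is a set of WWPNs allowed to talk to each other.
ZONES = {
    "solaris_zone": {SOLARIS_HBA, ARRAY_PORT},
    "windows_zone": {WINDOWS_HBA, ARRAY_PORT},
}

# LUN masking: the storage node's ACL, keyed by storage port, then server WWPN.
LUN_MASKS = {
    ARRAY_PORT: {SOLARIS_HBA: {0, 1}, WINDOWS_HBA: {2}},
}

def zoned_together(server, storage):
    """True if some zone contains both WWPNs."""
    return any(server in members and storage in members
               for members in ZONES.values())

def visible_luns(server, storage):
    """LUNs the server can see: it must pass zoning AND the storage ACL."""
    if not zoned_together(server, storage):
        return set()
    return set(LUN_MASKS.get(storage, {}).get(server, set()))
```

A retired server left in either table keeps its access on that layer, which is why the zoning slide stresses removing stale entries promptly.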
50
LUN - Persistent Binding
  • Persistent Binding of LUNs to Server Device IDs
  • Permanently assign a System SCSI ID to a LUN.
  • Ensures the Device ID Remains Consistent Across Reconfiguration Reboots
  • Different HBAs use different binding methods & syntax
  • Tape Drive Device Changes have been a repeated source of NetBackup Media Server Failure
51
SAN Performance
  • Storage Configuration
  • Fabric Configuration
  • Server Configuration
52
SAN - Storage Configuration
  • More Spindles are Better
  • Faster Disks are Better
  • RAID 1+0 vs. RAID 5
    • “RAID 5 performs poorly compared to RAID 0+1 when both are implemented with software RAID”
      Allan Packer, Sun Microsystems, 2002
    • Where does RAID 5 underperform RAID 1+0?
      • Random Write
  • Limit Partition Numbers Within RAIDsets
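The random-write gap between RAID 5 and RAID 1+0 comes from the number of disk I/Os behind each small write. A small write to RAID 5 reads the old data and old parity, then writes new data and new parity (four I/Os); a two-way mirror needs two writes. The sketch below turns that into a rough estimate. It assumes no controller cache and uses a hypothetical per-disk I/O rate; the function name is also hypothetical.

```python
# Disk I/Os generated by one small random write.
WRITE_PENALTY = {
    "raid10": 2,   # write both mirror copies
    "raid5": 4,    # read data, read parity, write data, write parity
}

def random_write_iops(level, spindles, iops_per_disk):
    """Rough ceiling on small random writes per second for a RAIDset."""
    return spindles * iops_per_disk / WRITE_PENALTY[level]

# Hypothetical RAIDset: 8 spindles at 100 I/Os per second each.
raid10_writes = random_write_iops("raid10", 8, 100)   # 400.0
raid5_writes = random_write_iops("raid5", 8, 100)     # 200.0
```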
53
SAN - Fabric Configuration
  • Common Switch for Server & Storage
    • Multiple “hops” reduce performance
    • Increases Reliability
  • Large Port-count switches
    • 32 ports or more
    • 16-port switches create larger fabrics simply to carry their own overhead
54
SAN - Server Configuration
  • Choose The Highest Performance HBA Available
    • PCI: 64-bit is better than 32-bit
    • PCI: 66 MHz is better than 33 MHz
  • Place in the Highest Performance Slot
    • Choose the widest, fastest slot in the system
    • Choose an Underutilized Controller
  • Size LUNs by RAIDset disk size
    • BAD: LUN sizes smaller than underlying disk size
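The PCI advice follows from simple bandwidth arithmetic. The sketch below compares theoretical peak PCI bandwidth with the payload rate of a FibreChannel link. It assumes the 1063 and 2125 figures quoted earlier are line rates and that the link uses 8b/10b encoding (ten line bits per data byte), as 1 and 2 Gbps FibreChannel does. The function names are hypothetical, and real buses deliver less than these peaks.

```python
def pci_peak_mb_per_s(width_bits, clock_mhz):
    """Theoretical peak of a PCI bus: bytes per clock times clock rate."""
    return width_bits / 8 * clock_mhz

def fc_payload_mb_per_s(line_rate_mbaud):
    """FibreChannel payload rate with 8b/10b encoding: 10 line bits per byte."""
    return line_rate_mbaud / 10

slow_slot = pci_peak_mb_per_s(32, 33)     # 132.0 MB/s
fast_slot = pci_peak_mb_per_s(64, 66)     # 528.0 MB/s
fc_2g = fc_payload_mb_per_s(2125)         # 212.5 MB/s
```

On these numbers a 32-bit, 33 MHz slot cannot keep a 2 Gbps HBA busy even in theory, while a 64-bit, 66 MHz slot has room to spare.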
55
SAN Resilience
  • At Least Two Fabrics
  • Dual Path Server Connections
    • Each Server N_Port is Connected to a Different Fabric
    • Circuit Failover upon Switch Failure
  • Automatic Traffic Rerouting
  • Hot-Pluggable Disks & Power Supplies
56
SAN Resilience – Dual Path
  • Multiple FibreChannel Ports within Server
  • Active/Passive Links
  • Most GPRD SAN disruptions have affected single-attached servers
57
SAN – Good Housekeeping
  • Stay Current With OS Drivers & HBA Firmware
  • Before You Buy a Server’s HBA
    • Is it supported by the switch & storage vendors?
  • Coordinate Firmware Upgrades
    • Storage & Other Server Admin Teams Using SAN
  • Monitor Disk I/O Statistics
    • Be Proactive; Identify and Eliminate I/O Problems
58
SAN Backups
  • Why We Should
    • Offload Front-end IP Network
    • Most Servers are still connected to 100baseT IP
    • 1 or 2 Gbps FC Links Increase Throughput
    • Shrink Backup Times
  • Why We Don’t
    • Cost
      • NetBackup Media Server License: starts at $5K list
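The "Shrink Backup Times" point can be estimated from link speed alone. The sketch below is an idealized calculation: it ignores protocol overhead, tape drive speed, and disk read rates, and the 500 GB server size and the function name are hypothetical.

```python
def transfer_hours(gigabytes, link_mbit_per_s):
    """Hours needed to move the data at the full nominal link rate."""
    megabits = gigabytes * 8 * 1000        # decimal GB to megabits
    return megabits / link_mbit_per_s / 3600

over_100baset = transfer_hours(500, 100)   # about 11.1 hours
over_1g_fc = transfer_hours(500, 1000)     # about 1.1 hours
over_2g_fc = transfer_hours(500, 2000)     # about 0.6 hours
```

Even as a best case, the estimate shows why a full backup of a large server does not fit comfortably in a nightly window over 100baseT.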
59
Backup Futures
  • Incremental Backups
    • No longer stored on tape
    • Use “near-line” cheap disk arrays
      • Several vendors are under current evaluation
  • Still over IP
    • 1 Gbps ethernet is commonly available on new servers
    • 10 Gbps ethernet needed in core
60
Questions