The 60th anniversary of IBM's digital tape is coming up in May. Oh yeah, and tape is dead. Or so industry pundits have declared, echoing similar prognostications for the mainframe.
But in reality, tape has a long life ahead of it. At 60, in many ways, it's just getting started.
That's because, unlike the mainframe, tape's role in the enterprise is dramatically changing. Only a few years ago, with the emergence of cheap, high-capacity disk drives, many pundits thought tape would be relegated to the dusty storerooms of long-term data archive. Gone were the days when tape was used for primary backup and recovery or streaming media.
But, with the performance of next-generation tape drives hitting 525MB/sec. -- and at a price of around $25 per terabyte of capacity -- tape is too fast and too cheap to write off. New open file formats are also making it possible to use tape in new markets.
IBM's first magnetic tape device for digital storage, the 7-track tape, was introduced in 1952. The IBM 726 tape was about the size of a pizza and held 2.3MB of data with a transfer rate of about 7.5KB/sec. That's about enough to store a minute and a half of a song on your smartphone.
IBM arrived in the tape market a year after the first magnetic tape was introduced. It was used to store data from the Eckert-Mauchly UNIVAC I, the enormous piece of equipment that was the first commercial computer in the U.S. That tape reel held just 224KB of data.
IBM's 726 tape unit was released in 1952. Each tape held 2.3MB of data.
Tape rules the wallet
Today, an 800GB LTO-4 tape cartridge (1.6TB with compressed data) sells for as little as $22. In comparison, the lowest price of a 1TB 7200-rpm, 3.5-in. SATA hard drive is about $104 and a 1TB 2.5-in. hard drive costs about $128 on the low end.
So it's easy to see that, unit for unit, a tape cartridge sells for roughly one-fifth the price of a spinning disk drive; per terabyte of native capacity, the ratio is closer to one-quarter. Multiply that by thousands of tapes and petabytes or exabytes of corporate legacy data, and the cost savings can be monumental.
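For readers who want to check the arithmetic, here is a minimal sketch in Python that works out cost per terabyte from the street prices quoted above. The dictionary layout and the function name are illustrative only, and the compressed figure assumes the 2:1 compression ratio implied by the 1.6TB rating.

```python
# Prices and capacities as quoted in the article; names here are illustrative.
media = {
    "LTO-4 cartridge (native)": {"price_usd": 22.0, "capacity_tb": 0.8},
    "LTO-4 cartridge (compressed, assumes 2:1)": {"price_usd": 22.0, "capacity_tb": 1.6},
    "3.5-in. SATA disk": {"price_usd": 104.0, "capacity_tb": 1.0},
    "2.5-in. disk": {"price_usd": 128.0, "capacity_tb": 1.0},
}


def cost_per_tb(price_usd, capacity_tb):
    """Dollars per terabyte for one piece of media."""
    return price_usd / capacity_tb


costs = {name: cost_per_tb(**spec) for name, spec in media.items()}
for name, dollars in costs.items():
    print(f"{name}: ${dollars:.2f}/TB")

# Ratio of sticker prices (one cartridge vs. one 3.5-in. drive).
unit_price_ratio = 22.0 / 104.0
# Ratio of cost per native terabyte.
per_tb_ratio = costs["LTO-4 cartridge (native)"] / costs["3.5-in. SATA disk"]
print(f"unit price ratio: {unit_price_ratio:.2f}, per-TB ratio: {per_tb_ratio:.2f}")
```

On these numbers, the cartridge costs about a fifth as much as the 3.5-in. drive at the register and about a quarter as much per native terabyte. The sketch leaves out the cost of tape drives, libraries and disk arrays, which any real comparison would have to include.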
Any cost comparison also has to take into account the fact that an enterprise might need just one tape library for backup and archive. That compares with the expense of running rack upon rack of spinning disk storage arrays.
The Ultrium Linear Tape Open (LTO) specification, by far the most widely used tape spec in the industry, has a road map that takes tape out to 32TB per cartridge and up to 1.2GB/sec. throughput. "We've done a public demonstration of 29.5Gbits of data in a square inch of tape," said Brian Truskowski, IBM's general manager of system storage and networking. "We see a lot of headroom in terms of areal density."
In comparison, Seagate recently announced it had achieved a density of 1 terabit (1 trillion bits) per square inch on a disk drive platter. That breakthrough should lead to 20TB laptop drives within the decade.
The Ultrium LTO tape drive road map.
LTFS and LTO-5
Today, two major advances -- LTO-5 and the Linear Tape File System (LTFS) -- are allowing tape to handle new applications, such as cloud storage, Big Data and streaming media.
"A lot of people joke that you don't hear the words tape and excitement in the same sentence, but LTFS is one reason you do now," said Truskowski. "The point is, archive data is becoming more important to clients, as is the ability to keep that data in a near-line environment."
The LTFS specification and file system were released in 2010. They're supported by major tape vendors, including IBM, HP, Quantum and Oracle, as well as the LTO Consortium. Oracle has integrated its T10KC enterprise tape drives with LTFS.
The LTO-5 format was introduced last year. It offers 1.5TB of uncompressed data (3TB compressed) and, when combined with LTFS, allows users to access files on tape drives as easily as if they were on a USB flash drive or an external hard disk drive.
LTO-5, like every Linear Tape Open generation before it, offers twice the capacity and double the data transfer rates of its predecessor. LTO-5 tape drives can stream data at up to 140MB/sec. native and 280MB/sec. compressed. And, with LTO-6 due out this year, those data rates will be moving to a maximum of 525MB/sec. and the capacity point to 8TB.
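To put those throughput figures in perspective, the following Python sketch estimates how long it would take to stream a full cartridge end to end. It assumes decimal units (1TB = 1,000,000MB) and a drive that sustains its rated speed the whole time, which real workloads rarely manage; the function name is illustrative.

```python
def hours_to_stream(capacity_tb, rate_mb_per_s):
    """Hours to read or write a full cartridge at a sustained rate (decimal units)."""
    seconds = (capacity_tb * 1_000_000) / rate_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 3600


# LTO-5 figures quoted in the article.
lto5_native = hours_to_stream(1.5, 140)
lto5_compressed = hours_to_stream(3.0, 280)
# LTO-6 maximums as quoted in the article (road-map figures).
lto6_quoted = hours_to_stream(8.0, 525)

print(f"LTO-5 native:     {lto5_native:.2f} hours")
print(f"LTO-5 compressed: {lto5_compressed:.2f} hours")
print(f"LTO-6 as quoted:  {lto6_quoted:.2f} hours")
```

At rated speed, filling an LTO-5 cartridge takes just under three hours whether or not the data compresses, because capacity and throughput scale together.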
An LTO-5 tape drive is smaller than a breadbox. Each tape holds up to 3TB of data.
Like LTO-4, LTO-5 offers AES 256-bit hardware-based encryption, and write once, read many (WORM) functionality. Unlike its predecessor, LTO-5 offers dual partitioning for faster data access and improved data management.
LTFS itself is a file system with a POSIX interface that applications such as File Explorer can access. A user can then add a network-attached storage stack (e.g. NFS and/or CIFS) on top of LTFS, allowing seamless access to files from any desktop. LTFS is enabled by the dual partitioning capability of LTO-5.
For example, Partition 0 would hold the tape's content index, which can be more quickly accessed. The second partition, Partition 1, holds the content of the tape.
The partitions allow users to view that data without having to read through an entire tape. Once the desired data is located in the index, a simple copy command can be used to move the data from the tape to, for instance, a disk drive.
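The idea behind the two partitions can be sketched with a toy model in Python. This is not the real LTFS on-tape format or any vendor's API; the class, its fields and the file names are hypothetical. It only illustrates why a small index lets a reader list and locate files without scanning the whole data area.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTape:
    """Toy two-partition tape: a small index plus an append-only data area."""
    index: dict = field(default_factory=dict)            # stands in for Partition 0
    data: bytearray = field(default_factory=bytearray)   # stands in for Partition 1

    def write(self, name: str, payload: bytes) -> None:
        offset = len(self.data)
        self.data.extend(payload)                   # content is appended to the data area
        self.index[name] = (offset, len(payload))   # the index records where it landed

    def listing(self) -> list:
        # Listing files consults only the index, never the data area.
        return sorted(self.index)

    def read(self, name: str) -> bytes:
        offset, length = self.index[name]           # locate the file via the index
        return bytes(self.data[offset:offset + length])  # read just that extent


tape = ToyTape()
tape.write("clip_a.mov", b"AAAA")
tape.write("clip_b.mov", b"BBBBBB")
print(tape.listing())
print(tape.read("clip_b.mov"))
```

On a mounted LTFS volume none of this bookkeeping is visible to the user: the tape appears as an ordinary directory tree, and moving a file off it is a standard file copy.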
"At the end of the day, the benefits are that you have the ability to store data on a tape cartridge and you can retrieve that data without any unique host system software or application," said Robert Amatruda, an IDC analyst specializing in data protection and recovery.
Film and broadcast industries love LTFS
Mark Lemmons, CTO of Thought Equity Motion, said that when television broadcasters and motion picture companies went to disk storage from standardized video tape for media, they lost seamless, global interoperability with no overhead.
"There's no such thing as a film that gets created without 20 companies being involved. There's no such thing as a broadcaster that doesn't distribute to 40 or 50, or 400 or 500 broadcast outlets. Seamless interoperability was baked into the business and we threw it out like the trash," he said.
Thought Equity, a cloud-based storage service for master-quality video, stores stock content for Paramount Pictures, Sony Pictures Entertainment, National Geographic, The New York Times and the NCAA. It recently changed over to an LTO-5 tape library using LTFS to handle more than 10 petabytes of data. The company estimates that data will soon exceed 50 petabytes.
Disk is not something the broadcast industry or film industry can even consider for storing media for longer than 10 years, Lemmons said, because it doesn't have the retrieval attributes the video media business needs.
"Over the last two years, disk drives have gotten bigger, they've gone from 1TB to 3TB, but they haven't gotten faster," he said. "They're more like tape. Meanwhile, tape is going the other direction, it's getting faster."
LTFS was a critical upgrade for Thought Equity, Lemmons said, because the company needs to ensure that its clients will be able to access video files no matter what type of IT infrastructure they have.
The LTO tape had never been an appropriate medium for media in any big way, Lemmons said, because it was built for large banks or other corporations performing backups that were stored for catastrophic data recovery scenarios -- not for ubiquitous access and video.
"Historically, tape was complex enough just to get working, and the IT software layer on top of it was not sharable," he said. "If Client X gives me a petabyte of data, then it's on my system, on my tape and with my software interpreting it. If I was to take that tape and ship it to them, it would be a paperweight unless they had the same IT stack."
"It's IBM Tivoli, Oracle SAM-FS... it's the HSM layer or the file system layer. It's a very expensive, proprietary layer, and it makes it impossible to share at the tape level," Lemmons said.
LTFS allows any file system to access the data, so the backup software used to store it becomes irrelevant.
Lemmons can write a video file to a tape; the tape then shows up on any desktop, such as a Mac, Windows or Linux machine, and it presents itself just as if it were a hard drive volume.
"I can drag and drop the file, write it to the tape at essentially the same speed as a SATA drive, and ship it around the world just like a Digibeta or HDCAM tape, and without having to have the same level of infrastructure that would be required for a Front Porch integration. I don't have to have anything but a little piece of open-source software, thanks to IBM and HP and Oracle and others in this LTFS initiative," he said.
Cloud and Big Data to drive tape adoption
Along with streaming media, Big Data and the cloud have opened new markets for tape storage.
Both private and public cloud infrastructures require massive amounts of data to be available in a near-line fashion. Depending on the service-level agreement, cloud providers might offer a tiered storage infrastructure: data that needs to be accessed quickly and easily is stored on solid-state drives, while data that doesn't have to be immediately available is kept on disk drives or tape. Tape is the least expensive of those options, yet it delivers "good enough" performance for storing large files.
Tape also natively offers greater security in a multitenant cloud environment.
When disks are used for cloud storage, disk drive arrays use deduplication and thin-provisioning to compress data and reduce capacity requirements. RAID is also used to break up and spread data at the block level across disks for data resiliency. Metadata mapping tables are required to find the data across massive disk arrays in a cloud environment.
Additional software is also required to ensure that any given customer's data is securely isolated from every other user on a given disk or array.
In LTO tape environments, however, each tape cartridge is a separate object. The customer or cloud provider has control over what's on each one. Tape libraries can also be partitioned, offering many virtual libraries to a cloud customer while denying any inter-accessibility.
Tape is also positioned to play a key role in the world of Big Data.
Big Data -- an all-encompassing term referring to any kind of data, structured or unstructured, that an entity stores -- has sparked the use of distributed computing software such as MapReduce and batch data analytics tools to extract business information that can be used for marketing, sales and other business operations.
The lion's share of big data resides in unstructured file formats, such as email documents, computer logs, Internet search data, seismic data, business informatics, music, videos and photos.
Currently, the digital universe (all digitally stored data worldwide) is made up of 1.8 trillion gigabytes stored in 500 quadrillion files. Over the next four years, the amount of file-based data will grow by a factor of eight, according to IDC's 2011 Digital Universe Study.
To mine corporate archives, MapReduce frameworks such as Apache Hadoop need access to vast data stores, and tape libraries with petabytes and even exabytes of capacity fill that role perfectly, said IDC's Amatruda.
"That's where you'll see more of the tools in analytics being wrapped around tape," Amatruda said. "That's the next phase of integration and investment: the ability to sort that data effectively and mine it."
Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian, or subscribe to Lucas's RSS feed. His email address is firstname.lastname@example.org.