Purveyors of cloud storage services may be doing their customers, or themselves, a disservice by relying on imprecise metrics for billing, argued a researcher at a Usenix conference.
"Disk time is what costs, not I/Os or bytes, and that is what should be the metric in cloud storage systems," said Matthew Wachs, a researcher at Carnegie Mellon University, in a talk at the Usenix HotCloud workshop this week in Portland, Oregon.
Wachs, along with other researchers at Carnegie Mellon and VMware, investigated the topic in their Usenix paper, "Exertion-based billing for cloud storage access."
"Cloud storage access billing should be exertion-based, charging tenants for the costs actually induced by their I/O activities rather than an inaccurate proxy (e.g., byte or I/O count) for those costs," the paper said.
Today, IaaS (Infrastructure-as-a-Service) cloud storage providers such as Amazon or Google typically bill on two factors: the amount of data stored and the amount of data transferred to and from the cloud, or I/O.
While charging based on the amount of data stored is a reasonable metric, Wachs contended, charging for I/O by volume is flawed because it ignores the work actually expended to read that data from disk or write it to disk. The cost of handling those bits on disk may vary widely from one instance to another, Wachs pointed out.
"As a result, tenant bills for storage access may bear little to no relationship to the actual costs," the paper said.
Wachs mentioned a number of factors that can lead to this variance, the most prominent being the difference between random and sequential access on the disk.
In sequential access, data is written to or read from one portion of the disk in a continual stream of bits. In random access, the disk head must jump around to different parts of the disk to read or write data.
The difference between these two types of workloads can be immense, Wachs said.
For instance, sequential access can achieve a throughput on an average disk of up to 63.5MB/s (megabytes per second), whereas random access can only be executed at 1.5MB/s.
In practical terms, this disparity means that one customer executing lots of random reads and writes is using a lot more of the system's resources than another customer who may be accessing the same amount of data through sequential accesses, even though both customers are charged the same amount.
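To make the gap concrete, the short Python sketch below works out how long a disk would be busy moving the same amount of data at each of the two throughput figures Wachs cited. The 1,000MB transfer size and the function name are illustrative assumptions, not figures from the talk or the paper.

```python
# Throughput figures quoted in the talk, in megabytes per second.
SEQUENTIAL_MB_PER_S = 63.5
RANDOM_MB_PER_S = 1.5


def disk_seconds(megabytes: float, throughput_mb_per_s: float) -> float:
    """Time the disk is busy moving `megabytes` at the given throughput."""
    return megabytes / throughput_mb_per_s


# Assumed workload size, chosen only for illustration.
data_mb = 1000.0

sequential_time = disk_seconds(data_mb, SEQUENTIAL_MB_PER_S)
random_time = disk_seconds(data_mb, RANDOM_MB_PER_S)
ratio = random_time / sequential_time

print(f"sequential: {sequential_time:.1f} s of disk time")
print(f"random:     {random_time:.1f} s of disk time")
print(f"random access occupies the disk about {ratio:.0f}x longer")
```

Under these assumptions the random workload ties up the disk roughly 42 times longer than the sequential one, although a per-byte bill would treat the two identically.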
In the long run, this practice would give customers no incentive to adopt more efficient data transfer practices, and would financially penalize those customers who already have such practices in place. It could also erode the profit margins of storage providers, who may not have accounted for these inefficiencies in their original plans.
Other factors may heighten this disparity between workloads even further, Wachs said. For instance, disk caching may eliminate the need to access the disk at all. In cases where caching is used, the customer may actually be severely overcharged. Also, excessive metadata lookups to find the appropriate data location may consume an inordinate amount of resources.
"This is an unsustainable approach because either the client or the provider will be unhappy," Wachs said. "The clients with the easy requests will pay too much and the clients with the difficult requests will pay too little."
Wachs suggested an alternative billing mechanism, one based on disk time, or the amount of time it actually takes the disk to read or write the material.
"When we charge for disk time, and choose a rate for disk time that matches the cost for the provider, the costs are being recovered fairly," Wachs said.
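A rough Python sketch of the idea follows. It is not the billing model from the paper: the per-second rate, the tenant names, and the 1,000MB workloads are all hypothetical, and disk time is simply estimated from the sequential and random throughput figures cited in the talk. It compares a disk-time bill with a per-megabyte bill that recovers the same total cost.

```python
from dataclasses import dataclass


@dataclass
class Tenant:
    name: str
    megabytes: float            # data transferred in the billing period
    throughput_mb_per_s: float  # effective disk throughput for this workload

    @property
    def disk_seconds(self) -> float:
        # Estimated time the disk spends serving this tenant.
        return self.megabytes / self.throughput_mb_per_s


# Hypothetical provider cost per second of disk time, in dollars.
RATE_PER_DISK_SECOND = 0.001


def exertion_bill(tenant: Tenant) -> float:
    """Charge for the disk time the tenant's I/O actually consumed."""
    return tenant.disk_seconds * RATE_PER_DISK_SECOND


def byte_based_bills(tenants: list[Tenant]) -> dict[str, float]:
    """Spread the same total cost evenly over every megabyte transferred."""
    total_cost = sum(exertion_bill(t) for t in tenants)
    total_mb = sum(t.megabytes for t in tenants)
    per_mb = total_cost / total_mb
    return {t.name: t.megabytes * per_mb for t in tenants}


tenants = [
    Tenant("sequential", 1000.0, 63.5),
    Tenant("random", 1000.0, 1.5),
]

flat = byte_based_bills(tenants)
for t in tenants:
    print(f"{t.name}: per-byte ${flat[t.name]:.4f}, disk-time ${exertion_bill(t):.4f}")
```

In this toy setup both tenants pay the same per-byte bill, while the disk-time bill shifts almost all of the cost onto the tenant issuing random I/O.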
Attendees brought up various issues with this approach. One noted that clients may be willing to pay a bit more overall to get a more predictable and easily understandable bill, citing as an example how the cellular phone industry charges a simple flat rate plus a per-minute fee rather than billing by how much each customer actually uses the cell phone towers.
Wachs countered that the disparity between the cost of running a cloud service and what is being charged can be significant, not just a subtle averaging of costs.
Cell phone customers probably "aren't losing sleep over whether they are paying $40 a month instead of $30 a month," he said. Businesses that are paying $40 million a month rather than $30 million a month, on the other hand, may want "the accounting and pricing to be a lot closer to the actual cost," he said.
Andrew Warfield, the session chairman for the economics track that Wachs' talk was part of, noted that the presentation was one aspect of a larger challenge now being faced by cloud providers, namely the task of examining current operational practices in a deeper, more complex way in order to offer simpler, less expensive services to their customers.
Existing cloud storage metrics "are appealing from a customer-facing standpoint as the right way to market the whole system," Wachs said. "But you need to have something in the long term that will actually match the cost for the provider," he said.