Permabit’s Albireo data deduplication technology now supports enterprise-level flash storage. By deduplicating data before it ever hits the flash, Permabit claims it can reduce the cost of enterprise flash storage by 5x to 35x.
The savings come not because the enterprise solid-state device is somehow cheaper thanks to Albireo, but because you don’t need as much flash to hold an equivalent amount of data. It’s like making your current storage bigger.
Albireo examines blocks of data and hashes each one to generate a “fingerprint” of the block. Future blocks that share the same fingerprint are not written to the flash; instead, they are replaced with a pointer to the identical reference block already stored. In this way, the amount of space needed to hold the data can be greatly reduced.
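The fingerprint-and-reference idea can be illustrated with a minimal sketch. This is not Permabit's implementation: the `DedupeStore` class, the choice of SHA-256 as the hash, and the in-memory dictionaries are all assumptions made for illustration, and a real product would also have to handle persistence and hash collisions.

```python
import hashlib

BLOCK_SIZE = 4096  # 4K blocks, one of the block sizes the article mentions


class DedupeStore:
    """Toy block store (hypothetical): identical blocks are kept once
    and referenced by their fingerprint."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> the single stored copy of the block
        self.layout = []   # ordered fingerprints describing the logical data

    def write(self, data: bytes) -> None:
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            # SHA-256 is an assumed stand-in for whatever hash Albireo uses.
            fingerprint = hashlib.sha256(block).digest()
            # Store the block only the first time this fingerprint is seen.
            self.blocks.setdefault(fingerprint, block)
            # Always record a reference so the data can be rebuilt later.
            self.layout.append(fingerprint)

    def read(self) -> bytes:
        # Rebuild the logical data by following the references in order.
        return b"".join(self.blocks[fp] for fp in self.layout)

    def stored_bytes(self) -> int:
        # Physical space actually consumed by unique blocks.
        return sum(len(block) for block in self.blocks.values())


store = DedupeStore()
payload = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # four blocks, two unique
store.write(payload)
print(len(payload), store.stored_bytes())
```

In this toy case four logical blocks are written but only two unique blocks are stored, and `read()` still returns the original data unchanged.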
Albireo’s deduplication technology (so called because it eliminates duplicate data) can work with all manner of storage. But given the current trend of using flash to cache or accelerate spinning disks, along with new types of all-solid-state storage, supporting these markets is a must.
Deduplication is similar to the way several data compression technologies work. From zip files to SandForce SSDs, reducing the size of data without losing any of the data itself can pay big dividends. That benefit can come both in lower storage requirements and in higher performance. And at an enterprise level, every GB that doesn’t need to be written can help lower the total cost of the storage system. Fewer disks mean lower cost, less power to operate, and less waste heat that needs to be removed.
Part of Albireo’s dedupe magic is the indexing engine. Albireo’s index is remarkably efficient with system resources, requiring 0.1 bytes of system RAM per indexed block. With only 1GB of RAM, Albireo can manage the deduplication needs of 40TB of data using 4K blocks. Smaller blocks can help reduce the amount of data written to disk, but they require more resources. Stepping up to 128K blocks means the system can dedupe 1,280TB with just 1GB of RAM. With 16GB of RAM, Albireo can manage an astronomical 20.5 petabytes. That’s 20,480 1TB HDDs’ worth of data with only 16GB of RAM when using a 128K block size.
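The capacity figures above follow from simple arithmetic, which the short sketch below reproduces. It assumes binary units (1TB = 1,024GB) and that index memory scales linearly with the number of blocks; the function name `indexable_tib` is hypothetical and only serves the illustration.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

# From the article: 1GB of RAM covers 40TB of data at a 4K block size.
BLOCKS_PER_GIB_RAM = 40 * TIB // (4 * KIB)

# RAM cost per indexed block implied by that figure.
bytes_per_block = GIB / BLOCKS_PER_GIB_RAM


def indexable_tib(ram_gib: int, block_kib: int) -> int:
    """Data (in TB, binary) the index can cover, assuming linear scaling."""
    total_blocks = ram_gib * BLOCKS_PER_GIB_RAM
    return total_blocks * block_kib * KIB // TIB


print(bytes_per_block)         # RAM bytes per indexed block
print(indexable_tib(1, 4))     # 1GB RAM, 4K blocks
print(indexable_tib(1, 128))   # 1GB RAM, 128K blocks
print(indexable_tib(16, 128))  # 16GB RAM, 128K blocks
```

Under these assumptions the sketch yields 40TB, 1,280TB, and 20,480TB for the three configurations, matching the article's numbers; 20,480TB is the quoted 20.5 petabytes when divided by 1,000.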
One modern CPU core is enough to handle 250,000 IOPS. Combined with the low RAM requirements, that suggests Albireo doesn’t need many system resources to run effectively. And if it can lower the operational and capital expenditures required to maintain petabytes of data, no one will mind giving up a few GB of RAM and a few CPU cores.