<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Will the real data please stand up? A look at deduplication in the online backup world</title>
	<atom:link href="http://blogs.vembu.com/2009/02/will-the-real-data-please-stand-up-a-look-at-deduplication-in-the-online-backup-world/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.vembu.com/2009/02/will-the-real-data-please-stand-up-a-look-at-deduplication-in-the-online-backup-world/</link>
	<description></description>
	<lastBuildDate>Wed, 25 Aug 2010 22:05:21 -0400</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Sekar Vembu</title>
		<link>http://blogs.vembu.com/2009/02/will-the-real-data-please-stand-up-a-look-at-deduplication-in-the-online-backup-world/comment-page-1/#comment-13546</link>
		<dc:creator>Sekar Vembu</dc:creator>
		<pubDate>Wed, 06 May 2009 06:11:33 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.vembu.com/?p=191#comment-13546</guid>
		<description>It is still early days, so no time frame has been set. It might be at least another 6 months before we support full-fledged deduplication.

Sekar.</description>
		<content:encoded><![CDATA[<p>It is still early days, so no time frame has been set. It might be at least another 6 months before we support full-fledged deduplication.</p>
<p>Sekar.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nauman Vawda</title>
		<link>http://blogs.vembu.com/2009/02/will-the-real-data-please-stand-up-a-look-at-deduplication-in-the-online-backup-world/comment-page-1/#comment-13465</link>
		<dc:creator>Nauman Vawda</dc:creator>
		<pubDate>Mon, 04 May 2009 04:15:27 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.vembu.com/?p=191#comment-13465</guid>
		<description>Where is this technology today in the roadmap?</description>
		<content:encoded><![CDATA[<p>Where is this technology today in the roadmap?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sekar Vembu</title>
		<link>http://blogs.vembu.com/2009/02/will-the-real-data-please-stand-up-a-look-at-deduplication-in-the-online-backup-world/comment-page-1/#comment-11695</link>
		<dc:creator>Sekar Vembu</dc:creator>
		<pubDate>Tue, 17 Feb 2009 07:30:16 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.vembu.com/?p=191#comment-11695</guid>
		<description>Jaspreet,
I am talking about deduping across multiple systems in the same organization, so it may help minimize the amount of data stored.

I never said source-based dedup is only theory or that it is only about checksum matching. My focus is on a much higher level. All methods are based on some kind of matching, whether you call it a checksum, a signature or something else. Again, my focus is not on the details of fixed-length versus variable-length block comparison and so on. Those are refinements that make it more optimized and efficient.

What I said was that I am not fond of the source-based dedup implementation because I instinctively felt that what is good in theory may not be as good in practice. I may be wrong about that. When we implement dedup in StoreGrid, we will consider all options and do what we think is practical and what works with the least amount of headache.

The blog post is just meant to discuss some options at a high level rather than to get into the details of the implementation.

Sekar</description>
		<content:encoded><![CDATA[<p>Jaspreet,<br />
I am talking about deduping across multiple systems in the same organization, so it may help minimize the amount of data stored.</p>
<p>I never said source-based dedup is only theory or that it is only about checksum matching. My focus is on a much higher level. All methods are based on some kind of matching, whether you call it a checksum, a signature or something else. Again, my focus is not on the details of fixed-length versus variable-length block comparison and so on. Those are refinements that make it more optimized and efficient.</p>
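<p>As a rough illustration of the signature matching described above, here is a minimal Python sketch, not StoreGrid's implementation. It assumes fixed-size 4 KiB blocks with SHA-256 digests as the signature; <code>dedup_fixed_blocks</code> and <code>restore</code> are hypothetical helper names. A block that appears on several systems is stored once and referenced by its digest everywhere else.</p>
<pre>
```python
import hashlib


def dedup_fixed_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy per unique SHA-256 digest.

    Returns (store, recipe): store maps digest -> block, and recipe is the
    ordered list of digests needed to rebuild the original data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        # Store a block only the first time its digest is seen; repeats become references.
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def restore(store: dict, recipe: list) -> bytes:
    """Rebuild the original bytes by looking up each digest in order."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    shared = bytes(range(256)) * 16  # 4096 bytes present on both machines
    machine_a = shared + b"a" * 4096
    machine_b = shared + b"b" * 4096
    combined = machine_a + machine_b
    store, recipe = dedup_fixed_blocks(combined)
    print(len(recipe), len(store))  # prints: 4 3
    assert restore(store, recipe) == combined
```
</pre>
<p>Fixed-size blocks are the simplest option. Their known weakness is that inserting a few bytes near the start of a file shifts every later block boundary, so the shifted blocks no longer match; variable-length (content-defined) chunking addresses this at the cost of more computation.</p>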
<p>What I said was that I am not fond of the source-based dedup implementation because I instinctively felt that what is good in theory may not be as good in practice. I may be wrong about that. When we implement dedup in StoreGrid, we will consider all options and do what we think is practical and what works with the least amount of headache.</p>
<p>The blog post is just meant to discuss some options at a high level rather than to get into the details of the implementation.</p>
<p>Sekar</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jaspreet</title>
		<link>http://blogs.vembu.com/2009/02/will-the-real-data-please-stand-up-a-look-at-deduplication-in-the-online-backup-world/comment-page-1/#comment-11694</link>
		<dc:creator>Jaspreet</dc:creator>
		<pubDate>Tue, 17 Feb 2009 07:27:21 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.vembu.com/?p=191#comment-11694</guid>
		<description>Sekar,

I agree, data deduplication is not a one-size-fits-all technology. Depending on the type of data your customers deal with, it may or may not be a good fit for deduplication. You would have to check whether the practical results are better than simple zip.

But source-based deduplication that catches duplicates across clients is not just theory. It has been put into practice by some products; Avamar, which I checked recently, also does it. But PureDisk is pure crap; you won’t find a single happy PureDisk customer.

Deduplication today is much more than block checksum matching.

Various technology changes and optimizations have, in some cases, helped deliver 99% better bandwidth and storage utilization than traditional backups.

But first you would have to check how many duplicates your data contains.

Second, ensure that the deduplicated data can be recovered 100% of the time without any chance of failure; otherwise the point of redundancy is gone.

Jaspreet</description>
		<content:encoded><![CDATA[<p>Sekar,</p>
<p>I agree, data deduplication is not a one-size-fits-all technology. Depending on the type of data your customers deal with, it may or may not be a good fit for deduplication. You would have to check whether the practical results are better than simple zip.</p>
<p>But source-based deduplication that catches duplicates across clients is not just theory. It has been put into practice by some products; Avamar, which I checked recently, also does it. But PureDisk is pure crap; you won’t find a single happy PureDisk customer.</p>
<p>Deduplication today is much more than block checksum matching.</p>
<p>Various technology changes and optimizations have, in some cases, helped deliver 99% better bandwidth and storage utilization than traditional backups.</p>
<p>But first you would have to check how many duplicates your data contains.</p>
<p>Second, ensure that the deduplicated data can be recovered 100% of the time without any chance of failure; otherwise the point of redundancy is gone.</p>
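<p>One rough way to run the first check, and to compare against simple zip as suggested above, is sketched below in Python. It assumes fixed-size 4 KiB blocks with SHA-256 signatures and uses zlib, which implements DEFLATE (the method most zip files use), as the baseline; <code>duplicate_report</code> is a hypothetical helper name, and real products may chunk data quite differently.</p>
<pre>
```python
import hashlib
import os
import zlib


def duplicate_report(data: bytes, block_size: int = 4096) -> dict:
    """Compare block-level dedup savings with plain zlib compression.

    Assumes fixed-size blocks identified by SHA-256 digests.
    """
    seen = set()
    unique_bytes = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        # Count a block's bytes only the first time its digest appears.
        if digest not in seen:
            seen.add(digest)
            unique_bytes += len(block)
    return {
        "original_bytes": len(data),
        "after_dedup_bytes": unique_bytes,
        "after_zlib_bytes": len(zlib.compress(data, 9)),
    }


if __name__ == "__main__":
    # Ten copies of one random 4 KiB block: highly duplicated, otherwise incompressible.
    sample = os.urandom(4096) * 10
    print(duplicate_report(sample))
```
</pre>
<p>The two numbers measure different things: zlib only finds repeats within its 32 KiB window, whereas block deduplication matches identical blocks across the whole data set. Running a sketch like this on a sample of real customer data gives a first idea of which approach pays off.</p>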
<p>Jaspreet</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Web Executive</title>
		<link>http://blogs.vembu.com/2009/02/will-the-real-data-please-stand-up-a-look-at-deduplication-in-the-online-backup-world/comment-page-1/#comment-11675</link>
		<dc:creator>Web Executive</dc:creator>
		<pubDate>Sun, 15 Feb 2009 22:18:52 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.vembu.com/?p=191#comment-11675</guid>
		<description>Remote backup software is quickly becoming one of the most important software packages needed by the nearly 200 million employees who work remotely from their desks. Why is this the case? First, over half a million laptops are misplaced at US airports every year.</description>
		<content:encoded><![CDATA[<p>Remote backup software is quickly becoming one of the most important software packages needed by the nearly 200 million employees who work remotely from their desks. Why is this the case? First, over half a million laptops are misplaced at US airports every year.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
