Author Archive

On Vembu & VC Investment

Friday, February 19th, 2010

I keep getting emails from VC firms at regular intervals. I have had initial phone calls with many of them. But invariably there is no progress, as we just do not act on raising money on our own, because I fundamentally cannot get myself to pitch my business to a VC just to raise money. The reason is that I am uncomfortable doing a business plan on how I will scale the company, because until we try something out we will never know whether it is going to work or not. Growing and scaling always comes down to running continuous experiments and figuring out what works. I am kind of tired of trying to be polite and diplomatic with VCs, i.e. responding to their emails and taking their first call and then not taking any initiative in raising money. A couple of days ago, when someone was persistent about having a call after I had turned down a request for one, I sent the following response. I want to post that response publicly, and I am going to point all VCs who contact me to this post from now on.

“I don’t want to sound arrogant. It is not lack of time. I am pretty jobless trying to figure out ways to scale our company trying various new things. The problem is the serious lack of interest in pitching my company to investors. I have spoken to so many VCs on the phone. It’s always the same. I refuse to do a business plan projecting how we can scale. It is like an experiment we are running and it is against my personal nature to pitch my plans to investors – just to raise money – as something that will work without fail. VCs don’t understand my perspective and I can’t blame them as they have to justify their investments to their LPs. I cannot change my nature and personality just to raise money. If anyone is interested in my company I prefer a one-on-one meeting. But I insist that I will not give a business plan, nor will I pitch my company to raise money. The investment has to come because they instinctively trust me and have a somewhat religious belief that I will at least give their money back if not grow it by 10 times. That is the understanding with which our angel investors have invested in us, by the way. One of them is a VC and he thinks personally he has no problem with my style but as a VC he cannot convince his other partners. My yardstick for success is different from that of the pure professional investors.”

The above post was written by Sekar Vembu of Vembu Technologies. Vembu Technologies is a backup software vendor whose product, StoreGrid, powers the online backup services of a large number of service providers across the globe. Besides remote backup, StoreGrid is also used for on premise backups of workstations and servers at various companies & universities.

Individual innovations will lift us out of this economic calamity – not government doles

Thursday, March 5th, 2009

While searching randomly on the web I came across my own interview by a blogger on Indian startups. That interview was done a little more than a year ago. A couple of things I had said were pertinent to the post “Where Are the Innovators When You Need Them?” by Joe Panettieri of MSPmentor.

Here is the link to my interview: http://desistartups.wordpress.com/2007/12/19/interview-with-vembu-by-prabhu/

To quote myself from the interview:

“The truth is that I am not really an early adopter of anything….. Also, I would like to call myself more of an incremental innovator with a lot of common sense. So I am not sure if I can come up with a completely new world-changing idea or something like that…..I felt I could really contribute with my incrementally innovative ideas and…I felt I could get a product out and start generating revenues to support a boot-strapping model”.

To quote Joe Panettieri from the MSPmentor article:

“Throwing money at problems often isn’t the solution. True innovation — from large, nimble companies and small, hungry start-ups — will pull the US and other countries out of this economic mess. Led by innovators, the turnaround is coming. I just wish I knew when.”

And I had commented the following in response to Joe Panettieri:

“Considering all the excesses in the US economy driven by foreign debt in the last decade or so and considering all the excesses in the form of bailouts after bailouts, the only way for the US to come out of this crisis intact and as strong as ever is through some breakthrough inventions or innovations. One should not forget that the US achieved its status as a superpower through this process of innovation in the last century.

These innovations have to come from some field which will fundamentally improve human productivity by leaps and bounds. If the US fails to deliver on this front in the next decade or so, then it is inevitable that the next generation of Americans will pay a heavy price for the excesses of this generation.

It is worth remembering the fundamental law of economics that ‘there is no such thing as a free lunch’.”

“I want to clarify something about my previous comment so that there is no scope for misinterpretation. When I said the US has to deliver on some breakthrough innovation, I actually meant that such innovations have to come from individual initiatives from people like us – who are pushed against the wall. I did not mean that it has to come from some centralized government initiative through some mega-government planning. I thought it was important to clarify this as these days lots of very smart people have started thinking as if the government has to do something.”

“If at all some true innovation happens because of a government initiative it will be more of an accident. True innovations happen through the efforts of millions of individuals – who innovate either in order to survive or because of some innate human curiosity. All that the government has to do is to step out of the way, make sure these individuals have the fullest freedom to pursue their dreams, and not interfere in the way these true innovators are rewarded for their efforts.”

I am writing about this in my blog here to drive home the point that no individual starts with a world-changing idea, nor do world-changing ideas present themselves to some gifted individuals. You always start with something which appears small and trivial. But when millions of individuals try millions of small and trivial innovations, one fundamental world-changing idea will emerge. And that one innovation would impact the whole of humanity for the better.

So for all of us who are wondering how we are going to get through this impending economic calamity, the time to take charge of our lives is now. Just take the plunge and contribute with your innovation, however trivial it appears to be.

HP shutters its Upline online data backup service – Why commoditized online backup service is not a sustainable business

Sunday, March 1st, 2009

I was contacted by ChannelWeb to comment on HP’s decision to shutter their Upline online data backup business. The gist of what I commented was carried in the article “HP To Shutter Upline Online Storage Backup Service” by ChannelWeb’s Senior Editor Joseph F. Kovar. I felt it’s a good idea to post here my full comments along with my view about commodity online backup services like Carbonite and EMC’s Mozy.

Hope these are not perceived as just wishful thinking on my part. My comments are based on our experience supporting more than 1000 partners offering backup services to tens of thousands of SMB customers. Below is my unedited comment I sent to ChannelWeb.

On HP’s decisions to kill its Upline online storage service we are not very surprised by the decision. The reason is that we always believed that backup is not like Skype where you install it and it works. Backups by its very nature require monitoring, management and administration to ensure everything goes smoothly. So any large vendor who gets into online backup services thinking that you just sign up large number of customers and then everything can be put on auto-pilot is completely mistaken. That is the reason we never offered online backup services directly to end customers. Our business model is to partner with MSPs and VARs who already provide IT services to their SMB customers. These local MSPs and VARs, because of their proximity to their customers, are in the best position to offer backup services. Since they act as “Virtual CIOs” to their SMB clients they are in the best position to monitor and manage the backups along with everything related to IT in these SMB organizations.

With regard to consumers who backup to a brand name mega online backup service providers, we do not think that is a very profitable business because consumers view storage as a commodity. They do not appreciate the additional value delivered by good backup software and treat everything as just raw storage. Since backup requires monitoring and management the more consumers you sign up the more support you will have to deal with. This just cannot be sustained as consumers are willing to pay for only raw storage and not for the value the software brings. This is one reason HP would have felt it’s not worth their while to go after consumers nor after SMBs where it just cannot be put on auto-pilot. No wonder AOL shut down their XDrive business a few months ago.

Considering the above I strongly believe Carbonite may be under pressure notwithstanding the twenty plus million venture capital they have raised. With the meager amount they charge their customers for storage it is just not sustainable as the cost of offering good customer support can never be recovered. Needless to say, in spite of Mozy’s brand recognition and EMC’s backing, Mozy may also struggle to scale their business profitably. It may be relevant to point out the blog post, “May be I am not so impressed with EMC”, which I wrote on EMC’s decision to spin off Mozy (Decho).

I also want to highlight another blog post by my colleague, Lux, some time ago: Carbonite and Mozy’s Achilles Heel.

Will the real data please stand up? A look at deduplication in the online backup world

Wednesday, February 11th, 2009

Talk about data deduplication (in the backup and archiving domain) seems to be gaining a fair amount of momentum in the last few years! Most enterprise backup software vendors like Symantec (Veritas), EMC (Avamar) etc. support deduplication in some form or the other – some do deduplication in the source system (that is being backed up) and others do deduplication at the target (backup/storage server). There are also pure “deduplication based storage hardware vendors” like Data Domain who have gained considerable traction in the enterprise.

I am actually quite surprised by the hype around deduplication and the adoption it seems to have gained in the enterprise. The reason I am surprised is similar to the one I articulated in my previous blog post: “Synthetic Full Backup in the online backup world – Are we inviting trouble?”. The crux of my argument is that backup and archiving is about building redundancy to the data and not about eliminating redundancy in the name of efficiency of storage or network bandwidth. So it is my contention that wherever feasible we should have as much redundancy to the data (that needs backing up) and only under unavoidable circumstances should we resort to using synthetic full backup or deduplication. Actually, let me state this more strongly: “avoid falling for the synthetic full backup or deduplication hype if you can!”

But who am I to say this? I am neither an “industry expert” nor am I Steve Jobs to say “this is what is good for you; take it or leave it”. Given that we are a niche company trying to grow (and growing) in the face of industry giants, we are actually contemplating building deduplication support in our data backup software, StoreGrid. While not many of our customers/partners are asking for it, we do get the occasional prospect saying that deduplication (rather, the lack of it) is a show stopper for them!

As we started thinking about and designing the best way to support deduplication in StoreGrid, we encountered many options to consider and many complexities to be handled. But at the end, we were left with a fundamental question – whether a full-fledged deduplication is indeed possible in the online backup world! Before I explain some of the options and the complexities, and why we think a full-fledged de-duplication may not be feasible in a pure online backup scenario, let me first get into a broad overview of the two deduplication approaches…

Deduplication at the source (client) vs. at the target (backup server)
There are vendors who claim they do the deduplication at the source (i.e. the client system that is being backed up) as opposed to others who claim that they do deduplication at the target (i.e. at the backup server). If deduplication is done at the source then it is easy to deduplicate data at a block level across all files within the source system. If deduplication is done at the target then it is equally easy to deduplicate data at a block level across all files across all the client systems backing up to the backup server. Quite obviously, doing deduplication across all files across all clients will be much more effective than doing deduplication only at a client system level.

It is theoretically possible to do deduplication at the source system and still be able to deduplicate across all systems backing up to the backup server. In this case, each client (source) has to continuously update itself with the meta-data of the blocks that are being stored in the backup server. The meta-data in this case would simply be the checksums of the blocks. These checksums are looked up to identify identical blocks of data. I have not personally tested such a product, i.e. one doing deduplication at the source system and still being able to deduplicate across all systems backing up to the backup server. But this may not be as efficient in terms of performance as doing the deduplication at the backup server end, especially if the backup/storage server resides at a remote data center (and the meta-data needs to be downloaded each time from the remote server).
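To make the checksum idea concrete, here is a minimal sketch of block-level deduplication at the target, across two clients. Everything in it is an assumption made for illustration: the `DedupStore` class, the tiny fixed block size and the choice of SHA-256 as the checksum are not taken from StoreGrid or any other product.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Hypothetical backup server that keeps one copy per distinct block."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}                  # checksum -> block
        self.files: dict[tuple[str, str], list[str]] = {}   # (client, file) -> checksums

    def put(self, client: str, name: str, data: bytes) -> None:
        recipe = []
        for block in split_blocks(data):
            checksum = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(checksum, block)  # stored only if unseen
            recipe.append(checksum)
        self.files[(client, name)] = recipe

    def get(self, client: str, name: str) -> bytes:
        """Rebuild a file from its list of block checksums."""
        return b"".join(self.blocks[c] for c in self.files[(client, name)])


store = DedupStore()
store.put("client-a", "report.txt", b"AAAABBBBCCCC")
store.put("client-b", "notes.txt", b"BBBBCCCCDDDD")

logical_blocks = sum(len(recipe) for recipe in store.files.values())
print("blocks backed up:", logical_blocks)
print("blocks physically stored:", len(store.blocks))
```

The two files share the blocks `BBBB` and `CCCC`, so the store keeps fewer physical blocks than were backed up. A store limited to a single client would not see that overlap, which is why deduplicating across all clients is more effective.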

Armed with this background, let’s dive deeper into the implications of these ‘approaches’ in the online backup context…

Option 1: Deduplication at target
One of the most important requirements in the online backup domain is that the data that is backed up is encrypted before it leaves the source system and is sent over the internet to the remote data center (where the data is stored). Deduplication works by finding identical blocks across all the files and physically storing only one copy of each block in the storage system. Encryption works by destroying all patterns in the data and making it look random. Because encryption eliminates all patterns, trying to do deduplication on a set of encrypted files will have no effect: blocks that were identical before encryption no longer match once each client has encrypted them. That means doing deduplication at the remote storage end, where all the data from different client systems is encrypted and stored, is technically not possible. Not encrypting the data that is being backed up to the remote data center is not really an option in the online backup world. Another point to note is that deduplication at the target doesn’t really help much in an online backup scenario – clients still send all data across and hence don’t save anything on bandwidth! Of course, you save on ‘server side storage’, but optimizing this, I’d assume, comes a distant second to optimizing bandwidth utilization for online backups!
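The effect of encryption on deduplication can be shown with a small experiment. The sketch below is only an illustration: `toy_encrypt` is a deliberately simple XOR stream cipher built from SHA-256, invented for this example and not secure, and the per-client keys are made up. Real backup products use proper ciphers, but the outcome for deduplication is the same.

```python
import hashlib

BLOCK_SIZE = 32


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher, for illustration only. NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        # Keystream depends on the key and the position in the file.
        keystream = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


def count_distinct_blocks(blobs: list[bytes]) -> int:
    """Number of blocks a deduplicating store would have to keep."""
    seen = set()
    for blob in blobs:
        for i in range(0, len(blob), BLOCK_SIZE):
            seen.add(hashlib.sha256(blob[i:i + BLOCK_SIZE]).hexdigest())
    return len(seen)


# The same four-block document exists on two client systems.
document = b"".join(bytes([65 + i]) * BLOCK_SIZE for i in range(4))

plain_unique = count_distinct_blocks([document, document])

# Each client encrypts with its own key before sending to the server.
enc_a = toy_encrypt(b"key-of-client-a", document)
enc_b = toy_encrypt(b"key-of-client-b", document)
encrypted_unique = count_distinct_blocks([enc_a, enc_b])

print("distinct blocks, unencrypted:", plain_unique)
print("distinct blocks, encrypted per client:", encrypted_unique)
```

Unencrypted, the second copy of the document adds no new blocks. Once each client encrypts with its own key, no two blocks match any more and the server has to store every block it receives.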

Option 2: Deduplication at source – with a common encryption key
As I said before, it is theoretically possible to do deduplication at source and still be able to deduplicate across all client systems in an organization. In order to do that, either the data should not be encrypted during backup or all the client systems will have to use a common encryption key to encrypt the data. Not encrypting the data is not really an option with online backups. Using a common encryption key would mean that for each block of data that is backed up, the checksum signature of the unencrypted block is also sent to the backup server, where it is stored. Every client that is backed up should look up this database of checksums stored in the backup server before sending a block of data to the backup server. Though this can be made to work, I am not really fond of this option because of the performance penalty of these lookups, considering that the backup server is at a remote location in the case of online backups.
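A rough sketch of that lookup flow is shown below. The `ChecksumIndex` class and the `backup` function are hypothetical names invented for this example, encryption with the common key is left out for brevity, and each call to `missing` stands in for one round trip to the remote backup server.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, to keep the example readable


class ChecksumIndex:
    """Hypothetical server-side database of plaintext block checksums."""

    def __init__(self) -> None:
        self.known: set[str] = set()
        self.lookups = 0  # each lookup is a round trip to the remote server

    def missing(self, checksums: list[str]) -> set[str]:
        self.lookups += 1
        return {c for c in checksums if c not in self.known}

    def store(self, checksum: str, block: bytes) -> None:
        self.known.add(checksum)


def backup(index: ChecksumIndex, data: bytes) -> int:
    """Send only blocks the server lacks; return the block bytes sent."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    checksums = [hashlib.sha256(b).hexdigest() for b in blocks]
    needed = index.missing(checksums)
    sent = 0
    for block, checksum in zip(blocks, checksums):
        if checksum in needed:
            # A real client would encrypt the block with the common key here.
            index.store(checksum, block)
            needed.discard(checksum)  # do not resend a repeat within this file
            sent += len(block)
    return sent


index = ChecksumIndex()
sent_by_a = backup(index, b"AAAABBBBCCCC")
sent_by_b = backup(index, b"BBBBCCCCDDDD")
print("bytes sent by client A:", sent_by_a)
print("bytes sent by client B:", sent_by_b)
print("lookups against the remote server:", index.lookups)
```

Client B sends only the one block the server has not seen, which is the bandwidth saving. The price is the extra lookup against a remote server before any data can be sent, and that is the performance penalty in question.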

Option 3: Deduplication at local target backup server – with offsite replication
The only practical option I can think of is to have a deployment model where all clients in an organization back up to a local backup server – without encryption. The backed up data is deduplicated at the local backup server and then encrypted and sent to a remote backup or replication server. This deployment model will ensure that the deduplication is done on data from across all clients backing up to the local backup server. Depending upon a customer’s preference, the local backup server can either keep a copy of the deduplicated backed up data (for quicker restores) or purge it (not recommended) once the data is moved to the remote backup/replication server.

In summary, we prefer the last approach, viz. doing the deduplication at the target backup server which is deployed locally at the site where the client systems are. This would allow the clients to back up to the local backup server without encrypting the data – thus facilitating deduplication at the target. And for offsite storage, the data from the local backup server would be deduplicated, encrypted and sent to the remote backup or replication server. This would also ensure that the bandwidth savings associated with deduplication are achieved.

I look forward to feedback & suggestions on other ‘better’ ways of implementing deduplication in the online backup domain!

Synthetic Full Backup in the online backup world – Are we inviting trouble?

Wednesday, January 28th, 2009

We have been noticing an increase in the number of ‘prospective partners’ asking if StoreGrid supports the synthetic full backup feature. StoreGrid does not yet support this feature as we have always given it a low priority in the past. But now that it is being frequently asked for, we have started implementing it and hope to have this feature in the next few months. Though we would always like to give our partners as much choice and flexibility as possible while using StoreGrid for their online backup services business, this particular feature has been haunting me for some time. I feel using synthetic full backup is a double-edged sword; it may come to haunt you when things go wrong. In fact, some of our partners actually told us they would not use this feature at all because of the additional risks it introduces! Let me clarify some of these viewpoints and try to put all the pros and cons of the synthetic full backup feature on the table.

What is Synthetic Full Backup, anyway? Synthetic Full Backup is a way to create a new full backup without actually doing a full backup. It is done by combining a previous full backup and the subsequent differential/incremental backups to “synthesize” a new full backup. Note that all of this is done at the backup server and hence does not involve actual transfer of data from the clients to the backup server. Here is a definition of synthetic full backup on the web.

The advantage of using a synthetic full backup is that the client systems (the production servers and the user desktops/laptops) do not have to do a complete full backup periodically. This would reduce the load on client systems and the time taken for periodic full backups quite significantly. This is especially attractive in the online backup world because synthetic full backups eliminate the need to transfer the large amounts of data involved in full backups over the internet every time a full backup needs to be done. So far so good! So why not implement this right away, considering that the advantages are so obvious? Hold your horses…

During synthetic full backup, the process of “synthesizing” a full backup is done at the backup server end. In order to “combine” a previous full backup with subsequent incremental/differential backups, the backup server should have access to the encryption key used to encrypt the backup data. Note that in the online backup world the encryption is done at the client end (the production servers and the users’ desktops/laptops). One of the most debated topics in online backup is the security of the data that is backed up – will the service providers have access to the backed up data of their customers? Almost all online backup solutions, including StoreGrid, encrypt the data before it is sent over the internet to the service provider’s storage cloud. And during restores the encrypted data is first restored to the client and then decrypted at the client end. So unless the backup server is given access to the encryption password, temporarily at least, synthesizing a full backup from a previous full backup and subsequent incremental/differential backups would not be possible.

But there are workarounds that can be implemented which would avoid the need to decrypt the encrypted data in the backup server for synthesizing a new full backup. Let me describe the workaround we are planning to implement and the resultant additional risks this introduces…

Firstly, for every file, StoreGrid does a full backup and then subsequently does differential backups (which are the block level differences between the current file and the content of the original file that was backed up during the full backup). This is done because if we were to do incremental backups all the time instead (that is, the block level differences between the current file and the content of the file the last time it was backed up, either incrementally or fully), then it would be very difficult to implement versioning.

This is because, with incremental backups, restoring the latest file requires the full backup file and every incremental backup that was done since. In the case of block level differential backups, the latest file can be restored using just the full backup file and the latest differential backup that was done.

So versioning is easier, as we can delete the differential backups that are not required to be kept. This is illustrated in Figure 1 below.

Figure 1
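The difference between the two restore paths can be sketched in a few lines. This is a simplified model made up for illustration: a file is a list of named blocks of fixed length, and `diff` and `apply_delta` are hypothetical helpers, not StoreGrid functions.

```python
def diff(old: list[str], new: list[str]) -> dict[int, str]:
    """Block level delta: positions where `new` differs from `old`."""
    return {i: block for i, block in enumerate(new) if old[i] != block}


def apply_delta(base: list[str], delta: dict[int, str]) -> list[str]:
    """Rebuild a version by overwriting the changed blocks of `base`."""
    restored = list(base)
    for i, block in delta.items():
        restored[i] = block
    return restored


# Four versions of one file; one more block changes in each version.
versions = [
    ["a0", "b0", "c0"],  # backed up in full
    ["a1", "b0", "c0"],
    ["a1", "b1", "c0"],
    ["a1", "b1", "c1"],
]
full = versions[0]

# Differential: always compared against the full backup.
differentials = [diff(full, v) for v in versions[1:]]
# Incremental: compared against the previous version.
incrementals = [diff(prev, cur) for prev, cur in zip(versions, versions[1:])]

# Differential restore needs the full backup plus the latest differential only.
latest_from_differential = apply_delta(full, differentials[-1])

# Incremental restore needs the full backup plus every incremental, in order.
latest_from_incrementals = full
for delta in incrementals:
    latest_from_incrementals = apply_delta(latest_from_incrementals, delta)

print(latest_from_differential == versions[-1])
print(latest_from_incrementals == versions[-1])
```

Both paths rebuild the latest version, but the older differentials can be deleted without affecting that restore, while dropping any one incremental breaks the chain. The trade-off is that each differential grows as more blocks drift away from the full backup.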


Considering the way we are doing full backups and differential backups, we plan to implement synthetic full backup without actually physically combining a previous full backup and a subsequent differential backup. Instead, as illustrated in Figure 2 below, we would simply create a reference in the database for a synthetic full backup, recording which previous full backup and which differential backup make up the synthetic full backup in question.

Figure 2


This information would have to be used only during restores. Thus by just keeping the references of full backups and differential backups required to make up a new synthetic full backup, we can eliminate the need to have the backup server decrypt the data for combining backups to synthesize a full backup.
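As a rough picture of what such a reference could look like, here is a minimal sketch. The `SyntheticFull` record, the backup ids and the `objects_to_restore` helper are all invented for this example and say nothing about the actual StoreGrid database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticFull:
    """Catalog row: a synthetic full is only a pointer to two stored objects."""
    base: str          # id of a real full backup or of an earlier synthetic full
    differential: str  # id of the differential backup taken against `base`


# Hypothetical catalog on the backup server. No backup data is decrypted
# or merged; only these references are recorded.
catalog: dict[str, SyntheticFull] = {
    "synth-1": SyntheticFull(base="full-0", differential="diff-0.3"),
    "synth-2": SyntheticFull(base="synth-1", differential="diff-1.2"),
}


def objects_to_restore(backup_id: str, catalog: dict[str, SyntheticFull]) -> list[str]:
    """Encrypted objects the client must download, oldest first."""
    entry = catalog.get(backup_id)
    if entry is None:
        # Not a synthetic full, so it is stored as a real backup object.
        return [backup_id]
    return objects_to_restore(entry.base, catalog) + [entry.differential]


print(objects_to_restore("synth-1", catalog))
print(objects_to_restore("synth-2", catalog))
```

The server never touches the encrypted contents. The references are resolved only at restore time, and the client downloads and decrypts the listed objects itself.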

What are the risks introduced by the above process? If we follow the above approach forever (having just references in the database without actually physically combining different backups), doing only periodic synthetic full backups to avoid a normal full backup, then, as illustrated in Figure 3 below, restores can become more complex and time consuming.

Figure 3

During the restore of the latest version of a file, the first full backup file and every subsequent synthetic full backup file have to be restored along with the latest differential backup for that file. If this involves tens or hundreds of synthetic full backups, then the restore process will surely become quite inefficient. Besides, a simple restore of the latest file could mean restoring data which was stored months or years before. This introduces additional risk: if even one intermediate block of data from a synthetic full that was done months before is corrupted for some reason, then all the backups done after that are invalidated and cannot be restored. This is a serious risk. It can be eliminated either by physically synthesizing a full backup, decrypting the data when the synthetic backup is done, or by actually doing periodic full backups without relying on the synthetic full backup feature. The former option would mean that the backup server should have at least temporary access to the encryption key, which introduces a security risk. Staying with the reference-only approach, on the other hand, makes the restore process inefficient, in addition to increasing the risk of losing data because of a small corruption in a block of data stored months before.
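The way the restore chain grows, and how one damaged object affects everything after it, can be modelled in a few lines. This is a simplified sketch under assumed names: each `synth-N` id stands for the stored data behind synthetic full number N, and `required_objects` and `surviving_points` are hypothetical helpers.

```python
def required_objects(point: int) -> list[str]:
    """Objects needed to restore synthetic full number `point` (0 = the real full)."""
    return ["full-0"] + [f"synth-{i}" for i in range(1, point + 1)]


def surviving_points(total_points: int, corrupted: set[str]) -> list[int]:
    """Restore points whose whole chain is free of corrupted objects."""
    return [
        point
        for point in range(total_points + 1)
        if not corrupted & set(required_objects(point))
    ]


# The chain grows by one object with every reference-only synthetic full.
print(len(required_objects(1)))
print(len(required_objects(100)))

# One corrupted object in the middle takes every later restore point with it.
print(surviving_points(5, {"synth-3"}))
```

A periodic real full backup resets the chain to a single object, which is what limits both the restore time and the number of restore points exposed to one old corruption.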

What is our take? We strongly believe that the fundamental philosophy behind having a robust and foolproof backup strategy is to have as much redundancy for the data as possible. Any backup strategy that sacrifices redundancy for storage efficiency or for reducing the time taken for backups should be avoided if feasible. Hence, though StoreGrid will have support for the synthetic full backup feature in a few months’ time, we would strongly advise our partners to thoroughly analyze it and understand the implications before using this feature. Our recommended approach will always be to do periodic full backups of all the data. Perhaps one can reduce the frequency of complete full backups by doing frequent synthetic full backups in combination with less frequent complete full backups. We would certainly not recommend doing away with normal full backups altogether.

This was exactly the sentiment expressed by some of our partners when we spoke to them about this feature. Like in many other spheres of life, ‘natural’ is better than ‘synthetic’, I guess!
