
Synthetic Full Backup in the online backup world – Are we inviting trouble?

by Sekar Vembu on January 28th, 2009

We have been noticing an increase in the number of ‘prospective partners’ asking if StoreGrid supports the synthetic full backup feature. StoreGrid does not yet support this feature, as we have always given it a low priority in the past. But now that it is frequently asked for, we have started implementing it and hope to have it ready in the next few months. Though we always like to give our partners as much choice and flexibility as possible while using StoreGrid for their online backup services business, this particular feature has been haunting me for some time. I feel synthetic full backup is a double-edged sword; it may come back to haunt you when things go wrong. In fact, some of our partners actually told us they would not use this feature at all because of the additional risks it introduces! Let me clarify some of these viewpoints and try to put all the pros and cons of the synthetic full backup feature on the table.

What is Synthetic Full Backup, anyway? Synthetic Full Backup is a way to create a new full backup without actually doing a full backup. It is done by combining a previous full backup with the subsequent differential/incremental backups to “synthesize” a new full backup. Note that all of this is done at the backup server, so it does not involve any actual transfer of data from the clients to the backup server. Here is a definition of synthetic full backup on the web.
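To make the idea concrete, here is a minimal Python sketch of what “synthesizing” means at the block level. It assumes, purely for illustration, that a backup of one file is stored as a map from block index to block contents plus the file's length in blocks, and that the later backups are incrementals applied oldest first. The names Backup and synthesize_full are hypothetical, not StoreGrid internals. Note that the merge reads the stored block contents directly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Backup:
    """Hypothetical model of one stored backup of a single file."""
    num_blocks: int              # file length, in blocks, when this backup ran
    blocks: Dict[int, bytes]     # block index -> block contents kept in this backup


def synthesize_full(full: Backup, incrementals: List[Backup]) -> Backup:
    """Merge a full backup with its later incrementals (oldest first) into a new full.

    Runs entirely on the backup server, so the client re-sends nothing,
    but the server has to be able to read the blocks it is merging.
    """
    merged = dict(full.blocks)
    num_blocks = full.num_blocks
    for inc in incrementals:
        merged.update(inc.blocks)    # a newer copy of a block replaces the older one
        num_blocks = inc.num_blocks  # the file may have grown or shrunk since
    # Blocks past the current end of the file no longer belong to it.
    return Backup(num_blocks, {i: b for i, b in merged.items() if i < num_blocks})
```

The original full backup is left untouched, so it can still serve older restores after the synthetic full is created.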

The advantage of using a synthetic full backup is that the client systems (the production servers and the user desktops/laptops) do not have to do a complete full backup periodically. This significantly reduces both the load on client systems and the time taken for periodic full backups. It is even more attractive in the online backup world, because synthetic full backups eliminate the need to transfer large amounts of data (as a full backup involves) over the internet every time a full backup needs to be done. So far so good! So why not implement this right away, considering that the advantages are so obvious? Hold your horses…

During synthetic full backup, the process of “synthesizing” a full backup is done at the backup server end. In order to “combine” a previous full backup with subsequent incremental/differential backups, the backup server must have access to the encryption key used to encrypt the backup data. Note that in the online backup world the encryption is done at the client end (the production servers and the users’ desktops/laptops). One of the most debated topics in online backup is the security of the data that is backed up: will the service providers have access to the backed up data of their customers? Almost all online backup solutions, including StoreGrid, encrypt the data before it is sent over the internet to the service provider’s storage cloud. And during restores, the encrypted data is first restored to the client and then decrypted at the client end. So unless the backup server is given access to the encryption password, at least temporarily, synthesizing a full backup from a previous full backup and subsequent incremental/differential backups would not be possible.

But there are workarounds that can be implemented which would avoid the need to decrypt the encrypted data in the backup server for synthesizing a new full backup. Let me describe the workaround we are planning to implement and the resultant additional risks this introduces…

Firstly, for every file, StoreGrid does a full backup and subsequently does differential backups (the block-level differences between the current file and the content of the original file backed up during the full backup). We do this because if we always did incremental backups instead (the block-level differences between the current file and the content of the file the last time it was backed up, either incrementally or fully), it would be very difficult to implement versioning.

This is because restoring the latest file would require keeping the full backup file and every incremental backup that was done. With block-level differential backups, the latest file can be restored using the full backup file and only the latest differential backup.

So versioning is easier, as we can delete the differential backups that no longer need to be kept. This is illustrated in Figure 1 below.
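The difference between the two restore paths can be sketched in a few lines of Python. This is a simplified model, not StoreGrid's storage format: a file is a list of fixed-size blocks and, to keep it short, every version is assumed to have the same number of blocks. All function names are illustrative.

```python
from functools import reduce
from typing import Dict, List

Version = List[bytes]      # a file as a list of fixed-size blocks
Delta = Dict[int, bytes]   # block index -> new block contents


def block_diff(base: Version, current: Version) -> Delta:
    """Blocks of `current` that differ from `base` (same block count assumed)."""
    return {i: cur for i, (old, cur) in enumerate(zip(base, current)) if old != cur}


def apply_delta(base: Version, delta: Delta) -> Version:
    out = list(base)
    for i, blk in delta.items():
        out[i] = blk
    return out


def restore_differential(full: Version, differentials: Dict[int, Delta], version: int) -> Version:
    # Needs only the full backup and the single differential for that version.
    return list(full) if version == 0 else apply_delta(full, differentials[version])


def restore_incremental(full: Version, incrementals: List[Delta], version: int) -> Version:
    # Needs the full backup plus every incremental up to that version, in order.
    return reduce(apply_delta, incrementals[:version], list(full))
```

In this model, deleting intermediate differentials has no effect on restoring the latest version, which is the versioning benefit described above. The trade-off is that a differential tends to grow as the file drifts further from its full backup.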

Figure 1


Given the way we do full backups and differential backups, we plan to implement synthetic full backup without physically combining a previous full backup and a subsequent differential backup. Instead, as illustrated in Figure 2 below, we would simply create a reference in the database for a synthetic full backup, recording which previous full backup and which differential backup make up the synthetic full backup in question.

Figure 2


This information would be needed only during restores. Thus, by just keeping references to the full and differential backups that make up a new synthetic full backup, we eliminate the need for the backup server to decrypt the data in order to combine backups into a full backup.
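The reference-only approach can be modelled as a small catalog. In this hypothetical Python sketch (BackupRecord, create_synthetic and restore_plan are illustrative names, not StoreGrid's database schema), creating a synthetic full only writes a record. The restore plan simply lists which stored, still-encrypted objects the client must download, decrypt and apply, base first.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BackupRecord:
    backup_id: str
    kind: str                       # "full", "differential" or "synthetic"
    base_id: Optional[str] = None   # the full (or synthetic full) this record builds on
    diff_id: Optional[str] = None   # synthetic only: the differential combined with base_id


def create_synthetic(catalog: Dict[str, BackupRecord], synthetic_id: str, diff_id: str) -> BackupRecord:
    """Register a synthetic full as a pure reference; no backup data is read or decrypted."""
    diff = catalog[diff_id]
    if diff.kind != "differential":
        raise ValueError(f"{diff_id} is not a differential backup")
    record = BackupRecord(synthetic_id, "synthetic", base_id=diff.base_id, diff_id=diff_id)
    catalog[synthetic_id] = record
    return record


def restore_plan(catalog: Dict[str, BackupRecord], backup_id: str) -> List[str]:
    """Stored objects the client must fetch and apply, base first, to rebuild `backup_id`."""
    rec = catalog[backup_id]
    if rec.kind == "full":
        return [rec.backup_id]
    if rec.kind == "synthetic":
        return restore_plan(catalog, rec.base_id) + [rec.diff_id]
    return restore_plan(catalog, rec.base_id) + [rec.backup_id]  # a differential
```

A differential taken after a synthetic full points at that synthetic as its base, so its restore plan includes every object behind the synthetic as well.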

What are the risks introduced by the above process? If we follow this approach (keeping just references in the database without physically combining the backups) indefinitely, doing only periodic synthetic full backups and never a normal full backup, then restores can become more complex and time consuming, as illustrated in Figure 3 below.

Figure 3

To restore the latest version of a file, the first full backup and every subsequent synthetic full backup have to be restored along with the latest differential backup for that file. If this involves tens or hundreds of synthetic full backups, the restore process will surely become quite inefficient. Besides, a simple restore of the latest file could mean restoring data that was stored months or years before. This introduces additional risk: if even one intermediate block of data from a synthetic full done months before is corrupted for some reason, all the backups done after it are invalidated and cannot be restored. This is a serious risk. It can be eliminated either by physically synthesizing a full backup, decrypting the data when the synthetic backup is done, or by actually doing periodic full backups without relying on the synthetic full backup feature. The former option means the backup server must have at least temporary access to the encryption key, which introduces a security risk. The latter option gives up the main benefit of synthetic full backups, since the full data set has to be transferred over the internet again. Relying on reference-only synthetic full backups indefinitely, on the other hand, makes the restore process inefficient and increases the risk of losing data because of a small corruption in a block of data stored months before.
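The risk can be quantified with a simple Python model. For illustration only, it assumes that every cycle adds one reference-based synthetic full. Version k then depends on the original full F0 plus the differentials D1 to Dk behind each synthetic. The object names are hypothetical.

```python
from typing import List, Set


def restore_chain(version: int) -> List[str]:
    """Stored objects needed to restore `version` when only reference-based
    synthetic fulls are ever taken after the original full backup F0."""
    return ["F0"] + [f"D{i}" for i in range(1, version + 1)]


def restorable_versions(latest: int, corrupted: Set[str]) -> List[int]:
    """Versions 0..latest whose whole restore chain avoids every corrupted object."""
    return [v for v in range(latest + 1) if corrupted.isdisjoint(restore_chain(v))]
```

For example, restorable_versions(5, {"D2"}) leaves only versions 0 and 1 restorable, because a single bad object invalidates every version built on top of it. The chain for the latest version also grows by one object per cycle.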

What is our take? We strongly believe that the fundamental philosophy behind a robust and foolproof backup strategy is to have as much redundancy for the data as possible. Any backup strategy that sacrifices redundancy for storage efficiency or shorter backup times should be avoided where feasible. Hence, though StoreGrid will support the synthetic full backup feature in a few months' time, we strongly advise our partners to analyze it thoroughly and understand the implications before using it. Our recommended approach will always be to do periodic full backups of all the data. Perhaps one can reduce the frequency of complete full backups by doing frequent synthetic full backups in combination with less frequent complete full backups. We would certainly not recommend doing away with normal full backups altogether.
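The recommended mix can be reasoned about the same way. This hypothetical Python sketch assumes one backup per cycle, with a complete full every full_every cycles and a reference-based synthetic full otherwise. It then counts how many stored objects the longest restore has to read. The function names are illustrative.

```python
from typing import List


def backup_schedule(cycles: int, full_every: int) -> List[str]:
    """Cycle 0 is always a complete full; after that, one every `full_every` cycles."""
    if full_every < 1:
        raise ValueError("full_every must be at least 1")
    return ["full" if c % full_every == 0 else "synthetic" for c in range(cycles)]


def longest_restore_chain(schedule: List[str]) -> int:
    """Largest number of stored objects any single restore must read and apply."""
    longest = current = 0
    for kind in schedule:
        # A complete full resets the chain; each reference-only synthetic extends it by one.
        current = 1 if kind == "full" else current + 1
        longest = max(longest, current)
    return longest
```

With a complete full every fourth cycle, no restore needs more than four objects, however long the service runs. With synthetic fulls only, the chain grows with every cycle.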

This was exactly the sentiment expressed by some of our partners when we spoke to them about this feature. As in many other spheres of life, ‘natural’ is better than ‘synthetic’, I guess!

The above post was written by Sekar Vembu of Vembu Technologies. Vembu Technologies is a backup software vendor whose product, StoreGrid, powers the online backup services of a large number of service providers across the globe. Besides remote backup, StoreGrid is also used for on-premise backups of workstations and servers at various companies & universities.

Interview by BackupAnyTime

by Sekar Vembu on January 17th, 2009

I was interviewed by John O’Neill of Ireland-based online backup service provider BackupAnyTime back in August 2008, as part of a series of interviews John was doing with executives from the data backup industry. One of our customers happened to read it recently and wrote to us suggesting that it may be a good idea to post the interview on our blog, or at least link to it. He felt the views I expressed in the interview are quite relevant to our customer and partner base. So here is the link to the interview: “Backupanytime interview with Sekar Vembu of Vembu technologies”.

Hope you enjoy reading it.

Note: BackupAnyTime does not use our product, StoreGrid, for their backup service. I believe they use a competitor’s product.


Why we are impressed with EMC

by Sekar Vembu on November 18th, 2008

I must say I am seriously impressed with EMC. I am talking about their announcement of a new subsidiary called Decho, which combines two acquisitions they made in the last year: the online backup services startup Mozy and the Personal Information Management startup Pi Corp. When EMC acquired Mozy, I had thought EMC would use Mozy’s technology to come up with some cloud storage initiative for the enterprise and mid-market segments. I also felt that would take away Mozy’s focus on the consumer and small and medium business segments. Of course, it was probably wishful thinking too, because with our StoreGrid online backup solution we focus on the SMB market segment as well.

We actually do not compete with Mozy head-on, as our focus has been on enabling MSPs and IT solution providers to host and offer their own online backup service to their SMB customers. Now that EMC is creating a new subsidiary, Decho, which will focus exclusively on the consumer (and the SMB?) segment, we need to take note of that and be prepared to start competing with them sometime in the future. But it is always good to have a formidable competitor. It will motivate us to think better and work harder to make StoreGrid a better platform for our partners to offer an online backup service.

Coming back to why I am impressed with EMC! For such a large company, primarily focused on the enterprise and mid-market segments, it would have been an execution disaster to keep Mozy ‘in house’ and focus on the consumer/SMB segment. Chuck Hollis, EMC’s VP, Global Marketing, succinctly explains in his blog post why this is a great move by EMC:

“I think the decision to create a separate standalone entity speaks volumes as to how EMC’s thinking has matured: this is a market that’s important to EMC, we really don’t have this sort of thing in our DNA, better leave to people who DO understand this space, and give them what they need to be successful.”

I think it is next to impossible for EMC to position themselves in the SMB market given that the company was built on a model of selling to large corporations. With a separate business which will have its own management, organization & business model, they can now be a formidable force to reckon with in the consumer/SMB market segment.

Needless to say, we are quite positive about the general growth in the market for online backup services and our ability to do well (in a niche of our own, at the very least) by building a great online backup platform with StoreGrid. Our recent Amazon Cloud support reaffirms our commitment to keeping you at the cutting edge of technology.

Not that we are not worried about EMC… I’d rather say that it helps to have a ‘target Goliath’ to stay focused and put up a good fight!


Backup data to the cloud, within the cloud and from the cloud

by Sekar Vembu on November 8th, 2008

The initial interest in our StoreGrid Cloud AMI solution for Amazon Web Services has been extremely encouraging. As we have more and more partners & customers testing it and discussing their ‘use cases’, we have been getting fantastic insights into the myriad possibilities the cloud throws up for all types of users. This includes MSPs and VARs offering online backup services and businesses running custom applications in the Amazon Cloud.

Backup to the cloud

This is the typical use case, and the premise on which we started our Cloud AMI: service providers run the StoreGrid Cloud AMI as a backup server in Amazon EC2, provisioning their own Amazon EBS storage. They then install the StoreGrid Client on their customers’ PCs and servers and configure it to back up to the StoreGrid Cloud AMI. This deployment is very compelling for service providers who do not want to host their customers’ backup data in their own data center. Compelling as it is, this approach has a few disadvantages worth mentioning…

One of the advantages a local service provider enjoys is her proximity to the customer. This is especially important when a customer has a large amount of data, say 100 GB or more, to be backed up. The initial seed backup of that much data over the internet is going to take a long time. To circumvent this problem, StoreGrid supports a feature called “Local to Remote Server Migration (L2R)”, which allows the service provider to go on-premise and do the first full backup locally to an external drive. The service provider then manually copies the data from the external drive to the StoreGrid backup server deployed in her data center and runs the “L2R” module in StoreGrid. This ensures that subsequent backups are done incrementally, i.e. only changed blocks are sent over the wire. With an Amazon deployment, the L2R feature cannot be used simply because one does not have physical access to the Amazon cloud. Hence, the first full backup has to be done over the internet, however long that takes. The same applies to large restores. A StoreGrid online backup service provider with her own data center can do a quick server-side restore to an external disk and deliver it to her customer. With the Amazon deployment, restores can be done only over the internet, even for hundreds of GB of data.
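A back-of-the-envelope calculation shows why the seed backup matters. The Python helper below makes three assumptions: decimal gigabytes, a sustained upload rate in megabits per second, and an efficiency factor for protocol overhead. The link speeds in the loop are illustrative examples, not figures from this post.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Rough number of days to push `data_gb` gigabytes through a `link_mbps` link.

    `efficiency` (an assumption) discounts the raw line rate for protocol overhead.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = data_gb * 1e9 * 8                          # decimal GB -> bits
    seconds = bits / (link_mbps * 1e6 * efficiency)   # effective bits per second
    return seconds / SECONDS_PER_DAY


for mbps in (1, 10):
    print(f"100 GB at {mbps} Mbit/s: {transfer_days(100, mbps):.1f} days")
```

Even at the full line rate of a 1 Mbit/s uplink, 100 GB takes a little over nine days, and a large restore over the same link in the other direction is bounded by the customer's download speed in the same way.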

By no means am I trying to discourage service providers from using the Amazon cloud as the data center for their online backup service business. But it is best to make decisions after analyzing all the pros and cons along with exactly what the customers’ needs are. It is also best to set the end customer’s expectations upfront, so that the customer is fully aware of, and educated on, what she is signing up for. That way you won’t have a “but I thought you’d ship me my data in 1 hour” kind of situation.

A hybrid approach – backup locally and to the cloud

In light of the above discussion, service providers who want to leverage the Amazon Cloud but also want the benefit of quick on-site restores could explore a hybrid option, wherein the StoreGrid backup server is deployed locally at a customer site and the StoreGrid Cloud AMI is run as a replication server in Amazon EC2. In this deployment model, the on-premise backup servers replicate the backup data to the replication server in the Amazon Cloud. A single replication server can receive data from multiple backup servers running across multiple customer sites. Many of our service provider partners already use the hybrid approach, with the StoreGrid replication server deployed in their own data center.

Backup within the cloud – backing up data from custom applications running in Amazon EC2

Very interestingly, there are also a few service providers and some end users who are deploying the StoreGrid Client in Amazon EC2 along with their custom applications (which are already running in EC2). We did not think about this use case initially, but in retrospect it’s a fairly obvious opportunity…

Considering that many businesses are looking at running their custom applications in Amazon EC2, backing up the application data of these custom applications (typically stored in Amazon EBS volumes) is extremely important too! Even though Amazon supports backing up EBS storage to Amazon S3 as a snapshot, this is not always sufficient, because a snapshot of a whole EBS volume does not provide the granularity required for a partial data restore. With a snapshot, you can only restore the whole volume’s data into a new EBS volume. However, with a StoreGrid Client deployed in an Amazon EC2 instance running a custom application, businesses and IT solution providers now have the option of configuring file-level backups of the EBS volumes. This also applies to backing up data from any application that uses a relational database back end like MySQL or Microsoft SQL Server, since StoreGrid supports these database backups!

Where would these clients back up to? Typically, to a StoreGrid Cloud AMI deployed as a backup server, ideally running in a different availability zone in the Amazon Cloud. Backing up EBS volumes at a granular file level gives enormous flexibility when restoring data partially. No wonder we are already generating some interest with this deployment option.

A reversal of roles – backup from the cloud to on-premise storage

Honestly, we didn’t see this coming…

An end user mentioned that they wanted to back up all their data in Amazon EBS to their on-premise storage. Read that again: from Amazon to their office!!! I was initially not convinced and wondered why someone would want to do that. Here’s why! Though he (the customer) liked running his applications in Amazon EC2 because of the benefits it offered, he was not wholly comfortable with all his customer data being present only in the Amazon Cloud.

“What if the Amazon Cloud goes down, or what if Amazon itself loses my application data because of some bug or issue?”, he said. He asked me if he could deploy the StoreGrid Client along with his application in Amazon EC2, have a StoreGrid backup server on-premise in his office, and simply back up the application data (stored in Amazon EBS) to the on-premise StoreGrid backup server. “Why not?”, I thought to myself, and asked him to try it out; there’s no reason StoreGrid shouldn’t work for this kind of deployment!

On top of this, he also told me he would periodically back up the backed-up data to tape and ship it for off-site storage. While I personally believe he has ‘data paranoia’ (and have told him so), I fully understand that this is the nature of the beast! It all depends on the value you attribute to your data!

Needless to say, we are excited about all these possibilities, and especially by the challenge of enhancing StoreGrid to support them seamlessly. We are gearing ourselves up to explore these new frontiers!

I’d love to hear from our (current and prospective) partners and customers about their views and experiences. Got an Amazon story of your own? Do let us know.


StoreGrid supports Amazon Cloud – Choice and Flexibility is our mantra

by Sekar Vembu on October 28th, 2008

Hot on the heels of Amazon removing the Beta tag and releasing Amazon EC2 for production, we are excited to announce the Beta release of Vembu StoreGrid Cloud AMI, which facilitates deploying StoreGrid in the Amazon cloud computing infrastructure. This has been a long-pending demand from our partner base of MSPs, VARs and IT solution providers offering online backup services using StoreGrid.

StoreGrid Cloud AMI Beta is available for both Microsoft Windows Server and CentOS Linux Server. The StoreGrid backup server also uses the MySQL 5.0 database. All of these are bundled together in the StoreGrid Cloud AMI to ease deployment for our partners. Of course, we are working on a lot more automation as we move towards a production release before the end of 2008.

Why is StoreGrid Cloud AMI relevant for our partners?

Our primary target market segment is small and medium businesses. Considering the growing complexity of IT infrastructure, we strongly believe that it is not easy for software vendors to service SMB customers directly. Close proximity to the customer is extremely important when you service SMB customers. Hence the local VAR or MSP is in the best position to provide IT services to a small or medium business customer. This is especially relevant when it comes to data backups, and more specifically online backups. As we work with a large number of partners servicing different types of small and medium businesses with different sets of requirements, it is an absolute must that any IT product or solution we build provides maximum flexibility in deployment options and other relevant functionality.

Given this context, we have always focused on giving our partners as much choice as possible as they go about augmenting their business with an online backup service powered by StoreGrid. Specifically, as cloud computing gains momentum as a framework, we, as an aspiring leader in the online backup category, recognize the need to provide the choice of deploying StoreGrid in a leading cloud computing infrastructure, and nothing beats Amazon EC2 and Amazon S3 for a start.

Moreover, for the last two years we have primarily worked with partners who are willing to host StoreGrid in their own data center and offer online backup services to their customers. Many of our prospective partners had expressed interest in having a solution which they can host in a cloud computing environment like Amazon EC2/S3. With the release of StoreGrid Cloud AMI, we are responding to a long under-served market demand.

With StoreGrid Cloud AMI, any IT solution provider (MSPs, VARs) can now start an online backup service without any capital investment. All they have to do is get an Amazon Web Services account, launch an instance of the StoreGrid Cloud AMI, create and mount an Amazon Elastic Block Store (EBS) volume as backup storage, and start offering an online backup service to their customers. It is as simple as that. The backup data stored in Amazon EBS is periodically backed up as a snapshot to Amazon S3 for redundancy. On top of this, partners who require another level of redundancy can launch the StoreGrid Cloud AMI as a replication server and replicate the backup data to another Amazon EBS volume, which again can be backed up as a snapshot to Amazon S3.
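For readers deploying on AWS today, the provisioning steps in this paragraph map roughly onto EC2 API calls. The Python sketch below is an assumption-laden illustration, not Vembu's documented procedure. It expects a boto3 EC2 client (boto3.client("ec2")), and the AMI ID, instance type, availability zone and device name are placeholders you would supply. The helper names provision_backup_server and snapshot_backup_volume are hypothetical. Formatting and mounting the volume still happen inside the instance's operating system.

```python
from typing import Any, Dict


def provision_backup_server(ec2: Any, ami_id: str, instance_type: str,
                            availability_zone: str, volume_gb: int,
                            device: str = "/dev/sdf") -> Dict[str, str]:
    """Launch one instance from a backup-server AMI and attach a new EBS volume to it.

    `ec2` is expected to be a boto3 EC2 client; all IDs and names are placeholders.
    """
    # Launch a single instance in a specific zone, because EBS volumes are zonal.
    reservation = ec2.run_instances(
        ImageId=ami_id, InstanceType=instance_type, MinCount=1, MaxCount=1,
        Placement={"AvailabilityZone": availability_zone})
    instance_id = reservation["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    # Create the backup storage volume in the same zone and wait until it can be attached.
    volume_id = ec2.create_volume(
        AvailabilityZone=availability_zone, Size=volume_gb)["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach it; the guest OS must then format and mount it as backup storage.
    ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=volume_id)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
    return {"instance_id": instance_id, "volume_id": volume_id}


def snapshot_backup_volume(ec2: Any, volume_id: str) -> str:
    """Take a point-in-time EBS snapshot (stored by AWS in S3) of the backup volume."""
    response = ec2.create_snapshot(VolumeId=volume_id,
                                   Description="periodic snapshot of backup storage")
    return response["SnapshotId"]
```

Passing the client in as a parameter keeps the sketch testable with a stand-in object. The periodic snapshots described above would come from calling snapshot_backup_volume on a schedule, for example from cron.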

Our existing partners, or partners who prefer to deploy StoreGrid in their own data center, can now use the Amazon cloud infrastructure as redundant storage for the backup data in their data center. All they have to do is deploy the StoreGrid Cloud AMI as a Replication Server in Amazon EC2 and configure their internally deployed StoreGrid backup server to replicate the backup data to the StoreGrid replication server running in Amazon EC2.

As I said, choice and flexibility of deployment is what we provide our partners. To summarize, with StoreGrid our partners can now offer an online backup service in the following ways:

1. StoreGrid backup server and StoreGrid replication server deployed in their own data center with their own local storage.

2. StoreGrid backup server and StoreGrid replication server in Amazon EC2, with an Amazon EBS volume as the mounted storage. For additional redundancy, data in the EBS volume is backed up as a snapshot to Amazon S3 storage.

3. StoreGrid backup server deployed in their own data center with local storage, and StoreGrid replication server deployed in Amazon EC2 with an Amazon EBS volume as the mounted storage for the replication data. Again, for additional redundancy, data in the EBS volume is backed up as a snapshot to Amazon S3 storage.

4. Another deployment that is also popular amongst some partners is to deploy the StoreGrid backup server on-premise at the end customer location, so that there is a local copy of the backup data for quick restores. These partners can now deploy the StoreGrid Cloud AMI as a replication server and replicate the on-premise backup server’s data to the replication server in Amazon EC2.

You can learn more technical details about using the StoreGrid Cloud AMI at http://www.vembu.com/storegrid/amazon-ec2-s3-cloud-online-backup.html
