Moving Azure blob storage for the Rebranding/AzureSDK2.0 Project

In my last post, I talked about the work required to upgrade all of the cloud services to SDK 2.0 and rebrand “goldmail” as “pointacross” in the code. After all these changes were completed and turned over to QA, the real difficulties began – moving the data. Let’s talk about moving blob storage.

Storage accounts and data involved

We set up new storage accounts in US West with “pointacross” in the name, to replace the “goldmail” storage in US North Central. We decided to leave the huge amounts of diagnostics data behind, because we can always retrieve it later if we need to, but it’s not worth the trouble of moving it just to have it in the same Windows Azure storage tables and blob storage as the new diagnostics. So here’s what we had to move:

GoldMail main storage (includes the content libraries and diagnostics)
GoldMail Cloud Composer storage
GoldMail Saved Projects

When you use the PointAcross application (previously known as the Cloud Composer), you upload images and record audio. We store those assets, along with some other application information, in blob storage. This yields many benefits, but primarily it means you can work on a PointAcross project logged in from one computer, then go somewhere completely different and log in, and all of your assets are still there. You can also save your project after you publish your message, or while working on it – these go into the Saved Projects storage. Those two storage accounts have a container for each customer who has used that application or feature. Fortunately, we’re just in the beginning phases of releasing the PointAcross project widely, so there are only about a thousand containers in each storage account.

The main storage includes a bunch of data we use here and there, and the content library assets. There are a lot of files, but only about 20 containers.

Option 1: Using AzCopy

So how do we move this data? The first thing we looked at was AzCopy, from the Windows Azure Storage team. This is a handy-dandy command-line utility, and we used it to test migrating the data from one storage account to the other. Here’s the format of the command:

AzCopy [1]/[2] [3]/[4] /sourcekey:[5] /destkey:[6] /S

    [1] is the URL to the source storage account, like http://mystorage.blob.core.windows.net
    [2] is the container name to copy from
    [3] is the URL to the target storage account, like http://mynewstorage.blob.core.windows.net
    [4] is the container name to copy to
    [5] is the key to the source storage account, either primary or secondary
    [6] is the key to the destination storage account, either primary or secondary
    /S means to do a recursive copy, i.e. include all folders in the container.

I set up a BAT file called DoTheCopy and substituted %1 for the container name in both places, so I could pass it in as a parameter. (For those of you who are too young to know what a BAT file is, it’s just a text file with a file extension of .bat that you can run from the command line.) This BAT file had two lines and looked like this (I’ve chopped off the keys in the interest of space):

ECHO ON
AzCopy http://mystorage.blob.core.windows.net/%1 http://mynewstorage.blob.core.windows.net/%1

I called this for each container:

E:\AzCopy> DoTheCopy containerName

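I got tired of this after doing it twice (I hate repetition), so I set up another BAT file to call the first one repeatedly. Each line uses CALL so that control returns to this file after DoTheCopy finishes; without it, only the first line would run. Its contents looked like this:

CALL DoTheCopy container1
CALL DoTheCopy container2
CALL DoTheCopy container3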

AzCopy worked really well, but there are a couple of gotchas. First, when it sets up the container in the target account, it makes it private. So if you want the blobs inside to be publicly readable (while the container itself stays unlistable), you have to change that yourself. You can do it programmatically or with an excellent tool such as Cerebrata’s Azure Management Studio.

The second problem is that those other two storage accounts have over a thousand containers each. So now I either have to type all of those container names in (not bloody likely!), or figure out a way to get a list of them. So I wrote a program to get the list of container names and generate the BAT file. This creates a generic list full of the command lines, converts it to an array, and writes it to a BAT file.

//requires: using System.Collections.Generic; using System.IO;
//  using Microsoft.WindowsAzure.Storage; using Microsoft.WindowsAzure.Storage.Blob;
//sourceConnectionString is the connection string pointing to the source storage account
CloudStorageAccount sourceCloudStorageAccount = 
    CloudStorageAccount.Parse(sourceConnectionString);
CloudBlobClient sourceCloudBlobClient = sourceCloudStorageAccount.CreateCloudBlobClient();
List<string> outputLines = new List<string>();
IEnumerable<CloudBlobContainer> containers = sourceCloudBlobClient.ListContainers();
foreach (CloudBlobContainer oneContainer in containers)
{
    //CALL makes control return to this BAT file after DoTheCopy.bat finishes
    string outputLine = string.Format("CALL DoTheCopy {0}", oneContainer.Name);
    outputLines.Add(outputLine);
}
string[] outputText = outputLines.ToArray();
File.WriteAllLines(@"E:\AzCopy\MoveUserCache.bat", outputText);

That’s all fine and dandy, but what about my container permissions? So I wrote a program to run after the data was moved. This iterates through the containers and sets the permissions on every one of them. If you want any of them to be private, you have to hardcode the exceptions, or fix them after running this.

private string SetPermissionsOnContainers(string dataConnectionString)
{
  string errorMessage = string.Empty;
  string containerName = string.Empty;
  try
  {
    CloudStorageAccount dataCloudStorageAccount = CloudStorageAccount.Parse(dataConnectionString);
    CloudBlobClient dataCloudBlobClient = dataCloudStorageAccount.CreateCloudBlobClient();

    int i = 0;
    IEnumerable<CloudBlobContainer> containers = dataCloudBlobClient.ListContainers();
    foreach (CloudBlobContainer oneContainer in containers)
    {
      i++;
      //remember the name so the lookup below and the catch block use the right container
      containerName = oneContainer.Name;
      System.Diagnostics.Debug.Print("Processing container #{0} called {1}", i, containerName);

      CloudBlobContainer dataContainer = dataCloudBlobClient.GetContainerReference(containerName);
      //set permissions: blobs are publicly readable, but the container can't be listed anonymously
      BlobContainerPermissions permissions = new BlobContainerPermissions();
      permissions.PublicAccess = BlobContainerPublicAccessType.Blob;
      dataContainer.SetPermissions(permissions);
    }
  }
  catch (Exception ex)
  {
    //InnerException is often null, so guard against dereferencing it
    string innerText = (ex.InnerException == null) ? "none" : ex.InnerException.ToString();
    errorMessage = string.Format("Exception thrown trying to change permission on container '{0}' "
        + "= {1}, inner exception = {2}",
      containerName, ex.ToString(), innerText);
  }
  return errorMessage;
}

Option 2: Write my own solution

Ultimately, I decided not to use AzCopy. By the time I’d written this much code, I realized it was just as easy to write my own code to move all of the containers from one storage account to another and set the permissions as it iterated through the containers. I could also add trace logging to see the progress, and hardcode exclusions if I wanted to. Here’s the code for iterating through the containers. For the main account, I only want to move specific containers, because I moved some static ones ahead of time. So for that case, I just set up a list of the container names I want to process. For the other accounts, it retrieves a list of all containers and processes all of them.

private string CopyContainers(string sourceConnectionString, string targetConnectionString, 
  string accountName)
{
  string errorMessage = string.Empty;
  string containerName = string.Empty;
  try 
  {
    CloudStorageAccount sourceCloudStorageAccount = CloudStorageAccount.Parse(sourceConnectionString);
    CloudBlobClient sourceCloudBlobClient = sourceCloudStorageAccount.CreateCloudBlobClient();
    CloudStorageAccount targetCloudStorageAccount = CloudStorageAccount.Parse(targetConnectionString);
    CloudBlobClient targetCloudBlobClient = targetCloudStorageAccount.CreateCloudBlobClient();

    //build the list of container names to process
    List<string> containersToDo = new List<string>();
    if (accountName == "mainaccount")
    {
      //main account: only these containers; the static ones were moved ahead of time
      containersToDo.Add("container1");
      containersToDo.Add("container2");
      containersToDo.Add("container3");
    }
    else
    {
      //other accounts: process every container
      foreach (CloudBlobContainer oneContainer in sourceCloudBlobClient.ListContainers())
      {
        containersToDo.Add(oneContainer.Name);
      }
    }

    int i = 0;
    foreach (string oneContainer in containersToDo)
    {
      i++;
      //remember the name so the catch block can report which container failed
      containerName = oneContainer;
      System.Diagnostics.Debug.Print("Processing container #{0} called {1}", i, containerName);
      string result = MoveBlobsInContainer(containerName, accountName, 
        targetCloudBlobClient, sourceCloudBlobClient);
      //keep going on failure, but don't lose the error that was returned
      if (result.Length > 0)
      {
        errorMessage += result + Environment.NewLine;
      }
    }
  }
  catch (Exception ex)
  {
    //InnerException is often null, so guard against dereferencing it
    string innerText = (ex.InnerException == null) ? "none" : ex.InnerException.ToString();
    errorMessage = string.Format("Exception thrown trying to move files for account '{0}', " +
      "container '{1}' = {2}, inner exception = {3}",
      accountName, containerName, ex.ToString(), innerText);
  }
  return errorMessage;
}

And here’s the code that actually moves the blobs from the source container to the destination container.

private string MoveBlobsInContainer(string containerName, string accountName, 
  CloudBlobClient targetCloudBlobClient, CloudBlobClient sourceCloudBlobClient)
{
  string errorMessage = string.Empty;
  try
  {
    long totCount = 0;
    //first, get a reference to the container in the target account, 
    //  create it if needed, and set the permissions on it 
    CloudBlobContainer targetContainer = 
      targetCloudBlobClient.GetContainerReference(containerName);
    targetContainer.CreateIfNotExists();
    //set permissions
    BlobContainerPermissions permissions = new BlobContainerPermissions();
    permissions.PublicAccess = BlobContainerPublicAccessType.Blob;
    targetContainer.SetPermissions(permissions);

    //get list of files in sourceContainer, flat list
    CloudBlobContainer sourceContainer = 
      sourceCloudBlobClient.GetContainerReference(containerName);
    foreach (IListBlobItem item in sourceContainer.ListBlobs(null, 
      true, BlobListingDetails.All))
    {
      totCount++;
      System.Diagnostics.Debug.Print("Copying container {0}/blob #{1} with url {2}", 
        containerName, totCount, item.Uri.AbsoluteUri);
      CloudBlockBlob sourceBlob = sourceContainer.GetBlockBlobReference(item.Uri.AbsoluteUri);
      CloudBlockBlob targetBlob = targetContainer.GetBlockBlobReference(sourceBlob.Name);
      //this only starts an asynchronous, server-side copy; it does not wait for it to finish
      targetBlob.StartCopyFromBlob(sourceBlob);
    }
  }
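  catch (Exception ex)
  {
    //InnerException is often null, so guard against dereferencing it
    string innerText = (ex.InnerException == null) ? "none" : ex.InnerException.ToString();
    errorMessage = string.Format("Exception thrown trying to move files for account '{0}', "
      + "container '{1}' = {2}, inner exception = {3}",
        accountName, containerName, ex.ToString(), innerText);
  }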
  return errorMessage;
}
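
One caveat about the code above: StartCopyFromBlob only asks the storage service to begin a copy and then returns, so the loop finishing doesn’t by itself mean every blob has arrived. Here is a minimal sketch of one way to check on a target blob afterwards, assuming the same 2.0 storage client library. The CopyMonitor class, the WaitForCopy method, and the polling interval are hypothetical names and choices for illustration; they are not part of the migration code above.

```csharp
using System;
using System.Threading;
using Microsoft.WindowsAzure.Storage.Blob;

// Hypothetical helper, for illustration only.
public static class CopyMonitor
{
    // Polls the target blob until its server-side copy is no longer pending.
    // Returns true if the copy succeeded, false if it failed, was aborted,
    // or if no copy was ever started on this blob.
    public static bool WaitForCopy(CloudBlockBlob targetBlob, TimeSpan pollInterval)
    {
        while (true)
        {
            // FetchAttributes refreshes the blob's properties, including CopyState
            targetBlob.FetchAttributes();
            CopyState state = targetBlob.CopyState;
            if (state == null)
            {
                return false; // this blob was never the destination of a copy
            }
            if (state.Status != CopyStatus.Pending)
            {
                return state.Status == CopyStatus.Success;
            }
            Thread.Sleep(pollInterval);
        }
    }
}
```

You could call CopyMonitor.WaitForCopy(targetBlob, TimeSpan.FromSeconds(2)) for each target blob after the copy loop, or just spot-check a sample. Polling every blob one at a time would be slow for a large account, so treat this as a starting point rather than a finished tool.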

You can hook this up to a fancy UI and run it in a background worker and pass progress back to the UI, but I didn’t want to spend that much time on it. I created a Windows Forms app with one button. When I clicked the button, it ran some code that set the connection strings and called CopyContainers for each storage account.

Did it work?

When we went live, while our Sr. Systems Engineer, Jack Chen, published all of the services to PointAcross production, I ran this to move the data from the goldmail storage accounts to the pointacross storage accounts. It worked perfectly. The only thing left at this point is moving the Windows Azure SQL Databases (the database previously known as SQL Azure ;-) ).
