GoldMail and Windows Azure

I am the director of software engineering at a small startup called GoldMail. We have a desktop product that lets you add images, text, Microsoft PowerPoint presentations, etc., and then record voice over them. You then share the result, which uploads it to our backend, and we give you a URL. You can send this URL to someone in an e-mail with a hyperlinked image, post it on Facebook, Twitter, or LinkedIn, or even embed it in your website. When the recipient clicks the link, the GoldMail plays in any browser using either our Flash player or our HTML5 player, depending on the device. We track the views, so you can see how many people actually watched your GoldMail.

GoldMail is like video without the overhead. I use it here on my blog, and many of our customers use it for marketing and sales. (I also use it for vacation pictures.)

What can Azure do for us?

About a year and a half ago, I attended an MSDN event hosted by Bruno Terkaly talking about Windows Azure, and I was impressed, especially by the possibilities it provided for a startup. Rather than buying and hosting enough servers to handle your “Oprah moment” – the day she talks about your product on her show, followed by a huge increase in traffic which tests whether you really did scale your infrastructure correctly – you can start small with low cost, and then scale up as you need to. (Presumably, your rising infrastructure needs will mirror your rising revenue stream!) Also, we primarily use the Microsoft development stack, and it was very appealing to leverage the .NET skills we already had.

At the time, we had several servers maintained by a hosting company in Silicon Valley, and the cost was a substantial part of our monthly burn rate. I met with the VP of Operations (Samar Kawar) and we estimated what we thought it would cost to host our infrastructure entirely in Azure. The estimate was so low, we doubled it before presenting it to management.

Unfortunately, our contract with our hosting company was about to roll over for another year. We couldn’t do the migration before the end date of the contract, so the project was shelved. A month or two later, we discovered that the contract rolled over to a month-to-month contract. Not surprisingly, the project was suddenly resurrected and given top priority!

My total experience at that point with Azure was:

  1. attending a Microsoft event explaining the capabilities,
  2. reading the book Azure in Action by Brian Prince and Chris Hay, and
  3. spending a lot of time thinking about it in the shower.

So of course I told management we could finish the development in about 30 days, and have everything go into production within 60 days.

My boss thought I was completely nuts. So I told him I could get our Silverlight application to work in Azure in 15 minutes + deployment time. He was skeptical. So I showed him the quick way to turn a web application (or Silverlight, in this case) into a cloud project. I did this (which is outlined in the first page of this article):

  1. Opened the Visual Studio solution.
  2. Added a cloud project with no roles.
  3. Right-clicked on the cloud project and selected “Add web role in project”, selected the Silverlight project, and hit OK.
  4. Set the Azure configuration values for the storage account.
  5. Right-clicked on the cloud project and published it to Azure.
  6. Ran the application, and it worked.

My boss said, “Okay, go for it. I still think you’re nuts, but maybe not completely nuts.”

What do we have?

Let’s talk infrastructure. Here’s what we had:

  • Desktop client application (used to create the GoldMails)
  • Flash client application (used to play the GoldMails)
  • HTML5 client application (used to play the GoldMails on mobile devices)
  • Silverlight application (communicates with CRM system and SQL Server database)
  • A bunch of .NET 2.0 asmx web services used to talk to the SQL Server database from the client applications
  • Web services used by client applications to talk to our CRM system
  • Web applications for user management and affiliate management; these used the web services to communicate with our CRM system
  • Our company website
  • SQL Server database
  • Integration service to communicate between our CRM system and the SQL Server database.

We wanted to migrate everything except for the CRM system.

I set benchmarks for the project, and we set to work.

What’s the big plan?

Migrate the SQL Server database to SQL Azure. We did this using the SQL Azure Migration Wizard on CodePlex; it worked great. I discovered some triggers and some CLR routines that wouldn’t migrate, but they weren’t critical, and were easily re-architected. We did this migration first, because you can’t test anything without the database!

Migrate the .NET 2.0 asmx web services. All access to the SQL Server (now SQL Azure) database goes through these services. I re-architected the services, separating them by product so we could scale them separately. And because no programmer worth his (or her) salt is going to miss an opportunity to upgrade to the latest version, I converted them to .NET 4 WCF services running in web roles.

I had never used WCF, so I had to stop and figure it out. I used two books for reference, Learning WCF by Michele Bustamante and Essential WCF by Resnick, Crane, and Bowen. Between those two books, I managed to grasp the principles and create simple WCF services. (Trying to learn WCF quickly was actually more painful than learning Azure.)

The WCF service for the players writes update requests to a queue, and a worker role pulls the entries from the queue and writes them to the database. I did this because I didn’t want the customer to have to wait for a response from the web service before continuing with their workflow, and because it evens out some of the database access.
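The queue-and-worker pattern described above can be sketched in a few lines. This is only an illustration in Python, not our actual .NET code: the in-process `queue.Queue` stands in for an Azure storage queue, the thread stands in for the worker role, the `database` list stands in for SQL Azure, and the names `record_view`, `goldmail_id`, and `viewer` are hypothetical.

```python
import queue
import threading

# Hypothetical stand-ins: in production the queue would be an Azure storage
# queue, the consumer a worker role, and the store a SQL Azure table.
update_queue = queue.Queue()
database = []


def record_view(goldmail_id, viewer):
    """Service call: enqueue the update and return without touching the database."""
    update_queue.put({"goldmail_id": goldmail_id, "viewer": viewer})
    return "accepted"


def worker():
    """Worker loop: pull entries off the queue and write each one to the database."""
    while True:
        entry = update_queue.get()
        try:
            if entry is None:  # sentinel value tells the worker to stop
                return
            database.append(entry)
        finally:
            update_queue.task_done()


def run_demo():
    """Start a worker, submit two updates, and wait for the queue to drain."""
    database.clear()
    thread = threading.Thread(target=worker)
    thread.start()
    responses = [record_view(1, "alice"), record_view(1, "bob")]
    update_queue.put(None)
    update_queue.join()
    thread.join()
    return responses, list(database)
```

The caller gets its response as soon as the entry is queued; the database write happens later, at whatever pace the worker can sustain.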

The WCF service used by the desktop application submits an entry to a queue after a customer shares a GoldMail. A worker role retrieves the entry from the queue and creates small versions of the customer’s large slides, to be used for our mobile player. I removed this function from the client application, which reduced the amount of time it took to share a GoldMail by 50%.

Migrate the Integration service (CRM <—> SQL Server). This was a SQL Server job with a lot of big queries. We migrated this to a worker role, feeding the changes into queues which were then processed separately. This was a critical path project; without this piece, none of the client applications could go into production. This was completely re-architected to introduce significant improvements in the process.

Migrate the Silverlight application. As I did in the demo to my boss, I added a cloud project, set the Silverlight application project as the web role, and updated the Azure configuration. This application is used by our desktop product.

Modify the desktop application. This had to be changed to call the services in Azure instead of the old .NET 2.0 asmx web services. Because all of the access methods are in a proxy layer that resides in one method, only that one method had to be changed to update our desktop product. Plus, as noted above, I removed the creation of the smaller images for the mobile player.

Modify the Flash application. This had to be changed to call the new services. I had to do a bit of trial-and-error to figure out the right bindings for the WCF service so it would be callable from Flash.

Migrate the HTML5 application. This was easily migrated to a web role.

Migrate the web services used to talk to the CRM system. We converted these to .NET 4 WCF services running in a web role.

Migrate the web applications (signup/affiliate) that update the CRM system. We changed these to access the new services and migrated the web applications to web roles.

Migrate our website. We had a component installed on the web server that we were using to check for browser and machine type (Mac/PC) and redirect accordingly. I removed the component and changed the webpage that used it to check the user agent string instead. We also use URL rewrites, so I had to figure out how to configure that in my web.config so it would work in Azure. Then I just added a cloud project, set the website as the web role, and updated the Azure configuration (just like the Silverlight application).
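For readers who haven't done user agent sniffing, here is a rough sketch of the idea in Python. It is not the code from our website: the function name, the substrings checked, and the return values are all assumptions chosen for illustration, and real user agent strings are messier than this.

```python
def platform_from_user_agent(user_agent):
    """Guess the visitor's machine type from the User-Agent header.

    Hypothetical helper: the substrings below are common markers, not a
    complete or authoritative list.
    """
    ua = (user_agent or "").lower()
    # Check mobile devices first: iPhone and iPad user agents also contain
    # "like Mac OS X", so testing for Mac first would misclassify them.
    if "iphone" in ua or "ipad" in ua or "android" in ua:
        return "mobile"
    if "macintosh" in ua or "mac os x" in ua:
        return "mac"
    if "windows" in ua:
        return "pc"
    return "unknown"
```

The page can then redirect based on the returned value, with a sensible default for the "unknown" case.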

The devil is in the details

I figured out how to do the builds and set up the configurations and turned it over to our release manager/build engineer. I provided information to the other engineers about installing the Azure tools and how to set up a WCF service, set up the configurations for the projects, moved values from the web.config to the Azure configurations, and handled a hundred other small details.

As we progressed, we put the services and changes into staging for QA to test them while we were working on the next set of products. 33 days in, we migrated the major bits that used the SQL Azure database — the WCF services, the desktop application, the Flash player, the Silverlight application and the Integration service. It took us about a day to put all of it in production, primarily because the migration of the database took so long.

A week later, everything else was released to production. So the whole cycle took us about 40 days. We would never have been able to do this migration without a great team of people really pulling together and making it happen.

Of course, just because everything went into production, it didn’t mean we were finished. We had to deal with any problems that came up, right? Most of these were small and easily fixed, but we did hit one that caused us some grief: SQL Azure connection problems.

SQL Azure Connection Management

We had done load testing, but hadn’t seen any problems with SQL Azure. After going into production, the trace logs were full of exceptions from trying to open connections to the database or execute stored procedures. I talked to Microsoft about the issues, and they recommended putting in “exponential retry code”. (To be honest, I was a little disturbed that they had an official phrase for it.) They sent me to their connection management article to explain why.

Exponential retry code means if the call to SQL Azure fails, you call again immediately. Then if the second call fails, you wait a few seconds and try again. If it fails again, you wait longer and try again, etc. They do have a framework that they recommend you use, but I wasn’t aware of it at the time. I put retry code in using the brute-force method: I added retry code to all of our services that call SQL Server. This helped the problem, but not enough. After I had been a squeaky wheel for a while, Microsoft assigned someone from the SQL Azure team to me to look at the problem.
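A minimal sketch of that retry schedule, in Python rather than the .NET code we actually shipped. The function name, the default delays, and the choice of `ConnectionError` as the "transient" error are assumptions for illustration; real code should retry only on the specific errors the database reports as transient.

```python
import time


def call_with_retry(operation, max_attempts=5, base_delay=2.0,
                    retry_on=(ConnectionError,), sleep=time.sleep):
    """Call operation(); on a transient failure, retry with growing waits.

    Waits before each retry: 0, base_delay, 2 * base_delay, 4 * base_delay, ...
    (the first retry is immediate, as described in the text above).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure
            delay = 0 if attempt == 1 else base_delay * 2 ** (attempt - 2)
            sleep(delay)
```

Wrapping each database call in something like `call_with_retry(lambda: run_stored_procedure(...))`, where `run_stored_procedure` is a placeholder for your own data access method, keeps the retry policy in one place instead of scattered through every service.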

Microsoft took copies of our code, our database and our trace logging, and went off to ruminate on it. They came back about a month later and said, “It’s not you, it’s us.” I told them I wanted to break up. (Ha!) They were handling the case of large databases and large resource requirements – they had throttling and that sort of thing set up – but they didn’t have any minimum resource levels, and hadn’t given a great deal of thought to companies like ours with services that only connect periodically. Basically, we were being kept out of the playground by the bullies having all the fun.

Over the following months, we saw huge improvements in SQL Azure performance. A year and a half later, we rarely see connection problems, and when we do, they usually succeed on the first retry.

What was the final outcome of the migration?

Overall, the migration took us about two months. But in the interest of full disclosure, I have to admit it didn’t really take two months. I worked over 700 hours of overtime in those two months. I worked 9 a.m. to 2 a.m. pretty much every day, splitting my time between doing the programming, providing architecture advice and programming help to the other developers, and managing the project. It was very intense, and a lot of fun, and the outcome gave me great satisfaction.

I was really interested to see what our costs would be after doing the migration. We had sized all of our instances to match the servers we had, which was too large, but we figured it was better than too small. After the dust settled, we found that the cost of all of our services, databases, etc., was reduced by about 85% when we moved from a traditional IT environment to running on Windows Azure.

Over time, I added performance indicators and sized our instances more appropriately. Even after adding more hosted services, we are now paying 90% less than we used to pay for actual servers and hosting by running in the cloud.

Windows Azure is a great platform for startups because you can start with minimal cost and ramp up as your company expands. You used to have to buy the hardware to support your Oprah moment even though you didn’t need that much resource to start with, but this is no longer necessary. (See current Azure pricing here.)

An additional benefit that I didn’t foresee was the simplification of deployment for our applications. We used to manually deploy changes to the web services and applications out to all of the servers whenever we had a change. Manual upgrades are always fraught with peril, and it took time to make sure the right versions were in the right places. Now the build engineer just publishes the services to the staging instances, and then we VIP swap everything at once.

And here’s one of the best benefits for me personally: since we went into production a year and a half ago, Operations and Engineering haven’t been called in the middle of the night or on a weekend even once because of a problem with the services and applications running on Windows Azure. And we know we can quickly and easily scale our infrastructure, even though Oprah has retired.

Note: GoldMail is now doing business as PointAcross. 3/7/2014
