Continuous Integration Testing for a Rainy Day

In an increasingly Cloud-centric world, we need to build and support an ever-growing list of SaaS SDLC integrations. That includes tools like Atlassian Cloud (JIRA), Microsoft’s Visual Studio Team Services, and HPE’s new Octane ALM offering.

Supporting Cloud-based integrations poses some interesting challenges:

  1. Cloud vendors typically release on a much faster cadence than on-premise vendors. Frequent releases mean a higher risk of regressions slipping into the APIs our connectors depend on.
  2. Often there is no pre-release instance available to validate our product workflows ahead of time.
  3. Releases take effect immediately. Compared with on-premise software, there is a relatively small window of opportunity to detect, investigate, and resolve issues before we receive support tickets from our customers.

Automated testing is a key ingredient of our success in building and delivering high-quality, full-featured SDLC integration solutions to our customers. Our Continuous Integration system executes integration tests in rolling builds throughout the day and with every commit that we push to Gerrit.

The value of our testing infrastructure doesn’t end once our products ship. Besides helping us to validate support for new versions of on-premise tools, those same integration tests serve as an early warning system for regressions with supported on-demand tools.

Case in point: several weeks ago, an integration test failed against an on-demand ALM instance in a routine rolling build late one Thursday evening. Subsequent builds the next morning failed consistently with the same error. Because our code had not changed between the last successful build mid-Thursday and the failed builds, we immediately suspected that the on-demand instance had been updated and an API regression had slipped in. Further investigation confirmed that the vendor had deployed an update to their on-demand instance Thursday afternoon.
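The triage reasoning in that story (the same code passed earlier, then fails consistently, so the change probably came from outside) can be sketched as a small heuristic over build history. This is a minimal illustration, not our actual CI tooling: `BuildResult`, `triage`, the `min_failures` threshold and the returned labels are all hypothetical names chosen for the example, and it assumes each build records the revision of our connector code it tested.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BuildResult:
    # Hypothetical record of one CI build; field names are illustrative.
    build_id: int
    commit: str   # revision of our connector code that was tested
    passed: bool


def triage(history: List[BuildResult], min_failures: int = 2) -> str:
    """Hypothetical triage hint for the latest failure streak.

    `history` is ordered oldest to newest. `min_failures` is an assumed
    threshold for treating a streak as consistent rather than flaky.
    """
    # Locate the most recent passing build to use as a baseline.
    last_pass_idx = None
    for i, build in enumerate(history):
        if build.passed:
            last_pass_idx = i
    if last_pass_idx is None:
        return "no baseline"

    # Every build after the last pass is, by construction, a failure.
    failures = history[last_pass_idx + 1:]
    if not failures:
        return "passing"
    if len(failures) < min_failures:
        return "possibly flaky"

    # Same code before and after the break points outside our code base,
    # e.g. at an update to the on-demand instance under test.
    baseline_commit = history[last_pass_idx].commit
    if all(build.commit == baseline_commit for build in failures):
        return "suspect external change"
    return "suspect code change"


history = [
    BuildResult(101, "a1b2c3", True),   # mid-Thursday: passes
    BuildResult(102, "a1b2c3", False),  # Thursday evening: fails
    BuildResult(103, "a1b2c3", False),  # Friday morning: fails again
]
print(triage(history))  # suspect external change
```

An unchanged commit only narrows the search; it does not prove the vendor is at fault, since test data or environment changes can produce the same pattern, which is why confirming the vendor's deployment time still matters.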

The following week, we received a support ticket from one of our customers impacted by this regression. By then we already had:

  • a patch to work around the issue and limit the impact on our product workflows.
  • an open and active support case with the vendor that included a standalone, reproducible test case.

Our customer received the interim patch in a Service Release build right away while we continued to coordinate with the vendor’s support team on a permanent fix. Thanks to fantastic and timely support from the vendor, we implemented a fix soon afterwards that re-enabled our product’s full capabilities against their on-demand ALM instance. That patch was delivered in a subsequent Service Release.

This example highlights several compelling advantages of our testing infrastructure and processes:

  1. Detecting an error soon after it’s introduced buys precious time for the engineering teams to troubleshoot the issue before it turns into a customer escalation.
  2. Our customers gain confidence in our ability to identify and resolve issues as soon as they arise. Fast turnaround on support tickets = satisfied customers.
  3. We provide value to our supported SaaS tool vendors by notifying them of regressions soon after they deploy a release. We are effectively performing additional integration testing of their products.
  4. Situations like this one create a positive-feedback loop for the engineering and support teams. They reinforce the value of our existing test assets and motivate us to introduce new tests.

Continuous integration testing: an accurate and timely forecast for stormy weather.