Resolving an Atlassian cloud outage can take days

Atlassian launched three new cloud-based products at the Team ’22 conference this week, but a major cloud outage hampered core services and distracted industry watchers from the new releases.

Atlassian cloud products, including Jira Software issue tracking, Jira Service Management ITSM, Jira Work Management, Confluence documentation, Opsgenie incident response and the Access single sign-on tool, became inaccessible to an unspecified subset of users Tuesday, and efforts to restore them continued as of midday Thursday.

“While running a routine maintenance script, a small number of sites were inadvertently disabled, preventing them from accessing their products and data,” the company said in a statement. “We know that our customers rely on our products to get their jobs done, and we are sorry for the disruption this has caused. We are working 24/7 to make the products fully available again.”

In a later statement, the company said the incident was not the result of a cyber attack and there was no unauthorized access to customer data. The company added that while hundreds of engineers are working to restore the sites, it is also adding recovery automation to allow for faster site recovery in the future.

“Due to the unique configuration of each site and the care we take to ensure secure data recovery, we estimate full resolution can take days, although we expect customers to see recovery by product sooner,” the second statement said.

The company will publish a post-mortem after the incident is resolved, according to the latest statement.

Another update posted to Atlassian’s official support account on Twitter Thursday afternoon Eastern Time appeared to indicate that some customers had experienced data loss.

“This is extremely concerning to us, as our mission-critical institutional knowledge resides in Confluence at this time,” one enterprise customer wrote in an email to SearchITOperations. The customer, who asked for anonymity, added: “This message goes against the ‘maintenance script has disabled a small number of sites’ message that we have been getting time and time again. This would also explain why the recovery has taken days with so many technicians working ‘24/7’.”

Impact of Atlassian outage remains uncertain

It’s too early to say what impact the outages will have on Atlassian, but industry observers agreed the timing was exceptionally bad, given the company’s continued emphasis on its cloud-based services over the past 18 months. Atlassian’s public statements over the past year have been particularly candid about the company’s increased emphasis on cloud tools and additional incentives for users to migrate from its on-premises tools, where it has discontinued its midmarket Server editions and increased enterprise license prices.

“Many Atlassian products come from acquisitions, and it’s not easy to move to a subscription model and integrate each product,” said Larry Carvalho, independent analyst at RobustCloud. “Multi-day downtime doesn’t do any good convincing customers to make a move.”

However, other experts preferred to wait and see how long the outage lasts and how it will be resolved before predicting its ultimate impact.

“It depends on how quickly they fix it, how big the problem was, and what promises they make,” said Andy Thurai, vice president and chief analyst at Constellation Research. “Any cloud, including AWS, will go through this. It all depends on how they interact with it.”

Atlassian had a poor reliability record in its cloud services during its first self-managed attempt at SaaS years ago, but a move to microservices on AWS in 2019 and the introduction of enterprise security features and service-level agreements (SLAs) did much to reassure early skeptics. The company has built a track record of cloud availability since then, and at this week’s Team ’22 conference it announced a 99.95% uptime SLA for its Enterprise cloud edition, along with an early access program to scale its cloud instances to up to 50,000 users.
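
To put the 99.95% figure in perspective, the short Python sketch below converts an uptime percentage into the downtime it permits over a given period. The 30-day and 365-day measurement windows are assumptions made for illustration; the announcement as reported here does not say how the SLA is measured, and the function name is hypothetical.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float) -> float:
    """Minutes of downtime permitted over a period at a given uptime percentage."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Assumed measurement windows; the article does not specify one.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        minutes = allowed_downtime_minutes(99.95, days)
        print(f"{label}: {minutes:.1f} minutes of downtime allowed")
```

Under those assumptions, 99.95% uptime leaves about 21.6 minutes of downtime in a 30-day month, or roughly 4.4 hours over a year, so an outage measured in days would far exceed that allowance for any affected site.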

Initially, the outage did not change Atlassian users’ existing views of its cloud products, whether positive or negative.

“You expect outages from time to time,” said Chris Riley, senior manager of developer relations at marketing technology firm HubSpot, which uses Jira Software Cloud but was not affected by this week’s outage. “But I can’t really remember a single malfunction [with Atlassian].”

Other IT professionals said this week’s downtime reinforced their reluctance to use the Atlassian cloud for production apps.

“I usually only use Atlassian Cloud products for testing,” said Rodney Nissen, senior Atlassian administrator at Activision Blizzard, which uses Jira Data Center on-premises. “The thing to remember with any cloud offering is that these systems are not magical, they are just someone else’s computer. They are subject to the same errors and problems that could plague any other system.”

Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached on Twitter @PariseauTT.