An estimated 400 Atlassian customers have had access cut for up to a week in an ongoing problem the company has blamed on a maintenance script.
The incident first occurred around April 5, with the company acknowledging the problem on Twitter two days later.
While running a maintenance script, a small number of sites were disabled unintentionally. We’re sorry for the frustration this incident is causing and we are continuing to move through the various stages for restoration. [1/3]
— Atlassian (@Atlassian) April 7, 2022
Several cloud services have now been down for a week for those customers, meaning they’ve lost access to Jira Software, Jira Work Management, Jira Service Management, Confluence, Opsgenie Cloud, Statuspage, and Atlassian Access.
Multiple users took to Twitter to vent their frustrations and try to find out what was going on.
While the company told them it had “mobilized hundreds of engineers across the organization to work around the clock to rectify the incident”, many wondered why Atlassian would deploy so many engineers to work on a problem impacting “a small number of sites”.
“a small number of sites” vs. “hundreds of engineers”
This sounds wrong on so many levels.
— Peter Schneider (@pschneider1968) April 8, 2022
Atlassian has more than 200,000 customers, so the estimated 400 affected amount to roughly 0.2% of its customer base.
The company has been unable to say when services will be restored, instead using Twitter to assure customers that engineers are working on the problem around the clock.
But many of the customers hit by the problem have complained about the lack of communication from Atlassian about what’s going on.
We’ve spoken with them but they cannot give us an RCA or an ETA of when we will be restored. JIRA and confluence. This is day 7 now and still down.
— lotoole (@LaurenceOToole2) April 11, 2022
In response to user Laurence O’Toole saying “Day 5 of #Atlassian outage. No explanation forthcoming yet. What a mess!”, the company apologised “for not being more proactive in our communication with you” adding that its first priority is getting sites back up and running.
“For all major incidents we have a post-incident review process to review the cause of the incident & technical changes that need to happen to prevent recurrence. We will be doing that & publishing it publicly,” Atlassian said in reply.
Atlassian is updating its status page every three hours, and by April 10 it appeared to have restored partial access for some customers, although a considerable amount of restoration work remains.
In response to enquiries from Startup Daily, Atlassian’s PR company issued the following statement, attributed to a spokesperson:
“As part of scheduled maintenance on selected cloud products, our team ran a script to delete legacy data. This data was from a deprecated service that had been moved into the core datastore of our products. Instead of deleting the legacy data, the script erroneously deleted sites, and all associated products for those sites including connected products, users, and third-party applications. We maintain extensive backup and recovery systems, and there has been no data loss for customers that have been restored to date. This incident was not the result of a cyberattack and there has been no unauthorised access to customer data.
“We know this outage is unacceptable and we are fully committed to resolving this. Our global engineering teams are working around the clock to achieve full and safe restoration for our approximately 400 impacted customers and they are continuing to make progress on this incident. At this time, we have rebuilt functionality for over 35% of the users who are impacted by the service outage. We know we are letting our customers down right now and we are doing everything in our power to prevent future reoccurrence.”