Bringing new life to ATLAS data

The ATLAS collaboration is breathing new life into its LHC Run-2 dataset, recorded from 2015 to 2018. Physicists will reprocess the entire dataset, nearly 18 PB of collision data, using an updated version of the ATLAS offline analysis software (Athena). Not only will this improve ATLAS’ physics measurements and searches, it will also position the collaboration well for the challenges of Run 3 and beyond.

Athena converts raw signals recorded by the ATLAS experiment into simpler datasets for physicists to study. The new and improved version has been in development for several years and includes multi-threading capabilities, more sophisticated physics-analysis functions and reduced memory consumption.

“Our goal was to significantly reduce the amount of memory needed to run the software, expand the types of physics analysis it could do and, most importantly, allow current and future ATLAS datasets to be analyzed together,” said Zach Marshall, ATLAS Computing Coordinator. “These improvements are an important part of our preparations for future high-intensity LHC operations, particularly the High-Luminosity LHC (HL-LHC), which is expected to start running around 2028 and will place extremely high demands on ATLAS’ computing resources.”

This latest version of Athena is already substantially reducing the computing resources required for data processing. For example, the computationally intensive task of taking individual signals from the inner detector and linking them together to form particle tracks is now two to four times faster. The results also take up less disk space, and the software runs more smoothly overall.

In addition, the software can now process events in a multi-threaded way. “While previous software improvements allowed for greater parallelization in ATLAS data processing, this improvement allows us to process multiple events simultaneously and to analyze multiple parts of a collision event simultaneously,” Marshall explains. “This change required tens of thousands of code changes; it significantly reduces memory usage and increases event throughput.”
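To make the idea of event-level multi-threading more concrete, here is a toy Python sketch. It is not Athena code (Athena is a large C++ framework), and the names Conditions, Event, reconstruct and process_events are hypothetical. It assumes each collision event can be reconstructed independently, so several events can be in flight at once on worker threads that all share a single read-only copy of the detector conditions. That shared copy illustrates why threads can use less memory than running separate processes, each holding its own copy. Parallelism within a single event is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


# Hypothetical read-only detector conditions, shared by all worker threads.
# One shared copy (instead of one per process) is where memory is saved.
@dataclass(frozen=True)
class Conditions:
    calibration: float


# Hypothetical, heavily simplified collision event.
@dataclass(frozen=True)
class Event:
    number: int
    raw_hits: tuple[float, ...]


def reconstruct(event: Event, conditions: Conditions) -> tuple[int, float]:
    # Toy "reconstruction": calibrate each raw hit and sum the results.
    energy = sum(hit * conditions.calibration for hit in event.raw_hits)
    return event.number, energy


def process_events(events, conditions: Conditions, n_threads: int = 4) -> dict[int, float]:
    # Events are independent, so up to n_threads of them are processed at once.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = pool.map(lambda ev: reconstruct(ev, conditions), events)
        return dict(results)
```

For example, process_events([Event(i, (1.0, 2.0, 3.0)) for i in range(8)], Conditions(calibration=0.5)) returns a mapping from each event number to its toy energy of 3.0.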


The software improvements also give physicists new ways to study their data. For example, researchers can now, by default, search for tracks that originate further away from the collision point. These could be signatures of long-lived particles and could provide evidence of new physics processes beyond the Standard Model. While such searches were possible with the earlier version of the ATLAS software, the heavy computing resources they required meant that they could not always be performed.

Finally, physicists have also improved the databases that contain all the time-dependent status information of the detector components. These databases, which Athena relies on, now reflect an improved understanding of how the detector performed during Run 2. “Each period of data collection is an opportunity for us to learn more about the detector and its subsystems,” says Song-Ming Wang, ATLAS Data Preparation Coordinator. “Looking at these databases in retrospect will allow us to deliver even better performance.”

Now that the new Athena software is up and running, researchers have started to reprocess the entire Run-2 dataset. Because the dataset is so large, this will take several months. One challenge in handling all this data is related to the way it is stored and accessed. Raw data files are stored on magnetic tape in the CERN data centre and in Worldwide LHC Computing Grid centres around the world. Rather than recalling large portions of the dataset from tape at once, which would require a significant (and expensive) amount of disk space, the processing is orchestrated so that only a small fraction of the data is staged and processed at any one time. Once this is complete, physicists will use the same strategy to reprocess the billions of simulated events used in physics analyses.
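The staging strategy described above can be sketched as a rolling window. The following Python sketch is illustrative only and is not the actual ATLAS workflow system. It assumes three caller-supplied, hypothetical hooks: stage (recall a file from tape to a disk buffer), process (reprocess the file) and release (free the buffer space). At no point are more than window files held on disk.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def staged_processing(
    files: Iterable[str],
    window: int,
    stage: Callable[[str], None],    # hypothetical: recall a file from tape to disk
    process: Callable[[str], str],   # hypothetical: reprocess one staged file
    release: Callable[[str], None],  # hypothetical: free the file's disk space
) -> Iterator[str]:
    """Process files in order while keeping at most `window` of them staged."""
    if window < 1:
        raise ValueError("window must be at least 1")
    pending: deque[str] = deque()
    for name in files:
        stage(name)
        pending.append(name)
        # Once the buffer is full, process and release the oldest file
        # before the next one is staged.
        if len(pending) == window:
            oldest = pending.popleft()
            yield process(oldest)
            release(oldest)
    # Drain whatever is still staged after the last file has been recalled.
    while pending:
        oldest = pending.popleft()
        yield process(oldest)
        release(oldest)
```

Shrinking window lowers the peak disk usage at the cost of less overlap between tape recall and processing; in a real system that balance would depend on tape and disk throughput.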

After all this work, ATLAS will have a significantly improved dataset that will allow sharper measurements, more powerful searches and easier combination of past data with future data!
