Teaching established software new tricks

For nearly 20 years, the ATLAS collaboration has used the Athena software framework to turn raw data into something physicists can use in analysis. Built on the Gaudi framework, Athena was written to be flexible and robust, using configurable chains of “algorithms” to process both simulated and real data. This modularity has allowed Athena to be used for many different tasks, from simulating the detector’s response and understanding its behavior to calibration studies and physics analysis.

Figure 1: 42 years of chip trend data by Karl Rupp (Image: Karl Rupp)

However, the world of computing has evolved since Athena’s inception. In particular, as shown in Figure 1, there has been a paradigm shift from increasing compute performance (in blue) by raising processor frequency (in green) to keeping the frequency constant and instead increasing the number of processing units, or cores (in black). This industry-wide shift has affected the processors in our phones, laptops and desktops, as well as those in the global computing resources that ATLAS relies on, housed in data centers scattered around the world.

But harnessing the full potential of modern processors requires software that supports parallel data processing, or “multithreading,” something Athena was not originally designed for. In addition, as the energy and luminosity of the LHC beams continued to increase, so did the memory required to process each collision event. As a result, ATLAS was not getting the best possible CPU performance from its resources, sometimes even leaving cores idle so as not to run out of memory. For LHC Run 2 (2015-2018), ATLAS software developers implemented a more memory-efficient multiprocess version of Athena (AthenaMP). Yet, faced with the demands of the High-Luminosity LHC (from 2028), it was clear that something more drastic had to be done.

After several years of development, the ATLAS collaboration has launched a new “multithreaded” version of its analysis software, Athena.

In 2014, the ATLAS collaboration launched a project to rewrite Athena to be natively multithreaded (AthenaMT). Since writing multithreaded software can be very difficult, the team decided early on that the framework should shield ordinary developers from that difficulty as much as possible. This required a dedicated team of core developers from ATLAS (and other experiments using Gaudi) to think about new ways to process data. In traditional serial data processing, algorithms are executed in a strictly predefined order, processing events one by one as they are read from disk. AthenaMT needed to be more flexible, able to process multiple events in parallel while simultaneously analyzing multiple parts of a single collision event (e.g., tracking and calorimetry).

Figure 2: An example of multithreaded execution in AthenaMT. Four threads are displayed, each corresponding to a row. Different events are displayed with different colors and different algorithms are displayed with different shapes. Algorithms are executed as soon as their input data is available and a thread is free. (Image: ATLAS / CERN collaboration)

To do this, the team created a “scheduler,” which examines the input data required by each algorithm and runs the algorithms in the order those requirements dictate. For example, if a particular reconstruction algorithm needs certain inputs, it will not be scheduled to run until the algorithms producing those inputs have completed. The input data can be either “event data” from the detector readout or “conditions data” about the detector itself (e.g., gas temperature, alignment, etc.). Figure 2 shows this in practice: each event has a different color and each shape represents a different algorithm. The scheduler uses the available resources to run algorithms as their inputs become available.
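The scheduling idea described above can be illustrated with a minimal Python sketch. It is not AthenaMT or Gaudi code: the algorithm names, the data names and the `run_event` function are all hypothetical, and the sketch handles only a single event, whereas the real scheduler interleaves several events at once. It shows only the core rule that an algorithm is started once every input it declares has been produced.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

# Hypothetical algorithm table: name -> (required inputs, produced outputs).
ALGORITHMS = {
    "Tracking": ({"RawHits"}, {"Tracks"}),
    "Calorimetry": ({"RawCells"}, {"Clusters"}),
    "ElectronReco": ({"Tracks", "Clusters"}, {"Electrons"}),
}


def run_event(algorithms, initial_data, n_threads=4):
    """Start each algorithm once its inputs exist.

    Returns the order in which algorithms were started and the final
    set of available data names.
    """
    available = set(initial_data)
    pending = dict(algorithms)
    started = []
    running = {}  # future -> algorithm name
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        while pending or running:
            # An algorithm is ready when all of its inputs are available.
            ready = [name for name, (inputs, _) in pending.items()
                     if inputs <= available]
            for name in ready:
                del pending[name]
                started.append(name)
                # Placeholder task; a real algorithm would do work here.
                running[pool.submit(lambda n=name: n)] = name
            if not running:
                raise RuntimeError(
                    f"inputs can never be satisfied for: {sorted(pending)}")
            # Block until at least one running algorithm has finished.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any error from the task
                available |= algorithms[name][1]  # publish its outputs
    return started, available


started, available = run_event(ALGORITHMS, {"RawHits", "RawCells"})
print(started)
```

In this toy event, `Tracking` and `Calorimetry` can start immediately because their inputs come straight from the detector readout, while `ElectronReco` has to wait until both have published their outputs.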

In addition to adding new features, such as the scheduler, all existing code had to be updated (and reviewed at the same time) in order to run in multiple threads. As Athena has over four million lines of C++ code and over one million lines of Python code, this was a huge task that involved hundreds of ATLAS physicists and developers over several years. The effort was guided by a core ATLAS software team, who also provided documentation and organized tutorials.

Figure 3: Memory usage of MT ATLAS reconstruction (including data quality monitoring) as a function of the number of threads in version 22 (blue triangles) compared to that of MP reconstruction as a function of the number of workers in version 21 (open green circles). Solid lines are linear fits. (Image: ATLAS / CERN collaboration)

The effort has paid off. Figure 3 compares the AthenaMP software used in Run 2 (“version 21”) to the new multithreaded AthenaMT (“version 22”) that will be used in Run 3 (starting next year). It shows that migrating to a multithreaded approach (along with other changes) has significantly reduced memory consumption compared to the Run-2 software: an eight-process task using the Run-2 software uses approximately 20 GB of memory, while an eight-thread task using the Run-3 configuration uses approximately 9 GB. This large reduction in memory comes with no reduction in event throughput: AthenaMT and AthenaMP take almost exactly the same time per event.
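As a rough check on those figures, the short Python sketch below divides the two quoted totals by the number of workers. The function name is hypothetical, and the per-worker values are simple averages of the approximate totals given in the text, not the slopes of the linear fits in Figure 3.

```python
def memory_summary(mp_total_gb, mt_total_gb, n_workers):
    """Average memory per worker and overall saving from the quoted totals."""
    return {
        "mp_per_process_gb": mp_total_gb / n_workers,
        "mt_per_thread_gb": mt_total_gb / n_workers,
        "saving_fraction": 1 - mt_total_gb / mp_total_gb,
    }


# Approximate totals quoted in the text for eight processes / eight threads.
summary = memory_summary(mp_total_gb=20.0, mt_total_gb=9.0, n_workers=8)
print(summary)
```

On these numbers the average is about 2.5 GB per process for AthenaMP and a little over 1.1 GB per thread for AthenaMT, a saving of roughly 55% in total memory.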

AthenaMT is now ready for Run 3 and is currently being used to re-analyze all Run-2 data from ATLAS. With this first step complete, the broader mission to modernize Athena continues. How will Athena work on new hardware architectures? And what new reconstruction and simulation techniques could be used? The ATLAS team has already started to explore promising developments in computing hardware and software for the future.

