ALCF Summer Students Gain Hands-On Experience with AI and Supercomputing – HPCwire

The Argonne Leadership Computing Facility’s (ALCF) summer internship program was held virtually for a second consecutive year, utilizing online collaboration tools to reframe many elements of a traditional laboratory research experience.

ALCF Summer Students (left to right) Sahil Bhola, Mansi Sakarvadia, Gaurav Verma, and Pam Savira.

“There’s no substitute for working with people in a lab,” said ALCF Director Michael E. Papka. “But we’ve tried to approximate the in-person mentorships, group meetings, and hands-on activities using the considerable tools we have on hand, to deliver valuable experiences for everyone involved—including the staff scientists who are mentoring these young investigators.”

Each year, the ALCF, a U.S. Department of Energy (DOE) Office of Science User Facility at Argonne National Laboratory, brings in a new class of summer students, ranging from high school seniors to doctoral candidates, through programs such as DOE’s Science Undergraduate Laboratory Internships (SULI) program and Argonne’s Research Aide program. The students work alongside staff mentors on real-world research projects that address issues at the forefront of scientific computing. The ALCF hosted 33 student researchers this summer, including computer science undergraduate Mansi Sakarvadia, whose research involved optimizing deep learning applications for supercomputers.

“I really wanted to see if working in a large laboratory setting was a future direction I was interested in, and I found out it was,” said Sakarvadia.

This year’s summer student projects included the following:

Using Jupyter Notebooks for Simulation Science

For domain scientists and students, it can be challenging to use high-performance computing (HPC) resources, and many of them prefer accessing those resources using a web-based interface, such as a Jupyter Notebook, instead of a command-line interface.

International student Pam Savira, who studies computer science at the University of St. Thomas, looked for ways to allow researchers to easily create and interactively analyze large-scale simulations using Jupyter Notebooks.

“We implemented an in-situ workflow that allows data to be processed as it is being generated,” said Savira. “We ran the executable file containing MPI code in a separate notebook across multiple nodes of a computer cluster, and then piped the output into the same notebook and converted the data into numbers for subsequent analysis.”
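The pattern Savira describes, reading a running program's output and turning it into numbers while the run is still in progress, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's implementation: the "simulation" is a stand-in script that prints one value per step, and the launcher command mentioned in the comment, the output format, and the helper names `stream_simulation` and `running_mean` are all hypothetical.

```python
import subprocess
import sys

# Stand-in for the simulation: a tiny script that prints "step value" lines.
# A real run might launch something like ["mpiexec", "-n", "4", "./sim"]
# (hypothetical; the article names neither the launcher nor the executable).
FAKE_SIM = "for step in range(5):\n    print(step, step * step * 0.5, flush=True)"


def stream_simulation(cmd):
    """Yield (step, value) pairs as the child process writes them to stdout."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip anything that is not a data line
            yield int(fields[0]), float(fields[1])


def running_mean(samples):
    """Consume samples one at a time and return the list of running means."""
    total, means = 0.0, []
    for count, (_, value) in enumerate(samples, start=1):
        total += value
        means.append(total / count)
    return means


# The analysis consumes each line as it arrives instead of waiting for a file.
means = running_mean(stream_simulation([sys.executable, "-c", FAKE_SIM]))
print(means)
```

Because `stream_simulation` is a generator, the analysis step sees each value as soon as the child process flushes it, which is the essence of processing data in situ rather than after the run has finished.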

Savira said she and her team plan to apply machine learning (ML) to the analysis part of their project, particularly in the area of feature tracking.

“This experience helped me make informed decisions on whether to pursue a PhD degree and gave me a taste of what a research process looks like,” said Savira. “Research is not a linear process. Feeling comfortable with uncertainties surrounding the process is necessary to remain calm and think clearly. Certainly, this experience granted me that.”

Developing Infrastructure for Data and AI Models

With the burgeoning number of ML and artificial intelligence (AI) applications in HPC, it is critical to have a unified platform that can store ML/AI models and datasets and related components to enable interoperability and to assist in pipeline development.

Stony Brook University PhD student Gaurav Verma worked with colleagues at ALCF and DOE’s Lawrence Livermore National Laboratory on an HPC-FAIR project that aims to develop a generic data registration and retrieval framework to make training data and AI models “findable, accessible, interoperable, and reusable,” or FAIR.

The HPC-FAIR framework significantly speeds research on ML-based approaches for analyzing and optimizing scientific applications running on heterogeneous supercomputers. Verma is working to develop a uniform representation of data and AI models that can be integrated into C++/Python applications, and to design and develop application programming interfaces (APIs) to load and save datasets and AI models. These APIs will incorporate HPC and scientific ML workloads.
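To make the idea of a load-and-save API with searchable metadata concrete, here is a toy file-backed registry in Python. It is not the HPC-FAIR API, which the article does not describe in detail. The class name `Registry`, its methods, the metadata fields, and the example entries are all hypothetical and chosen only to illustrate registering, finding, and retrieving datasets and models through one interface.

```python
import json
import tempfile
from pathlib import Path


class Registry:
    """Toy file-backed registry; every name here is hypothetical."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, name, payload, kind, tags=()):
        """Store a JSON-serializable payload together with searchable metadata."""
        record = {"name": name, "kind": kind, "tags": sorted(tags), "payload": payload}
        (self.root / f"{name}.json").write_text(json.dumps(record))

    def load(self, name):
        """Return the payload stored under `name`."""
        return json.loads((self.root / f"{name}.json").read_text())["payload"]

    def find(self, kind=None, tag=None):
        """List names whose metadata matches the given kind and/or tag."""
        hits = []
        for path in sorted(self.root.glob("*.json")):
            record = json.loads(path.read_text())
            if kind is not None and record["kind"] != kind:
                continue
            if tag is not None and tag not in record["tags"]:
                continue
            hits.append(record["name"])
        return hits


with tempfile.TemporaryDirectory() as tmp:
    registry = Registry(tmp)
    # Made-up entries: a small dataset and a small model description.
    registry.save("runtime_samples", [[64, 0.8], [128, 1.9]], kind="dataset", tags=["gpu"])
    registry.save("linear_model", {"weights": [0.015, -0.1]}, kind="model", tags=["gpu", "baseline"])
    found_models = registry.find(kind="model")
    found_gpu = registry.find(tag="gpu")
    restored = registry.load("linear_model")

print(found_models, found_gpu, restored)
```

Storing datasets and models behind the same `save`, `load`, and `find` calls is one simple way to get the findable and reusable parts of FAIR; a production framework would also need access control, versioning, and standard metadata schemas.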

“I am looking forward to continuing my work with ALCF and to contribute my research capabilities in the best way possible,” said Verma. “This experience has motivated me to target higher avenues, including a PhD in computer science, that can contribute to the HPC and scientific community.”

FLOPs-Aware Deep Learning

Sakarvadia, who studies computer science and mathematics at the University of North Carolina at Chapel Hill, focused on constructing neural networks that optimize for peak FLOPs (floating-point operations per second) on a given architecture and on measuring the impact on model convergence and prediction performance.

Sakarvadia’s research explored novel strategies for batching, layer decomposition, and model parallelism to maximize single-device throughput for training and inference.
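One way to see why batching matters for hardware utilization is a back-of-the-envelope estimate for a single fully connected layer. The sketch below is a standard roofline-style calculation, not Sakarvadia's methodology: it counts floating-point operations and bytes moved for `y = x @ W` and reports their ratio (arithmetic intensity). The function names, the layer sizes, and the assumption of 4-byte values that are each read or written once are illustrative.

```python
def dense_layer_flops(batch, n_in, n_out):
    """FLOPs for y = x @ W, counting each multiply-add as 2 operations."""
    return 2 * batch * n_in * n_out


def dense_layer_bytes(batch, n_in, n_out, bytes_per_value=4):
    """Bytes moved if inputs, weights, and outputs are each touched once."""
    return bytes_per_value * (batch * n_in + n_in * n_out + batch * n_out)


def arithmetic_intensity(batch, n_in, n_out):
    """FLOPs performed per byte moved; higher values favor compute-bound execution."""
    return dense_layer_flops(batch, n_in, n_out) / dense_layer_bytes(batch, n_in, n_out)


# Larger batches reuse the same weights for more inputs, raising the ratio.
for batch in (1, 32, 1024):
    print(batch, round(arithmetic_intensity(batch, 1024, 1024), 2))
```

At a batch size of 1 the layer moves far more bytes than it performs operations, so a device is likely limited by memory bandwidth. As the batch grows, the weights are reused and the layer can approach the device's peak compute rate, provided accuracy does not suffer.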

“I established a clear methodology to scale neural networks to enable improved hardware utilization and showed that my high-FLOPs networks had little to no degradation on model accuracy compared to their respective original networks,” said Sakarvadia, who plans to test her work on one or more ALCF GPU architectures next summer. “Overall, I learned many new strategies for how not to get stuck in my research process.”

Scalable Deep Reinforcement Learning for Scientific Simulations

Sahil Bhola, a recent graduate of the University of Michigan’s aerospace engineering program, implemented a multi-fidelity environment to accelerate policy learning by using information from multiple environments. The idea was to spend more time learning on a relatively inexpensive source task and use that knowledge to support learning in the computationally expensive target environment. Bhola’s team also used distributed learning on the available distributed architecture to further accelerate policy learning through online policy sharing among reinforcement learning (RL) agents.

“The idea of applying reinforcement learning techniques to fluid dynamics problems for specific controls and designs was really interesting to me,” said Bhola.

Assistant Computational Scientist Romit Maulik was already working on a similar project in Argonne’s Laboratory for Applied Mathematics, Numerical Software, and Statistics, so Bhola reached out to Maulik, who agreed to mentor his research.

“We decided to use a hierarchy of environments, and essentially a multi-fidelity approach, and execute those actions on a low-cost environment,” said Bhola. “We would then transfer the knowledge we gained from that environment to the next environment, thus improving our learning process.”

Bhola and Maulik found that they were able to explore more in the low-fidelity environment and that, after transferring, they could gain rich information from the high-fidelity environment, narrowing down the policy.
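The benefit of transferring from a cheap environment to an expensive one can be shown with a deliberately tiny example. This is a toy sketch, not the team's algorithm: the "policy" is a single number, both reward functions are made-up quadratics with slightly different optima, and the update rule is a simple finite-difference hill climb rather than a deep RL method. All names and constants are assumptions.

```python
def low_fidelity_reward(action):
    """Cheap, slightly biased stand-in environment (optimum at 1.8)."""
    return -(action - 1.8) ** 2


def high_fidelity_reward(action):
    """Expensive target environment (optimum at 2.0)."""
    return -(action - 2.0) ** 2


def improve_policy(action, reward_fn, steps, step_size=0.1, probe=1e-3):
    """Hill-climb a one-parameter policy using only reward queries."""
    for _ in range(steps):
        # Central finite difference: two reward queries per update.
        slope = (reward_fn(action + probe) - reward_fn(action - probe)) / (2 * probe)
        action += step_size * slope
    return action


HIGH_FIDELITY_BUDGET = 10  # updates we can afford on the expensive environment

# Baseline: spend the whole expensive budget starting from scratch.
cold = improve_policy(0.0, high_fidelity_reward, HIGH_FIDELITY_BUDGET)

# Multi-fidelity: explore at length on the cheap environment, then transfer.
pretrained = improve_policy(0.0, low_fidelity_reward, 50)
warm = improve_policy(pretrained, high_fidelity_reward, HIGH_FIDELITY_BUDGET)

print(f"cold start: {cold:.3f}, warm start: {warm:.3f}")
```

With the same budget of expensive updates, the warm-started policy ends closer to the high-fidelity optimum because most of the exploration was paid for in the low-cost environment.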

“Although traditional RL can get computationally expensive, I think the community is moving in a direction that promotes computationally cheaper and more affordable learning algorithms, and I was really proud to be on a team moving in that direction,” said Bhola.

The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines. Supported by the U.S. Department of Energy’s (DOE’s) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the nation dedicated to open science.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.

The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.

Source: Logan Ludwig, The Argonne Leadership Computing Facility
