In classic UC San Diego fashion, a conversation overheard at a campus coffee cart has turned into an interdisciplinary project that makes compute-intensive courses more exciting and has saved more than $1 million so far. The effort will give UC San Diego graduate and undergraduate students, and their professors, better hardware and software ecosystems for exploring real-world, data-intensive and compute-intensive projects and problems in their courses.
It all started when Larry Smarr, a computer science and engineering professor at UC San Diego, was waiting for coffee in the “Bear” courtyard of the Jacobs School of Engineering just over three years ago. While standing in line, Smarr heard a student say, “I can’t get a job interview if I haven’t been running TensorFlow on a GPU for a real problem.”
While this one student’s dilemma may sound extremely technical and very specific, Smarr heard a common need, and he saw an opportunity. In particular, Smarr realized that innovations arising from a US National Science Foundation (NSF)-funded research project he leads, the Pacific Research Platform (PRP), could be leveraged to create better computing infrastructure for university courses that rely heavily on machine learning, data visualization, and other topics that require a lot of computing resources. This infrastructure would make it easier for professors to offer courses that challenge students to solve real-world data- and compute-intensive problems, including the kind he heard about at the coffee cart: running TensorFlow on a GPU for a real-world problem.
Fast forward to 2022, and Smarr’s spark of an idea has grown into a cross-campus collaboration called the UC San Diego Data Science/Machine Learning Platform, or the UC San Diego JupyterHub. Through this platform, the low-cost, high-performance computational building blocks, combining hardware and software, that Smarr and his PRP collaborators designed for compute-intensive research across the country now also form the backbone of dynamic computing ecosystems for UC San Diego students and professors who use machine learning, data visualization, and other compute- and data-intensive tools in their courses. The platform has been widely used in every division on campus, including courses taught in biological sciences, cognitive sciences, computer science, data science, engineering, health sciences, marine sciences, medicine, music, natural sciences, public health, and more.
It is a unique collaborative project that leverages federally funded innovations in computational research for use in the classroom. To make the leap from research to classroom applications, a creative and hardworking interdisciplinary team from UC San Diego came together. UC San Diego’s IT Services/Academic Technology Services played a central role. Senior architect Adam Tilghman and lead programmer David Andersen led the implementation, with leadership and financial support from UC San Diego CIO Vince Kellen and Academic Technology Senior Director Valerie Polichar. The project has already helped the campus avoid more than $1 million in cloud computing spending, Kellen said.
At the same time, the project provides the UC San Diego community with tools to encourage the back-and-forth flow of students and ideas between classroom projects and follow-up research projects.
“Our students gain access to the same level of computing power that would normally only be available to a researcher using an advanced system like a supercomputer. The students explore much more complex data problems because they can,” said Smarr, who was also the founding director of the California Institute for Telecommunications and Information Technology (Calit2), a partnership between UC San Diego and UC Irvine. Calit2 is now expanding to include UC Riverside.
One of the many professors from across campus who use the UC San Diego Data Science/Machine Learning Platform for courses is Melissa Gymrek, who is a professor in both the Department of Computer Science and Engineering and the Department of Medicine’s Division of Genetics.
Her students write and run code in a software environment called Jupyter Notebooks that runs on the UC San Diego platform. “They can write code in the notebook and press Run and see the results. They can build figures to visualize data. We’re focusing a lot more on data visualizations now,” says Gymrek.
One of the thousands of UC San Diego students who have used the platform extensively is Xuan Zhang. Through the data- and visualization-intensive coursework in CSE 284, Zhang realized that the higher-order genetic structures at the center of her chemistry Ph.D. thesis, R-loops, may be regulated by the short tandem repeats (STRs) that are at the heart of much of the research in Gymrek’s lab. Without the computing infrastructure for working on real problems in the course, Zhang believes she wouldn’t have made the research connection.
After taking Gymrek’s course, Zhang also realized that she could apply for her own independent research profile on the UC San Diego Data Science/Machine Learning Platform to access and build on all of her coursework. (If Jupyter Notebooks are hosted in the commercial cloud, students generally lose access to their data-intensive coursework when class ends unless they download the data themselves.)
“I thought it was just for the course, but then I realized that Jupyter Notebooks are available for research, without losing access, through the UC San Diego JupyterHub,” said Zhang.
This educational infrastructure also benefits professors.
“With these Jupyter Notebooks you can automatically embed the assessment system. That saves a lot of work,” says Gymrek. You can indicate how many points a student gets if they get the code right, she explained. Before using this system, students sent PDFs of their problem sets, which took more time to grade. “It was hard to get past a dozen students. Now you can scale,” Gymrek said. She has even been able to expand access to her personal genomics graduate class to over 50 students, up from a dozen before she had access to these new tools.
Direct upload of assignments and grades to the campus learning management system, Canvas, is now also available.
“The platform is really transforming education. Unlike many innovations in learning technology, classes in every division of UC San Diego have used the Data Science/Machine Learning Platform. Thousands of students use it every year. It is innovation with real impact, preparing our students in many, sometimes unexpected, areas to be leaders and innovators when they graduate,” said Polichar.
Commodity hardware for research and education
“If you build your distributed supercomputer, like the PRP, on commodity hardware, you can follow Moore’s Law,” explains Smarr.
Following this commodity hardware strategy, Smarr and his PRP collaborators developed hardware designs whose performance increases while prices fall over time. The computational building blocks developed by the PRP, which were reused by UC San Diego’s ITS, are rack-mount PCs with multi-core CPUs and eight graphics processing units (GPUs), optimized for data-intensive projects, including accelerating machine learning on the GPUs. These PCs run a wide variety of advanced software to help students program the system, record their results in Jupyter Notebooks, and run a variety of data analysis and machine learning algorithms on their problems.
Building on this commodity hardware approach to high-performance computing, UC San Diego has been able to build a dynamic and innovative “on-premises” ecosystem for data- and compute-intensive courses, rather than relying solely on commercial cloud computing services.
“The commercial cloud doesn’t provide an ecosystem that gives students the same platform from course to course, or the same platform they have in their courses as they do in their research,” Tilghman says. “This is especially true at the graduate level, where students start working in a course context and then continue that work in their research. It’s that continuity, even starting as a lower-division undergraduate, all the way up. I think that’s one of the innovative benefits we give at UC San Diego.”
UC San Diego professors and students interested in learning more about the Data Science/Machine Learning Platform can find additional details and contact information on the platform’s website.
“I’ve been doing this for 50 years,” said Smarr. “I don’t know of many examples where I’ve seen such a close intertwining of research and education, all in a circle.”
This alignment of research and education fuels UC San Diego’s culture of innovation and relevance.
“It is essential to the nation that students across campus learn and work on computing infrastructure relevant to their future, whether in industry, academia or the public sector,” said Albert P. Pisano, dean of the UC San Diego Jacobs School of Engineering. “These information technology ecosystems created and deployed on campus are critical to enabling our students to leverage innovations to serve society.”
Visit the Pacific Research Platform website to watch a video that provides an overview of the Pacific Research Platform (PRP) and a selection of research projects the platform has made possible.
Larry Smarr is principal investigator of the PRP and related grants (NSF Awards OAC-1541349, OAC-1826967, CNS-1730158, CNS-2100237) administered by the Qualcomm Institute, the UC San Diego Division of Calit2.