Rutgers University has received and installed in phases a supercomputer that should significantly enhance the university’s ability to do large-scale computation-based research and data analytics, and will establish New Jersey's reputation in advanced computing.
The supercomputer, which is named “Caliburn,” is the most powerful system in the state, the university said in a release. It was built with a $10 million award to Rutgers from the New Jersey Higher Education Equipment Leasing Fund. The lead contractor is High Point Solutions (Sparta), which was chosen after a competitive bidding process. The system manufacturer and integrator is Super Micro Computer, of San Jose, California.
The updated Top 500 ranking of the world’s most powerful supercomputers ranks Rutgers’ new academic supercomputer as #2 among Big Ten universities, #8 among U.S. academic institutions as a whole, #49 among academic institutions globally and #165 among all supercomputers worldwide.
“This new system will give Rutgers the high-performance computing capacity that our world-class faculty needs and deserves, particularly as the use of computation and big data have become key enablers in nearly every field of research,” said Christopher J. Molloy, senior vice president for research and economic development. “We are extremely appreciative of the state’s support for this initiative, which is a great investment in the university and ultimately the future of New Jersey.”
Manish Parashar, distinguished professor of computer science at Rutgers and founding director of the Rutgers Discovery Informatics Institute (RDI2), is heading the project. Parashar and Ivan Rodero, associate director of technical operations at RDI2, designed the system with a unique architecture and capabilities.
For example, the computer is among the first clusters in the world to use Omni-Path, a new network communications interconnect developed by Intel, together with the latest Intel processors. The idea, Parashar said, is to be able to move data very quickly. “With big data, it is more expensive to move data than it is to compute on the data, so we needed to come up with a way to be able to move data quickly. That’s why we are integrating this technology into the system.”
“This computer was designed to do computational analysis and data analytics on a very large scale,” Parashar added. For example, he said, researchers can run simulations for very complex problems such as drug design or genome sequencing. “When studying these things, you need to be able to run them on very large systems because you want to be able to do a very high-resolution simulation, so you can understand complex physics. And when you want to see what happens over time, you want to be able to run them for a long time.”
Parashar and Rodero designed the computer to be able to perform a mix of large computations, but also to deal with big data. “We worked with Super Micro to come up with the combinations of technologies for the system,” Parashar told us. “We have also been working with the Rutgers community for some time and researchers from other places. We understand the mix of types of things people would like to do. We quickly realized that one out-of-the-box system wouldn’t meet everyone’s needs.”
RDI2 is phasing out Excalibur, Rutgers' IBM Blue Gene/P HPC system. Excalibur consisted of two full racks of IBM Blue Gene/P. Each rack had 1,024 quad-core processors, with each processor having 2 gigabytes (GB) of dedicated random-access memory (RAM). In total, there were 2,048 processors, 8,192 cores and just over 4 terabytes (TB) of memory. Excalibur, a university-wide resource, was served by a 228 TB General Parallel File System (GPFS), with 1.2 TB of faster solid-state drive (SSD) storage.
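For readers who want to check the Excalibur totals, the arithmetic can be sketched in a few lines of Python. The per-rack and per-processor figures come from the paragraph above; the variable names are illustrative, and the decimal conversion (1 TB = 1,000 GB) is an assumption chosen because it reproduces the "just over 4 TB" figure.

```python
# Excalibur figures as quoted in the article.
RACKS = 2
PROCESSORS_PER_RACK = 1_024
CORES_PER_PROCESSOR = 4      # "quad-core"
RAM_GB_PER_PROCESSOR = 2

total_processors = RACKS * PROCESSORS_PER_RACK
total_cores = total_processors * CORES_PER_PROCESSOR
total_ram_gb = total_processors * RAM_GB_PER_PROCESSOR

# Assumption: decimal units (1 TB = 1,000 GB).
total_ram_tb = total_ram_gb / 1_000

print(f"processors: {total_processors:,}")
print(f"cores:      {total_cores:,}")
print(f"memory:     {total_ram_tb} TB")
```

The results agree with the totals the article reports: 2,048 processors, 8,192 cores and slightly more than 4 TB of memory.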
Parashar said that the IBM system had reached the end of its useful life in terms of computational power, and Rutgers needed a new state-of-the-art system in order to remain competitive. The old IBM system will be kept for academic use, if warranted. “We are exploring who can use the system. It is still operational; it is just not state-of-the-art right now. We have to look at how much use we can still get out of it because it costs money to power it up and run it.”
Along with users at Rutgers, Caliburn will be accessible to researchers at other New Jersey universities and to industry users. RDI2 will work with the New Jersey Big Data Alliance, which was founded by Rutgers and seven other universities in the state, to create an industry user program.
The Caliburn project was built in three phases. Phase I went live in January, and provides approximately 150 teraFLOPS (TFLOPS) of computational and data analytic capabilities, as well as one petabyte of storage, to faculty and staff researchers throughout the university.
Phase II of the construction added a new self-contained modular data center at Rutgers University–New Brunswick. Phase III encompassed the installation of the Caliburn supercomputer and final elements of the network, which provides users with high-speed access. The Super Micro solution is based on a FatTwin SuperServer system. It has 560 nodes, each with two Intel Xeon E5-2695 v4 (Broadwell) processors, 256 GB of RAM and a 400 GB Intel NVMe drive. Overall, the system has 20,160 cores, 140 TB of memory and 218 TB of nonvolatile memory. The performance is 603 TFLOPS, with a peak performance of 677 TFLOPS.
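The system-wide figures follow from the per-node specifications, and a short Python sketch makes the relationship explicit. All input numbers are taken from the paragraph above; the variable names are illustrative. Two points are assumptions rather than statements from the source: the conversion uses binary units (1 TB = 1,024 GB), which is the convention that reproduces the quoted 140 TB and 218 TB, and the 603 TFLOPS figure is treated as measured performance to be compared with the 677 TFLOPS peak.

```python
# Caliburn figures as quoted in the article.
NODES = 560
PROCESSORS_PER_NODE = 2
RAM_GB_PER_NODE = 256
NVME_GB_PER_NODE = 400
TOTAL_CORES = 20_160
MEASURED_TFLOPS = 603   # assumption: read as measured performance
PEAK_TFLOPS = 677

# Cores per processor implied by the quoted totals.
cores_per_processor = TOTAL_CORES // (NODES * PROCESSORS_PER_NODE)

# Assumption: binary units (1 TB = 1,024 GB).
total_ram_tb = NODES * RAM_GB_PER_NODE / 1_024
total_nvme_tb = NODES * NVME_GB_PER_NODE / 1_024

# Fraction of peak performance that the quoted figure represents.
efficiency = MEASURED_TFLOPS / PEAK_TFLOPS

print(f"cores per processor: {cores_per_processor}")
print(f"total memory:        {total_ram_tb:.0f} TB")
print(f"total NVMe:          {total_nvme_tb:.2f} TB")
print(f"fraction of peak:    {efficiency:.1%}")
```

Under these assumptions the quoted totals are self-consistent: the core count implies 18 cores per processor, the memory works out to 140 TB, and the nonvolatile storage to just under 219 TB, which the article gives as 218 TB.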
To date, there have been more than 100 users from 17 departments university-wide. The system has delivered over 2.6 million computing hours and 100 TB of storage to the Rutgers community over the past few months. Among the heaviest users have been researchers at the Waksman Institute of Microbiology, the Department of Physics in Camden, the Department of Physics & Astronomy in New Brunswick, the Department of Chemistry in Newark and the Center for Integrative Proteomics Research.
“Because our previous architecture only supported one type of problem, building this architecture lets us reach out to a broader set of users, not only in the traditional sciences but in the humanities and in the business school and other places,” Parashar stated.
“Also, because of the size—and you are talking about over 20,000 small computers working together, which is more than double of what we had before—we can solve problems that are much larger, for example, understanding smart infrastructure, social media and drug design. We can do things on much larger scales,” he concluded.