Nvidia’s $129,000, 170-Teraflop Mini-Supercomputer for Artificial Intelligence

Early customers of Nvidia’s DGX-1, which combines machine-learning software with eight of the chip maker’s highest-end graphics processing units (GPUs), say the system lets them train their analytical models faster, enables greater experimentation, and could facilitate breakthroughs in science, health care, and financial services.

Data scientists have been leveraging GPUs to accelerate deep learning—an AI technique that mimics the way human brains process data—since 2012, but many say that current computing systems limit their work. Faster computers such as the DGX-1 promise to make deep-learning algorithms more powerful and let data scientists run deep-learning models that previously weren’t possible.
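To make the GPU connection concrete: in practice, accelerating deep learning mostly means moving a model’s tensor math onto the graphics card. The sketch below uses PyTorch, a framework not mentioned in this article, and a toy network and random batch as placeholders; it shows the single device-placement step that shifts training from CPU to GPU.

    import torch
    import torch.nn as nn

    # Run on the GPU when one is available; otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # A toy fully connected network standing in for a real deep-learning model.
    model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Random tensors standing in for one training batch of real data.
    inputs = torch.randn(64, 512, device=device)
    targets = torch.randint(0, 10, (64,), device=device)

    # One training step; the matrix math runs on whichever device the tensors live on.
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    print(f"training on {device}, loss = {loss.item():.4f}")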

The DGX-1 isn’t a magical solution for every company. At $129,000, it costs more than systems that companies could assemble themselves from individual components, and it comes with a fixed amount of system memory and a fixed number of GPU cards. But because the hardware and software come preinstalled in a metal enclosure about the size of a medium suitcase, and because it pairs advanced hardware with fast interconnects, Nvidia claims the DGX-1 is easier to set up and quicker at analyzing data than previous GPU systems. Moreover, the positive reception the DGX-1 has attracted in its first few months of availability suggests that similar all-in-one deep-learning systems could help organizations run more AI experiments and refine them more rapidly. Though the DGX-1 is the only system of its kind today, Nvidia’s manufacturing partners will release new versions of the supercomputer in early 2017.

The DGX-1’s 3U chassis holds a dual 16-core Xeon E5-2698 v3 arrangement, 512 GB of DDR4-2133 LRDIMMs, four Samsung PM863 1.92 TB storage drives, dual 10-gigabit Ethernet (10GBase-T), and four EDR InfiniBand connections. This platform not only feeds the eight Tesla P100s but also underscores Nvidia’s scalability goals, with the InfiniBand links in particular put in place to allow high-performance DGX-1 clusters. All of that hardware needs a lot of power to drive it, 3,200 W in total, as the eight P100s alone can draw up to 2,400 W.
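A quick sanity check on those figures, assuming Nvidia’s published per-card specifications for the Tesla P100 (roughly 21.2 teraflops of half-precision throughput and a 300 W rated power draw, numbers taken from the spec sheet rather than from this article):

    # Back-of-the-envelope check on the DGX-1's headline numbers, assuming
    # Nvidia's published per-card specifications for the Tesla P100.
    P100_FP16_TFLOPS = 21.2   # peak half-precision throughput per card
    P100_TDP_WATTS = 300      # rated power draw per card
    NUM_GPUS = 8

    total_tflops = NUM_GPUS * P100_FP16_TFLOPS    # ~170 TFLOPS, the headline figure
    gpu_power_watts = NUM_GPUS * P100_TDP_WATTS   # the 2,400 W quoted for the GPUs alone

    print(f"Aggregate FP16 throughput: {total_tflops:.0f} TFLOPS")
    print(f"GPU power at rated draw: {gpu_power_watts} W of the 3,200 W system budget")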

Fewer than 100 companies and organizations have bought DGX-1s since they started shipping in the fall, but early adopters say Nvidia’s claims about the system seem to hold up. Jackie Hunter, CEO of London-based BenevolentAI’s life sciences arm, BenevolentBio, says her data science team had models training on the system the same day it was installed. She says the team was able to develop several large-scale models designed to identify suitable molecules for drugs within eight weeks. These models train three to four times faster on the DGX-1 than on the startup’s other GPU systems, Hunter says. “We had multiple models that originally took weeks to train, but we can now do this in days and hours instead,” she adds.
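Hunter’s “weeks to days” observation is consistent with a three-to-four-fold speedup; the arithmetic below uses a hypothetical two-week baseline rather than a figure BenevolentBio reported.

    # Illustrative only: how a 3-4x training speedup compresses turnaround time.
    baseline_days = 14  # hypothetical model that previously took two weeks to train
    for speedup in (3, 4):
        print(f"{speedup}x faster: about {baseline_days / speedup:.1f} days instead of {baseline_days}")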

Massachusetts General Hospital has a DGX-1 in one of its data centers and has one more on order. It says it needs GPU supercomputers such as the DGX-1 to crunch large volumes of dissimilar types of data. MGH’s Center for Clinical Data Science, which is coordinating access to the hospital’s DGX-1 across the Boston-area Partners HealthCare system, says projects using the supercomputer will involve analyzing pathology and radiology images, electronic health records, and genomic information.

“If you’re incorporating not just x-rays, but a whole host of clinical information, billing information, and social media feeds as indicators of a patient’s health, you really do need large amounts of GPU computing power to crush that,” says center director Mark Michalski.

Nvidia has more information on the DGX-1 and its deep-learning solutions.