NVIDIA cuQuantum and Xanadu PennyLane Enable Supercomputer Quantum Simulation

Scientists are running quantum simulations at supercomputing scale for the first time, thanks to NVIDIA cuQuantum and Xanadu’s PennyLane. Shinjae Yoo leads the computational science and machine learning group at the U.S. Department of Energy’s Brookhaven National Laboratory, where many researchers are running quantum computing simulations on a supercomputer for the first time thanks to the new software.

The Perlmutter supercomputer at the National Energy Research Scientific Computing Center (NERSC) is running the latest version of PennyLane, a quantum programming framework from Toronto-based Xanadu. The open-source software, which builds on the NVIDIA cuQuantum software development kit, lets simulations run on high-performance clusters of NVIDIA GPUs.

Researchers are using 256 NVIDIA A100 Tensor Core GPUs on Perlmutter to simulate about three dozen qubits, the fundamental units of computation in quantum computers. That’s about twice the number of qubits most researchers can model these days.
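To see why about three dozen qubits calls for a GPU cluster, note that an exact state-vector simulation stores one complex amplitude for every basis state, and the number of basis states doubles with each qubit. A back-of-envelope sketch in Python (the 16-bytes-per-amplitude figure assumes double-precision complex numbers, and the even split across GPUs is an idealization, not a detail from the article):

```python
# Back-of-envelope memory arithmetic for exact state-vector simulation.
# Assumption: double-precision complex amplitudes at 16 bytes each.
def statevector_bytes(n_qubits):
    # An exact state vector holds 2**n complex amplitudes.
    return (2 ** n_qubits) * 16

total = statevector_bytes(36)       # 36 qubits
per_gpu = total // 256              # idealized even split across 256 GPUs

print(total // 2 ** 40, "TiB total")     # 1 TiB for 36 qubits
print(per_gpu // 2 ** 30, "GiB per GPU") # 4 GiB on each of 256 GPUs
```

Every added qubit doubles that memory footprint, which is why distributed multi-GPU simulation matters here.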

I, Brian Wang of Nextbigfuture, interviewed Sam Stanwyck, quantum computing product lead at NVIDIA. We discussed the new NVIDIA, Xanadu and Brookhaven work.

Quantum simulation is being used to solve problems in high-energy physics, machine learning, chemistry and materials science.

“When we started work in 2022 with cuQuantum on a single GPU, we got 10x speedups pretty much across the board … we hope to scale by the end of the year to 1,000 nodes — that’s 4,000 GPUs — and that could mean simulating more than 40 qubits,” said Lee O’Riordan, a senior quantum software developer at Xanadu. Last year, O’Riordan co-authored a paper on a method for splitting a quantum program across more than 100 GPUs to simulate more than 60 qubits as many 30-qubit sub-circuits.

They used new techniques to show that Quantum Approximate Optimization Algorithm (QAOA) circuits with p entangling layers can be simulated by circuits on a fraction of the original number of qubits. They investigated the practical feasibility of applying the circuit-cutting procedure to large-scale QAOA problems on clustered graphs, using a 30-qubit simulator to evaluate the variational energy of a 129-qubit problem and to carry out a 62-qubit optimization. Paper – Fast quantum circuit cutting with randomized measurements.
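The appeal of circuit cutting is easy to see from the memory numbers: a full state vector for 62 qubits is far beyond any machine, while each 30-qubit fragment fits comfortably on a single GPU. A rough comparison (again assuming 16-byte double-precision complex amplitudes; these figures are my arithmetic, not from the paper):

```python
# Memory for an exact state vector, in GiB.
# Assumption: 16 bytes per amplitude (double-precision complex).
def statevector_gib(n_qubits):
    return (2 ** n_qubits) * 16 / 2 ** 30

print(f"{statevector_gib(62):.1e} GiB")  # 62 qubits uncut: astronomically large
print(f"{statevector_gib(30):.0f} GiB")  # one 30-qubit fragment: 16 GiB
```

The trade-off is that the number of fragment evaluations grows with the number of cuts, which is the overhead the randomized-measurement method in the paper is designed to reduce.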

It is about more than just increasing the number of simulated qubits; there is also qubit gate depth. Real quantum hardware loses the ability to sustain a calculation beyond a depth of a few dozen to perhaps one hundred gates. Simulated qubits do not lose quantum entanglement, so simulations can run to depths of millions or even billions of gates. Gate depth can be equated to the number of steps in an algorithm, so being able to run longer and more complicated programs means bigger problems can be solved.
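A simple and deliberately crude way to see the hardware depth limit: if each gate succeeds with probability 1 − p, circuit fidelity falls off roughly as (1 − p)^d after d gates. This toy model and its error rates are illustrative assumptions of mine, not figures from the article:

```python
import math

# Toy decoherence model: fidelity after d gates ~ (1 - gate_error) ** d.
# Solve for the depth at which fidelity drops to a target (default: one half).
def usable_depth(gate_error, target_fidelity=0.5):
    return math.log(target_fidelity) / math.log(1.0 - gate_error)

print(round(usable_depth(1e-2)))  # ~69 gates at a 1% error rate per gate
print(round(usable_depth(1e-3)))  # ~693 gates at a 0.1% error rate per gate
```

A noiseless simulator has no such falloff, which is why simulated circuits can run orders of magnitude deeper than today’s hardware.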

It is difficult to scale up the number of qubits in a quantum simulation. On a classical supercomputer, exact simulation demands exponentially more computing power for each additional qubit, so speeding things up by hundreds of times buys only a few more qubits. There are other algorithms and shortcuts that allow higher numbers of qubits, roughly 100 to 150, to be simulated. There are also reasons why a researcher would choose a more precise simulation over those alternative kinds of simulation.

The other factor is how fast the simulation runs. Cutting the runtime of the same 36-qubit simulation from, say, two years down to a week or a couple of days is also a huge improvement.

Quantum simulation on classical supercomputers works together with real quantum hardware: the simulations can guide the hardware and check expected results against actual ones. When quantum hardware surpasses simulation in number of qubits, the simulations will still be used to check different sections of the hardware.

Quantum hardware and quantum simulation are used to handle the sections of a problem where quantum algorithms improve the answer. There is no need to use quantum computation on the parts of a problem where regular computers work fine.

NVIDIA has decades of experience with GPUs (graphics processing units) serving as co-processors to regular CPUs (central processing units). There are also TPUs (tensor processing units) and other specialized AI processors. NVIDIA is developing full software stacks to make it easy for programmers to switch between the different specialized computing platforms.

Both now and in the future, there will be many different kinds of computer processing. The different systems might sit in the same laptop or cellphone, in the same data center, or be reachable via the cloud in another data center.

The nature of the problem and its value determine how this switching is done. A job starts on the cheapest and most abundant processing, regular CPUs. A controlling or supervising system then determines when the overall solving time would be reduced enough to justify the overhead of sending one part of the problem to a GPU, another part to a TPU, and another to a QPU (quantum processing unit). The quantum processing could be simulated, and there could be many kinds of real quantum hardware.
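One way such a supervising system could weigh that overhead, sketched as a hypothetical cost-based dispatcher. The backend names, throughput figures, and cost model below are all invented for illustration, not an NVIDIA design: estimate the total time on each backend, including the fixed cost of shipping the work over, and pick the minimum.

```python
# Hypothetical dispatcher sketch; every number here is made up for illustration.
# Total time on a backend = fixed transfer/setup overhead + work / throughput.
def pick_backend(work_units, backends):
    return min(backends, key=lambda b: b["overhead_s"] + work_units / b["throughput"])

BACKENDS = [
    {"name": "cpu", "overhead_s": 0.0, "throughput": 1.0},    # cheap, always local
    {"name": "gpu", "overhead_s": 0.5, "throughput": 50.0},   # pay to ship data over
    {"name": "qpu", "overhead_s": 5.0, "throughput": 500.0},  # highest setup cost
]

print(pick_backend(0.1, BACKENDS)["name"])     # cpu: a tiny job stays local
print(pick_backend(100.0, BACKENDS)["name"])   # gpu: worth the transfer overhead
print(pick_backend(1000.0, BACKENDS)["name"])  # qpu: only the biggest jobs justify it
```

The point of the sketch is the crossover structure: each step up in specialized hardware pays off only once the job is large enough to amortize its overhead.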

NVIDIA and Xanadu are part of a quantum industry of hundreds of companies working on quantum hardware, software and simulation.