The deep learning revolution has been enabled in large part by GPUs and, more recently, other accelerators, which make it possible to carry out computationally
demanding training and inference in acceptable times. As the size of machine
learning models and workloads continues to grow, multi-GPU machines have
emerged as an important platform in High Performance Computing and
cloud data centers. As these machines are shared among multiple users, it
becomes increasingly important to protect applications against potential
attacks. In this paper, we explore the vulnerability of Nvidia’s DGX multi-GPU
machines to covert and side channel attacks. These machines consist of a number
of discrete GPUs that are interconnected through a combination of custom
interconnect (NVLink) and PCIe connections. We reverse engineer the cache
hierarchy and show that it is possible for an attacker on one GPU to cause
contention on the L2 cache of another GPU. We use this observation to first
develop a covert channel attack across two GPUs, achieving a peak bandwidth
of 3.95 MB/s. We also develop a prime-and-probe attack on a remote GPU, allowing
an attacker to recover the cache hit and miss behavior of another workload.
This basic capability can serve as a building block for a range of side channel
attacks: as a proof of concept, we demonstrate an attack that fingerprints, with
high accuracy, the application running on the remote GPU. Our work establishes for the first time
the vulnerability of these machines to microarchitectural attacks, and we hope
that it guides future research to improve their security.
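To make the prime-and-probe mechanism mentioned above concrete, the following is a minimal conceptual simulation, not the paper's implementation: it models a single 8-way cache set with LRU replacement (the associativity, names, and replacement policy are illustrative assumptions). The attacker primes the set with its own lines, the victim optionally runs, and the attacker probes the set again; misses during the probe reveal victim activity. A real attack on a GPU would infer hits and misses from access timing rather than from a model.

```python
# Conceptual sketch of prime-and-probe on one set-associative cache set.
# All names and parameters (8 ways, LRU) are illustrative assumptions,
# not details taken from the paper.

from collections import OrderedDict

WAYS = 8  # assumed associativity of the modeled cache set

class CacheSet:
    """A single cache set with LRU replacement."""
    def __init__(self, ways=WAYS):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, kept in LRU order

    def access(self, tag):
        """Access a line; return True on hit, False on miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)     # refresh LRU position
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)  # evict least-recently-used line
        self.lines[tag] = None
        return False

def prime_probe(victim_accesses):
    """Prime the set with attacker lines, let the victim run,
    then probe and return the number of attacker misses."""
    s = CacheSet()
    attacker = [f"attacker{i}" for i in range(WAYS)]
    for t in attacker:          # prime: fill the set with attacker lines
        s.access(t)
    for t in victim_accesses:   # victim activity evicts attacker lines
        s.access(t)
    return sum(not s.access(t) for t in attacker)  # probe: count misses

# An idle victim leaves the primed set intact (zero probe misses);
# an active victim evicts attacker lines, which shows up as probe misses.
print(prime_probe([]))                      # 0 misses: victim idle
print(prime_probe(["victim0", "victim1"]))  # >0 misses: victim active
```

Note that under true LRU, even two victim accesses can cascade into many probe misses, since each probe miss evicts another attacker line; real attacks account for such replacement-policy effects when interpreting probe results.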