We run an internal GPU server at work. The scientists share the resource and access it through Jupyter servers. Sometimes we run into issues where GPUs report being in use even though no one is actively using them. The culprit is usually a Jupyter kernel left idling that once upon a time was using the GPU.
Running nvidia-smi will tell you the PID of each process and which GPU it is using. You can then feed each PID into the script below, which will tell you which container it belongs to.
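
For example, nvidia-smi can list just the compute processes in an easily parsed form; the query fields used here should be available on any reasonably recent driver:

# One line per GPU-using process: its PID, the UUID of the GPU it sits on,
# and how much GPU memory it holds.
nvidia-smi --query-compute-apps=pid,gpu_uuid,used_memory --format=csv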
Someday I'll figure out if there is a fix for idling kernels holding on to resources. But for now, this will do.
#!/bin/bash
# Given a host PID (e.g. one reported by nvidia-smi), print the name of the
# Docker container running it.
set -euo pipefail

pid=${1:?usage: $0 PID}

for c in $(docker ps --format '{{.Names}}'); do
    # `docker container top` lists a container's processes with their host
    # PIDs; the quoted '-o pid' is passed through to ps so only PIDs print.
    if docker container top "$c" '-o pid' | grep -qw "$pid"; then
        echo "$c"
        break
    fi
done
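
Assuming you save the script as find-container.sh (the name is arbitrary), usage looks like:

# 12345 stands in for a PID reported by nvidia-smi.
chmod +x find-container.sh
./find-container.sh 12345

You can also loop over every GPU process in one go:

for pid in $(nvidia-smi --query-compute-apps=pid --format=csv,noheader); do
    ./find-container.sh "$pid"
done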