We run an internal GPU server at work. The scientists share the resource and access it through Jupyter servers. Sometimes we run into issues where GPUs report being in use even though no one is actively using them. The culprit is usually a Jupyter kernel left idling that once upon a time was using the GPU.
Running nvidia-smi will tell you the PID of each process and which GPU it is using. You can then feed each PID into the script below, which will tell you which container it belongs to.
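
For example, nvidia-smi can list just the compute processes in an easily parsed form; the query fields used here should be available on any reasonably recent driver:

# One line per GPU-using process: its PID, the UUID of the GPU it sits on,
# and how much GPU memory it holds.
nvidia-smi --query-compute-apps=pid,gpu_uuid,used_memory --format=csv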
Someday I'll figure out if there is a fix for idling kernels holding on to resources. But for now, this will do.
#!/bin/bash
# Given a host PID (e.g. one reported by nvidia-smi), print the name of the
# Docker container running it.
set -euo pipefail

pid=${1:?usage: $0 PID}

for c in $(docker ps --format '{{.Names}}'); do
    # `docker container top` lists a container's processes with their host
    # PIDs; the quoted '-o pid' is passed through to ps so only PIDs print.
    if docker container top "$c" '-o pid' | grep -qw "$pid"; then
        echo "$c"
        break
    fi
done
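
Assuming you save the script as find-container.sh (the name is arbitrary), usage looks like:

# 12345 stands in for a PID reported by nvidia-smi.
chmod +x find-container.sh
./find-container.sh 12345

You can also loop over every GPU process in one go:

for pid in $(nvidia-smi --query-compute-apps=pid --format=csv,noheader); do
    ./find-container.sh "$pid"
done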