Going one level deeper: The proc filesystem (procfs)

At work our CTO has started doing "Software Life Stories". The idea is he will bring in someone who has had a career as a programmer to talk to us over lunch. For our first installment he brought in his friend Tom Cargill.

Tom has been a programmer for longer than I've been alive and had a wealth of information to share. One question I asked him was whether he could point to specific things he did that made him a better programmer. One of his answers was that he understood what he was working on at one level deeper than the problem he was currently solving. For example, he said that while working on debuggers he understood the machine code beneath the language he was writing a debugger for.

As part of his career Tom worked at Bell Labs. While there he helped invent the proc filesystem. He mentioned this in passing, saying that it helped him build a debugger. I had heard of proc before but didn't know anything about it. A few days later I was checking in on the health of our servers and applications. The script we use to do that had identified a problem. To understand what might be going on I first needed to understand what the script was doing. While reading it I saw "/proc/$pid/cwd". Shoot, there's proc again. Time to take Tom's advice and go one level deeper to understand what the heck it is. Or at the very least scratch the surface of the next level.

This article seems like a good overview of procfs. Take that with a grain of salt: I don't know anything about procfs, so everything in there could be a lie. The high-level view is that it exposes kernel data structures as files so applications can learn more about what is going on in the system. So, the "/proc/$pid/cwd" above is getting the current working directory of $pid. Interesting! Systems that know about themselves, and that you can ask questions of, are the ones I like working on.
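To make that concrete, here is a minimal sketch, assuming a Linux machine with /proc mounted. On Linux, /proc/<pid>/cwd is a symlink, so resolving it with readlink gives the working directory. The helper name proc_cwd is made up for illustration and is not from the script I was reading.

```shell
#!/bin/sh
# Print the current working directory of a process by resolving the
# /proc/<pid>/cwd symlink. proc_cwd is a hypothetical helper name.
proc_cwd() {
    pid="$1"
    # readlink prints the symlink target. It fails if the process has
    # exited or if we are not allowed to inspect it.
    readlink "/proc/$pid/cwd"
}

# $$ expands to the PID of the current shell, so this should print
# the directory the script was started from.
proc_cwd "$$"
```

Pointing it at someone else's process generally needs the same user or root.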

I messed around some with the examples in the article. This one jumped out to me:

~$ cat
# In another terminal window
~$ pgrep -x cat
187843
~$ echo foo > /proc/187843/fd/0
# foo will appear in the other terminal window

Cool. So, we can write to fd/0 (STDIN) of the process. Since that process is cat it writes out to STDOUT whatever is input on STDIN. I wonder if we can read from STDOUT?

~$ tail -f /proc/187843/fd/1
# In another terminal window
~$ echo foo > /proc/187843/fd/0
# crickets in the window running tail...

Hmm, nothing is ever quite as easy as I hope it will be. I don't understand why I can't just listen to STDOUT, but it seems to have something to do with cat being connected to a tty. In an effort not to go too deep, I'll save that rabbit hole for another day.
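For anyone who wants a first step down that rabbit hole, here is a small sketch, assuming Linux. Each entry in /proc/<pid>/fd is a symlink, so readlink shows what a descriptor is attached to. The helper name show_fds is made up. When cat is started from a terminal I would expect fd 0, 1 and 2 to all resolve to the same terminal device (something like /dev/pts/N). If so, the earlier echo may have gone straight to the terminal rather than through cat, and that may be part of why tail sees nothing.

```shell
#!/bin/sh
# List what each open file descriptor of a process points at.
# show_fds is a hypothetical helper name.
show_fds() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # Skip the literal pattern if the glob matched nothing.
        # Pipes and sockets are dangling symlinks, so also accept -L.
        [ -e "$fd" ] || [ -L "$fd" ] || continue
        # ${fd##*/} strips the directory part, leaving the fd number.
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}

# Inspect the current shell. Swap in the PID that pgrep printed
# to look at the running cat instead.
show_fds "$$"
```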

As I said earlier, Tom mentioned that he helped invent this filesystem. I looked up the paper about the invention of /proc. It is shockingly readable and short for a technical paper, so I definitely think it is worth a read.

If you happen to be trying to run the examples on your own machine you may have found that /proc doesn't exist. Some flavors of Unix have no /proc. This got me curious about the reasons why. It seems that it can be unsafe. FreeBSD, for example, had it mounted at one point but has since moved away from it. There is much hand waving about the reasons why. My takeaway is that to build anything mission critical on top of proc I'll need to understand it a bit more, including some of the race conditions that can crop up when accessing the files in there*.
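As a rough illustration of both worries, here is a defensive sketch, assuming Linux's /proc/<pid>/comm file, which holds the process name. The helper name proc_name is made up. It bails out if /proc is missing. It also does not check whether the process exists before reading, because the process could exit between the check and the read. Instead it attempts the read and handles the failure.

```shell
#!/bin/sh
# Read the name of a process defensively.
# proc_name is a hypothetical helper name.
proc_name() {
    pid="$1"
    # Some systems have no /proc at all.
    [ -d /proc ] || { echo "no /proc on this system" >&2; return 2; }
    # No existence test first: the process could exit between a test
    # and the read. Try the read and treat failure as "gone".
    if name=$(cat "/proc/$pid/comm" 2>/dev/null); then
        printf '%s\n' "$name"
    else
        echo "process $pid is gone or not readable" >&2
        return 1
    fi
}

# Look up the current shell's own name.
proc_name "$$"
```

This is only one small guard. It does not cover the case where a PID is reused by a different process between two reads.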

That's all for today. I'll call it 5% of a full level deeper.

* While looking into procfs I came across a really neat YouTube video about how file descriptors can be used to prevent race conditions when accessing files. I rarely watch YouTube videos about coding, but that one is worth a watch and makes me want to see more from the same channel.