adventures in linux

Today I wasted a lot of time on a Computer Maintenance Issue. I haven’t had to do this in a while, so I thought I’d document the experience for posterity.

This afternoon I was installing gstreamer 0.10 in /usr/local for fun, and I needed to install a dependency. I launched synaptic, and it took fooooorrrrever and never showed up. But the disk was cranking away, so obviously something was happening. I tried logging out and logging back in, and it hung on the splash screen. Uh oh. So then I rebooted, and again it hung on the splash screen. The disk would churn and churn, although I couldn’t see any swap activity or tell which process was at fault. After 20 minutes it eventually logged in, but the panel had crashed. Things were Bad.

I spent a lot of time looking for errors, watching logs, all of the usual suspects. Other profiles worked, so it was something about my user account that was fucking up the login. I spent a lot of time moving .gnome2, .gconf, and all that stuff. I tried replacing those dirs with Known Good dirs from other accounts. No matter what I did, I got the same behavior.

I eventually contacted my brother Peter, who suggested I do everything I had already done. I asked him if he could find a tool that would show me what process was accessing the disk, because that would help narrow down what was going on.

He pointed me to iostat, which doesn’t tell you which process is doing what, but it does show how much is being read and how much is being written. Interestingly, nearly all the activity was reads.
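For the per-process view I was actually after, here is a rough sketch of how it could be done on a newer kernel. It assumes the kernel exposes per-process I/O counters in /proc/<pid>/io (which may not exist on older systems, and usually needs root to read other users' processes). The function names parse_proc_io and top_readers are made up for illustration.

```python
import os

def parse_proc_io(text):
    """Parse the contents of a /proc/<pid>/io file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats

def top_readers(proc_root="/proc", limit=5):
    """Return (read_bytes, pid, name) tuples for the heaviest readers."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                io_stats = parse_proc_io(f.read())
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited, or we lack permission
        rows.append((io_stats.get("read_bytes", 0), int(entry), name))
    # Largest read_bytes first
    return sorted(rows, reverse=True)[:limit]
```

Calling top_readers() and printing the rows gives a crude "who is hammering the disk" list. The counters are cumulative since each process started, so a busy long-running daemon can outrank the real culprit; sampling twice and comparing would be more honest.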

This got me thinking that something was happening when programs started that caused them to do massive reads. I logged into an account and let it hang. Then I switched to a text terminal and ran gnome-terminal --display=0:0 to launch it on the desktop. Then I ran strace gedit, and looked at what it was doing. As it launched, it started reading every single directory in my 40 gig home directory. Read read read.
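Eyeballing raw strace output works, but a summary makes the pattern jump out faster. The sketch below assumes the trace was saved to a file first (something like strace -f -e trace=open,openat -o trace.log gedit) and that each line has the usual open("...") or openat(AT_FDCWD, "...") shape. The helper name count_opens_by_prefix and the depth parameter are my own inventions, and only absolute paths are grouped sensibly.

```python
import re
from collections import Counter

# Matches open("/path", ...) and openat(AT_FDCWD, "/path", ...)
OPEN_RE = re.compile(r'\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]+)"')

def count_opens_by_prefix(lines, depth=3):
    """Count open/openat calls, grouped by the first `depth` path components."""
    counts = Counter()
    for line in lines:
        match = OPEN_RE.search(line)
        if not match:
            continue  # not an open call
        parts = match.group(1).split("/")
        # For an absolute path parts[0] is "", so keep depth + 1 items
        counts["/".join(parts[:depth + 1])] += 1
    return counts
```

Feeding it the lines of the log file and printing counts.most_common(10) would have shown my home directory's subfolders at the top of the list, which is a pretty loud hint.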

Then it clicked: fonts. Earlier that day I had been using scribus, and it wasn’t seeing all my fonts. Since it looked like a KDE program I ran kfontinst on my ~/.fonts dir. That was around the time I started having problems! If gedit was reading every directory for fonts on startup, that meant kfontinst had somehow told the font server that my home directory was a font directory, and the font server was dutifully searching the whole fucking tree.

I poked around the dot files in my home dir, and sure enough, .fonts.conf had a tiny little line at the bottom:
<dir>.</dir>
I deleted it, restarted gedit, and it came right up. After that I just had to restore my settings from the backups, and everything was back to normal.
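If I wanted to catch this kind of thing automatically next time, a small check along these lines might do it. This is only a sketch: the list of "too broad" values is my own guess at what counts as suspicious, the function name suspicious_font_dirs is hypothetical, and it only understands a bare ~ or ~/ prefix, not ~user.

```python
import os
import xml.etree.ElementTree as ET

def suspicious_font_dirs(conf_text, home):
    """Return <dir> entries that look far too broad to be font directories."""
    home = os.path.normpath(home)
    too_broad = {".", "~", "/", home}  # heuristic, not an official list
    flagged = []
    for node in ET.fromstring(conf_text).iter("dir"):
        raw = (node.text or "").strip()
        if not raw:
            continue  # ignore empty entries
        if raw == "~" or raw.startswith("~/"):
            expanded = home + raw[1:]
        else:
            expanded = raw
        if raw in too_broad or os.path.normpath(expanded) in too_broad:
            flagged.append(raw)
    return flagged
```

Run against the contents of ~/.fonts.conf, it would have flagged the stray "." entry and left a normal ~/.fonts entry alone.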

This took a long, long time, because my normal debugging methods weren’t working. Nothing was showing up in logs because nothing was technically wrong. And although the disk was churning, it was all reads, so I couldn’t locate a gigantic exploding file. And because every GNOME program was triggering the problem, there was no single bad app to pin it on.

Maybe I did something wrong, but I haven’t had to deal with this shit in a long time. It’s hard to resist giving a big FUCK YOU to KDE for ruining my evening — but in the end it was probably my fault. At least I didn’t go to the next step, which was going to be recreating my entire profile from scratch.