May 19, 2012

An Unexpected Journey

Recently, when building qt5, I'd started noticing some very strange errors from the configure script. The errors seemed to indicate that an awk script was being used as a filename - very strange. Even stranger was that other people weren't hitting this issue - just me. Never a good sign. Today, I finally got around to debugging it and the issue was rather weird.

My initial thought was that my version of bash was incompatible with the script in some way, so I copied the code for the function that was erroring and made a standalone version - it worked fine. I then tried to run the configure script using sh -x to watch what it was doing, but unfortunately that seemed to confuse the script. Finally, I started to read the code. The relevant function reads:


$awk ' BEGIN {

lots of awk

} '

Eventually, I spotted that $awk was not being set. With the variable empty, the shell treats the quoted awk program itself as the command to run, which explains the strange errors. If you look at the script, you can see that the qt5 configure looks for gawk, nawk, then finally awk. So, the obvious step was to see if I had a version of awk installed. Using 'which' told me I didn't have one, but using rpm -q told me I did - curiouser and curiouser.

I checked what the gawk package installed and I could see that at least part of it was there, but running it gave an IO error - very weird. In fact, I could see that I was running a symlink that pointed to a file that wasn't there. At this point, I assumed a bad update of some kind, so I ran:

zypper install gawk

It said my gawk was the latest. I then tried

zypper install -f gawk

to force the install and it gave an IO error. Spot the pattern? Every time I tried to do stuff to the gawk binary I got an IO error. At this point, I looked at /var/log/messages and saw a lot of messages like this:

May 19 18:51:06 linux-h33o kernel: [ 58.719815] EXT4-fs error (device sda6): ext4_ext_check_inode:403: inode #393297: comm rpm: bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)

Not good.
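With a lot of messages like that, it helps to boil them down to which device and which inodes are affected. Here is a rough sketch of how that might be done, assuming the log lines look like the one above and each sits on a single line in the log file. The function name ext4_error_summary is made up for illustration, and reading /var/log/messages will usually need root.

```shell
# Summarise EXT4-fs errors in a log file as "count device inode".
# Assumes one kernel message per line, in the format shown above.
# The function name is hypothetical; the default path is the log I looked at.
ext4_error_summary() {
    grep 'EXT4-fs error' "${1:-/var/log/messages}" |
        # Pull out the device name and the inode number from each message.
        sed -n 's/.*EXT4-fs error (device \([^)]*\)).*inode #\([0-9]*\).*/\1 \2/p' |
        # Collapse repeats so each device/inode pair is listed once with a count.
        sort | uniq -c
}
```

Run it as ext4_error_summary /var/log/messages. If the same inode number keeps coming up, you are probably dealing with one bad inode rather than widespread damage.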

This seemed like some kind of file system corruption, so I backed up my files immediately before I carried on investigating. Running fsck from a rescue system showed a few errors, but nothing major, and all were fixable. After allowing the repair I rebooted, sure in the knowledge that the problem was solved.

All I needed to do was reinstall the corrupted gawk package:

linux-h33o:/home/rich/src # zypper install -f gawk
Loading repository data...
Reading installed packages...
Forcing installation of 'gawk-4.0.0-3.1.2.x86_64' from repository 'openSUSE-12.1-Oss'.
Resolving package dependencies...

The following package is going to be reinstalled: gawk

1 package to reinstall.
Overall download size: 820.0 KiB.
No additional space will be used or freed after the operation.  Continue? [y/n/?] (y):
Installing: gawk-4.0.0-3.1.2 [error]
Installation of gawk-4.0.0-3.1.2 failed:
(with --nodeps --force) Error: Subprocess failed. Error: RPM failed: error: unpacking of archive failed on file /bin/gawk: cpio: rename failed - Input/output error
error: gawk-4.0.0-3.1.2.x86_64: install failed

Oh dear. It's not good when fsck says the file system is ok, but the driver disagrees.

At this point, I started doing some serious googling to figure out wtf was going on. Happily, I came across the following bug report that let me resolve the issue: https://bugzilla.kernel.org/show_bug.cgi?id=32182. By following the debugfs steps described, I was able to kill the bad inode. Thankfully, this fixed my file system.

So what can we learn from this? Obviously, we can learn that the configure script wasn't the source of the problem, but simply the way it manifested. It also shows that the configure script has a bug in that it doesn't report when awk is missing. We can also learn that a bit of googling can solve a lot of problems.
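For illustration, here is a minimal sketch of an awk lookup that complains when nothing is found. This is not the actual qt5 configure code. It only follows the gawk, nawk, awk search order described earlier, and the function names find_awk and require_awk are invented for the example.

```shell
# Print the name of the first awk implementation found on PATH,
# trying gawk, then nawk, then awk. Returns non-zero if none exists.
find_awk() {
    for candidate in gawk nawk awk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Set $awk, or fail loudly instead of carrying on with an empty variable.
require_awk() {
    awk=$(find_awk) || {
        echo "configure: no awk found (tried gawk, nawk, awk)" >&2
        return 1
    }
}
```

A script would call require_awk || exit 1 before the first use of $awk. With a check like this, a missing or unusable awk shows up as a single clear message rather than as an awk program being treated as a command name.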

A final point to note if anyone is considering doing evil things with debugfs like this is that as soon as I figured out I had a corrupt file system I made a backup. This gives you a nice warm glow inside as you know you can attempt things that would otherwise be insanely risky.