(Continued from Part 2)

Now, for most Exchange administrators, there’s not a lot worse than a database in one of your storage groups refusing to mount. Worse things would include the RAID array dying and the server catching fire (maybe one as a result of the other), or a user who decides that the server room doesn’t need air conditioning when nobody’s working in there and shuts it off over a long weekend.

Not that the last one has ever happened to anyone. *Cough*

Unfortunately, because I was an idiot and didn’t copy the error messages at the time (I was more worried about getting the server back up and running), I can only summarize what happened.

  • Tried repeatedly to mount the database. As they say, if it doesn’t work the first time, it probably won’t work the seventh. Turns out, ‘they’ were right.
  • Ran ‘chkdsk /r’ on the RAID array containing the transaction logs, and then on the array with the .edb. No love; the database still wouldn’t mount.
  • Tried every possible way to get eseutil /r to replay the transaction logs into the database, only to find that both the logs and the database were corrupt. Great.
  • Tried to restore the last backup using Backup Exec. It didn’t work.
  • Admitted defeat and ran eseutil /p on the database.

Here’s the kicker: when you run eseutil with the /p switch on a database that wasn’t shut down cleanly and hasn’t had the /r switch run on it first, any data still sitting in the transaction logs gets discarded. However, when the logs are corrupt anyway, there’s really not a lot to lose.
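For anyone who wants the order of operations above as something more compact than my rambling, here is a rough Python sketch of the decision path. It is not what I ran that night. The function names are made up for illustration, and the `State:` line format is my assumption about what the eseutil /mh header dump prints; only the /mh, /r and /p switches are the tool’s own.

```python
import re

# Hypothetical helper names, for illustration only.
# Assumption: the header dump contains a line such as "State: Dirty Shutdown".


def parse_state(header_dump):
    """Return 'clean', 'dirty', or None from header-dump text."""
    match = re.search(r"^\s*State:\s*(.+?)\s*$", header_dump, re.MULTILINE)
    if match is None:
        return None
    state = match.group(1).lower()
    if state.startswith("clean"):
        return "clean"
    if state.startswith("dirty"):
        return "dirty"
    return None


def next_step(state, logs_usable, backup_usable):
    """Pick the least destructive option that is still available."""
    if state == "clean":
        return "mount"
    # Anything other than a clean shutdown is treated as dirty.
    if logs_usable:
        return "eseutil /r"   # replay the transaction logs
    if backup_usable:
        return "restore"      # fall back to the last good backup
    return "eseutil /p"       # last resort: repair, losing unreplayed log data


sample_dump = "            State: Dirty Shutdown\n"
print(next_step(parse_state(sample_dump), logs_usable=False, backup_usable=False))
```

In my case both the logs and the backup were unusable, so the sketch lands on the same last resort I did.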

When eseutil finished its repair after over an hour of grinding away, the database finally mounted. Heaving a sigh of relief, I double-checked the tape and went home for the night knowing I’d done all I could do. Surprisingly, no one reported any missing emails the next morning, and I was able to grab a full backup of the server without issue.

When mid-afternoon rolled around, IBM Dude showed up with the ‘front diagnostic panel’, aka ‘the switch assembly’. We powered down the server, he ripped things apart, pulled out the old part, popped in the new one, and turned on the server.

Or at least, tried to turn it on.

–Click– *WHIRRRRRRRrrrrrr* –Click–

Fantastic. It looked like the first replacement switch assembly had the same problem. Ripping things apart again, IBM Dude swapped the replacement with the freshly ordered spare. We crossed our fingers and he tried the button again.

–Click– *WHIRRRRRRRrrrrrr* –Click–

Crap. At this point, he cut his losses and called for help. The suggestion? Replace the system board.

IBM Dude ordered the part, I booted the server again, once more relying on Scotch tape to do its thing, and we made plans to have the board replaced the following Tuesday afternoon.

Will the gong show continue? Find out in Part 4.
