I came back from vacation the other day to find that some computers on our primary domain (example.local) were unable to access shares on a secondary domain (test.local) located in another building and accessed via a wireless link. When attempting to open the share (or just browse to the Domain Controller), the following error would appear:

Share Error

"There are currently no logon servers available to service the logon request."

Googling did no good, as there were only vague references to DNS issues and WINS servers (the latter of which we don’t use). As nothing had changed in the environment recently, I was at a bit of a loss. I could ping the DC (Homer) in question, and even RDP to it, but I couldn’t for the life of me access the share. NSLOOKUP behaved normally, but then I had a thought: the DC that I couldn’t access was also acting as a DNS server (the primary one for test.local) with example.local as a Secondary Zone (which, of course, contained the DNS entries for the computers that were having trouble accessing the secondary domain). When I loaded the DNS manager and clicked on that zone, I was immediately greeted with an error stating the following:

DNS Error

Turns out, there *was* a DNS problem!

The problem was that I had removed a DNS server over a year ago and it was still referenced as the primary DNS server for this zone. For some reason, the Windows DNS service had just now decided this was a problem and stopped grabbing copies of the zone from the functional secondary DNS server.

To fix this, I simply right-clicked on the zone, chose Properties, and then removed the offending server IP from the General tab and updated with the correct servers and order. As soon as I finished, the computers had no trouble accessing that DC again. Magic!
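
One way to catch a stale master earlier is to probe each server listed for a secondary zone and see whether anything still answers on the DNS port. What follows is a minimal Python sketch, not something I ran at the time: the function names are made up for illustration, and a successful TCP connection on port 53 is only a rough stand-in for "this DNS server still exists".

```python
import socket


def tcp_port_open(host, port=53, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_dead_masters(masters, probe=tcp_port_open):
    """Return the master server addresses that do not answer the probe."""
    return [ip for ip in masters if not probe(ip)]
```

Feed find_dead_masters() the addresses shown on the zone's General tab; anything it returns is worth a second look before the zone transfers quietly stop.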

Yes, it is possible. It’s not pretty by any means (a proper Class 2 SSL Certificate is the best way to go), but it can be done. Click Continue Reading for the process.

For a while now I had been having problems opening Word and Excel (2007 and 2010) documents on my work computer. Most of the time everything would work, but every now and again I’d go to open something and Word or Excel would report that it was “Downloading <filename>”, and simply get stuck. Although I could click the little ‘X’ to cancel and close the window, the process for either Word or Excel would stay active, and any attempts to kill it would fail. In the end, I’d have to hard power off the computer to get it to shut down, and then do a cold boot.

'Downloading' an Excel Workbook

Oh, 'Downloading' message, how I hate thee.

I wasn’t really bothered by it until a few of my users started reporting the same problem. I looked into it, and after a lot of fiddling, came across two Microsoft Knowledge Base articles that eventually led me to a solution.

An Office program is slow or may appear to stop responding (hang) when you open a file from a network location

The program stops responding when you try to open or to save a file in an Office 2002 program, in an Office 2003 program and in an Office 2007 program

By adding the registry value from the first KB article linked above (EnableShellDataCaching), and by removing the Group Policy object that was creating a persistent drive mapping and replacing it with a login script (below) to map the drive, I haven’t had any further reports of the problem.

REM Login Script – Paste these lines into a batch file, and add that .bat file to a GPO

net use z: /delete
net use z: \\10.0.0.100\share
Note the use of the IP Address, rather than the Fully Qualified Domain Name (FQDN) – this was essential to getting things working in the end.
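
If you have more than one drive to map, the same two-line pattern can be generated rather than typed out. This is only a sketch in Python: build_login_script is a hypothetical helper, and the only mapping shown in this post is the z: drive above.

```python
def build_login_script(mappings):
    """Build batch lines that remap each drive letter to a UNC path.

    mappings: dict of drive letter -> (server_ip, share_name).
    """
    lines = []
    for letter, (server, share) in sorted(mappings.items()):
        drive = f"{letter.lower()}:"
        # Drop any existing mapping first, then map by IP address.
        lines.append(f"net use {drive} /delete")
        lines.append(f"net use {drive} \\\\{server}\\{share}")
    # Batch files conventionally use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

Calling build_login_script({"Z": ("10.0.0.100", "share")}) returns the two net use lines from the script above, ready to be written to a .bat file.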

Oi. Symantec is definitely giving me a lot to blog about recently.

I logged in to one of our public file servers today for a weekly inspection, and as is somewhat common was greeted with a dozen reports from Symantec Endpoint 11 of infected files being deleted. It’s not uncommon for our clients to open malicious attachments, visit shady websites, and generally make a mess of things, but a combination of good ACLs, Deep Freeze, and SEP 11 on the server have kept things clean.

So, after reading through the alerts and verifying SEP cleaned all of the detected files, I ran Live Update followed by a Full System Scan, as is standard procedure. Out of curiosity, I watched the first part of the scan process, when I noticed it pause on these files:

c:\windows\hide_evr2.sys

c:\windows\9129837.exe

d:\autorun.inf

The first two file names made me worried, and the third a little more so, if only because D: is another RAID array and therefore has no reason to have an Autorun.inf. After a little digging, however, I found that none of these files seemed to exist on the server. Now I started thinking ‘rootkit’.

Sure enough, a quick Google later showed that yes, these files are common to a number of different rootkit variants. As such, I busted out my usual toolkit of malware detection/removal utilities and took the server offline.

As I dug deeper into the server, though, I still couldn’t find any traces of the mentioned files. I tried several different rootkit tools, browsing the hard drive contents from a Linux LiveCD, and even a few tools to check ADS (Alternate Data Streams), but had no luck.

At this point, I was fairly convinced that the server was clean. But why would Symantec report those files as present, unless…? Digging a little further into the results from Google, I found this forum thread: http://www.antionline.com/showthread.php?t=278671 – apparently, during the initial part of the scan, Endpoint doesn’t just report the files that it’s scanning; it also reports the names of the files it’s looking for.

So, a little life lesson - don’t assume that Symantec will do anything that makes sense. And, when in doubt, Google is still your friend – you just need to look harder.

Sample Symantec Endpoint scan showing a non-existent file

The TL;DR version: The scan status on Symantec Endpoint 11 doesn’t just show the actual files on the computer, but it also shows non-existent files that it’s looking for. When in doubt – verify manually!
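
For the manual verification, something as small as the following Python sketch does the job. The function name is made up for illustration, and it only tells you what the running OS is willing to show you, so for a suspected rootkit it is best run against the disk from a clean environment, such as a LiveCD.

```python
import os


def split_by_existence(paths):
    """Split paths into (present, missing) based on what is on disk."""
    present, missing = [], []
    for path in paths:
        # lexists() also counts broken symlinks as present.
        (present if os.path.lexists(path) else missing).append(path)
    return present, missing
```

Pass it the file names from the scan window; anything that lands in the "missing" list was a name the scanner was looking for, not a file it found.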

This morning, I received an email from a charity I do some consulting for saying that they were getting a Low Disk Space warning on their primary terminal server. After remoting in, I confirmed that on the 120GB primary partition, there was less than 100MB free. Odd, considering that the server only has about 40GB worth of user files on it.

A quick check (done by selecting likely folders in the root of the drive and opening the properties window) confirmed that C:\ProgramData was using an extra 40GB of space that it shouldn’t. Further digging revealed that C:\ProgramData\Symantec\Symantec Endpoint Protection\Xfer contained somewhere in the neighbourhood of 48,000 files, each ~20KB in size.
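
Clicking through properties windows works, but it gets tedious on a big drive. A rough Python equivalent of that check is sketched below; the function names are my own, and it simply walks each immediate subfolder and totals file counts and sizes.

```python
import os


def folder_usage(root):
    """Return (file_count, total_bytes) for everything under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
                count += 1
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return count, total


def largest_subfolders(root, top=5):
    """Return the `top` immediate subfolders of root, largest first."""
    results = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                count, total = folder_usage(entry.path)
                results.append((entry.name, count, total))
    results.sort(key=lambda item: item[2], reverse=True)
    return results[:top]
```

Each result is a (name, file_count, total_bytes) tuple, so a folder with tens of thousands of tiny files stands out just as clearly as one with a few huge ones.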

Solution? Delete and recreate the Xfer folder, then run Live Update again. Low disk space problem solved, but would someone at Symantec care to explain just what the hell happened?

Update: Found a temporary fix here: http://www.symantec.com/connect/forums/symatec-ep-making-alot-files-under-xfer-folder

Apparently, the issue results from Endpoint rescanning files in quarantine every time new definitions arrive. If you have a lot of files in quarantine, your disk space will disappear that much faster. Go figure. It seems they’ve fixed some instances of this, but not others, as it was supposed to have been solved in MR4, but is still present in MR4 and MR5.

When it comes to naming conventions, everyone will give you a different answer. Some people will say the names should be based on location, like “LIVINGROOM”, “BEDROOM”, “SHOWER”, etc. Others will say they should be named based on what they do, as in “WEBSERV1”, “PRINTSERV3”, “PRONSTOR99”, etc. A lot of people tend to name their machines after asset tags, or the people who use them.

Myself? I like to name machines after comic book characters. My current lineup is: “CALVIN”, “HOBBES”, “SATCHEL”, “BUCKY”, “OPUS”, “BILL-THE-CAT”, and this server, “STEVE-DALLAS”.

What can I say?

If you came here looking for information on where to find the power button on an IBM x3400 or x3500, check this post instead.
(Continued from Part 3)

So Tuesday afternoon rolled around. I ran a manual backup of the Exchange server before IBM Dude came around and did a test restore to make sure everything was working, much like I should have done last time. As soon as he arrived, we powered down the server and swapped out the board. After everything was back in place, we crossed our fingers and pressed the power button.

-Click- WHIIIIIIIRRRRRRRRRRRRR

As the server powered on, we noticed two things. One was that the server sounded like a hurricane. With most servers, be they IBM or Dell, when you first turn them on all of the fans will spin up to full power, then settle down. In this case, the fans spun up, then stayed up. We could barely hear each other. The other thing we noticed, however, was an error message on-screen:

1604 Machine type mismatch detected

Neither of us panicked, though – we still had to flash the BIOS so we could put in the correct Machine Type and Serial numbers. The fans were starting to get annoying, though.

After the machine booted off the update CD, I plugged in the right numbers, double-and-triple-checking them, then let it do its thing. When it rebooted, the fans were as loud as ever, and, unfortunately, the error persisted.

1604 Machine type mismatch detected

Popping into the BIOS, I double-checked the Machine Type – it was set correctly. We both scratched our heads, and then noticed that the part number on the new board was different from the old one. In fact, after IBM Dude did a little searching, he found the new board was actually for an x3500, although it was supposedly a valid substitutable part. Regardless, and believing we’d found the problem, he ordered a new board of the correct part number and promised he’d be back Friday with the correct part. In the meantime, the server was still running, albeit a little slower and a lot louder, but at least now the power button was fixed and tape was no longer required.

I’ll just sum this one up, as it’s pretty boring, but there are some important details.

1) Computer account for a Domain Controller/Global Catalogue/Exchange server (virtualized in Hyper-V) becomes corrupt, including the underlying metadata.

2) That server cannot ‘see’ the domain, with numerous errors from both Exchange and the Event Viewer stating that it cannot replicate due to DNS problems.

3) Rolling the .vhd back to a previous week results in the same issue.

4) When attempting to demote/dejoin/join/promote the server from/to the domain, the computer account is deleted, but not the metadata, and the server cannot be joined to the domain again.

Solution? Back up the Mailbox Databases from the First and Second Storage Groups, as well as their transaction logs, then create a new .vhd, reinstall the OS, join it to the domain, add the newly created computer account to the ‘Exchange Server’ and ‘Exchange Install Domain Servers’ groups, install Exchange using “setup.com /m:recoverserver” (make sure that you’ve manually installed the prerequisites, such as IIS and the IIS 6 Management Console, before doing this), then copy the Mailbox databases back to the default install location. After that, correct the permissions on the Mailbox folder if needed (simply inherit the permissions from the parent object) and reboot the server. When it finishes booting, open the Exchange Management Console and mount the Storage Groups (note: you may have to open the properties on both groups and uncheck the option that prevents Exchange from automatically mounting the databases on boot).

Simple, right?

(Continued from Part 2)

Now for most Exchange administrators, there’s not a lot worse than when one of your storage groups isn’t mounting. Worse things would include the RAID array dying and the server catching fire (maybe one as a result of the other), or a user who decides that the server room doesn’t need air conditioning when nobody’s working in there and shuts it off over a long weekend.

Not that the last one has ever happened to anyone. *Cough*

Unfortunately, because I was an idiot and didn’t copy the error messages at the time (I was more worried about getting the server back up and running), I can only summarize what happened.

  • Tried repeatedly to mount the database. As they say, if it doesn’t work the first time, it probably won’t work the seventh. Turns out, ‘they’ were right.
  • Ran ‘chkdsk /r’ on the RAID array containing the transaction logs, and then on the array with the .edb. No love, still no mounting.
  • Tried every possible way to get eseutil /r to replay the transaction logs to the database, only to find that both were corrupt. Great.
  • Tried to restore the last backup using Backup Exec. It didn’t work.
  • Admitted defeat and ran eseutil /p on the database.

Here’s the kicker: when you run eseutil with the /p switch on a database that wasn’t shut down cleanly and hasn’t had the /r switch run on it first, all of the data in the transaction logs gets discarded. However, when the logs are corrupt anyway, there’s really not a lot to lose.
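
Since the choice between /r and /p hinges on whether the database was shut down cleanly, it can help to check the header first with eseutil /mh. The Python sketch below parses that header dump; it assumes the output contains a line such as "State: Clean Shutdown" or "State: Dirty Shutdown" (the exact layout may differ between Exchange versions), and the function names are purely illustrative.

```python
import re


def database_state(header_dump):
    """Extract the 'State:' value from eseutil /mh output, or None."""
    match = re.search(r"^\s*State:\s*(.+?)\s*$", header_dump, re.MULTILINE)
    return match.group(1) if match else None


def next_step(header_dump):
    """Suggest a next step based on the reported shutdown state."""
    state = database_state(header_dump)
    if state is None:
        return "unknown: could not find a State line"
    if state.lower().startswith("clean"):
        return "clean: try mounting the database"
    # Anything else is treated as dirty (assumption for this sketch).
    return "dirty: try eseutil /r first; /p is a last resort"
```

Capture the output of eseutil /mh against the .edb into a string and pass it to next_step(); it is only a reminder of the order of operations, not a substitute for a good backup.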

When eseutil finally finished its repair after over an hour of grinding away, the database finally mounted. Heaving a sigh of relief, I double-checked the tape and went home for the night knowing I’d done all I could do. Surprisingly, no one reported any missing emails the next morning, and I was able to grab a full backup of the server without issue.

When mid-afternoon rolled around, IBM Dude showed up with the ‘front diagnostic panel’, aka ‘the switch assembly’. We powered down the server, he ripped things apart, pulled out the old part, popped in the new one, and turned on the server.

Or at least, tried to turn it on.

–Click– *WHIRRRRRRRrrrrrr* –Click–

Fantastic. It looked like the first replacement switch assembly had the same problem. Ripping things apart again, IBM Dude swapped the replacement with the freshly ordered spare. Crossing our fingers, he tried the button again.

–Click– *WHIRRRRRRRrrrrrr* –Click–

Crap. At this point, he cut his losses and called for help. The suggestion? Replace the system board.

IBM Dude ordered the part, I booted the server again, once more relying on scotch tape to do its thing, and we made plans to have the board replaced the following Tuesday afternoon.

Will the gong show continue? Find out in Part 4.

To the person who found my blog by searching for “where is the power button on ibm x3400”:

Behind the front cover (you have to remove it – it pops off on the left-hand-side), in the top-left, immediately beside the power light, as below:

x3400-power

It's the white button beside the green power light