This is a post I meant to publish quite some time ago, but forgot. Nevertheless…
If you have administered a SCOM environment with both Wintel and UNIX/Linux machines, I am betting you have experienced some gray agents, specifically for UNIX/Linux machines.
The issue: the server was definitely online, but according to SCOM it was offline, or at least in a gray state. Below are the steps I took to resolve the gray agent on this machine, which was running Red Hat Enterprise Linux (RHEL) 6.x.
Steps to diagnose the issue:
- Could I ping the server from any of the SCOM management servers? Yes.
- Could I ping the server from its resource pool? Yes.
- Was there communication on ports 22 and 1270? Yes.
- Was I able to establish a PuTTY session via port 22? Yes.
- Was the SCOM process running on the server? Hmm, that’s a funny error…
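The port checks above are easy to script if you have many agents to look at. Below is a minimal sketch in Python, not something I used at the time. The function names `port_is_open` and `check_agent_ports` are my own for illustration, and it only proves that a TCP connection can be opened, not that the agent behind port 1270 is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_agent_ports(host, ports=(22, 1270)):
    """Check SSH (22) and the agent port (1270); return {port: reachable}."""
    return {port: port_is_open(host, port) for port in ports}
```

Run it from a management server in the relevant resource pool, for example `check_agent_ports("your-rhel-host")` with your own host name. A `False` for 1270 while 22 is `True` would point at the agent rather than the network.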
Next are the steps I took to resolve the issue:
1. Restart the SCOM agent process with “scxadmin”… Cannot: “RETURN CODE: 1”
2. After some Googling: many members of the community have hit this error too, and have isolated the issue to corrupted CIM.Socket and SCX-CMID.PID file(s).
3. Delete the files:
4. Let’s restart the SCX Agent…
5. Well, that did not work either. Check to see if port 1270 is even listening…
6. Okay, let’s kill all processes associated with the scxadmin service…
7. Now let’s start the scxadmin process and check again to see if port 1270 is listening…
8. Perfect! And what does SCOM say?
Problem solved! There are ways to automate this process, and this was achieved with the use of SCORCH and Runbooks. I will post that solution soon. Stay tuned.
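In the meantime, here is a rough Python sketch of the manual sequence above, condensed into one pass: stop the agent, kill leftovers, remove the stale files, start the agent, check status. It is not the SCORCH runbook, and several things in it are assumptions: the `scxadmin` path is where the agent packages usually install it, the process pattern and the stale file paths are parameters you must supply yourself (I am deliberately not hard-coding them, since they vary by agent version), and `build_recovery_plan` and `run_plan` are names I made up for illustration.

```python
import subprocess

# Assumption: usual install location of scxadmin; adjust if yours differs.
SCXADMIN = "/opt/microsoft/scx/bin/tools/scxadmin"


def build_recovery_plan(stale_files=(), process_pattern=None):
    """Return the recovery commands as a list of argv lists, in run order."""
    plan = [[SCXADMIN, "-stop"]]
    if process_pattern:
        # Hypothetical pattern supplied by the caller; pkill -f matches it
        # against the full command line of each running process.
        plan.append(["pkill", "-9", "-f", process_pattern])
    for path in stale_files:
        # Paths of the stale socket/PID files are supplied by the caller.
        plan.append(["rm", "-f", "--", path])
    plan.append([SCXADMIN, "-start"])
    plan.append([SCXADMIN, "-status"])
    return plan


def run_plan(plan, runner=subprocess.call):
    """Run each command in order and return a list of (command, exit_code)."""
    results = []
    for cmd in plan:
        # No abort on non-zero: pkill, for one, exits 1 when nothing matched.
        results.append((cmd, runner(cmd)))
    return results
```

Print the plan and read it before running it as root, and pass a fake `runner` if you only want a dry run. Keeping the plan as plain data makes it easy to review, or to hand to whatever orchestration tool you prefer.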
Happy SCOM’ing! =)