Failure: The event flow is broken on solaris.setaoffice.com for the last 60min. Please follow the instructions.

ATTENTION, RMC LEVEL 1 AGENT: This ticket will be automatically worked by the Automation Bus. Pls do not take ownership until further notice.
Node : solaris.setaoffice.com
Node Type : Sun SPARC (HTTPS)
Severity : major
OM Server Time: 2015-04-01 14:37:24
Message : Failure: The event flow is broken on solaris.setaoffice.com for the last 60min. Please follow the instructions.
Msg Group : ITO
Application : HealthCheck
Object : OVO-agent
Event Type : not_found
Instance Name : not_found

Instruction : (Please carry out instructions in order and record output in ticket)

1) check if there is any maintenance ongoing for the respective system. Set a scheduled outage if yes.

2) check if the system is reachable – log in to the server in question and ping the OVO management server. If not pingable, inform the second line or technical lead.

3) if the system is reachable, generate a test alert on the node in question.

4) if the test alert is not received, do opcagt -kill; then remove temp queue files (/var/opt/OV/tmp/OpC/*q on Unix or, on Windows, …\tmp\OpC\*q); then do opcagt -start on the system. Generate another test alert on the node in question.

5) if the test alert is not received, refer the call to the OVO monitoring support team.
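
Steps 3 and 4 could be scripted roughly as follows for a Unix node. This is only a sketch under assumptions: the helper names (run, send_test_alert, reset_agent_queues) are made up for illustration, the opcmsg values are placeholders, and a test alert only reaches the manager if an opcmsg interceptor policy is deployed on the node. The script runs in dry-run mode by default and only prints what it would do.

```shell
#!/bin/sh
# Sketch of steps 3 and 4 for a Unix node. Dry-run by default: commands are
# printed, not executed, unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
# Queue directory named in step 4 (overridable, e.g. for testing).
QUEUE_DIR="${QUEUE_DIR:-/var/opt/OV/tmp/OpC}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 3: send a test alert. The application, object and text values are
# placeholders, not values taken from the ticket.
send_test_alert() {
    run opcmsg severity=normal application=HealthCheck object=test \
        msg_text="Test alert from $(uname -n)"
}

# Step 4: stop the agent, remove the temporary queue files, start it again.
reset_agent_queues() {
    run opcagt -kill
    for f in "$QUEUE_DIR"/*q; do
        [ -e "$f" ] || continue   # the glob matched nothing
        run rm -f "$f"
    done
    run opcagt -start
}
```

Review the dry-run output of reset_agent_queues first, then repeat with DRY_RUN=0 as root and call send_test_alert again.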

Check which host is the HPOM manager and try to ping it

root@solaris:/ # /opt/OV/bin/ovconfget | grep OPC_PRIMARY_MGR
OPC_PRIMARY_MGR=hpommanager.omc.hp.com

root@solaris:/ # ping hpommanager.omc.hp.com
hpommanager.omc.hp.com is alive
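
To reuse the manager name in a script instead of reading it off the screen, the value can be cut out of the ovconfget output. A minimal sketch; the function name primary_mgr is an assumption made for this example:

```shell
# Read ovconfget output on stdin and print the value of OPC_PRIMARY_MGR.
primary_mgr() {
    sed -n 's/^OPC_PRIMARY_MGR=//p' | head -n 1
}

# Usage on a managed node (not executed here):
#   mgr=$(/opt/OV/bin/ovconfget | primary_mgr)
#   ping "$mgr"
```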

Also check connectivity with bbcutil. If this check succeeds as well (status=eServiceOK), the agent can reach the manager, so the manager is probably the one having trouble reaching the managed host.

root@solaris:/ # bbcutil -ping https://hpommanager.omc.hp.com

https://hpommanager.omc.hp.com: status=eServiceOK
coreID=d2ebdec9-48ff-40ec-bf76-eb233981c3a0
bbcV=11.14.014 appN=ovbbccb appV=unknown version
conn=9 time=1199 ms
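
For use in a health-check script, the status field can be pulled out of the bbcutil output and compared against eServiceOK. A small sketch that assumes the output format shown above; bbc_status and bbc_ok are hypothetical helper names:

```shell
# Read `bbcutil -ping` output on stdin and print the first status value,
# for example eServiceOK or eSSLError.
bbc_status() {
    sed -n 's/.*status=\([A-Za-z]*\).*/\1/p' | head -n 1
}

# Succeed only if the status read from stdin is eServiceOK.
bbc_ok() {
    [ "$(bbc_status)" = "eServiceOK" ]
}

# Usage on a managed node (not executed here):
#   bbcutil -ping https://hpommanager.omc.hp.com | bbc_ok && echo "manager reachable"
```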

HP OpenView Error: ovomanagementserver.hp.com: (bbc-289) status=eSSLError time= (above 10000 ms)

The HTTPS Operations Agent has been installed on a remote node and its certificates have been granted correctly. Checking encrypted connectivity with the bbcutil command returns the following error when run on either side, agent or server:

root@linux:~ # /opt/OV/bin/bbcutil -ping ovomanagementserver.hp.com

ovomanagementserver.hp.com: (bbc-289) status=eSSLError time=19609 ms

root@linux:~ # /opt/OV/bin/bbcutil -ping http://ovomanagementserver.hp.com

http://ovomanagementserver.hp.com: status=eServiceOK
coreID=52def78a-c60c-7546-0044-8256d848046c
bbcV=06.21.501 appN=ovbbccb appV=06.21.501
conn=1521 time=86 ms

Cause
Because the non-encrypted communication succeeds, the problem is most likely limited to the SSL layer. The bbcutil output shows that contacting this node took 19609 ms, which is above the default SSL handshake timeout of 10 seconds (10000 ms).
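
The comparison can be automated by extracting the time= field from the bbcutil output and checking it against the 10000 ms default. A sketch that assumes the output format shown above; the function names and the DEFAULT_TIMEOUT_MS variable are made up for this example:

```shell
# Default SSL handshake timeout mentioned in the text, in milliseconds.
DEFAULT_TIMEOUT_MS=10000

# Read `bbcutil -ping` output on stdin and print the reported time in ms.
bbc_time_ms() {
    sed -n 's/.*time=\([0-9][0-9]*\) ms.*/\1/p' | head -n 1
}

# Succeed if the time in $1 exceeds the timeout in $2 (default: 10000 ms).
exceeds_timeout() {
    [ "$1" -gt "${2:-$DEFAULT_TIMEOUT_MS}" ]
}

# Usage on a managed node (not executed here):
#   t=$(bbcutil -ping ovomanagementserver.hp.com | bbc_time_ms)
#   exceeds_timeout "$t" && echo "slower than the SSL handshake timeout"
```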

Fix
Increasing the SSL handshake timeout clears the situation. Run the following command on both sides, server and agent, to raise the value to 5 minutes (300000 ms):

# ovconfchg -ns bbc.http -set SSL_HANDSHAKE_TIMEOUT 300000
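
The value is given in milliseconds, so 5 minutes is 5 x 60 x 1000 = 300000. If a different timeout is wanted, a small helper can do the conversion and print the matching command. This is a sketch only; minutes_to_ms and timeout_cmd are hypothetical names, and the command is printed rather than executed:

```shell
# Convert minutes to milliseconds, the unit used for SSL_HANDSHAKE_TIMEOUT.
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

# Print (do not run) the ovconfchg command for a timeout given in minutes.
timeout_cmd() {
    echo "ovconfchg -ns bbc.http -set SSL_HANDSHAKE_TIMEOUT $(minutes_to_ms "$1")"
}

timeout_cmd 5
# prints: ovconfchg -ns bbc.http -set SSL_HANDSHAKE_TIMEOUT 300000
```

After applying the change, reading the setting back with ovconfget for the bbc.http namespace and repeating the bbcutil -ping test should confirm whether the new timeout is in effect.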

Source: http://h30499.www3.hp.com/hpeb/attachments/hpeb/itrc-162/119064/1/356770.pdf