OSG-RSV troubleshooting

Description

Our osg-rsv wasn't reporting to the central collector. We restarted the services as follows:
/etc/init.d/osg-rsv stop
/etc/init.d/condor-cron stop
/etc/init.d/condor-cron start
/etc/init.d/osg-rsv start
The services restarted cleanly, as the job queue shows:
condor_cron_q
However, the consumer log contained many errors:
tail -f /OSG/osg-rsv/logs/consumers/gratia-script-consumer.err

sh: line 1: -osg-ce.sprace.org.br-org.osg.general.osg-version.18528.py: command not found
sh: /opt/osg-1.0.0/osg-rsv/output/gratia/2008-10-11T22:46:52Z: No such file or directory
sh: line 1: -osg-ce.sprace.org.br-org.osg.general.osg-version.13681.py: command not found
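To judge whether errors like these are still accumulating, it can help to count them rather than watch the log scroll. The following is a minimal sketch; the function name `count_pattern` is hypothetical and not part of osg-rsv.

```shell
#!/bin/sh
# Sketch only: count_pattern is a hypothetical helper, not an osg-rsv tool.
# Usage: count_pattern FILE PATTERN
# Prints the number of lines in FILE that contain the fixed string PATTERN.
count_pattern() {
    # grep -c prints the count; it exits with status 1 when the count is 0,
    # so the status is ignored here.
    grep -c -F -- "$2" "$1" || true
}

# Example (path taken from the tail command above):
#   count_pattern /OSG/osg-rsv/logs/consumers/gratia-script-consumer.err 'command not found'
```

Running it before and after a fix shows whether the count has stopped growing.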

In this case, the solution was to remove the old files from the Gratia output directory:

rm -f $VDT_LOCATION/osg-rsv/output/gratia/*
We then restarted all the agents again, followed by Apache:
/etc/init.d/apache restart
It seems fine now:
tail -f /opt/osg-1.0.0/osg-rsv/logs/consumers/gratia-script-consumer.out
2008-10-19 05:54:37 BRST Gratia:                           handshake records sent successfuly: 1
2008-10-19 05:54:37 BRST Gratia:                           handshake records failed: 0
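One caveat about the cleanup command used above: if `VDT_LOCATION` is unset, `rm -f $VDT_LOCATION/osg-rsv/output/gratia/*` expands to a path directly under `/`. A more defensive variant might look like the sketch below. The function name `clean_gratia_output` and its `--delete` switch are hypothetical, and `-maxdepth` assumes GNU or BSD find.

```shell
#!/bin/sh
# Sketch only: clean_gratia_output is a hypothetical helper.
# Usage: clean_gratia_output BASE [--delete]
# Without --delete it only lists the files that would be removed.
clean_gratia_output() {
    base=$1
    mode=${2:-list}
    dir=$base/osg-rsv/output/gratia
    # Refuse to run when BASE is empty or the directory does not exist.
    if [ -z "$base" ] || [ ! -d "$dir" ]; then
        echo "refusing: '$dir' is not a usable directory" >&2
        return 1
    fi
    if [ "$mode" = "--delete" ]; then
        # Regular, non-hidden files directly in the directory, like the glob.
        find "$dir" -maxdepth 1 -type f ! -name '.*' -exec rm -f {} +
    else
        find "$dir" -maxdepth 1 -type f ! -name '.*'
    fi
}

# Example:
#   clean_gratia_output "$VDT_LOCATION"            # list first
#   clean_gratia_output "$VDT_LOCATION" --delete   # then remove
```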

You can also double-check the dates on the probes by comparing https://osg-ce.sprace.org.br:8443/rsv/ with https://lcg-sam.cern.ch:8443/sam/sam.py?sensors=OSGCE&regions=OpenScienceGrid&vo=ops&order=SiteName&funct=ShowSensorTests

As a by-product of this, we noticed an error in condor-cron. Again it restarted cleanly, but its MasterLog showed connection failures:

 tail -f /OSG/condor-cron/local.osg-ce/log/MasterLog

10/17 08:43:05 attempt to connect to <192.168.1.150:9619> failed: Connection refused (connect errno = 111).
10/17 08:43:05 ERROR: SECMAN:2003:TCP connection to <192.168.1.150:9619 > failed

It was an error in our port setup. We edited the configuration to set the collector port:

vim /OSG/condor-cron/etc/condor_config
COLLECTOR_HOST  = $(CONDOR_HOST):9618
and then restarted condor-cron again.
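To compare what the daemons are trying to reach with what the configuration says, the two pieces of information can be pulled out side by side. This is a rough sketch; both function names are hypothetical. It reads only the one file it is given, so any setting overridden in a local configuration file would not show up.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical, not part of Condor.

# failed_endpoints LOG
# Prints each distinct host:port that "attempt to connect to <...>" lines mention.
failed_endpoints() {
    grep -F 'attempt to connect to' "$1" \
        | sed -n 's/.*<\([^> ]*\).*/\1/p' \
        | sort -u
}

# show_collector_host CONFIG
# Prints the last uncommented COLLECTOR_HOST assignment in CONFIG.
# Within one file, a later definition overrides an earlier one.
show_collector_host() {
    grep -E '^[[:space:]]*COLLECTOR_HOST[[:space:]]*=' "$1" | tail -n 1
}

# Example (paths taken from the commands above):
#   failed_endpoints /OSG/condor-cron/local.osg-ce/log/MasterLog
#   show_collector_host /OSG/condor-cron/etc/condor_config
```

If the port printed by the first function differs from the one in the second, the mismatch points at the same kind of port error described here.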

Updates

Marco at 21/10/2008

We changed our condor-cron configuration to stop condor-cron from publishing its schedd to our production Condor:
/etc/init.d/osg-rsv stop
/etc/init.d/condor-cron stop
vim /OSG/condor-cron/etc/condor_config
COLLECTOR_HOST  = 
/etc/init.d/condor-cron start
/etc/init.d/osg-rsv start
The following error, which appears when you start condor-cron, is harmless:
tail -f /OSG/condor-cron/local.osg-ce/log/MasterLog
ERROR: Unable to find collector info in configuration file!!!
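Since this message is expected once COLLECTOR_HOST is left empty, it can be filtered out when scanning the MasterLog for real problems. A minimal sketch follows; the function name `unexpected_master_errors` is hypothetical.

```shell
#!/bin/sh
# Sketch only: unexpected_master_errors is a hypothetical helper.
# Usage: unexpected_master_errors LOG
# Prints ERROR lines from LOG, leaving out the known-harmless collector message.
unexpected_master_errors() {
    grep -F 'ERROR' "$1" \
        | grep -v -F 'Unable to find collector info in configuration file' \
        || true   # an empty result is not a failure
}

# Example (path taken from the tail command above):
#   unexpected_master_errors /OSG/condor-cron/local.osg-ce/log/MasterLog
```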

Marco at 22/10/2008

To fix the osg-rsv issue above, we followed these steps:
cd $VDT_LOCATION/osg-rsv/bin/probes
mv  OSG_RSV_Probe_Base.pm OSG_RSV_Probe_Base.pm-old
wget http://rsv.grid.iu.edu/downloads/pre-release/Probes-2.3.5/OSG_RSV_Probe_Base.pm
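The steps above move the old module aside before the download, so a failed wget would leave no OSG_RSV_Probe_Base.pm in place. One possible variant is to download to a temporary file first and swap only if something actually arrived. This is a sketch under that assumption; the function name `install_with_backup` is hypothetical.

```shell
#!/bin/sh
# Sketch only: install_with_backup is a hypothetical helper.
# Usage: install_with_backup NEWFILE TARGET
# Keeps the previous TARGET as TARGET-old, then copies NEWFILE into place.
install_with_backup() {
    new=$1
    target=$2
    # -s is true only when the file exists and is not empty.
    if [ ! -s "$new" ]; then
        echo "refusing: '$new' is missing or empty" >&2
        return 1
    fi
    if [ -e "$target" ]; then
        mv "$target" "$target-old" || return 1
    fi
    cp "$new" "$target"
}

# Example (URL and file name taken from the steps above; /tmp path is arbitrary):
#   wget -O /tmp/OSG_RSV_Probe_Base.pm http://rsv.grid.iu.edu/downloads/pre-release/Probes-2.3.5/OSG_RSV_Probe_Base.pm
#   cd $VDT_LOCATION/osg-rsv/bin/probes
#   install_with_backup /tmp/OSG_RSV_Probe_Base.pm OSG_RSV_Probe_Base.pm
```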

-- MarcoAndreFerreiraDias - 19 Oct 2008


Topic revision: r3 - 2008-10-22 - MarcoAndreFerreiraDias
 