r3 - 22 Oct 2008 - 06:06:22 - MarcoAndreFerreiraDias

OSG-RSV troubleshooting

Description

Our OSG-RSV was not reporting to the central collector. We restarted the services as follows:
/etc/init.d/osg-rsv stop
/etc/init.d/condor-cron stop
/etc/init.d/condor-cron start
/etc/init.d/osg-rsv start
It restarted cleanly; the probe jobs can be listed with:
condor_cron_q
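The restart order above (stop RSV before condor-cron, start condor-cron before RSV) can be wrapped in a small shell function. This is a minimal sketch, not part of the OSG distribution: the function name and the INIT_DIR override are hypothetical, added only so the function can be tried against stub scripts.

```shell
#!/bin/sh
# Hypothetical helper: restart RSV and condor-cron in the order used above.
# INIT_DIR is an assumed override; it defaults to /etc/init.d.
restart_rsv_stack() {
    init_dir="${INIT_DIR:-/etc/init.d}"
    # Stop failures are tolerated: a service may already be down.
    "$init_dir/osg-rsv" stop
    "$init_dir/condor-cron" stop
    # Start condor-cron first, since RSV runs its probes as condor-cron jobs.
    "$init_dir/condor-cron" start || return 1
    "$init_dir/osg-rsv" start
}
```

On the CE itself it would be run as plain `restart_rsv_stack`, followed by `condor_cron_q` to check the jobs.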
However, the consumer logs showed many errors:
tail -f /OSG/osg-rsv/logs/consumers/gratia-script-consumer.err

sh: line 1: -osg-ce.sprace.org.br-org.osg.general.osg-version.18528.py: command not found
sh: /opt/osg-1.0.0/osg-rsv/output/gratia/2008-10-11T22:46:52Z: No such file or directory
sh: line 1: -osg-ce.sprace.org.br-org.osg.general.osg-version.13681.py: command not found
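To see whether these two error types keep appearing after a fix, a quick count over the .err log is enough. A sketch only; count_consumer_errors is a hypothetical name, and the default path is the one used above.

```shell
#!/bin/sh
# Hypothetical helper: count the two error types seen in the Gratia
# consumer .err log (one log line per error is assumed).
count_consumer_errors() {
    log="${1:-/OSG/osg-rsv/logs/consumers/gratia-script-consumer.err}"
    printf 'command not found: %s\n' "$(grep -c 'command not found' "$log")"
    printf 'No such file or directory: %s\n' "$(grep -c 'No such file or directory' "$log")"
}
```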

In this case, the solution was to clean out the old files in the Gratia output directory:

rm -f $VDT_LOCATION/osg-rsv/output/gratia/*
Then we restarted all the agents again, and also Apache:
/etc/init.d/apache restart
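A slightly more defensive variant of the cleanup, sketched under assumptions: it aborts when VDT_LOCATION is unset and, if the hypothetical MAX_AGE_DAYS variable is set, only removes files older than that many days. The entry itself simply removed everything; whether keeping recent files is safe for RSV is an assumption we did not test.

```shell
#!/bin/sh
# Hypothetical helper: clear stale files from the RSV Gratia output directory.
# ${VDT_LOCATION:?} aborts instead of silently using a bogus path.
clean_gratia_output() {
    dir="${VDT_LOCATION:?VDT_LOCATION is not set}/osg-rsv/output/gratia"
    [ -d "$dir" ] || return 0
    if [ -n "${MAX_AGE_DAYS:-}" ]; then
        # Only top-level files last modified more than MAX_AGE_DAYS days ago.
        find "$dir" -maxdepth 1 -type f -mtime +"$MAX_AGE_DAYS" -exec rm -f {} +
    else
        # Same effect as the rm -f .../gratia/* used in this entry.
        find "$dir" -maxdepth 1 -type f -exec rm -f {} +
    fi
}
```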
The consumer output now looks fine:
tail -f /opt/osg-1.0.0/osg-rsv/logs/consumers/gratia-script-consumer.out
2008-10-19 05:54:37 BRST Gratia:                           handshake records sent successfuly: 1
2008-10-19 05:54:37 BRST Gratia:                           handshake records failed: 0
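Instead of watching the log by eye, the last handshake summary can be checked from a script. A sketch, assuming the log lines keep the format shown above ("handshake records failed: N"); gratia_handshake_ok is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the most recent "handshake records
# failed" line in the consumer .out log reports 0.
gratia_handshake_ok() {
    log="${1:-/opt/osg-1.0.0/osg-rsv/logs/consumers/gratia-script-consumer.out}"
    failed=$(grep 'handshake records failed:' "$log" | tail -n 1 | awk '{print $NF}')
    # An empty value means no summary line was found yet.
    [ -n "$failed" ] && [ "$failed" -eq 0 ]
}
```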

You can double-check the probe dates at https://osg-ce.sprace.org.br:8443/rsv/ against https://lcg-sam.cern.ch:8443/sam/sam.py?sensors=OSGCE&regions=OpenScienceGrid&vo=ops&order=SiteName&funct=ShowSensorTests

As a by-product of this, we also found an error in condor-cron. Again it restarted cleanly, but its MasterLog showed connection errors:

 tail -f /OSG/condor-cron/local.osg-ce/log/MasterLog

10/17 08:43:05 attempt to connect to <192.168.1.150:9619> failed: Connection refused (connect errno = 111).
10/17 08:43:05 ERROR: SECMAN:2003:TCP connection to <192.168.1.150:9619> failed

It was an error in our port setup; we set the collector port explicitly in the condor-cron configuration:

vim /OSG/condor-cron/etc/condor_config
COLLECTOR_HOST  = $(CONDOR_HOST):9618
Then we restarted condor-cron again.
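To confirm which port the edited file actually sets, the last COLLECTOR_HOST assignment can be read back (in Condor configuration a later definition of a macro overrides an earlier one). A sketch only; collector_port is a hypothetical helper that just parses the file and does not expand macros such as $(CONDOR_HOST).

```shell
#!/bin/sh
# Hypothetical helper: print the explicit port of the last COLLECTOR_HOST
# line in a condor-cron config; fail if no port is given.
collector_port() {
    cfg="${1:-/OSG/condor-cron/etc/condor_config}"
    # Keep only the value of the last assignment, without surrounding blanks.
    value=$(grep -E '^[[:space:]]*COLLECTOR_HOST[[:space:]]*=' "$cfg" | tail -n 1 |
        sed -e 's/^[^=]*=[[:space:]]*//' -e 's/[[:space:]]*$//')
    case "$value" in
        *:*) printf '%s\n' "${value##*:}" ;;
        *) return 1 ;;   # no explicit port, or an empty value
    esac
}
```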

Updates

Marco at 21/10/2008

We changed our condor-cron configuration so that condor-cron no longer publishes its schedd to our production Condor pool:
/etc/init.d/osg-rsv stop
/etc/init.d/condor-cron stop
vim /OSG/condor-cron/etc/condor_config
COLLECTOR_HOST  = 
/etc/init.d/condor-cron start
/etc/init.d/osg-rsv start
After this change, starting condor-cron logs the following error, which is harmless:
tail -f /OSG/condor-cron/local.osg-ce/log/MasterLog
ERROR: Unable to find collector info in configuration file!!!
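The vim edit above can also be done non-interactively, for example when the same change has to be repeated on another node. A sketch, assuming COLLECTOR_HOST is defined on a single line; blank_collector_host is a hypothetical name and the .bak copy is our addition.

```shell
#!/bin/sh
# Hypothetical helper: blank out COLLECTOR_HOST in a condor-cron config,
# keeping a backup copy next to the original.
blank_collector_host() {
    cfg="${1:-/OSG/condor-cron/etc/condor_config}"
    cp -p "$cfg" "$cfg.bak" || return 1
    # Keep "COLLECTOR_HOST =" and drop everything after the "=".
    # Writing into the existing file preserves its permissions.
    sed 's/^\([[:space:]]*COLLECTOR_HOST[[:space:]]*=\).*/\1/' "$cfg.bak" > "$cfg"
}
```

As in the entry, osg-rsv and condor-cron should be stopped before the edit and started again afterwards.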

Marco at 22/10/2008

To fix the OSG-RSV issue above, we replaced the probe base module with the pre-release Probes-2.3.5 version:
cd $VDT_LOCATION/osg-rsv/bin/probes
mv OSG_RSV_Probe_Base.pm OSG_RSV_Probe_Base.pm-old
wget http://rsv.grid.iu.edu/downloads/pre-release/Probes-2.3.5/OSG_RSV_Probe_Base.pm
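The three commands above leave the probes without a working module if the download fails. A sketch that restores the backup in that case; the URL and paths come from this entry, while update_probe_base and the rollback logic are hypothetical additions.

```shell
#!/bin/sh
# Hypothetical helper: back up OSG_RSV_Probe_Base.pm, fetch the replacement,
# and roll back if wget fails or produces an empty file.
update_probe_base() {
    url="${1:-http://rsv.grid.iu.edu/downloads/pre-release/Probes-2.3.5/OSG_RSV_Probe_Base.pm}"
    dir="${2:-$VDT_LOCATION/osg-rsv/bin/probes}"
    mod="$dir/OSG_RSV_Probe_Base.pm"
    mv "$mod" "$mod-old" || return 1
    if wget -q -O "$mod" "$url" && [ -s "$mod" ]; then
        return 0
    fi
    mv -f "$mod-old" "$mod"   # restore the previous module
    return 1
}
```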

-- MarcoAndreFerreiraDias - 19 Oct 2008
