JobRobot Troubleshooting
Description
All JobRobot jobs sent to our farm are being aborted. The JobRobot aborted-jobs page
http://jobrobot.web.cern.ch/JobRobot/aborted_081019.html#T2_BR_SPRACE
shows the following abort reasons:
BrokerHelper: no compatible resources
request expired
First, we checked for corruption in our CMSSW installation by running a CRAB job with the same CMSSW version listed in
http://jobrobot.web.cern.ch/JobRobot/summary_081019.html
following the instructions at (sorry, only in Portuguese for now!)
http://www.sprace.org.br/Twiki/bin/view/Main/EntryDescriptionNo53
Next, we checked how these jobs enter our farm. At present, the JobRobot submits with the VOMS role /cms/Role=production, so we checked which local user GUMS maps that role to:
grep "/cms/Role=production" /OSG/globus/var/globus-gatekeeper.log
local_user=uscms003
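The grep above can be turned into a small filter that prints only the distinct local accounts a role was mapped to. This is a sketch only. The function name `role_to_local_users` is hypothetical. It also assumes the role string and the `local_user=` field appear on the same gatekeeper log line, which may not hold for every GUMS/PRIMA log layout.

```shell
# Hypothetical helper: list the distinct local accounts that a VOMS role was
# mapped to, reading gatekeeper log text on stdin.
# ASSUMPTION: the role string and "local_user=<name>" appear on the same line;
# adjust the patterns if your gatekeeper log is laid out differently.
role_to_local_users() {
    # -F: match the role as a fixed string (it contains '/' and '=')
    grep -F -- "$1" |
        sed -n 's/.*local_user=\([A-Za-z0-9_.-]*\).*/\1/p' |
        sort -u
}

# Usage against the log path from the text:
# role_to_local_users "/cms/Role=production" < /OSG/globus/var/globus-gatekeeper.log
```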
Then we looked in Condor for a job from that user that was running at the time:
condor_q|grep uscms003
320821.0 uscms003 10/19 06:27 0+00:05:26 R 0 0.0 data
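Grepping for the account name also matches it anywhere on the line. A slightly stricter sketch compares only the owner column and tallies that user's jobs by Condor status letter (R running, I idle, H held). The function name is hypothetical. The field positions assume the default `condor_q` column layout shown above (ID, OWNER, SUBMITTED as date and time, RUN_TIME, ST, ...), which may differ between Condor versions.

```shell
# Hypothetical helper: count one owner's jobs per status letter, reading
# default condor_q output on stdin.
# ASSUMPTION: owner is whitespace field 2 and status is field 6, as in
# "320821.0 uscms003 10/19 06:27 0+00:05:26 R 0 0.0 data".
count_jobs_by_status() {
    awk -v owner="$1" '$2 == owner { n[$6]++ } END { for (s in n) print s, n[s] }' |
        sort
}

# Usage:
# condor_q | count_jobs_by_status uscms003
```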
We collected some information from one of these jobs while it was running:
condor_q -l 319995.0
Out ="/home/uscms003/.globus/job/osg-ce.sprace.org.br/29607.1224322051/stdout"
Arguments ="--dest-url=https://rb127.cern.ch:20309/tmp/con...
prace.org.br:2119.3108.1211"
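The `Out` attribute in this listing is the job's stdout file on the CE, which is the file to watch. As a sketch, a hypothetical `job_stdout_path` filter can pull that path out of the `condor_q -l` output. It tolerates optional spaces around `=` because the exact spacing may vary. `condor_q -format '%s\n' Out <job id>` should give the same value directly.

```shell
# Hypothetical helper: print the Out attribute (the job's stdout path)
# from `condor_q -l` output read on stdin.
# "^Out *=" does not match longer names such as OutputFiles.
job_stdout_path() {
    sed -n 's/^Out *= *"\(.*\)"$/\1/p'
}

# Usage with the job ID from the example above:
# tail -f "$(condor_q -l 319995.0 | job_stdout_path)"
```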
The last messages in the job's stdout log before it dies are:
tail -f /home/uscms003/.globus/job/osg-ce.sprace.org.br/29607.1224322051/stdout
2008-10-18 07:04:41 INFO: globus-url-copy
file:///opt/osg-1.0.0/globus/tmp/gr...
https://rb127.cern.ch:20309/tmp/con....
1211
failed for unknown reason. wait status is 256, return value might be 1
2008-10-18 07:04:41 ERROR: 6: failed to send file back, retry limit exceeded.
Fatal.
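The log shows the wrapper failing to copy its output back to the RB with globus-url-copy. "wait status is 256" is consistent with "return value might be 1": a raw wait(2) status keeps the exit code in bits 8 to 15, so 256 >> 8 = 1. The sketch below checks a saved stdout log for this signature and decodes the last reported wait status. The function name and its messages are hypothetical. It assumes the wrapper reports the raw wait status, as the log text suggests.

```shell
# Hypothetical helper: scan a job-wrapper stdout log (stdin) for the
# "failed to send file back" error and decode the last wait status.
# ASSUMPTION: "wait status is N" is a raw wait(2) status, so the exit
# code is N >> 8 (256 -> 1).
check_sandbox_failure() {
    log=$(cat)
    # Keep only the last reported wait status (the wrapper may retry).
    wstatus=$(printf '%s\n' "$log" | sed -n 's/.*wait status is \([0-9]*\).*/\1/p' | tail -n 1)
    if [ -n "$wstatus" ]; then
        echo "wait status $wstatus -> exit code $(( $wstatus >> 8 ))"
    fi
    if printf '%s\n' "$log" | grep -q 'failed to send file back'; then
        echo "sandbox transfer failed"
    else
        echo "no sandbox transfer failure found"
    fi
}

# Usage on the stdout file found above:
# check_sandbox_failure < /home/uscms003/.globus/job/osg-ce.sprace.org.br/29607.1224322051/stdout
```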
Updates
So-and-so on dd/mm/yyyy
Describe what you did.
Someone else on dd/mm/yyyy
More comments
--
MarcoAndreFerreiraDias - 19 Oct 2008