---+ PhEDEx and OSG down

---++ Description

It is 07:44 and our PhEDEx Production Component Status has been down for 10h25min. The OSG site also returned the following results from its tests:

<pre>
Authentication: Pass 2006-10-09 09:04:29 GMT
Hello World: Fail
  Command: globus-job-run spgrid.if.usp.br:2119 /bin/sh -c "echo Hello World ; echo Hello_World_DONE"
  Reason: Timeout ; output : /usr/local/globusc/globus/bin/globus-job-run: line 1: 18198 Killed /usr/local/globusc/globus/bin/globusrun -q -o -r "spgrid.if.usp.br:2119" -f /tmp/globus_job_run.osggridcat.rsl.18125 ; status : 246
  2006-10-09 09:06:09 GMT
CONDOR Batch System:
-Batch Query: Pass 2006-10-09 09:04:35 GMT
-Batch Sub: Pass 2006-10-09 09:04:35 GMT
-Batch Cancel: Fail
  Command: globus-job-clean -force -r spgrid.if.usp.br:2119/jobmanager-fork https://spgrid.if.usp.br:_port_range_port_/number1/number2
  Reason: Unknown ; output: Could not clean up job. ; status: 245
  2006-10-09 09:04:36 GMT
gsiftp: Pass 2006-10-09 09:06:14 GMT
Web Service Hello World: Pass 2006-10-09 09:07:17 GMT
</pre>

---++ Updates

I am going to restart the PhEDEx service.
As far as I can tell, the grid proxy is valid:

<pre>
[root@spdc00 root]# su - phedex
[phedex@spdc00 phedex]$ grid-proxy-info
subject  : /DC=org/DC=doegrids/OU=People/CN=Eduardo Gregores 407221/CN=proxy/CN=proxy/CN=proxy
issuer   : /DC=org/DC=doegrids/OU=People/CN=Eduardo Gregores 407221/CN=proxy/CN=proxy
identity : /DC=org/DC=doegrids/OU=People/CN=Eduardo Gregores 407221
type     : full legacy globus proxy
strength : 1024 bits
path     : /home/phedex/gridcert/proxy.cert
timeleft : 11:16:16
</pre>

So I stopped and started the production agents:

<pre>
[phedex@spdc00 phedex]$ Master -config ~/SITECONF/local/PhEDEx/Config.Prod stop
[phedex@spdc00 phedex]$ Master -config ~/SITECONF/local/PhEDEx/Config.Prod start
FileDownload: pid 29035 already running in /home/phedex/state/download-master-prod
FileDiskExport: pid 29041 already running in /home/phedex/state/exp-disk-prod
InfoDropStatus: pid 29047 already running in /home/phedex/state/info-ds-prod
FilePFNExport: pid 29053 already running in /home/phedex/state/exp-pfn-prod
</pre>

Even so, at 08:27 the service was still not showing as UP. I restarted it again.
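The =start= output above reports every agent as "already running", which suggests (this is an interpretation, not something the log confirms) that the agents had not yet exited when =start= was issued. A minimal sketch for spotting this automatically, assuming the =Master= output always has the shape shown in this topic; the function names =parse_master_output= and =not_restarted= are hypothetical helpers, not part of PhEDEx:

```python
import re

# Matches the two line shapes seen in this topic, for example:
#   FileDownload: pid 29035 already running in /home/phedex/state/download-master-prod
#   FileDownload: pid 19841 started in /home/phedex/state/download-master
AGENT_LINE = re.compile(
    r"^(?P<agent>\w+): pid (?P<pid>\d+) "
    r"(?P<status>already running|started) in (?P<state_dir>\S+)$"
)


def parse_master_output(text):
    """Return one dict per agent status line; all other lines are ignored."""
    agents = []
    for line in text.splitlines():
        match = AGENT_LINE.match(line.strip())
        if match:
            entry = match.groupdict()
            entry["pid"] = int(entry["pid"])
            agents.append(entry)
    return agents


def not_restarted(agents):
    """Names of agents that were reported as 'already running'."""
    return [a["agent"] for a in agents if a["status"] == "already running"]


# Output copied from the Config.Prod start command in this topic.
sample = """\
FileDownload: pid 29035 already running in /home/phedex/state/download-master-prod
FileDiskExport: pid 29041 already running in /home/phedex/state/exp-disk-prod
InfoDropStatus: pid 29047 already running in /home/phedex/state/info-ds-prod
FilePFNExport: pid 29053 already running in /home/phedex/state/exp-pfn-prod
"""

if __name__ == "__main__":
    for name in not_restarted(parse_master_output(sample)):
        print(name, "was not restarted")
```

If the list is non-empty, it is probably worth checking that the old pids have really exited before issuing =start= again.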
The last entries in the download log are from 2006-09-30:

<pre>
[phedex@spdc00 phedex]$ tail -n 10 /home/phedex/logs/download-master
2006-09-30 22:01:30: FileDownload[6579]: xstats: to=T2_SPRACE_Buffer from=T1_CERN_Load fileid=3610 state=100 size=2074217787 time_assigned=3856.96 time_all=2835.94 time_preclean=0.22 time_transfer=594.32 time_validate=2223.72 time_postclean=7.23 lfn=/store/test/2006/06/16/IntegrationLargeSample/0000/LoadTest_T1_CERN_0070 from_pfn=srm://srm.cern.ch:8443/srm/managerv1?SFN=/castor/cern.ch/cms/store/test/2006/06/16/IntegrationLargeSample/0000/LoadTest_T1_CERN_0070 to_pfn=srm://spdc00.if.usp.br:8443/srm/managerv1?SFN=/pnfs/if.usp.br/data/cms/store/test/2006/06/16/IntegrationLargeSample/0000/LoadTest_T1_CERN_0070
2006-09-30 22:01:31: FileDownload[6579]: Stopped all pending jobs
</pre>

The gatekeeper log on spgrid, checked because of the problems reported by OSG monitoring, shows:

<pre>
[mdias@spgrid mdias]$ tail -f /OSG/globus/var/globus-gatekeeper.log
PID: 25742 -- Notice: 5: and local gid: 524
TIME: Mon Oct 9 08:28:32 2006
PID: 25742 -- Notice: 0: executing /usr/local/opt/OSG/globus/libexec/globus-job-manager
TIME: Mon Oct 9 08:28:32 2006
PID: 25742 -- Notice: 0: GATEKEEPER_JM_ID 2006-10-09.08:28:32.0000025742.0000000000 for /DC=org/DC=doegrids/OU=People/CN=Leigh Grundhoefer (GridCat) 693100 on 129.79.4.64
TIME: Mon Oct 9 08:28:32 2006
PID: 25742 -- Notice: 0: GRID_SECURITY_CONTEXT_FD=11
TIME: Mon Oct 9 08:28:32 2006
PID: 25742 -- Notice: 0: Child 25771 started
sh: line 1: /var/tmp/gratia.log: Permission denied
</pre>

This looks normal. I am going to try restarting SC4:

<pre>
[phedex@spdc00 phedex]$ Master -config ~/SITECONF/local/PhEDEx/Config.SC4 start
FileDownload: removing old stop flag /home/phedex/state/download-master/stop
FileDownload: pid 19841 started in /home/phedex/state/download-master
FileDiskExport: removing old stop flag /home/phedex/state/exp-disk/stop
FileDiskExport: pid 19847 started in /home/phedex/state/exp-disk
InfoDropStatus: removing old stop flag /home/phedex/state/info-ds/stop
InfoDropStatus: pid 19853 started in /home/phedex/state/info-ds
FilePFNExport: removing old stop flag /home/phedex/state/exp-pfn/stop
FilePFNExport: pid 19859 started in /home/phedex/state/exp-pfn
FileRecycler: removing old stop flag /home/phedex/state/download-recycle/stop
[phedex@spdc00 phedex]$ FileRecycler: pid 19865 started in /home/phedex/state/download-recycle
[phedex@spdc00 phedex]$ Master -config ~/SITECONF/local/PhEDEx/Config.Prod start
FileDownload: pid 29035 already running in /home/phedex/state/download-master-prod
FileDiskExport: pid 29041 already running in /home/phedex/state/exp-disk-prod
InfoDropStatus: pid 29047 already running in /home/phedex/state/info-ds-prod
FilePFNExport: pid 29053 already running in /home/phedex/state/exp-pfn-prod
[phedex@spdc00 phedex]$ tail -n 20 /home/phedex/logs/download-master
2006-09-30 22:01:31: FileDownload[6579]: Stopped all pending jobs
2006-10-09 11:35:54: FileDownload[19841]: (re)connecting to database
</pre>
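The =xstats= entry in the download log carries per-phase timings as =key=value= pairs, which makes it easy to see where a transfer spent its time. A minimal sketch, assuming the fields are whitespace-separated and that everything after the first === in a token is the value (which holds for the line in this topic); =parse_xstats= and =phase_times= are hypothetical helper names, not PhEDEx tools:

```python
def parse_xstats(line):
    """Split an xstats log line into a dict of key -> value strings."""
    # Drop the timestamp and agent prefix, keep what follows "xstats: ".
    _, _, payload = line.partition("xstats: ")
    fields = {}
    for token in payload.split():
        # Partition on the first "=" only, so PFNs containing "?SFN=" survive.
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def phase_times(fields):
    """Return the per-phase durations (seconds) found in the parsed fields."""
    phases = ("time_preclean", "time_transfer", "time_validate", "time_postclean")
    return {name: float(fields[name]) for name in phases if name in fields}


# Beginning of the xstats line from /home/phedex/logs/download-master
# (the lfn and pfn fields are left out here for brevity).
sample = (
    "2006-09-30 22:01:30: FileDownload[6579]: xstats: to=T2_SPRACE_Buffer "
    "from=T1_CERN_Load fileid=3610 state=100 size=2074217787 "
    "time_assigned=3856.96 time_all=2835.94 time_preclean=0.22 "
    "time_transfer=594.32 time_validate=2223.72 time_postclean=7.23"
)

if __name__ == "__main__":
    times = phase_times(parse_xstats(sample))
    slowest = max(times, key=times.get)
    print("slowest phase:", slowest, times[slowest], "s")
```

For the logged transfer, validation (2223.72 s) rather than the transfer itself (594.32 s) accounts for most of =time_all= (2835.94 s).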
Topic revision: r1 - 2006-10-09 - MarcoAndreFerreiraDias
Copyright © 2008-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.