Third party transfers (srmcp srm://A srm://B) are failing (except to FNAL).
iptables -vL
iptables -I INPUT -p TCP --dport 2811 -m state --state NEW -j ACCEPT
iptables -I INPUT -p TCP --dport 20000:25000 -m state --state NEW -j ACCEPT
iptables -I INPUT -p udp --dport 20000:25000 -j ACCEPT
openssl verify -CApath /etc/grid-security/certificates/ /etc/grid-security/hostcert.pem
In srm.batch and gridftpdoor.batch:
set printout default 4
Investigating by looking at the SRM log:
tail -f /opt/d-cache/libexec/apache-tomcat-5.5.20/logs/catalina.out
To avoid "connection timeout" errors on FTS transfers, it is necessary to change (on the server and the pools):
vim /opt/d-cache/config/gridftpdoor.batch
set context -c performanceMarkerPeriod 10
/opt/d-cache/bin/dcache restart gridftp-spraid01Domain
This has not solved our main problem yet.
For the local copy
srmcp srm://osg-se.sprace.org.br:8443/pnfs/sprace.org.br/data/mdias/test.1 srm://osg-se.sprace.org.br:8443/pnfs/sprace.org.br/data/mdias/test.1
the problem was solved by setting, in the same file as above:
set context -c gsiftpAdapterInternalInterface 192.168.1.152
But this has not solved our main problem. Configuration removed: works without it.
In /opt/d-cache/config/srm.batch we set:
set context -c srmVacuum false
set context -c srmPutReqThreadPoolSize 500
set context -c srmCopyReqThreadPoolSize 500
set context -c srmGetLifeTime 28800000
set context -c srmPutLifeTime 28800000
set context -c srmCopyLifeTime 28800000
set context -c remoteGsiftpMaxTransfers 550
and restarted our admin. No success. Configuration removed: works without it.
ifconfig eth0 txqueuelen 20000
The transfer quality did not improve.
I had a look at cmswiki, where there were more details of the problem, which was solved by using MyProxy instead of delegation. The problem is that the current delegation library is not able to handle new style proxy certificates, which are generated by default with 'grid-proxy-init'. See https://savannah.cern.ch/bugs/index.php?34026
We rarely experience this problem, because we usually use voms-proxy-init, which still generates old style proxy certificates by default. The workaround is to use
grid-proxy-init -old
One can reproduce the problem by generating an old style proxy after a new style proxy:
$ grid-proxy-init
$ mv /tmp/x509up_u$(id -u) /tmp/grid-proxy
$ voms-proxy-init -cert /tmp/grid-proxy -key /tmp/grid-proxy
$ grid-proxy-info
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=szamsu/CN=452476/CN=Akos Frohner/CN=201855275/CN=proxy
The problematic credential had a similar DN (see CN=1234/CN=proxy):
/DC=org/DC=doegrids/OU=People/CN=Paul Rossman 364403/CN=117294575/CN=proxy
To implement this workaround we changed to delegation, not using myproxy in our FTS. You need to remove both the -passfile and the -myproxy options from the PhEDEx ConfigPart.FTSDownload configuration.
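The DN pattern pointed out above (a numeric CN immediately followed by CN=proxy) can be checked with a small shell test. This is only a heuristic sketch based on the two subjects quoted in this page; the function name is_mixed_proxy_dn is hypothetical and the check says nothing about the certificate chain itself.

```shell
#!/bin/bash
# Heuristic (assumption): a subject DN ending in /CN=<digits>/CN=proxy looks like
# an old style proxy created on top of a new style proxy, as described above.
# is_mixed_proxy_dn is a hypothetical helper name, not part of any grid toolkit.
is_mixed_proxy_dn() {
    local dn="$1"
    local re='/CN=[0-9]+/CN=proxy$'
    [[ "$dn" =~ $re ]]
}
```

It could be used, for example, as `is_mixed_proxy_dn "$(grid-proxy-info -subject)" && echo "mixed proxy chain"`.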
No positive results.
srmcp -2 -debug=true srm://cmssrm.fnal.gov:8443/srm/managerv2?SFN=/11/store/PhEDEx_LoadTest07/LoadTest07_Debug_BR_SPRACE/US_FNAL/69/mediumfile.txt gsiftp://osg-se.sprace.org.br:2811//mdias/testes/mediumfile_from_fnal.gsiftp.osg-se.2 -protocols=gsiftp
srmcp -2 -debug=true srm://srm-cms.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/cms/store/test/smale.txt srm://osg-se.sprace.org.br:8443/pnfs/sprace.org.br/data/mdias/testes/smallfile_from_cern.osgse
srmcp -2 -debug=true srm://gridka-dCache.fzk.de:8443/srm/managerv2?SFN=/pnfs/gridka.de/cms/test/mediufile.sh srm://osg-se.sprace.org.br:8443/srm/managerv2?SFN=/pnfs/sprace.org.br/data/mdias/testes/mediufile.txt
where:
$ srmls srm://osg-se.sprace.org.br:8443/pnfs/sprace.org.br/data/mdias/testes/smallfile_from_cern.osgse
3033 /pnfs/sprace.org.br/data/mdias/testes/smallfile_from_cern.osgse
$ srmls srm://osg-se.sprace.org.br:8443/pnfs/sprace.org.br/data/mdias/testes/mediumfile_from_fnal.gsiftp.osg-se.2
616920 /pnfs/sprace.org.br/data/mdias/testes/mediumfile_from_fnal.gsiftp.osg-se.2
We changed
set context -c gsiftpPoolManagerTimeout 5400
set context -c gsiftpMaxRetries 80
to
set context -c gsiftpPoolManagerTimeout 3600
set context -c gsiftpMaxRetries 3
to decrease the load on our gridftp and pnfs servers.
02/27 09:59:41,214 FTP Door: Transfer error. Sending kill to pool spraid01_3 for mover 11950
02/27 09:59:41 Cell(GFTP-osg-se-Unknown-114@gridftp-osg-seDomain) : CellMessage From : [>spraid01_3@spraid01Domain:*@spraid01Domain:PoolManager@dCacheDomain:*@dCacheDomain]
02/27 09:59:41 Cell(GFTP-osg-se-Unknown-114@gridftp-osg-seDomain) : CellMessage To : [*@dCacheDomain:PoolManager@dCacheDomain:*@gridftp-osg-seDomain:>GFTP-osg-se-Unknown-114@gridftp-osg-seDomain]
02/27 09:59:41 Cell(GFTP-osg-se-Unknown-114@gridftp-osg-seDomain) : CellMessage Object : (33)=Unexpected Exception : org.dcache.ftp.FTPException: Stream ended before EOD
02/27 09:59:41 Cell(GFTP-osg-se-Unknown-114@gridftp-osg-seDomain) :
02/27 09:59:41,359 FTP Door: Transfer error. Removing incomplete file 000100000000000000D4ACF8: /pnfs/sprace.org.br/data/mdias/testes/mediumfile_from_fnal.gsiftp.osg-se.2_trivial
02/27 09:59:41,451 FTP Door: Failed to delete 000100000000000000D4ACF8: Not in trash: 000100000000000000D4ACF8
02/27 09:59:41,452 FTP Door: Transfer error: 451 Aborting transfer due to session termination
Insufficient number of streams? Let's increase it in our gridftpdoor.batch files (pool and server):
set context -c gsiftpMaxStreamsPerClient 20 #was 10
set context -c gsiftpMaxLogin 300 #was 100
We tried again to use our internal interface, to speed things up:
set context -c gsiftpAdapterInternalInterface 192.168.1.151 #was ""
set context -c gsiftpIoQueue WAN #was ""
and increased our memory in dCacheSetup:
java_options="-server -Xmx2048m -XX:MaxDirectMemorySize=2048m #was 512m
We also shut down our billing for a moment:
billingToDb=no
$ more /etc/sysctl.conf
#Tuning
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 0 # turns TCP timestamp support off, default 1, reduces CPU use
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_sack = 0 # turn SACK support off, default on
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
vm.min_free_kbytes = 65536
vm.overcommit_memory = 2
(We used RTT*Max_Bandwidth*1000/8 to guess these numbers, where the maximum bandwidth was 1000 Mb/s and the RTT to FNAL was taken as 125 ms.) These changes can be applied without a reboot:
$ sysctl -p
We also changed
$ /sbin/ifconfig eth0 txqueuelen 10000
$ /sbin/ifconfig eth1 txqueuelen 10000
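The buffer estimate RTT*Max_Bandwidth*1000/8 can be worked through as a short sketch, with RTT in ms and bandwidth in Mb/s so that the result is in bytes. The function names bdp_bytes and next_pow2 are hypothetical helpers, and the rounding up to a power of two is an assumption about how a value like 16777216 could be reached, not something stated in this page.

```shell
#!/bin/bash
# bdp_bytes RTT_MS BANDWIDTH_MBPS
# Bandwidth-delay product in bytes: ms * Mb/s * 1000 gives bits, / 8 gives bytes.
bdp_bytes() {
    local rtt_ms="$1" bw_mbps="$2"
    echo $(( rtt_ms * bw_mbps * 1000 / 8 ))
}

# next_pow2 N
# Smallest power of two that is >= N (assumed rounding step, see lead-in).
next_pow2() {
    local n="$1" p=1
    while (( p < n )); do
        (( p *= 2 ))
    done
    echo "$p"
}

bdp=$(bdp_bytes 125 1000)
echo "BDP: $bdp bytes, next power of two: $(next_pow2 "$bdp")"
```

With 125 ms and 1000 Mb/s this gives 15625000 bytes, and the next power of two is 16777216, the maximum buffer size used in the sysctl settings.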
globus-url-copy -vb gsiftp://cmsstor89.fnal.gov:2811///WAX/11/store/PhEDEx_LoadTest07/LoadTest07_Prod_FNAL/LoadTest07_FNAL_B4 gsiftp://spraid01.sprace.org.br:2811//mdias/testes/fnal_test
Source: gsiftp://cmsstor89.fnal.gov:2811///WAX/11/store/PhEDEx_LoadTest07/LoadTest07_Prod_FNAL/
Dest: gsiftp://spraid01.sprace.org.br:2811//mdias/testes/
LoadTest07_FNAL_B4 -> fnal_test
3932160 bytes 0.13 MB/sec avg 0.13 MB/sec inst
Our traceroute is (at spraid01)
$ traceroute cmsstor89.fnal.gov
traceroute to cmsstor89.fnal.gov (131.225.205.211), 30 hops max, 38 byte packets
 1 200.136.80.1 (200.136.80.1) 0.413 ms 0.358 ms 0.332 ms
 2 143-108-254-241.ansp.br (143.108.254.241) 0.783 ms 0.738 ms 0.735 ms
 3 143-108-254-50.ansp.br (143.108.254.50) 0.991 ms 0.948 ms 1.037 ms
 4 ansp-whren-stm.ampath.net (198.32.252.229) 109.343 ms 109.513 ms 109.422 ms
 5 max-ampath.es.net (198.124.194.5) 140.242 ms 147.058 ms 140.413 ms
Icmp checksum is wrong
 6 clevcr1-ip-washcr1.es.net (134.55.222.57) 148.033 ms Icmp checksum is wrong 147.962 ms Icmp checksum is wrong 147.975 ms
Icmp checksum is wrong
 7 chiccr1-ip-clevcr1.es.net (134.55.217.54) 157.163 ms Icmp checksum is wrong 157.065 ms Icmp checksum is wrong 157.060 ms
Icmp checksum is wrong
 8 fnalmr1-ip-chiccr1.es.net (134.55.219.122) 158.444 ms Icmp checksum is wrong 158.557 ms Icmp checksum is wrong 158.798 ms
 9 fnalmr2-ip-fnalmr3.es.net (134.55.41.42) 158.240 ms 158.397 ms 158.205 ms
10 te4-2-esnet.r-s-bdr.fnal.gov (198.49.208.230) 158.373 ms 158.360 ms 158.376 ms
11 131.225.15.201 (131.225.15.201) 158.558 ms 158.443 ms 158.439 ms
12 vlan608.r-s-hub-fcc.fnal.gov (131.225.102.3) 158.424 ms 158.318 ms 158.325 ms
13 s-cms-fcc2.fnal.gov (131.225.15.54) 159.239 ms 159.613 ms 159.146 ms
14 cmsstor89.fnal.gov (131.225.205.211) 159.448 ms 159.104 ms 158.964 ms
We checked whether we have hardware problems, looking for errors, dropped packets, overruns, frame or carrier failures in:
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:11:43:E5:06:3A
     inet addr:200.136.80.6 Bcast:200.136.80.255 Mask:255.255.255.0
     inet6 addr: fe80::211:43ff:fee5:63a/64 Scope:Link
     UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
     RX packets:89324608 errors:0 dropped:0 overruns:0 frame:0
     TX packets:107777879 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:10000
     RX bytes:2694181555 (2.5 GiB) TX bytes:4063241173 (3.7 GiB)
     Base address:0xecc0 Memory:df9e0000-dfa00000
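The same error counters can also be read from the per-interface statistics files that Linux exposes under /sys/class/net/<interface>/statistics. The sketch below prints only the counters that are not zero; the function name nonzero_counters is hypothetical, and the mapping of the ifconfig "overruns" field to the fifo counters is an assumption.

```shell
#!/bin/bash
# nonzero_counters STATS_DIR
# Prints name=value for every error counter in STATS_DIR that is not zero.
# STATS_DIR is expected to look like /sys/class/net/eth0/statistics.
# Assumption: ifconfig "overruns" corresponds to rx_fifo_errors / tx_fifo_errors.
nonzero_counters() {
    local dir="$1" name value
    for name in rx_errors rx_dropped rx_fifo_errors rx_frame_errors \
                tx_errors tx_dropped tx_fifo_errors tx_carrier_errors collisions; do
        # Skip counters that this kernel or driver does not expose.
        [ -r "$dir/$name" ] || continue
        value=$(cat "$dir/$name")
        if [ "$value" -ne 0 ]; then
            echo "$name=$value"
        fi
    done
}
```

For the interface checked above this would be called as `nonzero_counters /sys/class/net/eth0/statistics`; no output means all listed counters are zero.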
Some additional changes done:
ethtool -g eth0
ethtool -G eth1 rx 4096
Building bing locally:
wget http://debian.inode.at/debian/pool/main/b/bing/bing_1.1.3.orig.tar.gz
tar -xvzf bing_1.1.3.orig.tar.gz
cd bing_1.1.3
make
su
Using -S (ctrl-c to stop):
./bing -S 100 200.136.80.5 cmssrm.fnal.gov
Same errors at
mtr cmssrm.fnal.gov -s 1000
Using this we estimated our packet loss (osg-se) and compared it with shell.ift.unesp.br:
for ((i=1024; i<65507; i+=1024)); do
  export loss=`ping cmssrm.fnal.gov -c 20 -s $i | grep loss | cut -d' ' -f6`
  echo $i $loss
done
The results are in the attached graph perda_pacotes.jpeg.
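One caveat with the loop above: `cut -d' ' -f6` depends on the position of the loss field, and ping adds a "+N errors," field to its summary line when it receives ICMP errors, which shifts that position. A slightly more robust extraction, sketched here with a hypothetical helper name loss_percent, matches the number in front of "% packet loss" instead.

```shell
#!/bin/bash
# loss_percent
# Reads ping output on stdin and prints the packet-loss percentage (without %).
# Matches the number directly before "% packet loss", wherever it appears.
loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

Inside the loop it would be used as `loss=$(ping cmssrm.fnal.gov -c 20 -s $i | loss_percent)`.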
We also checked our network card speed configuration:
ethtool eth0
Speed: 1000Mb/s
Maybe this link is full due to our normal activities.
[root@osg-se mdias]# ./pathchar cmssrm.fnal.gov
pathchar to cmssrm.fnal.gov (131.225.207.12)
 can't find path mtu - using 1500 bytes.
 doing 32 probes at each of 45 sizes (64 to 1500 by 32)
 0 localhost
 | 341 Mb/s, 157 us (350 us)
 1 200.136.80.1 (200.136.80.1)
 | ?? b/s, 191 us (729 us)
 2 143-108-254-241.ansp.br (143.108.254.241)
 | 981 Mb/s, 83 us (0.91 ms), 3% dropped
 3 143-108-254-50.ansp.br (143.108.254.50)
 | 63 Mb/s, 54.2 ms (109 ms), 4% dropped
 4 ansp-whren-stm.ampath.net (198.32.252.229)
 | ?? b/s, 3.86 ms (148 ms), 16% dropped
and the same to FZK:
[root@osg-se mdias]# ./pathchar gridka-dCache.fzk.de
pathchar to gridka-dCache.fzk.de (192.108.45.38)
 can't find path mtu - using 1500 bytes.
 doing 32 probes at each of 45 sizes (64 to 1500 by 32)
 0 osg-se (200.136.80.5)
 | 278 Mb/s, 157 us (356 us)
 1 200.136.80.1 (200.136.80.1)
 | ?? b/s, 190 us (727 us)
 2 143-108-254-241.ansp.br (143.108.254.241)
 | ?? b/s, 104 us (0.93 ms), 6% dropped
 3 143-108-254-50.ansp.br (143.108.254.50)
 | 56 Mb/s, 54.2 ms (109 ms), 7% dropped
 4 ansp-whren-stm.ampath.net (198.32.252.229)
 | ?? b/s, 6.98 ms (123 ms), 7% dropped
Attachment: perda_pacotes.jpeg (packet loss graph, 18.8 K, 2009-03-04 16:24)