Upgrade to OSG 1.0

We started our upgrade to 1.0:
  • Stopping our gatekeeper on our computing element
/etc/init.d/xinetd stop
  • On our computing element, backing up the old installation and getting pacman
cp -a  /opt/osg-0.8.0 /home/mdias/osg_backup0.8
cd /opt
wget http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-3.26.tar.gz
tar --no-same-owner -xzvf pacman-3.26.tar.gz
cd pacman-3.26
source setup.sh
cd /opt
mkdir osg-1.0.0
cd /opt/osg-1.0.0
vdt-control --off
disabling init service tomcat-55... ok
disabling cron service gratia-condor... ok
disabling init service condor... ok
disabling init service syslog-ng... ok
disabling init service tomcat-5... ok
disabling init service osg-rsv... ok
disabling init service apache... ok
disabling init service condor-devel... ok
disabling cron service vdt-update-certs... ok
disabling init service MLD... ok
disabling cron service gums-host-cron... ok
disabling cron service edg-mkgridmap... ok
disabling init service globus-ws... ok
disabling init service mysql... ok
disabling inetd service gsiftp... ok
disabling inetd service globus-gatekeeper... ok
disabling init service gris... ok
disabling cron service vdt-rotate-logs... ok
disabling cron service fetch-crl... ok
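
With this many services it is easy to miss one that did not stop cleanly. A minimal sketch, assuming every successful status line ends in "... ok" as in the output above; the function name failed_services is ours, and since we do not know how vdt-control words a failure, it simply reports any status line that does not end in "... ok":

```shell
# Print every vdt-control status line that did not end in "... ok".
# Reads captured output on stdin; prints nothing when all services succeeded.
# failed_services is a hypothetical helper, not part of VDT.
failed_services() {
    grep -E '^(disabling|enabling) ' | grep -v -E '\.\.\. ok$' || true
}
```

It could be used as: vdt-control --off 2>&1 | failed_services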

export OLD_VDT_LOCATION=/opt/osg-0.8.0
pacman -get OSG:ce
source setup.sh
pacman -get OSG:Globus-Condor-Setup
The last step installs the Condor batch system. We have to inspect its configuration:
 vim  /opt/osg-1.0.0/condor/etc/condor_config
Managed fork issues: installing and configuring it
pacman -get  OSG:ManagedFork
 $VDT_LOCATION/vdt/setup/configure_globus_gatekeeper --managed-fork y --server y
Gums server:
pacman -get  OSG:gums
 vim /opt/osg-1.0.0/MonaLisa/Service/VDTFarm/ml.properties
$VDT_LOCATION/vdt/setup/configure_monalisa --prompt
vim  /opt/osg-1.0.0/MonaLisa/Service/CMD/ml_env
JAVA_HOME=/opt/osg-1.0.0/jdk1.5  <-------was jdk1.4
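
The same JAVA_HOME change can be made without opening an editor. A minimal sketch, assuming the assignment sits at the start of a line with no "export" in front of it (we have not checked every ml_env layout); set_java_home is a hypothetical helper name:

```shell
# Rewrite the JAVA_HOME assignment in an ml_env-style file.
# $1 = file to edit, $2 = new JDK directory.
# A copy of the original is kept next to it with a .bak suffix.
set_java_home() {
    sed -i.bak "s|^JAVA_HOME=.*|JAVA_HOME=$2|" "$1"
}
```

For example: set_java_home /opt/osg-1.0.0/MonaLisa/Service/CMD/ml_env /opt/osg-1.0.0/jdk1.5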
(Check this again after the services are on: it was changed.) GUMS host cron:
vdt-register-service -name gums-host-cron --enable
vdt-register-service: updated cron service 'gums-host-cron'
vdt-register-service: desired state = enable
vdt-register-service: cron time     = '9 8,14,20,2 * * *'
vdt-register-service: cron command  = '/opt/osg-1.0.0/gums/scripts/gums-host-cron'

 vdt-control --enable gums-host-cron
Some GUMS configuration:
cp /opt/osg-1.0.0/post-install/prima-authz.conf /etc/grid-security/.
cp /opt/osg-1.0.0/post-install/gsi-authz.conf /etc/grid-security/.
/opt/osg-1.0.0/tomcat/v55/webapps/gums/WEB-INF/scripts/gums-add-mysql-admin "/DC=org/DC=doegrids/OU=People/CN=Marco Dias 280904"
We have to inspect these files

Reusing the old configuration

cd /opt/osg-1.0.0/monitoring
source ../setup.sh
export OLD_VDT_LOCATION=/opt/osg-0.8.0/
./configure-osg.py -e
vim extracted-config.ini
We can verify it and repair any errors:
./configure-osg.py -v -f ./extracted-config.ini
 cp /opt/osg-0.8.0/monitoring/grid3-user-vo-map.txt
vim /opt/osg-1.0.0/monitoring/extracted-config.ini
Installing it:
  ./configure-osg.py -c -f ./extracted-config.ini 
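
Since the verify and configure steps take the same file, they can be chained so the configuration is only applied when verification passes. A minimal sketch using only the -v, -c and -f options shown above; the wrapper name verify_then_configure and the CONFIGURE override variable are our own, and we assume configure-osg.py returns a non-zero exit status when verification fails:

```shell
# Run the verify step, and run the configure step only if verification succeeded.
# $1 = path to the ini file. CONFIGURE may point at another command (for testing).
verify_then_configure() {
    cmd="${CONFIGURE:-./configure-osg.py}"
    "$cmd" -v -f "$1" && "$cmd" -c -f "$1"
}
```

For example, from /opt/osg-1.0.0/monitoring: verify_then_configure ./extracted-config.ini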
Removing and recreating symbolic links:
clcmd umount /OSG/
unlink /OSG 
ln -s /opt/osg-1.0.0 /OSG
vim /etc/exports
 /etc/init.d/nfs restart
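
Before remounting on the nodes, it is worth confirming that the link really points at the new tree. A minimal sketch; link_points_to is a hypothetical helper, and it compares the literal link target, so the second argument must be written exactly as it was given to ln -s:

```shell
# Succeed only if $1 is a symbolic link whose target is exactly $2.
link_points_to() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}
```

For example: link_points_to /OSG /opt/osg-1.0.0 || echo "/OSG does not point at the new installation"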
Set Globus-Base-WSGRAM-Server
/opt/osg-1.0.0/vdt/setup/configure_prima_gt4 --enable --gums-server osg-ce.sprace.org.br
/etc/init.d/globus-ws stop
/etc/init.d/globus-ws start


vim /opt/osg-1.0.0/condor-cron/local.osg-ce/condor_config.local
chown condor: /var/lock/subsys/condor-cron/
We also edited our condor-cron configuration file:
vim $VDT_LOCATION/condor-cron/etc/condor_config

Turning on services

vdt-control --on
enabling cron service fetch-crl... ok
enabling cron service vdt-rotate-logs... ok
enabling cron service vdt-update-certs... ok
skipping init service 'gris' -- marked as disabled
enabling inetd service globus-gatekeeper... ok
enabling inetd service gsiftp... ok
enabling init service mysql... ok
enabling init service globus-ws... ok
skipping cron service 'edg-mkgridmap' -- marked as disabled
enabling cron service gums-host-cron... ok
enabling init service MLD... ok
enabling init service condor-cron... ok
enabling init service apache... ok
skipping init service 'osg-rsv' -- marked as disabled
enabling init service tomcat-55... ok
enabling init service syslog-ng-sender... ok
enabling init service condor... ok
enabling cron service gratia-condor... ok
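
Some services are skipped because they are marked as disabled, which is easy to overlook in a long listing. A minimal sketch that pulls their names out of the captured output, assuming the "skipping ... -- marked as disabled" wording shown above; skipped_services is a hypothetical helper name:

```shell
# Print the name of each service that vdt-control skipped as disabled.
# Reads captured output on stdin, one service name per output line.
skipped_services() {
    sed -n "s/^skipping [a-z]* service '\([^']*\)' -- marked as disabled\$/\1/p"
}
```

It could be used as: vdt-control --on 2>&1 | skipped_services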

Configuring CEMon and checking whether GIP is working

$VDT_LOCATION/vdt/setup/configure_cemon --consumer https://osg-ress-1.fnal.gov:8443/ig/services/CEInfoCollector --topic OSG_CE

Some notes: GUMS reused the old configuration, but it also picked up the old password in gums.config,

so we had to inspect vdt-install.log to find where the new password was set, and insert it in that field.

An issue with Condor: condor_status was fine, but condor_q froze. We suspected a port conflict with condor-cron. We configured:

COLLECTOR_HOST  = $(CONDOR_HOST):9619 #was 9618
HIGHPORT = 65100 #commented out 9700
LOWPORT = 65001 #commented out 9600
CREDD_PORT                      = 9622 #was 9620
STORK_PORT                      = 9623 #was 9621
and in our Condor configuration:
HIGHPORT = 65000
LOWPORT = 63001
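
The point of the two LOWPORT/HIGHPORT pairs is that the ranges must not share any port. A minimal sketch of that check in plain shell arithmetic; ranges_overlap is a hypothetical helper, and it takes the bounds as low1 high1 low2 high2:

```shell
# Succeed (exit 0) when the inclusive ranges [$1, $2] and [$3, $4]
# have at least one port in common, fail otherwise.
ranges_overlap() {
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For the values above: ranges_overlap 65001 65100 63001 65000 && echo "overlap" || echo "no overlap"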
But the real problem was a bug in Condor version 7.0.2, so we reinstalled with version 7.0.3:
cd /OSG/
cp -a condor condor.old
cd condor
 rm -rf *
cd ..
tar -xvzf condor-7.0.3-linux-x86-rhel3.tar.gz
 cd condor-7.0.3
./condor_configure --install --maybe-daemon-owner --make-personal-condor \
    --install-log ../post-install/README --install-dir /OSG/condor
 cd /OSG/condor/etc/
 mv condor_config condor_config.bck
cp /OSG/condor.old/etc/condor_config .

  • Installing RSV
cd /opt/osg-1.0.0/monitoring/
vim extracted-config.ini
enabled = True
rsv_user = mdias
enable_ce_probes = True
ce_hosts = osg-ce.sprace.org.br
enable_gridftp_probes = True
gridftp_dir = /tmp
enable_gums_probes =  False
gums_hosts = osg-ce.sprace.org.br
enable_srm_probes = True
srm_hosts = osg-se.sprace.org.br
srm_dir = /pnfs/sprace.org.br/data/mdias
use_service_cert = False
proxy_file = /tmp/x509up_u537
enable_gratia = True
print_local_time = True
setup_rsv_nagios = False
setup_for_apache = True
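
After editing, individual settings can be read back to confirm what configure-osg.py will see. A minimal sketch, assuming plain "key = value" lines; ini_get is a hypothetical helper, and it ignores section headers, so it only suits keys that appear once in the file (rsv_user, for instance, rather than a key repeated in several sections):

```shell
# Print the value of the first "key = value" line for the given key.
# $1 = file, $2 = key. Blanks around the value are dropped.
# Section headers are ignored (assumption: the key is unique in the file).
ini_get() {
    sed -n "s/^[[:space:]]*${2}[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p" "$1" | head -n 1
}
```

For example: ini_get extracted-config.ini rsv_user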
Now reconfigure and restart the services:
 vdt-control --off
./configure-osg.py -c -f ./extracted-config.ini 
vdt-control --on 
and check the result:
tail -f $VDT_LOCATION/osg-rsv/logs/consumers/gratia-script-consumer.out
and also look at https://osg-ce.sprace.org.br:8443/rsv

  • On our nodes, stop Condor
clcmd /etc/init.d/condor stop
clcmd mount /OSG
In an NFS-shared directory:
cd /home/mdias
wget http://physics.bu.edu/pacman/sample_cache/tarballs/pacman-3.26.tar.gz
tar --no-same-owner -xzvf pacman-3.26.tar.gz
Then we created a script (/home/mdias/worknodeinstall.sh) like this:
source /opt/OSG-wn-client/setup.sh
vdt-control --off
cd /home/mdias/pacman-3.26
source setup.sh
mv /opt/OSG-wn-client /opt/OSG-wn-client.old
mkdir /opt/OSG-wn-client
cd /opt/OSG-wn-client
pacman -trust-all-caches -get OSG:wn-client
mv /var/log/glexec /var/log/glexec.old
mv  /etc/glexec  /etc/glexec.old
mkdir /opt/glexec
cd /opt/glexec
pacman -trust-all-caches    -get http://vdt.cs.wisc.edu/vdt_181_cache:Glexec
sed -i 's/yourmachine.yourdomain/osg-ce.sprace.org.br/g'  /etc/glexec/contrib/gums_interface/getmapping.cfg
source setup.sh
vdt-control --on
mkdir /opt/glexec/glite/etc
cp /OSG/glite/etc/vomses /opt/glexec/glite/etc/.
Then we ran it on our nodes:
/root/bin/clcmd /home/mdias/worknodeinstall.sh
/root/bin/clcmd rm -rf /opt/OSG-wn-client.old
We have to edit one /etc/glexec/glexec.conf, adding a "linger = no" line to the [glexec] section, and copy it to the other nodes:
clcmd cp -f /home/mdias/glexec.conf /etc/glexec/.
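
The edit to glexec.conf can also be scripted so that it is applied the same way every time. A minimal sketch, assuming the section header is written exactly as [glexec] at the start of a line and that "linger = no" with spaces around the equals sign is accepted; add_linger is a hypothetical helper name:

```shell
# Add "linger = no" right after the [glexec] section header.
# $1 = path to glexec.conf. Does nothing if a linger setting already exists.
# A copy of the original is kept with a .bak suffix.
add_linger() {
    grep -q '^[[:space:]]*linger[[:space:]]*=' "$1" && return 0
    sed -i.bak '/^\[glexec\]/a\
linger = no' "$1"
}
```

For example: add_linger /home/mdias/glexec.conf, followed by the clcmd copy above.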
We also have to edit this on our computing element, before reusing the old configuration:
 vim /opt/osg-1.0.0/monitoring/extracted-config.ini
glexec_location = /opt/glexec/glexec-osg
Testing glexec
cd /etc/glexec
source setup.sh 
voms-proxy-init -voms cms:/cms
export GLEXEC_CLIENT_CERT=/tmp/x509up_u537  
cd /opt/glexec/
glexec-osg/sbin/glexec /usr/bin/id

An SRM error forced us to upgrade our dCache to make SRM compatible. So on our storage element (osg-se):

/opt/init.d/dcache-pool stop
/opt/init.d/dcache-core stop
wget http://www.dcache.org/downloads/1.8.0/dcache-server-1.8.0-15p7.noarch.rpm
wget http://www.dcache.org/downloads/1.8.0/dcache-srmclient-1.8.0-15p7.noarch.rpm
rpm -Uvh dcache-server-1.8.0-15p7.noarch.rpm dcache-srmclient-1.8.0-15p7.noarch.rpm
Restarting dcache-core and the pools:
 /opt/d-cache/bin/dcache-core start
/opt/d-cache/bin/dcache-pool start
The restart output looks like this:
[root@spraid02 ~]# /opt/d-cache/bin/dcache-core start
This script is deprecated and will be removed in a future
release. Please use /opt/d-cache/bin/dcache start instead.
Starting dcache services: 
Starting gridftp-spraid02Domain  Done (pid=1147)
[root@spraid02 ~]# /opt/d-cache/bin/dcache-pool start
This script is deprecated and will be removed in a future
release. Please use /opt/d-cache/bin/dcache start pool instead.
WARNING: the variable DCACHE_HOME is not set.
WARNING: Using deprecated value of DCACHE_BASE_DIR as DCACHE_HOME
start dcache pool: Starting spraid02Domain  Done (pid=1271)
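
The start scripts report "Done" with a pid, but that only shows the process was launched. A minimal sketch that extracts the reported pids so they can be checked afterwards, assuming the "Done (pid=N)" wording shown above; started_pids is a hypothetical helper name:

```shell
# Print the pid from every "... Done (pid=N)" line read on stdin.
started_pids() {
    sed -n 's/.*Done (pid=\([0-9][0-9]*\)).*/\1/p'
}
```

For example, each pid printed by /opt/d-cache/bin/dcache-core start | started_pids can then be tested with kill -0 to see whether the process is still alive.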

  • In our storage elements
 unlink /opt/osg-0.8.0
 ln -s /OSG /opt/osg-1.0.0

Configuring the dCache information provider:

 yum install postgresql.i386
What is the hostname of your dCache admin interface?
Configuration saved.  If you would like to alter any choices without 
re-running this configuration script, you may find these answers in:

 chown daemon:root $VDT_LOCATION/gip/etc/dcache_storage.conf


Marco at 07/10/2008

Apply a GIP patch
--- gip/libexec/services_info_provider.py    (revision 2030)
+++ gip/libexec/services_info_provider.py    (working copy)
@@ -102,6 +102,16 @@
 def print_srm(cp, admin):
     sename = cp.get("se", "unique_name")
     sitename = cp.get("site", "unique_name")
+    # BUGFIX: Resolve the IP address of srm host that the admin specifies.
+    # If this IP address matches the IP address given by dCache, then we will
+    # print out the admin-specified hostname instead of looking it up.  This
+    # is for sites where the SRM host is a CNAME instead of the A name.
+    srm_host = cp_get(cp, "se", "srm_host", None)
+    if srm_host:
+        try:
+            srm_ip = socket.gethostbyname(srm_host)
+        except:
+            srm_ip = None
     #vos = [i.strip() for i in cp.get("vo", "vos").split(',')]
     vos = voListStorage(cp)
     ServiceTemplate = getTemplate("GlueService", "GlueServiceUniqueID")
@@ -121,8 +131,11 @@
         hostname = hostname.split(',')[0]
             hostname = socket.getfqdn(hostname)
+            hostname_ip = socket.gethostbyname(hostname)
-            pass
+            hostname_ip = None
+        if hostname_ip != None and hostname_ip == srm_ip and srm_host !=
+            hostname = srm_host
         info = {
                 "serviceType"  : "SRM",
                 "acbr"         : acbr[1:], 

one more patch
--- vdt/setup/configure_gip    (revision 2047)
+++ vdt/setup/configure_gip    (working copy)
@@ -1337,6 +1337,19 @@


+    # BUGFIX: se_access is not filled in if you are using dynamic_dcache, but
+    # this global variable must be populated for the CESE bind to work right
+    # later on
+    if ($srm && $dynamic_dcache==1) {
+        foreach $vo (@vo_list){
+            my $se_access_root = $vo_access_roots{$vo};
+            if ($se_access_root!~/^$/) {
+                $se_access{$vo} = $se_access_root;
+            }
+        }
+    }
     my $service_config_file =
     safe_write($service_config_file, $service_contents);
     vdt_install_log("===== BEGIN osg-info-static-service.conf =====\n");
@@ -1536,8 +1549,9 @@

             if ($srm && exists $se_access{$vo} && defined $sa_path) {
+                # BUGFIX: use $se_access{$vo} instead of
                 $contents = $contents."\ndn: GlueCESEBindSEUniqueID=$se_host,
-                                     ."GlueCESEBindCEAccesspoint:
+                                     ."GlueCESEBindCEAccesspoint:

Now run configure_gip again. Another patch fixes an error when running $GIP_LOCATION/libexec/token_info_provider.py

--- gip/lib/python/gip_cese_bind.py    (revision 2047)
+++ gip/lib/python/gip_cese_bind.py    (working copy)
@@ -55,6 +55,8 @@
     ce_list = getCEList(cp)
     se_list = getSEList(cp)
     access_point = cp_get(cp, "vo", "default", "/")
+    if not access_point:
+        access_point = "/UNAVAILABLE"
     for ce in ce_list:
         for se in se_list:
             info = {'ceUniqueID' : ce, 
It is not necessary to run configure_gip again.

Marco at 07/16/2008

CEMon was not reporting to the BDII database. Parag Mhashilkar kindly helped us with it:

 Coincidentally, I have seen similar error in catalina.out. The
 admin their claimed that they managed to fix the problem by
 putting xercesImpl.jar in $VDT_LOCATION/tomcat/v55/common/endorsed.

 The claim is that, this missing jar file is resulting in some
 strange interference between gums and cemon installation. You can
 find this jar at couple of places within vdt itself.

 Once you have this file in above dir, stop tomcat, make sure that
 everything in $GLITE_LOCATION/var/cemonitor is deleted (manually)
 and start tomcat.
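
The advice above says the jar exists in a couple of places within VDT without naming them. A minimal sketch for locating the copies; find_jar is a hypothetical helper, and which copy to put into the endorsed directory is left to the administrator:

```shell
# List every regular file with the given name under a directory tree.
# $1 = root directory to search, $2 = file name (for example xercesImpl.jar).
find_jar() {
    find "$1" -type f -name "$2" 2>/dev/null
}
```

For example: find_jar "$VDT_LOCATION" xercesImpl.jar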

Marco at 07/18/2008

For our dCache GIP to work, we have to edit:
 more /OSG/monitoring/extracted-config.ini 
dynamic_dcache = 1
cd /OSG/monitoring
./configure-osg.py -c -f ./extracted-config.ini

-- MarcoAndreFerreiraDias - 23 Jun 2008

