Alpgen Production Workflow

Introduction

This TWiki page documents the sequence of steps for running a large-scale ALPGEN event production at SPRACE upon request. It assumes the following conditions:

  • Access to an account on the access server at SPRACE.
  • Know-how on setting up a CMSSW work area (here we document CMSSW 3_1_X; please modify the instructions accordingly for other releases).
  • Know-how on setting up and using the CONDOR batch system.
  • Presence of grids for that given process. Grids are files used by ALPGEN to allow a more efficient exploration of the phase space. They are to be supplied by the requester.
  • Presence of efficiencies for that given process. The ALPGEN production comprises three phases: generation of weighted events, generation of unweighted events, and generation of matched events. Generally, a request is made in terms of a number of MATCHED events (or, equivalently, in terms of a physical luminosity). The number of events you actually generate has to take all three efficiencies into account; see more about that below.

Main Sequence

  • Make a grid directory and get grids. In this example, we are generating a Z + bbbar + jets sample at 7 TeV c.o.m. We get the grids and efficiencies from Maurizio Pierini.
mkdir ZbbGrids_7TeV
wget http://cmsdoc.cern.ch/~mpierini/cms/alpgen_7TeV/PRODUCTION_zbb.tar.gz
tar -xzf PRODUCTION_zbb.tar.gz 
mv PRODUCTION ZbbGrids_7TeV
rm PRODUCTION_zbb.tar.gz

  • Make a working directory and subdirectories for each task. In this example, Maurizio wants Z + bbbar + N jets, N from 0 to 3. So we set up a structure for that: one top-level directory and one subdirectory per jet multiplicity (four in total).
mkdir Zbb7TeV 
mkdir Zbb7TeV/zbb_0j
mkdir Zbb7TeV/zbb_1j
mkdir Zbb7TeV/zbb_2j
mkdir Zbb7TeV/zbb_3j

  • Copy relevant files. Note that the "scripts" directory below is actually a placeholder for wherever you keep the scripts shown on this page. The commands given below are just a shorthand way to copy all the relevant files to their places. The structure you're aiming for is to have, inside each subdirectory, an ALPGEN input file (usually named ''input'') and all the Python (.py) files. The shell script files stay at the top-level directory.
cp scripts/*.py scripts/*.sh Zbb7TeV
seq 0 3 | xargs -i bash -c "cp ZbbGrids_7TeV/PRODUCTION/input_zbb_{}j Zbb7TeV/zbb_{}j/input"
ls -d Zbb7TeV/*j | xargs -i bash -c "cp Zbb7TeV/*.py {}"

  • Fix input files. You usually need to fix the input files, at least for the number of warm-up iterations and the number of events asked for. Sometimes you need to add the random seed setup explicitly as well. Usually you want to set up the inputs for NO warm-up iterations and a reasonable number of events (5M is good if the farm is yours; 2.5M or even less if the farm is particularly busy). A sketch of a typical input file is shown after the commands below.
emacs Zbb7TeV/zbb_0j/input 
emacs Zbb7TeV/zbb_1j/input 
emacs Zbb7TeV/zbb_2j/input 
emacs Zbb7TeV/zbb_3j/input 
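
For orientation, here is a minimal sketch of such an input file, assuming the standard ALPGEN v2 layout (five mandatory lines, then process-specific parameters). The label, counts and parameters below are purely illustrative; use the values that match the grids you were given:
1            ! imode (1 = generation of weighted events)
zbb0j        ! label for output files
2            ! grid: 2 = start from the pre-computed grid, without updating it
0 0          ! Nevents/iteration, N(warm-up iterations) - no warm-up
2500000      ! Nevents generated after warm-up
ebeam 3500   ! beam energy in GeV (7 TeV c.o.m.)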

  • Fix scripts. You need to fix the location of the grids and executables in the doProduction.sh script, and the number of jobs asked for in the doGrandProduction.sh script. Try not to submit more than 200 jobs per batch.
emacs Zbb7TeV/doProduction.sh 
emacs Zbb7TeV/doGrandProduction.sh 

  • Run and pray.
cd Zbb7TeV
./doGrandProduction.sh
condor_q

  • Merge, shift, pack and deploy. You need to merge all the unweighted event files into a single one, and format-shift it to the Les Houches Event (.lhe) format.
cd Zbb7TeV
python sprace_ALPGEN_merge.py 0
Check that the _unw.par file is correctly created and that there are enough events. Create another batch of jobs if needed.
python sprace_ALPGEN_merge.py 1
This will create the .unw file with the actual events, and will suggest a command line for packing (.tar.gz) the original .wgt and .par files.
cmsRun testAlpgenSource_cfg.py
This will shift the unweighted events from .unw to .lhe files. Both .lhe and the .tar.gz created in the step above are suitable for uploading to MCDB.

FAQ

The requester has not given me the efficiencies I need. What do I do?

  • Weighted event efficiency: run an ALPGEN session in imode 1, with the same grids you were given, but asking for a small (~100K) number of events. Take note of the number of weighted events actually produced (the number of weighted events is the number of lines in the .wgt file). The efficiency is the number of events obtained over the number of events asked for.

  • Unweighted event efficiency: run an ALPGEN session in imode 2 over a weighted events input file (.wgt file) containing a known number of events. The _unw.par file produced will have at the very end a line like the following:
 3548  1.0542532 ! unwtd events, lum (pb-1)
The first number (3548 in this case) is the number of unweighted events produced. The efficiency is the number of unweighted events over the number of weighted events.

  • Matching efficiency: using a .unw file containing a known number of unweighted events, run the testAlpgenComplete_cfg.py script, and check for the number given in the line:
********* Fraction of events that fail fragmentation cuts =  0.95000 *********
This number is 1 minus the efficiency, so in the above example the matching efficiency is 1 - 0.95 = 0.05. (The commands below show a quick way to extract the raw counts for the first two efficiencies.)
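
A minimal sketch, assuming the file label is zbb0j (a hypothetical name; use whatever label your input file defines):
wc -l zbb0j.wgt          # weighted events = number of lines in the .wgt file
tail -1 zbb0j_unw.par    # last line: unweighted events and luminosity
The unweighting efficiency is then the first field of the tail output divided by the wc line count.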

How do I calculate the number of events / jobs I need?

Suppose the requester has asked for 1M events in a given channel. Assume the weighted production efficiency E_w is 10%, the unweighting efficiency E_u is 20%, and the matching efficiency E_m is 3%. The total efficiency is then E = E_w*E_u*E_m = 6E-04. So the total number of events you have to produce is 1E06/6E-04 ~ 1.7E09, or roughly 1.7 billion events. Each job should ask for some 2.5M events, and you shouldn't be submitting more than ~200 jobs to the farm in a single batch. So the number of batches needed is 1.7E09/(200*2.5E06) ~ 3.4. Rounding up, we come to the conclusion that 4 batches of 180 jobs, each job asking for 2.5M events, is a good configuration for this request. The arithmetic is sketched below.
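
As a sanity check, the same arithmetic as a throwaway awk one-liner (the efficiencies and limits are the example numbers from above, not universal values):
awk 'BEGIN {
  ew = 0.10; eu = 0.20; em = 0.03    # the three example efficiencies
  ntot = 1.0e6 / (ew * eu * em)      # total events to generate
  jobs = ntot / 2.5e6                # at 2.5M events per job
  printf "events: %.2g  jobs: %.0f  batches of 200: %.2f\n", ntot, jobs, jobs / 200.0
}'
Rounding the last number up reproduces the 4 batches quoted above.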

How do I set up the parameters of the production?

  • All ALPGEN parameters: set in the input file, which usually resides in each working subdirectory.
  • Number of events per job: also set in the input file.
  • Number of jobs: set in the doProduction.sh script - the last two parameters given when you execute the script define a numeric sequence that is used to label and submit the jobs. For instance, if the two parameters are 11 and 20, the script will submit 10 jobs labeled from 11 to 20.
  • Paths to executables and grids: set in the doProduction.sh script - the first two parameters control the executable name and the grid name. Also, there are some internal parameters in the script - look at it before you run it! An example invocation is sketched after this list.
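
Putting this together, a submission call would look something like the line below (the executable and grid names are made-up placeholders; the argument order follows the description above):
./doProduction.sh zbbgen zbb0j.grid 11 20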

Scripts

  • doGrandProduction.sh - script to run the doProduction.sh script for different channels.
  • doProduction.sh - script to submit a batch of jobs to SPRACE. Calls sprace_ALPGEN_submit.py for the heavy lifting.
  • fixFile.sh - postproduction script; adjust and use it for your own postproduction needs.
  • sprace_ALPGEN_merge.py - merging script.
  • sprace_ALPGEN_submit.py - main job submission script.
  • testAlpgenSource_cfg.py - format shift (.unw to .lhe).
  • theBigScript.sh - runs all the postproduction scripts.

WARNING: The scripts attached below are completely out-of-date. Updated versions are always found at /hdacs/shared/scripts.

-- ThiagoTomei - 19 Jan 2010

Topic attachments
Attachment                   Size   Date        Who
doGrandProduction.sh         0.9 K  2009-11-11  ThiagoTomei
doProduction.sh              0.7 K  2009-11-11  ThiagoTomei
fixFile.sh                   0.3 K  2009-11-11  ThiagoTomei
sprace_ALPGEN_merge.py.txt   3.1 K  2009-11-11  ThiagoTomei
sprace_ALPGEN_submit.py.txt  3.3 K  2009-11-11  ThiagoTomei
testAlpgenSource_cfg.py.txt  0.5 K  2009-11-11  ThiagoTomei
theBigScript.sh              1.2 K  2009-11-11  ThiagoTomei