All datasets available at SPRACE (which are AOD or AODSIM) can be found in this link. For 2012, we have the following datasets available:
* Run2012 datasets can be found in this link
* Summer12 MC datasets can be found in this link
* JSON files for 2012 runs at 8 TeV can be found in this link
These links should be useful throughout the workflow described below.
In general, we advocate the following strategy: break the analysis up hierarchically. Run over the large datasets on the GRID, make smaller (skimmed) datasets to run at SPRACE, and from those make PAT-tuples / ntuples small enough to run on your own computer. We think this is the most efficient strategy.
To select only the data you have not analyzed yet, subtract the JSON file of the data you have already used from the most recent certification JSON:

compareJSON.py --sub <mostRecent.json> <dataAlreadyUsed.json> <fileForNewDataOnly.json>
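For reference, these certification files are plain JSON maps from run number to lists of [first, last] luminosity-section ranges, for example (with made-up run numbers) `{"190456": [[1, 87]], "190459": [[1, 30]]}`. The --sub option keeps the lumi sections that appear in the first file but not in the second.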
As a concrete example, suppose we want to run over the /MET/Run2012A-PromptReco-v1/AOD dataset with the rsanalyzer_JetMET_skimming_Run2012A_cfg.py configuration file. We set up a task with around 75 jobs, and copy the output to the remote directory /MET_Run2012A-PromptReco_v1_2012May10, which lives in srm://osg-se.sprace.org.br:8443/srm/managerv2?SFN=/pnfs/sprace.org.br/data/cms/store/user/yourUserName/MET_Run2012A-PromptReco_v1_2012May10. Naturally, you have to change these values to the ones you want. The CRAB configuration file looks like this:

```
[CRAB]
jobtype = cmssw
scheduler = glite
use_server = 0

[CMSSW]
datasetpath = /MET/Run2012A-PromptReco-v1/AOD
pset = rsanalyzer_JetMET_skimming_Run2012A_cfg.py
total_number_of_lumis = -1
number_of_jobs = 75
lumi_mask = fileForNewDataOnly.json
get_edm_output = 1

[USER]
copy_data = 1
return_data = 0
storage_element = T2_BR_SPRACE
user_remote_dir = /MET_Run2012A-PromptReco_v1_2012May10
ui_working_dir = myWorkingDirName

[GRID]
ce_white_list = T2_BR_SPRACE
```
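With the configuration in place, the task would be created and submitted with the usual CRAB2 commands (we assume here that the configuration above was saved as crab.cfg; adapt the name if needed):

```
crab -create -cfg crab.cfg
crab -submit -c myWorkingDirName
```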
Once the jobs are done, check the task status, retrieve the output, and make the final report:

```
crab -status -c myWorkingDirName
crab -getoutput -c myWorkingDirName
crab -report -c myWorkingDirName
```

The report step writes the file myWorkingDirName/res/lumiSummary.json. This file represents exactly the data over which you ran, taking into account failed jobs, blocks of data which were not yet available, etc.
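As a quick sanity check (our own illustrative sketch, not part of the original recipe), the report can be inspected with a few lines of Python, since it is a plain JSON map from run number to lumi-section ranges:

```python
import json

# Path produced by "crab -report" above.
with open("myWorkingDirName/res/lumiSummary.json") as f:
    lumis = json.load(f)

# Each value is a list of [first, last] luminosity-section ranges.
n_ls = sum(last - first + 1 for ranges in lumis.values() for first, last in ranges)
print("Runs:", len(lumis), "- luminosity sections:", n_ls)
```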
You can calculate the integrated luminosity corresponding to this JSON file with the lumiCalc2.py script:

lumiCalc2.py -b stable -i lumiSummary.json overview
Finally, merge the JSON file of the data you have just run over with the one describing the data you had already used, to keep track of the total analyzed dataset:

mergeJSON.py previousData.json dataYouJustRanOver.json --output=totalData.json
The following picture (the attached Slide1.png) shows this process schematically:
Naturally, it depends on your specific analysis channel. Remember that the goal is to separate the analysis hierarchically: run over large datasets using the GRID, preselect/reduce them to more manageable sizes, bring them to SPRACE and run the rest of the analysis more or less locally. If you make a very complicated preselection on the GRID, it starts to become comparable to doing the whole analysis there, which defeats the whole idea. So, some general points:
The first point is the trigger: a simple way to preselect events is to require that they fired a given set of HLT paths, using the TriggerResultsFilter module. You can see an example of trigger-based skimming in this link, and a minimal sketch follows below.
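The sketch below is our own illustration, not taken from the linked example; the HLT path name is a hypothetical placeholder, while the parameters follow the standard triggerResultsFilter interface of that CMSSW era:

```python
import FWCore.ParameterSet.Config as cms

# Keep only events that fired the given HLT path(s).
triggerSelection = cms.EDFilter("TriggerResultsFilter",
    triggerConditions = cms.vstring("HLT_PFJet320_v*"),   # hypothetical path, set your own
    hltResults = cms.InputTag("TriggerResults", "", "HLT"),
    l1tResults = cms.InputTag(""),       # do not consult L1 results directly
    daqPartitions = cms.uint32(1),
    l1tIgnoreMask = cms.bool(False),
    l1techIgnorePrescales = cms.bool(False),
    throw = cms.bool(False)              # do not abort if a path is missing in some runs
)
```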
In CMS we use the Physics Analysis Toolkit (PAT) to steer our analyses. It is a set of standard EDModules and configuration files that act as building blocks for you to build your analysis; a skeleton configuration is sketched below.
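The skeleton below is a hedged sketch assuming the 2012-era PAT default sequence from PhysicsTools/PatAlgos (the input file name is a placeholder, and a real job would also need the appropriate GlobalTag and conditions):

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("PAT")
process.load("FWCore.MessageService.MessageLogger_cfi")
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(100))
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("file:input_AOD.root")  # placeholder input
)

# Standard PAT building blocks: build pat::Jets, pat::Muons, etc. on top of AOD.
process.load("PhysicsTools.PatAlgos.patSequences_cff")
process.p = cms.Path(process.patDefaultSequence)

# Keep only the selected PAT collections in the output "PAT-tuple".
process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("patTuple.root"),
    outputCommands = cms.untracked.vstring("drop *", "keep *_selectedPat*_*_*")
)
process.outpath = cms.EndPath(process.out)
```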
-- ThiagoTomei - 30 May 2012