Difference: AnalysisSprace (1 vs. 11)

Revision 10 2013-08-14 - trtomei

Line: 1 to 1
 
META TOPICPARENT name="AnalysisOpenSpace"

Analysis in SPRACE 2012

Line: 98 to 98
 

PATtuple making

Changed:
<
<
In CMS we use the Physics Analysis Toolkit (PAT) to steer our analyses. It is a set of standard EDModules and configuration files that act as building blocks for you to build your analysis. The idea is that you
>
>
In CMS we use the Physics Analysis Toolkit (PAT) to steer our analyses. It is a set of standard EDModules and configuration files that act as building blocks for you to build your analysis.
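As a rough illustration of what "building blocks" means in practice, a minimal PATtuple-making configuration might look like the sketch below. This is an assumption-laden sketch for a 2012-era CMSSW release, not the SPRACE configuration: the input file name is a placeholder, and only a skeleton of the PAT sequence is shown.

```python
# Sketch of a minimal PATtuple-making CMSSW configuration.
# Assumes a 2012-era release with PhysicsTools/PatAlgos available;
# the input file name below is a placeholder, not a real dataset.
import FWCore.ParameterSet.Config as cms

process = cms.Process("PAT")
process.load("PhysicsTools.PatAlgos.patSequences_cff")

process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring("file:skimmedEvents.root")  # placeholder
)
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(100))

# The PAT default sequence builds the pat::Jet, pat::Muon, etc. collections
# out of the standard EDModule building blocks.
process.p = cms.Path(process.patDefaultSequence)

process.out = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("patTuple.root"),
    outputCommands = cms.untracked.vstring("drop *", "keep *_selectedPat*_*_*")
)
process.outpath = cms.EndPath(process.out)
```

You would then run this with `cmsRun` inside a CMSSW environment; outside one, the `FWCore` import will not resolve.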

SpracePackage

The SPRACE Package is accessible on GitHub and contains some code used for the EXOTICA analyses in SPRACE.

See: https://github.com/trtomei/SpracePackage

  -- ThiagoTomei - 30 May 2012

META FILEATTACHMENT attachment="Slide1.png" attr="" comment="" date="1337021869" name="Slide1.png" path="Slide1.png" size="128636" stream="Slide1.png" user="Main.ThiagoTomei" version="1"
Added:
>
>
META TOPICMOVED by="trtomei" date="1376478739" from="Main.AnalysisSPRACE2012" to="Main.AnalysisSprace"

Revision 9 2012-05-30 - ThiagoTomei

Line: 1 to 1
 
META TOPICPARENT name="AnalysisOpenSpace"

Analysis in SPRACE 2012

Line: 98 to 98
 

PATtuple making

Changed:
<
<
In CMS we use the Physics Analysis Toolkit (PAT) to steer our analyses. It is a set of standard EDModules and configuration files that act as building blocks for you to build your analysis. For instance
>
>
In CMS we use the Physics Analysis Toolkit (PAT) to steer our analyses. It is a set of standard EDModules and configuration files that act as building blocks for you to build your analysis. The idea is that you
  -- ThiagoTomei - 30 May 2012

Revision 8 2012-05-30 - ThiagoTomei

Line: 1 to 1
 
META TOPICPARENT name="AnalysisOpenSpace"

Analysis in SPRACE 2012

Line: 41 to 41
In this way, we break the analysis down hierarchically: run on large datasets in the GRID, make smaller datasets to run on at SPRACE, make Pattuples / ntuples to run on your computer. I think this is the most efficient strategy.
Changed:
<
<

Strategy for Real Data - skimming

>
>

Skimming

 
  1. Get the most recent JSON file from the link above
  2. If you have already run on some data, take the difference between the data you've already run over and the new data with:

Line: 96 to 96
 
  • You can skim on trigger bits. To do that, you use the TriggerResultsFilter module. You can see an example of trigger-based skimming in this link.
  • You can skim on basic RECO quantities. You use normal EDFilters for that. For inspiration, you can look at the official CMS skimming page. Notice that those are the official skims - what we are doing here is mimicking that scheme.
Changed:
<
<
-- ThiagoTomei - 14 May 2012
>
>

PATtuple making

In CMS we use the Physics Analysis Toolkit (PAT) to steer our analyses. It is a set of standard EDModules and configuration files that act as building blocks for you to build you analysis. For instance

-- ThiagoTomei - 30 May 2012

 
META FILEATTACHMENT attachment="Slide1.png" attr="" comment="" date="1337021869" name="Slide1.png" path="Slide1.png" size="128636" stream="Slide1.png" user="Main.ThiagoTomei" version="1"

Revision 7 2012-05-14 - ThiagoTomei

Line: 1 to 1
 
META TOPICPARENT name="AnalysisOpenSpace"

Analysis in SPRACE 2012

Added:
>
>
 

Available Datasets:

All datasets available in SPRACE (which are AOD or AODSIM) can be found in this link. For 2012, we have available the following real data datasets:

Line: 37 to 39
 
  1. Make Pattuples containing everything you need for your analysis. Run on these using Condor. Save at SPRACE
  2. Make basic ROOT ntuples containing very basic information for optimization // plots. Run on these at the interactive access server and/or your laptop.
Added:
>
>
In this way, we break the analysis down hierarchically: run on large datasets in the GRID, make smaller datasets to run on at SPRACE, make Pattuples / ntuples to run on your computer. I think this is the most efficient strategy.
 

Strategy for Real Data - skimming

  1. Get the most recent JSON file from the link above
  2. If you have already run on some data, take the difference between the data you've already run over and the new data with:
       compareJSON.py --sub <mostRecent.json> <dataAlreadyUsed.json> <fileForNewDataOnly.json>
       
Changed:
<
<
  1. Set up a CRAB job with the file for the new data only:
>
>
  1. Set up a CRAB job with the file for the new data only. In this example, we're running on the /MET/Run2012A-PromptReco-v1/AOD dataset with the rsanalyzer_JetMET_skimming_Run2012A_cfg.py configuration file. We're setting up a task with around 75 jobs, and we will copy the output to the remote directory /MET_Run2012A-PromptReco_v1_2012May10, which lives in srm://osg-se.sprace.org.br:8443/srm/managerv2?SFN=/pnfs/sprace.org.br/data/cms/store/user/yourUserName/MET_Run2012A-PromptReco_v1_2012May10. Naturally, you should change these values to the ones you want.
 
[CRAB]
jobtype = cmssw
Line: 68 to 72
 [GRID] ce_white_list = T2_BR_SPRACE
Changed:
<
<
In this example, we're running on the /MET/Run2012A-PromptReco-v1/AOD dataset with the rsanalyzer_JetMET_skimming_Run2012A_cfg.py configuration file. We're setting up a task with around 75 jobs, and we will copy the output to the remote directory /MET_Run2012A-PromptReco_v1_2012May10, which lives in srm://osg-se.sprace.org.br:8443/srm/managerv2?SFN=/pnfs/sprace.org.br/data/cms/store/user/yourUserName/MET_Run2012A-PromptReco_v1_2012May10. Naturally, you should change these values to the ones you want.
  1. Do the usual CRAB thing to get the output, but you also want the final report:
>
>
  1. Do the usual CRAB thing to get the output, but you also want the final report. This will produce a JSON file which resides in myWorkingDirName/res/lumiSummary.json. This file represents exactly the data you ran over, taking into account failed jobs, blocks of data which were not yet available, etc.
 
crab -status -c myWorkingDirName
crab -getoutput -c myWorkingDirName
crab -report -c myWorkingDirName
Changed:
<
<
This will produce a JSON file which resides in myWorkingDirName/res/lumiSummary.json. This file represents exactly the data you ran over, taking into account failed jobs, blocks of data which were not yet available, etc. This is the "dataAlreadyUsed.json" that you should use for the next time! To get the amount of luminosity that you ran over, use the lumiCalc2.py script:
>
>
  1. To get the amount of luminosity that you ran over, use the lumiCalc2.py script:
 
lumiCalc2.py -b stable -i lumiSummary.json overview
Added:
>
>
  1. You should add this data to the set of data you already ran over. You do this by merging the JSON files. The syntax is:
    mergeJSON.py previousData.json dataYouJustRanOver.json --output=totalData.json
       
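The JSON bookkeeping in the steps above boils down to set arithmetic on maps of run numbers to lumisection ranges. As a plain-Python illustration (this is a sketch of the idea, not the actual compareJSON.py / mergeJSON.py implementations), the subtract and merge operations can be written as:

```python
# Sketch of the run/lumi set arithmetic behind "compareJSON.py --sub" and
# "mergeJSON.py". CMS JSON files map run numbers (as strings) to lists of
# [first, last] lumisection ranges; this is an illustration, not the tools.

def expand(mask):
    """Turn {run: [[first, last], ...]} into a set of (run, lumi) pairs."""
    return {(int(run), lumi)
            for run, ranges in mask.items()
            for first, last in ranges
            for lumi in range(first, last + 1)}

def compress(pairs):
    """Turn a set of (run, lumi) pairs back into {run: [[first, last], ...]}."""
    out = {}
    for run in sorted({r for r, _ in pairs}):
        ranges = []
        for lumi in sorted(l for r, l in pairs if r == run):
            if ranges and lumi == ranges[-1][1] + 1:
                ranges[-1][1] = lumi      # extend the current contiguous range
            else:
                ranges.append([lumi, lumi])  # start a new range
        out[str(run)] = ranges
    return out

def subtract(most_recent, already_used):
    """New data only: lumis in most_recent but not in already_used."""
    return compress(expand(most_recent) - expand(already_used))

def merge(a, b):
    """Total data: union of the two masks."""
    return compress(expand(a) | expand(b))

new_only = subtract({"190456": [[1, 10]]}, {"190456": [[1, 4]]})
# new_only == {"190456": [[5, 10]]}
```

For the real workflow you should of course use the official scripts, which also validate the file format; the point here is only what "subtract" and "merge" mean for a lumi mask.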

The following picture shows this process schematically:
Slide1.png

What should I use for this skimming step?

Naturally, it depends on your specific analysis channel. Remember that the goal is to separate the analysis hierarchically - run over large datasets using the GRID, preselect/reduce them to more manageable sizes, bring them to SPRACE and run the rest of the analysis more or less locally. If you make a very complicated preselection in the GRID, it starts to become comparable to make the whole analysis there, and defeats the whole idea. So, some general points:

  • You can skim on trigger bits. To do that, you use the TriggerResultsFilter module. You can see an example of trigger-based skimming in this link.
  • You can skim on basic RECO quantities. You use normal EDFilters for that. For inspiration, you can look at the official CMS skimming page. Notice that those are the official skims - what we are doing here is mimicking that scheme.
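The trigger-bit option above can be sketched as follows. The TriggerResultsFilter module does exist in CMSSW, but the trigger pattern here is a made-up placeholder and some optional parameters are omitted; substitute the paths relevant to your analysis.

```python
# Sketch of a trigger-bit skim using CMSSW's TriggerResultsFilter module.
# The trigger pattern is a placeholder; some optional parameters omitted.
import FWCore.ParameterSet.Config as cms

triggerSelection = cms.EDFilter("TriggerResultsFilter",
    triggerConditions = cms.vstring("HLT_MET120_v*"),          # placeholder pattern
    hltResults = cms.InputTag("TriggerResults", "", "HLT"),    # HLT decision record
    l1tResults = cms.InputTag(""),                             # no L1 requirement
    throw = cms.bool(False)   # don't abort the job on unknown trigger names
)

# The filter then gates the skim path, e.g.:
# process.skimPath = cms.Path(triggerSelection)
```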

-- ThiagoTomei - 14 May 2012

 
Changed:
<
<
-- ThiagoTomei - 09 May 2012
>
>
META FILEATTACHMENT attachment="Slide1.png" attr="" comment="" date="1337021869" name="Slide1.png" path="Slide1.png" size="128636" stream="Slide1.png" user="Main.ThiagoTomei" version="1"

Revision 6 2012-05-14 - ThiagoTomei

Line: 1 to 1
 
META TOPICPARENT name="AnalysisOpenSpace"

Analysis in SPRACE 2012

Line: 37 to 37
 
  1. Make Pattuples containing everything you need for your analysis. Run on these using Condor. Save at SPRACE
  2. Make basic ROOT ntuples containing very basic information for optimization // plots. Run on these at the interactive access server and/or your laptop.
Added:
>
>

Strategy for Real Data - skimming

  1. Get the most recent JSON file from the link above
  2. If you have already run on some data, take the difference between the data you've already run over and the new data with:
       compareJSON.py --sub <mostRecent.json> <dataAlreadyUsed.json> <fileForNewDataOnly.json>
       
  3. Set up a CRAB job with the file for the new data only:
    [CRAB]
    jobtype = cmssw
    scheduler = glite
    use_server = 0
    
    [CMSSW]
    datasetpath=/MET/Run2012A-PromptReco-v1/AOD
    pset=rsanalyzer_JetMET_skimming_Run2012A_cfg.py
    total_number_of_lumis=-1
    number_of_jobs = 75
    lumi_mask=fileForNewDataOnly.json
    get_edm_output = 1
    
    [USER]
    copy_data = 1
    return_data = 0
    storage_element = T2_BR_SPRACE
    user_remote_dir = /MET_Run2012A-PromptReco_v1_2012May10
    ui_working_dir = myWorkingDirName
    
    [GRID]
    ce_white_list = T2_BR_SPRACE
       
In this example, we're running on the /MET/Run2012A-PromptReco-v1/AOD dataset with the rsanalyzer_JetMET_skimming_Run2012A_cfg.py configuration file. We're setting up a task with around 75 jobs, and we will copy the output to the remote directory /MET_Run2012A-PromptReco_v1_2012May10, which lives in srm://osg-se.sprace.org.br:8443/srm/managerv2?SFN=/pnfs/sprace.org.br/data/cms/store/user/yourUserName/MET_Run2012A-PromptReco_v1_2012May10. Naturally, you should change these values to the ones you want.
  1. Do the usual CRAB thing to get the output, but you also want the final report:
crab -status -c myWorkingDirName
crab -getoutput -c myWorkingDirName
crab -report -c myWorkingDirName
This will produce a JSON file which resides in myWorkingDirName/res/lumiSummary.json. This file represents exactly the data you ran over, taking into account failed jobs, blocks of data which were not yet available, etc. This is the "dataAlreadyUsed.json" that you should use for the next time! To get the amount of luminosity that you ran over, use the lumiCalc2.py script:
lumiCalc2.py -b stable -i lumiSummary.json overview
 -- ThiagoTomei - 09 May 2012 \ No newline at end of file

Revision 5 2012-05-11 - JoseRuiz

Line: 1 to 1
 
META TOPICPARENT name="AnalysisOpenSpace"

Analysis in SPRACE 2012

Line: 11 to 11
  Run2012 datasets can be found in this link
Changed:
<
<
Summer12 MC datasets can be found in
>
>
Summer12 MC datasets can be found in this link
 

JSON files

JSON files for 2012 Runs at 8 TeV can be found in this link.

Revision 4 2012-05-11 - ThiagoTomei

Line: 1 to 1
 
META TOPICPARENT name="AnalysisOpenSpace"

Analysis in SPRACE 2012

Line: 14 to 14
 Summer12 MC datasets can be found in

JSON files

Changed:
<
<
JSON files for 2012 Runs at 8 TeV can be found in this link.
>
>
JSON files for 2012 Runs at 8 TeV can be found in this link.
 

Analysis steps

Revision 3 2012-05-09 - ThiagoTomei

Line: 1 to 1
 
META TOPICPARENT name="AnalysisOpenSpace"

Analysis in SPRACE 2012

Line: 9 to 9
 
  • Double Electron
  • MET
Added:
>
>
Run2012 datasets can be found in this link

Summer12 MC datasets can be found in

 

JSON files

JSON files for 2012 Runs at 8 TeV can be found in this link.

Revision 2 2012-05-09 - ThiagoTomei

Line: 1 to 1
 
META TOPICPARENT name="AnalysisOpenSpace"

Analysis in SPRACE 2012

Line: 11 to 11
 

JSON files

Added:
>
>
JSON files for 2012 Runs at 8 TeV can be found in this link.

Analysis steps

These links should be useful.

Online list of runs, triggers, etc.

General Analysis Strategy

In general, we advocate the following strategy:

  1. Download datasets to SPRACE (optional)
  2. Skim on basic reconstructed quantities // trigger bits. Run on GRID with CRAB. Save at SPRACE
  3. Make Pattuples containing everything you need for your analysis. Run on these using Condor. Save at SPRACE
  4. Make basic ROOT ntuples containing very basic information for optimization // plots. Run on these at the interactive access server and/or your laptop.
 -- ThiagoTomei - 09 May 2012

Revision 1 2012-05-09 - ThiagoTomei

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="AnalysisOpenSpace"

Analysis in SPRACE 2012

Available Datasets:

All datasets available in SPRACE (which are AOD or AODSIM) can be found in this link. For 2012, we have available the following real data datasets:

  • Single Muon
  • Double Electron
  • MET

JSON files

-- ThiagoTomei - 09 May 2012

 