Introduction to Searches

Searching for physics beyond the standard model (a.k.a. New Physics) at the LHC is no easy task. Here we give a brief outline of how such a search works.

Choose a prospective New Physics signal

For instance, if you're interested in Extra Dimensions (ED), you would choose one of the different flavours of ED developed by theorists - Warped ED, Universal ED, etc. Then you have to choose a particular channel / signature - for instance, the Warped ED (Randall-Sundrum, RS) graviton decaying into a pair of Z bosons, leading to a jets + dilepton signature. For the rest of this example, let's focus on exactly this channel: RS to ZZ to 2lep+jets.

Even if you plan on doing a model-independent analysis later, it is useful to have a model as a benchmark. This gives you at least an approximate idea of what kind of New Physics signal you are going to search for. In this step you will also be able to pinpoint the main standard model backgrounds that you will face. In the example, it is clear that standard model Z+jets production is going to be a very relevant background. Other processes which would contribute to the background are fully leptonic ttbar + jets and standard model multi-vector-boson production.

Simulation of signal samples

Since you want to know what your signal would look like in CMS, you need to simulate it. For the more common BSM models, the basic generators like Pythia and Herwig can do the generation for you. For more specialized models, you may need to rely on other software, like JHUGenerator, CompHEP and others. For preliminary studies, generator-level events are good enough; for complete studies, simulation of the CMS detector with Geant and full reconstruction of the events is required.

This is also the moment to familiarize yourself with the characteristics of your signal, plotting some basic variables like the pt, eta, phi and mass of the different objects in your events. In our example, a quick investigation would reveal that our signal is characterized by a same-flavour, opposite-sign dilepton system with 1) very high pt, 2) very low delta R between the two leptons, and 3) invariant mass around the mass of the Z boson. Additionally, our signal usually presents a single, high-pt jet with invariant mass around the mass of the Z boson as well.
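The dilepton quantities mentioned above (pt, delta R and invariant mass) can be computed from the pt, eta, phi and mass of the two leptons. The following is a minimal sketch in plain Python, not CMS software; the function names and the (pt, eta, phi, mass) tuple layout are assumptions made for illustration, with momenta and masses in GeV.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the range (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def to_cartesian(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) into (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def pair_kinematics(lep1, lep2):
    """Return (pt, invariant mass, delta R) of a two-lepton system.

    Each lepton is a hypothetical (pt, eta, phi, mass) tuple.
    """
    e1, px1, py1, pz1 = to_cartesian(*lep1)
    e2, px2, py2, pz2 = to_cartesian(*lep2)
    e, px, py, pz = e1 + e2, px1 + px2, py1 + py2, pz1 + pz2
    pair_pt = math.hypot(px, py)
    # Guard against tiny negative values caused by rounding.
    mass_squared = max(e * e - px * px - py * py - pz * pz, 0.0)
    pair_mass = math.sqrt(mass_squared)
    return pair_pt, pair_mass, delta_r(lep1[1], lep1[2], lep2[1], lep2[2])
```

For a boosted Z, as in the signal described above, one would expect `pair_kinematics` to return a large pt, a small delta R and a mass near the Z boson mass.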

Design your triggers

The CMS detector is equipped with a trigger system, which analyzes the data in real time and selects events with a high probability of coming from interesting physics processes to be saved on permanent media. Events not selected by the trigger are irretrievably lost! You should make sure that a fair fraction of your simulated signal events is accepted by some trigger. If not, you should design a new trigger for your search, or you will have no data to analyze!
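Checking what fraction of the simulated signal is accepted amounts to a simple counting exercise. The sketch below assumes each simulated event is represented by the set of trigger names that fired for it; this representation, the function name and the trigger names used in the usage note are hypothetical. The uncertainty uses the simple binomial approximation, which is only reasonable when the efficiency is not too close to 0 or 1.

```python
import math


def trigger_efficiency(fired_triggers_per_event, accepted_triggers):
    """Fraction of events in which at least one accepted trigger fired.

    fired_triggers_per_event: one collection of trigger names per event.
    accepted_triggers: names of the triggers the analysis would use.
    Returns (efficiency, binomial uncertainty).
    """
    accepted = set(accepted_triggers)
    n_total = len(fired_triggers_per_event)
    if n_total == 0:
        raise ValueError("no simulated events given")
    n_pass = sum(1 for fired in fired_triggers_per_event if accepted & set(fired))
    efficiency = n_pass / n_total
    uncertainty = math.sqrt(efficiency * (1.0 - efficiency) / n_total)
    return efficiency, uncertainty
```

For example, `trigger_efficiency([{"HLT_A"}, set(), {"HLT_B"}], ["HLT_A"])` counts one passing event out of three.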

Data collection

The data collected by CMS is grouped in large sets named Primary Datasets (PDs). Each PD contains events which passed a set of closely related triggers. For our example, in 2012 we would take events from the DoubleMu and DoublePhotonHighPt PDs (the latter, despite its name, contains events which passed the DoubleEle33 triggers). You would then select the events which passed specifically the trigger of your choice. You should also veto events recorded while not all parts of the detector were operating properly (with the golden JSON file) or where some characteristic of the event makes it look a lot like detector noise (many standard event filters do this).
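The golden JSON file maps each certified run number to a list of inclusive [first, last] luminosity-section ranges. In a real analysis this mask is normally applied by the standard CMS tools; the sketch below only illustrates the logic, and the function names are assumptions.

```python
import json


def load_lumi_mask(json_text):
    """Parse golden-JSON-style text into {run: [(first_lumi, last_lumi), ...]}."""
    raw = json.loads(json_text)
    return {int(run): [tuple(r) for r in ranges] for run, ranges in raw.items()}


def is_certified(mask, run, lumi):
    """True if the (run, luminosity section) pair lies inside a certified range."""
    return any(first <= lumi <= last for first, last in mask.get(run, []))
```

Events for which `is_certified` returns `False` would be dropped before any further selection.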

Simulation of standard model backgrounds

This step is usually done centrally by specialized teams in CMS, since the backgrounds are the same for many people. You should pick simulated samples which contain the processes that are most likely to be a background to your search, keeping the characteristics of your signal in mind. In our case, since the signal presents high-pt Z bosons, it is better to choose a specialized Z boson simulated sample with only high-pt Zs; in this way, there will be more simulated events to model the background in the region of interest.

Choice of discriminating variables

A comparison of the simulated background to the simulated signal should make clear which variables best discriminate between the two. In our example, the jet invariant mass turns out to be a good discriminator - since the accompanying jet in Z+jets events does not come from a Z boson, its mass is usually very low compared to the mass of the jet in the signal samples. The ranges of the variables which are optimal for signal selection are what is called the signal region. For our example, the signal region is: leptonic Z with mass in the (70,110) GeV range, hadronic Z (jet) with mass in the (70,110) GeV range.

Comparison of SM background and data in control regions

It is always important to validate your simulated background in some way. Usually, this is done by comparing the simulation with data in a control region - one where your signal is not expected to appear. Control regions are typically taken to be ranges of the discriminating variables close to the ones where most of the signal is expected to appear. Agreement between the simulation and the data in those control regions is fundamental; without it, you cannot really trust any decision you take by looking at the simulation. In our case, the control region is: leptonic Z with mass in the (70,110) GeV range, hadronic Z (jet) with mass in the (40,70) GeV range.
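The signal and control regions of the example differ only in the jet mass window, so assigning an event to a region can be sketched as below. The function name is hypothetical, and the treatment of the window edges (lower edge included, upper edge excluded) is an assumption, since the text does not specify it.

```python
# Mass windows in GeV, taken from the example in the text.
LEPTONIC_Z_WINDOW = (70.0, 110.0)
HADRONIC_Z_SIGNAL_WINDOW = (70.0, 110.0)
HADRONIC_Z_CONTROL_WINDOW = (40.0, 70.0)


def in_window(value, window):
    """Half-open window check: low <= value < high (edge handling is assumed)."""
    low, high = window
    return low <= value < high


def classify_region(dilepton_mass, jet_mass):
    """Return "signal", "control", or None for an event given its two masses."""
    if not in_window(dilepton_mass, LEPTONIC_Z_WINDOW):
        return None
    if in_window(jet_mass, HADRONIC_Z_SIGNAL_WINDOW):
        return "signal"
    if in_window(jet_mass, HADRONIC_Z_CONTROL_WINDOW):
        return "control"
    return None
```

With half-open windows, an event with a jet mass of exactly 70 GeV falls in the signal region and never in both regions.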

Additionally, if you have multiple control regions, you can try to find relations between them to constrain the behaviour of your background. For instance, maybe you can demonstrate that the background in your signal region is always expected to be proportional to the same background in a control region! If so, you just have to measure that constant of proportionality, multiply it by the real data in the control region (which is expected to be almost purely background, otherwise it would not be a control region!), and you have an estimate of the same background in the signal region. This is what is called a data-driven method.
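The proportionality argument can be written as a one-line estimate. The sketch below also propagates two uncertainties, which is an addition not spelled out in the text: a Poisson uncertainty on the observed control-region count and an externally supplied uncertainty on the constant of proportionality, assumed to be uncorrelated. All names are hypothetical.

```python
import math


def estimate_from_control_region(n_control_data, factor, factor_uncertainty=0.0):
    """Estimate the signal-region background as factor * control-region data.

    n_control_data: observed event count in the control region.
    factor: measured constant of proportionality (signal / control).
    factor_uncertainty: absolute uncertainty on that constant.
    Returns (estimate, uncertainty), assuming a Poisson error on the count
    and no correlation between the two uncertainties.
    """
    if n_control_data < 0:
        raise ValueError("event count cannot be negative")
    estimate = factor * n_control_data
    from_count = factor * math.sqrt(n_control_data)
    from_factor = n_control_data * factor_uncertainty
    return estimate, math.hypot(from_count, from_factor)
```

For instance, 100 control-region events and a constant of 0.5 give an estimate of 50 events in the signal region.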

Estimation of SM background in signal region

In one way or another, you have to estimate the SM background in the signal region. Purely data-driven methods are the most reliable, but they may not always be possible. The next best thing is a properly validated SM simulation. The less you rely on simulation, the better; ratios of simulated distributions, for instance, are more reliable than the distributions themselves.

In our case, we don't have a purely data-driven method. Instead, we take the ratio of the dilepton-jet mass distributions in the signal region and in the control region, for the simulated SM background. We expect the simulation to model this ratio reasonably well, even where it does not model the individual distributions perfectly. We multiply this ratio by the same distribution for the data in the control region, and we use that product as the estimate of the SM background in the signal region.
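Bin by bin, this estimate is the simulated signal-region yield divided by the simulated control-region yield, times the data yield in the control region. The sketch below assumes the three distributions are histograms with identical binning, given as lists of bin contents; the function name and the choice to return 0 for bins with no simulated control-region events are assumptions.

```python
def estimate_signal_region_background(mc_signal_region, mc_control_region,
                                      data_control_region):
    """Bin-by-bin estimate: (MC signal region / MC control region) * data control region.

    All three inputs are lists of bin contents with identical binning.
    Bins with no simulated control-region events get an estimate of 0.0,
    which is a simplifying assumption of this sketch.
    """
    if not (len(mc_signal_region) == len(mc_control_region)
            == len(data_control_region)):
        raise ValueError("histograms must have the same number of bins")
    estimate = []
    for mc_sr, mc_cr, data_cr in zip(mc_signal_region, mc_control_region,
                                     data_control_region):
        ratio = mc_sr / mc_cr if mc_cr > 0 else 0.0
        estimate.append(ratio * data_cr)
    return estimate
```

In practice the ratio is often smoothed or fitted before being applied, since sparsely populated bins make it fluctuate; this sketch leaves that out.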