For developing and reproducing ML + HEP projects.

CarlosSarasty/JetNet

 
 

JetNet

JetNet is an effort to increase accessibility and reproducibility in jet-based machine learning.

Currently we provide:

  • Easy-to-access, standardised interfaces for jet datasets, including JetNet and TopTagging
  • Standard implementations of generative evaluation metrics (Ref. [1, 2]), including:
    • Fréchet physics distance (FPD)
    • Kernel physics distance (KPD)
    • Wasserstein-1 (W1)
    • Fréchet ParticleNet Distance (FPND)
    • coverage and minimum matching distance (MMD)
  • Loss functions:
    • Differentiable implementation of the energy mover's distance [3]
  • And more general jet utilities.
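To give a feel for what the Fréchet-style metrics above measure, here is a minimal NumPy sketch of the Fréchet distance between two Gaussians fitted to sets of feature vectors. It is an illustration only, not the library's implementation: the function name `frechet_gaussian_distance` and the random "features" are hypothetical, and the actual metrics are computed on physics-motivated or ParticleNet features as described in Refs. [1, 2].

```python
import numpy as np


def frechet_gaussian_distance(feats_a, feats_b):
    """Squared Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, num_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # covariance matrices (up to round-off, hence the clip).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_a - mu_b) ** 2)
    return mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt


# Hypothetical stand-ins for features extracted from real and generated jets.
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(5000, 4))
close_feats = rng.normal(0.0, 1.0, size=(5000, 4))
shifted_feats = rng.normal(0.5, 1.0, size=(5000, 4))

d_close = frechet_gaussian_distance(real_feats, close_feats)
d_shifted = frechet_gaussian_distance(real_feats, shifted_feats)
print(f"same distribution: {d_close:.4f}, shifted mean: {d_shifted:.4f}")
```

A lower score means the two feature distributions are closer; the shifted sample should score clearly higher than the sample drawn from the same distribution.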

Additional functionality is under development. Please reach out if you're interested in contributing!

Installation

JetNet can be installed with pip:

pip install jetnet

To use the differentiable EMD loss jetnet.losses.EMDLoss, additional libraries must be installed via

pip install "jetnet[emdloss]"

Finally, PyTorch Geometric must be installed separately to use the Fréchet ParticleNet Distance metric jetnet.evaluation.fpnd; see the PyTorch Geometric installation instructions.

Quickstart

Datasets can be downloaded and accessed quickly, for example:

from jetnet.datasets import JetNet, TopTagging

# as numpy arrays:
particle_data, jet_data = JetNet.getData(
    jet_type=["g", "q"], data_dir="./datasets/jetnet/", download=True
)
# or as a PyTorch dataset:
dataset = TopTagging(
    jet_type="all", data_dir="./datasets/toptagging/", split="train", download=True
)

Evaluation metrics can be used as such:

import numpy as np
import jetnet

generated_jets = np.random.rand(50000, 30, 3)
fpnd_score = jetnet.evaluation.fpnd(generated_jets, jet_type="g")
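For intuition about a distribution-level metric such as W1, the sketch below computes the jet mass for two sets of jets and then the 1D Wasserstein-1 distance between the two mass distributions. It is a NumPy-only illustration under stated assumptions, not the library's API: the helper names `jet_masses` and `w1_equal_size` are hypothetical, particles are assumed massless with features ordered (eta, phi, pt), and both samples must have the same size.

```python
import numpy as np


def jet_masses(jets):
    """Invariant mass per jet, assuming massless particles with features (eta, phi, pt).

    jets: array of shape (num_jets, num_particles, 3).
    """
    eta, phi, pt = jets[..., 0], jets[..., 1], jets[..., 2]
    px, py, pz = pt * np.cos(phi), pt * np.sin(phi), pt * np.sinh(eta)
    energy = pt * np.cosh(eta)
    # Sum particle four-momenta per jet, then m^2 = E^2 - |p|^2.
    m2 = energy.sum(-1) ** 2 - px.sum(-1) ** 2 - py.sum(-1) ** 2 - pz.sum(-1) ** 2
    return np.sqrt(np.clip(m2, 0.0, None))  # clip guards against round-off


def w1_equal_size(x, y):
    """1D Wasserstein-1 distance between two equal-size samples."""
    assert x.shape == y.shape and x.ndim == 1
    # For equal-size 1D samples, W1 is the mean gap between sorted values.
    return np.mean(np.abs(np.sort(x) - np.sort(y)))


# Random stand-ins for real and generated jets (30 particles, 3 features each).
rng = np.random.default_rng(0)
real_jets = rng.random((2000, 30, 3))
generated_jets = rng.random((2000, 30, 3))

w1_mass = w1_equal_size(jet_masses(real_jets), jet_masses(generated_jets))
print(f"W1 between jet mass distributions: {w1_mass:.4f}")
```

The sorted-difference shortcut only holds for equal sample sizes; for unequal sizes a general 1D optimal-transport routine would be needed instead.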

Loss functions can be initialized and used similarly to standard PyTorch in-built losses such as MSE:

# real_jets and generated_jets are batches of jets as PyTorch tensors
emd_loss = jetnet.losses.EMDLoss(num_particles=30)
loss = emd_loss(real_jets, generated_jets)
loss.backward()

Documentation

The full API reference and tutorials are available at jetnet.readthedocs.io. Tutorial notebooks are in the tutorials folder, with more to come.

Contributing

We welcome feedback and contributions! Please feel free to create an issue for bugs or feature requests, or open a pull request from your forked repo to address them.

Building and testing locally

Perform an editable installation of the package from inside your forked repo and install the pytest package for unit testing:

pip install -e .
pip install pytest

Run the test suite to ensure everything is working as expected:

pytest tests                    # tests all datasets
pytest tests -m "not slow"      # tests only on the JetNet dataset for convenience

Citation

If you use this library for your research, please cite our article in the Journal of Open Source Software:

@article{Kansal_JetNet_2023,
  author = {Kansal, Raghav and Pareja, Carlos and Hao, Zichun and Duarte, Javier},
  doi = {10.21105/joss.05789},
  journal = {Journal of Open Source Software},
  number = {90},
  pages = {5789},
  title = {{JetNet: A Python package for accessing open datasets and benchmarking machine learning methods in high energy physics}},
  url = {https://joss.theoj.org/papers/10.21105/joss.05789},
  volume = {8},
  year = {2023}
}

Please further cite the following if you use these components of the library.

JetNet dataset or FPND

@inproceedings{Kansal_MPGAN_2021,
  author = {Kansal, Raghav and Duarte, Javier and Su, Hao and Orzari, Breno and Tomei, Thiago and Pierini, Maurizio and Touranakou, Mary and Vlimant, Jean-Roch and Gunopulos, Dimitrios},
  booktitle = "{Advances in Neural Information Processing Systems}",
  editor = {M. Ranzato and A. Beygelzimer and Y. Dauphin and P.S. Liang and J. Wortman Vaughan},
  pages = {23858--23871},
  publisher = {Curran Associates, Inc.},
  title = {Particle Cloud Generation with Message Passing Generative Adversarial Networks},
  url = {https://proceedings.neurips.cc/paper_files/paper/2021/file/c8512d142a2d849725f31a9a7a361ab9-Paper.pdf},
  volume = {34},
  year = {2021},
  eprint = {2106.11535},
  archivePrefix = {arXiv},
}

FPD or KPD

@article{Kansal_Evaluating_2023,
  author = {Kansal, Raghav and Li, Anni and Duarte, Javier and Chernyavskaya, Nadezda and Pierini, Maurizio and Orzari, Breno and Tomei, Thiago},
  title = {Evaluating generative models in high energy physics},
  reportNumber = "FERMILAB-PUB-22-872-CMS-PPD",
  doi = "10.1103/PhysRevD.107.076017",
  journal = "{Phys. Rev. D}",
  volume = "107",
  number = "7",
  pages = "076017",
  year = "2023",
  eprint = "2211.10295",
  archivePrefix = "arXiv",
}

EMD Loss

Please cite the respective qpth or cvxpy libraries, depending on the method used (qpth by default), as well as the original EMD paper [3].

References

[1] R. Kansal et al., Particle Cloud Generation with Message Passing Generative Adversarial Networks, NeurIPS 2021 [2106.11535].

[2] R. Kansal et al., Evaluating Generative Models in High Energy Physics, Phys. Rev. D 107 (2023) 076017 [2211.10295].

[3] P. T. Komiske, E. M. Metodiev, and J. Thaler, The Metric Space of Collider Events, Phys. Rev. Lett. 123 (2019) 041801 [1902.02346].
