Wiki for the UMUDGA dataset
This wiki and its repository provide supporting material for the UMUDGA dataset.
- Mattia Zago (mattia.zago [at] um [dot] es)
- Manuel Gil Pérez (mgilperez [at] um [dot] es)
- Gregorio Martínez Pérez (gregorio [at] um [dot] es)
The authors are with the Department of Information and Communications Engineering, University of Murcia, Spain.
In computer security, botnets remain a major cyber threat. Concealment techniques such as dynamic addressing and Domain Generation Algorithms (DGAs) call for an improved and more effective detection process. To this end, this data descriptor presents a collection of over 30 million manually labelled, algorithmically generated domain names, each decorated with a feature set ready to use for machine learning (ML) analysis. The dataset lets researchers skip the data collection, organization and pre-processing phases and focus on the analysis and production of ML-powered solutions for network intrusion detection. To be as exhaustive as possible, 50 of the most important malware variants have been selected. Each family is available both as a list of domains and as a collection of features. More precisely, the former is generated by executing the malware DGAs in a controlled environment with fixed parameters, while the latter is generated by extracting a combination of statistical and Natural Language Processing (NLP) metrics.
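To give a rough idea of what "statistical metrics" over a domain name can look like, the following is a minimal sketch in Python. It is not the dataset's actual feature extractor: the function name `domain_features`, the choice of features (length, character entropy, vowel and digit ratios, distinct bigrams) and the decision to look only at the leftmost label are all assumptions made for illustration.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def domain_features(domain: str) -> dict:
    """Compute a few illustrative features for the leftmost label of a domain.

    Hypothetical helper: the feature names below are assumptions and do not
    necessarily match the columns shipped with the dataset.
    """
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        raise ValueError("empty domain label")

    counts = Counter(label)
    # Shannon entropy (bits per character) of the character distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Overlapping character bigrams, a very simple NLP-style n-gram view.
    bigrams = [label[i:i + 2] for i in range(n - 1)]

    return {
        "length": n,
        "entropy": entropy,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "distinct_bigrams": len(set(bigrams)),
    }


if __name__ == "__main__":
    # Example input chosen arbitrarily for illustration.
    print(domain_features("ab12.net"))
```

Algorithmically generated names often score differently from human-chosen ones on features of this kind, which is why such metrics are commonly fed to ML classifiers.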
T.B.A.
- M. Zago, M. Gil Pérez, and G. Martínez Pérez, "Scalable Detection of Botnets based on DGA: Efficient Feature Discovery Process in Machine Learning Techniques," Soft Comput., Jan. 2019. DOI: 10.1007/s00500-018-03703-8
- M. Zago, M. Gil Pérez, and G. Martínez Pérez, "UMUDGA: a dataset for profiling DGA-based botnet," Computers & Security, Jan. 2020. DOI: 10.1016/j.cose.2020.101719
- M. Zago, M. Gil Pérez, and G. Martínez Pérez, "UMUDGA: University of Murcia domain generation algorithm dataset," Mendeley Data, Jan. 2020. DOI: 10.17632/y8ph45msv8.1