From 9f4dc062a89a71cdd176fe14c301cc9f0ff26203 Mon Sep 17 00:00:00 2001
From: Paul Hutelmyer
Date: Thu, 8 Dec 2022 08:16:58 -0500
Subject: [PATCH 1/2] Update CHANGELOG.md

---
 CHANGELOG.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c19de3b5..cb77e8bb 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,10 +1,15 @@
 # Changelog
 Changes to the project will be tracked in this file via the date of change.

+## 2022-12-08
+### Added
+- Added `ScanMsi` Scanner.
+- Added `ScanMsi` Scanner test.
+
 ## 2022-12-07
 ### Added
 - Added PyTest scanner testing functionality (@cawalch)
-- Added several scanner tests (`ScanFooter, `ScanGif`, `ScanURL`) (@cawalch)
+- Added several scanner tests (`ScanFooter`, `ScanGif`, `ScanURL`) (@cawalch)
 - Added documentation for test execution.

 ## 2022-11-18

From 473129c28c90fef61f7e6c16f2448442af3eb3d5 Mon Sep 17 00:00:00 2001
From: Paul Hutelmyer
Date: Thu, 8 Dec 2022 08:20:03 -0500
Subject: [PATCH 2/2] Update README.md

---
 docs/README.md | 119 +++++++++++++++++++++++++------------------------
 1 file changed, 60 insertions(+), 59 deletions(-)

diff --git a/docs/README.md b/docs/README.md
index 7f21245e..1443d3dc 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -535,66 +535,67 @@ Each scanner parses files of a specific flavor and performs data collection and/
 ### Scanner List
 The table below describes each scanner and its options. Each scanner has the hidden option "scanner_timeout" which can override the distribution scanner_timeout.
-| Scanner Name | Scanner Description | Scanner Options | Contributor | -|-------------------|----------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------| -| ScanAntiword | Extracts text from MS Word documents | "tempfile_directory" -- location where tempfile writes temporary files (defaults to "/tmp/") | -| ScanBatch | Collects metadata from batch script files | N/A | -| ScanBase64 | Decodes base64-encoded files | N/A | [Nathan Icart](https://github.com/nateicart) -| ScanBITS | Analyzes Windows BITS scheduler database files | N/A | -| ScanBzip2 | Decompresses bzip2 files | N/A | -| ScanCapa | Analyzes executable files with FireEye [capa](https://github.com/fireeye/capa) | "tempfile_directory" -- location where `tempfile` will write temporary files (defaults to "/tmp/")
"location" -- location of the capa rules file or directory (defaults to "/etc/capa/") | -| ScanCuckoo | Sends files to a Cuckoo sandbox | "url" -- URL of the Cuckoo sandbox (defaults to None)
"priority" -- Cuckoo priority assigned to the task (defaults to 3)
"timeout" -- amount of time (in seconds) to wait for the task to upload (defaults to 10)
"unique" -- boolean that tells Cuckoo to only analyze samples that have not been analyzed before (defaults to True)
"username" -- username used for authenticating to Cuckoo (defaults to None, optionally read from environment variable "CUCKOO_USERNAME")
"password" -- password used for authenticating to Cuckoo (defaults to None, optionally read from environment variable "CUCKOO_PASSWORD") | -| ScanDocx | Collects metadata and extracts text from docx files | "extract_text" -- boolean that determines if document text should be extracted as a child file (defaults to False) | -| ScanElf | Collects metadata from ELF files | N/A | -| ScanEmail | Collects metadata and extract files from email messages | N/A | -| ScanEncryptedDoc | Attempts to extract decrypted Office documents through brute force password cracking | "password_file" -- location of passwords file for encrypted documents (defaults to etc/strelka/passwords.txt) | -| ScanEntropy | Calculates entropy of files | N/A | -| ScanExiftool | Collects metadata parsed by Exiftool | "tempfile_directory" -- location where tempfile writes temporary files (defaults to "/tmp/")
"keys" -- list of keys to log (defaults to all) | +| Scanner Name | Scanner Description | Scanner Options | Contributor | +|-------------------|----------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------| +| ScanAntiword | Extracts text from MS Word documents | "tempfile_directory" -- location where tempfile writes temporary files (defaults to "/tmp/") | +| ScanBatch | Collects metadata from batch script files | N/A | +| ScanBase64 | Decodes base64-encoded files | N/A | [Nathan Icart](https://github.com/nateicart) +| ScanBITS | Analyzes Windows BITS scheduler database files | N/A | +| ScanBzip2 | Decompresses bzip2 files | N/A | +| ScanCapa | Analyzes executable files with FireEye [capa](https://github.com/fireeye/capa) | "tempfile_directory" -- location where `tempfile` will write temporary files (defaults to "/tmp/")
"location" -- location of the capa rules file or directory (defaults to "/etc/capa/") | +| ScanCuckoo | Sends files to a Cuckoo sandbox | "url" -- URL of the Cuckoo sandbox (defaults to None)
"priority" -- Cuckoo priority assigned to the task (defaults to 3)
"timeout" -- amount of time (in seconds) to wait for the task to upload (defaults to 10)
"unique" -- boolean that tells Cuckoo to only analyze samples that have not been analyzed before (defaults to True)
"username" -- username used for authenticating to Cuckoo (defaults to None, optionally read from environment variable "CUCKOO_USERNAME")
"password" -- password used for authenticating to Cuckoo (defaults to None, optionally read from environment variable "CUCKOO_PASSWORD") | +| ScanDocx | Collects metadata and extracts text from docx files | "extract_text" -- boolean that determines if document text should be extracted as a child file (defaults to False) | +| ScanElf | Collects metadata from ELF files | N/A | +| ScanEmail | Collects metadata and extract files from email messages | N/A | +| ScanEncryptedDoc | Attempts to extract decrypted Office documents through brute force password cracking | "password_file" -- location of passwords file for encrypted documents (defaults to etc/strelka/passwords.txt) | +| ScanEntropy | Calculates entropy of files | N/A | +| ScanExiftool | Collects metadata parsed by Exiftool | "tempfile_directory" -- location where tempfile writes temporary files (defaults to "/tmp/")
"keys" -- list of keys to log (defaults to all) | | ScanFalconSandbox | Sends files to an instance of Falcon Sandbox | "server" -- URL of the Falcon Sandbox API inteface
"priority" -- Falcon Sandbox priority assigned to the task (defaults to 3)
"timeout" -- amount of time (in seconds) to wait for the task to upload (defaults to 60)
"envID" -- list of numeric envrionment IDs that tells Falcon Sandbox which sandbox to submit a sample to (defaults to [100])
"api_key" -- API key used for authenticating to Falcon Sandbox (defaults to None, optionally read from environment variable "FS_API_KEY")
"api_secret" -- API secret key used for authenticating to Falcon Sandbox (defaults to None, optionally read from environment variable "FS_API_SECKEY") | -| ScanFloss | Analyzes executable files with FireEye [floss](https://github.com/fireeye/flare-floss) | "tempfile_directory" -- location where `tempfile` will write temporary files (defaults to "/tmp/")
"limit" -- Maximum amount of strings to collect. (defaults to 100) | -| ScanGif | Extracts data embedded in GIF files | N/A | -| ScanGzip | Decompresses gzip files | N/A -| ScanHash | Calculates file hash values | N/A | -| ScanHeader | Collects file header | "length" -- number of header characters to log as metadata (defaults to 50) | -| ScanHtml | Collects metadata and extracts embedded files from HTML files | "parser" -- sets the HTML parser used during scanning (defaults to "html.parser") | -| ScanIni | Parses keys from INI files | N/A | -| ScanIso | Collects and extracts files from ISO files | "limit" -- maximum number of files to extract (defaults to 0) | -| ScanJarManifest | Collects metadata from JAR manifest files | N/A | -| ScanJavascript | Collects metadata from Javascript files | "beautify" -- beautifies JavaScript before parsing (defaults to True) | -| ScanJpeg | Extracts data embedded in JPEG files | N/A | -| ScanJson | Collects keys from JSON files | N/A | -| ScanLibarchive | Extracts files from libarchive-compatible archives. | "limit" -- maximum number of files to extract (defaults to 1000) | -| ScanLnk | Collects metadata from lnk files. | N/A | Ryan Borre, [DerekT2](https://github.com/Derekt2), [Nathan Icart](https://github.com/nateicart) -| ScanLzma | Decompresses lzma files | N/A | -| ScanMacho | Collects metadata from Mach-O files | "tempfile_directory" -- location where tempfile writes temporary files (defaults to "/tmp/") | -| ScanManifest | Collects metadata from Chrome Manifest files | N/A | [DerekT2](https://github.com/Derekt2) -| ScanMmbot | Collects VB results from a server running mmbotd | "server" -- network address and network port of the mmbotd server (defaults to "127.0.0.1:33907")
"timeout" -- amount of time (in milliseconds) to wait for a response from the server (defaults to 10000) | -| ScanOcr | Collects metadata and extracts optical text from image files | "extract_text" -- boolean that determines if document text should be extracted as a child file (defaults to False)
"tempfile_directory" -- location where `tempfile` will write temporary files (defaults to "/tmp/") | -| ScanOle | Extracts files from OLECF files | N/A | -| ScanPdf | Collects metadata and extracts streams from PDF files | N/A | -| ScanPe | Collects metadata from PE files | N/A | -| ScanPgp | Collects metadata from PGP files | N/A | -| ScanPhp | Collects metadata from PHP files | N/A | -| ScanPkcs7 | Extracts files from PKCS7 certificate files | N/A | -| ScanPlist | Collects attributes from binary and XML property list files | "keys" -- list of keys to log (defaults to all) | -| ScanQr | Collects QR code metadata from image files | N/A | [Aaron Herman](https://github.com/aaronherman) -| ScanRar | Extracts files from RAR archives | "limit" -- maximum number of files to extract (defaults to 1000)
"password_file" -- location of passwords file for RAR archives (defaults to etc/strelka/passwords.txt) | -| ScanRpm | Collects metadata and extracts files from RPM files | "tempfile_directory" -- location where `tempfile` will write temporary files (defaults to "/tmp/") | -| ScanRtf | Extracts embedded files from RTF files | "limit" -- maximum number of files to extract (defaults to 1000) | -| ScanStrings | Collects strings from file data | "limit" -- maximum number of strings to collect, starting from the beginning of the file (defaults to 0, collects all strings) | -| ScanSwf | Decompresses swf (Flash) files | N/A | -| ScanTar | Extract files from tar archives | "limit" -- maximum number of files to extract (defaults to 1000) | -| ScanTnef | Collects metadata and extract files from TNEF files | N/A | -| ScanUpx | Decompresses UPX packed files | "tempfile_directory" -- location where `tempfile` will write temporary files (defaults to "/tmp/") | -| ScanUrl | Collects URLs from files | "regex" -- dictionary entry that establishes the regular expression pattern used for URL parsing (defaults to a widely scoped regex) | -| ScanVb | Collects metadata from Visual Basic script files | N/A | -| ScanVba | Extracts and analyzes VBA from document files | "analyze_macros" -- boolean that determines if macros should be analyzed (defaults to True) | -| ScanX509 | Collects metadata from x509 and CRL files | "type" -- string that determines the type of x509 certificate being scanned (no default, assigned as either "der" or "pem" depending on flavor) | -| ScanXL4MA | Analyzes and parses Excel 4 Macros from XLSX files | "type" -- string that determines the type of x509 certificate being scanned (no default, assigned as either "der" or "pem" depending on flavor) | Ryan Borre -| ScanXml | Log metadata and extract files from XML files | "extract_tags" -- list of XML tags that will have their text extracted as child files (defaults to empty list)
"metadata_tags" -- list of XML tags that will have their text logged as metadata (defaults to empty list) | -| ScanYara | Scans files with YARA rules | "location" -- location of the YARA rules file or directory (defaults to "/etc/yara/")
"metadata_identifiers" -- list of YARA rule metadata identifiers (e.g. "Author") that should be logged as metadata (defaults to empty list) | -| ScanZip | Extracts files from zip archives | "limit" -- maximum number of files to extract (defaults to 1000)
"password_file" -- location of passwords file for zip archives (defaults to etc/strelka/passwords.txt) | -| ScanZlib | Decompresses gzip files | N/A +| ScanFloss | Analyzes executable files with FireEye [floss](https://github.com/fireeye/flare-floss) | "tempfile_directory" -- location where `tempfile` will write temporary files (defaults to "/tmp/")
"limit" -- Maximum amount of strings to collect. (defaults to 100) | +| ScanGif | Extracts data embedded in GIF files | N/A | +| ScanGzip | Decompresses gzip files | N/A +| ScanHash | Calculates file hash values | N/A | +| ScanHeader | Collects file header | "length" -- number of header characters to log as metadata (defaults to 50) | +| ScanHtml | Collects metadata and extracts embedded files from HTML files | "parser" -- sets the HTML parser used during scanning (defaults to "html.parser") | +| ScanIni | Parses keys from INI files | N/A | +| ScanIso | Collects and extracts files from ISO files | "limit" -- maximum number of files to extract (defaults to 0) | +| ScanJarManifest | Collects metadata from JAR manifest files | N/A | +| ScanJavascript | Collects metadata from Javascript files | "beautify" -- beautifies JavaScript before parsing (defaults to True) | +| ScanJpeg | Extracts data embedded in JPEG files | N/A | +| ScanJson | Collects keys from JSON files | N/A | +| ScanLibarchive | Extracts files from libarchive-compatible archives. | "limit" -- maximum number of files to extract (defaults to 1000) | +| ScanLnk | Collects metadata from lnk files. | N/A | Ryan Borre, [DerekT2](https://github.com/Derekt2), [Nathan Icart](https://github.com/nateicart) +| ScanLzma | Decompresses lzma files | N/A | +| ScanMacho | Collects metadata from Mach-O files | "tempfile_directory" -- location where tempfile writes temporary files (defaults to "/tmp/") | +| ScanManifest | Collects metadata from Chrome Manifest files | N/A | [DerekT2](https://github.com/Derekt2) +| ScanMmbot | Collects VB results from a server running mmbotd | "server" -- network address and network port of the mmbotd server (defaults to "127.0.0.1:33907")
"timeout" -- amount of time (in milliseconds) to wait for a response from the server (defaults to 10000) | +| ScanMsi | Collects MSI data parsed by Exiftool | "tempfile_directory" -- location where tempfile writes temporary files (defaults to "/tmp/")
"keys" -- list of keys to log (defaults to all) | +| ScanOcr | Collects metadata and extracts optical text from image files | "extract_text" -- boolean that determines if document text should be extracted as a child file (defaults to False)
"tempfile_directory" -- location where `tempfile` will write temporary files (defaults to "/tmp/") | +| ScanOle | Extracts files from OLECF files | N/A | +| ScanPdf | Collects metadata and extracts streams from PDF files | N/A | +| ScanPe | Collects metadata from PE files | N/A | +| ScanPgp | Collects metadata from PGP files | N/A | +| ScanPhp | Collects metadata from PHP files | N/A | +| ScanPkcs7 | Extracts files from PKCS7 certificate files | N/A | +| ScanPlist | Collects attributes from binary and XML property list files | "keys" -- list of keys to log (defaults to all) | +| ScanQr | Collects QR code metadata from image files | N/A | [Aaron Herman](https://github.com/aaronherman) +| ScanRar | Extracts files from RAR archives | "limit" -- maximum number of files to extract (defaults to 1000)
"password_file" -- location of passwords file for RAR archives (defaults to etc/strelka/passwords.txt) | +| ScanRpm | Collects metadata and extracts files from RPM files | "tempfile_directory" -- location where `tempfile` will write temporary files (defaults to "/tmp/") | +| ScanRtf | Extracts embedded files from RTF files | "limit" -- maximum number of files to extract (defaults to 1000) | +| ScanStrings | Collects strings from file data | "limit" -- maximum number of strings to collect, starting from the beginning of the file (defaults to 0, collects all strings) | +| ScanSwf | Decompresses swf (Flash) files | N/A | +| ScanTar | Extract files from tar archives | "limit" -- maximum number of files to extract (defaults to 1000) | +| ScanTnef | Collects metadata and extract files from TNEF files | N/A | +| ScanUpx | Decompresses UPX packed files | "tempfile_directory" -- location where `tempfile` will write temporary files (defaults to "/tmp/") | +| ScanUrl | Collects URLs from files | "regex" -- dictionary entry that establishes the regular expression pattern used for URL parsing (defaults to a widely scoped regex) | +| ScanVb | Collects metadata from Visual Basic script files | N/A | +| ScanVba | Extracts and analyzes VBA from document files | "analyze_macros" -- boolean that determines if macros should be analyzed (defaults to True) | +| ScanX509 | Collects metadata from x509 and CRL files | "type" -- string that determines the type of x509 certificate being scanned (no default, assigned as either "der" or "pem" depending on flavor) | +| ScanXL4MA | Analyzes and parses Excel 4 Macros from XLSX files | "type" -- string that determines the type of x509 certificate being scanned (no default, assigned as either "der" or "pem" depending on flavor) | Ryan Borre +| ScanXml | Log metadata and extract files from XML files | "extract_tags" -- list of XML tags that will have their text extracted as child files (defaults to empty list)
"metadata_tags" -- list of XML tags that will have their text logged as metadata (defaults to empty list) | +| ScanYara | Scans files with YARA rules | "location" -- location of the YARA rules file or directory (defaults to "/etc/yara/")
"metadata_identifiers" -- list of YARA rule metadata identifiers (e.g. "Author") that should be logged as metadata (defaults to empty list) | +| ScanZip | Extracts files from zip archives | "limit" -- maximum number of files to extract (defaults to 1000)
"password_file" -- location of passwords file for zip archives (defaults to etc/strelka/passwords.txt) | +| ScanZlib | Decompresses gzip files | N/A ## Tests As Strelka consists of many scanners and dependencies for those scanners, Pytests are particularly valuable for testing the ongoing functionality of Strelka and it's scanners. Tests allow users to write test cases that verify the correct behavior of Strelka scanners to ensure that the scanners remain reliable and accurate. Additionally, using pytests can help streamline the development process, allowing developers to focus on writing new features and improvements for the scanners. The following section details how to setup Pytests.