This repository has been archived by the owner on Apr 23, 2023. It is now read-only.

TA-dmarc - No longer pulling events - TypeError #48

Open
mikegvalerio opened this issue Apr 18, 2023 · 8 comments

Comments

@mikegvalerio

Good afternoon,

We have the TA-dmarc add-on installed on our HF (heavy forwarder) and stopped receiving DMARC events on Friday, April 14, 2023. After re-enabling the inputs and restarting the HF, events are still not coming in. Searching on the dmarc source type, we found the following error:

  • TypeError: expected string or bytes-like object

Are you aware of this issue, and would you be able to provide a fix?

Please advise when you can, thank you!

@hkelley
Collaborator

hkelley commented Apr 19, 2023

Is it possible that you were sent an improperly-formatted DMARC report on April 14, hence the runtime error? If you moved all of the reports from April 14 to another folder, and let the input run again, does it resume ingestion?

Could you paste the full log message with the error?
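
The suggestion above (moving the April 14 reports out of the polled folder) could be scripted roughly as follows. This is only a sketch: it assumes the `imapclient` package that the add-on already imports, a server that supports the IMAP MOVE extension, and an existing target folder. The function name, the folder name `dmarc-quarantine`, the host, and the credentials are all hypothetical.

```python
from datetime import date


def move_reports_for_day(client, day, source="INBOX", target="dmarc-quarantine"):
    """Move every message dated `day` from `source` to `target`; return the UIDs.

    `client` is expected to behave like an imapclient.IMAPClient instance.
    The folder names are placeholders, not values used by TA-dmarc.
    """
    client.select_folder(source)
    # "ON" is the IMAP search key for messages whose internal date is `day`.
    uids = client.search(["ON", day])
    if uids:
        client.move(uids, target)
    return uids


# Example usage against a real server (hypothetical host and credentials):
#
# from imapclient import IMAPClient
# with IMAPClient("imap.example.com") as client:
#     client.login("user", "password")
#     print(move_reports_for_day(client, date(2023, 4, 14)))
```

If the input resumes ingesting after the move, one of the moved messages is the likely trigger.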

@mikegvalerio
Author

No problem, please see the full log message below:

2023-04-19 20:45:36,598 ERROR pid=1370583 tid=MainThread file=base_modinput.py:log_error:309 | Get error when collecting events.
Traceback (most recent call last):
  File "/opt/splunk/etc/apps/TA-dmarc/bin/ta_dmarc/aob_py3/modinput_wrapper/base_modinput.py", line 128, in stream_events
    self.collect_events(ew)
  File "/opt/splunk/etc/apps/TA-dmarc/bin/dmarc_imap.py", line 92, in collect_events
    input_module.collect_events(self, ew)
  File "/opt/splunk/etc/apps/TA-dmarc/bin/input_module_dmarc_imap.py", line 80, in collect_events
    filelist = i2d.process_incoming()
  File "/opt/splunk/etc/apps/TA-dmarc/bin/dmarc/imap2dir.py", line 351, in process_incoming
    filelist = self.save_reports_from_message_bodies(response)
  File "/opt/splunk/etc/apps/TA-dmarc/bin/dmarc/imap2dir.py", line 256, in save_reports_from_message_bodies
    filename = self.write_part_to_file(uid, msg)
  File "/opt/splunk/etc/apps/TA-dmarc/bin/dmarc/imap2dir.py", line 186, in write_part_to_file
    filename = re.sub('[^\w\d!.-]', '', filename)
  File "/opt/splunk/lib/python3.7/re.py", line 194, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

@hkelley
Collaborator

hkelley commented Apr 20, 2023

Do you have any emails that arrived on the 14th without the report attached (or with an abnormal name)?
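
One way to check a suspect message for a missing or odd attachment name is to list each MIME part with its content type and filename. The sketch below uses only Python's standard `email` package. The helper name `list_parts` and the sample message are made up for illustration; for a real check, the raw bytes would come from a message saved out of the mailbox.

```python
from email import message_from_bytes
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def list_parts(raw_message):
    """Return (content_type, filename) for every non-container MIME part."""
    msg = message_from_bytes(raw_message)
    return [
        (part.get_content_type(), part.get_filename())
        for part in msg.walk()
        if not part.is_multipart()
    ]


# Hypothetical sample: a text body plus a gzip attachment with no filename.
sample = MIMEMultipart()
sample.attach(MIMEText("Report attached."))
sample.attach(MIMEApplication(b"\x1f\x8b", _subtype="gzip"))

parts = list_parts(sample.as_bytes())
for content_type, filename in parts:
    print(content_type, filename)
```

A report attachment that shows up with a filename of `None` is a candidate for the message that stops the input.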

@mikegvalerio
Author

mikegvalerio commented Apr 20, 2023

We actually only got 1 email on the 14th that came from [email protected] but that was it.

@jorritfolmer
Owner

The issue seems to be:

  1. An attachment of type zip, gz or xml was found so we prepare to save it to disk for further checking and processing.
  2. However, this attachment had no filename, so get_filename() returned None, which in turn caused re.sub() to fail because it expects a string or bytes-like object, and None is neither.

I'm working on a patch to catch this.
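
The failure described above can be reproduced outside Splunk with the standard library alone. This is a minimal sketch: the attachment is a made-up stand-in for the nameless report, and the regular expression is the one from the traceback, written as a raw string.

```python
import re
from email.mime.application import MIMEApplication

# Hypothetical stand-in for the report attachment: no Content-Disposition
# header and no "name" parameter, so the part has no filename.
part = MIMEApplication(b"<feedback/>", _subtype="xml")

filename = part.get_filename()  # None for a nameless attachment
error = None
try:
    re.sub(r'[^\w\d!.-]', '', filename)
except TypeError as exc:
    error = str(exc)

print(filename, error)
```

The exact wording of the exception message varies slightly between Python versions, but it is the same TypeError raised from `re.sub()`.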

@jorritfolmer
Owner

Here's a patch. I'm unable to test it, so don't run it in production.

diff --git a/bin/dmarc/imap2dir.py b/bin/dmarc/imap2dir.py
index 9e07116..c60d2bd 100644
--- a/bin/dmarc/imap2dir.py
+++ b/bin/dmarc/imap2dir.py
@@ -9,6 +9,7 @@ from imapclient import IMAPClient
 import dkim
 import dns
 import msal
+import uuid
 
 # Copyright 2017-2020 Jorrit Folmer
 #
@@ -179,12 +180,55 @@ class Imap2Dir(object):
                 set(messageslist[x:x + fetch_size]), [b'RFC822']))
         return response
 
+    def get_file_ext_from_content_type(self,ctype,uid):
+        """ For an eligible mime-type, return an appropriate file extension string
+        """
+        self.helper.log_debug(
+            'get_file_ext_from_content_type: determining file extention for content-type %s of msg uid %d' %
+            (ctype, uid))
+        if ctype == "application/zip":
+            return "zip"
+        elif ctype == "application/gzip":
+            return "gz"
+        elif ctype == "application/x-gzip":
+            return "gz"
+        elif ctype == "application/octet-stream":
+            # Non-standard mimetype used by Amazon SES dmarc reports
+            return None
+        elif ctype == "application-x-gzip":
+            # Non-standard mimetype used by Comcast dmarc reports
+            return "gz"
+        elif ctype == "application/x-zip-compressed":
+            # Non-standard mimetype used by Yahoo dmarc reports
+            return "zip"
+        elif ctype == "application/xml":
+            return "xml"
+        elif ctype == "text/xml":
+            return "xml"
+        else:
+            self.helper.log_debug(
+                'get_file_ext_from_content_type: skipping content-type %s of msg uid %d' %
+                (ctype, uid))
+            return None        
+
     def write_part_to_file(self, uid, part):
         """ Write the selected message part to file """
         filename = part.get_filename()
-        # Sanitize filename, see issue #43
-        filename = re.sub('[^\w\d!.-]', '', filename)
-        filename = os.path.join(self.tmp_dir, os.path.basename(filename))
+        if filename is not None:
+            # Sanitize filename, see issue #43
+            filename = re.sub('[^\w\d!.-]', '', filename)
+            filename = os.path.join(self.tmp_dir, os.path.basename(filename))
+        else:
+            # Try to create a random file name because there isn't one in the mime header
+            # See issue #48.
+            # Since we don't have a file name, we have to guess the file extension based
+            # on the content type first.
+            ctype = part.get_content_type()
+            file_ext = self.get_file_ext_from_content_type(ctype, uid)
+            if file_ext is None:
+                # Guessing failed so skip writing this attachment
+                return None
+            filename = "{}.{}".format(uuid.uuid4(),file_ext)
         try:
             open(filename, 'wb').write(part.get_payload(decode=True))
         except Exception as e:
@@ -246,7 +290,8 @@ class Imap2Dir(object):
                     ctype = part.get_content_type()
                     if self.check_eligible_mimetype(ctype, uid):
                         filename = self.write_part_to_file(uid, part)
-                        filelist.append(filename)
+                        if filename is not None:
+                            filelist.append(filename)
             else:
                 self.helper.log_debug(
                     'save_reports_from_message_bodies: start non-multipart processing of msg uid  %d' %
@@ -254,7 +299,8 @@ class Imap2Dir(object):
                 ctype = msg.get_content_type()
                 if self.check_eligible_mimetype(ctype, uid):
                     filename = self.write_part_to_file(uid, msg)
-                    filelist.append(filename)
+                    if filename is not None:
+                        filelist.append(filename)
                 else:
                     self.helper.log_debug(
                         'save_reports_from_message_bodies: skipping content-type %s of msg uid %d' %
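
For readers who want to try the idea outside the add-on, the naming logic of the patch can be sketched as a standalone function. This is an illustration and not the add-on's code: `build_report_path` and `EXT_BY_CONTENT_TYPE` are hypothetical names, and the content-type table is copied from the patch. The sketch also joins the generated name with the temporary directory, which the patch as posted does not appear to do for the nameless case.

```python
import os
import re
import tempfile
import uuid
from email.mime.application import MIMEApplication

# Content types copied from the patch. application/octet-stream is omitted
# because the patch returns None for it (no extension can be guessed).
EXT_BY_CONTENT_TYPE = {
    "application/zip": "zip",
    "application/gzip": "gz",
    "application/x-gzip": "gz",
    "application-x-gzip": "gz",
    "application/x-zip-compressed": "zip",
    "application/xml": "xml",
    "text/xml": "xml",
}


def build_report_path(part, tmp_dir):
    """Return a path under tmp_dir for this part, or None to skip it."""
    filename = part.get_filename()
    if filename is not None:
        # Same sanitizing expression as the patch, as a raw string.
        filename = os.path.basename(re.sub(r'[^\w\d!.-]', '', filename))
    else:
        ext = EXT_BY_CONTENT_TYPE.get(part.get_content_type())
        if ext is None:
            return None
        filename = "{}.{}".format(uuid.uuid4(), ext)
    return os.path.join(tmp_dir, filename)


tmp_dir = tempfile.mkdtemp()
named = MIMEApplication(b"data", _subtype="zip")
named.add_header("Content-Disposition", "attachment", filename="report 1.zip")
nameless = MIMEApplication(b"data", _subtype="gzip")
unknown = MIMEApplication(b"data")  # application/octet-stream, no filename

named_path = build_report_path(named, tmp_dir)
nameless_path = build_report_path(nameless, tmp_dir)
unknown_path = build_report_path(unknown, tmp_dir)
print(named_path, nameless_path, unknown_path)
```

As in the patch, callers have to treat a `None` result as "skip this attachment" instead of appending it to the file list.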

@mikegvalerio
Author

No worries, thank you for the quick update on this, @jorritfolmer! Do you anticipate this rolling out in an upcoming update for the app?

@jorritfolmer
Owner

No, sorry, I don't think so; see #49.

3 participants