Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove RecordBase to speedup processing #57

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

xmedeko
Copy link
Contributor

@xmedeko xmedeko commented Mar 7, 2018

Fix #55
The second commit limits the data processor getattr.

@dtcooper
Copy link
Owner

dtcooper commented Mar 7, 2018

Looks decent. Thanks for submitting. Need more time to review more intimately. What sort of performance gains do you observe.

@xmedeko
Copy link
Contributor Author

xmedeko commented Mar 7, 2018

The first commit is about 10-15%, the second commit about 5%.

The second commit moved caching to the instance of processor, so, if someone uses a new processor every time and have small FITs, may have some performance degradation, maybe. Anyway, maybe the scrubbing is to complicated and may be simplified? Do not know.

@xmedeko
Copy link
Contributor Author

xmedeko commented Mar 9, 2018

I've added BaseType.size caching to the first commit, which is kind of micro-optimization.

@xmedeko
Copy link
Contributor Author

xmedeko commented Mar 16, 2018

Update: I've run benchmarks on my Win32 Intel i5 2.5GHz machine previously. I've tried it on our Linux x64 (Ubuntu Xenial) Intel Xeon 2.6GHz server, and it's 5x faster. (all running Python 3.5.2). IMHO it cannot be caused just by the faster processor, there must be some performance problem with Python on Windows or on x32 arch.

Anyway, removing RecordBase brings more flexibility to class constructor, e.g. see #58.

@xmedeko
Copy link
Contributor Author

xmedeko commented Apr 4, 2018

Just about the commit: FitFileDataProcessor cache methods not just method names
I thing the data_processor should be removed from the FitFile completely. It may be designed as a wrapper around DataMessage, so a user may plugin into the parsing process externally. Something like:

with FitFile(...) as ffile:
    pr = FitFileDataProcessor(ffile)
    for m in pr.get_messages(...):
        # m is processed

This way, we do not need any static caching, it's up to the user.

return method

scrubbed_method_name = scrub_method_name(method_name)
try:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do method = getattr(self, scrubbed_method_name, None) to use None as the default if the attribute doesn't exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr is slow for large FITs (e.g. 10 hours records). It's the purpose of this commit to avoid it and speed the processors.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that. I'm suggesting using getattr(self, scrubbed_method_name, None) instead of the try/except for the initial caching.

def __repr__(self):
return '<FieldType: %s (%s)>' % (self.name, self.base_type)


class MessageType(RecordBase):
class MessageType():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing object?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, missing the object, thanks.

@xmedeko
Copy link
Contributor Author

xmedeko commented Apr 25, 2018

@pR0Ps thanks for CR. Meanwhile I've got mroe insight into the problem with the processors a bit. So I would discard the commit about the processors, make a issue for that where we can debate it more in depth.

@pR0Ps
Copy link
Collaborator

pR0Ps commented Apr 25, 2018

If your solution to the processors is the one you suggested above, I wouldn't recommend it. Making things "up to the user" is just pushing the problem along. If most users are going to want the values out of the FitFile object as standard SI units (likely), then it makes sense for that to be the default. The common case should be easy, and the advanced case (get raw values) should be possible. This is the way it is currently set up now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants