Sped up the creation and filling of datasets #30

Open: wants to merge 3 commits into base: master


clayton-ho

Sped up the creation and filling of datasets (in Dataset.py), making the RSG over an order of magnitude faster for large datasets.
This requires the datavault server to have a "shape" setting that returns the shape of the dataset.
Instead of initializing an empty NumPy array and appending successive data grabs from the datavault, we now use the datavault's "shape" setting to preallocate a NumPy array and then fill it in place.
Additionally, the size of each data grab has been increased from 100 to 1000, since most of the overhead arises from filling the dataset.
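A minimal sketch contrasting the two loading strategies described above. FakeDataVault, its shape() and get(limit) methods, and both loader functions are hypothetical stand-ins chosen for illustration; they are not the real datavault API or the code in Dataset.py.

```python
import numpy as np


class FakeDataVault:
    """Hypothetical stand-in for the datavault server (not the real API).

    shape() plays the role of the "shape" setting this PR relies on, and
    get(limit) returns up to `limit` not-yet-read rows per call.
    """

    def __init__(self, data):
        self._data = np.asarray(data, dtype=float)
        self._pos = 0

    def shape(self):
        return self._data.shape

    def get(self, limit):
        rows = self._data[self._pos:self._pos + limit]
        self._pos += len(rows)
        return rows


def load_by_appending(dv, chunk=100):
    """Old pattern: grow the array by stacking each grab onto it."""
    data = None
    while True:
        rows = dv.get(chunk)
        if len(rows) == 0:
            break
        # np.vstack copies every existing row on each call, so the total
        # copying work grows quadratically with the number of grabs.
        data = rows.copy() if data is None else np.vstack([data, rows])
    return data


def load_preallocated(dv, chunk=1000):
    """New pattern: allocate once from shape(), then fill in place."""
    n_rows, n_cols = dv.shape()
    data = np.empty((n_rows, n_cols))
    filled = 0
    while filled < n_rows:
        rows = dv.get(chunk)
        if len(rows) == 0:
            break
        data[filled:filled + len(rows)] = rows  # copies only the new rows
        filled += len(rows)
    return data[:filled]
```

For a 100,000-row dataset, the old pattern makes 1,000 grabs of 100 rows and re-copies the growing array on each one, while the preallocated version makes 100 grabs of 1,000 rows and copies each row exactly once.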

@clayton-ho
Author

I realized that I was stupid and forgot to account for live datasets: because the array is preallocated to a fixed shape, the Dataset object cannot accommodate a dataset that keeps growing.
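One common way to keep preallocation while still supporting a live, growing dataset is a buffer that doubles its capacity when it fills up. This is sketched here as a general technique, not as what the PR's commit actually does; GrowableArray and its methods are hypothetical names.

```python
import numpy as np


class GrowableArray:
    """Hypothetical 2-D row buffer for a live, still-growing dataset.

    Rows are written into spare preallocated capacity; when it runs out,
    the buffer is reallocated at twice the size, so the total copying cost
    stays linear in the number of rows (amortized O(1) per row).
    """

    def __init__(self, n_cols, capacity=1024):
        self._buf = np.empty((capacity, n_cols))
        self._size = 0

    def append(self, rows):
        rows = np.asarray(rows, dtype=float)
        if rows.size == 0:
            return  # nothing new arrived in this grab
        # Accept a single row or a block of rows with matching column count.
        rows = rows.reshape(-1, self._buf.shape[1])
        needed = self._size + len(rows)
        if needed > len(self._buf):
            new_cap = max(needed, 2 * len(self._buf))
            new_buf = np.empty((new_cap, self._buf.shape[1]))
            new_buf[:self._size] = self._buf[:self._size]
            self._buf = new_buf
        self._buf[self._size:needed] = rows
        self._size = needed

    @property
    def data(self):
        """View of the rows filled so far (no copy)."""
        return self._buf[:self._size]
```

Because data is a view into the current buffer, a view taken before a reallocation keeps pointing at the old array; callers should re-read data after each append.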

This has been fixed in the latest commit, which also speeds up getData somewhat: the data arrays are now created when a Dataset object is instantiated, so getData never has to check whether they exist.
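The pattern of creating the arrays in the constructor can be sketched as follows. EagerDataset, its shape argument, and this getData signature are hypothetical and only mirror the idea described in the comment, not the actual Dataset.py code.

```python
import numpy as np


class EagerDataset:
    """Hypothetical Dataset-like class whose array exists from construction.

    A lazy version would start with self.data = None and test
    `if self.data is None` inside every getData call; creating the array
    in __init__ removes that branch from the hot path.
    """

    def __init__(self, shape):
        self.data = np.empty(shape)  # allocated once, up front
        self.filled = 0

    def getData(self, rows):
        # Accept a single row or a block of rows with matching column count.
        rows = np.asarray(rows, dtype=float).reshape(-1, self.data.shape[1])
        end = self.filled + len(rows)
        # Raises ValueError if the rows would overflow the preallocated shape.
        self.data[self.filled:end] = rows
        self.filled = end
        return self.data[:self.filled]
```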
