Feature/pacemaker2022 #417

Merged
merged 8 commits into from
Jun 1, 2022

Conversation

yury-lysogorskiy

No description provided.

Yury Lysogorskiy added 5 commits May 20, 2022 15:58
 - add elements and cutoff properties
 - set default self.input
 - rewrite _save_structure_dataframe_pckl_gzip
 - write_input: if _train_job_id_list is non empty, after adding job.add_training_data(training_container), then compose training dataframe using 'job.create_training_dataframe'
 - automatically determine the list of elements if self.structure_data is pd.DataFrame
 - implement _get_training_data and _get_predicted_data
@srmnitc
Member

srmnitc commented Jun 1, 2022

@pmrv would you be able to review this quickly? We can then merge and release for the workshop.

@coveralls

coveralls commented Jun 1, 2022

Pull Request Test Coverage Report for Build 2424052605

  • 1 of 201 (0.5%) changed or added relevant lines in 2 files are covered.
  • 17 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.2%) to 8.859%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
pyiron_contrib/atomistics/pacemaker/job.py | 0 | 200 | 0.0%

Files with Coverage Reduction | New Missed Lines | %
pyiron_contrib/atomistics/runner/utils.py | 17 | 0%

Totals Coverage Status
Change from base Build 2418300557: -0.2%
Covered Lines: 908
Relevant Lines: 10249

💛 - Coveralls

Contributor

@pmrv pmrv left a comment

_get_training_data and _get_predicted_data should be fixed, otherwise things are ok. I'll change the input to DataContainer after this is merged and then we can make a new release.

Comment on lines 26 to 29
# set loggers
loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict]
for logger in loggers:
    logger.setLevel(logging.WARNING)

Is that necessary to hide some annoying warnings? Loggers are already set up and accessible via job.logger. The logger has several handlers attached, each filtering by its own level for a different output, so it should not be necessary to configure them again.
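
If the goal is only to silence one noisy dependency, a narrower alternative is to raise the level of that single logger instead of every logger registered in the process. The following is a minimal sketch using only the standard library; the helper `quiet_logger` and the logger names are hypothetical and not part of this PR or of pyiron.

```python
import logging


def quiet_logger(name: str, level: int = logging.WARNING) -> logging.Logger:
    """Raise the threshold of one named logger and leave all others untouched."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


# "noisy_dependency" and "some_other_package" are hypothetical names,
# used here only for illustration.
noisy = quiet_logger("noisy_dependency")
other = logging.getLogger("some_other_package")
```

Only the named logger is changed, so handlers and levels that were configured elsewhere keep working as before.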


self._train_job_id_list = []

self.input = GenericParameters(table_name="input")

Should use DataContainer, but I can also do that once this is merged.

Comment on lines 332 to 342
def _get_training_data(self) -> TrainingStorage:
    # TODO: convert to TrainingStorage ?
    fname = os.path.join(self.working_directory, "fitting_data_info.pckl.gzip")
    df = pd.read_pickle(fname, compression="gzip")
    return df

def _get_predicted_data(self) -> FlattenedStorage:
    # TODO: convert to FlattenedStorage ?
    fname = os.path.join(self.working_directory, "train_pred.pckl.gzip")
    df = pd.read_pickle(fname, compression="gzip")
    return df

If this is not implemented properly, the plotting functions will break. Ideally the job would keep a TrainingStorage around and extend it every time _add_training_data is called. For the workshop we can probably get away with something like this:

df["atoms"] = df.ase_atoms.map(ase_to_pyiron)
t = TrainingStorage()
for _, r in df.iterrows():
    t.add_structure(r.atoms, energy=r.energy_corrected, forces=r.forces, identifier=r['name'])

and something similar for _get_predicted_data. If it contains the structures, also use a TrainingStorage; otherwise use a FlattenedStorage.
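
The row-by-row conversion suggested above can be sketched as a small helper that is independent of pyiron, which makes it easy to check in isolation. This is only an illustration under stated assumptions: `fill_storage` and `RecordingStorage` are hypothetical names, the column names are the ones used in the snippet above, and the only thing assumed about the storage is that it offers `add_structure(structure, energy=..., forces=..., identifier=...)` as called there.

```python
from typing import Any, Callable, Iterable, Mapping


def fill_storage(
    rows: Iterable[Mapping[str, Any]],
    storage: Any,
    to_structure: Callable[[Any], Any],
) -> Any:
    """Add one structure per row to `storage` and return the storage."""
    for row in rows:
        storage.add_structure(
            to_structure(row["ase_atoms"]),
            energy=row["energy_corrected"],
            forces=row["forces"],
            identifier=row["name"],
        )
    return storage


class RecordingStorage:
    """Hypothetical stand-in that only records calls; not the pyiron class."""

    def __init__(self) -> None:
        self.calls = []

    def add_structure(self, structure, energy, forces, identifier) -> None:
        self.calls.append((structure, energy, forces, identifier))


# Made-up row for demonstration; real rows would come from the training dataframe.
rows = [
    {
        "ase_atoms": "placeholder-structure",
        "energy_corrected": -3.74,
        "forces": [[0.0, 0.0, 0.0]],
        "name": "al_fcc",
    }
]
demo = fill_storage(rows, RecordingStorage(), to_structure=lambda atoms: atoms)
```

In the job itself the call would presumably look like `fill_storage(df.to_dict("records"), TrainingStorage(), ase_to_pyiron)`, with the storage and converter taken from the review comment above.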

@srmnitc
Member

srmnitc commented Jun 1, 2022

@yury-lysogorskiy could you please fix the issues as soon as possible? We can then release and get everything ready. Thanks!

Yury Lysogorskiy and others added 3 commits June 1, 2022 18:59
extract training_data (TrainingStorage) and predicted_data_fs (FlattenedStorage) in collect_output
@pmrv
Contributor

pmrv commented Jun 1, 2022

The example notebook blocks the CI, since it won't run without training data, so I've removed it for now. It's a very nice notebook though, so we should find a different place for it.

I also realized that I cannot test pacemaker jobs on our cluster yet, so I won't change the input to DataContainer. It won't be as nice, but it will save us from unnecessary back and forth.

I've made issues #419 and #420 to remind me to pick it up after the workshop.

@pmrv pmrv merged commit 3cda20c into master Jun 1, 2022
@delete-merged-branch delete-merged-branch bot deleted the feature/pacemaker2022 branch June 1, 2022 20:21
@srmnitc
Member

srmnitc commented Jun 1, 2022

@pmrv @yury-lysogorskiy Thanks for the work on this. I will merge it and release.
