Abstraction between representations and calculations/evolutions #109
I was a bit verbose here, but there's a summary at the end.
@JNmpi I'm certainly open to leaving wrapper methods on different objects that allow the user to set multiple attributes in one line. Another idea would be to allow users to set attributes directly at instantiation, e.g.

```python
job = pr.atomistics.job.Lammps('my_lammps_job', structure=pr.atomistics.bulk('Al'), potential='my_fav_mendelev_pot')
```

But honestly, the example above demonstrates why I'm not totally comfortable with how we use
From a philosophical perspective I think this is absolutely the right way to do it. But I would call it I would also need some sort of protection that I couldn't pass a

Practically, I think this setup will wind up feeling a bit counterintuitive, since developers will need to keep potentials up to date with which engines they support (instead of engines with which potentials are possible), and since users will need to think "I want to run a GGA calculation" (instead of "I want to run a VASP calculation").

As a bit of an aside, I also think it might be a bit awkward putting
To the extent that the visual scripting approach will probably want to flatten some inputs out instead of having many smaller nodes (a sort of macro, as you suggest), I agree -- on the command line it's nice to have something like

From my perspective there are a handful of interdependent pieces for an atomistic calculation:
These are the necessary pieces for converting positions and species into forces and energies. Those can then be used to do any number of physical "calculations". Finally, for very practical reasons we also need to specify the computational resources for the engine and/or calculation; let's just keep calling these the "server".

Right now, unfortunately, IMO we have the worst possible setup -- namely we have a mixture of two setups! Sometimes, we mangle the engine and calculator right together:

```python
engine_calculator = pr.atomistics.vasp('mangled')
engine_calculator.kpoints = [1, 1, 1]  # engine options
engine_calculator.potential = 'PAW-PBE'  # model
engine_calculator.structure = pr.atomistics.structure.bulk('Cu')  # structure
engine_calculator.calc_md(temperature=300, n_steps=10)  # implicit calculator
engine_calculator.server.cores = 4  # server
engine_calculator.run()  # Lets us know this object is the king, the thing that holds the output we want
```

Other times we break them apart:

```python
engine = pr.atomistics.Lammps('separated_engine')
engine.structure = pr.atomistics.structure.bulk('Cu')  # structure
engine.potential = engine.list_potentials()[0]  # model
calculator = pr.atomistics.PhonopyJob('separated_calculator')
calculator.ref_job = engine  # In this case, the calculator will update the structure for the engine
calculator.some_options = "something, whatever"
```

I like this second setup better, although obviously if the engine has some sort of built-in and very efficient tool for running the calculation, it's desirable to use that under the hood.

Summary

In the abstract, these are the main architectures floating around my head now:
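Whatever architecture we settle on, the separated engine/calculator idea from the second example can be sketched in a few lines. This is a toy, not pyiron code: it assumes an engine is simply a callable from positions to (energy, forces), here a made-up 1D harmonic chain (`harmonic_engine`), and a hypothetical `Minimize` calculator that only ever talks to that callable, so it can be reused unchanged when the engine is swapped:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed interface: an "engine" turns positions into (energy, forces)
Engine = Callable[[List[float]], Tuple[float, List[float]]]


def harmonic_engine(k: float = 1.0, r0: float = 1.0) -> Engine:
    """Toy engine: harmonic springs between neighbouring atoms on a line."""

    def evaluate(positions: List[float]) -> Tuple[float, List[float]]:
        energy = 0.0
        forces = [0.0] * len(positions)
        for i in range(len(positions) - 1):
            stretch = positions[i + 1] - positions[i] - r0
            energy += 0.5 * k * stretch**2
            # dE/dx_{i+1} = k*stretch, dE/dx_i = -k*stretch; force = -gradient
            forces[i] += k * stretch
            forces[i + 1] -= k * stretch
        return energy, forces

    return evaluate


@dataclass
class Minimize:
    """Toy calculator: steepest descent that only talks to the engine."""

    engine: Engine
    step: float = 0.1
    n_steps: int = 200

    def run(self, positions: List[float]) -> Tuple[float, List[float]]:
        x = list(positions)
        for _ in range(self.n_steps):
            _, forces = self.engine(x)
            x = [xi + self.step * fi for xi, fi in zip(x, forces)]
        return self.engine(x)[0], x


energy, relaxed = Minimize(engine=harmonic_engine(k=1.0, r0=1.0)).run([0.0, 1.5, 2.2])
# Swapping the engine requires no change to the calculator
energy2, relaxed2 = Minimize(engine=harmonic_engine(k=2.0, r0=1.5)).run([0.0, 1.5, 2.2])
```

The design choice being illustrated is that the calculator owns the algorithm and the engine owns the physics; an engine with its own efficient built-in routine could still be special-cased under the hood.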
@liamhuber, thanks for your thoughts and detailed description. I definitely also see the need to unify the syntax for providing/modifying input and parameters. I would also opt for letting users choose among multiple options, i.e., they should be able to use whichever construction they prefer. To put this into pseudo code:
should be identical to something like:
This approach could be easily generalised to structures, potentials etc.
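A minimal sketch of how several spellings could be made identical, assuming a hypothetical `Input` container (not an existing pyiron class): attribute assignment, item assignment, and a keyword call all write to one backing dictionary, and unknown names raise `KeyError` whichever spelling is used:

```python
class Input:
    """Hypothetical input container accepting several equivalent spellings."""

    def __init__(self, **defaults):
        # Bypass our own __setattr__ while creating the backing dict
        object.__setattr__(self, "_data", dict(defaults))

    def _set(self, key, value):
        # Single entry point, so every spelling shares the same validation
        if key not in self._data:
            raise KeyError(f"unknown input '{key}'")
        self._data[key] = value

    def __setattr__(self, key, value):
        self._set(key, value)

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails
        try:
            return self._data[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setitem__(self, key, value):
        self._set(key, value)

    def __getitem__(self, key):
        return self._data[key]

    def __call__(self, **kwargs):
        for key, value in kwargs.items():
            self._set(key, value)
        return self  # allows chaining, e.g. Input(...)(temperature=300)


a = Input(temperature=0.0, n_steps=100)
a.temperature = 300
b = Input(temperature=0.0, n_steps=100)
b["temperature"] = 300
c = Input(temperature=0.0, n_steps=100)(temperature=300)
```

The same pattern would extend to structures or potentials by giving them their own container with their own allowed names.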
I also like your suggestion:
To unify the input we could allow the following constructs simultaneously:
Thanks also for your detailed thoughts and discussion regarding the structure of the engineCalculator. I fully see your point when you define a node like EAM for a specific set of potentials. What I had in mind was to create a generic potential node that receives all the information it needs from the calculator, i.e., something like:
Thus, when replacing one job (e.g. Lammps by VASP) or one structure by another, the list gets updated. In the worst case no potential exists, but this could be easily handled. Since the structure is an input into the job, this construct would be a true functional object, i.e., changing an input within one node will have no impact on another one (which it presently has). The sequence would then be:
I fully agree that we may/should still provide a macro node that groups all the inputs together like we have it now. To formalise this discussion, a node should be purely functional, i.e., node(i1, i2) should rely on independent inputs i1 and i2, not on something like i1 being a function of i2, i.e., i1 = i1(i2). Having this notion of a node being a pure function and a workflow being a sequence of these functions would make it easier to check the validity of certain constructs. In this way I also have a slight preference for first constructing the full job (in this case MD) and then sending this object to the server (SERVER). A possible scenario would be that, depending on the job type (e.g. LAMMPS vs. VASP), we could easily assign different resources. The above is just a quick summary of my thoughts and I am happy to get your insights.
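A minimal sketch of the sequence discussed here (structure, engine, potential, full job, server), assuming a hypothetical potential table, made-up `JobSpec`/`Resources` types, and illustrative resource rules rather than any real pyiron API. Each node is a plain function of its inputs: swapping the engine or the elements just yields a different potential list (an empty list is the "no potential exists" case), and the server node assigns resources from the finished, immutable job description:

```python
from dataclasses import dataclass

# Hypothetical potential table: name -> (engine, elements it covers)
POTENTIALS = {
    "Al_Mendelev_EAM": ("lammps", frozenset({"Al"})),
    "AlCu_Zhou_EAM": ("lammps", frozenset({"Al", "Cu"})),
    "PAW-PBE": ("vasp", frozenset({"Al", "Cu"})),
}


def potential_node(engine: str, elements: frozenset) -> list:
    """Generic potential node: its choices follow from its inputs only."""
    return sorted(
        name
        for name, (code, covered) in POTENTIALS.items()
        if code == engine and elements <= covered
    )


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical, fully built job description; immutable once created."""

    engine: str
    elements: frozenset
    potential: str
    n_atoms: int


@dataclass(frozen=True)
class Resources:
    cores: int
    queue: str


def server_node(job: JobSpec) -> Resources:
    """Pick resources from the finished job; the rules are illustrative only."""
    if job.engine == "vasp":
        return Resources(cores=min(64, max(4, job.n_atoms // 2)), queue="dft")
    return Resources(cores=1, queue="short")


elements = frozenset({"Al", "Cu"})
choices = potential_node("vasp", elements)  # ['PAW-PBE']
job = JobSpec("vasp", elements, choices[0], n_atoms=32)
resources = server_node(job)  # Resources(cores=16, queue='dft')
```

Because `JobSpec` is frozen, the server node cannot alter the job it receives, and calling it twice on the same job gives the same resources.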
Input syntax unification

I'll make a new issue for this over on base or atomistics summarizing and linking this discussion.

Nodes
This is a really superb point. I wasn't thinking about it clearly in those terms, but I completely agree. I'm not sure I agree on the definition of functional though:
From my perspective this is simply a matter of certain input combinations being (in)valid, which is totally fine in a functional paradigm. In this view, our current Lammps node then just provides a helpful UI layer on top of this functional node so that the user can avoid these impossible combinations. For me, what is important for functionality is that the node produces the same output given the same input, regardless of how many times it's executed. A corollary to that is that the node is not modifying some sort of global (or at least super-self) state. Here our Lammps node needs some work, since right now I think we can modify the input, hit "run", and the job will just re-load itself! But that has to do with input locking or setting
It's definitely possible for this setup to align with my definition for functional nodes, but I'm not sure it's the most efficient. For instance, I have an even deeper concern with the idea of "sending [the MD job] to the server". From my perspective it would be un-functional if

Ultimately, from a "functional perspective" I'm unconcerned if changing
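To illustrate the distinction between a strictly functional node and a UI layer on top of it, here is a sketch with a hypothetical coverage table and made-up node functions (`lammps_node`, `lammps_ui`), not the real Lammps node. The functional node keeps no state and simply rejects an invalid structure/potential combination; the convenience wrapper swaps in a valid potential instead of failing:

```python
from typing import Optional

# Hypothetical coverage table: potential name -> elements it can describe
COVERAGE = {
    "Al_Mendelev_EAM": frozenset({"Al"}),
    "AlCu_Zhou_EAM": frozenset({"Al", "Cu"}),
}


def lammps_node(elements: frozenset, potential: str) -> str:
    """Strictly functional node: no state, invalid combinations simply fail."""
    if not elements <= COVERAGE.get(potential, frozenset()):
        raise ValueError(f"{potential!r} does not cover {sorted(elements)}")
    # Stand-in for the real calculation: output depends only on the inputs
    return f"lammps: {sorted(elements)} with {potential}"


def lammps_ui(elements: frozenset, potential: Optional[str] = None) -> str:
    """Convenience layer: replace an incompatible (or missing) potential."""
    valid = sorted(name for name, cov in COVERAGE.items() if elements <= cov)
    if not valid:
        raise ValueError(f"no potential covers {sorted(elements)}")
    if potential not in valid:
        potential = valid[0]
    return lammps_node(elements, potential)
```

In a graph, such a fallback would presumably only be applied when the potential input is not explicitly wired to another node's output; a wired input should fail loudly instead.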
@liamhuber, thanks for formalising the discussion. Below a few thoughts to the various issues:
I realize that my notation was insufficient. The main point I wanted to make is that i1 is not only an input of i2 but also of the node n1 itself, i.e.
To be specific, the potential depends not only on the structure, but also on the calculator, i.e., whether Lammps, VASP etc. are used. Preventing the input from depending on the node in which it is used was the starting point for my thoughts. Whether an input i1 depends on the node itself or not depends on the abstraction level. If we just want to use code-specific input, we don't need it. However, for generic constructs like MD or MD we need to know the node (code) type to translate the generic input into a code-specific one. Whenever we have such a situation (which imo is the case for any generic input) we need to resolve the implicit node dependence into an explicit one, i.e.:
In our case n1 would be the calculator (LAMMPS, VASP etc.) and n2 the potential. These considerations also address your second point:
In practice, the potential node does not need to peek into the structure; instead, node n1 provides a function to create the input for node n2. Again, for the case of the potential this would simply be the list_potentials() function. n2 would be a high-level container that allows the user to select only potentials that are consistent with the code described by node n1. When thinking about this I don't see a difference from the MD node. If we used the LAMMPS-specific notation/input file, we could use the same argument as for the specific potential, i.e., it would be an input and not a separate MD module.
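Under one reading of the explicit version, n1 hands its lister to n2, roughly n2(n1.list_potentials). A sketch with hypothetical classes (`LammpsEngineNode`, `PotentialNode`) and a made-up potential table; only the bound method crosses between the nodes, so n2 never needs to know which code n1 wraps:

```python
from typing import Callable, Iterable, List, Optional


class LammpsEngineNode:
    """Hypothetical n1: knows which potentials its code can use."""

    def list_potentials(self, elements: Iterable[str]) -> List[str]:
        table = {"Al_Mendelev_EAM": {"Al"}, "AlCu_Zhou_EAM": {"Al", "Cu"}}
        return sorted(name for name, cov in table.items() if set(elements) <= cov)


class PotentialNode:
    """Hypothetical n2: a choice restricted to whatever n1's lister offers."""

    def __init__(self, lister: Callable[[Iterable[str]], List[str]], elements: Iterable[str]):
        self.choices = lister(elements)
        # Empty choices is the "no potential exists" case
        self.value: Optional[str] = self.choices[0] if self.choices else None

    def select(self, name: str) -> None:
        if name not in self.choices:
            raise ValueError(f"{name!r} is not valid for this engine")
        self.value = name


engine = LammpsEngineNode()
pot = PotentialNode(engine.list_potentials, ["Al"])
pot.select("Al_Mendelev_EAM")
```

Swapping in a different engine node only means passing a different lister; the potential container itself stays the same.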
Thanks for clarifying, @JNmpi! The explicit example with a

Suppose, in the case that the

The catch is, I don't see a big difference with the "explicit" version where it's broken down to

So in both cases it's always possible to recover a sort of "pure functionality" by simply letting the nodes break when one input updates to be incommensurate with another one, but in both cases there is a user-friendly "hand-wavingly functional" approach where (iff the node input in question is not explicitly wired to another node's output) we automatically update the input to something useful.

The good news is that if both the "implicit" and "explicit" topologies really are equivalent in this way, we can just be super pragmatic in choosing how we implement it. In this case, I think the connection between a Lammps classical potential and a Vasp pseudo-potential is actually quite weak, and I'd just keep them split up as inputs to their respective engines. (Otherwise what would we do with, e.g. Wien2k? It feels like these codes, but wouldn't make sense as input to a

Ryven itself is not fundamentally functional at all; for instance when we naively set up the

In the latest branch (#131) I am now explicitly doing type checking on ports, and added an attribute

Right now this type checking is still very primitive, based on classes and values, but soon I'll start on the NumFocus "ontologicalization" work and we'll see how much power we can milk out of it.
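For a sense of what primitive type checking based on classes and values could look like, here is a sketch of a hypothetical `Port` class (the names are assumptions, not the actual port code in #131 or the Ryven API): a value is accepted if it is an instance of one of the port's classes and, optionally, one of its allowed values, and two ports may be connected if every class the upstream port can emit is accepted downstream:

```python
from typing import Any, Optional, Sequence, Tuple


class Port:
    """Hypothetical port with class-based and value-based checks."""

    def __init__(self, name: str, dtypes: Tuple[type, ...],
                 allowed_values: Optional[Sequence[Any]] = None):
        self.name = name
        self.dtypes = dtypes
        self.allowed_values = allowed_values

    def accepts(self, value: Any) -> bool:
        # Class check first, then the optional value whitelist
        if not isinstance(value, self.dtypes):
            return False
        return self.allowed_values is None or value in self.allowed_values

    def can_connect_from(self, upstream: "Port") -> bool:
        # Primitive rule: every class the upstream may emit must be accepted here
        return all(issubclass(t, self.dtypes) for t in upstream.dtypes)


temperature = Port("temperature", (int, float))
engine = Port("engine", (str,), allowed_values=("lammps", "vasp"))
```

An ontology-aware version would presumably replace the `issubclass` rule with richer semantic compatibility checks.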
Thanks @liamhuber. I fully agree that the two cases are fundamentally equivalent. Since both have their pros and cons, we should provide both options to the user. This approach would follow the Python philosophy: provide all modern concepts of computer languages, rather than forcing users into a certain paradigm (e.g. functional only). In this way the users can come up with best practice examples and decide what works best for them.

It may be helpful to remember how we got into this issue: it started with your idea to put MD after the Lammps job. At first I didn't like it, since it makes the code in the Jupyter notebook longer and less compact. However, the more I thought about it, the more I realised that this is the way to go when attempting to replace specific input by generic (input) modules. While these generic modules are great for our workflow concepts, it is probably a good idea to also provide the more compact input option to please the more application-oriented users.

I also thought a bit about use cases where one or the other formulation would be more useful. A particularly important use case for workflows is parametric studies, where loops over various input values have to be performed. For the potential case, a specific scenario would be to go over all available potentials for a given atomic structure. In the formulation
one could add an optional input all. If this parameter is set to True, the Potential node would create and run all corresponding (e.g. Lammps) jobs. In this respect the Potential node is not only a container for list_potentials but is derived from something like a ParallelMaster, i.e., it can create and run these jobs.

Directly related to the above thoughts, it may be helpful to implement a new data type Batch or BatchData. If a job-creating node receives such a variable as input, it would create a bunch of parallel jobs. An example would be a batch of structures (e.g. our structure container) or the above-discussed set of potentials. The advantage of this approach would be that we don't need any explicit loops or indexing, i.e., we get away with easy and intuitive graphs. The possibility to avoid explicit indexing is similar to what numpy is doing and is the basis for very compact and well-structured code.

To summarise the above thoughts, the following workflow example would be possible:
would create and run a three-dimensional set of jobs going over all structures, atomic calculators, and available potentials for each calculator, without having to set up a single for loop.
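A sketch of the Batch idea, assuming a hypothetical `Batch` list subtype and a `broadcast` decorator (neither is existing pyiron API); `all_potentials` stands in for the proposed `all` input. Any node argument wrapped in `Batch` becomes an axis, and the node runs over the Cartesian product of all axes, much like numpy broadcasting with scalars. Because the available potentials depend on the engine, the potential axis is expanded per engine rather than crossed with every engine:

```python
import functools
import itertools
from typing import Callable


class Batch(list):
    """Hypothetical marker type: a node that receives a Batch runs once per item."""


def broadcast(node: Callable) -> Callable:
    """Run `node` over the Cartesian product of all Batch arguments."""

    @functools.wraps(node)
    def wrapper(*args):
        # Scalars become length-1 axes, Batches stay as they are
        axes = [arg if isinstance(arg, Batch) else [arg] for arg in args]
        results = [node(*combo) for combo in itertools.product(*axes)]
        # Like numpy with scalars: no Batch in, plain value out
        if not any(isinstance(arg, Batch) for arg in args):
            return results[0]
        return Batch(results)

    return wrapper


# Hypothetical table of potentials per engine
AVAILABLE = {"lammps": ["EAM_1", "EAM_2"], "vasp": ["PAW-PBE"]}


def potential_node(engine: str, all_potentials: bool = False):
    """Mirrors the proposed `all` input: return one choice or a Batch of all."""
    names = AVAILABLE[engine]
    return Batch(names) if all_potentials else names[0]


@broadcast
def md_job(structure: str, engine: str, potential: str) -> tuple:
    # Stand-in for creating and running a job
    return (structure, engine, potential)


@broadcast
def study(structure: str, engine: str) -> Batch:
    # The potential axis depends on the engine, so it is expanded per engine
    return md_job(structure, engine, potential_node(engine, all_potentials=True))


results = study(Batch(["Al-fcc", "Cu-fcc"]), Batch(["lammps", "vasp"]))
jobs = [job for per_engine in results for job in per_engine]  # 2 * (2 + 1) = 6
```

The flattening comprehension at the end is only for inspecting the result; building the study itself needs no explicit loop on the user's side.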
@JNmpi super! I think we are basically in agreement. It was helpful for me to go through this in detail though -- I was not even thinking in terms of "functional" nodes, but even if we present users with a fancy UI that hides this aspect, it will still be helpful to design (most if not all) nodes from a functional perspective. I think we're still on slightly different pages with

I really like the
From our conversation in #102