Questions about using RegioML #1
Hi ririjeong, Thank you for your interest in RegioML! To answer your questions:
Best wishes, Nicolai
Thank you for answering. I tried the Colab code, but there are still some sites that have no predicted probabilities. I also have another question regarding the model. Sincerely, ririjeong
Hi again,

The output lists show only unique EAS sites, so if there are identical sites, these are not in the list. However, the depiction part takes all EAS sites into account. This means that an EAS site with a score above 5 %, as well as the atoms identical to it, will be highlighted. This is done in the DescriptorCreator/molecule_svg.py file:

The atomic descriptor always has the same size (a 485-dimensional descriptor), no matter what molecule you are exploring. This is because the atomic descriptor is made by sorting the atomic CM5 charges according to the Cahn–Ingold–Prelog (CIP) rules. So you can think of this as a convolution of the atomic charges around the atom of interest. Please have a look at Fig. 1 in our paper and note that we stop the sorting at the 5th shell.

Best wishes, Nicolai
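The highlighting rule described above (a unique site scoring above 5 % is highlighted together with its identical atoms) can be sketched as follows. This is only an illustration and not the code in DescriptorCreator/molecule_svg.py: the function name `atoms_to_highlight`, the `symmetry_class` mapping, the toy atom indices, and the representation of scores as fractions between 0 and 1 are all assumptions made for the example.

```python
from collections import defaultdict


def atoms_to_highlight(site_scores, symmetry_class, threshold=0.05):
    """Return the sorted atom indices that would be highlighted.

    site_scores: {atom_index: score} for the unique EAS sites only
        (hypothetical input; scores assumed to be fractions in [0, 1]).
    symmetry_class: {atom_index: class_id} for every candidate EAS site;
        atoms sharing a class_id are treated as identical (hypothetical).
    """
    # Group all candidate atoms by their symmetry class.
    members = defaultdict(list)
    for atom, cls in symmetry_class.items():
        members[cls].append(atom)

    highlighted = set()
    for atom, score in site_scores.items():
        if score > threshold:  # strictly above the 5 % cut-off
            # Highlight the scored site and every atom identical to it.
            highlighted.update(members[symmetry_class[atom]])
    return sorted(highlighted)


# Toy ring: atoms 1 and 5 are equivalent, as are atoms 2 and 4.
symmetry_class = {1: 0, 5: 0, 2: 1, 4: 1}
site_scores = {1: 0.62, 2: 0.03}  # only the unique sites carry a score
highlight = atoms_to_highlight(site_scores, symmetry_class)
print(highlight)
```

In this toy case atom 5 never appears in the score list, yet it is highlighted because it is identical to atom 1, which may explain why a highlighted site can seem to have no predicted probability.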
Hi again,

If you wish to output the probabilities of all possible EAS sites, I recommend importing the following in the regioML.py file:

RegioML is tested in this way, and the performance you obtain should be identical to what we report in our paper. However, I have investigated the issue a bit further and found the following reason. As you can see, the calculated atomic charges are not completely identical for atoms with otherwise identical ranking. These small deviations result in slightly different input descriptors, which in turn result in different classification scores.

Once again, thank you for your interest in RegioML! Best wishes,
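To make the point about small charge deviations concrete, here is a toy sketch of a fixed-length descriptor built from charges grouped by shell. It is not RegioML's descriptor: the real one has 485 dimensions and orders the CM5 charges by CIP rules, whereas this example simply sorts by value within each shell and zero-pads. The function `toy_descriptor`, the slot layout, and the charge values are all made up for illustration.

```python
def toy_descriptor(shell_charges, slots_per_shell=(1, 3, 4)):
    """Build a fixed-length vector from per-shell charge lists.

    Simplified stand-in for a shell-wise sorted descriptor: charges are
    sorted by value inside each shell (NOT by CIP rules) and each shell
    is zero-padded to a fixed number of slots.
    """
    vector = []
    for i, slots in enumerate(slots_per_shell):
        charges = shell_charges[i] if i < len(shell_charges) else []
        ordered = sorted(charges, reverse=True)[:slots]
        vector.extend(ordered + [0.0] * (slots - len(ordered)))
    return vector


# Two atoms with the same ranking but slightly different computed charges
# (values invented for the example).
atom_a = [[-0.102], [0.051, -0.087], [0.110]]
atom_b = [[-0.101], [0.051, -0.087], [0.110]]

desc_a = toy_descriptor(atom_a)
desc_b = toy_descriptor(atom_b)

same_length = len(desc_a) == len(desc_b) == 8  # size is independent of input
identical = desc_a == desc_b                   # False: the inputs differ slightly
max_diff = max(abs(x - y) for x, y in zip(desc_a, desc_b))
print(same_length, identical, max_diff)
```

Both vectors have the same length, but they are not equal, so a classifier fed these two inputs is free to return two different scores for atoms that are chemically equivalent.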
Hi, I have some questions about RegioML.
When a site in a molecule has no probability, does that mean the LGBM model cannot predict a probability for it?
Why can't it predict the probability?
What does the black circle mean?
Thanks