Multiple protein sequences #1

Open
hnlixuanji opened this issue Dec 6, 2023 · 4 comments

Comments

@hnlixuanji

Dear Author,

Thank you for your contribution to protein function prediction. According to your paper, Domain-PFP performs well compared with many of the latest tools. I am considering using your tool to assign functions to a large number of protein sequences (tens of thousands) in a FASTA file, selecting the highest-confidence MF, BP, and CC terms for each sequence. So I was wondering whether you have already developed such a script, so that I don't have to write it again :-)

Best,
XJ

@nibtehaz
Collaborator

nibtehaz commented Dec 7, 2023

Hi XJ

Thank you for your interest in our project.

We plan to release a web server. That's why our GitHub sample code and Google Colab version process one protein sequence at a time; batch processing will be handled by the server later. For the time being, for multiple proteins, we suggest writing a bash script that runs the tool on the FASTA files sequentially.
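As a hedged illustration of that workaround (not part of the Domain-PFP repository), the Python sketch below streams a large multi-record FASTA file and writes each record to its own file, so that a loop can pass them one at a time to the single-sequence script. The function names `read_fasta` and `split_fasta` and the `000000_<id>.fasta` naming scheme are hypothetical choices, not the project's conventions.

```python
import os
import re
from typing import Iterator, List, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Stream (header, sequence) pairs without loading the whole file into memory."""
    header = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:].strip()
                chunks = []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path: str, out_dir: str) -> List[str]:
    """Write each record to its own FASTA file (hypothetical naming); return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i, (header, seq) in enumerate(read_fasta(path)):
        # Use the first header token as the ID and replace characters unsafe in filenames.
        record_id = header.split()[0] if header else f"record_{i}"
        safe_id = re.sub(r"[^A-Za-z0-9._-]", "_", record_id)
        # The running index prefix keeps names unique even if IDs collide after sanitizing.
        out_path = os.path.join(out_dir, f"{i:06d}_{safe_id}.fasta")
        with open(out_path, "w") as out:
            out.write(f">{header}\n")
            # Wrap sequence lines at 60 residues, a common FASTA convention.
            for start in range(0, len(seq), 60):
                out.write(seq[start:start + 60] + "\n")
        written.append(out_path)
    return written
```

Because `read_fasta` is a generator, memory use stays roughly proportional to a single record, which matters for files with tens of thousands of sequences. Biopython's `SeqIO.parse(handle, "fasta")` is a well-tested alternative if that dependency is acceptable.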

We have various batch-processing scripts that we used during our experiments (they were not released because they have not been properly cleaned up and refactored). However, if you have a particular specification for your input format and for how you would like the output to look, I can share some scripts accordingly.

@hnlixuanji
Author

Dear nibtehaz,
Thank you very much for your reply. Our input is a file containing a catalog of all non-redundant genes (each record starting with ">gene name"). I need to translate all genes into protein sequences before using your tool. I would like the output to be a CSV file containing all the genes, with the columns "gene_name", "GO_MF", "MF_definition", "GO term", "Confidence", "GO_BP", "BP_definition", "GO term", "Confidence", "GO_CC", "CC_definition", "GO term", "Confidence". Or perhaps you have other, better ideas or scripts for presenting all the genes.
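For the gene-to-protein step, here is a minimal Python sketch. It assumes each gene record is an in-frame coding sequence that starts at its first base and uses the standard genetic code (NCBI table 1); it does not handle introns, alternative start codons, or searching for a reading frame. `translate` and `CODON_TABLE` are illustrative names, not part of Domain-PFP.

```python
# Standard genetic code (NCBI table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate an in-frame nucleotide sequence (assumed CDS) into a protein string."""
    # Normalize case, drop whitespace, and accept RNA input (U -> T).
    seq = "".join(dna.split()).upper().replace("U", "T")
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for start in range(0, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        aa = CODON_TABLE.get(codon, "X")  # ambiguous bases such as N map to X
        if aa == "*" and to_stop:
            break  # stop translating at the first stop codon
        protein.append(aa)
    return "".join(protein)
```

Biopython's `Seq.translate(table=1, to_stop=True)` does the same job and also supports the other NCBI translation tables, which may matter for non-standard organisms.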

BTW, I have an open question :-) Have you tried, or do you plan, to integrate AlphaFold into the function prediction?

Best,
XJ

@nibtehaz
Collaborator

Hi XJ

Sure, I can prepare a script like that. The input will be a large FASTA file, right?
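A summary script along those lines could look like the following Python sketch. It assumes the predictions are already available per gene as (GO ID, aspect, term name, confidence) records; Domain-PFP's actual output format is not described in this thread, so the `Prediction` record and the helper names are hypothetical. It keeps only the highest-confidence term per aspect and uses unique column names (e.g. `MF_term`, `MF_confidence`) instead of the repeated "GO term"/"Confidence" headers, since repeated header names are ambiguous when the CSV is read back. Term definitions would need a separate lookup in the GO ontology and are omitted here.

```python
import csv
from typing import Dict, List, NamedTuple, Optional


class Prediction(NamedTuple):
    # Hypothetical record for one predicted GO term; the real output format may differ.
    go_id: str
    aspect: str  # assumed to be one of "MF", "BP", "CC"
    name: str
    confidence: float


ASPECTS = ("MF", "BP", "CC")


def best_per_aspect(preds: List[Prediction]) -> Dict[str, Optional[Prediction]]:
    """Keep the single highest-confidence prediction per GO aspect (None if absent)."""
    best: Dict[str, Optional[Prediction]] = {a: None for a in ASPECTS}
    for p in preds:
        if p.aspect not in best:
            continue  # ignore unexpected aspect labels
        current = best[p.aspect]
        if current is None or p.confidence > current.confidence:
            best[p.aspect] = p
    return best


def write_summary(predictions: Dict[str, List[Prediction]], out_path: str) -> None:
    """Write one CSV row per gene with the top MF, BP and CC terms."""
    fieldnames = ["gene_name"]
    for a in ASPECTS:
        fieldnames += [f"GO_{a}", f"{a}_term", f"{a}_confidence"]
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for gene, preds in predictions.items():
            row = {"gene_name": gene}
            for a, p in best_per_aspect(preds).items():
                # Leave cells empty when no term was predicted for this aspect.
                row[f"GO_{a}"] = p.go_id if p else ""
                row[f"{a}_term"] = p.name if p else ""
                row[f"{a}_confidence"] = f"{p.confidence:.3f}" if p else ""
            writer.writerow(row)
```

Writing rows one gene at a time keeps memory flat even for tens of thousands of genes, as long as the per-gene predictions are also produced incrementally.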

We have plans to use AlphaFold structures for protein function prediction, but at the moment we are not actively pursuing that.

@hnlixuanji
Author

Yes, it is a large FASTA file. Thanks a lot!
