BioSeeker: Python library for the analysis of codon/bicodon conservation rates across linked species

This project facilitates calculating codon and bicodon conservation rates for a given genus.

Useful Links:

1. About the project

BioSeeker is an open-source bioinformatics project licensed under GPL v3.0. The inspiration for this project came from this paper, which I've tried to (partially) replicate using Drosophila alignments from FlyDIVaS. Feel free to make as many additions as you'd like.

2. How does it work?

This repo contains a Python script called bioseeker.py. It takes one or more files as input, each containing previously aligned homologous genes in FASTA format. After parsing the file(s), it builds a NumPy matrix from the alignment and iterates across slices of that matrix. The extracted information (the count of each codon in the reference sequence, and the number of times that codon was conserved across species) is stored in a CSV file created with Pandas. For each MSA file, two types of dataframes are generated: one for codons and one for codon pairs. The algorithm calculates codon/bicodon conservation rates across all 3 reading frames, so a total of 6 CSV files are generated (2 for each reading frame).
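The idea described above can be sketched roughly as follows. This is not the code in bioseeker.py: the function name `conservation_table`, the column names, and the rules used here (the first sequence is the reference, a unit counts as conserved only when every species matches it exactly, reference units containing gaps are skipped, and bicodons overlap by one codon) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def conservation_table(aligned_seqs, frame=0, width=1):
    """Count reference codons (width=1) or bicodons (width=2) and how often
    each one is identical in all aligned sequences.

    Assumes at least one sequence, all of equal length, reference first.
    """
    # One row per species, one column per alignment position.
    matrix = np.array([list(seq.upper()) for seq in aligned_seqs])
    size = 3 * width  # 3 nt for a codon, 6 nt for a bicodon
    counts, conserved = {}, {}

    # Slide one codon (3 nt) at a time, starting at the reading-frame offset.
    for start in range(frame, matrix.shape[1] - size + 1, 3):
        window = matrix[:, start:start + size]  # slice across all species
        ref = "".join(window[0])
        if "-" in ref:
            continue  # assumption: ignore gapped reference units
        counts[ref] = counts.get(ref, 0) + 1
        if (window == window[0]).all():
            conserved[ref] = conserved.get(ref, 0) + 1

    rows = [
        {"unit": unit, "count": n, "conserved": conserved.get(unit, 0)}
        for unit, n in sorted(counts.items())
    ]
    table = pd.DataFrame(rows, columns=["unit", "count", "conserved"])
    table["rate"] = table["conserved"] / table["count"]
    return table


# Toy alignment: three species, nine aligned positions.
alignment = ["ATGAAATTT", "ATGAAGTTT", "ATGAAATTC"]

# 3 reading frames x 2 unit types (codon, bicodon) = 6 tables per MSA file.
tables = {
    (frame, width): conservation_table(alignment, frame, width)
    for frame in range(3)
    for width in (1, 2)
}
```

Each table could then be written out with `table.to_csv(...)`, which would give the six CSV files per alignment mentioned above.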

3. Installing and running the program

Start by cloning the repository:

$ git clone https://github.com/SouthernBio/BioSeeker

Copy and paste the MSA FASTA files into the directory where bioseeker.py is located.

Make sure that your Python interpreter is added to PATH. Then, you can activate the virtual environment and run BioSeeker.

Windows PowerShell:

$ pipenv shell
$ ./bioseeker # or 'py -m bioseeker'

GNU/Linux:

$ pipenv shell
$ bioseeker # or 'python3 -m bioseeker'

If you want to test how the program works before using it on your data, you will find alignment files in FASTA_files/.

4. Dependencies

To execute this script you must install Python and its package manager, pip, plus Git if you want to clone the repo with Git. On Ubuntu, you can do this through the terminal:

$ sudo apt-get update
$ sudo apt-get install python3 python3-pip git-all

Once you have installed Python and its package manager, you can proceed to install pipenv:

$ pip3 install pipenv

5. Additional details

After parsing the files and calculating conservation rates, BioSeeker also generates a file called unreadable.txt, which stores the names of the MSA files that could not be parsed. It then assembles all individual dataframes into 6 dataframes that contain the information across linked species. BioSeeker automatically creates a new directory called dataframes/, which contains all the new dataframes.
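As a rough illustration of these two steps, the sketch below collects the names of alignment files that fail to load and merges per-file tables into a single one. It is not taken from BioSeeker's source: the helper names (`read_alignment`, `load_alignments`, `merge_tables`), the `unit`/`count`/`conserved` column names, the definition of "could not be parsed" (no records, or sequences of unequal length), and merging by summing counts are all assumptions.

```python
from pathlib import Path

import pandas as pd


def read_alignment(path):
    """Return the sequences of an aligned FASTA file, in file order."""
    records, current = [], None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = []  # a header line starts a new record
            records.append(current)
        elif line and current is not None:
            current.append(line)
    seqs = ["".join(parts) for parts in records]
    # Assumed failure rule: empty file or sequences of different lengths.
    if not seqs or len(set(map(len, seqs))) != 1:
        raise ValueError(f"not a usable alignment: {path}")
    return seqs


def load_alignments(paths, log_path="unreadable.txt"):
    """Parse every file; write the names of the failures to log_path."""
    alignments, unreadable = {}, []
    for path in map(Path, paths):
        try:
            alignments[path.name] = read_alignment(path)
        except (OSError, ValueError):  # also covers UnicodeDecodeError
            unreadable.append(path.name)
    Path(log_path).write_text("".join(name + "\n" for name in unreadable))
    return alignments, unreadable


def merge_tables(frames):
    """Sum per-file counts for each unit and recompute the rate."""
    merged = (
        pd.concat(frames, ignore_index=True)
        .groupby("unit", as_index=False)[["count", "conserved"]]
        .sum()
    )
    merged["rate"] = merged["conserved"] / merged["count"]
    return merged
```

Under these assumptions, `merge_tables` would be called once per combination of reading frame and unit type, and each result saved under dataframes/ with `merged.to_csv(...)`.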

💙 Support this project

Your contribution would help SouthernBio improve the quality of this project and add new features. If you find this project useful and/or interesting, please consider offering your support on GitHub Sponsors, Ko-Fi or PayPal.
