Planned Features #1

Open
30 of 40 tasks
atc3 opened this issue May 15, 2018 · 0 comments

atc3 commented May 15, 2018

Will update this list as things change.

  • test separate variance modeling in a structured way
  • fix experiment alignment figures image path - needs to be relative to the HTML.
  • run the filters iteratively: if all observations of an experiment are filtered out, remove that experiment and run the filters again
  • fix "sort" issue with pd.concat
      q:\anaconda\lib\site-packages\pandas\core\frame.py:6201: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default.

      To accept the future behavior, pass 'sort=True'.

      To retain the current behavior and silence the warning, pass sort=False

        sort=sort)
  • fix RI columns being wiped when concatenating
  • quick and dirty pairwise correlation between experiments, to spot outliers and warn the user that they should be filtered out
  • check the max(PEP) of each raw file, and warn the user if a raw file only has PEPs that are too low (nothing to boost)
  • retention length filtering - raw file specific
  • rename output columns
  • migrate to config file instead of command-line options
  • improve input file-type conversion
    • file-type determines column names
    • move filtering blocks into separate functions. file-type determines which functions are run
  • pip installable
  • violin plot of residual density by RT (RT on x-axis)
  • pairwise correlation of RTs - heatmap
  • diagnostic figures for the update portion
    • PEP vs PEP.new scatterplot
    • fold change increase in IDs as function of PEP threshold
  • validation figures
    • multiple peptides of the same protein - should have the same intensity (measure the CV)
  • generate HTML file to view figures
  • add and start throwing exceptions
  • create entire output directory including all subfolders
  • parameter for defining column headers - additional option instead of specifying the file type
  • fix experiment exclusion
  • optional save alignment parameters
  • split up outputs in same way the inputs are split up
    • then remove input_id column
  • remove id column as well?
  • verbose levels and actually enforce them in code
  • additional parameters to select which columns to have
    • default should just be pep_new. maybe have a "diagnostic" flag that includes the other columns?
  • logging -> logger
  • default retention length filter - (max_retention_time) / 60
  • optimize experiment updating
  • filter_decoys/contaminants -> include_decoys/contaminants
  • add PEP_updated column
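
A rough sketch of how the pd.concat "sort" item and the iterative filtering item could fit together. Nothing here is taken from the actual code base: the column names (experiment, peptide, pep, ri), the two example filters, and the helper run_filters_iteratively are all made up for illustration. Passing sort explicitly to pd.concat is enough to silence the FutureWarning, and sort=False keeps columns in their order of first appearance.

```python
import pandas as pd

# Hypothetical per-experiment tables; all column names are assumptions.
exp_a = pd.DataFrame({"experiment": "A", "peptide": ["p1", "p2"], "pep": [0.01, 0.02]})
exp_b = pd.DataFrame({"experiment": "B", "peptide": ["p1"], "pep": [0.03]})
# Different column order plus an extra column, so the frames are not aligned.
exp_c = pd.DataFrame({"ri": [1.0, 2.0], "pep": [0.90, 0.95],
                      "peptide": ["p2", "p3"], "experiment": "C"})

# An explicit sort= argument silences the FutureWarning; sort=False leaves
# the non-concatenation axis (the columns) unsorted.
combined = pd.concat([exp_a, exp_b, exp_c], ignore_index=True, sort=False)


def filter_min_experiments(df, min_exps=2):
    """Example filter: keep peptides seen in at least min_exps experiments."""
    n_exps = df.groupby("peptide")["experiment"].transform("nunique")
    return df[n_exps >= min_exps]


def filter_pep(df, max_pep=0.5):
    """Example filter: keep observations with PEP at or below max_pep."""
    return df[df["pep"] <= max_pep]


def run_filters_iteratively(df, filters):
    """Apply filters, drop experiments left with no rows, repeat until stable.

    Returns the filtered table and the sorted list of dropped experiments.
    """
    all_exps = set(df["experiment"].unique())
    kept = set(all_exps)
    while True:
        current = df[df["experiment"].isin(kept)]
        if current.empty:
            return current, sorted(all_exps)
        filtered = current
        for apply_filter in filters:
            filtered = apply_filter(filtered)
        surviving = set(filtered["experiment"].unique())
        if surviving == kept:
            return filtered, sorted(all_exps - kept)
        # At least one experiment lost every row: remove it and start over,
        # since filters that look across experiments may now behave differently.
        kept = surviving


filtered, dropped = run_filters_iteratively(
    combined, [filter_min_experiments, filter_pep]
)
print(filtered)
print("dropped experiments:", dropped)
```

The loop always restarts from the unfiltered rows of the experiments still kept, so a cross-experiment filter (like the minimum-experiments one here) is re-evaluated without the removed experiments. It terminates because the set of kept experiments can only shrink.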

FUTURE VERSION

  • move off of STAN
  • optimize data selection by RT bin, experiment, and peptide. remove as much as possible but retain the same amount of coverage.