cleaning up
ttimbers committed Jan 25, 2020
1 parent 05c1976 commit 57dbfa7
Showing 3 changed files with 63 additions and 58 deletions.
35 changes: 0 additions & 35 deletions Makefile

This file was deleted.

86 changes: 63 additions & 23 deletions doc/breast_cancer_predict_report.md
@@ -1,7 +1,7 @@
Predicting breast cancer from digitized images of breast mass
================
Tiffany A. Timbers </br>
2019/12/30 (updated: 2020-01-23)
2019/12/30 (updated: 2020-01-24)

# Summary

@@ -22,22 +29,29 @@ production in the clinic.

# Introduction

TO ADD: General introduction about the problem, and why it is important
(e.g., how prevalent breast cancer is, the mortality rate for
undiagnosed cancer, etc).

For this project we are trying to answer the question: given tumour
image measurements is a newly discovered tumour benign or malignant?
Answering this question is important because traditional,
non-data-driven methods for tumour diagnosis are quite subjective and
can depend on the diagnosing physicians skill as well as experience
(Street, Wolberg, and Mangasarian 1993). Furthermore, benign tumours are
not normally dangerous; the cells stay in the same place and the tumour
Women have a 12.1% lifetime probability of developing breast cancer, and
although cancer treatment has improved over the last 30 years, the
projected death rate for women’s breast cancer is 22.4 deaths per
100,000 in 2019 (Canadian Cancer Statistics Advisory Committee 2019).
Early detection has been shown to improve outcomes (Canadian Cancer
Statistics Advisory Committee 2019), and thus methods, assays and
technologies that help to improve diagnosis may be beneficial for
improving outcomes further.

Here we ask if we can use a machine learning algorithm to predict
whether a newly discovered tumour is benign or malignant given tumour
image measurements. Answering this question is important because
traditional methods for tumour diagnosis are quite subjective and can
depend on the diagnosing physician's skill as well as experience (Street,
Wolberg, and Mangasarian 1993). Furthermore, benign tumours are not
normally dangerous; the cells stay in the same place and the tumour
stops growing before it gets very large. By contrast, in malignant
tumours, the cells invade the surrounding tissue and spread into nearby
organs where they can cause serious damage. Thus, it is important to
quickly and accurately diagnose the tumour type to guide patient
treatment.
organs where they can cause serious damage. Thus, if a machine learning
algorithm can accurately and effectively predict whether a newly
discovered tumour is benign or malignant given tumour image measurements,
this could lead to less subjective and more scalable breast cancer
tumour diagnosis, which could contribute to better patient outcomes.

# Methods

@@ -70,7 +77,7 @@ and Python programming languages (R Core Team 2019; Van Rossum and Drake
2009) and the following R and Python packages were used to perform the
analysis: caret (Jed Wing et al. 2019), docopt (de Jonge 2018), feather
(Wickham 2019), knitr (Xie 2014), tidyverse (Wickham 2017), docopt
(Keleshev 2014), os (Van Rossum and Drake 2009), feather (Wickham 2019)
(Keleshev 2014), os (Van Rossum and Drake 2009), feather (McKinney 2019), and
Pandas (McKinney 2010). The code used to perform the analysis and create
this report can be found here:
<https://github.com/ttimbers/breast_cancer_predictor>.
@@ -127,11 +134,12 @@ Our prediction model performed quite well on test data, with a final
Cohen’s Kappa score of 0.9 and an overall accuracy calculated to be
0.97. Other indicators that our model performed well come from the
confusion matrix, where it only made 4 mistakes. However all 4 mistakes
were predicting a malignant tumour as benign, given the impications this
has for patients health, this model is not good enough to yet implement
in the clinic.
were predicting a malignant tumour as benign. Given the implications
this has for patients' health, this model is not yet good enough to
implement in the clinic.

<table class="table" style="margin-left: auto; margin-right: auto;">
<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">

<caption>

@@ -242,13 +250,38 @@ M

</table>
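
The accuracy and Cohen's Kappa figures reported above can both be derived from a confusion matrix. The sketch below shows the arithmetic in plain Python. The function name `accuracy_and_kappa` and the counts in `hypothetical_counts` are illustrative assumptions, not the report's actual code or test-set results; the counts only mirror the situation described, where every error is a malignant tumour predicted as benign.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's Kappa) for a square confusion matrix.

    matrix[i][j] is the number of observations whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in matrix)

    # Observed agreement: the fraction of observations on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (NOT the report's data).
# Rows: true B, true M. Columns: predicted B, predicted M.
hypothetical_counts = [
    [70, 0],   # benign tumours: all predicted benign
    [4, 40],   # malignant tumours: 4 wrongly predicted benign
]

accuracy, kappa = accuracy_and_kappa(hypothetical_counts)
print(f"accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts the agreement that would be expected by chance given the class balance, which is why it is lower than raw accuracy for the same matrix.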

TO ADD: Further discussion of model results, including how it might be
improved with more work.
To further improve this model in future, with hopes of arriving at one
that could be used in the clinic, there are several things we can
suggest. First, we could look closely at the 4 misclassified
observations and compare them to several observations that were
classified correctly (from both classes). The goal of this would be to
see which feature(s) may be driving the misclassification and explore
whether any feature engineering could be used to help the model better
predict on observations that it currently is making mistakes on.
Additionally, we could test whether other classifiers give improved
predictions. One classifier we might try is a random forest, because it
automatically allows for feature interaction, whereas k-nn does not.
Finally, we might also improve the usability of the model in the clinic
if we output and report the probability estimates for predictions. If we
cannot prevent misclassifications through the approaches suggested
above, at least reporting probability estimates for predictions would
allow the clinician to know how confident the model was in its
prediction. The clinician would then be able to perform additional
diagnostic assays if the probability estimate for the predicted tumour
class is not very high.
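
The idea of reporting probability estimates so that a clinician can follow up on uncertain cases could look roughly like the sketch below. It assumes the classifier already yields per-class probabilities for each observation. The function name `flag_uncertain`, the 0.9 threshold, and the example probabilities are all hypothetical choices for illustration, not part of the analysis code.

```python
def flag_uncertain(probabilities, threshold=0.9):
    """Pair each prediction with its confidence and a review flag.

    probabilities: list of dicts mapping class label -> estimated probability.
    threshold: hypothetical cut-off below which a prediction is flagged
               for additional diagnostic assays.
    """
    results = []
    for class_probs in probabilities:
        # The predicted class is the one with the highest probability.
        label = max(class_probs, key=class_probs.get)
        confidence = class_probs[label]
        results.append((label, confidence, confidence < threshold))
    return results


# Made-up probability estimates for three tumours (B = benign, M = malignant).
example = [
    {"B": 0.98, "M": 0.02},
    {"B": 0.60, "M": 0.40},
    {"B": 0.05, "M": 0.95},
]

for label, confidence, needs_review in flag_uncertain(example):
    note = "flag for further assays" if needs_review else "confident"
    print(f"predicted {label} (p = {confidence:.2f}): {note}")
```

In practice the threshold would need to be chosen with clinicians, since it trades the cost of extra assays against the cost of a missed malignant tumour.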

# References

<div id="refs" class="references">

<div id="ref-ccsac">

Canadian Cancer Statistics Advisory Committee. 2019. “Canadian Cancer
Statistics.” *Canadian Cancer Society*.
<http://cancer.ca/Canadian-Cancer-Statistics-2019-EN>.

</div>

<div id="ref-docopt">

de Jonge, Edwin. 2018. *Docopt: Command-Line Interface Specification
@@ -288,6 +321,13 @@ edited by Stéfan van der Walt and Jarrod Millman, 51–56.

</div>

<div id="ref-featherpy">

———. 2019. *Feather: Simple Wrapper Library to the Apache Arrow-Based
Feather File Format*. <https://github.com/wesm/feather>.

</div>

<div id="ref-R">

R Core Team. 2019. *R: A Language and Environment for Statistical
@@ -320,7 +360,7 @@ Wickham, Hadley. 2017. *Tidyverse: Easily Install and Load the

</div>

<div id="ref-feather">
<div id="ref-featherr">

———. 2019. *Feather: R Bindings to the Feather ’Api’*.
<https://CRAN.R-project.org/package=feather>.
Binary file modified results/final_model.rds
Binary file not shown.
