This 3-credit course for upper-level undergraduates and graduate students introduces the open-source computing tools used in biological research to create, organize, manipulate, process, analyze, and archive “big data”. It is designed to prepare students to use computational tools for bioinformatics applications in advanced courses and independent research projects. The primary topics are: data formats and repositories, command-line Linux computing and scripting, regular expressions, supercomputing, programming with Python and R, data visualization with R, version control and dissemination of scripts and programs with Git, typesetting with LaTeX, and organizing data with SQL relational databases.
Textbook: Computing Skills for Biologists
Apply Win10 Ubuntu Settings To New Computer
Upon successful completion of this course, students should be able to:
- Recognize, describe, and organize data into standard biological data structures
- Locate scientific data repositories and extract data from them
- Operate UNIX/Linux computers from the command line
- Construct and modify programming and scripting logic structures for processing biological data
- Use version control software (Git)
- Describe and use regular expressions to query data
- Typeset with LaTeX or Markdown
- Use the most popular open-source tools for biological data manipulation: bash, Python, and R
Computation for 21st Century Biologists meets on Fridays at 1 pm for 2.5 hours. Class periods consist of interactive lectures, so each student must bring a computer designed for content creation (Linux, OS X, or Windows; not Chrome OS, iOS, or Android).

Homework exercises build on concepts addressed in lecture. Participation consists of attending lectures and performance on unannounced quizzes. Weekly assignments reinforce concepts covered in lectures and encourage students to start using computational tools. Exams evaluate comprehension of the material covered in lectures and assignments. For undergraduates only, a comprehensive final exam assesses the learning objectives listed above.

Instead of a final exam, graduate students complete a final project that automates the manipulation and/or analysis of data. The code should be archived on GitHub. A report written in LaTeX or Markdown is due during the final exam period. The report should concisely state the problem, describe the strategy used to solve it, and explain how the code works (include a flowchart or outline of what the code does). Each student will give a 10-minute presentation on their project during the final exam period. Example projects: automatically processing data from experimental apparatus, image analysis, automated reporting of experimental results, and downloading and organizing data from online repositories.
10/04 Week05 Basic Python Programming I
- Assignment 5, Due 10/11
- Grad Student Course Project: Commit at least 1 working function to your GitHub project repo, Due 10/11
10/11 Week06 Basic Python Programming II
- Assignment 6, Due 10/18
- Grad Student Course Project: Commit at least 1 additional working function to your GitHub project repo, Due 10/18
11/08 Week10 Statistical Computing I
- No Assignment
11/22 Week12 Data Wrangling and Visualization with the Tidyverse
- Assignment 12, Due 12/04 (undergraduates only)
- Graduate students: push your completed independent projects to GitHub by 12/04
Due 12/12 at 11:59 pm (undergraduates only)
Your "final" is completing your independent project, in which you automate the processing, analysis, and/or visualization of data. The repo is due 12/04, and presentations are on Wed 12/11 at noon in the lecture room.