x# APEC 8222: Big Data Methods in Economics
Class shared google drive: https://drive.google.com/drive/u/1/folders/0ADfBSFdysX0gUk9PVA
Lecture slides: https://drive.google.com/drive/u/1/folders/1x6hVJA286TJ4v4hak9f8Myf3Q5bVWL00
Course readings: https://drive.google.com/drive/u/1/folders/1ocsMTAfo0kkYDLvn2CG6yecLanRtO3fs
Course data: https://drive.google.com/drive/u/1/folders/1lcGSH0yQZ9IJ-sdNWbQQTGqU34oFQ4cd
Turrell textbook: https://aeturrell.github.io/coding-for-economists/intro.html
Date | Topic | Readings | Assignments | |
24-Oct | Introduction, Course Overview, and "what is big data?" | None | Assignment 1 Assigned (very short assignment) | |
26-Oct | Expanding your toolset | Turrell, "Preliminaries" section: https://aeturrell.github.io/coding-for-economists/code-preliminaries.html | ||
31-Oct | Introduction to Python | Turrell, "Coding Basics" section: https://aeturrell.github.io/coding-for-economists/code-basics.html | Assignment 1 Due, Assignment 2 Assigned | |
2-Nov | Huge matrices with Numpy | Optional: "Workflow Basics" and "Writing Code" sections | ||
7-Nov | Python for Big Spatial Rasters | https://nature.com/articles/s41586-020-2649-2 | Assignment 2 Due, Assignment 3 Assigned | |
9-Nov | Huge tables with Pandas | Turrell, "Intro to Geo-Spatial Analysis" section: https://aeturrell.github.io/coding-for-economists/geo-intro.html | ||
14-Nov | Large vector data linked to tables | "Machine Learning Methods Economists Should Know About", Hastie et al. (2009) Chp 2 | Assignment 3 Due | |
16-Nov | Combining many big-data types, starting ML | "Big Data: New Tricks for Econometrics", Hastie et al (2009) Chp 3 | ||
21-Nov | Machine Learning and Cross-Validation | Hastie et al (2009) Chp 15 | Assignment 4 Assigned | |
23-Nov | University Closed (Thanksgiving Holiday) | |||
Nov 28 (guest lecture) | R and Python Integration | (Guest lecture) | ||
Nov 30 (guest lecture) | Economics and Large Language Models | (Guest lecture) https://aeturrell.github.io/coding-for-economists/coming-from-r.html | ||
5-Dec | Regressions on Rasters and SVM | Carbon Edge Effects Manuscript in https://drive.google.com/open?id=1-A1FeXHkbAH1Kq-V5KNcRm53wojwSP98&usp=drive_fs and Hastie et al (2009) Chapter 11 | ||
7-Dec | Convolutional Neural Net & Transformers | Jean et al. - 2016 - "Combining satellite imagery and machine learning to predict poverty" | Assignment 5 goes live, Assignment 4 Due (the next day) | |
12-Dec | Regression Trees, Random Forest and LULC classification | Avci et al. - 2023 - "Comparison between random forest and support vector machines" | Last Day of Class | |
15-Dec | No class | Assignment 5 due | ||
21-Dec | No class | Final day of UMN semester |
Justin Johnson ([email protected])
337f Ruttan Hall
Tuesdays 1:00-2:00 P.M. in person (classroom or office 337F). If virtual is needed, go to https://umn.zoom.us/my/jajohns Appointments also welcome.
Tuesday and Thursday, 11:45 A.M.--1:00 P.M.
October 24 -- December 12
Ruttan Hall 135B
Hastie, Tibshirani, and Friedman. 2009. The Elements of Statistical Learning, 2nd Edition. Springer. This book is available on Robert Tibshirani's website: http://statweb.stanford.edu/~tibs/ElemStatLearn/.
Turrell, Arthur. 2023. Coding for Economists. Freely available at https://aeturrell.github.io/coding-for-economists/intro.html
Adhikari, Ani, John DeNero, David Wagner. Computational and Inferential Thinking: The Foundations of Data Science. 2nd Edition. Freely available at https://inferentialthinking.com/chapters/intro.html
Mueller and Guido (2017). Introduction to Machine Learning with Python. This book is a purchasable ebook, though the authors have provided free and legal copies in years past on their github page https://github.com/amueller/introduction_to_ml_with_python and elsewhere.
Additional readings and materials will be on the course Canvas site
Challenges, techniques, and opportunities presented by data that has one or more of the following characteristics: large, unstructured, high frequency, variable quality. The course will consist of three parts: 1) computational tools for applying standard econometric techniques on large datasets, 2) working with new types of data, such as unstructured data (e.g. images, text) and web-scraped data, and 3) application of machine learning, AI and other new statistical learning techniques (e.g. classifiers, regression trees, machine learning, neural nets). This course is intended to be a broad introduction to many domains of knowledge with the goal of orienting students towards where they could learn more detailed information relevant to their research.
APEC 8221 or equivalent programming course: APEC 5031 or equivalent econometrics course.
You should leave this course with an understanding of and ability to estimate standard econometric models on large datasets, produce datasets for analysis from unstructured data, and apply several statistical learning techniques to answer economic questions. In particular, you will be able to
-
Estimate standard econometric models on datasets containing millions of records
-
Understand cross-validation approaches
-
Assess the strengths and weaknesses of statistical learning techniques and evaluate their applicability to questions of causal inference.
-
Gain competency on new machine learning models
-
Achieve moderate skill level of Python programming
Assignments: 70%
Independent Project: 20%
Class Participation: 10%
There will be six assignments during the course, which will consist of programming tasks designed to give you experience working with big and otherwise challenging data in the context of econometric analysis. We plan that you will have at least a week to complete them. You will submit Python-based Jupyter notebooks with your results and analysis, structured so that it can be run without modification. Assignments will be evaluated based on both functionality and the readability/organization of the code that you write. Please read the course policies below on collaboration and working together.
The independent project will be introduced in the middle of the semester and due near the end of the semseter. You will submit a short (3-7 page) writeup, along with the code used. You will also give a short (5 minute) presentation on your results.
If you show up to class every day ready to learn with your computer setup with the day's tools, you will likely get full class participation points.
[Late Policy]{.underline}: All assignments must be submitted on Canvas. Late assignments will be accepted for up to three days afterward. For each day late your grade will automatically reduce by 10% (for a maximum reduction of 30%).
There will be no opportunities for extra credit.
If you think that any grading was done incorrectly or unfairly, please come to my office hours. I am happy to revisit the grading, but will re-grade the entire assignment. The purpose of that policy is to be as fair as possible: if one problem needs re-grading, they probably all should have a second look.
Please see the UMN policy below on make-up work and policies concerning legitimate absences. In short, if a legitimate reason prohibits you from finishing an assignment on time, please let me know and we will make accommodations for you.
We will using the Python programming language (though there will be frequent reference to equivalent R functions). All software in the class is open source and freely available.
For Python, we will work through installation of the programming language and several supporting tools. This course strongly recommends that you bring your own laptop because having a well-setup laptop will serve you well into your academic career. It is not strictly required, but you will likely want to have administrator rights to your computer in order to install all of the supporting software. If you are using a personal computer, this should not be an issue. If you are using a university-owned computer, it might be. Reach out to me ASAP if you have questions about this. If you are not able to bring a suitable computer, please contact the instructors within the first week of class and we can discuss alternatives. It is possible to use a PC, Mac or Linux for this course, though all examples will be given on a PC.
Becoming skilled in Big Data is partly about mastering the tools and it will be your responsibility to come to class with your computer setup in a way for you to succeed. We will discuss any setup steps necessary in the lecture before it is to be used.
Approximate Course Schedule
I will take more or less time on the topics as needed and will react to student feedback and requests.
Date | Topic | Readings | Assignments |
24-Oct | Introduction & Course Overview | None | Assignment 1 Assigned (very short assignment) |
26-Oct | Expanding your toolset, Introduction to Python [part 1] | Turrell, "Preliminaries" section: https://aeturrell.github.io/coding-for-economists/code-preliminaries.html | |
31-Oct | Introduction to Python [part 2] | Turrell, "Coding Basics" section: https://aeturrell.github.io/coding-for-economists/code-basics.html | Assignment 1 Due, Assignment 2 Assigned |
2-Nov | Python on Big and/or Spatial Data | Optional: "Workflow Basics" and "Writing Code" sections | |
7-Nov | Huge matrices with Numpy | https://nature.com/articles/s41586-020-2649-2 | Assignment 2 Due, Assignment 3 Assigned |
9-Nov | Huge tables with Pandas/Geopandas | Turrell, "Intro to Geo-Spatial Analysis" section: https://aeturrell.github.io/coding-for-economists/geo-intro.html | |
14-Nov | Machine Learning and Cross-Validation | "Machine Learning Methods Economists Should Know About", Hastie et al. (2009) Chp 2 | Assignment 3 Due, Assignment 4 Assigned |
16-Nov | Regularization and Shrinkage | "Big Data: New Tricks for Econometrics", Hastie et al (2009) Chp 3 | Independent Project Assigned |
21-Nov | Regression Trees, Random Forest and LULC classification | Hastie et al (2009) Chp 15 | Assignment 4 Due, Assignment 5 Assigned |
23-Nov | University Closed (Thanksgiving Holiday) | ||
Nov 28 (guest lecture/remote) | KGML (Knowledge Guided Machine Learning) | To be assigned. | List of 3 project ideas/datasets due |
Nov 30 (guest lecture/remote) | R and Python Integration | https://aeturrell.github.io/coding-for-economists/coming-from-r.html | |
5-Dec | Neural Nets | Hastie et al (2009) Chapter 11 | Research proposal due |
7-Dec | Convolutional Neural Net & Transformers | "Combining satellite imagery and machine learning to predict poverty" | Assignment 5 Due |
12-Dec | Student Presentations! | Last Day of Class | |
15-Dec | No class | Research project due | |
21-Dec | No class | Final day of UMN semester |
- [Class attendance is expected.]{.underline} You will not be graded on attendance, but we expect you to come: any material covered in class is fair game on assignments and exams. Equally importantly, the questions you and your peers ask and the comments you make during class will make for a better experience for everyone, so please make every effort to attend. That said, unexpected conflicts come up for all of us, so any lecture notes and slides used during class will be available online.
- [You may discuss and work on assignments with other students, but you must write up and turn in your own assignment.]{.underline}You all have different strengths and can learn from one another by working together, so we encourage you to do so. The point of this policy is so that you can learn more, not less, so please don't abuse the privilege and let someone else do the work and you simply copy their answer. That won't help you learn the material.
-
[Do not violate the Student Conduct Code.]{.underline} The assignments you turn in and your answers on exams should reflect your own work. Simply copying someone else's work, or otherwise violating the Student Conduct Code
(http://regents.umn.edu/sites/default/files/policies/Student_Conduct_Code.pdf) may result in a failing grade and/or additional University action. That may sound scary, and it's supposed to be, but know that if you turn in work that is your own, you have nothing to worry about. If you have any questions or concerns about this policy, please don't hesitate to ask.
-
[Treat your classmates with respect.]{.underline} Everyone in the class is here to learn and has an equal right to be treated with respect. This means many things, but in summary, We will not tolerate any form of discrimination or sexual harassment. The University has official policies that you can read if you aren't sure what constitutes either of those things, but your common sense will take you a long way.
Other notes:
If you require disability accommodations, please let me know at your earliest convenience and we will work together to arrange accommodations. If you have or think you may have a disability and have not yet contacted Disability Services (DS), please do so at 612-626-1333 to arrange for a confidential consultation.
We will literally be learning about how ChatGPT, Large Language Models and Transformers work in this course. Not surprisingly, I am fine with any usage of ChatGPT in your work.
The University seeks an environment that promotes academic achievement and integrity, that is protective of free inquiry, and that serves the educational mission of the University. Similarly, the University seeks a community that is free from violence, threats, and intimidation; that is respectful of the rights, opportunities, and welfare of students, faculty, staff, and guests of the University; and that does not threaten the physical or mental health or safety of members of the University community.
As a student at the University you are expected adhere to Board of Regents Policy: Student Conduct Code. To review the Student Conduct Code, please see:http://regents.umn.edu/sites/default/files/policies/Student_Conduct_Code.pdf.
Note that the conduct code specifically addresses disruptive classroom conduct, which means "engaging in behavior that substantially or repeatedly interrupts either the instructor's ability to teach or student learning. The classroom extends to any setting where a student is engaged in work toward academic credit or satisfaction of program-based requirements or related activities."
Using personal electronic devices in the classroom setting can hinder instruction and learning, not only for the student using the device but also for other students in the class. To this end, the University establishes the right of each faculty member to determine if and how personal electronic devices are allowed to be used in the classroom. For complete information, please reference:http://policy.umn.edu/Policies/Education/Education/STUDENTRESP.html.
You are expected to do your own academic work and cite sources as necessary. Failing to do so is scholastic dishonesty. Scholastic dishonesty means plagiarizing; cheating on assignments or examinations; engaging in unauthorized collaboration on academic work; taking, acquiring, or using test materials without faculty permission; submitting false or incomplete records of academic achievement; acting alone or in cooperation with another to falsify records or to obtain dishonestly grades, honors, awards, or professional endorsement; altering, forging, or misusing a University academic record; or fabricating or falsifying data, research procedures, or data analysis. (Student Conduct Code:http://regents.umn.edu/sites/default/files/policies/Student_Conduct_Code.pdf) If it is determined that a student has cheated, he or she may be given an "F" or an "N" for the course, and may face additional sanctions from the University. For additional information, please see:http://policy.umn.edu/Policies/Education/Education/INSTRUCTORRESP.html.
The Office for Student Conduct and Academic Integrity has compiled a useful list of Frequently Asked Questions pertaining to scholastic dishonesty: http://www1.umn.edu/oscai/integrity/student/index.html. If you have additional questions, please clarify with your instructor for the course. Your instructor can respond to your specific questions regarding what would constitute scholastic dishonesty in the context of a particular class-e.g., whether collaboration on assignments is permitted, requirements and methods for citing sources, if electronic aids are permitted or prohibited during an exam.
Students will not be penalized for absence during the semester due to unavoidable or legitimate circumstances. Such circumstances include verified illness, participation in intercollegiate athletic events, subpoenas, jury duty, military service, bereavement, and religious observances. Such circumstances do not include voting in local, state, or national elections. For complete information, please see:http://policy.umn.edu/Policies/Education/Education/MAKEUPWORK.html.
Taking notes is a means of recording information but more importantly of personally absorbing and integrating the educational experience. However, broadly disseminating class notes beyond the classroom community or accepting compensation for taking and distributing classroom notes undermines instructor interests in their intellectual work product while not substantially furthering instructor and student interests in effective learning. Such actions violate shared norms and standards of the academic community. For additional information, please see: http://policy.umn.edu/Policies/Education/Education/STUDENTRESP.html.
The University utilizes plus and minus grading on a 4.000 cumulative grade point scale. Please seehttp://policy.umn.edu/education/gradingtranscripts for details..
"Sexual harassment" means unwelcome sexual advances, requests for sexual favors, and/or other verbal or physical conduct of a sexual nature. Such conduct has the purpose or effect of unreasonably interfering with an individual's work or academic performance or creating an intimidating, hostile, or offensive working or academic environment in any University activity or program. Such behavior is not acceptable in the University setting. For additional information, please consult Board of Regents Policy:http://regents.umn.edu/sites/default/files/policies/SexHarassment.pdf
The University provides equal access to and opportunity in its programs and facilities, without regard to race, color, creed, religion, national origin, gender, age, marital status, disability, public assistance status, veteran status, sexual orientation, gender identity, or gender expression. For more information, please consult Board of Regents Policy: http://regents.umn.edu/sites/default/files/policies/Equity_Diversity_EO_AA.pdf.
The University of Minnesota is committed to providing equitable access to learning opportunities for all students. The Disability Resource Center is the campus office that collaborates with students who have disabilities to provide and/or arrange reasonable accommodations.
If you have, or think you may have, a disability (e.g., mental health, attentional, learning, chronic health, sensory, or physical), please contact DS at 612-626-1333 to arrange a confidential discussion regarding equitable access and reasonable accommodations.
If you are registered with DS and have a current letter requesting reasonable accommodations, please contact your instructor as early in the semester as possible to discuss how the accommodations will be applied in the course.
For more information, please see the DS website, https://diversity.umn.edu/disability/.
As a student you may experience a range of issues that can cause barriers to learning, such as strained relationships, increased anxiety, alcohol/drug problems, feeling down, difficulty concentrating and/or lack of motivation. These mental health concerns or stressful events may lead to diminished academic performance and may reduce your ability to participate in daily activities. University of Minnesota services are available to assist you. You can learn more about the broad range of confidential mental health services available on campus via the Student Mental Health Website: http://www.mentalhealth.umn.edu.
Academic freedom is a cornerstone of the University. Within the scope and content of the course as defined by the instructor, it includes the freedom to discuss relevant matters in the classroom. Along with this freedom comes responsibility. Students are encouraged to develop the capacity for critical judgment and to engage in a sustained and independent search for truth. Students are free to take reasoned exception to the views offered in any course of study and to reserve judgment about matters of opinion, but they are responsible for learning the content of any course of study for which they are enrolled.*
Reports of concerns about academic freedom are taken seriously, and there are individuals and offices available for help. Contact the instructor, the Department Chair, your adviser, the associate dean of the college, or the Vice Provost for Faculty and Academic Affairs in the Office of the Provost.
* Language adapted from the American Association of University Professors "Joint Statement on Rights and Freedoms of Students".