---
output: github_document
---
## <img width="200" src="pics/puc_mprj_inova.png" align="center" style="background-color:white"/><br>Introduction to Data Science with R/tidyverse
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
dpi=72*2
)
```
* Organizer: Daniel Lima Ribeiro (MP-RJ)
* Instructor: [Dan S. Reznik (PUC-CCE)](https://www.linkedin.com/in/dan-s-reznik-phd-bb49133/)
* Teaching assistants
    + [Matheus Donato (ENCE/IBGE)](https://www.linkedin.com/in/matheus-donato-75526388/)
    + Thomás Jagoda (UFRJ)
* Homepage: [https://dan-reznik.github.io/MPRJ-Main/](https://dan-reznik.github.io/MPRJ-Main/)
### Recommended laptop
* CPU: i3 (6th generation or better), RAM >= 4 GB, Windows 10
* Screen: 14" or larger, Full HD (1920x1080). Avoid plain HD (720p) screens
* Example: [Samsung Essentials E30](https://www.americanas.com.br/produto/133794107)
### Downloads for Windows 10
* [R 3.5.3](https://cran.r-project.org/bin/windows/base/)
* [RStudio 1.1.463](https://download1.rstudio.org/RStudio-1.1.463.exe)
* [Git](https://git-scm.com/download/win)
* [Notepad++](https://notepad-plus-plus.org/download/v7.6.4.html)
### Post-installation
* Install the `tidyverse` package by running the following at the command prompt:
    + `R -e "install.packages('tidyverse', repos='https://cloud.r-project.org')"`
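
To confirm that the installation worked, a check along these lines can be run in the R console. This is a minimal sketch: `missing_packages()` is a hypothetical helper written for this illustration, not part of the course material or of any package.

```r
# Hypothetical helper (an assumption for this example): returns the
# subset of `pkgs` that cannot be loaded from the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse"))

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Uncomment to install whatever is missing from the CRAN cloud mirror:
  # install.packages(to_install, repos = "https://cloud.r-project.org")
} else {
  message("tidyverse is installed")
}
```

The install call is left commented out so that the check itself never touches the network.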
### Git and GitHub
```{r, eval=TRUE, echo=FALSE, out.width='50%'}
knitr::include_graphics("pics/git.png")
```
* GitHub
    + Create a GitHub account
    + Fork: https://github.com/dan-reznik/R-Ministerio-Publico-RJ
* RStudio: create SSH keys, then point the remote at the SSH URL
    + `git remote -v`
    + `git remote set-url origin git@github.com:user/repo_name.git`
* Git: identify yourself at the command prompt:
    + `git config --global user.email "<your_mail>@<...>.com"`
    + `git config --global user.name "<your name>"`
### Lesson Plan (subject to change)
* Lesson 1
    + [Introductory slides](https://github.com/dan-reznik/MPRJ-Main/blob/master/aulas%20ppt/R%20Ministerio%20Publico%20Aula%201.pptx?raw=true)
    + Getting students onto GitHub: forking and cloning the repository
    + Navigating RStudio
    + Basic R commands in the console
* Lesson 2
    + Re-forking the project (delete the old fork first), then cloning
    + Basic R commands in a .R script
    + Introduction to notebooks
    + Creating data frames by hand
    + Reading and exporting data frames as .csv files
* Lesson 3
    + Update the remote URL: `git remote set-url origin https://github.com/dan-reznik/MPRJ-Main`
    + Update your fork of the project
        + `git remote add upstream https://github.com/dan-reznik/MPRJ-Main && git fetch upstream && git checkout master && git merge upstream/master`
        + Note: if the merge reports an error, `git add . && git commit -m "fix merge" && git merge upstream/master`
        + Push to your fork from RStudio
    + Introduction to the pipe, `%>%`
    + The `dplyr` "verbs": select, filter, mutate, arrange, group_by, summarize
    + Cheatsheets
        + [Data Import](https://github.com/rstudio/cheatsheets/raw/master/data-import.pdf)
        + [Data Transformation](https://github.com/rstudio/cheatsheets/raw/master/data-transformation.pdf)
    + [dplyr exercises](https://jagodat.github.io/MPRJ-Exercicios/content/ex_dplyr.html)
* Lesson 4
    + Review: `dplyr` verbs
    + Visualization with `ggplot2`, from Kieran Healy's book ["Data Visualization"](https://socviz.co/makeplot.html#makeplot)
    + Cheatsheet: [Data Visualization](https://github.com/rstudio/cheatsheets/raw/master/data-visualization-2.1.pdf)
    + Use `ggplot2` like Tableau with [esquisse](https://github.com/dreamRs/esquisse)
    + [ggplot exercises](https://jagodat.github.io/MPRJ-Exercicios/content/visualizacao.html)
* Lesson 5
    + R Markdown, [cheatsheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf)
    + Project:
        + Choose an [open dataset](http://dados.gov.br/)
        + Propose a business question and visualizations
        + Create the project page on GitHub with README.Rmd
        + Data ingestion and preparation
* Lesson 6
    + Project
        + Wrap-up
        + Delivery / presentation
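
As a rough preview of the topics in Lessons 2 and 3, the sketch below builds a data frame by hand, round-trips it through a .csv file, and chains the six `dplyr` verbs with the pipe. The data and the names `expenses`, `agency`, `year` and `amount` are made up for this illustration and do not come from the course datasets.

```r
library(dplyr)
library(readr)

# Made-up data for illustration only (an assumption, not a course dataset)
expenses <- tibble(
  agency = c("A", "A", "A", "B", "B", "C"),
  year   = c(2018, 2019, 2019, 2018, 2019, 2019),
  amount = c(1000, 1500, 500, 800, 600, 2500)
)

# Export to a temporary .csv file and read it back with explicit column types
csv_path <- tempfile(fileext = ".csv")
write_csv(expenses, csv_path)
expenses_read <- read_csv(
  csv_path,
  col_types = cols(
    agency = col_character(),
    year   = col_double(),
    amount = col_double()
  )
)

# One pipeline using all six verbs: total 2019 amount per agency, in thousands
summary_2019 <- expenses_read %>%
  select(agency, year, amount) %>%       # keep only these columns
  filter(year == 2019) %>%               # keep only the 2019 rows
  mutate(amount_k = amount / 1000) %>%   # add a derived column
  group_by(agency) %>%                   # one group per agency
  summarize(total_k = sum(amount_k)) %>% # collapse each group to one row
  arrange(desc(total_k))                 # largest total first

print(summary_2019)
```

Each step of the pipeline takes a data frame and returns a data frame, which is why the verbs can be chained with `%>%` in any order that makes sense for the question.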
### Exercises
* [Main page](https://jagodat.github.io/MPRJ-Exercicios/)
    + dplyr
        + [Exercise set 1](https://jagodat.github.io/MPRJ-Exercicios/content/ex_dplyr.html), Matheus Donato
        + [Exercise set 2](https://teachingr.com/content/the-5-verbs-of-dplyr/the-5-verbs-of-dplyr-exercise.html), Ben Stenhaug
    + ggplot2
        + [Exercise set 1](https://jagodat.github.io/MPRJ-Exercicios/content/visualizacao.html), Thomás Jagoda
        + [Exercise set 2](https://jagodat.github.io/MPRJ-Exercicios/content/visualizacao2.html), Thomás Jagoda
### Projects
* Budget Execution
    + [Annual](https://dan-reznik.github.io/MP-Execucao-Orcamentaria)
    + [Monthly](https://dan-reznik.github.io/MP-Execucao-Orcamentaria-Mensal)
### Tutorials
* Combining multiple CSVs into a single data frame: [repo](https://github.com/dan-reznik/MP-Combina-Arquivos)
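
One common way to combine several CSVs into a single data frame is sketched below. It is not necessarily the approach taken in the linked repo: the files, the column names `month` and `amount`, and the `source_file` column are assumptions made for this example.

```r
library(readr)
library(purrr)

# Write two small CSV files to a fresh temporary folder (made-up data)
csv_dir <- tempfile("csv_demo_")
dir.create(csv_dir)
write_csv(data.frame(month = c("jan", "jan"), amount = c(10, 20)),
          file.path(csv_dir, "jan.csv"))
write_csv(data.frame(month = c("feb", "feb", "feb"), amount = c(5, 15, 25)),
          file.path(csv_dir, "feb.csv"))

# List every .csv in the folder and name each path by its file name,
# so the name can be recorded next to the rows it contributed
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
names(files) <- basename(files)

# Read each file with the same column types and stack the rows;
# `.id` stores the originating file name in a new column
combined <- map_dfr(
  files,
  read_csv,
  col_types = cols(month = col_character(), amount = col_double()),
  .id = "source_file"
)

print(combined)
```

Fixing `col_types` up front keeps the columns consistent across files, so the row-binding step does not fail on files where a column would otherwise be guessed differently.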
### Free Online Books
* <span style="background-color: #FFFF00">H. Wickham & G. Grolemund </span>, ["R for Data Science" (r4ds)](https://r4ds.had.co.nz/)
* Kieran Healy, ["Data Visualization"](https://socviz.co/)
* Claus Wilke, ["Fundamentals of Data Visualization"](https://serialmentor.com/dataviz/)
* Winston Chang, ["R Graphics Cookbook"](https://r-graphics.org/), 2nd edition
* Garrett Grolemund, ["Hands-on Programming with R"](https://rstudio-education.github.io/hopr/)
* Claudia Engel, ["Data Wrangling with R"](https://cengel.github.io/R-data-wrangling/)
* Jenny Bryan, ["Happy Git and GitHub for the useR"](https://happygitwithr.com/)
* Yihui Xie et al., ["Blogdown: Creating Websites with R Markdown"](https://bookdown.org/yihui/blogdown/)
* Max Kuhn, ["Applied Predictive Modeling"](http://appliedpredictivemodeling.com/)
* Max Kuhn, ["Feature Engineering and Selection: A Practical Approach for Predictive Models"](https://bookdown.org/max/FES/)
* Mark Sellors, ["Field Guide to the R Ecosystem"](https://fg2re.sellorm.com/)
### Videos from the “Masters”
* Hadley Wickham, ["Whole Game"](https://www.youtube.com/watch?v=go5Au01Jrvs)
* David Robinson, ["Tidytuesdays"](https://www.youtube.com/user/safe4democracy/videos)
* Ben Stenhaug, ["Tidyverse Tutorial Playlist"](https://www.youtube.com/watch?v=lTTJPRwnONE&list=PLLxj8fULvXwGOf8uHlL4Tr62oXSB5k_in)
### Useful Sites
* Tidyverse
    + [Packages](https://www.tidyverse.org/packages/)
    + [Cheatsheets](https://www.rstudio.com/resources/cheatsheets/)
    + [Regular Expressions](https://stringr.tidyverse.org/articles/regular-expressions.html)
    + [TeachR](https://teachingr.com/)
* Visualization
    + [Data Visualization in R](https://djnavarro.github.io/satrdayjoburg/), [slides](https://djnavarro.github.io/satrdayjoburg/slides)
    + [ggplot2 gallery](https://www.r-graph-gallery.com/portfolio/ggplot2-package/)
    + [Yan Holtz's Classes](https://www.yan-holtz.com/teaching)
* Markdown
    + [Pimp my Rmd](https://holtzy.github.io/Pimp-my-rmd/)
* R packages
    + [Leaderboard](https://www.rdocumentation.org/trends)
    + [Task Views](https://cran.r-project.org/web/views/)
    + [Awesome R](https://awesome-r.com/)
* Blogs
    + [R-Bloggers](https://www.r-bloggers.com/)
    + Hadley Wickham, ["How to Become a Data Scientist"](https://gist.github.com/hadley/820f09ded347c62c2864)
* Version Control
    + Abhishek Joshi, ["Git tutorial"](https://medium.com/@abhishekj/an-intro-to-git-and-github-1a0e2c7e3a2f)
    + [RStudio & Git](https://support.rstudio.com/hc/en-us/articles/200532077-Version-Control-with-Git-and-SVN)
    + Karl Broman, ["GitHub tutorial"](http://kbroman.org/github_tutorial/)
    + Michael Freeman, ["Git Collaboration"](http://slides.com/michaelfreeman/git-collaboration)