-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathassignment06.qmd
201 lines (147 loc) · 8.64 KB
/
assignment06.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
---
title: "Assignment06"
author: "Doris Yan, Ivy Xiu"
format: html
editor: visual
bibliography: references.bib
embed-resources: true
execute:
warning: false
---
## Exercise 02 Paper Overview
@martinez2018 conducted a randomized controlled trial to test the effect of a complementary feeding education program on stunted children in Guatemala. 324 stunted children with height-for-age Z score (HAZ) less than or equal to -2.5 SD were divided into the control ($N$=163) and the treatment group ($N$=161). The control group received usual care, while the treatment group received individualized caregiver complementary feeding education in addition to usual care. HAZ is a continuous random variable, and the difference in HAZ between control and treatment indicates the efficacy of the feeding education program, which can be assessed with Student's t-test. According to the data [@rohloff], the authors fail to reject the null hypothesis that there is no significant difference in the change in height-for-age Z score between the control and treatment. We will be reanalyzing their null hypothesis.
## Exercise 03 Load, Clean, and Tidy Data
```{r load libraries and dataset, output=FALSE}
library(here)
library(tidyverse)
library(readxl)
feeding<-read_excel(here("feeding.xls"))
```
```{r tidy data for KDE, output=FALSE}
glimpse(feeding)
feeding_tidy<-feeding |>
select(Grupo,HAZ_pre,HAZ_post)|>
rename(group=Grupo)|>
# calculate the difference in HAZ between control and treatment
mutate(diff_HAZ=HAZ_post-HAZ_pre)|>
# remove NA values
na.omit()
```
## Exercise 04 KDE
### KDE: HAZ Difference in the Entire Sample
```{r}
feeding_tidy|>
ggplot(mapping = aes(x = diff_HAZ)) +
geom_density(color = "red") +
labs(title = "Kernel Density Estimation of Differences in HAZ",
x = "HAZ Difference",
y = "Density")
# display summary statistics for the variable
summary(feeding_tidy$diff_HAZ)
```
This KDE estimates the probability density of height-for-age Z score (HAZ) difference in the entire sample, including control and treatment. HAZ difference ranges from -1.49 to 1.24 with equal mean and median of 0.01. The curve is almost symmetrical and bell-shaped, indicating that the difference in HAZ is close to being normally distributed in the sample. HAZ difference is concentrated at 0, meaning that in the 296 observations (after excluding NA values), there is mostly no difference between HAZ pre-intervention and post-intervention in the entire sample.
### KDE: HAZ Difference in Treatment
```{r}
feeding_tidy_treatment<-feeding_tidy|>
filter(group == 1)
feeding_tidy_treatment |>
ggplot(mapping = aes(x = diff_HAZ))+
geom_density(color = "red") +
labs(title = "Kernel Density Estimation of Differences in HAZ in Treatment",
x = "HAZ Difference",
y = "Density")
summary(feeding_tidy_treatment$diff_HAZ)
```
This KDE estimates the probability density of height-for-age Z score (HAZ) difference in the treatment group ($N=145$). HAZ difference ranges from -1.49 to 1.24 with equal mean and median of 0.05. HAZ is almost symmetrical and bell-shaped, indicating that the difference in HAZ is close to be normally distributed in the treatment group. HAZ difference is concentrated around 0, meaning that in the treatment group, there is mostly no difference between HAZ pre-intervention and post-intervention.
### KDE: HAZ Baseline
```{r}
feeding_tidy|>
ggplot(mapping = aes(x = HAZ_pre)) +
geom_density(color="red") +
labs(title = "Kernel Density Estimation of Baseline HAZ",
x = "HAZ at Baseline",
y = "Density")
summary(feeding_tidy$HAZ_pre)
```
This KDE estimates probability density of height-for-age Z score (HAZ) at baseline. HAZ at baseline ranges from -5.47 to -0.17 with a mean of -3.42 and a median of -3.33. Observations are concentrated at around -3, meaning that a baseline HAZ at about -3 is the most common in the sample. The shape of the distribution is skewed to the right, with the right tail (greater values) longer than the left tail (smaller values). In a right-skewed distribution, there are fewer children with a high HAZ than with a lower HAZ. The distribution makes sense because the study chooses stunted children sample based on the criterion that the child's baseline HAZ is less than or equal to -2.5SD [@martinez2018].
### KDE: HAZ Post-Intervention
```{r}
feeding_tidy|>
ggplot(mapping = aes(x = HAZ_post)) +
geom_density(color = "red") +
labs(title = "Kernel Density Estimation of HAZ After Intervention",
x = "HAZ After Intervention",
y = "Density")
summary(feeding_tidy$HAZ_post)
# get the most common occurrences of post-intervention HAZ
feeding_tidy|>
count(HAZ_post)|>
arrange(desc(n))
```
This KDE estimates probability density of height-for-age Z score (HAZ) after intervention. HAZ after intervention ranges from -5.35 to -0.81 with a mean of -3.40 and a median of -3.38. The mode of the sample for post-intervention HAZ is -3.07, and the second most common occurrence is -3.45. The distribution is still right skewed since the median is less than the mean. There are more children with a lower HAZ than a higher HAZ post intervention.
## Exercise 05 LOESS
```{r tidy data for lOESS}
feeding_LOESS<-feeding|>
select(WAZ_pre,WAZ_post,HAZ_pre,HAZ_post)|>
na.omit()
```
#### Weight and Height at Baseline
```{r}
feeding_LOESS|>
ggplot(mapping = aes(x = WAZ_pre, y = HAZ_pre)) +
geom_smooth(method = "loess") +
labs(title = "Relationship between WAZ and HAZ at Baseline",
x = "Weight-for-age Z score (WAZ)",
y = "Height-for-age Z score (HAZ)")
```
Children's weight-for-age Z score and height-for-age Z score at baseline are positively correlated. It makes sense because as their weight increases, their height also increases.
#### Weight and Height Post-Intervention
```{r}
feeding_LOESS|>
ggplot(mapping = aes(x = WAZ_post, y = HAZ_post)) +
geom_smooth(method = "loess") +
labs(title = "Relationship between WAZ and HAZ at the Exit of the Study",
x = "Weight-for-age Z score (WAZ)",
y = "Height-for-age Z score (HAZ)")
```
Similarly, children's weight-for-age Z score and height-for-age Z score at the exit of the study are positively correlated. Height increases as weight increases.
## Exercise 06 Non-parametric testing
### 6.1 Assumptions
@martinez2018 uses a two-sample t-test to test the null hypothesis that the feeding education program does not affect height-for-age Z score. The assumptions are
1\. Data in control and treatment group is a random sample drawn from the population since this is a randomized controlled trial (RCT).
2\. Height-for-age Z score is normally distributed.
3\. The population have equal variances, verified by the test below. There is insufficient evidence to reject the null that the population have equal variances. The variances are therefore equal.
```{r}
feeding_tidy |>
group_by(group) |>
summarize(var(diff_HAZ))
# test for equality of variances
var.test(diff_HAZ ~ group, data = feeding_tidy)
```
In non-parametric testing, there are less assumptions. For example, it only assumes that data from the population is continuous, with no assumption on the specific distribution form such as normal distribution.
### 6.2 Null and Alternative Hypothesis
$H_0:\mu_{control}=\mu_{treatment}$
$H_a:\mu_{control}\neq\mu_{treatment}$
If *p* value is less than 0.5, we will reject the null and conclude that there is a significant difference in the population mean HAZ change between control and treatment group.
### 6.3 Permutation Test
```{r}
library(infer)
feeding_tidy$group<-as.factor(feeding_tidy$group)
point_estimate <- feeding_tidy|>
specify(diff_HAZ ~ group)|>
calculate(stat = "diff in means", order = c("0", "1"))
perm_dist<- feeding_tidy |>
specify(response = diff_HAZ, explanatory = group) |>
hypothesize(null = "independence") |>
generate(reps = 10000, type = "permute") |>
calculate(stat = "diff in means", order = c("0", "1"))
perm_dist |>
visualize()+
labs(title="Permutations of Differences",
subtitle="Generated Under Null Hypothesis of No Difference in Means")+
shade_p_value(obs_stat = point_estimate, direction = "two-sided")
perm_dist |>
get_p_value(obs_stat = point_estimate, direction = "two-sided")
```
### 6.4 Interpretation
The 2-Sample Difference of Means test shows that *p* value is greater than 0.05. There is insufficient evidence to reject the null hypothesis at 5% significance level. We conclude that the mean change in height-for-age Z score in the treatment group is not statistically from that in the control control. The non-parametric testing reaches the same conclusion as the paper--the complementary feeding education program does not improve stunted children' height-for-age Z score.