New version of bcstats with summary sheet, variable error rates, variable type error rates, and collapsed options
net install ipabcstats, from("https://raw.githubusercontent.com/PovertyAction/ipabcstats/master") replace
Title

ipabcstats -- Compare survey and back check data, producing an Excel output of comparisons and error rates.
ipabcstats, surveydata(filename) bcdata(filename) id(varlist) enumerator(varlist) backchecker(varlist) filename(filename) surveydate(varlist) bcdate(varlist) [options]
    options                         Description
    -------------------------------------------------------------------------
    Main
    * surveydata(filename)          the survey data
    * bcdata(filename)              the back check data
    * id(varlist)                   the unique ID(s)
    * enumerator(varlist)           the enumerator ID
    * backchecker(varlist)          the back checker ID
    * filename(filename)            save output file as filename
    * surveydate(varlist)           the survey date variable (%tc format)
    * bcdate(varlist)               the back check date variable (%tc format)

    Comparison variables
    + t1vars(varlist)               the list of type 1 variables
    + t2vars(varlist)               the list of type 2 variables
    + t3vars(varlist)               the list of type 3 variables

    Enumerator checks
      enumteam(varname)             display the overall error rates of all
                                    enumerator teams; varname in survey data
                                    is used
      bcteam(varname)               display the overall error rates of all
                                    back check teams; varname in back check
                                    data is used
      showid(integer[%])            display unique IDs with at least an
                                    integer% error rate, or at least integer
                                    differences if % is not included; default
                                    is showid(30%)

    Stability checks
      ttest(varlist)                run paired two-sample mean-comparison
                                    tests for varlist in the back check and
                                    survey data using ttest
      prtest(varlist)               run two-sample tests of equality of
                                    proportions in the back check and survey
                                    data for dichotomous variables in varlist
                                    using prtest
      level(#)                      set confidence level for ttest and prtest;
                                    default is level(95)
      signrank(varlist)             run Wilcoxon matched-pairs signed-ranks
                                    tests for varlist in the back check and
                                    survey data using signrank

    Reliability checks
      reliability(varlist)          calculate the simple response variance
                                    (SRV) and reliability ratio for type 2 and
                                    3 variables in varlist

    Comparisons
      keepsurvey(varlist)           include varlist in the survey data in the
                                    comparisons data set
      keepbc(varlist)               include varlist in the back check data in
                                    the comparisons data set
      full                          include all comparisons, not just
                                    differences
      nolabel                       do not use value labels
      replace                       overwrite existing file

    Options
      okrange(varname range [, varname range ...])
                                    do not count a value of varname in the
                                    back check data as a difference if it
                                    falls within range of the survey data
      nodiffnum(numlist)            do not count back check responses that
                                    equal any number in numlist as differences
      nodiffstr("string" [ "string" ...])
                                    do not count back check responses that
                                    equal any of the strings as differences;
                                    put quotation marks around each
                                    individual string
      excludenum(numlist)           do not compare back check responses that
                                    equal any number in numlist
      excludestr("string" [ "string" ...])
                                    do not compare back check responses that
                                    equal any of the strings
      excludemissing                do not compare back check responses that
                                    are missing; includes extended missing
                                    values for numeric variables
                                    [., .a, .b, ..., .z] and blanks "" for
                                    string variables
      lower                         convert all string variables to lower
                                    case before comparing
      upper                         convert all string variables to upper
                                    case before comparing
      nosymbol                      replace symbols with spaces in string
                                    variables before comparing
      trim                          remove leading or trailing blanks and
                                    multiple, consecutive internal blanks in
                                    string variables before comparing
    -------------------------------------------------------------------------
    * surveydata(), bcdata(), id(), enumerator(), backchecker(), filename(),
      surveydate(), and bcdate() are required.
    + t1vars(), t2vars(), or t3vars() is required.
ipabcstats compares back check data and survey data, producing a data set of comparisons. It completes enumerator checks for type 1 and type 2 variables and stability checks for type 2 and type 3 variables.
The GitHub repository for bcstats is here. Previous versions may be found there: see the tags.
        +----------------------+
    ----+ Comparison variables +---------------------------------------------
t1vars(varlist) specifies the list of type 1 variables. Type 1 variables are expected to stay constant between the survey and back check, and differences may result in action against the enumerator. For these variables, the command displays those with high error rates and completes enumerator checks. See the Innovations for Poverty Action back check manual for more on the three types.
t2vars(varlist) specifies the list of type 2 variables. Type 2 variables may be difficult for enumerators to administer. For instance, they may involve complicated skip patterns or many examples. Differences may indicate the need for further training, but will not result in action against the enumerator. For these variables, the command displays the error rates of all variables and completes enumerator and stability checks. See the Innovations for Poverty Action back check manual for more on the three types.
t3vars(varlist) specifies the list of type 3 variables. Type 3 variables are variables whose stability between the survey and back check is of interest. Differences will not result in action against the enumerator. For these variables, the command displays the error rates of all variables and completes stability checks. See the Innovations for Poverty Action back check manual for more on the three types.
        +------------------+
    ----+ Stability checks +-------------------------------------------------
level(#) specifies the confidence level, as a percentage, for confidence intervals calculated by ttest and prtest. The default is level(95) or as set by set level.
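As a rough illustration of what the stability checks compute, the sketch below runs the underlying Stata commands directly on a small made-up data set in which survey and back check values sit side by side. The variable names (sold_survey, sold_bc, owns_survey, owns_bc) and the data are hypothetical, and the steps the command takes internally may differ.

```stata
* Hypothetical matched data: one row per ID, survey and back check values.
clear
input id sold_survey sold_bc owns_survey owns_bc
1 10 12 1 1
2  8  8 0 0
3 15 11 1 0
4  6  7 0 1
5 20 18 1 1
6 12 12 1 1
end

* Paired mean-comparison test with a 90% confidence level, as with level(90).
ttest sold_survey == sold_bc, level(90)

* Wilcoxon matched-pairs signed-ranks test on the same pair of variables.
signrank sold_survey = sold_bc

* Test of equality of proportions for a dichotomous (0/1) variable.
prtest owns_survey == owns_bc, level(90)
```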
        +----------------------+
    ----+ Comparisons data set +---------------------------------------------
keepsurvey(varlist) specifies that variables in varlist in the survey data are to be included in the output file.

keepbc(varlist) specifies that variables in varlist in the back check data are to be included in the output file.
nolabel specifies that survey and back check responses are not to be value-labeled in the comparisons data set. Variables specified through keepsurvey or keepbc are also not value-labeled.
        +---------+
    ----+ Options +----------------------------------------------------------
okrange(varname range [, varname range ...]) specifies that a value of varname in the back check data will not be counted as a difference if it falls within range of the survey data. range may be of the form [-x, y] (absolute) or [-x%, y%] (relative).
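The two range forms can be sketched in plain Stata as follows. This is only an illustration of the rule described above, not the command's own code: the wide variable names are hypothetical, the bounds are assumed to be inclusive, and the relative form is written for positive survey values.

```stata
* Hypothetical matched data: survey and back check values side by side.
clear
input id gameresult_survey gameresult_bc itemssold_survey itemssold_bc
1 3 4 100 104
2 3 5 100 106
end

* Absolute range [-1, 1]: a back check value within 1 below or 1 above the
* survey value is not a difference.
generate byte diff_gameresult = ///
    !inrange(gameresult_bc, gameresult_survey - 1, gameresult_survey + 1)

* Relative range [-5%, 5%]: a back check value within 5% of the survey value
* is not a difference.
generate byte diff_itemssold = ///
    !inrange(itemssold_bc, itemssold_survey * 0.95, itemssold_survey * 1.05)

* ID 1 falls inside both ranges (no differences); ID 2 falls outside both.
list
```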
excludenum(numlist) specifies that back check responses that equal any value in numlist will not be compared. These responses will not affect error rates and will be marked as "not compared" if you use the full option. Otherwise, they will not appear in the output file.
excludestr("string" [, "string" ...]) specifies that back check responses that equal any string in this list will not be compared. These responses will not affect error rates and will be marked as "not compared" if you use the full option. Otherwise, they will not appear in the output file. Be sure to keep each string in its own quotations, especially if there are spaces within a string.
excludemissing specifies that back check responses that are missing will not be compared. Missingness is determined with the missing() function, so this covers any extended missing value [., .a, .b, ..., .z] for numeric variables and blanks ("") for string variables. This option is useful when the back check data set contains data for multiple back check survey versions.
nodiffstr("string" [, "string" ...]) specifies that if a back check response equals any string in the list, it will not be counted as a difference, regardless of what the survey response is.
nodiffnum(numlist) specifies that if a back check response equals any number in numlist, it will not be counted as a difference, regardless of what the survey response is.
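The difference between the "nodiff" and "exclude" options is easiest to see in the error rate. The sketch below uses made-up data and the code -99 as a hypothetical "don't know" value. It assumes that a response covered by a "nodiff" option still counts as a comparison, while a response covered by an "exclude" option is dropped from both the numerator and the denominator; the command's internal computation may differ.

```stata
* Hypothetical matched data for one variable.
clear
input id survey bc
1 5   5
2 5   7
3 5 -99
4 3   3
end

* nodiffnum(-99): the response is still compared, but never counted as a
* difference.
generate byte diff_nodiff = (survey != bc) & (bc != -99)
quietly count if diff_nodiff
display "error rate with nodiffnum(-99):  " r(N) / _N
* 1 difference in 4 comparisons

* excludenum(-99): the response is left out of the comparison altogether.
generate byte compared = (bc != -99)
generate byte diff_excl = (survey != bc) & compared
quietly count if compared
local n_compared = r(N)
quietly count if diff_excl
display "error rate with excludenum(-99): " r(N) / `n_compared'
* 1 difference in 3 comparisons
```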
nosymbol replaces the following characters in string variables with a space before comparing: . , ! ? ' / ; : ( ) ` ~ @ # $ % ^ & * - _ + = [ ] { } | \ " < >
trim removes leading or trailing blanks and multiple, consecutive internal blanks before comparing. If nosymbol is specified, this occurs after symbols are replaced with a space.
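The string-cleaning options can be approximated with built-in string functions in recent versions of Stata. The sketch below is an assumption-laden illustration rather than the command's implementation: it replaces only three of the symbols listed under nosymbol, and it assumes case conversion happens before symbol replacement, which is not documented here. The trimming step does come after symbol replacement, as described above.

```stata
* Hypothetical string response with mixed case, symbols, and extra blanks.
clear
input str30 name
"  Mary-Anne,   SMITH. "
end

* lower: convert to lower case.
generate str30 clean = strlower(name)

* nosymbol: replace symbols with a space (only a subset is shown here).
foreach sym in "-" "," "." {
    replace clean = subinstr(clean, "`sym'", " ", .)
}

* trim: remove leading and trailing blanks, then collapse internal runs of
* blanks to a single blank.
replace clean = stritrim(strtrim(clean))

list name clean
```

Under these assumptions the cleaned value is "mary anne smith", so a back check response typed as "mary anne smith" would match it.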
Assume that missing values were not asked in the back check survey version.

    ipabcstats, surveydata(bcstats_survey) bcdata(bcstats_bc) id(id) ///
        okrange(gameresult [-1, 1], itemssold [-5%, 5%]) excludemissing ///
        t1vars(gender) enumerator(enum) enumteam(enumteam) backchecker(bcer) ///
        t2vars(gameresult) signrank(gameresult) ///
        t3vars(itemssold) ttest(itemssold) ///
        surveydate(submissiondate) bcdate(submissiondate) ///
        keepbc(comments) keepsurvey(comments) full replace
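surveydate() and bcdate() expect datetime variables in %tc format. If the raw data hold the date as a string or as a daily date, a conversion along the following lines may be needed first. The variable names (submission_str, visit_day, visit_tc) and the timestamp layout are hypothetical; adjust the mask passed to clock() to match the actual data.

```stata
* Hypothetical raw data: a string timestamp.
clear
input str19 submission_str
"2021-03-05 14:30:00"
"2021-03-07 09:15:00"
end

* String timestamp -> datetime. Store as double: float storage loses
* precision for %tc values.
generate double submissiondate = clock(submission_str, "YMDhms")
format submissiondate %tc

* A daily (%td) date can be converted to a datetime at midnight with cofd().
generate long visit_day = dofc(submissiondate)
format visit_day %td
generate double visit_tc = cofd(visit_day)
format visit_tc %tc

list
```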
ipabcstats saves the following in r():
Scalars
    r(showid)        number of IDs over the threshold specified in showid()
    r(bc_only)       number of IDs only in back check data
    r(total_rate)    total error rate
    r(avd)           average days between survey and back check
    r(survey)        number of survey observations
    r(bc)            number of back check observations
Matrices
    r(enum)          the total error rates of all enumerators
    r(enum1)         the type 1 variable error rates of all enumerators
    r(enum2)         the type 2 variable error rates of all enumerators
    r(enum3)         the type 3 variable error rates of all enumerators
    r(backchecker)   the total error rates of the back checkers
    r(backchecker1)  the type 1 variable error rates of the back checkers
    r(backchecker2)  the type 2 variable error rates of the back checkers
    r(backchecker3)  the type 3 variable error rates of the back checkers
    r(enumteam)      the total error rates of the enumerator teams
    r(enumteam1)     the type 1 variable error rates of the enumerator teams
    r(enumteam2)     the type 2 variable error rates of the enumerator teams
    r(enumteam3)     the type 3 variable error rates of the enumerator teams
    r(bcteam)        the total error rates of the back checker teams
    r(bcteam1)       the type 1 variable error rates of the back checker teams
    r(bcteam2)       the type 2 variable error rates of the back checker teams
    r(bcteam3)       the type 3 variable error rates of the back checker teams
    r(var)           the error rates of all variables
    r(var1)          the error rates of all type 1 variables
    r(var2)          the error rates of all type 2 variables
    r(var3)          the error rates of all type 3 variables
    r(ttest)         the results of ttest for selected variables
    r(ttest2)        the results of ttest for type 2 variables
    r(ttest3)        the results of ttest for type 3 variables
    r(signrank)      the results of signrank for selected variables
    r(signrank2)     the results of signrank for type 2 variables
    r(signrank3)     the results of signrank for type 3 variables
    r(prtest)        the results of prtest for selected variables
    r(prtest2)       the results of prtest for type 2 variables
    r(prtest3)       the results of prtest for type 3 variables
    r(enum_bc)       percentage of surveys back checked for all enumerators
    r(rates)         error rates by variable type
    r(rates_unit)    error rates by variable type over time
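The stored results can be inspected with standard Stata commands once the command has run. The lines below are a minimal sketch that assumes a call like the example above has just completed; which matrices are actually returned presumably depends on the options specified, and the matrix name enum_rates is only an illustration.

```stata
* Run immediately after the command, before another r-class command
* overwrites r().
return list

display "Total error rate: " r(total_rate)
display "Average days between survey and back check: " r(avd)

* Copy a returned matrix to a named matrix before working with it.
matrix enum_rates = r(enum)
matrix list enum_rates
```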
Innovations for Poverty Action Back Check Manual
Hana Scheetz Freymiller of Innovations for Poverty Action conceived of the three variable types. bcstats was originally written by Matthew White with edits from Christopher Boyer, and the code and structure of this command draw heavily from their work.
Ishmail Azindoo Baako
Rosemarie Sandino
For questions or suggestions, submit a GitHub issue or e-mail [email protected].
Also see
Help: [R] ttest, [R] prtest, [R] signrank