This repository contains the program code for replicating the analysis of the article "Market Entry, Fighting Brands and Tacit Collusion: Evidence from the French Mobile Telecommunications Market" by Marc Bourreau, Yutec Sun and Frank Verboven. The codebase is also available at the GitHub repository: https://github.com/yutec/FightBrand.
Below is a description of the program code and input data, with guidelines for the replication process.
The analysis data are collected from multiple sources, some of which are proprietary and thus not included in the repository. For information on the final analysis data, the Stata codebook file `codebook.txt` is available under the directory `data`. For analysis, the complete set of data is assumed to be located in two folders, `data/pre2014` and `data/post2014`, as can be seen from the Stata do-script `1merge.do` (more details follow below).
- Kantar
- The demand-side data were purchased from the Kantar Worldpanel (https://www.kantarworldpanel.com/global) under a non-disclosure agreement. For purchase inquiries, it is recommended to contact the Kantar UK director Tony Fitzpatrick ([email protected]). It can take several months to negotiate data use agreements and gain access to the data. The authors will assist with any reasonable replication attempts for two years following publication.
- The pre2014 data contain the following samples collected during 2011-2013:
  - `France_Data.csv`: main consumer survey
  - `bill2.csv`: mobile tariff bills of each consumer categorized by ranges
  - `spend.csv`: mobile spending of each consumer
  - `PanelDemogs.csv`: consumer panel demographics
  - `deptINSEE1.csv`, `deptINSEE2.csv`, `deptINSEE3.csv`: geographic locations (departments) of the consumer panel
  - `depts2011.txt`: code list of departments in France
- The post2014 data augment the above with the 2014 survey sample in a similar structure:
  - `FranceDatav2.csv`
  - `FrancePanelv2.csv`
  - `PhoneList.csv`: list of mobile phone devices
- ANFR
- The dataset on cellular networks was provided by the Agence Nationale des Fréquences (ANFR), which gave permission for public sharing. It is included in the directory `data/anfr`. An accompanying codebook is also available as the file `documentation-cartoradio.pdf` in the same location. For data inquiries, the agency can be contacted via its website form "Formulaire contact - Données" (https://www.anfr.fr/contact/poser-une-question/formulaire-contact-donnees/#menu2), which is only available in French.
- The R script `main.R` in the same directory `data/anfr` converts the original data into the csv file `anfr2016.csv` for Stata import (see the sketch below).
- For analysis, the data file `anfr2016.csv` is assumed to be located under `data/post2014`.
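For example, the conversion can be run from the shell as follows (a minimal sketch; it assumes R is installed with `Rscript` on the PATH):

```bash
# Convert the raw ANFR data to csv and place the output where the analysis expects it.
cd data/anfr
Rscript main.R                 # writes anfr2016.csv
mv anfr2016.csv ../post2014/   # assumed location for the Stata import
```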
- Census
- The 2012 population data came from the Institut National de la Statistique et des Études Économiques (INSEE). The population data were manually collected from the table "Populations légales 2012," subtitled "Recensement de la population - Population des régions," at https://www.insee.fr/fr/statistiques/2119585?sommaire=2119686. The hand-coded data file is located at `data/insee/population.csv` in the openICPSR repository (https://www.openicpsr.org/openicpsr/workspace?goToPath=/openicpsr/138921&goToLevel=project). The accompanying file `remapRegion.csv`, also manually coded, classifies the regions according to the new 2016 legislation.
- For analysis, `population.csv` is assumed to be located within `data/pre2014`, and `remapRegion.csv` inside `data/post2014`.
- OECD data on fighting brands
- The folder `data/oecd` contains the entry years of low-cost brands across OECD countries. The data were manually compiled from various publicly accessible sources. The source locations and links are documented in the accompanying file `low_cost_subsidiary_brands.docx` within the same folder `data/oecd` in the openICPSR repository, where the raw data and the Stata code for processing the file are also provided.
The replication process requires a single local machine running Linux/macOS with Stata 14.2, Julia 1.6.0, Matlab 2019, and a Unix-like shell equivalent to Bash. In Linux, 1 gigabyte of memory and 10 gigabytes of disk space are sufficient. Total computation takes about 2 days with 40 CPU cores on our system. Absent parallelization, multithreading is the default option for Julia, and the computation could take 2-3 months by our crude projection.
Extra flag options, shell scripts, and system configuration may be required for cluster systems. Microsoft Windows is not guaranteed to work with the Julia replication code, and it is the user's responsibility to ensure seamless execution.
For an optional Julia IDE, we advise using Atom. Visual Studio Code is not supported due to unresolved file IO and library path issues.
- Stata
  - Install the following Stata packages by running `0config.do` (see the batch-mode sketch below):
    - `carryforward`
    - `ivreg2`
    - `unique`
    - `estout`
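The configuration can also be run from the shell in Stata batch mode (a minimal sketch; it assumes `0config.do` sits in the `dataprep` folder as described below, and the Stata binary name depends on your edition):

```bash
# Install the required Stata packages in batch mode from the dataprep folder.
# The Stata binary may be named stata, stata-se, or stata-mp on your system.
cd dataprep
stata -b do 0config.do
# Equivalent manual installation from the Stata prompt:
#   ssc install carryforward
#   ssc install ivreg2
#   ssc install unique
#   ssc install estout
```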
- Julia
  - Download the Julia binary from https://julialang.org/. Using Julia built from source may produce different results.
  - Install the required packages by entering the following command in the Julia REPL:

    ```
    ] add JLD, LoopVectorization, Optim, NLsolve, StatsBase, Distributions, Plots, Revise, CSV, DataFrames
    ```
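Equivalently, the packages can be installed non-interactively from the shell (a sketch; it assumes `julia` is on the PATH):

```bash
# Non-interactive equivalent of the REPL command above.
julia -e 'using Pkg; Pkg.add(["JLD", "LoopVectorization", "Optim", "NLsolve",
                              "StatsBase", "Distributions", "Plots", "Revise",
                              "CSV", "DataFrames"])'
```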
- Matlab (optional)
- For generating the Latin hypercube pseudorandom numbers used in estimation and simulation, we use Matlab's `lhsnorm` function. The generated files are provided in the replication package.
We provide a step-by-step description of how the results were produced for the manuscript.
- Move the `data` folder to `~/work/kantar/brand/data` (some data not included). See the setup sketch after this list.
- Create the path `~/work/kantar/brand/work` for Stata outputs.
- Create the path `~/work/kantar/brand/workfiles` for Julia to import the csv files exported by Stata.
- Execute the Stata code in the `dataprep` folder. Follow the instructions below for details.
- Move the csv file outputs in `~/work/kantar/brand/work` to `~/work/kantar/brand/workfiles`.
- Go to `code/estim`.
- Run `julia -O3 -p 20 mainMulti.jl`. It can take about 2 days on the Intel Xeon E5-4627 v4 system. Check below for details.
- Run `julia -O3 main.jl 2>&1 | tee -a log.txt`. It takes about 20-30 minutes on a 2019 MacBook Pro with an Intel Core i9.
- Copy the output files:

  ```
  cp out/base/m0s2blp200/swtest_blp.csv post/testIV/m0/
  cp out/base/m0s2diff-quad200/swtest_diff-quad.csv post/testIV/m0/
  cp out/base/m0s2diff-local200/swtest_diff-local.csv post/testIV/m0/
  cp out/base/m15s2blp200/swtest_blp.csv post/testIV/m15/
  cp out/base/m15s2diff-quad200/swtest_diff-quad.csv post/testIV/m15/
  cp out/base/m15s2diff-local200/swtest_diff-local.csv post/testIV/m15/
  ```

- Go to `post/testIV/m0` and run `swtest.do`.
- Go to `post/testIV/m15` and run `swtest.do`.
- Go to `code`.
- Copy the csv files for the `sim` module:

  ```
  cp estim/out/base/m0s0opt200/*.csv sim/input/dat824/m0/
  cp estim/out/base/m15s0opt200/*.csv sim/input/dat824/m15/
  ```

- Go to `code/sim`.
- Run `julia -O3 -p 40 mainMulti.jl`. With 40 CPU cores, this step usually takes about 8-10 hours.
- Run `julia -O3 -p 40 mainMulti2.jl`.
- Run `julia -O3 main.jl 2>&1 | tee -a log.txt`.
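As a convenience, the directory setup in the first three steps can be scripted as follows (a minimal sketch; it assumes the replication package, including the `data` folder, has been unpacked in the current directory):

```bash
# Create the working-directory layout assumed by the Stata and Julia code.
mkdir -p ~/work/kantar/brand
mv data ~/work/kantar/brand/data        # analysis data (some files not included)
mkdir -p ~/work/kantar/brand/work       # Stata outputs
mkdir -p ~/work/kantar/brand/workfiles  # csv inputs for the Julia modules
```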
This completes the replication process.
The replication proceeds in multiple stages. In the data-cleaning stage, Stata exhibits randomness during Steps 1 and 2 described in the section Module `dataprep` below, where an intermediate file `dataProcessed802.dta` is generated. As a result, the sample size of the analysis data may vary each time the data are generated. For the later stages, the same file `dataProcessed802.dta` must be used to ensure successful replication.
For estimation, the Julia code selects the global optimum among multiple trials, often through a multi-step process, and the final estimation results enter as inputs to the counterfactual simulation. All of this creates a sequential dependence throughout the entire replication process. The replication code assumes that the replication follows a sequence predetermined by prior knowledge of the analysis results at each stage. Whenever this assumed chain is broken by any change in the data, random numbers, or numerical libraries, it is critical for the user to update it for correct replication. For details, see the discussion at the beginning of Module `estim` in the section Program structure.
Different Stata versions may also cause discrepancies in some tables. For example, Stata 14 and 17 were found to generate slightly different results for Table A.2.
The analysis results are generated in the following steps, where the program files are organized in the corresponding folders.
Within the `dataprep` folder, Stata scripts build the dataset for analysis from the source files. Before getting started, the external packages can be installed by executing the Stata do-script `0config.do`. Then the do-script files must be executed in the following order (a batch-mode shell sketch follows the list).
- merge: Merge all raw source files into Stata format
- clean: Clean up the dataset and define variables
- reshape: Reshape the data structure into estimation format
- estim: Estimate simple and IV logit demands
- export: Export datasets into csv format for estimation and simulation in Julia
- plots/tables: Produce descriptive statistics and plots reported in the paper.
- estim.extra: the version of estim implemented for the full sample
- export.extra: the full-sample version of export
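The module can be run in batch mode from the shell, for example (a sketch; it assumes the do-files are numbered so that lexicographic order matches the execution order above, and that the Stata binary is named `stata`):

```bash
# Run the dataprep do-files in order using Stata batch mode.
cd dataprep
stata -b do 0config.do     # one-time package installation
for f in [1-9]*.do; do     # numbered do-files, in execution order
  stata -b do "$f"
done
```

Each batch run writes a `.log` file in the current directory, which should be checked for errors.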
The generated dataset may vary each time Steps 1 and 2 are executed. For consistent and exact replication, it is necessary to use the same output `dataProcessed802.dta`, as described earlier. For the complete list of tables produced by this module, see the section Tables below.
The script file `5export.do` exports the following csv files for estimation in Step 2:

- `demand824.csv`: consumer demand and sample identifiers for the main model
- `demand824ms15.csv`: a variant of demand824.csv where the market size is increased by 50%
- `demand824NoAllow.csv`: a variant of demand824.csv where the sample is extended to 2011 Q1 without allowance variables
- `Xinput824.csv`: product characteristics
- `ZinputBlp824.csv`, `ZinputBlp824core.csv`: inputs for BLP instruments
- `DiffIVinput824reduced.csv`, `DiffIVinput824reduced2.csv`: inputs for differentiation IVs
- `income.csv`, `incomeEMdraws`: income statistics and draws
All the csv files are generated under `~/work/kantar/brand/work` by default. They must be copied into `~/work/kantar/brand/workfiles` for the next stage, as shown below.
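For example (a sketch):

```bash
# Copy the exported csv files into the folder read by the Julia code.
cp ~/work/kantar/brand/work/*.csv ~/work/kantar/brand/workfiles/
```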
The `code/estim` folder contains the Julia program files for the random coefficients logit demand estimation in a Unix-type shell environment. It also includes Stata scripts within the subfolder `post/testIV` to perform the weak IV tests.
In addition to the above csv data files, the estimation needs input files of random draws simulated by an external program. We used Matlab's `lhsnorm` to generate Latin hypercube samples from the normal distribution. The file names must be consistent with the dimension of the random coefficients distribution and the number of simulation draws. The Matlab code used for random number generation is in `sampleDraws.m` and `simDraw.m`, included within the directory `~/work/kantar/brand/workfiles/simdraws`, where the input random number files are also located.
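The draws ship with the replication package, but they can be regenerated from the shell if needed (a sketch; it assumes Matlab R2019a or later, where the `-batch` flag is available):

```bash
# Regenerate the Latin hypercube input draws with Matlab.
cd ~/work/kantar/brand/workfiles/simdraws
matlab -batch "sampleDraws"   # runs sampleDraws.m
matlab -batch "simDraw"       # runs simDraw.m
```

Note that regenerating the draws changes the random inputs, so the manually selected optimum described below may need to be updated.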
To select the global optimum of the GMM estimation, the Julia code assumes that the user provides the input `mcid` to the estimation procedure `gmmOptimMulti!` within `mainMulti.jl` for the two-stage estimators. The parameter `mcid` chooses which Monte Carlo estimate to use as the global optimum, and it is selected manually based on prior knowledge of the results. All sources of randomness are suppressed in Julia, so `mcid` does not need to be updated with each program run as long as the working data derive from the same Stata intermediate file `dataProcessed802.dta`.
The program should run in the following steps.
Estimate the RC logit demand using various specifications and IV approaches from 20 starting points. For acceleration, it is advised to use multiple processors by entering in the command shell:
julia -p 20 -O3 mainMulti.jl 2>&1 | tee -a err.txt
where 20 is the maximum number of CPUs for running each of the estimation runs from the 20 starting points. More CPUs are redundant, and large memory is not required. In case of a program error, all the output messages on the screen can be found by inspecting the error log file `err.txt`.
To work with clusters, extra preparation is typically required. A sample script for launching a job on a PBS cluster looks as follows:
#!/bin/bash
#PBS -S /bin/bash
#PBS -N julia
#PBS -l nodes=1:ppn=4
#PBS -l walltime=03:00:00:00
#PBS -l pmem=1gb
#PBS -o out.txt
#PBS -e err.txt
#PBS -m abe
date
# Change to directory from which job was submitted
cd $PBS_O_WORKDIR
echo $PBS_O_WORKDIR
echo "loaded modules"
#module load R/3.5.0-foss-2014a-bare
module list
echo "here is your PBS_NODEFILE"
cat $PBS_NODEFILE
echo "check library path"
echo $LD_LIBRARY_PATH
echo "calling julia now"
julia -O3 --machine-file $PBS_NODEFILE mainMulti.jl
echo "julia successfully finished"
On other clusters, the user may have to follow a similar procedure to load library paths and add worker nodes properly. For more information on troubleshooting cluster problems, consult the Julia documentation at https://docs.julialang.org/en/v1/stdlib/Distributed/. General information on Julia parallel computing is available at https://docs.julialang.org/en/v1/manual/distributed-computing/.
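For instance, a roughly equivalent submission script for a SLURM cluster might look like this (a sketch; resource syntax and module systems vary across sites):

```bash
#!/bin/bash
#SBATCH --job-name=julia
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=3-00:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --output=out.txt
#SBATCH --error=err.txt

cd "$SLURM_SUBMIT_DIR"
# One master process plus (ntasks - 1) workers on the allocated node.
julia -O3 -p $((SLURM_NTASKS - 1)) mainMulti.jl
```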
Estimating model 29 (Line 29 in `mainMulti.jl`) alone takes about 30 hours on a 40-core Xeon E5-4627 v4 system. The remaining lines may take about 6-8 hours in total.
Running `main.jl` generates the main estimation tables for the manuscript and the input files for the weak IV tests, using the estimation results obtained in Step 1. It also exports input files for the Monte Carlo simulation in the next step. All the output files are exported to the corresponding subfolders within the directory `out`. It can be executed by entering:
julia -O3 main.jl 2>&1 | tee -a log.txt
The program exports the estimation results into csv files to be imported into LibreOffice for a print-friendly format. The list includes:
- The remaining columns in Tables 4 & A.1
- Table A.15
- Table A.17
- Table A.2 (only in screen output)
When the execution is complete, the file `log.txt` stores all the screen outputs, among which Table A.2 can be found.
The Stata script `swtest.do` performs the Sanderson-Windmeijer test under the folder `post/testIV/mXX`, where XX denotes the model ID code. First, the input csv files `swtest_xxx.csv` (where xxx is a tag identifier) need to be placed in the same directory, which can be done by running the following commands in the terminal:
cp out/base/m0s2blp200/swtest_blp.csv post/testIV/m0/
cp out/base/m0s2diff-quad200/swtest_diff-quad.csv post/testIV/m0/
cp out/base/m0s2diff-local200/swtest_diff-local.csv post/testIV/m0/
cp out/base/m15s2blp200/swtest_blp.csv post/testIV/m15/
cp out/base/m15s2diff-quad200/swtest_diff-quad.csv post/testIV/m15/
cp out/base/m15s2diff-local200/swtest_diff-local.csv post/testIV/m15/
The Stata script prints the results as screen output for the following table:

- Table A.16

Since Table A.16 reports only a few statistics from the output, the table is constructed manually.
All the estimation results are stored within a subfolder corresponding to each estimation specification under the output directory `out/`.

- `gmmStage2.csv`: output for post-estimation analysis
- `gmmParam.csv`, `estimLatex.csv`: output for formatted tables
- `gmmStage2Interim.csv`: summary of the 20 estimation runs
- `paramDraws.csv`: bootstrap samples of GMM estimates for the Monte Carlo simulation
- `expDelta.csv`: BLP fixed point exp(delta)
- `swtest_xxx.csv`: outputs for the weak IV tests, where tag xxx corresponds to the IV approach (BLP or diff IVs)
The Julia code in the `code/sim` folder produces the Monte Carlo simulation results. It takes as input the csv files exported from the `estim` module, which must be copied to the subfolder `input/dat824/mxx`, where tag xx is the model identifier. In the console, this can be done by entering:
cp estim/out/base/m0s0opt200/*.csv sim/input/dat824/m0/
cp estim/out/base/m15s0opt200/*.csv sim/input/dat824/m15/
This performs 200 Monte Carlo simulations; each CPU executes a single Monte Carlo cycle out of the total 200. For example, we can run on 40 CPUs by entering:
julia -p 40 -O3 mainMulti.jl 2>&1 | tee -a err.txt
Each instance does not need much memory (about 1 GB would suffice). As before, all the screen outputs are stored in `err.txt` to capture possible error messages.
Running `mainMulti.jl` exports the simulation results as the JLD file `sim824.jld` in a directory whose path is structured as `output/dat824/m15/mc0/b1/`, where m15 is the model ID for the RC logit demand specification, mc0 stands for the default wholesale marginal cost of MVNOs, and b1 for the first batch of 200 Monte Carlo runs (v1 for the vertical integration model). For details, see the arguments of the `readData` function in `Helper.jl`.
The script `mainMulti2.jl` performs the same simulation as `mainMulti.jl`, but for a different model ("model=0"). It can be run by entering the command:
julia -p 40 -O3 mainMulti2.jl 2>&1 | tee -a err2.txt
The results are saved in the same JLD file `sim824.jld` in a directory named similarly to the above.
This post-simulation code generates all the remaining tables for the counterfactual exercises in the manuscript. Most outputs are printed on the console and also saved in the log file `log.txt`. The large tables for diversion ratios and elasticities are exported as csv files within the same subfolder as above (`mainMulti.jl`). It takes as input the file `sim824.jld` in its original path, with no need for copying.
julia -O3 main.jl 2>&1 | tee -a log.txt
After the execution is complete, the tables can be retrieved from the file `log.txt`.
The following table lists the locations of the source code generating the tables in the manuscript as either screen or file output. The exported csv files are not in print-friendly format and need to be imported into LibreOffice or an equivalent to generate the final tables. The line numbers indicate where the analysis results are processed for the final output, after the main part of the intensive computation is complete.
| Table | Program | Folder | Line number | Output |
|---|---|---|---|---|
| 1 | 6tables.do | dataprep | 20 | Screen |
| 2 | 6tables.do | dataprep | 46, 50 | Screen |
| 3 | 6tables.do | dataprep | 63, 65 | Screen |
| 4 | - | - | - | From Table A1 |
| | 4estim.do | dataprep | 71, 81 (IV logit-elasticities) | Screen |
| | post.jl | sim | 24 (RC logit I-elasticities) | Screen |
| | | | 24 (RC logit II-elasticities) | Screen |
| 5 | - | - | - | From Table A3 |
| 6 | post.jl | sim | 24 | Screen |
| 7 | post.jl | sim | 24 | Screen |
| 8 | post.jl | sim | 24 | Screen |
| 9 | post.jl | sim | 24 | Screen |
| A1 | 4estim.do | dataprep | 64 (Logit & IV logit) | tableLogit824.csv |
| | main.jl | estim | 32 (RC logit I) | out/base/m0s0opt200/estimLatex.csv |
| | main.jl | estim | 33 (RC logit II) | out/base/m15s0opt200/estimLatex.csv |
| A2 | 3reshape.do | dataprep | 422 (Observed income) | Screen |
| | main.jl | estim | 48 (Predicted income) | Screen |
| A3 | post.jl | sim | 26 | output/dat824/m15/mc0/b1/divRatio.csv |
| A4 | post.jl | sim | 26 | output/dat824/m15/mc0/b1/elasR.csv |
| A5 | post.jl | sim | 24 | Screen |
| A6 | post.jl | sim | 24 | Screen |
| A7 | post.jl | sim | 24 | Screen |
| A8 | post.jl | sim | 24 | Screen |
| A9 | post.jl | sim | 24 | Screen |
| A10 | post.jl | sim | 24 | Screen |
| A11 | post.jl | sim | 24 | Screen |
| A12 | post.jl | sim | 24 | Screen |
| A13 | post.jl | sim | 24 | Screen |
| A14 | post.jl | sim | 24 | Screen |
| A15 | main.jl | estim | 32 (RC logit I) | out/base/m0s0opt200/estimLatex.csv |
| | | | 33 (RC logit II) | out/base/m15s0opt200/estimLatex.csv |
| | | | 34 (Normal RC) | out/base/m27s0opt3000/estimLatex.csv |
| | | | 35 (M*1.5) | out/ms15/m15s0opt200/estimLatex.csv |
| | | | 36 (No Allowance) | out/noAllow/m15s0opt200/estimLatex.csv |
| | | | 37 (Full sample) | out/extra/m15s0opt200/estimLatex.csv |
| A16 | swtest.do | estim/post/testIV/m0 | 7 (RC logit I-BLP) | Screen (manually collected) |
| | | | 12 (RC logit I-Diff IV quad) | Screen (manually collected) |
| | | | 17 (RC logit I-Diff IV local) | Screen (manually collected) |
| | swtest.do | estim/post/testIV/m15 | 7 (RC logit II-BLP) | Screen (manually collected) |
| | | | 12 (RC logit II-Diff quad) | Screen (manually collected) |
| | | | 17 (RC logit II-Diff local) | Screen (manually collected) |
| A17 | main.jl | estim | 40 (RC logit I-BLP) | out/base/m0s2blp200/estimLatex.csv |
| | | | 41 (RC logit I-Diff quad) | out/base/m0s2diff-quad200/estimLatex.csv |
| | | | 42 (RC logit I-Diff local) | out/base/m0s2diff-local200/estimLatex.csv |
| | | | 43 (RC logit II-BLP) | out/base/m15s2blp200/estimLatex.csv |
| | | | 44 (RC logit II-Diff quad) | out/base/m15s2diff-quad200/estimLatex.csv |
| | | | 45 (RC logit II-Diff local) | out/base/m15s2diff-local200/estimLatex.csv |
| A18 | post.jl | sim | 24 | Screen |
The following table lists the locations of the code exporting the figures shown in the manuscript and appendix.
| Figure | Program | Folder | Line number | Output |
|---|---|---|---|---|
| 1 | 6plots.do | dataprep | 43 | price1.pdf |
| 2 | 6plots.do | dataprep | 29 | ms1.pdf |
| 3 | paper.tex | LaTeX file | 731-836 | In text |
| 4 | figure4.do | data/oecd | 6 (Figure 4a) | fig4a.pdf |
| | | | 13 (Figure 4b) | fig4b.pdf |
| A1 | 6plots.do | dataprep | 52 | price2.pdf |
For Figure 4, the Stata code `figure4.do` imports the input data `crosscountry.dta`, which was manually entered based on the table "Entries per year" in the enclosed Excel file `table low cost brands.xlsx`. All the raw data and their sources are in the same folder `data/oecd`.
For the BLP demand estimation, this paper uses the continuous-updating version of the optimal IV approach based on Reynaert and Verboven (Journal of Econometrics, 2014). The procedure is implemented by the `optimIV!` function within `Estim.jl` of the `estim` module. The rest of the main computations for the GMM estimation are also performed by `Estim.jl`.