Download archived NYC MTA bus position data, and scrape GTFS-realtime data from the MTA.
Bus position data for July 2017 forward is archived at https://s3.amazonaws.com/nycbuspositions. Archive files follow the pattern `https://s3.amazonaws.com/nycbuspositions/YYYY/MM/YYYY-MM-DD-bus-positions.csv.xz`, e.g. https://s3.amazonaws.com/nycbuspositions/2017/07/2017-07-14-bus-positions.csv.xz.
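For a quick look at a single day without the Makefile, you can fetch and decompress an archive directly. A minimal sketch, assuming `curl` and `xz` are installed:

```sh
# Fetch one day's archive and decompress it in place.
curl -LO https://s3.amazonaws.com/nycbuspositions/2017/07/2017-07-14-bus-positions.csv.xz
xz -d 2017-07-14-bus-positions.csv.xz   # yields 2017-07-14-bus-positions.csv
```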
Requirements:
- Python 3.x
- PostgreSQL 9.5+
Specify your connection parameters using the standard Postgres environment variables:

```sh
PGDATABASE=dbname
PGUSER=myuser
PGHOST=myhost.com
```

You may skip this step if you're using a socket connection to your user's database.
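Since `psql` and anything else built on libpq read these variables automatically, a quick way to sanity-check the connection (assuming `psql` is on your path) is:

```sh
psql -c 'SELECT version()'
```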
This command will create a number of tables whose names begin with `rt_`, notably `rt_vehicle_positions`, `rt_alerts` and `rt_trip_updates`. It will also install the Python requirements, including the Google Protobuf library.

```sh
make install
```
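To confirm that the schema was created, you can list the new tables with a `psql` pattern match:

```sh
psql -c '\dt rt_*'
```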
Download a (UTC) day from data.mytransit.nyc and import it into the Postgres database `dbname`:

```sh
make -f download.mk download DATE=2016-12-31
```
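To backfill a range of days, you can wrap the same target in a small shell loop. A sketch, assuming GNU `date` for the date arithmetic:

```sh
# Download and import a week of archives, one (UTC) day at a time.
for i in $(seq 0 6); do
    make -f download.mk download DATE="$(date -d "2016-12-25 +${i} days" +%Y-%m-%d)"
done
```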
Scrapers have been tested with Python 3.4 and above. Earlier versions of Python (e.g. 2.7) won't work.
The scraper assumes that an environment variable, `BUSTIME_API_KEY`, contains an MTA BusTime API key. Get a key from the MTA.

```sh
export BUSTIME_API_KEY=xyz123
```
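To check the key before scraping, you can query the BusTime SIRI vehicle-monitoring feed directly. The endpoint below is an assumption based on the MTA BusTime documentation, not something this repository defines:

```sh
# Print the first few hundred bytes of the feed; an error message here usually means a bad key.
curl -s "http://bustime.mta.info/api/siri/vehicle-monitoring.json?key=${BUSTIME_API_KEY}&version=2" | head -c 400
```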
Download the current positions from the MTA API and save them to a local PostgreSQL database named `mtadb`:

```sh
make positions
```
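Each run appends rows to `rt_vehicle_positions`, so a quick count shows whether the scrape landed:

```sh
psql mtadb -c 'SELECT count(*) FROM rt_vehicle_positions'
```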
Download current trip updates:

```sh
make tripupdates
```

Download current alerts:

```sh
make alerts
```
The included crontab shows an example setup for downloading data from the MTA API. It assumes that this repository is saved in `~/mta-bus-archive`. Fill in the `PG_DATABASE` and `BUSTIME_API_KEY` variables before using it.
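For illustration, entries along those lines might look like the following. This is a sketch of the general shape, not the repository's actual file, and the schedule is a guess:

```
BUSTIME_API_KEY=xyz123
PG_DATABASE=mydbname
# Scrape positions and trip updates every minute, alerts every five minutes.
* * * * * cd ~/mta-bus-archive && make positions
* * * * * cd ~/mta-bus-archive && make tripupdates
*/5 * * * * cd ~/mta-bus-archive && make alerts
```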
Create a project in the Google API Console. Make sure to enable the "Google Cloud Storage API" for your application. Then set up a service account; this will download a file containing credentials, named something like `myprojectname-3e1f812da9ac.json`.
Then run the following (on the machine you'll be using to scrape and upload) and follow the instructions:

```sh
gsutil config -e
```
Next, create a bucket for the data using the Google Cloud Console.
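Alternatively, the bucket can be created from the command line with `gsutil`; the bucket name here follows the default described below, matching the database name:

```sh
gsutil mb gs://mydbname
```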
You've now authenticated yourself to the Google API and will be able to run a command like:

```sh
make -e gcloud DATE=2017-07-14 PG_DATABASE=mydbname
```

By default, the Google Cloud bucket will have the same name as the database. Use the variable `GOOGLE_BUCKET` to customize it.
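For example, to send the dump to a differently named bucket (the bucket name here is a placeholder):

```sh
make -e gcloud DATE=2017-07-14 PG_DATABASE=mydbname GOOGLE_BUCKET=my-archive-bucket
```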
Available under the Apache License.