This project analyzes transit accessibility and service distribution across Boston's MBTA subway system, identifying underserved areas and potential expansion opportunities. By combining MBTA transit data with US Census population data, we evaluate transit accessibility and develop recommendations for system improvements.
- Which areas of Boston are most underserved by the MBTA?
- What areas are the best candidates for system expansion?
- Chelsea, Everett, South Boston, and Dorchester were identified as significantly underserved areas
- Linear regression and K-means clustering helped identify patterns in transit accessibility across census blocks
- MBTA API: Station locations and service data
- US Census API: Population data for Suffolk, Middlesex, and Norfolk counties
- MassGIS: Geographic shapefiles for census block mapping
- Transit Demand Scoring
  - Developed a custom demand score formula: population_density² × (distance / 800)
  - Incorporated distance thresholds: 800 m minimum (well-served areas) and 10 km maximum (beyond rapid transit range)
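The scoring formula above can be sketched as a small function. This is an illustration, not the notebook's actual code: the function name, the unit of `population_density`, and in particular the choice to return 0 for blocks outside the 800 m to 10 km range are assumptions, since the text gives the thresholds but not how they are applied.

```python
MIN_DISTANCE_M = 800.0     # from the text: closer blocks count as well served
MAX_DISTANCE_M = 10_000.0  # from the text: farther blocks are beyond rapid transit range


def demand_score(population_density: float, distance_m: float) -> float:
    """Return population_density^2 * (distance / 800) for one census block.

    ASSUMPTION: blocks closer than 800 m or farther than 10 km score 0.
    The source lists these thresholds but does not say how they are applied.
    """
    if population_density < 0 or distance_m < 0:
        raise ValueError("population_density and distance_m must be non-negative")
    if distance_m < MIN_DISTANCE_M or distance_m > MAX_DISTANCE_M:
        return 0.0
    return (population_density ** 2) * (distance_m / MIN_DISTANCE_M)


# A block with density 10 that is 1,600 m from a station: 10^2 * (1600 / 800)
example_score = demand_score(10, 1600)
```

Squaring the density makes the score grow much faster with population than with distance, so dense blocks that are moderately far from a station outrank sparse blocks that are very far.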
- Machine Learning Models
  - Linear Regression: Predicting transit demand scores
  - K-means Clustering: Grouping census blocks by accessibility levels
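A minimal sketch of how the two models might be fitted with Scikit-learn is shown here. Everything specific in it is an assumption: the data is synthetic and generated only so the example runs, the two features (density and distance) and the choice of three clusters are illustrative, and the notebook may use different inputs and settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# SYNTHETIC placeholder data (not project results): one row per census block.
rng = np.random.default_rng(0)
n_blocks = 200
density = rng.uniform(100, 20_000, n_blocks)     # assumed unit: people per km^2
distance_m = rng.uniform(800, 10_000, n_blocks)  # within the 800 m to 10 km range
score = density ** 2 * (distance_m / 800)        # demand score formula from the text

# ASSUMPTION: density and distance are the model features.
X = np.column_stack([density, distance_m])

# Linear regression: predict the demand score from the block features.
regression = LinearRegression().fit(X, score)
r_squared = regression.score(X, score)  # R^2 on the training data

# K-means: group blocks into accessibility levels.
# Features are standardized first so distance (metres) does not swamp density.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)  # 3 clusters is an assumption
cluster_labels = kmeans.fit_predict(X_scaled)
```

Because the score is quadratic in density, a plain linear model can only approximate it; checking `r_squared` on held-out blocks would show how much of the pattern the linear fit captures.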
- Python
- Pandas & NumPy for data manipulation
- Scikit-learn for machine learning models
- Folium for mapping
- GeoPandas for geospatial analysis
- Matplotlib & Seaborn for visualization
- Clone the repository
- Install the required packages (there are several; see the libraries listed above)
- Set up API keys for MBTA and Census APIs
- Run the Jupyter notebook (mainNotebook) to replicate the analysis
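One common way to handle the API key step is to read the keys from environment variables rather than writing them into the notebook. The sketch below assumes that approach; the variable names `MBTA_API_KEY` and `CENSUS_API_KEY` and the helper functions are hypothetical and may not match what the notebook expects. The MBTA V3 API accepts a key in the `x-api-key` request header, and the Census Data API accepts one as the `key` query parameter.

```python
import os
from urllib.parse import urlencode


def load_api_keys(env=os.environ):
    """Read both keys from the environment (hypothetical variable names)."""
    return {
        "mbta": env.get("MBTA_API_KEY"),
        "census": env.get("CENSUS_API_KEY"),
    }


def mbta_headers(api_key):
    """Build request headers for the MBTA V3 API; no header if no key is set."""
    return {"x-api-key": api_key} if api_key else {}


def census_query(params, api_key):
    """Append the Census key to a query string as the `key` parameter."""
    query = dict(params)
    if api_key:
        query["key"] = api_key
    return urlencode(query)


keys = load_api_keys()
headers = mbta_headers(keys["mbta"])
```

Set the two variables in your shell before launching Jupyter so that the keys stay out of version control.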
- Sebastian Arteaga
- Darwin Alarcon
- Suraj Swamy
- Jason Lam
- Northeastern University DS3000: Foundations of Data Science
- Dr. Mohit Singhal
- MBTA for providing access to their API
- US Census Bureau for population data
- MassGIS for geospatial data