NYC Transit Equity Analysis
This project was developed for the inaugural MHC × MTA Datathon. It analyzes Fair Fares program ridership data to evaluate the potential impact of expanding eligibility from 120% to 200% of the Federal Poverty Level. The analysis processed over 10 GB of ridership data across 6 NYC neighborhoods using optimized SQL queries and Python data pipelines.
Project Goals
The primary objective was to understand transit usage patterns and evaluate how expanding Fair Fares eligibility would affect ridership across different NYC neighborhoods. The analysis is intended to inform policy decisions that affect transportation equity and accessibility for low-income residents.
Big Data Processing Architecture
Data Pipeline Components
Database Layer
- SQLAlchemy engine for connection management
- Optimized queries with DATE_TRUNC functions
- Parameterized queries for security
- Chunked processing (100,000 row batches)
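The chunked, parameterized reads listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it uses an in-memory SQLite database as a stand-in so the example runs anywhere, and the table name `ridership`, its columns, and the function `total_rides_by_neighborhood` are all hypothetical. With a SQLAlchemy engine, the same `pandas.read_sql` call would presumably take the engine in place of the SQLite connection.

```python
import sqlite3

import pandas as pd

# Stand-in database: an in-memory SQLite table with hypothetical columns
# and made-up rows, used only so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ridership (neighborhood TEXT, fare_class TEXT, rides INTEGER)"
)
conn.executemany(
    "INSERT INTO ridership VALUES (?, ?, ?)",
    [
        ("A", "fair_fares", 10),
        ("A", "full_fare", 90),
        ("B", "fair_fares", 30),
        ("B", "full_fare", 70),
        ("A", "fair_fares", 5),
    ],
)


def total_rides_by_neighborhood(connection, fare_class, chunksize=100_000):
    """Sum rides per neighborhood without loading the whole table at once."""
    # The fare class is bound as a parameter, never formatted into the SQL.
    query = "SELECT neighborhood, rides FROM ridership WHERE fare_class = ?"
    totals = {}
    # With chunksize set, read_sql yields DataFrames of at most that many rows.
    for chunk in pd.read_sql(
        query, connection, params=(fare_class,), chunksize=chunksize
    ):
        partial = chunk.groupby("neighborhood")["rides"].sum()
        for name, rides in partial.items():
            totals[name] = totals.get(name, 0) + int(rides)
    return totals


# A tiny chunk size forces several batches on this toy table.
fair_fares_totals = total_rides_by_neighborhood(conn, "fair_fares", chunksize=2)
print(fair_fares_totals)
```

Only one chunk is held in memory at a time, and the per-chunk partial sums are merged into a small dictionary, which is what keeps peak memory bounded regardless of table size.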
Processing Layer
- pandas DataFrame aggregation
- Memory-efficient data handling
- Geographic clustering algorithms
- Correlation analysis pipelines
SQL Optimization Strategy
Key Optimization Techniques
- Temporal grouping with DATE_TRUNC for efficient date aggregation
- Conditional aggregation using CASE WHEN for Fair Fares counting
- Geographic filtering with IN clauses for neighborhood selection
- Strategic indexing on timestamp and location columns
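To make the four techniques above concrete, here is a hedged sketch of how such a query might be assembled. The table name `ridership`, the columns `transit_timestamp`, `neighborhood`, `fare_class`, and `ridership`, the index name, and the helper `build_monthly_fair_fares_query` are assumptions for illustration; the original schema is not shown in this write-up. `DATE_TRUNC` is PostgreSQL syntax, so the sketch only builds the SQL text and bind parameters rather than executing them.

```python
# Hypothetical index covering the timestamp and location columns.
INDEX_DDL = (
    "CREATE INDEX IF NOT EXISTS idx_ridership_time_place "
    "ON ridership (transit_timestamp, neighborhood)"
)


def build_monthly_fair_fares_query(neighborhoods):
    """Return (sql, params) for monthly Fair Fares counts per neighborhood.

    Table and column names are assumed, not taken from the real dataset.
    """
    if not neighborhoods:
        raise ValueError("at least one neighborhood is required")
    # One named placeholder per neighborhood keeps the IN clause parameterized.
    names = [f"n{i}" for i in range(len(neighborhoods))]
    placeholders = ", ".join(f":{name}" for name in names)
    sql = f"""
        SELECT
            DATE_TRUNC('month', transit_timestamp) AS month,
            neighborhood,
            SUM(CASE WHEN fare_class = 'fair_fares'
                     THEN ridership ELSE 0 END) AS fair_fares_rides,
            SUM(ridership) AS total_rides
        FROM ridership
        WHERE neighborhood IN ({placeholders})
        GROUP BY 1, 2
        ORDER BY 1, 2
    """
    params = dict(zip(names, neighborhoods))
    return sql, params


# Placeholder neighborhood labels, not the six analyzed in the project.
sql, params = build_monthly_fair_fares_query(["Neighborhood A", "Neighborhood B"])
print(sql)
print(params)
```

Only the placeholder names are interpolated into the SQL string; the neighborhood values themselves travel separately as bind parameters, which is what keeps the IN clause safe to build dynamically.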
Performance Results
- Processed 10GB+ datasets in under 45 minutes
- Memory usage optimized through chunked processing
- 98% correlation identified in usage pattern analysis
Key Findings
Usage Correlation Analysis
Bus and subway usage patterns show a strong positive correlation across all analyzed neighborhoods.
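As an illustration of what a bus-versus-subway correlation measures, the sketch below computes the Pearson coefficient in plain Python so the arithmetic is visible. The write-up does not state which correlation measure was used, so Pearson is an assumption here, and the ridership numbers are made up. In pandas the equivalent would be `Series.corr`, whose default method is Pearson.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up daily counts; subway is an exact linear function of bus here,
# so the coefficient comes out at 1.
bus = [120, 150, 180, 210]
subway = [300, 360, 420, 480]
r = pearson(bus, subway)
print(round(r, 3))
```

A value near 1 means the two modes rise and fall together; it says nothing on its own about why they do.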
Peak Usage Analysis
Clear morning and evening rush-hour peaks were identified across all transportation modes.
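One simple way to surface rush-hour peaks is to total ridership by hour of day and take the busiest hours. The sketch below does that with the standard library; the function names `hourly_totals` and `peak_hours`, the record layout, and the sample values are hypothetical and not drawn from the MTA data.

```python
from collections import Counter
from datetime import datetime


def hourly_totals(records):
    """Sum rides by hour of day from (ISO timestamp, rides) pairs."""
    totals = Counter()
    for timestamp, rides in records:
        totals[datetime.fromisoformat(timestamp).hour] += rides
    return totals


def peak_hours(totals, top_n=2):
    """Return the top_n busiest hours, in chronological order."""
    return sorted(hour for hour, _ in totals.most_common(top_n))


# Made-up sample records for two weekdays.
records = [
    ("2024-01-08T08:00:00", 500),
    ("2024-01-08T12:00:00", 200),
    ("2024-01-08T17:00:00", 450),
    ("2024-01-09T08:00:00", 520),
    ("2024-01-09T17:00:00", 480),
    ("2024-01-09T22:00:00", 90),
]
peaks = peak_hours(hourly_totals(records))
print(peaks)
```

On real data the same grouping would more likely be done in SQL or pandas; the plain-Python version is only meant to show the logic.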
Geographic Fair Fares Adoption
Key Geographic Insights
- Bronx shows highest Fair Fares adoption rates, indicating significant need for affordable transit options
- Brooklyn demonstrates strong usage patterns with potential for expansion
- Manhattan shows lower adoption rates due to higher average income levels
- Staten Island presents unique challenges due to limited public transit infrastructure
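Adoption comparisons like those above depend on how the adoption rate is defined. The write-up does not give the exact definition, so the sketch below assumes one plausible reading: Fair Fares rides as a share of all rides in each area. The column names, the function `fair_fares_share`, and the numbers are illustrative only, and the area labels are deliberately generic so they are not mistaken for real figures.

```python
import pandas as pd

# Made-up ride counts for two placeholder areas.
rides = pd.DataFrame(
    {
        "borough": ["Area X", "Area X", "Area Y", "Area Y"],
        "fare_class": ["fair_fares", "full_fare", "fair_fares", "full_fare"],
        "rides": [30, 70, 10, 90],
    }
)


def fair_fares_share(df):
    """Fair Fares rides divided by total rides, per borough, highest first."""
    total = df.groupby("borough")["rides"].sum()
    fair = df[df["fare_class"] == "fair_fares"].groupby("borough")["rides"].sum()
    # Areas with no Fair Fares rides get a share of 0 instead of NaN.
    share = fair.reindex(total.index, fill_value=0) / total
    return share.sort_values(ascending=False)


share = fair_fares_share(rides)
print(share)
```

Normalizing by total rides keeps large, busy areas from looking like high adopters simply because they have more riders overall.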
Policy Recommendations
Expansion Strategy
1. Gradual Eligibility Expansion: Increase from 120% to 200% FPL in phases
2. Geographic Prioritization: Focus initial expansion on high-adoption areas
3. Peak Hour Optimization: Enhanced service during identified peak periods
Implementation Considerations
1. Budget Impact Assessment: Projected 40% increase in program participants
2. Infrastructure Readiness: Capacity planning for increased ridership
3. Continuous Monitoring: Real-time tracking of program effectiveness
Technical Achievements
Big Data Processing
- Successfully processed 10GB+ of ridership data
- Implemented efficient memory management strategies
- Optimized SQL queries for large-scale analytics
- Developed scalable data pipeline architecture
Statistical Analysis
- Identified 98% correlation in transportation usage
- Performed geographic clustering analysis
- Conducted peak usage pattern identification
- Generated actionable policy recommendations