NYC Transit Equity Analysis

Analysis of Fair Fares ridership data across 6 NYC neighborhoods to evaluate expanding eligibility from 120% to 200% of the Federal Poverty Level (FPL). The project processed over 10GB of ridership data with optimized SQL queries and found a 98% correlation between bus and subway usage patterns.

Data Volume: 10GB+
Usage Correlation: 98%
Neighborhood Analysis: 6 NYC Areas
Competition Result: Datathon Participant

Technology Stack

Python, SQL, Tableau, pandas, Data Pipeline, Geospatial Analysis, Public Policy, Big Data Processing

This project was developed for the inaugural MHC × MTA Datathon. It analyzes Fair Fares program ridership data to evaluate the potential impact of expanding eligibility from 120% to 200% of the Federal Poverty Level. The analysis processed over 10GB of ridership data across 6 NYC neighborhoods using optimized SQL queries and Python data pipelines.

Project Goals

The primary objective was to understand transit usage patterns and evaluate how expanding Fair Fares eligibility would affect ridership across different NYC neighborhoods. The analysis is intended to inform policy decisions that affect transportation equity and accessibility for low-income residents.

Big Data Processing Architecture

Data Pipeline Components

Database Layer

• SQLAlchemy engine for connection management
• Optimized queries with DATE_TRUNC functions
• Parameterized queries for security
• Chunked processing (100,000 row batches)

Processing Layer

• pandas DataFrame aggregation
• Memory-efficient data handling
• Geographic clustering algorithms
• Correlation analysis pipelines
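
The two layers above can be sketched roughly as follows: a parameterized query is streamed in fixed-size chunks and each chunk is aggregated with pandas before being merged into a running total, so the full result set never has to sit in memory. This is a minimal sketch under stated assumptions, not the project's actual code. The `rides` table, its `neighborhood` and `fare_type` columns, and the function name `count_rides_chunked` are hypothetical, and a standard-library `sqlite3` connection stands in for the SQLAlchemy engine so the example is self-contained.

```python
import sqlite3

import pandas as pd


def count_rides_chunked(conn, neighborhoods, chunksize=100_000):
    """Count rides per neighborhood while reading the table in chunks.

    The table and column names are illustrative assumptions; the source
    does not describe the actual schema.
    """
    # One "?" placeholder per neighborhood keeps the query parameterized.
    placeholders = ", ".join("?" for _ in neighborhoods)
    sql = (
        "SELECT neighborhood, fare_type FROM rides "
        f"WHERE neighborhood IN ({placeholders})"
    )

    totals = None
    chunks = pd.read_sql_query(
        sql, conn, params=list(neighborhoods), chunksize=chunksize
    )
    for chunk in chunks:
        # Aggregate the chunk, then fold it into the running totals.
        part = chunk.groupby("neighborhood").size()
        totals = part if totals is None else totals.add(part, fill_value=0)

    if totals is None:
        return pd.Series(dtype="int64")
    return totals.astype("int64").sort_index()
```

The default of 100,000 rows mirrors the batch size listed above. With a SQLAlchemy engine the same `pd.read_sql_query(..., chunksize=...)` pattern applies, although the placeholder style depends on the database driver.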

SQL Optimization Strategy

Key Optimization Techniques

• Temporal grouping with DATE_TRUNC for efficient date aggregation
• Conditional aggregation using CASE WHEN for Fair Fares counting
• Geographic filtering with IN clauses for neighborhood selection
• Strategic indexing on timestamp and location columns
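
To illustrate how the first three techniques might combine in one statement, the sketch below builds a monthly aggregation query that uses `DATE_TRUNC`, a `CASE WHEN` conditional count, and a parameterized `IN` clause. The schema (`rides`, `ride_ts`, `neighborhood`, `fare_type`), the `'fair_fares'` value, and the function name are hypothetical, since the source does not show the real queries. The SQL assumes a PostgreSQL-style dialect, and the `:name` placeholders follow the named-parameter style used by SQLAlchemy's `text()`.

```python
def build_monthly_fair_fares_query(neighborhoods):
    """Return (sql, params) for a monthly Fair Fares aggregation.

    All table, column, and value names are illustrative assumptions.
    """
    if not neighborhoods:
        raise ValueError("at least one neighborhood is required")

    # Named placeholders (:n0, :n1, ...) so values are never inlined.
    names = [f"n{i}" for i in range(len(neighborhoods))]
    in_list = ", ".join(f":{name}" for name in names)

    sql = f"""
        SELECT
            DATE_TRUNC('month', ride_ts) AS ride_month,
            neighborhood,
            COUNT(*) AS total_rides,
            SUM(CASE WHEN fare_type = 'fair_fares' THEN 1 ELSE 0 END)
                AS fair_fares_rides
        FROM rides
        WHERE neighborhood IN ({in_list})
        GROUP BY 1, 2
        ORDER BY 1, 2
    """
    params = dict(zip(names, neighborhoods))
    return sql, params
```

Counting Fair Fares rides with `SUM(CASE WHEN ...)` lets total and program ridership come out of a single table scan instead of two separate queries.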

Performance Results

• Processed 10GB+ datasets in under 45 minutes
• Memory usage optimized through chunked processing
• 98% correlation between bus and subway usage patterns

Key Findings

Usage Correlation Analysis

Bus-Subway Correlation: 98%

Strong positive correlation between bus and subway usage patterns across all analyzed neighborhoods

Manhattan 99.2%

Brooklyn 97.8%

Queens 98.1%
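
The source does not say which correlation coefficient was used. Assuming it was the Pearson coefficient, a figure like those above could be computed from two aligned ridership series (for example, bus and subway counts for the same time periods) along these lines. The function name and the input layout are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A result of 0.98 would correspond to the 98% reported above. With pandas, `Series.corr` computes the same quantity by default.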

Peak Usage Analysis

Peak Hours: 8 AM / 6 PM

Clear morning and evening rush hour patterns identified across all transportation modes

Morning Rush (7-9 AM) 34%

Evening Rush (5-7 PM) 31%

Off-Peak Hours 35%
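
A breakdown like the one above can be produced by bucketing each ride's hour of day into a period and dividing by the total. The sketch below assumes that "7-9 AM" means hours 7 and 8 (07:00 to 08:59) and that "5-7 PM" means hours 17 and 18. The source does not state the exact boundaries, and the function name is hypothetical.

```python
from collections import Counter


def usage_share_by_period(ride_hours):
    """Return the share of rides in each period, given hours of day (0-23)."""

    def period(hour):
        # Boundary choices are assumptions; see the note above.
        if 7 <= hour < 9:
            return "morning_rush"
        if 17 <= hour < 19:
            return "evening_rush"
        return "off_peak"

    counts = Counter(period(h) for h in ride_hours)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no rides supplied")

    periods = ("morning_rush", "evening_rush", "off_peak")
    return {name: counts.get(name, 0) / total for name in periods}
```

Because every ride falls into exactly one bucket, the three shares always sum to 1, which matches the 34% + 31% + 35% split reported above.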

Geographic Fair Fares Adoption

Bronx: Highest Adoption, 23.4%

Brooklyn: Strong Usage, 18.7%

Queens: Growing Adoption, 15.2%

Key Geographic Insights

• Bronx shows highest Fair Fares adoption rates, indicating significant need for affordable transit options
• Brooklyn demonstrates strong usage patterns with potential for expansion
• Manhattan shows lower adoption rates, likely reflecting higher average income levels
• Staten Island presents unique challenges due to limited public transit infrastructure

Policy Recommendations

Expansion Strategy

1. Gradual Eligibility Expansion: Increase from 120% to 200% FPL in phases
2. Geographic Prioritization: Focus initial expansion on high-adoption areas
3. Peak Hour Optimization: Enhanced service during identified peak periods

Implementation Considerations

1. Budget Impact Assessment: Projected 40% increase in program participants
2. Infrastructure Readiness: Capacity planning for increased ridership
3. Continuous Monitoring: Real-time tracking of program effectiveness

Technical Achievements

Big Data Processing

• Successfully processed 10GB+ of ridership data
• Implemented efficient memory management strategies
• Optimized SQL queries for large-scale analytics
• Developed scalable data pipeline architecture

Statistical Analysis

• Identified a 98% correlation between bus and subway usage
• Performed geographic clustering analysis
• Conducted peak usage pattern identification
• Generated actionable policy recommendations

Performance Metrics

Data Volume: 10GB+
Massive datasets processed with optimized SQL queries and Python pipelines

Usage Correlation: 98%
98% correlation between bus and subway usage patterns identified

Neighborhood Analysis: 6 NYC Areas
Comprehensive analysis across diverse NYC neighborhoods

Competition Result: Datathon Participant
Participated in inaugural MHC × MTA Datathon