Project Information

Forecasting Street Speed using Alternative Datasets

This is the final project for the course Realtime and Big Data Analytics.

The goal of the project is to predict traffic speeds on NYC streets from past data. The traffic data comes from Uber Movement and covers 2019 to March 2020. The alternative datasets are NYC historical weather data and New York City motor vehicle crash records.

Raw data sources are placed in HDFS, where we clean and pre-process them using MapReduce. We then use Spark to construct DataFrames over the data sources and apply additional steps to shape the data for forecasting. Finally, using Fbprophet in conjunction with PySpark, we construct forecasts over our testing period for a fixed number of streets. The forecasts can then be stored in HDFS, or the average RMSE can be calculated against the true street speed values.
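
As an illustration of the evaluation step, here is a minimal sketch of one way the average RMSE could be computed. It assumes that "average RMSE" means the mean of the per-street RMSE values, which the description above does not state explicitly. The function names, street keys, and speed values are hypothetical and are not taken from the project code. The sketch uses only the Python standard library; in the actual pipeline the same calculation would presumably run over Spark DataFrames.

```python
import math
from statistics import mean


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of speeds."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and the same length")
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def average_rmse(per_street):
    """Mean of the per-street RMSE values (assumed meaning of 'average RMSE').

    per_street maps a street identifier to a pair
    (true speeds, forecast speeds) over the testing period.
    """
    return mean(rmse(actual, predicted) for actual, predicted in per_street.values())


# Hypothetical example data: two streets with made-up speeds.
example = {
    "street_a": ([20.0, 22.0, 24.0], [21.0, 22.0, 23.0]),
    "street_b": ([30.0, 28.0], [27.0, 31.0]),
}

if __name__ == "__main__":
    print(average_rmse(example))
```

Averaging per-street RMSE values gives every street equal weight regardless of how many observations it has. Pooling all errors into a single RMSE would instead weight streets by their number of observations.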

More information about the project can be found on the GitHub repository.