By Suzette Norris
To many, 68,437 observations of traffic congestion over an eight-month period may not generate excitement. But to data science students from several universities across New York State, it’s a chance to make their mark in the growing field of urban analytics.
The Up-Stat 2015 Data Competition, sponsored by Xerox, brought together some of the brightest university students from across the state during the Fourth Annual Joint Conference of the Upstate Chapters of the American Statistical Association.
The data, provided by the Monroe County Department of Transportation, was obtained from a “loop detector” underneath the pavement of the southbound lane near a busy intersection in Rochester, NY. Loop detectors sense the presence and speed of vehicles. They can tell how many cars are traveling on a road, how many are lined up at a traffic light, and the average speed of traffic. Transportation agencies often use them to identify areas and episodes of traffic congestion.
“Students studying the data set discovered that traffic patterns fall neatly into categories for weekdays, Saturdays and Sundays/holidays … that, in aggregate, the population is quite regular in its driving habits,” said John Handley, principal scientist and expert in data mining and business intelligence at PARC, a Xerox company. “This insight enables traffic engineers to set traffic signals and plan for capacity with confidence.” Handley was one of the conference organizers and competition judges.
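For readers curious how such a grouping might look in practice, here is a minimal Python sketch. It is not the students' method, and nothing here comes from the competition data: the function names `day_type` and `hourly_profile`, the `(timestamp, vehicle_count)` input format, and the holiday set are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean

def day_type(ts, holidays=frozenset()):
    """Classify a timestamp as weekday, saturday, or sunday_or_holiday.

    `holidays` is a hypothetical set of datetime.date values supplied by the caller.
    """
    if ts.date() in holidays or ts.weekday() == 6:  # weekday() == 6 is Sunday
        return "sunday_or_holiday"
    if ts.weekday() == 5:  # weekday() == 5 is Saturday
        return "saturday"
    return "weekday"

def hourly_profile(observations, holidays=frozenset()):
    """Average vehicle count per (day type, hour of day).

    `observations` is an iterable of (datetime, count) pairs, an assumed format.
    """
    buckets = defaultdict(list)
    for ts, count in observations:
        buckets[(day_type(ts, holidays), ts.hour)].append(count)
    return {key: mean(values) for key, values in buckets.items()}

if __name__ == "__main__":
    # Made-up readings purely to show the call pattern.
    sample = [
        (datetime(2015, 3, 2, 8), 10),  # a Monday
        (datetime(2015, 3, 3, 8), 20),  # a Tuesday
        (datetime(2015, 3, 7, 8), 4),   # a Saturday
    ]
    for key, avg in sorted(hourly_profile(sample).items()):
        print(key, avg)
```

If the three day types really do behave differently, their hourly averages should form visibly distinct curves when plotted.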
As part of the competition, student teams were challenged to “clean” the data. Data from hardware sensors often has quality issues, including spuriously large values, missing data and values “stuck at zero,” Handley said. The data must be cleaned before it is used, which requires techniques such as imputation: using statistical methods to “guess” what the missing or corrupted values really are.
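As a rough illustration of the cleaning steps described above, the Python sketch below flags suspect readings and then fills the gaps. It is only one possible approach, not the technique any team used: the threshold `max_plausible`, the run length `stuck_run`, the function names and the choice of linear interpolation are all assumptions, and the sample readings are invented.

```python
def flag_bad_readings(values, max_plausible, stuck_run):
    """Replace implausible readings with None (treated as missing).

    A reading is dropped if it is negative or above `max_plausible`, or if it
    belongs to a run of at least `stuck_run` consecutive zeros. Both limits
    are assumed, caller-chosen values.
    """
    cleaned = [
        v if v is not None and 0 <= v <= max_plausible else None
        for v in values
    ]
    i, n = 0, len(cleaned)
    while i < n:
        if cleaned[i] == 0:
            j = i
            while j < n and cleaned[j] == 0:
                j += 1
            if j - i >= stuck_run:  # long run of zeros: sensor likely stuck
                for k in range(i, j):
                    cleaned[k] = None
            i = j
        else:
            i += 1
    return cleaned

def impute_linear(values):
    """Fill None entries by linear interpolation between known neighbors.

    Gaps at either end are filled with the nearest known value.
    """
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    if not known:
        return result  # nothing to interpolate from
    for i in range(known[0]):
        result[i] = result[known[0]]
    for i in range(known[-1] + 1, len(result)):
        result[i] = result[known[-1]]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            result[i] = result[left] + frac * (result[right] - result[left])
    return result

if __name__ == "__main__":
    # Invented vehicle counts: one missing value, one spike, one stuck run.
    counts = [12, 15, None, 21, 900, 18, 0, 0, 0, 0, 14, 16]
    flagged = flag_bad_readings(counts, max_plausible=200, stuck_run=3)
    print(impute_linear(flagged))
```

Linear interpolation is the simplest choice here. A method that borrows from the same hour on similar days would likely suit traffic data better, since a long gap can hide a whole rush hour.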
Student teams also had to detect interesting patterns, communicate the information clearly using graphics and verify their work using mathematical models.
“The results will help a traffic engineer understand what is going on with traffic in a city,” Handley said. “This is a way to inspire students in the region to look at these important problems and develop new statistical methods or approaches to data analysis.”
The conference was hosted by the State University of New York at Geneseo. The contest judges included representatives from Xerox, SAS, the Rochester Institute of Technology and the University of Rochester.