NYC Subways and the Terrible, Horrible, No Good, Very Bad, Turnstile Data
The future ain’t what it used to be. - Yogi Berra
MTA turnstile data is shockingly bad, in two ways.
First, ridership is down a lot, with far-reaching implications.
Second, the data itself is of really poor quality. You can’t run a railroad with data this bad. Maybe they have something better internally, but if so, there’s no reason to publish embarrassingly bad data.
We have data for each turnstile at 4-hour intervals. You don’t always get data consistently every 4 hours though. Check out Astoria Boulevard.
```
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,00:00:00,REGULAR,0006787742,0012484763
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,04:00:00,REGULAR,0006787746,0012484793
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,08:00:00,REGULAR,0006787934,0012484831
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,12:00:00,REGULAR,0006788217,0012484886
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,00:00:00,REGULAR,0006789742,0012487200
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,04:00:00,REGULAR,0006789752,0012487262
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,08:00:00,REGULAR,0006789786,0012487287
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,12:00:00,REGULAR,0006789894,0012487341
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,16:00:00,REGULAR,0006789985,0012487447
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,20:00:00,REGULAR,0006790056,0012487606
```
The last two columns are this turnstile’s cumulative entry and exit ‘odometers’. You can see how the readings skip about two and a half days, from noon on April 27 to midnight on April 30, and the entry odometer jumps by about 1,500. The size of the jump suggests the station wasn’t closed for scheduled maintenance; they just missed collecting the data for those days.
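Because the counters are cumulative, per-interval traffic comes from differencing consecutive readings, and missed collections show up as intervals longer than four hours. Here is a minimal Python sketch using the first six Astoria Boulevard rows above. It assumes the column order C/A, UNIT, SCP, STATION, LINENAME, DIVISION, DATE, TIME, DESC, ENTRIES, EXITS, and the helper names (`readings`, `intervals`, `find_gaps`) are made up for illustration.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumed column order: C/A, UNIT, SCP, STATION, LINENAME, DIVISION,
# DATE, TIME, DESC, ENTRIES, EXITS
ROWS = """\
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,00:00:00,REGULAR,0006787742,0012484763
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,04:00:00,REGULAR,0006787746,0012484793
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,08:00:00,REGULAR,0006787934,0012484831
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,12:00:00,REGULAR,0006788217,0012484886
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,00:00:00,REGULAR,0006789742,0012487200
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,04:00:00,REGULAR,0006789752,0012487262
"""

EXPECTED_INTERVAL = timedelta(hours=4)


def readings(text):
    """Yield (timestamp, entries, exits) for each row of one turnstile."""
    for row in csv.reader(io.StringIO(text)):
        ts = datetime.strptime(f"{row[6]} {row[7]}", "%m/%d/%Y %H:%M:%S")
        yield ts, int(row[9]), int(row[10])


def intervals(text):
    """Difference consecutive odometer readings into per-interval counts."""
    prev = None
    for ts, entries, exits in readings(text):
        if prev is not None:
            prev_ts, prev_entries, prev_exits = prev
            yield prev_ts, ts, entries - prev_entries, exits - prev_exits
        prev = (ts, entries, exits)


def find_gaps(text, expected=EXPECTED_INTERVAL):
    """Return (start, end, entries) for every interval longer than expected."""
    return [(start, end, d_in)
            for start, end, d_in, _ in intervals(text)
            if end - start > expected]


for start, end, d_in in find_gaps(ROWS):
    print(f"{start:%m/%d %H:%M} -> {end:%m/%d %H:%M}: {end - start} between readings, {d_in} entries")
```

This flags the single 60-hour gap and its 1,525 entries, which works out to roughly 600 a day, in line with the surrounding days rather than a closed station.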
Is 4-hour granularity even adequate in the age of the ‘Internet of Things’? If you want to schedule trains during the morning rush, wouldn’t you want data down to minute intervals or so? 4-hour data isn’t helpful beyond scheduling the number of trains in the four-hour period.1
Sometimes turnstiles randomly start counting down instead of up. This also happens quite a bit.
```
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/23/2019,09:00:00,REGULAR,0000390322,0000202804
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/23/2019,13:00:00,REGULAR,0000390763,0000203216
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/23/2019,17:00:00,REGULAR,0000390763,0000203478
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/23/2019,21:00:00,REGULAR,0592416589,0886336073
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/24/2019,01:00:00,REGULAR,0592416496,0886336027
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/24/2019,05:00:00,REGULAR,0592416496,0886336027
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/24/2019,09:00:00,REGULAR,0592415729,0886335659
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/24/2019,13:00:00,REGULAR,0592415135,0886335411
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/24/2019,17:00:00,REGULAR,0592414623,0886335168
```
It goes on like that for years. The rows below are from September 2022, when the counter finally resets to zero and starts counting up again.
```
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/18/2022,13:00:00,REGULAR,0591889116,0886065249
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/18/2022,17:00:00,REGULAR,0591889096,0886065246
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/18/2022,21:00:00,REGULAR,0591889071,0886065241
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/19/2022,01:00:00,REGULAR,0591889055,0886065240
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/19/2022,05:00:00,REGULAR,0591889055,0886065240
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/20/2022,17:00:00,REGULAR,0000000026,0000000000
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/20/2022,21:00:00,REGULAR,0000000076,0000000006
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/21/2022,01:00:00,REGULAR,0000000080,0000000006
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/21/2022,05:00:00,REGULAR,0000000080,0000000006
```
One can conjecture that maintenance got done and the counter got reversed, and then eventually more maintenance got done and it got flipped back. Of course, you can just take the absolute value of the difference. But at rollovers like the jump to 592 million or the reset to zero, the difference is meaningless, and there are a lot of these where you just have to drop the row.
This is just scratching the surface. Inconsistently named/nonexistent stations, you name it. If your data gets assigned to data science classes the world over as the world’s messiest data set, you’ve got problems.
And it gets worse. In recent data, the entries look significantly undercounted relative to exits.
Before the pandemic, exits ran 20% or so behind entries. This makes sense: there’s a big crush when a rush-hour train arrives, and a lot of people bypass the turnstiles and leave through the ‘emergency’ exit gates, which don’t count exits.
During the pandemic, entries ran about even with exits. Interesting: maybe cracking down on the exit gates with signs, alarms, and locks narrowed the gap, and maybe fare evasion increased?
Since the pandemic, the ratio has flipped, and entries are running 30% behind exits.
If we rule out the spontaneous generation of New Yorkers in the subway, it’s hard to avoid the conclusion that entries are undercounted and the data is off. If use of the exit gates had been cut in half, it would take about 40% fare evasion to explain the gap, and judging from daily experience and the MTA’s own estimate (13.4%), that’s just not happening.2 Maybe it’s software rot: something changed in the environment that the software didn’t account for, or a bug got introduced.
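The 40% figure is back-of-the-envelope arithmetic. Here is a small Python sketch of it, under stated assumptions: every rider enters once and exits once, the pre-pandemic 20% gap means about 20% of riders left through uncounted gates, and “entries 30% behind exits” is read as counted entries = 0.7 × counted exits. The function name is hypothetical.

```python
def implied_fare_evasion(uncounted_exit_share: float, entries_per_exit: float) -> float:
    """Share of riders who must be entering without being counted.

    uncounted_exit_share: fraction of riders leaving via gates that don't count them
    entries_per_exit: counted entries divided by counted exits
    Assumes each rider enters once and exits once.
    """
    counted_exits = 1.0 - uncounted_exit_share          # as a fraction of true riders
    counted_entries = entries_per_exit * counted_exits  # also a fraction of true riders
    return 1.0 - counted_entries                        # riders nobody counted coming in


ENTRIES_PER_EXIT = 0.70  # entries running 30% behind exits

for label, uncounted in [("gate use unchanged", 0.20),
                         ("gate use cut in half", 0.10),
                         ("nobody uses the gates", 0.00)]:
    print(f"{label:>22}: {implied_fare_evasion(uncounted, ENTRIES_PER_EXIT):.0%} evasion")
```

Cutting gate use in half gives 37%, roughly the 40% above. Under this model, even if nobody used the exit gates at all, the gap would still imply 30% evasion, more than double the MTA’s 13.4%.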
No normal company could run its business with data this bad, and if you publish data this bad, it makes me skeptical of all your data and of the decisions based on it.
I have been around for a long time. I have infant memories of a subway like this. I grew up in this New York. The Warriors was a documentary. (Kidding, I’ve never seen it, and I’m more of a French Connection/Serpico/Taking of Pelham One Two Three guy.)
The MTA has gotten MUCH better. Almost graffiti-free, AC on the trains, frequent service, pretty safe and reliable.3 And open data that any fool on the Internet can use, and find fault with.
New York at its best is a monument to proletarian and oligarch alike.4 They mix in the subway. It’s the city’s blender and its beating circulatory system. When all the races and classes are packed in like sardines, people have no choice but to tolerate each other and get along. Car culture is the opposite: you build a little bubble to get to your gated community and personal domain, where you are total master. Nothing wrong with wanting a little suburban paradise. But to a dyed-in-the-wool New Yorker, there’s an element of self-delusion. It doesn’t scale without gridlock, and it leaves no room for dense social networks and rich interactions. There’s a reason the best and the brightest from the four corners of the world flock to NYC.
New York is always changing, and currently, it’s changing in ways that make the subway somewhat less relevant.
We can see in the data that ridership dropped massively and that central business district stations dropped the most. Less clearly, one can glean that silk-stocking district stations dropped a little less; maybe people still go out but commute less. Traffic at outer stations dropped even less, as essential workers still need to travel to work.5
NYC subway stations. Size = entries per day in 2022; color = % change from 2019.
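The chart’s two encodings boil down to a mean and a percentage change per station. A minimal Python sketch, assuming you have already summed the cleaned per-interval turnstile counts into daily entry totals per station; the function name and input shape are hypothetical, and the numbers in the example are made up, not real ridership.

```python
from statistics import mean


def station_change(daily_entries, base_year=2019, year=2022):
    """Map station -> (mean entries per day in `year`, % change vs `base_year`).

    daily_entries maps station -> {year: [total entries for each day]}.
    """
    result = {}
    for station, by_year in daily_entries.items():
        before = mean(by_year[base_year])
        after = mean(by_year[year])
        result[station] = (after, 100.0 * (after - before) / before)
    return result


# Toy numbers for illustration only, not real MTA ridership.
toy = {
    "STATION A": {2019: [40_000, 42_000], 2022: [18_000, 20_000]},
    "STATION B": {2019: [10_000, 10_000], 2022: [7_000, 8_000]},
}
for station, (per_day, pct) in station_change(toy).items():
    print(f"{station}: {per_day:,.0f}/day, {pct:+.0f}% vs 2019")
```

The first value would drive the dot size and the second its color.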