NYC Subways and the Terrible, Horrible, No Good, Very Bad, Turnstile Data
The future ain’t what it used to be. - Yogi Berra
MTA turnstile data is shockingly bad, in two ways.
First, ridership is down a lot, with far-reaching implications.
And also, the data is really poor quality. You can’t run a railroad with data this bad. Maybe they have something better internally. But if so, no reason to publish embarrassingly bad data.
We have data for each turnstile at 4-hour intervals. You don’t always get data consistently every 4 hours though. Check out Astoria Boulevard.
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,00:00:00,REGULAR,0006787742,0012484763
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,04:00:00,REGULAR,0006787746,0012484793
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,08:00:00,REGULAR,0006787934,0012484831
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/27/2022,12:00:00,REGULAR,0006788217,0012484886
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,00:00:00,REGULAR,0006789742,0012487200
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,04:00:00,REGULAR,0006789752,0012487262
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,08:00:00,REGULAR,0006789786,0012487287
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,12:00:00,REGULAR,0006789894,0012487341
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,16:00:00,REGULAR,0006789985,0012487447
R514,R094,00-00-00,ASTORIA BLVD,NQW,BMT,04/30/2022,20:00:00,REGULAR,0006790056,0012487606
The last 2 columns are this turnstile’s entry and exit ‘odometer’. You can see how it skips 3 days and the entry odometer count jumps by 1500. The jump suggests it’s not a scenario where the station was closed for scheduled maintenance, they just missed collecting the data for 3 days.
Is 4-hour granularity even adequate in the age of the ‘Internet of Things’? If you want to schedule trains during the morning rush, wouldn’t you want data down to minute intervals or so? 4-hour data isn’t helpful beyond scheduling the number of trains in the four-hour period.1
Sometimes turnstiles randomly start counting down instead of up. This also happens quite a bit.
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/23/2019,09:00:00,REGULAR,0000390322,0000202804
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/23/2019,13:00:00,REGULAR,0000390763,0000203216
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/23/2019,17:00:00,REGULAR,0000390763,0000203478
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/23/2019,21:00:00,REGULAR,0592416589,0886336073
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/24/2019,01:00:00,REGULAR,0592416496,0886336027
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/24/2019,05:00:00,REGULAR,0592416496,0886336027
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/24/2019,09:00:00,REGULAR,0592415729,0886335659
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/24/2019,13:00:00,REGULAR,0592415135,0886335411
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,07/24/2019,17:00:00,REGULAR,0592414623,0886335168
It goes on like that for a couple of months, then starts counting up again.
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/18/2022,13:00:00,REGULAR,0591889116,0886065249
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/18/2022,17:00:00,REGULAR,0591889096,0886065246
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/18/2022,21:00:00,REGULAR,0591889071,0886065241
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/19/2022,01:00:00,REGULAR,0591889055,0886065240
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/19/2022,05:00:00,REGULAR,0591889055,0886065240
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/20/2022,17:00:00,REGULAR,0000000026,0000000000
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/20/2022,21:00:00,REGULAR,0000000076,0000000006
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/21/2022,01:00:00,REGULAR,0000000080,0000000006
R236,R045,00-03-01,GRD CNTRL-42 ST,4567S,IRT,09/21/2022,05:00:00,REGULAR,0000000080,0000000006
One can conjecture that maintenance got done and the counter got reversed, and then eventually more maintenance got done and it got flipped back. Of course, you can just take the absolute value of the difference. But there are a lot of these rollovers where you just have to drop the row.
This is just scratching the surface. Inconsistently named/nonexistent stations, you name it. If your data gets assigned to data science classes the world over as the world’s messiest data set, you’ve got problems.
And it gets worse. In recent data, the entries look significantly undercounted relative to exits.
Before the pandemic, exits ran 20% or so behind entries. This makes sense, there’s a big crush when a rush hour train arrives and a lot of people bypass the turnstiles and exit via the ‘emergency’ exit gates which don’t count each exit.
During the pandemic, entries ran about even with exits. Interesting, maybe cracking down on gates with signs and alarms and locks narrowed the gap and maybe fare evasion increased?
Since the pandemic, the ratio has flipped, and entries are running 30% behind exits.
If we rule out the spontaneous generation of New Yorkers in the subway, it’s hard to avoid the conclusion that the entries are undercounted, and the data is off. If use of the exit gates had been cut in half, it would take about 40% fare evasion to explain the gap, and from daily experience and the MTA’s own numbers (13.4%) that’s just not happening.2 Maybe software rot, something changed in the environment that software didn’t account for, or a bug got introduced.
No normal company could run their business with data this bad, and if you are publishing data this bad, it makes me skeptical of all your data and decision-making based on it.
I have been around for a long time. I have infant memories of a subway like this. I grew up in this New York. The Warriors was a documentary. (Kidding, never seen it and I’m more of a French Connection/Serpico/Taking of Pelham 1-2-3 guy).
The MTA has gotten MUCH better. Almost graffiti-free, AC on the trains, frequent service, pretty safe and reliable.3 And open data that any fool on the Internet can use, and find fault with.
New York at its best is a monument to proletarian and oligarch alike.4 They mix in the subway. It’s the city’s blender and beating circulatory system. When all the races and classes are packed like sardines, people have no choice but to tolerate and get along. Car culture is the opposite, you build a little bubble to get to your gated community and personal domain where you are total master. Nothing wrong with wanting a little suburban paradise. But to a dyed-in-the-wool New Yorker, there’s an element of self-delusion. It doesn’t scale before gridlock sets in, and there isn’t the opportunity for dense social networks and rich interactions. There’s a reason the best and the brightest from the four corners of the world flock to NYC.
New York is always changing, and currently, it’s changing in ways that make the subway somewhat less relevant.
We can see in the data that ridership dropped massively and the central business district stations dropped the most. Less clearly, one can glean that silk stocking district stations dropped a little less, maybe people still go out but travel to work less. Traffic at outer stations dropped even less as essential workers still need to travel to work.5
NYC Subway Stations, Size=Entries per day 2022, color=%change from 2019
Reasons the subway may be permanently less relevant:
- Remote work has soared, obviously.
- Mass transit is a pandemic vector. People just don’t want to take the subway as much, even when they decide to leave the confines of their apartments and go somewhere. People who can, walk, drive, hoverboard etc.
- Delivery of everything has soared v. going to Macy’s or (the late great) Century 21 or Queens Center.
- Micromobility has soared, e-bikes are everywhere, along with regular bikes, Citibikes, hoverboards, scooters, longboards, moonwalkers, you name it. I don’t worry about being stabbed anymore but I worry about getting run over by an Uber Eats or DoorDash driver on his phone going the wrong way.
- On the horizon are self-driving vehicles + congestion pricing. I could see giving over 2 lanes on 2nd Ave to electric self-driving vans, where you punch in where you want to go on your phone and a vehicle picks you up and takes you there. (Via or dollar vans + self-driving + dedicated lanes).6 Travel times could plummet. You’ll need traffic enforcers and fare-evasion enforcers and lots and lots of cameras though.
I’m not in the ‘MTA = Money Thrown Away’ faction. But I don’t think projects that require investing $100,000 per daily user or more, like LIRR to Grand Central and the 2nd Ave subway, the Interborough Express (more like $50k but still), are sustainable at current interest rates. The level of graft is mind-blowing. It seems out of touch with reality to expect real estate to pay for those projects with taxes on new $7b office towers, when so much of the existing office space is empty. (Update: and there it is, now on hold) But that’s where things are going, new revenue streams from property taxes and congestion pricing so essential workers can get to midtown without $10 fares.
I don’t think New York as a whole is going to enter a doom loop. Soaring rents and still-strong residential demand suggest otherwise.
But the subway might be falling victim to a doom loop. Fewer riders make it less safe and service less frequent, driving more riders away, perpetuating the cycle. The system has huge debt and interest rates have gone up a lot. So then the system has to raise fares, triggering another downcycle.
There’s a doughnut effect from remote work. People can work anywhere, and a lot of (especially young) people choose New York. And a lot of other people might choose low cost-of-living areas. Close-in daily commuter zone communities, where people lived because they wanted the house but HAD to come into the city every day, get hurt. Also if you need a database admin in Mobile, and everyone there can work remotely for companies in Austin and NYC, low cost places to do business might not be as low-cost but will get an economic boost.
There’s an Instagram effect. The NYSE floor is dead and ready to be converted to an event space. CNBC sends a camera around the floor and traders jerk awake for a few seconds. A few blocks away people crowd around the bull to take pictures. In both cases, the real economic activity takes place online. Software is eating the world. But people still crave a sense of place and real human interaction.
That’s the future of New York, for better or worse. A lot of people will still want to live here. Fewer will commute to gleaming office towers by subway. A lot of the class B office space will get converted to residential. A lot of people are going to get around on e-bikes and scooters.
And the sooner the MTA gets into the Internet era the less painful it will be. Both with realistic economic expectations, and modern data engineering. Any 2-bit online business has something like Mixpanel. Any normal business would go DEFCON 5 to get proper real-time data on their customers for capacity planning and maximizing ad revenue. Data this bad has a strong whiff of institutional rot. Seriously, get your shit together.
No one goes there nowadays, it’s too crowded. - Yogi Berra
-
We don’t even get consistent reporting at regular 4-hour granularity. At least some turnstiles stay on a fixed 4h schedule through the daylight savings time switch. So the timing shifts by 1h vs. human work schedule timing. I haven’t fully run this down. Since most but not all reports are at regular 4-hour intervals starting at midnight, I round the oddball reports to those intervals. Then when I chart the 4 AM hour I see clear daylight savings time ‘seasonality’ artifacts. Upsampling and downsampling would reduce this issue, but not eliminate it. ↩
-
Now there has probably been a significant increase in fare evasion. Times are tough and a $5.50 round trip is a lot for a kid in a low wage job. You do want to enforce it consistently. A lot of stuff in NYC is not enforced consistently. You can go to cops with clear evidence of a crime, like video of a guy violently harassing women or killing dogs, or Airtag evidence that your stolen bike is at a known chop shop, and you will be lucky to get them to do anything. If you go in front of any police station you are going to see the biggest collection of illegally parked personal vehicles in the neighborhood. It’s a tough job and a lot of them are professional. But the 50% of bad cops make the other 50% look bad. There’s a police culture problem. Moar police at high salaries and high overtime retiring after 20 years isn’t going to help. We need more specialized non-cop enforcers like parking enforcement agents, for things like fare evasion. Sending 2-cop teams after fare evaders is uneconomical, demoralizing for cops, and leads to bad interactions. Same for crazy/antisocial people in the subway. ↩
-
Except when you are waiting late at night at Penn and the express arrives on the local track with zero warning and you have to make a dash, miss it and wait another 10 minutes. ↩
-
Also, to civilization. To humanity. I have no truck with yokels who think the Real Americans™ are the semi-literate propane salesmen or meth-heads in Appalachian hollers. And don’t get me started on the white supremacist Y’all Quaeda, Vanilla Isis, American Taliban, ‘Who would Jesus bomb to oblivion’ nutjobs. Which is as true a picture of the heartland as a description of New York as a blood-drenched wasteland from The Purge. Yeah, we see some of the worst of humanity sometimes. But really, the worst are the ones who waste their lives dividing people instead of building something decent. They get all outraged when you call them ‘deplorable’ while they say you are possessed by demonic spirits and they act like you’re their enemy. Don’t act all outraged about urban violence when you’re the one trafficking all the guns to violent and mentally ill people and explicitly disclaim interest in anything but thoughts and prayers to reduce daily mass shootings. ↩
-
Assuming, of course, this is not just a data artifact. Hopefully, the missing entries are pretty evenly spread. If e.g. OMNY messed everything up and OMNY use is much more predominant in Manhattan, could be wrong. ↩
-
I don’t think self-driving will ever work in NYC without changing the rules of the road and giving self-driving cars dedicated lanes. Self-driving is not just a technological problem, but also a human behavior / game theory problem. Suppose the collision avoidance is perfect. Now put it on the roads of NYC or New Delhi. There are a lot of people who will just walk in front of a car going 40mph, if they know for sure it will brake hard and stop. The solution is to change the rules of the road, have protected lanes for self-driving buses and taxis and cars, and enforcement. Let vehicles that can take full advantage of communicating with each other and the road go fast and use the infrastructure to maximum theoretical capacity, without having to worry about dumb human drivers. ↩