The U.S. Department of Transportation claims that vehicle-to-infrastructure (V2I) communications will preserve the anonymity of drivers. Specifically, the department claims:
The system—operated by private entities—will not permit tracking through space or time of vehicles linked to specific owners or drivers or persons. Third parties attempting to use the system to track a vehicle would find it extremely difficult to do so, particularly in light of far simpler and cheaper means available for that purpose.
In this post, I assess that claim using the Basic Safety Message data from ongoing V2I pilots. The bottom line is that this data can be used to recreate individual trips, contrary to the claims of the DOT.
95% of the vehicles sold today are “connected vehicles” equipped with hundreds of sensors and over a hundred computers to process and package the data that they collect. The output is 25 gigabytes of data per vehicle per hour sent to the manufacturer. This includes not only information like maintenance status, but also location, speed, heading, braking instances, and road conditions.
Smart infrastructure seeks to use a portion of that data, in real time, to promote traffic safety. This data is limited to location information and basic “telematic” data like speed, heading, and brake and transmission status. The car uses short-range radio communications to send that data to a piece of infrastructure, such as a traffic light, bridge, overpass, highway exit, or crosswalk. Once the infrastructure receives that data, it can determine if there are any hazards to that vehicle. If so, the infrastructure “talks back” to the vehicle to relay important safety information.
Proponents of V2I communications repeatedly emphasize that these communications are anonymous. Consider the following claims:
According to the National Highway Traffic Safety Administration, “[t]here is no data in the [basic message] exchanged by vehicles or collected by [connected-vehicle systems] that could . . . personally identify a . . . driver.” This is due to the format of the data exchange. According to the Administration, that format ensures that “tracking a specific car or driver based on [basic messages] would be both difficult and costly.” The result is a “very limited potential risk to individual privacy.”
According to the Department of Transportation’s Intelligent Transportation Systems office, the basic message does not “contain data that is reasonably, or as a practical matter, linkable to you.” “Third parties attempting to use the [connected-vehicle] system to track a vehicle would find it difficult to do so, particularly in light of simpler and cheaper means available for that purpose.”
According to a legal commentator, the basic message “is not identified with regard to any particular vehicle or person, [so] the task of re-identification would be particularly difficult, time-consuming, and costly. Securing a judicial warrant to install a GPS device on a suspect's vehicle . . . would almost certainly be less expensive and less burdensome.”
But the idea that V2I communications “will not permit tracking through space or time of vehicles” or “cannot be used to recreate accident scenes” seems dubious.
Criminal investigators know that there is no such thing as truly anonymous data, so they are watching eagerly to see how V2I technologies develop. Indeed, the data transferred through V2I communications suggests that deanonymization is trivial and the privacy risks are real. The basic message that each vehicle transmits includes telemetry and location data. It pairs that information with a temporary identification number for the vehicle. According to the industry standard for V2I applications, each connected vehicle broadcasts this data unencrypted roughly ten times per second. In theory, this information can be tied to a specific vehicle to recreate its trip through connected infrastructure.
In practice, it is almost trivial to trace a single vehicle's trip through smart infrastructure using the data that it sends.
The data that I'm using below comes from a V2I pilot in Tampa Bay.
First, we look at the data.
Notice that all datapoints have a field called coreData_id. This is a temporary identification number that the V2I standard dictates should change periodically to prevent tracking. We can count the messages sent under each identifier and sort the counts to find the vehicle with the most data.
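Here is a minimal sketch of that counting step, using only the Python standard library. The records are made up for illustration; only the coreData_id field name comes from the pilot data, and the lat and lon keys are placeholder names I chose.

```python
from collections import Counter

# Hypothetical sample records. Only the coreData_id field name comes from
# the pilot data; the ids and coordinates are invented for illustration.
messages = [
    {"coreData_id": "A1B2", "lat": 27.950, "lon": -82.450},
    {"coreData_id": "A1B2", "lat": 27.951, "lon": -82.451},
    {"coreData_id": "C3D4", "lat": 27.960, "lon": -82.440},
    {"coreData_id": "A1B2", "lat": 27.952, "lon": -82.452},
    {"coreData_id": "C3D4", "lat": 27.961, "lon": -82.441},
]


def busiest_id(records):
    """Return (temporary id, message count) for the id seen most often."""
    counts = Counter(r["coreData_id"] for r in records)
    return counts.most_common(1)[0]


top_id, top_count = busiest_id(messages)
print(top_id, top_count)
```

The identifier with the most messages is simply the most convenient target: the more points a vehicle leaves behind, the more of its route can be drawn.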
We'll plot this one to recreate its trip.
Here is the plotted route in full.
As you can see, you can recreate a good deal of information about a single vehicle's trip using this data. This data alone covered a period of 30 minutes.
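The reconstruction itself needs very little logic. The sketch below filters the messages for one temporary ID, orders them by timestamp, and measures how long the trip lasted. The records, the timestamps, and the time, lat, and lon key names are all assumptions made for illustration rather than the pilot's actual schema.

```python
from datetime import datetime

# Hypothetical records, deliberately out of order. The timestamps, ids,
# coordinates, and key names (other than coreData_id) are invented.
messages = [
    {"coreData_id": "A1B2", "time": "2021-03-01T06:10:00", "lat": 27.952, "lon": -82.452},
    {"coreData_id": "A1B2", "time": "2021-03-01T06:00:00", "lat": 27.950, "lon": -82.450},
    {"coreData_id": "C3D4", "time": "2021-03-01T06:05:00", "lat": 27.960, "lon": -82.440},
    {"coreData_id": "A1B2", "time": "2021-03-01T06:30:00", "lat": 27.955, "lon": -82.455},
]


def reconstruct_trip(records, vehicle_id):
    """Return the time-ordered (time, lat, lon) points for one temporary id."""
    own = [r for r in records if r["coreData_id"] == vehicle_id]
    own.sort(key=lambda r: datetime.fromisoformat(r["time"]))
    return [(r["time"], r["lat"], r["lon"]) for r in own]


def trip_minutes(trip):
    """Return the minutes elapsed between the first and last point."""
    start = datetime.fromisoformat(trip[0][0])
    end = datetime.fromisoformat(trip[-1][0])
    return (end - start).total_seconds() / 60


trip = reconstruct_trip(messages, "A1B2")
print(trip)
print(trip_minutes(trip))
```

Feeding the ordered points to any mapping or plotting tool yields the route as a line on a map.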
Even more information is revealed when you plot by vehicle size. You may have noticed above that each message also includes information about the size of the vehicle. This is measured in centimeters. It's easy to imagine why this is important information: An 18-wheeler is going to have a harder time coming to a stop than a Kia Rio, and a traffic light may want to know which type of vehicle it is dealing with if it is going to issue a warning that the vehicle should brake.
But a vehicle measured down to the centimeter gives unique data that aids deanonymization. Here, we can isolate all the message data coming from vehicles of the same size, match them together, then plot them.
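A rough sketch of the matching step: group messages by their reported width and length, ignoring the temporary ID, then order each group by time. The records and the key names t, width_cm, and length_cm are hypothetical stand-ins, not the pilot's real field names.

```python
from collections import defaultdict

# Hypothetical records. The temporary id changes from A1 to B2 partway
# through, but the reported dimensions stay the same.
messages = [
    {"coreData_id": "A1", "t": 0, "width_cm": 254, "length_cm": 1250, "lat": 27.950, "lon": -82.450},
    {"coreData_id": "X9", "t": 1, "width_cm": 180, "length_cm": 450, "lat": 27.900, "lon": -82.400},
    {"coreData_id": "B2", "t": 2, "width_cm": 254, "length_cm": 1250, "lat": 27.951, "lon": -82.451},
    {"coreData_id": "B2", "t": 3, "width_cm": 254, "length_cm": 1250, "lat": 27.952, "lon": -82.452},
]


def tracks_by_size(records):
    """Group records by (width, length) and sort each group by time."""
    tracks = defaultdict(list)
    for r in records:
        tracks[(r["width_cm"], r["length_cm"])].append(r)
    for track in tracks.values():
        track.sort(key=lambda r: r["t"])
    return dict(tracks)


tracks = tracks_by_size(messages)
large_track = tracks[(254, 1250)]
ids_used = sorted({r["coreData_id"] for r in large_track})
print(len(large_track), ids_used)
```

Grouping on size alone would merge two different vehicles that happen to share dimensions, which is why some cleaning is still needed, for example dropping points that are implausibly far from the previous one.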
Again, a single vehicle's trip is readily apparent after a little cleaning. This time, even more data is revealed.
Basic inferences from the data enable deanonymization.
Each of the basic messages plotted above states that the vehicle has a width of 254 centimeters and a length of 1,250 centimeters. This confirms that we are dealing with the same vehicle.
Those dimensions are much larger than those of a passenger vehicle. This suggests that the vehicle is a commercial truck or bus.
The vehicle pulled to the right side of the road to stop at a midblock location for 50 seconds. That could be a delivery truck, but it could also be a bus.
The vehicle stopped at that location at 6:28 a.m.
The city transit authority website shows that two bus routes pick up from that location within two minutes of 6:28 a.m. We can compare those routes against the remaining locations sent by the basic messages.
We can conclude that this vehicle was the Hillsborough Area Regional Transit number 9 bus.
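The stop that anchors these inferences does not have to be spotted by eye. As a sketch, assuming each message carries a timestamp and a speed, a stop is just a run of consecutive zero-speed messages that lasts long enough. The track below is invented, and the 30-second threshold is an arbitrary choice of mine.

```python
def find_stops(track, min_seconds=30):
    """Return (start_time, duration) for each zero-speed run of at least min_seconds.

    `track` is a time-sorted list of (seconds, speed) pairs.
    """
    stops = []
    run_start = None
    last_t = None
    for t, speed in track:
        if speed == 0:
            if run_start is None:
                run_start = t
            last_t = t
        else:
            if run_start is not None and last_t - run_start >= min_seconds:
                stops.append((run_start, last_t - run_start))
            run_start = None
    # Close out a stop that is still in progress when the data ends.
    if run_start is not None and last_t - run_start >= min_seconds:
        stops.append((run_start, last_t - run_start))
    return stops


# Hypothetical track: (seconds since start, speed), sorted by time.
track = [
    (0, 8.0), (10, 3.0), (20, 0.0), (30, 0.0), (50, 0.0),
    (70, 0.0), (80, 4.0), (90, 0.0), (95, 0.0), (100, 6.0),
]
print(find_stops(track))
```

In this made-up track, the brief pause near the end is ignored because it is shorter than the threshold. Each stop that survives can then be checked against public timetables by time and location.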
This level of deanonymization is far from “difficult and costly.”