What you’ll learn in this article:
- Sidewalk Labs’ Replica software is now its own business subsidiary, Replica Inc.
- The contract between Replica and Portland Metro restricts access to Replica source code or algorithms, which could prevent the government from conducting an impact assessment.
- The contract details regarding unique travel pattern data concerned an Electronic Frontier Foundation staff attorney.
- The contract states that Portland Metro will not object if Replica contests public records disclosure of system documents.
- After the relationship is terminated, there is no indication that Replica will delete the models it built using the city’s data, or the insights or analysis derived from it.
UPDATE: A follow-up comment from Sidewalk Labs was added to this story on 7.30.19.
In light of the ongoing controversy over the Sidewalk Labs neighborhood takeover in Toronto, people have lots of questions about what information the company has now and what it will have access to as a result of its relationships with municipal clients. These data privacy and policy questions remain mostly unanswered, but today we have a glimpse into how the firm — a subsidiary of Google-parent Alphabet — and one of its municipal clients handle these issues.
We have a Sidewalk Labs contract.
In particular, it’s an agreement between the company and Portland, Oregon’s Metro. Portland plans to use Sidewalk’s city mobility tracking software, Replica, and Metro is the government agency handling the contractual and technical aspects of the relationship. RedTail’s Kate Kaye obtained the contract via public records request. (Read the full contract here.)
If the Sidewalk Labs initiative in Toronto is a gargantuan urban innovation cruise ship afloat in a sea of data, the Replica project in Portland is a mere dinghy by comparison. The official Sidewalk proposal for Toronto has been published as a 1,500-plus page technicolor tome. Meanwhile, Replica in Portland has received little attention.
But the nearly half-a-million-dollar Portland deal gives Metro and two Portland city agencies (Portland’s Bureau of Transportation and transit agency TriMet) access to a substantial dataset that mirrors how people actually move throughout the city and its surroundings, one the city says it can’t get elsewhere. If you haven’t read about Replica in Portland in Kate’s Geekwire feature or companion piece, read those first for a richer understanding of this contract analysis.
Despite earlier expectations of getting Replica data sometime this month, Portland government agencies have yet to access any Replica data or analytics. The algorithmic models used to make Replica’s synthetic population move throughout the software’s virtual Portland-esque environment are still being calibrated, so it looks like testing will not start till late August or early September.
No Longer Just Software, Replica Is Now a Company
An interesting note about the contract: It turns out the agreement is not between Sidewalk Labs and Metro, but rather between Replica and Metro. Indeed, no longer is Replica just the name of the company’s software. Nick Bowden, Sidewalk’s head of model and now Replica CEO, established Replica, Inc. in March.
Though both Sidewalk Labs and Metro provided some background for this story, Sidewalk Labs would not share a reason why Replica now operates as its own subsidiary business unit of Sidewalk. (So, we’ll have to wait to learn why, according to the California state document establishing the firm, Replica’s business provides “quality fish reproductions.”)
Let’s get on to the contract analysis. Here are excerpts from the contract that highlight what stood out in relation to algorithmic transparency, data privacy, access and ownership, and public records requests.
Obscuring Algorithms that Influence Government Policy
Replica uses algorithmic models to enable its synthetic population system. Those algorithms are at the core of Replica’s process, which will help inform how city agency decision-makers understand the travel patterns of people living here.
Portland will use Replica to better gauge how people use different types of transportation in a variety of scenarios. Agencies will use it to get information, reports and analysis that help them make transit decisions. For instance, a Metro or Bureau of Transportation staffer might use it to get insights into how ride-sharing affects traffic congestion. Or staffers at Portland’s public transit agency TriMet might look to Replica to tell them how the expected closures of three light rail stops affect travel options for minorities or low-income people commuting from areas to the east, far from the downtown surroundings that Portlanders call “close-in.”
But the contract prevents Portland agencies using Replica, or people evaluating the system’s performance and impact, from poking around its algorithms.
It specifically states that “Metro will not attempt and will not, directly or indirectly, allow Metro’s Users to…disassemble, reverse engineer, attempt to derive the source code or underlying ideas, algorithms, structure, organization or data” associated with the Replica system.
Why does this matter? This is the sort of language that could prevent governments, and those affected by their decisions, from assessing the impact of algorithmic and automated systems that influence policy. Similar language has already slithered its way into trade deals (see RedTail’s story about algorithmic opacity in the new NAFTA deal), and is likely popping up in all sorts of AI technology contracts with governments.
AI ethics advocates want algorithmic transparency, meaning they want the steps and processes that enable AI to be made visible enough to inspect and understand, particularly when they lead to decisions that have questionable or negative consequences. (Read RedTail’s feature about obstacles to algorithmic transparency.)
AI ethics groups and Oregon’s own Senator Ron Wyden have called for governments and corporations to implement algorithmic impact and accountability assessments. But those require some access to the algorithms and the data used to train them.
It does not appear that Portland has conducted an algorithmic impact assessment of Replica, though the city will test it to determine whether its data can be used to reidentify individuals, as well as to evaluate how it compares to other data sources public agencies here already use.
The language in this contract is a sign of things to come. Will it be possible to assess the impact of AI and automated decision systems if municipal governments agree not to attempt to derive source code or algorithms from vendors, especially when those vendors are the world’s most lawyered-up tech giants?
Personal Information and the Risk of Reidentification
Replica employs mobile location data along with demographic information and data from city partners, such as public transit data. Among the most pressing concerns surrounding Replica, and Sidewalk’s other projects more broadly, is whether the data it uses and provides could be reidentified to expose the people associated with the information.
The Metro contract does not go into detail about what types of data Replica uses to build its models, but some of those details are addressed in these documents, as well as Kate’s Geekwire story. In addition, the contract does not provide any information about the methods or processes Replica employs to deidentify otherwise personally identifiable information.
But it’s clear from reading the contract that Sidewalk Labs and its new Replica subsidiary want to make sure they don’t supply government partners with personally identifiable data and that those partners do not tamper with anonymized data to try to personalize it.
The contract specifies that Replica reports and analysis provided to Metro will be deidentified, and requires that they not be used in conjunction with personally identifiable information. As noted in the Geekwire story, when Metro can finally access Replica data, it will test whether the data can be used to identify people, and if it can, the agency says it will terminate the agreement.
The contract addresses this testing. It prohibits Metro from using Replica data in an attempt to identify or personalize data subjects, “except for the purposes of ensuring that Services and Content adequately safeguard residents’ privacy during validation testing.”
So, what is personally identifiable information according to Replica? The contract puts things like name and contact information, biometric information and license plate numbers in this category. Yet under the contract, as at pretty much every other corporation gathering or using it, mobile location data is not considered personally identifiable.
More Mobility, More Privacy Problems
There are lots of privacy concerns swirling around certain forms of location data, such as the kind reflecting so-called micro-mobility options like ride-sharing or escooter trips. The contract directly addresses unique travel patterns, as seen in the highlight below.
However, Jamie Williams, a staff attorney at the Electronic Frontier Foundation, questioned the contract’s details, and whether location data used by Replica could include individual trip patterns, such as data reflecting escooter trips, which could reveal unique travel patterns and be employed to reidentify people. “We’re OK with them getting aggregate and deidentified, down-to-the-street-level data, but we don’t want individual trip patterns,” she said.
It’s that term “output” that has Williams wondering.
Without getting too in-the-weeds, the contract refers to Replica “output,” “content” and “services” as separately defined components of the product. We might think of the output as the stats, charts, graphs, analysis and insights that Replica spits out when a city agency staffer queries the system. However, it’s not clear why Replica addresses removal of unique travel patterns or demographic data points from the output, but not from the content or services.
If Replica Contests a Public Records Request for Confidential Data…
The contract allows Metro to display aggregate information from the Replica system on websites and public documents. Think of a city report or press release featuring a data visualization showing the types of transport people use in a certain city area during morning rush hour compared to the afternoon. In Portland, city agency staff are developing best practices and policies that would determine whether Replica data would ever be available in any form as open data.
The contract also permits Metro to disclose confidential information such as business plans, technical information and designs, under Oregon Public Records Law:
This contract itself was made available to RedTail via a public records request. As governments incorporate more and more data-harvesting tech, they are trying to balance open data transparency goals with threats of unwanted data exposure via public records and Freedom of Information Act requests. Governments want to provide valuable data that could inform citizens and be used to improve life in their cities. What they don’t necessarily want is to be on the hook if tech vendor competitors, courts, law enforcement, or (gulp!) reporters seek otherwise hidden information.
Most of what’s in this public records section of the contract is standard, said Hannah Bloch-Webha, an assistant professor of law at Drexel University who focuses on First Amendment internet law and technology. But then there’s that line about Metro not objecting “if Replica wishes to contest disclosure of the documents.” That “is problematic,” said Bloch-Webha.
“I think a lot of governments don’t feel like they can make informed decisions about the data or the information,” she told RedTail. “They don’t really understand the landscape that they’re contracting in very well, and so there is this willingness to sort of yield this responsibility to the vendor. So I’m not saying it’s unusual for them to do it. I just think it’s really bad.”
Eliot Rose, technology strategist at Metro, said the language does not mean Metro has waived its right to disagree with Replica about what should be disclosed. “If Metro believes that material requested should be disclosed under Oregon public records law we can elect to disclose that material,” he said. “What this language is saying is that, if Replica disagreed with our decision to disclose material, we wouldn’t object if they wanted to make an argument against disclosure directly to the district attorney, but there’s nothing in the agreement that prevents us from coming to our own determination about what should be disclosed.”
No Mention of Law Enforcement Access
What about police or court access to Replica information? This is a big concern for privacy and civil liberties advocates who worry that location data can be requested by law enforcement agencies even without probable cause of illegal activity.
World Privacy Forum Executive Director Pam Dixon is quoted in the Geekwire story, stating, “People who are moving within cities need to be able to trust that their data is truly de-identified and will never come back at them — for example, through a judicial process or law enforcement process — and would not be used against them in a discriminatory or unfair manner.”
As reported in that story, a Portland Bureau of Transportation spokesperson said the contract with Sidewalk Labs prevents law enforcement or any entities other than Metro, Portland Bureau of Transportation or TriMet from accessing Replica software or data. But it turns out the contract does not specifically mention the police department, law enforcement or the courts. Here’s how it addresses who gets to use the software:
Data Ownership and Replica Value Extraction
Knowledge is power, and it increasingly comes in digital databases. When governments partner with tech firms, citizens want to know who has control over data that not only might inform policy but also has personal and monetary value.
The agreement with Metro grants Replica and its contractors a limited license to use Metro’s data and program code to ensure the Replica system works. It also stipulates that Replica has no right, title or interest to any of Metro’s data, which is defined as “electronic data and information submitted by or for Metro to the Services, other than by REPLICA and excluding Content and Non-REPLICA Applications.”
In fact, it requires Replica to “delete or destroy” Metro’s data after the partnership ends:
But, and this is a big “but,” that’s just the data that Metro provided to inform Replica and its synthetic population models. There’s no indication in the contract that Replica will delete the models it built using that data, or the insights or analysis derived from it.
In effect, this company’s business model relies not only on payment by city agencies, but in part on the proprietary information those municipalities hand over, which is used to evolve and improve the company’s algorithms. Ultimately, that’s where the value lies for Replica and Sidewalk Labs — and that is not addressed in the contract.
7.30.19 UPDATE: After this story was published, a Sidewalk Labs spokesperson disputed the notion that the company’s business model is partially reliant on government data. In an email, he wrote, “This data is used only to validate the model, it is not the basis of the model that Replica provides, nor does it ‘train’ the model. The business model is not based the value of this city data.”
There has been no response yet to a subsequent question from RedTail asking how the company defines its business model.
Again, please read the full contract for a more comprehensive view, and if anything not mentioned here stands out to you, please add a comment below, or send an email to Kate at RedTailMedia dot org.