Today, the real problem with personal data is that the terms of the trade are often opaque and one-sided. As you browse for information, send messages and shop online, your habits and preferences are steadily being digitized.
What information is collected about you, who sees it, and what inferences are drawn about you from it depend largely on the many companies on the other side of the screen: Google, Amazon, Facebook, advertisers, ad exchanges, data brokers and others. "Today's Internet is just like a big black box," said Roxana Geambasu, an assistant professor of computer science at Columbia University, who led the research team. "So we need transparency."
Ms. Geambasu, together with Augustin Chaintreau, another assistant professor at Columbia, and a group of graduate students including Mathias Lecuyer, has developed a tool aimed at the data transparency problem: XRay. This week they will present their early research results at the Usenix Security Symposium in San Diego. They will also release XRay as open-source software.
XRay is essentially a reverse-engineering machine: it models the correlations that online services draw between what users do and what they are shown. The team initially set out to determine three things: which ads Gmail shows users based on the text of their email; which product recommendations Amazon shows users based on their wish lists and other data; and which videos YouTube recommends based on users' viewing history.
The researchers created a number of accounts, entered email messages, and searched for and browsed products. They then tracked the ads, product recommendations and videos each account received, and modeled the correlations between these inputs and outputs. By observing a service in this way, they could predict which inputs led to which results.
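The idea of correlating inputs with outputs can be illustrated with a small sketch. This is not XRay's published algorithm; it assumes a simplified setup in which each hypothetical account receives a different subset of inputs (here, email topics) and records which ads it sees. For a given ad, the sketch scores each input by how much more often the ad appears in accounts that contain the input than in accounts that do not, and attributes the ad to the top-scoring input. The account data and the name `attribute_ad` are made up for illustration.

```python
# Illustrative sketch only: a simple differential-correlation score, NOT XRay's
# published algorithm. All account data below is hypothetical.

# Each hypothetical account: (set of input topics, set of ads observed).
SHADOW_ACCOUNTS = [
    ({"pregnancy", "loan"}, {"baby-shower-ad", "car-loan-ad"}),
    ({"pregnancy"}, {"baby-shower-ad"}),
    ({"loan", "depression"}, {"car-loan-ad", "shaman-ad"}),
    ({"depression"}, {"shaman-ad"}),
    ({"pregnancy", "depression"}, {"baby-shower-ad", "shaman-ad"}),
    ({"loan"}, {"car-loan-ad"}),
]


def attribute_ad(accounts, ad):
    """Return (input, score) for the input whose presence best predicts `ad`.

    score = P(ad seen | input present) - P(ad seen | input absent).
    Returns (None, 0.0) if no input has a positive score.
    """
    all_inputs = set().union(*(inputs for inputs, _ in accounts))
    best_input, best_score = None, 0.0
    for item in sorted(all_inputs):  # sorted for deterministic tie-breaking
        with_item = [ads for inputs, ads in accounts if item in inputs]
        without_item = [ads for inputs, ads in accounts if item not in inputs]
        if not with_item or not without_item:
            continue  # a comparison needs accounts both with and without the input
        rate_with = sum(ad in ads for ads in with_item) / len(with_item)
        rate_without = sum(ad in ads for ads in without_item) / len(without_item)
        score = rate_with - rate_without
        if score > best_score:
            best_input, best_score = item, score
    return best_input, best_score


if __name__ == "__main__":
    for ad in ("baby-shower-ad", "shaman-ad", "car-loan-ad"):
        print(ad, "->", attribute_ad(SHADOW_ACCOUNTS, ad))
```

In this toy data each ad tracks exactly one topic, so each attribution scores 1.0. Real services are noisier and inputs overlap, so a practical tool would likely need many more accounts and a proper statistical model rather than a single difference of rates.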
XRay's findings showed that these services behave predictably, in ways both interesting and disturbing. Take the correlation between Gmail and advertising. If an email mentioned pregnancy, the user would see ads such as one for baby celebration invitations at "up to 60 percent off," or one for the official Ralph Lauren clothing store. These ads are clearly personalized and could therefore be useful.
If an email used words such as "depression," "depressed" or "sad," signaling a theme of sadness, the user would receive ads such as "Call now and accept shaman healing" and "Texting tips to win back the one you love." These associations are understandable, but also somewhat far-fetched.
If an email used words such as "loan" or "borrow," suggesting that the person might be in financial need, the user would receive ads such as "Car loans with no cosigner" and "Bad credit? You can still get an auto loan!" The legitimacy of such ads is questionable. A recent New York Times article examined auto loans made to borrowers with subprime credit records, including the marketing tactics used to reach them.
This raises not only the question of what inferences computers generate, but also the question of how that data is used and shared. Ms. Geambasu asked whether the association between depression and "shaman healing" ads is used more widely. For example, if you click on a shaman-healing ad somewhere else, will you be assumed to be suffering from depression?
"Leaked information can be used by its audience for all kinds of purposes," Ms. Geambasu said. "It can be used for discrimination. And it is a very covert form of discrimination."
A White House report on big data, released a few months ago, drew attention to exactly this problem. The report called for limits on how companies use personal information collected online.
Mr. Chaintreau pointed out that online data collection, targeted advertising and personalized services have obvious benefits in terms of efficiency and personalization. "But we want to make things more transparent for the individual," he said.
He added that more and more people understand the risks of big data and are working on tools to track the flow of personal data online. He and his team are part of that group.
At Princeton, for example, the computer scientist Arvind Narayanan has proposed a project designed to map how data is collected, analyzed and shared behind the scenes across the web. And ID3, a nonprofit organization closely connected with the MIT Media Lab, is developing open-source software that lets people better control their own data, including an audit trail showing how their personal data is used.
Although XRay is only a prototype, experts say its early results look impressive. "It makes sense," said Dan Boneh, a Stanford computer scientist, in assessing XRay.
With further development, the XRay team hopes to produce a more robust, general-purpose tool within a year or less. Team members say the most likely users are tech-savvy staff at privacy organizations, state attorneys general's offices and the Federal Trade Commission, as well as journalists. The XRay researchers have received a grant from the Brown Institute for Media Innovation, a joint venture of Columbia Journalism School and the Stanford University School of Engineering, to develop a version of the data-monitoring technology tailored to reporters.
Lee Tien, a senior staff attorney at the Electronic Frontier Foundation, said XRay looks promising as a way to rebalance the terms of trade between consumers and data-collecting companies. "Seeing what they see," he said, "is the first step toward striking a balance."
The Columbia researchers sound the same theme of balance in their paper: "Our work calls for, and supports, the adoption of best practices through voluntary transparency, while giving investigators and regulators an important new tool for raising alarms."
The XRay project has received funding from the Defense Advanced Research Projects Agency (DARPA), the National Science Foundation, Google and Microsoft.