This is a really great lecture! Julie Cohen manages to touch upon almost everything I am interested in, in about half an hour.
Archive for the 'personalization' Category
Google has just launched a new option for its search results, in the United States only. US-based users can (through their Google account) edit a personal profile to be shown in Google’s search results. Google presents the move as giving users more control over what people find when looking for them:
To give you greater control over what people find when they search for your name, we’ve begun to show Google profile results at the bottom of U.S. name-query search pages.
The move makes sense in a number of ways. It gives users a (limited) remedy against a bad ego-search results page. It nudges users to enrich their profiles, because a profile is only shown if it contains enough information. And users also help Google do better people search.
Maybe it’s not surprising that the feature remains limited to users in the United States. People search is controversial under European privacy laws (p. 13-14). Google usually defends itself against the application of data privacy rules to personal data in its search results with the argument that it is a passive intermediary (p. 4 – Google’s own link is broken). In particular, it argued that:
[...] the Google search engine is not responsible for the creation of content on the web, nor are its search results intended to form a profile of any individual. Rather, Google responds to user search queries with links to what appear to be relevant pages.
Of course, that isn’t entirely true. Google knows when you are looking for a natural person, and tries to return relevant results for such queries too. Whether the processing is automated is irrelevant. In a recent law article (in Dutch) I discuss these issues in more depth. In particular, I point to the outdated media exception in Article 9 of the Privacy Directive:
Member States shall provide for exemptions or derogations from the provisions of this Chapter, Chapter IV and Chapter VI for the processing of personal data carried out solely for journalistic purposes or the purpose of artistic or literary expression only if they are necessary to reconcile the right to privacy with the rules governing freedom of expression.
In my opinion, this exception should be extended to cover some of the activities of internet intermediaries and search engines. The data privacy directive, in its current form, is ill-suited to govern the public processing of publicly accessible personal data. Its principles make a lot more sense for dedicated people search activities.
Daniel Brandt makes some interesting points about the data processing that is going on, and in particular the possibility of integration of DoubleClick and AdSense data collection. Seth Finkelstein makes a great point about the cleverness of Google’s PR about its ‘surveillance as a service’:
If Google can convince people its surveillance is merely a warm and fuzzy way of helping you shop, while ISPs’ surveillance is akin to warrantless wiretapping, that gives Google an enormous advantage in collecting information to sell to advertisers
Last week, Google announced it will start to offer what it calls interest-based advertising through its network of AdSense partners and on YouTube. With the move, Google taps further into its unequaled database of end-user Web behavioral data, aiming to increase the economic value of the advertisement space for its AdSense partners and using the same data to monetize traffic on YouTube. The use of the database for YouTube is perhaps the least remarkable part, considering Google’s difficulty making money on the leading global video platform. Some of the program’s features are remarkable and positive from the end-user’s perspective, but it is important to acknowledge their limitations.
Relation with acquisition of DoubleClick
The move is partly a result of Google’s acquisition of DoubleClick, one of the biggest players in the field of online advertising, which has used behavioral targeting for many years. The new service seems to use some of DoubleClick’s technology, including the cookie that is used to track end-user behavior. Google has been less clear about the data collection architecture. Does the use of one cookie for tracking imply that the underlying databases of click-streams on the Google AdSense network and on DoubleClick customers’ sites have been integrated, or are ready to be integrated?
Users in control
Google’s interest-based advertising service has been praised because it offers end-users access to and control over their profiles, and offers an opt-out. True, this is a remarkable move: no competitor in behavioral targeting was doing this yet. Most competitors do not place as much emphasis on their relation with end-users as Google does. By putting users in control, Google strikes a new balance between the interests of advertisers and content producers on the one hand, and end-users on the other. It will be interesting to see if DoubleClick will make a similar move towards end-users.
Still, I am skeptical about how substantial these controls really are. First, end-users only get access to the tip of the iceberg of the technological and behavioral data-processing architecture. Consider this quote from Search Engine Land about a Q&A with Google:
[C]an an advertiser pass along a specific ad to a specific user? For example, can I show an ad for the Sony HDR-XR200V if this user added the Sony HDR-XR200V to their shopping cart on my site but did not check out? Bender said yes, but ultimately it is up to the advertiser how specific they want to get with those ads.
That means that advertisers have more control over targeting than end-users do. I would be able to access and control my interest categories, such as the category “Video Players & Recorders”. Advertisers and e-commerce sites that use the program can reach me through much more granular controls facilitated by Google. To some extent, the control and transparency are merely a façade behind which a sophisticated data-processing architecture, opaque to the end-user, does the real work.
Opting out – of what?
Of course, there is the option of opting out through a special cookie, and Google has designed (with the help of the EFF) a browser plug-in to ensure that opt-outs are persistent for end-users who regularly delete their cookies. An opt-in model is not considered economically feasible. I would not be surprised if research showed that actual opt-out numbers end up at around the same level as opt-in numbers would. The large majority of end-users will simply not notice anything of the targeting based on their browsing. You can make as many explanatory videos as you want; there is a limit to the number of people you can reach if you do not force them to listen before subjecting them to certain treatment.
Apart from the many shades of gray between an opt-in and an opt-out, we should ask ourselves what the offered opt-out really means. Does it mean that Google stops targeting ads based on a profile of end-users’ interests derived from their navigational history? Yes, it does. Does it mean that Google will stop collecting those same click streams? No, I do not think so. These click streams will still end up in Google’s databases (though without a unique cookie ID). Google will still show ads, and it will still need logs for its AdSense accounting, click-fraud prevention, service management and research. In addition, it is hard to imagine opting out of Google’s immense network of services in a way that does not allow these logs to be correlated with individual end-users. In other words, the opt-out only touches the tip of the iceberg of the data processing that is taking place.
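To make that distinction concrete, here is a minimal sketch in Python. Every field name is invented for the example; nothing here reflects Google’s actual logging schema. The point is only that an advertising opt-out can remove the unique tracking cookie from the record while the rest of the click-stream log is still collected:

```python
# Hypothetical illustration: opting out of ad targeting need not stop logging.
# All field names are invented; this is not Google's actual schema.

def log_request(request, opted_out):
    """Build a log entry; opting out only drops the unique tracking cookie."""
    entry = {
        "timestamp": request["timestamp"],
        "url": request["url"],
        "ip": request["ip"],                  # kept e.g. for click-fraud prevention
        "user_agent": request["user_agent"],  # kept e.g. for service management
    }
    if not opted_out:
        entry["cookie_id"] = request["cookie_id"]  # the key to the interest profile
    return entry

request = {
    "timestamp": "2009-03-20T12:00:00Z",
    "url": "http://publisher.example/article",
    "ip": "192.0.2.1",
    "user_agent": "Mozilla/5.0",
    "cookie_id": "abc123",
}

with_profile = log_request(request, opted_out=False)
opted_out_entry = log_request(request, opted_out=True)

# The opted-out entry carries no cookie ID, but IP address and user agent
# remain in the log and can often still be correlated with an individual.
```

The sketch is of course a caricature, but it captures why I say the opt-out touches only the tip of the iceberg: the underlying collection continues.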
Today, the European Court of Justice issued its judgment in the case Ireland v. the European Parliament and Council. The Court concludes that the Data Retention Directive (2006/24/EC) relates predominantly to the functioning of the internal market, so it was necessary to adopt it on the basis of Article 95 EC Treaty.
The Court makes clear at the outset that its judgment does not concern the question whether the Directive violates fundamental rights such as the right to privacy. It bases its judgment about the appropriateness of the legal basis on three arguments, each of which seems sufficient (for the Court) to reach that conclusion:
It’s not too hard to comment on the ruling, because I am not very impressed by its logic. Since I have already commented on some of the main arguments, which are informed by the Opinion of the Advocate General, I will restrict myself to one main point: the implications of this ruling for the question whether the directive violates fundamental rights.
Although it is true that the Court was not asked directly to rule on the interference of blanket data retention with fundamental rights, the Court’s complete separation of that issue from this case is striking. In fact, Slovakia directly claimed the Directive could only be a third pillar measure because the interference could only be argued to be proportional in view of the fight against crime and terrorism.
It is questionable whether such far-reaching interference may be justified on economic grounds, in this case the enhanced functioning of the internal market. The adoption of an act outside the scope of Community competence, the primary and undisguised purpose of which is the fight against crime and terrorism, would be a more appropriate solution, providing a more proportionate justification for interference with the right of individuals to protection of their privacy.
The Court decides to separate these issues. The Commission had stated that “the reference to the investigation, detection and prosecution of serious crime falls under Community law because it serves to indicate the legitimate objective of the restrictions imposed by that directive on the rights of individuals with regard to data protection.” The Court does not address this specific question explicitly but states that “the action brought by Ireland relates solely to the choice of legal basis and not to any possible infringement of fundamental rights arising from interference with the exercise of the right to privacy contained in Directive 2006/24.” Implicitly, it seems to agree with the Commission and the AG (who had adopted the Commission’s position on this matter).
If we combine this argument with the Court’s conclusion that the directive is not about access to the data, the result is striking. The references to the investigation, detection and prosecution of serious crime in the directive no longer serve as a restriction on the purposes for which the retained data may be used, but merely as an indication that national law can legitimately retain these data for that purpose. Hence the directive does not oblige the member states to restrict lawful access to certain cases, but neither does it oblige them to provide access in certain cases. The preliminary ruling of the German Constitutional Court is thereby permissible under European law.
However, it is clear that merely giving an indication of the purpose of an interference is not enough to satisfy the proportionality and subsidiarity required by Article 8 ECHR. Interferences need to be narrowly targeted. Thus access to the data needs to be restricted in some manner, depending on the line that is drawn as a result of this test. The lack of access restrictions in the directive shifts the burden of establishing proportionality and subsidiarity entirely to the member states. In my opinion this significantly weakens the already weak case for the proportionality and subsidiarity of the European legislature’s interference with fundamental rights through the enactment of the Directive.
Following up on my last post and trying to answer the question Siva Vaidhyanathan is asking: Should we care about Google’s First Click Free and the possible centralizing force of Google’s policy in this regard?
If there is evidence that Googlebot and Google users are treated ‘better’ than other bots and users, I would say that is problematic. There does not seem to be such evidence, and I can think of no obvious incentives.
More advanced: what Google could (try to) do is ‘contract’ with publishers for special treatment of selected users. This would allow Google to extract rent from its knowledge about its users, by selecting users who are more likely to be turned into paying customers for the publisher. However, Google does not do so, nor does it seem willing to. This is more the kind of behavior it might show on a site like YouTube.
I would love to see evidence of whether exclusion of websites for certain users happens on a global scale: for instance, Europeans using Google.com finding a payment form for a US online newspaper, while US citizens find a First Click Free version. I can imagine Google might make an exception in these cases and not consider this cloaking, but I have yet to find the answer.
In some countries (France being an example, I believe) there seem to be special deals between Google and publishers in the context of Google News. I have never seen or heard the details of these contracts. There is a European initiative (ACAP) that has developed a more detailed robots instruction protocol, which tries to resolve some of the conflicts between search engines and publishers, in my opinion not always in the interest of users. It is based on the idea that robots.txt relates to copyright licensing.
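It is worth remembering that even the classic robots exclusion protocol that ACAP extends already lets a publisher speak differently to Googlebot than to everyone else. A small Python sketch, using the standard library’s robotparser and an invented robots.txt for a hypothetical paper.example site:

```python
# Sketch of per-crawler rules under the classic robots exclusion protocol.
# The rules and the site are invented for the example.
from urllib import robotparser

rules = """
User-agent: Googlebot
Allow: /news/

User-agent: *
Disallow: /news/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

rp.can_fetch("Googlebot", "http://paper.example/news/story.html")  # True
rp.can_fetch("OtherBot", "http://paper.example/news/story.html")   # False
```

ACAP layers copyright-licensing semantics on top of this, but the basic mechanism of selectively addressing one crawler is decades old.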
Finally, opaque personalization and geo-targeting of results have in practice done what Carr seems to point to. There is no baseline, and there is not one Web for people using Google. Today, for instance, I found that Google.com shows different results for [mccain] in the US and in the Netherlands. The French-fries and potato corporation McCain shows up quite prominently in the Netherlands but not in the US, where I am currently located. I plan to post some screenshots later.
Just a few loose thoughts. In two interesting posts on his blog, Nicholas Carr writes about the centripetal forces pulling towards Google. In the case of Google’s new First Click Free policy (for comments, see this post at Google Blogoscoped, including an interesting discussion with Matt Cutts in the comments), Google defines its policy in such a way that it enforces this centripetality towards its search operations.
Seth Finkelstein, in a related post and comments, states that it is plausible that the highly ranked Wikipedia links in Google suck away some attention from better specialist results. I think I can agree with that. Google + Wikipedia is simply the sum of the least possible efforts, which explains the prominence of the combination. I am not sure we have to consider this a problem.
First, one has to be somewhat knowledgeable to be able to find and understand ‘specialist’ reports. That takes effort on the side of users, and I do not think a search engine could easily take away the need for that effort.
Second, attention may be the measure of success in the market for eyeballs. From the perspective of real debate and valuable information exchange, however, such attention needs to be qualified. Was the attention meaningful? Did the reader learn something new, or use the information for subsequent action? Maybe 10 readers are sometimes better than 50,000. Measuring in terms of eyeballs alone just seems to ratify Google’s choice of popularity as a measure of quality.
I am quite worried about the possibility of publishers ‘speaking’ to Googlebot and Google users differently than to other bots and users, and in fact about all such intensified formal interaction between online publishers and Google. The idea that a newspaper gets to choose whom to speak to in such a way seems entirely illegitimate to me. It is an interesting legal question whether it is lawful in such cases to identify oneself (falsely) as Googlebot or as a Google user, if that is technologically possible. The question changes if newspapers get paid for giving privileged access to Googlebot and Google users. That changes the character of Google’s service.
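Technically, identifying oneself as Googlebot is trivial, which is part of what makes the legal question interesting: the User-Agent header is a free-form string chosen entirely by the client. A quick Python sketch (the newspaper URL is invented, and the request is only constructed, never sent):

```python
# Sketch: the User-Agent header is client-chosen, so 'being Googlebot'
# is, at the HTTP level, just a string. The URL is hypothetical and the
# request is built but deliberately not sent.
import urllib.request

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

req = urllib.request.Request(
    "http://newspaper.example/article",
    headers={"User-Agent": GOOGLEBOT_UA},
)

# Publishers cannot trust this header; Google itself recommends verifying
# Googlebot via a reverse-DNS lookup of the connecting IP address.
req.get_header("User-agent")
```

That is exactly why schemes that reward Googlebot with privileged access push publishers towards IP-based verification, deepening the formal interaction I worry about above.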
The New York Times Bits blog has an interesting piece on BT’s system Phorm. They are starting a test with 10,000 customers (they tested before, but now will do it properly, I guess). The service is called Webwise. It consists of their Phorm web traffic profiler, about which the notice for customers is rather vague. On top of that, it promises to protect BT’s customers against fraudulent or ‘phishing’ websites.
Why on earth did they bundle these services? (As far as I can see, you cannot use them separately.) They are totally different in character and altogether unrelated. This is one of the worst ‘opt-ins’ I have ever seen.
Bernhard Rieder points to an interesting interview about a not so surprising recommendation technology rolled out by Digg. The recommendation engine compares a user’s ‘digging’ of stories to the digging of others, while stories are categorized into a variety of categories. It recommends stories digged by the users who digg most similarly to you, in the categories you digg in. (I think this must be it.) The part in the interview about the “editorial” is indeed interesting. Kast states that Digg users do not want anything “editorial”. They want to choose for themselves.
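As far as I understand it, this is a form of user-to-user collaborative filtering. A toy Python sketch of the idea, with invented users and stories and a simple Jaccard overlap as the similarity measure (Digg’s actual algorithm is of course not public):

```python
# Toy user-to-user collaborative filtering in the spirit of Digg's
# recommendation engine. Users, stories and the similarity measure
# are all invented for the example.

def jaccard(a, b):
    """Overlap between two sets of dugg stories."""
    return len(a & b) / len(a | b)

def recommend(me, diggs):
    """Rank stories dug by the users whose digging overlaps most with mine."""
    mine = diggs[me]
    scores = {}
    for user, theirs in diggs.items():
        if user == me:
            continue
        sim = jaccard(mine, theirs)
        for story in theirs - mine:  # only stories I have not dugg yet
            scores[story] = scores.get(story, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

diggs = {
    "me":    {"s1", "s2"},
    "alice": {"s1", "s2", "s3"},  # diggs very similarly to me
    "bob":   {"s4", "s5"},        # nothing in common with me
}

recommend("me", diggs)  # 's3' ranks first: it comes from the most similar user
```

Note what the sketch makes visible: the recommendations are dominated by the users most like you, which is exactly the Daily Me worry I come back to below in this post.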
I wonder if that is correctly interpreted (I think they don’t like this kind of stuff), but it is remarkable anyway.
Why would people not like editorial? If I compare my first experience in an American diner (being asked twenty questions when ordering a simple hamburger) to one of the best meals I ever had in a restaurant in Rome (no menu, one price, no choice), I know which model I like. There is a certain overestimation of oneself involved in not liking editorial, and at the same time an underestimation of the ability and knowledge of others to find you the best, and not just something.
But the problem is that ‘editorial’ is being used here in opposition to ‘open’, ‘no human involvement’ and ‘automated’. I have become increasingly convinced that the control over the selection process of these systems is still editorial. It might be distributed (over users), open (no preventive check) and automated (by algorithms), and the responsibility of the platform provider might differ from that of publishers for a number of reasons, but the service is editorial nonetheless. Digg has simply created an automated voting system for what goes into the newspaper. That is one model. The expert journalist gatekeeper model is another, and there are many models in between. I still have to study how far online newspapers automate their personalization, but the reader’s web experience is surely being optimized as well.
Finally, the recommendation of stories that you could have digged yourself might be inefficient, like Facebook recommending possible friends who are already your friends. You don’t need Facebook for that, do you? It is an excellent example of the Daily Me aspect of some online services: the idea that we are creating echo chambers and information cocoons, and that our society will be doomed by fragmentation (joking). I do not believe this argument, but I do believe that some of this distributed, automated editorial control is very mediocre.