Archive for May, 2008

Dutch Parliament Opts for 12 Month Data Retention

Thursday, May 22nd, 2008

The Dutch Parliament has opted for a 12-month data retention term in its implementation of the Data Retention Directive. The three-party coalition split into three camps, arguing for 6, 12 and 18 months respectively. The Dutch government kept arguing for 18 months, but a majority voted for an amendment lowering the term to 12 months. The proposal still needs to pass the Dutch Senate, which has been rather critical of data retention ever since it appeared on the EU agenda.

The 12-month term traces back to a report by Erasmus University about the usefulness and necessity of data retention for telecommunication traffic and location data. After failing to prove such usefulness and necessity for data older than 3 months, the researchers had talks with police representatives. Based exclusively on those talks, the report recommended a 12-month retention period. Later on, the Council of State referred to that research and the proposed reasonable term of 12 months when it advised the government to lower the term to 12 months.

Although the debate focused largely on the retention term, many other issues were debated as well. One of them was the extent of parliamentary involvement with the contents of the decree that sets out more details about data retention in practice. The costs were also an issue of debate, but since there are no clear data on the precise scope of the data retention obligation for Internet traffic, the available cost estimates are vague. General costs of data retention will not be reimbursed. The question of whether the data will be stored in centralized or decentralized facilities has been evaded. At first, the data will be stored decentrally, but this could change in the future. An amendment that would have restricted the possibility of claiming complete data sets - to be used for data mining in the context of combatting terrorism - didn’t make it. If the law is passed, both national security and law enforcement agencies will be able to claim complete parts of the collection of data to be retained.

How to be transparent about ranking?

Wednesday, May 21st, 2008

Why is Google not more transparent about the ranking of Google search results?

“This is entirely our fault, and it is by design. We are, to be honest, quite secretive about what we do. There are two reasons for it: competition and abuse. Competition is pretty straightforward. No company wants to share its secret recipes with its competitors. As for abuse, if we make our ranking formulas too accessible, we make it easier for people to game the system. Security by obscurity is never the strongest measure, and we do not rely on it exclusively, but it does prevent a lot of abuse,” writes Udi Manber, who is in charge of search quality at Google.

Search engine transparency has been a major issue in regulatory discussions about search engine bias. I understand the spam problem, and I also think that there are reasons not to tell competitors what great ranking tricks you have invented. But I think there is a way out. If we talk about the problem of bias in search results, it might in fact not help a lot to know the actual algorithms of a search engine like Google. They are too complicated to make any causal connection between the algorithms and bias. Instead, it is much more helpful to know HOW a search engine evaluates results and WHY it makes certain changes. I would argue that neither of the reasons for secrecy given above stands in the way of being more open about principles for evaluation and improvement.

The major principle seems to be user experience. This means that Google likes results if we click on them, in the same way that (commercial) TV channels like it when people watch them. An interesting one is diversity of results on the first page. I have heard Google engineers name this principle a few times and I think it is a great one. Yet another one is revenue maximisation. If we do not click on sponsored results Google is unhappy, because that is how it makes money. It is also happier if we click on links to other Google properties. There are many more, many of which I probably have no idea about, and I hope Google will keep its promise and give us more principles that we can evaluate and use to judge how well Google is doing:

But being completely secretive isn’t ideal, and this blog post is part of a renewed effort to open up a bit more than we have in the past. We will try to periodically tell you about new things, explain old things, give advice, spread news, and engage in conversations. Let me start with some general pieces of information about our group. More blog posts will follow.

Dutch Crimefighters Want to Close Remaining Anonymity Gaps

Tuesday, May 13th, 2008

From a report of the Dutch Police with regard to the need for identification of Internet users and the logging of all internet use in Internet cafes including public libraries:

“Besides, there is no societal or economic reason for using the Internet anonymously.”

“Er is overigens geen enkele maatschappelijke of economische reden om anoniem gebruik te maken van Internet.”

In contrast, see here, here, here, and here.

Spring Travel and Presentations

Friday, May 9th, 2008

I have a great agenda for the coming weeks. Next week I go to Bergen in Norway. On the 14th of May I speak at a symposium about data retention. My presentation will focus on data retention at the EU level and the implementation in the Netherlands.

The next day I will present a paper on selection intermediaries and the liability for third-party content in or through search results, and I will try to focus on the role of selection intermediaries vis-à-vis traditional news agencies and the press. I am very much looking forward to the discussion with people from the Department of Information Science and Media Studies in Bergen, about which I have heard only good things.

Two weeks later, 27 May, I will be on a panel at the FIPR 10 Years! anniversary of the Foundation for Information Policy Research in London.

And after that I will fly to Israel for the conference on social media and the commodification of community. I will be on the panel on governance of social media. The list of participants is very impressive and the program diverse and very promising, so I am thrilled to be able to take part in the discussions.

Robots.txt & Privacy Protection

Wednesday, May 7th, 2008

Lately, there seems to be some consensus that the robots.txt instruction, and similar instructions and protocols to be developed in the future, are the way forward to protect privacy and other interests in controlling information online. The eloquent Jonathan Zittrain gives a very nice explanation of this idea in this video:


I tend to agree that meta-data will help to solve some of the problems. The robots.txt file gives publishers some effective control over who gets to index data. One can imagine that it could be extended so that it allows third parties, e.g. people in a picture who can be identified through facial recognition, to express their wishes about publication and reprocessing of the material, including identification.
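To make the mechanism concrete, here is a minimal sketch in Python of how a well-behaved crawler consults a publisher's robots.txt before fetching a page. It uses the standard library's urllib.robotparser. The robots.txt content, the crawler names (ExampleImageBot, OtherBot), the example.org URLs and the may_fetch helper are all made up for illustration, and this is only one way a crawler might do it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. The agent name and
# paths are illustrative assumptions, not taken from any real site.
ROBOTS_TXT = """\
User-agent: ExampleImageBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would instead call
    # set_url() and read() to download the file from the site.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleImageBot", "https://example.org/photos/1.jpg"),
        ("OtherBot", "https://example.org/photos/1.jpg"),
        ("OtherBot", "https://example.org/private/notes.html"),
    ]:
        print(agent, url, may_fetch(agent, url))
```

Note that the check runs entirely on the crawler's side: the publisher only states its wishes, and nothing in the protocol itself forces a crawler to ask.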

There are, however, a number of problems. First of all, expressing privacy wishes is itself a privacy problem, so the system has to be very sophisticated. Second, major search engines do obey instructions such as robots.txt, but there are search engines and automated content aggregators that don’t, and this can be legitimate. The fact that Google respects robots.txt instructions has implications for the accessibility of information, and robots.txt and similar instructions are used for all sorts of reasons, including illegitimate ones. I would see similar problems arise when trying to protect online privacy with instruction protocols. The control has to be mediated and breakable. I am currently reading Zittrain’s book and have not reached chapter nine yet. I am very curious about his particular proposal and hope to write more about this myself in the future.