Archive for March, 2008

Phorm’s Targeted Advertising: Revolution or Disaster

Monday, March 17th, 2008

The U.K. is currently debating a new advertising system based on the internet traffic data of ISPs. Its developer, Phorm, states that it is creating two revolutions: the first in online advertising [making money by scanning and analyzing the internet traffic data of their customers' customers], the second in privacy [letting machines do the work for them, supposedly without processing personal data]. Phorm’s website quotes some comforting statements and links to a report regarding privacy protection. The Foundation for Information Policy Research (FIPR) reacted with an open letter to the UK Information Commissioner.

Since privacy protection (general data protection and in electronic communications) is harmonized at the EU level, it seems highly relevant how the debate will be settled in the U.K.

The whole deal reminds me of the argument made by Google employee Matt Cutts in the Google privacy debate. He basically said: if Google’s data collection worries you, you should be much more worried about ISPs:

“[...]your ISP has a superset of data that Google has, because everything you do passes through your ISP. So your ISP may have much more detailed records about places where you go on the net, plus they have a verified identity with something like a credit card, and they actually know which IPs you’re on. [...] Many of the questions about privacy I see are interesting because ISPs have more data than Google does, but you rarely see people ask questions about ISPs, even though at least some ISPs do sell clickstream data.”

Phorm proves his point.

A question I would like to see addressed is whether ISPs are the proper proxies for Phorm’s system. Why aren’t internet users Phorm’s primary customers? And related to that, what is a reasonable price for consenting to a degenerated Internet access provider that profiles all your internet traffic? Relevant advertisements?

European Commission clears Google’s DoubleClick deal

Tuesday, March 11th, 2008

It does not come as a surprise anymore. The European Commission has approved the acquisition of DoubleClick by Google. The press release is here. The review focused on the relevant advertising markets, not so much on consumer-side issues such as privacy and the tracking of online behavior, as discussed by the FTC and the European Parliament. In my last three posts, I addressed the competitive relevance of the personal data assets of both companies, and I look forward to commenting on the full decision.

More about Data on ‘information use’ and Competition

Monday, March 10th, 2008

The New York Times conducted a study of Web user data collection. One quote that caught my interest:

Some advertising executives say media companies will have little choice but to outsource their ad sales to companies like Microsoft and Yahoo to benefit from their data. The Web companies may prove they can use their algorithms and consumer information to better select which ads for visitors better than media companies can.

Economics, Privacy and Future Payments

Wednesday, March 5th, 2008

Two days ago I discussed Hal Varian’s reply to a NYT reader’s question about the reasons for Google’s persistent dominance as a provider of Web search. The answer to that question was not very satisfactory (Google is the market leader because we have learned best how to do search or, in short: we are the best because we are the best).

I concluded that Varian might have implicitly referred to search logs and other data sources, and to Google’s ability to learn from them, when he spoke about Google’s secret sauce: “[…]we have better recipes. And we are continuously improving those recipes precisely because we know the competition is only a click away.” I pointed to the incredible amount of data Google has (10 years of increasing dominance in search without any significant deletion of user data) and concluded that the secret sauce must be to use these data in all relevant and possible ways and, of course, to innovate further.

Now Hal Varian continues along this path, connecting the dominance discussion to the Web search privacy discussion, by describing the reasons “why data matters” to Google in a post on Google’s official blog. An excerpt:

Today’s web search algorithms are trained to a large degree by the “wisdom of the crowds” drawn from the logs of billions of previous search queries. This brief overview of the history of search illustrates why using data is integral to making Google web search valuable to our users.

Varian’s post serves as an explanation of why Google needs room to keep extensive search logs. However, in relation to the other post (about Google’s secret sauce) it must also be seen as a warning. The relation between dominance and the collection of personal and user data by search engines provides all the evidence needed for the prediction that the Web search privacy problem will become worse and worse. By the daily payments we all make to Google’s dominant part of the database of intentions, we further secure Google’s dominant position and commit ourselves to future payments in order to collect our (public) interest in the form of relevance. It’s about time we start wondering whether our new banks might need their own type of regulatory oversight.

The reasons for Google’s dominance in Web Search

Monday, March 3rd, 2008

Last week the New York Times Freakonomics blog featured a Q&A between readers and Google’s chief economist Hal Varian. One of the questions was as follows:

Q: How can we explain the fairly entrenched position of Google, even though the differences in search algorithms are now only recognizable at the margins? Is there some hidden network effect that makes it better for all of us to use the same search engine?

Google’s dominance in a significant part of Europe is striking. In the Netherlands, Google’s user share exceeds 95% (in searches, not in searchers, although I do not think there will be a big difference). The question is very relevant. Dominance in search has been a recurring issue in my research. Unfortunately, but understandably, Varian evades the question. The European Commission still needs to approve the DoubleClick merger. The issue is altogether sensitive. Varian’s answer is as follows (he posted a lengthier version on the official Google blog):

A: The traditional forces that support market entrenchment, such as network effects, scale economies, and switching costs, don’t really apply to Google. To explain Google’s success, you have to go back to a much older economics concept: learning by doing. Google has been doing Web search for nearly 10 years, so it’s not surprising that we do it better than our competitors. And we’re working very hard to keep it that way!

I am sure he would not be sitting 12 feet away from Eric Schmidt if he did not have a better answer to this question. Eszter Hargittai was especially unsatisfied with the part about switching costs. She posted a lengthy response on Crooked Timber, discussing the possible lock-in of Google users on the basis of her extensive experimental academic research on internet usage.

In the replies, Daniel Feygin notes that search volume reinforces search quality and calls this a network effect. I am not sure whether one should call this a network effect (I think not), but it is certainly true that major search engines can use past searches to increase search quality. One can look into the Web search privacy discussion to discover the value of search engine logs for search engine providers. One can look into the click fraud discussion to find the value of these data for protecting and securing the advertising platforms.

To me it seems that Varian also implicitly refers to search logs and other data sources, and to Google’s ability to learn from them, when he speaks about Google’s secret sauce:

“[...]we have better recipes. And we are continuously improving those recipes precisely because we know the competition is only a click away.”

And later on in his answers he recommends that a young person asking for career advice “take lots of courses about how to manipulate and analyze data: databases, machine learning, econometrics, statistics, visualization, and so on.” Google has lots and lots and lots of data: 10 years of increasing dominance in search without any significant deletion of user data, multiple copies of the Web, the best index of the Web in the world, the broadest set of online advertisers and their preferences. Their secret sauce is to use these data in all relevant and possible ways and, of course, to innovate frantically and secretly; in Varian’s words, to learn by doing.

A data source that might not be discussed in the context of the reasons for Google’s dominance is input on spam and on illegal or harmful references in the index. I would suggest that, because of its popularity, Google is the first search engine to be addressed in cases of spam and illegal or harmful content. Google has shown that it has sufficient staff and resources to deal with these notices and possible court cases, and consequently shows a relatively nice and clean index in response to user queries. If the spam or the illegal or harmful material is not removed from the Web, search engines that did not receive a notice because they are harmlessly unpopular still contain the references. Newcomers have to start from scratch. I would not be surprised if Google, and maybe other major search engines, thus profit from obligations and requests to remove unlawful references.

I found one other comment, by Kamal Jain, interesting. (He seems to be working for Microsoft.) He states that in the absence of price competition, where prices would normalize the differences in perceived quality, a small perceived difference in the quality of search engines can be magnified. Paradoxically, that seems to suggest that the lack of switching costs is working in favour of Google. I suggest the following solution: Microsoft pays people to use its Web search product, which is perceived as inferior. This does not have to be a joke. The reward can be relative to the price users pay in privacy and control over their user data.