Indexing & Security

This note collects some ideas about security and searching à la Google. It uses powerful governments as the exemplar of an institution worried about such issues, but what concerns governments will concern other institutions soon enough.

Web search engines are a boon to just about everyone. It is hard to imagine a world without them even though most of us recently inhabited such a world. They let even the poor find obscure facts about obscure subjects. Knowing who is searching for what, indexed by either ‘who’ or ‘what’, is an ability little discussed so far. Imagine what the CIA would give for that ability. I imagine government agencies have such abilities to varying degrees.

Imagine, too, the CIA’s reluctance to be the ‘who’ of such a search: whom is the CIA vetting today?

A compartmentalized search engine would still have to crawl the web, and so would reveal itself as a spider. There are now a variety of spiders, and any web server will note their visits in its access_log. I observe several spiders whose provenance I have pursued only casually.
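To make the access_log observation concrete, here is a minimal sketch of how a server operator might tally visiting spiders. It assumes the log is in Apache's "combined" format, where the user agent is the last quoted field, and it assumes a crude heuristic: that a spider's user agent contains "bot", "spider", or "crawl". The names tally_spiders, SPIDER_HINTS, and SAMPLE are mine, and the sample lines are invented. A spider that chose to lie about its user agent would slip past this entirely.

```python
import re
from collections import Counter

# Apache "combined" log format; the user agent is the last quoted field.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumed heuristic, not a standard: words that many crawlers put in
# their user agent string.
SPIDER_HINTS = ("bot", "spider", "crawl")


def tally_spiders(lines):
    """Count requests per (user agent, client host) for likely spiders."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line.strip())
        if m is None:
            continue  # not in combined format; skip rather than guess
        agent = m.group("agent")
        if any(hint in agent.lower() for hint in SPIDER_HINTS):
            counts[(agent, m.group("host"))] += 1
    return counts


# Invented sample lines, using documentation-reserved IP addresses.
SAMPLE = [
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /robots.txt HTTP/1.0" '
    '200 68 "-" "ExampleBot/1.0 (+http://example.com/bot)"',
    '192.0.2.10 - - [10/Oct/2000:13:55:40 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "ExampleBot/1.0 (+http://example.com/bot)"',
    '198.51.100.7 - - [10/Oct/2000:14:01:02 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
]

if __name__ == "__main__":
    for (agent, host), n in tally_spiders(SAMPLE).most_common():
        print(n, host, agent)
```

The point for this note is the direction of the leak: the log tells the server operator that some spider came from some address, which is exactly what a compartmentalized engine would prefer not to reveal.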

It would be difficult but technically possible for a major search engine to provide confidential searches. The distinction I draw here concerns not the engine's stated privacy policy but the degree of assurance that a query is in fact unobserved. Given the multi-continental nature of search infrastructure, such assurance is hard to provide. See this.

A spider might provide a bus service that shows each page of the web to each of several clients. “To show” is to run client code in an address space that includes the read-only text of the page. The client code is able to send information home, including the page's web address. The client may be protected against side-channel attacks to some negotiated degree. The bus schedule is also negotiated.
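The shape of the bus service can be sketched in a few lines. This is only an illustration under loud assumptions: the names Page, Client, run_bus, and watch_for are hypothetical, the pages are invented, and clients here are plain Python functions sharing one process. That gives none of the isolation or side-channel protection described above; a real service would run each client in its own address space with the page mapped read-only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Page:
    """One fetched page. Frozen, and str is immutable, so a client cannot
    alter what the next client sees."""
    url: str
    text: str


# A client is code supplied by a subscriber. It is shown one page and
# returns whatever it wants sent home, or None to stay silent.
Client = Callable[[Page], Optional[str]]


def run_bus(pages: Iterable[Page],
            clients: Dict[str, Client]) -> Dict[str, List[str]]:
    """Show every page to every client; keep each client's reports apart."""
    outbox: Dict[str, List[str]] = {name: [] for name in clients}
    for page in pages:                      # the spider fetches each page once
        for name, client in clients.items():
            report = client(page)           # "to show" the page to the client
            if report is not None:
                outbox[name].append(report)
    return outbox


def watch_for(term: str) -> Client:
    """Hypothetical client: send home the address of pages mentioning term."""
    def client(page: Page) -> Optional[str]:
        return page.url if term in page.text else None
    return client


# Invented pages standing in for the crawl.
PAGES = [
    Page("http://example.org/a", "notes on capability systems"),
    Page("http://example.org/b", "a recipe for bread"),
]

if __name__ == "__main__":
    reports = run_bus(PAGES, {"alice": watch_for("capability"),
                              "bob": watch_for("bread")})
    for name, urls in reports.items():
        print(name, urls)
```

In this arrangement the web servers see only the one spider in their logs, and each page is fetched once no matter how many clients ride the bus. What a given client was looking for lives only in its own code and its own outbox, which is where the negotiated protection would have to apply.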