This is a revised and updated version of an email I first sent to the Wikimedia-l mailing list three days ago.
After sustained pressure from the Wikimedia community, the formal agreement for the Knight Foundation to provide $250,000 towards year-one of the “Knowledge Engine by Wikipedia” project was released this week. You can read it for yourself here.
This document specifically and overtly states that its purpose is to start work on a search engine in opposition to Google/Yahoo. More importantly, building such a thing is a potentially valid way of responding to current trends online. Specifically: the shift to mobile; decreased pageviews; and Wikipedia-derived information being displayed in the search results on Google. The issue, however, should not be about defining what a “knowledge engine” is, but about the fact that the project was envisaged in secret. As summarised in The Signpost:
- In a November 4 email to all WMF staff, provided to the Signpost by several WMF staffers, Executive Director Lila Tretikov expressly stated that the Knowledge Engine “is NOT … a search engine”.
- On February 11, Jimmy Wales stated on his talkpage: “To make this very clear: no one in top positions has proposed or is proposing that WMF should get into the general ‘searching’ or to try to ‘be google’. It’s an interesting hypothetical which has not been part of any serious strategy proposal, nor even discussed at the board level, nor proposed to the board by staff, nor a part of any grant, etc. It’s a total lie.”
In contrast to these statements, this is what is written in the actual text of the Knight Foundation grant:
“Knowledge Engine by Wikipedia will be the internet’s first transparent search engine, and the first one originated by the Wikimedia Foundation”. It will, “democratize the discovery of media, news and information – it will make the Internet’s most relevant information more accessible and openly curated, and it will create an open data engine that’s completely free of commercial interests. Today, commercial search engines dominate search engine use of the internet…” [p10]. “The project will pave the way for non-commercial information to be found and utilised by internet users” [p2]
At the bottom of page 13, the primary risk identified is that “interference by Google, Yahoo or another big commercial search engine could suddenly devote resources to a similar project”. As SarahSV pointed out, if the “Knowledge Engine by Wikipedia” is only about improving the inter-connectedness of the Wikimedia sister projects by improving how internal systems work – which no one is disputing is a very useful goal – then Google/Yahoo releasing a new search engine product would not be counted as the project’s “biggest challenge”.

[mockup of a knowledge engine search result page, as presented to the Knight Foundation in April 2015]
Jimmy Wales declared that “suggestions this is some kind of broad google competitor remain completely and utterly false”, and simplistic reporting from mainstream media implies that this “search engine” is effectively synonymous with what-Google.com-looks-like. However, the proofs of concept Ask Platypus and Tuvalie, and of course WolframAlpha (the only other thing that actually calls itself a “knowledge engine”), are all valid alternative models of what a search engine can be.
Let’s be clear: the very fact that the Knight Foundation approved the release of the official grant contract is a demonstration of their integrity and why they are such an important partner to the Wikimedia movement. The issue should not be about whether the “Knowledge Engine by Wikipedia” is a good idea, or what it looks like. The issue is that it has been prepared secretively, arguably without clarity even for those who did know about it in advance, and certainly without a strategic plan.
“Non commercial”
The document itself refers to “non commercial” several times, and seems to be using the term loosely. Nevertheless, it seems clear to me that any reasonable person who is not deeply immersed in copyright debates about the definition of “free” would understand the words “non commercial” in the context of this document to mean that the search engine is operated non-commercially. Now, I do acknowledge that a grant request is by definition a “sales pitch” and you have to write your request using the terminology and focus areas of the grant-giver. However, it is my understanding that Lila specifically wanted to build this – a competitor to Google – and that this is most clearly expressed in the summary on page 10. It describes the 6 principles through which the “Knowledge Engine by Wikipedia” will “upend the commercial structure [of search engines]”. These are Public Curation, Transparency, Open Data, Privacy, No Advertising and ‘Internalisation’.
Nothing in this document talks about ways to limit the content of the search engine to only “non commercial” stuff (and if it did, we would be talking about partnering with search.CreativeCommons.org).
Lack of Strategy
Now, maybe an open-source search engine would be a good thing for the WMF to create! But that would be a major strategic decision. It would be, in effect, a new sister project to sit alongside (above?) Wikipedia, Commons, Wikidata etc. It is arguably within the Wikimedia mission statement to build something like that. That is not the problem. The problem is the secrecy. Or, as summarised by TheDJ, “Great idea… terrible management”.
The “Knowledge Engine by Wikipedia” concept appears nowhere in the current strategy consultation on Meta. As I wrote on this blog last week in Strategy and Controversy, Part 2: “Of 18 different approaches identified in the…consultation process only one of them seems directly related to [search]: ‘Explore ways to scale machine-generated, machine-verified and machine-assisted content’. It is also literally the last of the 18 topics listed”.
It seems to me extremely damaging that Lila has approached an external organisation to fund a new search engine (however you want to define it) without first having a strategic plan in place. Either the Board knew about this and didn’t see a problem, or they were incorrectly informed about the grant’s purpose. Either is very bad. And let me be very clear – this is not a case of the WMF Grants department going off by themselves. This is an executive decision, either handed down from the Board to Lila, or made by Lila herself. The latter seems more likely given her own statement on her talkpage:
“In the staff June Metrics meeting in 2015, the ideation was beginning to form in my mind from what I was learning through various conversations with staff. I saw the Wikimedia movement as the most motivated and sincere group of beings, united in their mission to build a rocket to explore Universal Free Knowledge. The words “search” and “discovery” and “knowledge” swam around in my mind with some rocket to navigate it. However, “rocket” didn’t seem to work, but in my mind, the rocket was really just an engine, or a portal, a TARDIS, that transports people on their journey through Universal Free Knowledge.”

[The original logo of the project – a search icon inside a rocket – from a presentation dated June 30]
As pointed out by Risker back in May 2015, the Search team had already been created and seemed disproportionately large. It seems clear to me that this was done in anticipation of the “Knowledge Engine by Wikipedia” project, as it was described in this grant document. Since then, this very high initial target has been reduced considerably. It is now defined as: “improving the existing CirrusSearch infrastructure with better relevance, multi language, multi projects search and incorporating new [external] data sources for our projects.” (as described in the Discovery Department’s FAQ response to the question – “are you building Google?”).
However, this change is not represented in the actual grant document. The deliverables for this stage of the project are improvements on existing products – but the overarching purpose of the project is most certainly not. That either means we misled the Knight Foundation at the start, or we changed our mind since then but didn’t tell them. Much more likely, in my opinion, is that the Knight Foundation knew that trying to create a non-profit search engine was high-risk or at the very least extremely ambitious. So instead they gave a smaller exploratory grant which helps to fund some genuinely useful activities (the “outcomes” of this first stage, as listed on page 3 and also page 12).
Also, let me reiterate – improving the “discoverability” of our own content across wikis/sister-projects is a very good goal. Consolidation/Integration of projects’ content is much desired (e.g. the much-longed-for ‘structured data on Commons’ project) and everything on the Discovery team’s own list of current priorities is great. However, those things are not what has been “sold” as the end result of this grant, even taking into account the adaptability inherent in agile software development projects.
[The first public appearance of the words “knowledge engine”. Slide 50 of the WMF June 2015 Metrics meeting presentation]
Cost
Page 10 of the grant text specifically says that the cost of the first stage of “Knowledge Engine by Wikipedia” is $2.4 million, and that the grant is for 1 year starting in September 2015. Page 2 says that the whole project is in 4 stages, each lasting approximately 18 months, for a total of 6 years. This grant of $250,000 therefore covers only about 10% of the cost of the first stage of the total project.
As SarahSV asked on Wikimedia-l (reiterated by Pine), “The document says the ‘Search Engine by Wikipedia’ budget for 2015–2016 ($2.4 million) was approved by the board [page 9]. Can you point us to which board meeting approved it and what was discussed there?” I second this question, because I’m not seeing it in the current annual plan.
There is no way that Lila approached the Knight Foundation asking to fund only 10% of the first year of a six-year project. Instead, as revealed in The Signpost, the actual amount initially requested as a grant was $6 million over three years and negotiations have concluded at $250,000 for the first year only. The first stage is also the cheapest of the four stages (“discovery, advisory, community, extension” – described on page 2). Per the document quoted in The Signpost, the budget was expected to “increase by 20% per year as we accelerate the growth of the program” with the 2017–18 estimate at $3.5 million.
We can therefore reliably extrapolate that $12 million is the absolute minimum amount that was planned to be spent over six years. As pointed out by Doc James on the [public] WikipediaWeekly Facebook group – estimates presented to the board were in the range of tens of millions.
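The figures in this section can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: the dollar amounts and the 20% growth rate are the ones quoted above, but the function names, the idea of continuing the 20% growth across all six years, and the idea of doubling the three-year request to cover six years are assumptions of the sketch, not statements taken from the grant document.

```python
# Figures quoted in the text (US dollars).
GRANT = 250_000                # Knight Foundation grant for year one
FIRST_YEAR_BUDGET = 2_400_000  # 2015-16 budget cited from the grant text
INITIAL_REQUEST = 6_000_000    # initial request over three years, per The Signpost
YEARLY_GROWTH = 0.20           # budget increase per year, per The Signpost


def grant_share(grant: float, budget: float) -> float:
    """Fraction of a budget that a grant covers."""
    return grant / budget


def projected_budgets(first_year: float, growth: float, years: int) -> list[float]:
    """Yearly budgets when each year grows by a fixed rate over the previous one."""
    return [first_year * (1 + growth) ** n for n in range(years)]


share = grant_share(GRANT, FIRST_YEAR_BUDGET)

# ASSUMPTION: the 20% growth continues for all six years. The source only
# gives an estimate as far as 2017-18.
budgets = projected_budgets(FIRST_YEAR_BUDGET, YEARLY_GROWTH, 6)

# ASSUMPTION: a second three-year period costs at least as much as the
# $6 million requested for the first one.
doubled_request = 2 * INITIAL_REQUEST

if __name__ == "__main__":
    print(f"Grant share of first-year budget: {share:.1%}")
    print(f"Projected 2017-18 budget: ${budgets[2]:,.0f}")
    print(f"Six-year total with compounding: ${sum(budgets):,.0f}")
    print(f"Twice the initial request: ${doubled_request:,.0f}")
```

Under these assumptions the third-year figure comes out at roughly $3.46 million, which is consistent with the reported $3.5 million estimate for 2017–18, and a six-year total with compounding lands well above $12 million, which fits with describing $12 million as a minimum rather than a forecast.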
The Signpost also revealed that the WMF hoped to fund the difference between their initial request to the Knight Foundation (let alone the much reduced amount they actually received) with “…funding from the Wikimedia Foundation’s general fund or from additional restricted grants”. The WMF “general fund” in this sentence can only mean the revenues raised through the annual fundraiser. This makes its deliberate absence from any documents shown to donors, or itemised in the annual plan, all the more concerning. It cannot be that this project was a secret in order that “Google doesn’t find out”. That would be a misunderstanding of our mission and values, and it also underestimates their intelligence – Google would notice anyway once we started actually building “it”. So, why the secrecy‽
It is inconceivable to me that the Executive Director would privately propose to an external partner that they would undertake a six-year project to build a search engine that will have massive cost, staffing, strategic and content implications – entirely without an official WMF strategy covering that period, with no indication in the current annual plan, without the awareness of the community, and with only unclear communication to the Board. I find the fact that this could have been done to be a deep breach of our values – and not wholly unrelated to the current sudden exodus of long-serving senior members of WMF staff.
[The first post in this series “Strategy and Controversy” was published on January 8. Part 2 was published on January 30.]