London Beyond Sight

How do you convey the emotional attachment that Londoners have to their neighbourhood red post box? Or their local pub, park, statue or theatre?

Reading the Wikipedia article about each of these things is a good start, but what you really need is the personal touch… VocalEyes, a UK charity for people with a visual impairment, did that with its project “London Beyond Sight”: audio descriptions of London landmarks by key Londoners, for blind and partially sighted people. However, don’t let those last few words fool you into thinking these recordings aren’t useful for sighted people too. Far from it!

[Tony Robinson describing his local pillar box, embedded from Soundcloud]

A couple of months ago I met with Matthew Cock, the Chief Executive of VocalEyes, to discuss potential Wikimedia collaborations. Actually, we met because we’d not seen each other for several years and wanted to catch up – but we ended up talking about potential collaborations nonetheless! The “low hanging fruit” we identified were these 40 audio files. Their copyright belonged to the organisation, and they were short, clearly spoken, fully transcribed, and specifically associated with particular landmarks in the local area. Almost all of them correspond on a one-to-one basis with a Wikipedia article. Moreover, the speakers are notable people too!

[Lady Cobham describing the All-England Club, embedded from Soundcloud]

So, Matthew went ahead and relicensed these files to CC-By-SA, transcoded them to .OGG and uploaded them all to Wikimedia Commons! You can listen to the full playlist of the 40 recordings on Soundcloud, or on the VocalEyes website. I have now embedded them into the English Wikipedia articles about their subjects – see the full list here using the GLAMorous tool.

If you listen to several of these lovely recordings, you might notice a peculiar style… Not quite personal oral-history, not quite scientific-analysis, these are “audio descriptions” made primarily for people with visual impairment. Therefore, their task is to give an accurate impression of the subject. If a picture is worth a thousand words, then these audio descriptions attempt to paint with words. And, as a result, they work marvellously in Wikipedia articles.

One of the benefits of working with free licenses is that it allows other people to re-purpose your content, giving it an unexpected new context. Wikimedian Andy Mabbett (who just concluded a Wikidata speaking-tour in Australia) took those files and edited them to just the “hello my name is… and I am a…” section, thereby augmenting the existing “voice intro project” that he pioneered. Dozens more Wikipedia biographies and Wikidata items will now have their subject pronouncing their own name correctly!

[Shami Chakrabarti describing Parliament Hill, embedded from Soundcloud]

People who have been following GLAM-Wiki for several years might recall Matthew’s name from 2010… At the time, he was the Head of Web at the British Museum, and the first GLAM manager to accept my, then controversial, proposal to be a volunteer Wikipedian in Residence. Given this is the first disability access-focused content donation to Wikimedia (that I know of), that means he now has TWO Wikimedia “firsts” to his credit!

If you would like to learn more about Audio Descriptions and the other work of VocalEyes, and especially if you are a Wikimedia affiliate organisation and would like to try to replicate this project in your country, you can contact VocalEyes via their website.

[Bettany Hughes describing the Roman-era river crossing of the River Brent, embedded from Soundcloud]

If someone would like to help me with adding the transcriptions of each recording to their respective Commons description, that would be really helpful. All are available as MS Word documents on the project’s homepage. Just download the document, copy the text onto the description field of its associated Wikimedia Commons file page, and save.


Knowledge Engine by Wikipedia

This is a revised and updated version of an email I first wrote on the Wikimedia-l mailing list three days ago.

After sustained pressure from the Wikimedia community, the formal agreement for the Knight Foundation to provide $250,000 towards year-one of the “Knowledge Engine by Wikipedia” project was released this week. You can read it for yourself here.

This document specifically and overtly states that its purpose is to start work on a search engine in opposition to Google/Yahoo. More importantly, building such a thing is a potentially valid way of responding to current trends online. Specifically: the shift to mobile; decreased pageviews; and Wikipedia-derived information being displayed in the search results on Google. Again, the issue should not be about defining what a “knowledge engine” is, but that the project was envisaged in secret. As summarised in The Signpost:

  • In a November 4 email to all WMF staff, provided to the Signpost by several WMF staffers, Executive Director Lila Tretikov expressly stated that the Knowledge Engine “is NOT … a search engine”.
  • On February 11, Jimmy Wales stated on his talkpage: “To make this very clear: no one in top positions has proposed or is proposing that WMF should get into the general ‘searching’ or to try to ‘be google’. It’s an interesting hypothetical which has not been part of any serious strategy proposal, nor even discussed at the board level, nor proposed to the board by staff, nor a part of any grant, etc. It’s a total lie.”

In contrast to these statements, this is what is written in the actual text of the Knight Foundation grant:

“Knowledge Engine by Wikipedia will be the internet’s first transparent search engine, and the first one originated by the Wikimedia Foundation”. It will, “democratize the discovery of media, news and information – it will make the Internet’s most relevant information more accessible and openly curated, and it will create an open data engine that’s completely free of commercial interests. Today, commercial search engines dominate search engine use of the internet…” [p10]. “The project will pave the way for non-commercial information to be found and utilised by internet users” [p2]

At the bottom of page 13, the primary risk identified is that “interference by Google, Yahoo or another big commercial search engine could suddenly devote resources to a similar project”. As SarahSV pointed out, if the “Knowledge Engine by Wikipedia” is only about improving the inter-connectedness of the Wikimedia sister projects by improving how internal systems work – which no one is disputing is a very useful goal – then Google/Yahoo releasing a new search engine product would not be counted as the project’s “biggest challenge”.


[mockup of a knowledge engine search result page, as presented to the Knight Foundation in April 2015]

Jimmy Wales declared that “suggestions this is some kind of broad google competitor remain completely and utterly false” and simplistic reporting from mainstream media implies that this “search engine” is effectively synonymous with what-Google.com-looks-like. However, the proofs of concept Ask Platypus and Tuvalie, and of course the only other thing that actually calls itself a “knowledge engine” – WolframAlpha, are all valid alternatives of what a search engine can be.

Let’s be clear: the very fact that the Knight Foundation approved the release of the official grant contract is a demonstration of their integrity and why they are such an important partner to the Wikimedia movement. The issue should not be about whether the “Knowledge Engine by Wikipedia” is a good idea, or what it looks like. The issue is that it has been prepared secretively, arguably without clarity even for those who did know about it in advance, and certainly without a strategic plan.

“Non commercial”

The document itself refers to “non commercial” several times, and seems to be using the term loosely. Nevertheless, it seems clear to me that any reasonable person who is not deeply-immersed in copyright debates about the definition of “free” would understand the words “non commercial” in the context of this document to mean that the search engine is operated non-commercially. Now, I do acknowledge that a grant-request is by definition a “sales pitch” and you have to write your request using the terminology and focus areas of the grant-giver. However, it is my understanding that Lila specifically wanted to build this – a competitor to Google – and that this is most clearly expressed in the summary on page 10. It describes the 6 principles through which the “Knowledge Engine by Wikipedia” will “upend the commercial structure [of search engines]”. These are Public Curation, Transparency, Open Data, Privacy, No Advertising and ‘Internalisation’.

Nothing in this document talks about ways to limit the content of the search engine to only “non commercial” stuff (and if it did, we would be talking about partnering with search.CreativeCommons.org).

Lack of Strategy

Now, maybe an open-source search engine would be a good thing for the WMF to create! But that would be a major strategic decision. It would be, in effect, a new sister project to sit alongside (above?) Wikipedia, Commons, Wikidata etc. It is arguably within the Wikimedia mission statement to build something like that. That is not the problem. The problem is the secrecy. Or, as summarised by TheDJ, “Great idea… terrible management”.

The “Knowledge Engine by Wikipedia” concept appears nowhere in the current strategy consultation on Meta. As I wrote on this blog last week in Strategy and Controversy, Part 2: “Of 18 different approaches identified in the…consultation process only one of them seems directly related to [search]: ‘Explore ways to scale machine-generated, machine-verified and machine-assisted content’. It is also literally the last of the 18 topics listed”.

It seems to me extremely damaging that Lila has approached an external organisation for funding a new search engine (however you want to define it), without first having a strategic plan in place. Either the Board knew about this and didn’t see a problem, or they were incorrectly informed about the grant’s purpose. Either is very bad. And let me be very clear – this is not a case of the WMF Grants department going off by themselves. This is an executive decision by either the Board-to-Lila, or Lila by herself. The latter seems more likely given her own statement on her talkpage:

“In the staff June Metrics meeting in 2015, the ideation was beginning to form in my mind from what I was learning through various conversations with staff. I saw the Wikimedia movement as the most motivated and sincere group of beings, united in their mission to build a rocket to explore Universal Free Knowledge. The words “search” and “discovery” and “knowledge” swam around in my mind with some rocket to navigate it. However, “rocket” didn’t seem to work, but in my mind, the rocket was really just an engine, or a portal, a TARDIS, that transports people on their journey through Universal Free Knowledge.”


[The original logo of the project – a search icon inside a rocket – from a presentation dated June 30]

As pointed out by Risker back in May 2015, the Search team had already been created and seemed disproportionately large. It seems clear to me that this was done in anticipation of the “Knowledge Engine by Wikipedia” project, as it was described in this grant document. This very high initial target has since been reduced, a lot. It is now defined as: “improving the existing CirrusSearch infrastructure with better relevance, multi language, multi projects search and incorporating new [external] data sources for our projects.” (as described in the Discovery Department’s FAQ response to the question – “are you building Google?”).

However, this change is not represented in the actual grant document. The deliverables for this stage of the project are improvements on existing products – but the overarching purpose of the project is most certainly not. That either means we misled the Knight Foundation at the start, or we changed our mind since then but didn’t tell them. Much more likely, in my opinion, is that the Knight Foundation knew that trying to create a non-profit search engine was high-risk or at the very least extremely ambitious. So instead they gave a smaller exploratory grant which helps to fund some genuinely useful activities (the “outcomes” of this first stage, as listed on page 3 and also page 12).

Also, let me reiterate – improving the “discoverability” of our own content across wikis/sister-projects is a very good goal. Consolidation/Integration of projects’ content is much desired (e.g. the much-longed-for ‘structured data on Commons’ project) and everything on the Discovery team’s own list of current priorities is great. However, those things are not what has been “sold” as the end result of this grant, even taking into account the adaptability inherent in agile software development projects.

[The first public appearance of the words “knowledge engine”. Slide 50 of the WMF June 2015 Metrics meeting presentation]

Cost

Page 10 of the grant text specifically says that the cost of the first stage of “Knowledge Engine by Wikipedia” is $2.4 million, and that the grant is for 1 year starting in September 2015. Page 2 says that the whole project is in 4 stages, each lasting approximately 18 months, which makes 6 years in total. This grant of $250,000 therefore only covers about 10% of the cost of the first stage of the total project.
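For anyone who wants to check that arithmetic, here is a minimal sketch using only the figures quoted from the grant text above. The variable names are my own, chosen purely for illustration.

```python
# Figures quoted from the grant text (pages 2 and 10).
grant_usd = 250_000
stage_one_cost_usd = 2_400_000
stages = 4
months_per_stage = 18  # "approximately 18 months" per stage

# Total project length in years, and the share of stage one the grant covers.
total_years = stages * months_per_stage / 12
grant_share = grant_usd / stage_one_cost_usd

print(f"Project length: {total_years:.0f} years")
print(f"Grant share of stage one: {grant_share:.1%}")
```

On these figures the grant covers a little over a tenth of the first stage alone.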

As SarahSV asked on Wikimedia-l (reiterated by Pine), “The document says the ‘Search Engine by Wikipedia’ budget for 2015–2016 ($2.4 million) was approved by the board [page 9]. Can you point us to which board meeting approved it and what was discussed there?” I second this question, because I’m not seeing it in the current annual plan.

There is no way that Lila approached the Knight Foundation asking to fund only 10% of the first year of a 6 year project. Instead, as revealed in The Signpost, the actual amount initially requested as a grant was $6 million over three years and negotiations have concluded at $250,000 for the first year only. The first stage is also the cheapest of the four stages (“discovery, advisory, community, extension” – described on page 2). Per the document quoted in The Signpost, the budget was expected to “increase by 20% per year as we accelerate the growth of the program” with the 2017–18 estimate at $3.5 million.

We can therefore reliably extrapolate that $12 million is the absolute minimum amount that was planned to be spent over six years. As pointed out by Doc James on the [public] WikipediaWeekly Facebook group – estimates presented to the board were in the range of tens-of-millions.
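As a rough sanity check on that floor, here is a back-of-the-envelope sketch of what the quoted 20% annual increase would imply. It assumes, and this is my assumption rather than anything stated in the grant document, that the $2.4 million figure is the 2015–16 budget and that the 20% growth continues unchanged for all six years.

```python
# Figures quoted above: $2.4 million for 2015-16, growing 20% per year.
first_year_budget_musd = 2.4
annual_growth = 0.20
# ASSUMPTION: six budget years (four stages of roughly 18 months each),
# with the same growth rate applied every year.
years = 6

budgets = [first_year_budget_musd * (1 + annual_growth) ** n for n in range(years)]
total_musd = sum(budgets)

for n, budget in enumerate(budgets):
    start = 2015 + n
    print(f"{start}-{(start + 1) % 100:02d}: ${budget:.2f} million")
print(f"Six-year total: ${total_musd:.1f} million")
```

Under those assumptions the third year comes out at about $3.46 million, in line with the quoted 2017–18 estimate of $3.5 million, and the six-year total is roughly $23.8 million. That is well above the $12 million floor and consistent with estimates in the tens of millions.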

The Signpost also revealed that the WMF hoped to fund the difference between their initial request to the Knight Foundation (let alone the much reduced amount they actually received) with “…funding from the Wikimedia Foundation’s general fund or from additional restricted grants”. The WMF “general fund” in this sentence can only mean the revenues raised through the annual fundraiser. This makes its deliberate absence from any documents shown to donors, or itemised in the annual plan, all the more concerning. It cannot be that this project was a secret in order that “Google doesn’t find out”. That would be a misunderstanding of our mission and values, and it also underestimates their intelligence – Google would notice anyway once we started actually building “it”. So, why the secrecy‽

It is inconceivable to me that the Executive Director would privately propose to an external partner that they would undertake a six-year project to build a search engine that will have massive cost, staffing, strategic and content implications – entirely without an official WMF strategy covering that period, no indication in the current annual plan, without the awareness of the community, and unclearly communicated to the Board. I find the fact that this could have been done to be a deep breach of our values – and not wholly unrelated to the current sudden exodus of long-serving senior members of WMF staff.

[The first post in this series “Strategy and Controversy” was published on January 8. Part 2 was published on January 30.]


Strategy and controversy, part 2

It’s been a busy time at Wikimedia Foundation HQ since my first post in this series, summarising the several simultaneous controversies and attempting to draw a coherent connecting-line between them. The most visible change is Arnnon Geshuri agreeing to vacate his appointed seat on the WMF Board of Trustees after sustained pressure; including a community-petition, several former Board members speaking out, and mainstream media attention – as summarised in The Signpost. This departure is notwithstanding the entirely unconventional act of Silicon Valley native Guy Kawasaki in voting against the petition to the Board despite the fact that he’s on the Board and that it was effectively his first public action relating to Wikimedia since receiving that appointment – as I described on Meta.

Although this news about Geshuri was well received, I feel that this controversy became the flash point because it was easily definable, and had a binary decision associated with it – go or stay. Most problems aren’t so neatly resolvable. Hopefully then, the fact that it is mostly resolved (pending the now highly sensitive task of finding his replacement) should allow focus to be drawn back to more fundamental issues of leadership.

Earlier this month The Signpost published details from the internal WMF staff survey:

We understand that there was a healthy 93% response rate among some 240 staff. While numbers approached 90% for pride in working at the WMF and confidence in line managers, the responses to four propositions may raise eyebrows:

  • Senior leadership at Wikimedia have communicated a vision that motivates me: 7% agree
  • Senior leadership at Wikimedia keep people informed about what is happening: 7% agree
  • I have confidence in senior leadership at Wikimedia: 10% agree
  • Senior leadership effectively directs resources (funding, people and effort) towards the Foundation’s goals: 10% agree

The Signpost has been informed that among the “C-levels” (members of the executive), only one has confidence in senior leadership.

A week later the head of the HR department Boryana Dineva – the person with arguably the most difficult job at the WMF right now – gave a summary of that survey in the publicly recorded monthly metrics meeting – starting at 42 minutes in:

Notice the complete absence of mention of the part of the survey which was highlighted by the Signpost? You’re not the only one. In the following Q&A came a question from Frances Hocutt, later paraphrased on-wiki by Aaron Halfaker – “Why are we not speaking clearly about the most concerning results of the engagement survey?”. Starting at 56 minutes in:

It is my supposition that the extremely low confidence in senior leadership among the staff including by the “C-Levels” is directly connected to both:

  1. a lack of clarity in the organisation’s strategic direction following a long period since the previous strategy expired and several false-starts (such as the 2-question survey), leading to sudden and unexplained departmental re-organisations, and delays in the current process.
  2. the organisation’s recent apparent failures to abide by its own organisation Values. Notably in this case, the values of “independence”, “diversity”, and “transparency”.

Anne Clin – better known to Wikimedians as Risker – neatly tied these two threads together earlier this month in her keynote to the WMF annual all-staff meeting. In a speech entitled “Keep your eye on the Mission” she stated:

Wikimedia watchers have known for quite a while that the Foundation has decided that search and discovery should be a strategic priority. It’s not clear on what this decision has been based, although one could shoe-horn it into the mission under disseminating information effectively and globally. It wasn’t something that was fleshed out during the 2015 Strategy community consultation a year ago, and it wasn’t discussed in the Call to Action. The recent announcement about the Knight Foundation grant tells us it is for short-term funding to research and prototype improvements to how people “discover” information on Wikimedia projects. No doubt Search and Discovery, which already has a large number of talented staff affiliated with it, will show up near the top of proposed strategic priorities next week when they are announced to the community – and will be assigned a sizeable chunk of the 2016-17 budget. The results of the Knight Foundation funded research probably won’t be available early enough to use it for budgeting purposes.

This is the only picture I can find of that speech – Anne at the lectern discussing “the board” 🙂

Arguably, she actually got that prediction wrong. Of 18 different approaches identified in the now-public strategic planning consultation process only one of them seems directly related to the search and discovery team’s work: “Explore ways to scale machine-generated, machine-verified and machine-assisted content”. It is also literally the last of the 18 topics listed (6 in each of reach, communities and knowledge) and is softened with the verb “explore” (rather than other items which have firmer targets to “increase”, “provide”, etc.). This quasi-hidden element of the strategy therefore invites the question – if this is such a small part of the documented strategy, why is “Discovery” receiving such disproportionate staffing, funding and attention? All of the projects listed on their portal and their three year plan are desirable and welcome, but the team is clearly staffed-up in preparation for significantly more ambitious efforts.

Anne again:

This mission statement was last revised in November 2012 – it is incorporated into the bylaws of the Wikimedia Foundation. And this revision of the mission statement occurred shortly after what many of us remember as the “narrowing focus” decision. Notice what isn’t included in the mission statement:

Not a word about the Wikimedia Foundation being a “tech and grantmaking organization”. While it is quite true that the bulk of the budget is directly linked to these two areas, the Board continues to recognize that the primary mission is dissemination of educational material, not technology or grants….

…Engineering – or as it is now called, “Product”, had three significant objectives set for it back in late 2012: develop Visual Editor, develop Mobile, and make a significant dent in the longstanding technical debt. The first two have come a long way – not without hiccups, but there’s been major progress. And there has been some work on the technical debt – HHVM being only one significant example. But the MediaWiki core is still full of crufty code, moribund and unloved extensions, and experiments that went nowhere. That’s not improving significantly; in fact, we’re seeing the technical debt start to build as new extensions are added that lose their support when someone changes teams or they leave the organization. Volunteer-managed extensions and tools suffer entropy when the volunteer developer moves on, and there’s no plan to effectively deprecate the software or to properly integrate and support it. There’s no obvious plan to maintain and improve the core infrastructure; instead the talk is all of new extensions, new PRODUCTS. From the outside, it looks like the Foundation is busy building detours instead of fixing the potholes in the highways.

It is my understanding that the original grant request to the Knight Foundation was MUCH larger than the $250,000 actually received. Jimmy Wales declared that concerns about the details of this grant are a “red herring” and that ousted Board member James Heilman’s concerns about transparency are “utter fucking bullshit” (causing James to announce he will soon be providing proof of his claims). Hopefully the grant agreement itself will be published soon, as Jimmy implied, so we can actually know what it is that has been promised.

It is worth noting that the “Call to action” mentioned above was part of the mid-2015 to mid-2016 Annual Plan, but that the risk assessment component of that plan was only published this week. Presumably this was written at the time but unintentionally left-off the final publication. Nevertheless, it includes some rather ironic statements when read in hindsight:

Risk: Failure to create a strong, consistent values-based work culture could cause valued staff to leave.

Mitigation strategies:

  • Establish initiatives that support our commitment to diversity and creating spaces for constructive, direct and honest communications.
  • Communicate and listen effectively with staff on values and initiatives undertaken.

Significantly, the WMF’s Statement of Purpose as described in its own bylaws, states that it will perform its mission “In coordination with a network of individual volunteers and our independent movement organizations, including recognized Chapters, Thematic Organizations, User Groups, and Partners”. This corresponds to the last of the official organisation Values: “Our community is our biggest asset”. At its meeting this weekend, the Board will have to determine whether the current executive leadership can demonstrate adherence to these avowed values – particularly coordination and transparency of its vision to the community and the staff – and is fit to deliver this latest strategy process.

[The first post in this Montgomerology* series “Strategy and Controversy” was published on January 8.]

Edit: Within the hour of publishing this blogpost, and one day before the board meeting, a “background on the Knowledge Engine grant” has now been published on Lila’s talkpage.

*Montgomerology: The pseudo-science of interpretation of meaning in signals emanating from the WMF headquarters at New Montgomery St., San Francisco. cf. Vaticanology or Kremlinology.


Strategy and controversy

Next week is Wikipedia’s 15th birthday, the first draft of the long awaited strategic plan of the Wikimedia Foundation will be published for comment, and yesterday was the start of its annual “all staff” meeting. Meanwhile… there is a battle going on at the top for its soul.

[WMF’s Executive Director Lila Tretikov with Board member Jimmy Wales]

It is my supposition that this is not a list of unrelated incidents, but that this is part of a wider theme: That a portion of the Board of Trustees and the Executive Director of the Wikimedia Foundation believe that it should be treated as a technology organisation in the style of a dot-com company, out of step with the staff and without the awareness of the community. By contrast, it’s always been my belief that the Wikimedia Foundation is an education charity that happens to exist primarily in a technology field. Of course software engineering is crucial to the work we do and should represent the major proportion of staff and budget, but that is the means, not the end.

All this background makes next week’s WMF draft Strategic Plan a very important document. For the 2010-15 plan there was a massive community consultation project but this time around there was only a 2-question survey. As Philippe Beaudette, the Community Facilitator on that original strategy process and latterly the WMF Director of Community Advocacy (who also recently left the organisation), said to me [with permission to publish here]:

The Wikimedia Foundation has one unique strategic asset: the editing community. Other orgs have great tech resources, tons of money, good software, and smart staff… but none of them have the editing community.  I am, frankly, saddened by the fact that this one unique strategic asset is not more central to the developing strategy.

The November staff presentation gives a strategy preview that speaks of three priorities (slides 28-30): “1. Engage more people globally (reach) 2. Facilitate communities at-scale (community) 3. Include broader content (knowledge)”; and describes a need to “prioritise core work” (slides 32-33). All laudable goals, but they only include “example objectives” such as “build capacity”, “improve trust”, and “improve tools”.

Nevertheless, I suspect that the major strategic direction has already been privately determined. In short, it appears there will be an attempt to create the internet’s Next Big Thing™ at the expense of improving the great thing that we already have.

  • In May, as noted by Risker, “Search and Discovery, a new team, seems to be extraordinarily well-staffed with a disproportionate number of engineers at the same time as other areas seem to be wanting for them.”
  • The June staff presentation “strategy preview” talks about creating a “knowledge engine where users, institutions and computers around the world contribute and discover knowledge”. The FAQ page for the “Discovery department”, describes this project as “…improving the existing CirrusSearch infrastructure with better relevance, multi language, multi projects search and incorporating new data sources for our projects.”
  • In September the Knight Foundation gave a grant of $250,000 to build a “knowledge engine”. This was announced by the WMF two days ago. This is a “restricted grant” but, as has been described by Pete Forsyth, there is none of the associated documentation – for example the formal grant deliverables – except for a short FAQ.
  • As mentioned above, we now have two new Silicon Valley executives appointed to the Board of Trustees. They join the previously appointed member of the board Silicon Valley venture-capitalist Guy Kawasaki, as well as internet entrepreneur Jimmy Wales himself. There is no one appointed for their professional experience in education, charities, communities or developing countries.

While I agree with the general premise that the search system on the Wikimedia projects can be improved, I don’t know anyone who thinks “an indexed & structured cache [of] Federated Open Data Sources” should be THE strategic priority. Starting something entirely new like Federated Search is HARD and trying to include external sources (that link also suggests trying to integrate the US Census and the DPLA) is even harder, especially when there are so many existing technical needs. Quoting Philippe again: “for instance, fixing the inter-relationships between languages and projects, or creating a new admin toolset for mobile, or paying down our technical debt, or establishing a care/command/control operation for critical tools to ensure their sustainability, etc….”.

The Funds Dissemination Committee (on which I sit as a community-elected member) declared in November that it is “…appalled by the closed way that the WMF has undertaken both strategic and annual planning, and the WMF’s approach to budget transparency (or lack thereof).” In response the WMF is considering submitting its 2016-17 annual plan, based on the aforementioned strategic plan, to a “process on-par with the standards of transparency and planning detail required of affiliates going through the Annual Plan Grant (APG) process”.

We will see over the next weeks to what degree the apparent shift towards a Silicon Valley mindset – whether the staff and community like it or not – is indeed true. As the then-Chair of the Board Jan-Bart de Vreede said in describing Lila Tretikov’s appointment as Executive Director,

We are unique in many ways, but not unique enough to ignore basic trends and global developments in how people use the internet and seek knowledge…I hope that all of you will be a part of this next step in our evolution. But I understand that if you decide to take a wiki-break, that might be the way things have to be.

Meanwhile, you might be interested in this three-year roadmap for the Discovery department; for the more technically minded there is the “Discovery” workboard on Phabricator and the associated mailing list. Finally, for what it’s worth, the term “knowledge engine” itself is now deprecated.

[Edit: Since publication, this blogpost has been linked from

Posted in Montgomerology, Wikimedia

A decade on Wikipedia – my 4 favourite unusual things

Recently I reached the milestone of 10 years editing Wikipedia. And what a ride it’s been…

I’ve not updated this blog for several years, and a lot has changed in my life since then: I got married, worked for the National Library of Australia, and now I live in Italy. I keep meaning to write here regularly, but the longer I leave it the more ‘significant’ I feel that first post needs to be. So, I’ll use this anniversary to post a list of my 4 favourite unusual things I’ve done on-wiki, and that will hopefully force me to start writing here again!

For the 800th anniversary of the signing of the Magna Carta, British artist Cornelia Parker created (and crowdsourced) a tapestry of how the Wikipedia article discussing the document looked precisely one year earlier. It’s a beautiful piece of work, and a really interesting artistic statement – called “Magna Carta (an Embroidery)“. The work went on display at the British Library as part of their celebration of the anniversary, and the BL also commissioned this fascinating documentary film:

[Fun fact – because the embroidery is a direct copy of a “share alike” licensed work (a Wikipedia article), the embroidery itself is also, therefore, licensed CC-By-SA!]

Naturally enough, the embroidery itself got its own Wikipedia article. So, when the BL uploaded the video to YouTube with a free-license, I took it, put it on Commons, and incorporated it to that Wikipedia article with the recursive edit summary above.

  • My favourite unusual biography: I created the youngest Wikipedia biography, at 7 months, 24 days BEFORE the subject’s birth, for the “Second child of the Duke and Duchess of Cambridge”.

I’ve created the articles about some very unusual people, most especially the accident-prone Australian aeronautical balloonist Henri L’Estrange. But it was the article now known as Princess Charlotte of Cambridge that had the most unusual circumstances. I started the article on the day that her mother was publicly announced as pregnant. Due to the child’s 4th position in succession to the British throne the biography was always going to be created at some point, but… how soon is too soon to create an article for something that will eventually be notable? After all, the pregnancy announcement itself was immediately worldwide news, therefore meeting the General Notability Guideline of “significant coverage in reliable sources that are independent of the subject”. As expected, the article was almost immediately, but ultimately unsuccessfully, nominated for deletion.

The debate raised some interesting philosophical arguments. For instance, at what point does discussion about the child deserve to be addressed as a topic in its own right rather than just as a sub-heading in the article “Catherine, Duchess of Cambridge”? There are interesting parallels with contemporary moral/legal/religious arguments about the point at which a foetus stops being part of its mother and becomes a separate entity…

The discussion also raised some amusing suggestions, such as whether the article should be moved to a different title. Options suggested included: “Princess NN of Cambridge” using the Latin term for unknown name; “Foetus of the Duke and Duchess of Cambridge” (or “Foetus of Cambridge” for short!) on the grounds that it was not yet a child; “Second pregnancy of the Duke and Duchess of Cambridge” for the same reason; or my absolute favourite, “Second declared pregnancy of Catherine, Duchess of Cambridge” to be indelicately precise!

The rector leaving St James’ after the ceremony

The article was promoted on the morning of the wedding itself, Australia Day 2014!

Moreover, our fantastic photographer – Prue Vickery – also agreed to release several images from the day to Commons, and they are now used in the articles for the church itself (this image above), as well as “eagle” and “Queen Victoria Building“. I bet there’s not many people who can say their wedding pics are used to illustrate Wikipedia articles! As it happens, I also make a cameo appearance in the “bells” section of the St James’ article – ringing at position no.5  🙂

In 2014 I created the Wikipedia article for “the perfect anti-object” – the Camden Bench. This is a type of street-furniture in London with design criteria entirely focused on what you cannot do with it rather than what you can, an exemplar of hostile architecture (an article I also created).

This embedded Google map shows a series of “Camden Benches” on Great Queen Street, in June 2012.

In describing the creation of this style of seat I needed to find a reference for when and where the benches were first installed, but all I had was the promotional website from the manufacturer itself – not, as is required, an independent source. Coincidentally, in April 2014, Google had introduced the ability to “go back in time” and view earlier street-view images of any specific location. Now, Google isn’t normally identified as a “reliable source” for Wikipedia’s purposes, but in the context of an image library that provides photos of the same streetscape taken over a period of several years… that makes for an excellent footnote for the most banal of facts – proving the existence of a bench!


I’ve started a lot of other unusual Wikipedia articles – from one about a solidified worm’s burrow to one on a children’s birthday cake book with a cult following, to one about a pneumatic tube service that once posted a cat! I even started Wikipedia-centric unusual articles Orangemoody and Monkey selfie (though both only as redirects).

What are your favourite unusual articles or occurrences that you’ve been involved with on Wikipedia?

Posted in Wikimedia

Norman Selfe

Tomorrow the English Wikipedia biography of Norman Selfe will be featured on the main page. Not only is Norman a fascinatingly interesting fellow, but the very fact of his biography getting to this point is the culmination of a free-culture policy I helped create five years ago.

Norman Selfe, image from the SLNSW collection. gpo1_17900

Norman was the president of both the Australian mechanical engineers’ and naval architects’ institutes as well as a member of both the British equivalent organisations. He was elected a full member of the English Institution of Civil Engineers and an honorary member of an American engineering association. He invented all sorts of things, including the first bicycle and refrigeration system in Australia. He founded the Royal Australian Historical Society, the Sydney Mechanics’ School of Arts as well as what has now become TAFE (Australia’s public vocational and technical training organisation – currently fighting to preserve its perpetually eroded funding). Selfe even won the competition to design the Sydney Harbour Bridge. The Sydney suburb of Normanhurst was named after him during his lifetime. As the article says, “He was acknowledged upon his death as one of the best-known people in, and greatest individual influences upon, the city of Sydney.”

Selfe’s winning design for the proposed Harbour Bridge c. 1903

As a staunch advocate for the provision of practical and technical education to the masses, and the preservation of history, I suspect that Norman would have been very pleased with Wikipedia, open-access, and “maker” culture. I reckon he was a Wikipedian before his time and this Feature Article is a fitting tribute to his legacy.

However, that’s only half the story.
The original article on which the Wikipedia biography is based was written by ABC Radio National producer Catherine Freyne for the Dictionary of Sydney in 2009. Here is the original publication. At the time I was also working at The Dictionary and was the person who wrote their copyright policy which includes the option for authors to license their work under Creative Commons (CC-By-SA). Crucially, this makes the article content able to be both cited in, and imported into, Wikipedia (as I blogged about at the time).

Selfe’s 1891 scheme for remodelling transport in Sydney’s Rocks district

Since then I’ve gone on to import several Dictionary of Sydney articles to Wikipedia including, in chronological order: Glebe Island; John Mather (artist); Sydney artists’ camps; Hugo Alpen; Florence Violet McKenzie (a “Good Article” about Australia’s first female electrical engineer); AWA Tower; Henri L’Estrange (also a “Good Article” about this accident-prone tightrope walker); Sydney Mechanics’ School of Arts; and of course Norman Selfe.

Tourists ride in a coal skip on the Selfe-designed rail incline – now the popular Sydney tourist destination “the Katoomba scenic railway”, 1915

After all these years, this is the first article that I have taken to Feature status – so it’s particularly special to me. Also, because I nurtured Norman through “new”, “Did you know?”, “Good Article” and “Feature Article” processes I was given the “Four Award”, of which I am especially proud. I’ve made several hundred edits since importing Norman, added copious footnotes and taken it through four peer reviews (1, 2, 3, 4), but underneath it is still Catherine’s work – and as such she gets attribution at the very bottom of the article too.

The “Four award“. There are currently only 423 other such articles.

The final reason I’m particularly happy about this article appearing on the main page on Saturday is because, serendipitously, that is also the day that voting opens on the Wikimedia Foundation Board of Trustees community elections – for which I am a candidate. I didn’t nominate Norman to be “Today’s Featured Article” so I’m hoping this is a positive omen for the elections! Active Wikimedians – don’t forget to vote!!

Posted in History, Wikimedia

ALRC Copyright Inquiry – Copyright in Public Domain Art

The Australian Law Reform Commission is currently undertaking a review of the Copyright Act – known as the Copyright and the Digital Economy Inquiry. The terms of reference and associated issues paper are broad and give tantalising hope that some genuinely positive user-centric reforms are being discussed. From the terms of reference:

Amongst other things, the ALRC is to consider whether existing exceptions are appropriate and whether further exceptions should:

  • recognise fair use of copyright material;
  • allow transformative, innovative and collaborative use of copyright materials to create and deliver new products and services of public benefit; and
  • allow appropriate access, use, interaction and production of copyright material online for social, private or domestic purposes.

Large copyright graffiti sign on cream colored wall

Currently, Australia is the only (?) country where kids viewing websites in a school classroom requires paying a fee. This is because a statutory license specifically for schools is written into the current law. I don’t mean closed-access journals or even subscription services, I mean just regular, ordinary websites such as blogs or news. With the increasing use of digital educational resources, the amount schools have to pay is taking up an increasing proportion of the Education Department’s ever decreasing budget. Equally, even though Australian local councils are required to publish people’s development applications for public consultation, they are also expected to pay a fee to put the plans on their websites as this would (apparently) otherwise be considered a copyright infringement!  Both cases make a mockery of the “digital economy” – you have to pay fees to do something digitally that would be free if it were on paper.

Unsurprisingly, the beneficiaries of these kinds of things are primarily the collecting societies (like CAL) who take a cut of each fee and invest the profits in lobbying for further fees and the general spreading of fear, uncertainty and doubt about copyright – that is, “copyFUD”. So, you can imagine that these kinds of organisations are none too happy with the prospect of a nice flexible Fair Use system like in the US to replace these license systems. It would kill the goose that lays their golden eggs.

Given the nature of the questions asked in the aforementioned Issues Paper, I’m cautiously optimistic that this review will mark the turning of the tide in how restrictive copyright has become. I believe we’ve now passed “peak copyright” and are starting to swing back away from the vested interests of “big content”. Even the US Republican party is making surprisingly awesome statements (WELL worth reading).

Meanwhile …

this year I’m studying for a WIPO sponsored Masters in Intellectual Property Law. For the thesis component of this course I chose to write about the subject that got me involved in copyright law in the first place, and ultimately caused me to create GLAM-WIKI: the highly contentious idea that a faithful reproduction of a copyrightable work creates a new, independent, copyright term. To give a specific example: what is the copyright status of the image below (a faithful reproduction of Tom Roberts’ Shearing the Rams (1890), one of Australia’s most recognisable paintings and important cultural artifacts) when it is created by its gallery owner and displayed on its website? Does it have a separate copyright status as a new work or is it Public Domain like the original? You will be unsurprised to learn that I firmly believe that the latter position is correct.

My thesis for the Masters offered a summary of the different approaches that museums in Australia, the UK and the US take on this issue, followed by an analysis of the legal precedents in those countries. It concluded with my recommendations for how the situation could be clarified in Australia. I am particularly indebted to my friend Kenneth Crews for his fantastic series of papers on this topic and to Europeana for walking-the-talk with their PD Charter and their Yellow Milkmaid report. Thanks also to Barry Szczesny (American Association of Museums Government Affairs Counsel) for making such a shameless speech – in my opinion the copyright-in-scans equivalent of the “47% comment“.

And… I received the results back the other day – an “I-couldn’t-be-any-happier” 90%!

Knowing that better people than I am would be making solid submissions on the “popular” topics, and that the copyright-in-scans issue would probably be forgotten, I decided to modify my thesis to make it into a submission for the ALRC review. So, mine is officially number 136, right down the bottom in the “from individuals” section of the official list of received submissions, between the eight (!) from the prolific Matthew Rimmer. You can read the whole thing by downloading that PDF, or viewing it on Scribd. I’ve also embedded it here (it’s licensed CC-BY). I would really appreciate it if you read it and leave any comments below.

I think it’s fairly clear that the courts in Australia, the UK and especially the USA are agreed that a mere mechanical reproduction of a Public Domain artwork is also PD – and the major museums are basically just hoping that no one will notice… (see “legal situation” beginning page 8 of the submission for details of this analysis). However, what I find problematic is that most digitisation is NOT merely mechanical as it often involves some element of post-production. Even though precedents such as Bridgeman v Corel couldn’t be clearer in stating PD=PD, neither it nor any other precedent says anything about what happens when the museum uses Photoshop, for example, to adjust the digital image in order to better replicate the original colours – the common practice. I argue that the entire point of the post-production is to imitate ever more closely the original work and remove any artistic interpretation (and therefore copyrightability) in the new digital image (see “post production” beginning page 14). Hiding behind post-production and combining that with copyright-like terms and conditions (and occasionally TPMs/DRM) are the methods by which museums often claim copyright in PD digitisations. The justifications for doing this are twofold – financial and ethical (see “justifications” beginning page 4).

There is increasing evidence that an image-licensing system is not profitable (see “counter-arguments” beginning page 5) but what truly makes me livid is when I am told that there is a need to claim Copyright “in order to preserve the integrity of the work”. It is indisputable that cultural institutions have a duty of care for their collections. I am not denying that they do. Moreover there are often donor-restrictions or indigenous cultural rights considerations to account for. However, it is never the curators or conservators – those who are charged with that duty – who express the desire to “preserve integrity” through restrictive digital access. Rather, it is the sales and marketing managers who make this argument, using the curators as a shield, so as to preserve privileged access over the collection in order to execute a business model based on enforced scarcity.

I often refer to this justification as “the tea towel problem” and Kathleen Butler brilliantly calls it “Keeping the World Safe from Naked-Chicks-in-Art Refrigerator Magnets: The Plot to Control Art Images in the Public Domain through Copyrights in Photographic and Digital Reproductions“. This is where “Integrity” concerns are conveniently ignored when it comes to the ways the art is remixed as merchandise and sold in the museum’s own shop.

For example, in the Louvre

In the end, I make four recommendations (elaborated in the submission, starting page 18):

  1. Declare a high originality threshold of copyright in works that specifically excludes effort, skill or expense as relevant factors.
  2. Stipulate that a recreator’s intent should be a key test in determining the threshold of originality for digitised works. This is in order to counteract claims that changes in post-production are inherently worthy of new copyright.
  3. Declare that the Public Domain cannot be “contracted out”. That is, agreements which purport to exclude or limit the rights associated with the statutory limitation of Copyright should not be enforceable.
  4. Finally, that it should not be considered an infringement to circumvent a TPM/DRM if the purpose of doing so is solely to access Public Domain works.

The ALRC is due to make its final recommendations in a year’s time, and will release a discussion paper in response to everyone’s submissions at some point before then. Here’s hoping my submission, or at least this particular issue, gets a guernsey!

I look forward to your comments on my thesis/submission below 🙂

Posted in copyright, museums, Wikimedia