Thursday, November 29, 2012

Leveson report: the topics, people and key words in numbers

The Leveson report is out. How many times does it mention 'statutory' compared to 'self-regulation', and do 'failings' dominate 'successes'?

Lord Justice Leveson poses with a summary report into press standards
Lord Justice Leveson has published his report setting out recommendations for the future of press regulation in the UK. The full document is an impressive 1,987 pages long, and contains over one million words.

We have been through the full text finding how often various words and phrases were mentioned, in an attempt to convey the report's tone as well as the topics and people that appear most frequently.
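
The article does not say which tool was used for the search. As a rough illustration only, the kind of case-insensitive substring count described above could be sketched in Python as follows; the function name `count_terms` and the sample sentence are made up for this example and are not taken from the report.

```python
import re

def count_terms(text, stems):
    """Count case-insensitive occurrences of each stem anywhere in the text."""
    counts = {}
    for stem in stems:
        # re.escape keeps characters such as "-" in "self-regulat" literal
        pattern = re.compile(re.escape(stem), re.IGNORECASE)
        counts[stem] = len(pattern.findall(text))
    return counts

# Hypothetical sample text, not a quotation from the report
sample = (
    "The regulator failed. Failure of self-regulation is a public matter; "
    "the public expects success."
)
result = count_terms(sample, ["fail", "public", "regulat", "success"])
print(result)
```

Matching on a stem such as "fail" also picks up "failed" and "failure", which is how the grouped search terms in the data summary appear to work.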

Perhaps unsurprisingly, references to failure are almost three quarters more numerous than those to success: words containing the text "fail" outnumber the combined total of "success"es and "succeed"s by about 74% (542 to 312).
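
The size of that gap can be checked against the totals in the data summary. A minimal sketch of the arithmetic, with variable names chosen purely for illustration:

```python
fail_count = 542      # "fail" and its variants, from the data summary
success_count = 312   # "success" plus "succeed", from the data summary

ratio = fail_count / success_count
percent_more = (ratio - 1) * 100

print(f"{ratio:.2f} times as many, or {percent_more:.1f}% more")
```
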
Of terms related to the subject matter, "public" was one of the most frequently used, appearing on average more than twice per page.

Uses of either "regulate", "regulation" or "regulator" were almost as numerous, with an average of at least one (1.39 to be precise) appearing on every page. Narrowly behind were references to "police", at 1.3 mentions per page.

Interestingly, uses of "statutory" and "statute" were very slightly ahead of "self-regulation", "self-regulate" and other such variations, with a total of 16 more references to the former than the latter throughout the document as a whole.

"Private" and "privacy" together cropped up about four times for every five pages on average, while "standards" appeared slightly more than once every other page.

"Murdoch" dominated as far as names were concerned, averaging 44.6 mentions per 100 pages, ahead of "Cameron" (26.6), "Hunt" (22), "Blair" (12.2) and "Brooks" (11.8) among others.

It has been pointed out that of all 1,987 pages, only one is devoted to the internet. This is mirrored in the search term data, with "internet" appearing on average less than once every ten pages.

Below is a list of the terms we've searched for. Can you spot any interesting topics, people or words we have missed?
Data summary
Pages, words, characters and search terms

Item/search term                                    Total    Appearances per page (average)

Pages                                               1,987
All words                                       1,026,098
Characters (excluding spaces)                   5,795,996
"public"                                            4,804    2.42
"regulate", "regulation", "regulator"               2,761    1.39
"police"                                            2,578    1.3
"private" + "privacy"                               1,583    0.8
"data"                                              1,070    0.54
"standards"                                         1,057    0.53
"Murdoch"                                             887    0.45
"hacking", "hacked" (excluding Hacked Off)            583    0.29
"fail" (...ure, ...ed)                                542    0.27
"statutory", "statute"                                532    0.27
"Cameron"                                             529    0.27
"self-regulat" (...e, ...ion, ...ors)                 516    0.26
"ethics"                                              488    0.25
"Hunt" (Jeremy, not Lord Hunt)                        437    0.22
"phone hacking"                                       431    0.22
"success", "succeed"                                  312    0.16
"Blair" (Tony)                                        242    0.12
"Brooks"                                              234    0.12
"legislation"                                         207    0.1
"internet"                                            194    0.1
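
The per-page averages in the summary are simply each total divided by the 1,987 pages. A short sketch that reproduces a few of them is shown here; the dictionary layout and names are assumptions made for the example, while the counts are copied from the summary.

```python
PAGES = 1987  # page count given in the data summary

# A handful of totals copied from the data summary
counts = {
    "public": 4804,
    "regulate/regulation/regulator": 2761,
    "police": 2578,
    "Murdoch": 887,
    "internet": 194,
}

# Average appearances per page, rounded to two decimal places
per_page = {term: round(total / PAGES, 2) for term, total in counts.items()}

for term, average in per_page.items():
    print(f"{term}: {average}")
```

Multiplying an average by 100 gives mentions per 100 pages, for example about 45 for "Murdoch". Note that this is a rate of mentions, not the share of pages on which the name appears.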
