Monday, January 02, 2017

Fact-checking Fake News - "It's easy to lie with statistics; it is easier to lie without them."

What is fact? And what is fiction? What one person sees as a fact, somebody else sees as fake news. Depending on their political orientation and cultural background, people quickly categorize news as fake or fact.

When right-wing fanatics constructed “pizzagate” at the beginning of November 2016, they claimed that the owners and customers of a popular pizza restaurant in Washington were running a covert pedophile operation directed by a group of people around Hillary Clinton. The mainstream press agreed that this was a fake news smear campaign constructed to damage Hillary Clinton’s reputation and the liberal agenda. Nonetheless, a significant group of the US population took the rumor at face value; see my previous blog post.

Even “facts” published in highly respected newspapers such as the New York Times can be seen as fiction by other news media. For instance, a recent New York Times article on a report produced by US government agencies depicted whistleblower Edward Snowden as a puppet of Russian spy agencies. The report listed various claims by US intelligence agencies as “facts”, which, according to other journalists, were not true.

In God we trust. All others must bring data (W. Edwards Deming)

To make sense of emerging news and to decide whether to categorize it as fact or fiction, it would be useful to track its origin and identify the main promoters of a particular news item. Harvard statistician Gary King and his colleagues have done just that, tracking the flow of fake news in China. According to Chinese urban myths, there are up to 2 million microbloggers in China who are paid “50cent” per post by the Chinese government to drown out critical voices on social media and spread news favorable to the government. In a research paper, King and his team identified the “50cent” microbloggers spreading news supporting the Chinese communist party on Sina Weibo and other Chinese blogs. They grouped the posts into five categories: (1) taunting of foreign countries, (2) argumentative praise, (3) non-argumentative praise, (4) factual reporting, (5) cheerleading. Applying sophisticated statistical and machine learning methods to an e-mail archive leaked from the Internet Propaganda Office of Zhanggong district, they showed that these “50cent” bloggers primarily engage in a massive amount of positive cheerleading with little to no central oversight, to some extent debunking the urban myth of a vast shadow army of bloggers at the beck and call of the Chinese government.

However, the key problem with the analysis of Gary King and his team is that the tools they used are so complex that only somebody with a graduate degree in statistics has a chance to understand them, and nobody except the team doing the analysis has full insight into the results. As Winston Churchill reputedly said, “Do not trust any statistics you did not fake yourself.” The average reader thus has close to zero chance of actually understanding why the statisticians came to their conclusion. It therefore boils down to trust: does the reader trust the conclusions of the analyst/statistician/journalist?

Faith-based and Science-based Belief Systems

As has been repeatedly shown, humans are much more likely to trust and accept as true any news that is close to their own beliefs and values. This means that whether a particular news item is accepted as fact or as fiction depends very much on the belief system of the individual. Each individual has to decide for herself or himself what is fact and what is fiction.

At least for the Western world, I therefore group the major belief systems into two opposite stereotypes:
  • Faith-focused: Believing in God, nationalistic, supporting the military, less formal academic education.
  • Science-focused: Believing in science, political correctness, with advanced academic (college) education.

In the US electorate, there is a high overlap between the faith-focused segment and Republicans, while the science-focused demographic leans more Democratic. As conservative radio show host Rush Limbaugh said, “…fake news is the everyday news”. In Limbaugh's words about the mainstream media, “… they just make it up.”

Tracing the Source of Rumors – Turning it into Fake or Fact

To make up one’s own mind about a new rumor, it is therefore extremely helpful to see who is supporting a particular claim, and to find out where it originates. For example, the talk pages of Wikipedia articles are an excellent starting point for drilling down on fake news. For instance, this piece of fake news about the Berggruen Institute - a perfectly legitimate institution - posted right on the Institute's Wikipedia talk page claims that the Berggruen Institute is a “shill for US intelligence/related functions”.

The following example using Condor Coolhunting illustrates how to find the influencers behind a rumor, in this example about “fake news” itself, and shows how to identify their belief system:

To gain a quick overview of the most influential people tweeting about “fake news” in the sense of Rush Limbaugh, I collected 18,000 tweets on December 27, 2016 with the hashtag #fake2016facts. The picture below shows the retweet network. Note the connected component in the core, with just three people being highly central, and the “asteroid belt” in the periphery of the people whose tweets are being ignored and going into the void.
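The structure described above (a connected core with a few highly central accounts, surrounded by accounts nobody retweets) can be illustrated with a small sketch. This is not Condor's code; the function name `retweet_network` and the input format, a list of (retweeter, original author) pairs plus a list of all users, are assumptions made for this example only.

```python
from collections import Counter, defaultdict


def retweet_network(retweets, all_users):
    """Summarize a retweet network (illustrative sketch, hypothetical API).

    retweets:  iterable of (retweeter, original_author) pairs
    all_users: iterable of every user who tweeted with the hashtag
    Returns (times_retweeted, largest_component, ignored_users).
    """
    times_retweeted = Counter()    # in-degree: how often each author is retweeted
    neighbors = defaultdict(set)   # undirected adjacency, for connectivity only

    for retweeter, author in retweets:
        times_retweeted[author] += 1
        neighbors[retweeter].add(author)
        neighbors[author].add(retweeter)

    # The "asteroid belt": users with no retweet link to anybody.
    ignored_users = sorted(u for u in all_users if u not in neighbors)

    # Find the largest connected component with a depth-first search.
    seen = set()
    largest_component = set()
    for start in list(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) > len(largest_component):
            largest_component = component

    return times_retweeted, largest_component, ignored_users
```

Sorting `times_retweeted` by count gives a crude centrality ranking; real network tools offer richer measures such as betweenness centrality, which this sketch does not attempt.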

When running Condor’s influence determination algorithm, which looks at who injects new words into the discussion first and how quickly these words are picked up by others, we find that the most influential people are not the same as those identified in the previous picture. Rather, a new group of influencers emerges, which is also part of the connected component in the center, but somewhat more peripheral in the network. Their tweets are picked up by more prominent and popular bloggers, who then spread them through the rest of the twittersphere.
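To make the idea of "who introduces new words first, and how quickly others pick them up" concrete, here is a toy approximation. It is not Condor's actual algorithm, whose details are not given here; the function `influence_scores`, the 24-hour window, and the 1/(1 + delay) weighting are all assumptions chosen purely for illustration.

```python
from collections import defaultdict


def influence_scores(tweets, window_hours=24.0):
    """Toy word-introduction influence score (hypothetical, not Condor's method).

    tweets: list of (hours_since_start, user, text) tuples.
    A user earns credit each time somebody else reuses a word that this
    user introduced; faster pick-up earns more credit.
    """
    first_use = {}                 # word -> (time, user) of first appearance
    scores = defaultdict(float)

    # Process tweets in chronological order.
    for time, user, text in sorted(tweets, key=lambda t: t[0]):
        for word in set(text.lower().split()):
            if word not in first_use:
                first_use[word] = (time, user)   # this user introduced the word
                continue
            introduced_at, introducer = first_use[word]
            delay = time - introduced_at
            # Credit the introducer only for reuse by others within the window.
            if user != introducer and delay <= window_hours:
                scores[introducer] += 1.0 / (1.0 + delay)

    return dict(scores)
```

A score like this rewards early, quickly echoed vocabulary rather than raw retweet counts, which is one way an account at the edge of the core can outrank a more popular one.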

Looking at the content of the tweets about fake2016facts, we find that the tweeters like Trump, Obama, and Jesus (shown in green), and loathe Hillary Clinton, election, Russia, Russians, and (some) Americans (shown in red), but not America. Black words are neutral.

Next, I analyzed the self-descriptions of the people tweeting about fake2016facts. Words like Trump, America, Christian, God, Family, and Mom appear in a positive context (shown in green), while words like conservative, politics, and lists are also popular but are used in a negative context (shown in red).

To summarize, it seems that people tweeting about #Fake2016Facts - showing a high distrust of mainstream media - are predominantly part of the faith-based belief system.

GalaxyScope  -  Our Web Tool to Find Influencers

We have created an early prototype of a tool that lets anyone enter a few keywords describing a “fake news candidate” and see who has been speaking about it on Twitter, where it was mentioned on Wikipedia, and on which blogs and Web sites it prominently appears. The screenshot below shows the search results for “pizzagate”.


Green nodes are Wikipedia pages, orange nodes are Twitter users, and blue nodes are Web sites and people mentioned on Blogs and Web sites.
The picture below shows another fake news candidate, looking at the social media network emerging from the search for “DNC hack”, the suspected break-in by the Russian secret service into the e-mail server of the Democratic National Committee right before the 2016 US Presidential Elections.

You can try it out for yourself by visiting “” and clicking on “people scope”. Let me know if you find some interesting fake news networks.
