Monday, December 12, 2016

Can Social Media Analysis Debunk Fake News? – Analyzing “Pizzagate”

While fake news is nothing new – according to rumors, Elvis Presley is still alive, and Bigfoot has been sighted numerous times – social media allows susceptible people to spread unfounded and false rumors at the speed of light. Spreading false and damaging news is a proven campaign strategy of fanatics during elections. The “Swift Boat Veterans for Truth” campaign by conservatives, which falsely claimed that John Kerry, the 2004 Democratic presidential candidate, had behaved dishonestly during the Vietnam War, was seen as a key factor contributing to Kerry’s defeat in the swing states.

At the end of the 2016 US presidential election campaign, in early November, an even more absurd claim was made: that Hillary Clinton was running a pedophile ring out of a pizza restaurant in Washington. Called “pizzagate”, it became a favorite call to arms among right-wing extremists and Donald Trump supporters, leading one incensed fanatic to drive a few hundred miles from Salisbury, North Carolina, to Washington, DC, and fire his rifle inside the pizza restaurant. As most of this rumor spreading was (and still is) happening on social media, I was curious to see whether I could identify discernible characteristics of fake news on Twitter and the Web, using our social media analysis tool Condor and Coolhunting.

I started by creating the Wikipedia link map around the Pizzagate article on Wikipedia. The network below shows the links of Pizzagate to Donald Trump, as well as to Michael Flynn Jr., the son of Trump’s designated national security adviser Michael T. Flynn; Flynn Jr. was dismissed from the Trump transition team after a tweet supporting the conspiracy theory.

I then collected 18,000 tweets about “pizzagate” on December 10, 2016, at about 2 pm. Because the tweeting was so feverish, the most recent 18,000 tweets covered only about 8 hours, with between 40 and 120 tweets per minute about pizzagate. Not surprisingly, given the grisly topic, the sentiment of the tweets was rather negative, hovering around 0.4 (on a scale from 0 to 1, 0.5 is neutral, values above 0.5 are positive, and values below 0.5 are negative).
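Condor computes the tweet sentiment itself, and the sketch below does not try to reproduce its classifier. As a minimal illustration, assuming each tweet already comes with a timestamp and a sentiment score on the 0-to-1 scale described above, this is how per-minute activity and average sentiment (the two curves of an activity and sentiment chart) could be aggregated. The function names and the sample tweets are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Sentiment scale as described in the post: 0 = most negative, 0.5 = neutral, 1 = most positive.
NEUTRAL = 0.5


def label(score):
    """Map a sentiment score in [0, 1] to a coarse label."""
    if score < NEUTRAL:
        return "negative"
    if score > NEUTRAL:
        return "positive"
    return "neutral"


def per_minute(tweets):
    """Hypothetical helper (not part of Condor): group (timestamp, sentiment) pairs by minute.

    Returns {minute: (tweet_count, mean_sentiment)} in chronological order.
    """
    buckets = defaultdict(list)
    for ts, score in tweets:
        # Truncate the timestamp to the minute it falls into.
        buckets[ts.replace(second=0, microsecond=0)].append(score)
    return {m: (len(s), mean(s)) for m, s in sorted(buckets.items())}


if __name__ == "__main__":
    # Invented sample tweets, not taken from the pizzagate data set.
    sample = [
        (datetime(2016, 12, 10, 14, 0, 5), 0.3),
        (datetime(2016, 12, 10, 14, 0, 40), 0.5),
        (datetime(2016, 12, 10, 14, 1, 10), 0.4),
    ]
    for minute, (count, avg) in per_minute(sample).items():
        print(minute.strftime("%H:%M"), count, round(avg, 2), label(avg))
```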

The next picture shows the word cloud generated from the 18,000 tweets. Most of the words are dark red, indicating that they are used in a negative context. The word “Clinton” is in dark red, as the tweeters are mostly accusing Hillary Clinton of molesting children. The word “Trump” stands out in green, as they see him as their savior.
The picture below shows the Twitter network. Each node is a person tweeting; a link between two people means that one of them either retweeted a tweet sent by the other or mentioned the other in a tweet.
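The network construction described above can be sketched in a few lines of Python. This is a simplified illustration, not Condor’s implementation: it assumes each tweet is an (author, text) pair, treats every @handle in the text (including the one after “RT”) as a link, and ignores link direction and weight. The helper `build_network` and the example accounts are hypothetical.

```python
import re

# Matches @handles; Twitter handles consist of letters, digits and underscores.
HANDLE = re.compile(r"@(\w+)")


def build_network(tweets):
    """Hypothetical helper: build an undirected person-to-person network.

    tweets: iterable of (author, text) pairs. A link joins the author to every
    account the tweet retweets ("RT @user") or mentions ("@user").
    Returns {person: set_of_neighbours}; authors without links are kept as
    nodes with an empty set.
    """
    graph = {}
    for author, text in tweets:
        author = author.lower()
        graph.setdefault(author, set())  # every tweeter is a node, even if unlinked
        for handle in HANDLE.findall(text):
            other = handle.lower()
            if other != author:  # ignore self-mentions
                graph[author].add(other)
                graph.setdefault(other, set()).add(author)
    return graph


if __name__ == "__main__":
    example = [
        ("alice", "RT @bob: read this"),
        ("carol", "@alice @bob what do you think?"),
        ("dave", "tweeting into the void"),
    ]
    for person, neighbours in sorted(build_network(example).items()):
        print(person, sorted(neighbours))
```

In the example, “dave” ends up as a node without any links, which is exactly the kind of person who would appear in the periphery of the picture.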
There is a large cluster in the center of the network, made up of believers in the fake news. They reinforce each other, increasing the traffic in their echo chamber. The few supporters of Hillary Clinton who try to debunk the fake news are pushed aside; their tweets are ignored by the large echo chamber of conspiracy theory believers. The people in the periphery (the “asteroid belt”) are tweeting into the void, as their tweets are ignored by friends and foes alike.

Condor’s influencer algorithm reinforces this picture. It considers someone an influencer if the words he or she uses are picked up by others and spread quickly through the network. As the picture below shows, there is just one voice of reason left, while the proponents of pizzagate reinforce each other even more: a cluster of influential spreaders of wild ideas sits in the center, and other conspiratorialists in the periphery of the cluster are retweeted by hundreds of others (shown as “parachutes” in the graph).
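The sketch below does not reproduce Condor’s influencer algorithm. To make the idea of “words being picked up by others and spreading quickly” concrete, here is a toy word-adoption score under stated assumptions: the first person to use a word earns one point for every other person who uses the same word within a time window afterwards. There is no stop-word filtering and no network weighting, and `adoption_scores` and its `window` parameter are hypothetical names.

```python
from collections import defaultdict


def adoption_scores(tweets, window):
    """Toy influence score (an illustration, not Condor's algorithm).

    tweets: iterable of (timestamp, user, text); timestamps are numbers,
            e.g. seconds, and `window` uses the same unit.
    The first user of each word is credited with every *other* user who
    uses that word no later than `window` after its first appearance.
    Returns {user: score}, including users who scored 0.
    """
    ordered = sorted(tweets, key=lambda t: t[0])  # process in time order
    first = {}                  # word -> (time of first use, first user)
    adopters = defaultdict(set)  # word -> users who picked the word up in time
    for ts, user, text in ordered:
        for word in set(text.lower().split()):
            if word not in first:
                first[word] = (ts, user)
            else:
                t0, origin = first[word]
                if user != origin and ts - t0 <= window:
                    adopters[word].add(user)
    scores = {user: 0 for _, user, _ in ordered}
    for word, users in adopters.items():
        scores[first[word][1]] += len(users)
    return scores


if __name__ == "__main__":
    # Invented example tweets; timestamps in seconds.
    demo = [(0, "alice", "pizzagate is real"), (5, "bob", "pizzagate again")]
    print(adoption_scores(demo, window=10))
```

The time window is what separates words that “spread quickly” from words that merely happen to be reused much later.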

Comparing Fake News with Real News

Next, I wanted to explore whether the network characteristics of real (true) news differ from those of fake news. As a real news event, I chose the protests currently going on in North Dakota against an oil pipeline planned to run from oil fields in North Dakota to southern Illinois, crossing two rivers along the way, as well as sacred burial grounds of the Sioux Indians.

As the Google Trends chart below illustrates, the two events draw comparable numbers of search requests, with “pizzagate” slightly ahead of “dakota access pipeline” most of the time.
To drill down on the comparison, I collected 18,000 tweets about “nodapl”, the Twitter hashtag of the campaigners against the Dakota Access Pipeline. As the activity and sentiment chart below shows, the number of tweets is comparable to that for “pizzagate”, ranging from 25 to 110 per minute over the 14-hour period covered by the 18,000 tweets. The sentiment, however, is more positive than for pizzagate, mostly hovering clearly above the neutral mark of 0.5.

The word cloud about “nodapl” confirms this analysis, with most keywords embedded in a positive (green) context. Only the perceived police abuses in clashes with the protesters appear in a negative context.
The retweet network is quite different from the pizzagate network in that there is just one large connected component in the center of the graph, where the protesters exchange tweets. Unlike for pizzagate, there does not seem to be a war between two armies, the “pros” and the “cons”. The “asteroid belt” of “unrequited love”, that is, of tweets that went into the void and did not elicit any retweet or response, is smaller for the Dakota Access Pipeline than for pizzagate.
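Comparing the size of the “asteroid belt” across networks boils down to finding connected components. The sketch below is a hypothetical illustration: it takes an undirected adjacency dictionary (person mapped to a set of neighbours, with unconnected people mapped to an empty set) and, as a simplifying assumption, counts only completely unconnected people as the asteroid belt. Condor’s own definition may differ.

```python
from collections import deque


def components(graph):
    """Connected components of an undirected adjacency dict, largest first."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)


def asteroid_belt_share(graph):
    """Share of people with no link at all, i.e. no retweet or mention either way."""
    if not graph:
        return 0.0
    isolated = sum(1 for nb in graph.values() if not nb)
    return isolated / len(graph)


if __name__ == "__main__":
    g = {"a": {"b"}, "b": {"a"}, "c": set(), "d": set()}
    print(len(components(g)[0]), asteroid_belt_share(g))
```

The size of the largest component shows how much of the conversation happens in one connected core, while the isolate share measures how many people are tweeting into the void.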
The Twitter influencer network of “nodapl” is comparable in size to the pizzagate influencer network; both are shown in the figure below.
The density of the pizzagate network is somewhat higher than that of the nodapl network, meaning that the fake news spreaders are more tightly connected and heat each other up. On the other hand, the group degree centrality and the group betweenness centrality of the pizzagate network are much lower, which means that no small set of people dominates the network: there are no strong leaders, but a lot of communication among a wide group of information spreaders. This is different in the “nodapl” network, where a small group of activists dominates the discussion.
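The exact formulas Condor uses for its network-level measures are not given here. A common way to turn node-level degree and betweenness centrality into a single number for a whole network is Freeman’s centralization: add up how far every node falls short of the most central node, and divide by the largest possible value of that sum, which is reached by a star graph. A minimal standard-library sketch under that assumption, using Brandes’ algorithm for betweenness on an unweighted, undirected graph given as an adjacency dictionary:

```python
from collections import deque


def density(graph):
    """Share of all possible links that actually exist (undirected, no self-loops)."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(nb) for nb in graph.values()) / 2
    return edges / (n * (n - 1) / 2)


def betweenness(graph):
    """Brandes' algorithm for unweighted, undirected graphs (raw, unnormalized)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was counted once from each end.
    return {v: b / 2 for v, b in bc.items()}


def freeman_centralization(scores, max_sum):
    """Sum of differences from the most central node, divided by the star-graph maximum."""
    top = max(scores.values())
    return sum(top - x for x in scores.values()) / max_sum


def network_stats(graph):
    n = len(graph)
    if n < 3:
        raise ValueError("centralization needs at least 3 nodes")
    degrees = {v: len(nb) for v, nb in graph.items()}
    return {
        "density": density(graph),
        # Star-graph maximum for raw degree: (n-1)(n-2)
        "degree_centralization": freeman_centralization(degrees, (n - 1) * (n - 2)),
        # Star-graph maximum for raw betweenness: (n-1) * (n-1)(n-2)/2
        "betweenness_centralization": freeman_centralization(
            betweenness(graph), (n - 1) ** 2 * (n - 2) / 2
        ),
    }
```

A star graph, where one hub connects everyone, scores 1.0 on both centralization measures, while a fully connected graph has a density of 1.0 and a centralization of 0.0. Real networks fall in between: a higher centralization points to a few dominant voices, a lower one to many people talking with each other.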


Comparing the (fake) news network against a brand network

To further understand the differences in retweeting behavior, I next compared the fake and real news against the perception of a famous brand on the Internet; for this analysis I chose “adidas”. The chart below shows the Google Trends results comparing the three search terms. People search for “Adidas” much more than for “pizzagate” or “Dakota Access Pipeline”.
I next downloaded a comparable number of tweets about “adidas”. The 18,000 tweets about “Adidas” covered slightly more than 6 hours, sent at a rate of about 60 tweets per minute. The sentiment of these tweets is mostly positive (the blue line in the chart below hovers between 0.6 and 0.7).
The word cloud below confirms the positive impression, with almost all words in bright green (the greener a word, the more positive its context in the tweets). The only negative tweets are about “pirated” Adidas sneakers.
The brand retweet network is very different from the two news networks above. Many more tweets go into the void, i.e. they are in the “asteroid belt” and are never retweeted, and most likely never read. Many of these tweets are from Adidas vendors who are trying to peddle their shoes this way (with obviously limited success). The most central tweets, which are retweeted most frequently, are from celebrity athletes, most likely those sponsored by Adidas.
The Twitter influencer network of the brand differs even more from the news influencer networks. There are basically no influencers in this network other than the official Twitter account of Adidas, which is somewhat influential but pales in comparison to the fake news spreaders of pizzagate and the liberal activists of nodapl.
On the plus side for Adidas, there is almost no negative sentiment around the brand. I filtered the 18,000 tweets for those with a sentiment below 0.5, i.e. with negative words. As the word cloud below illustrates, even among these tweets the key phrase is “adidas is killing the game”, which is meant as a compliment: Adidas sneakers are so good that they beat all competitors by a mile.
This impression is confirmed by the retweet network picture below, where none of the negative tweets is picked up or retweeted, with the exception of the compliment “Adidas is killing the game”.
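The negative-sentiment filter described above amounts to a threshold on the sentiment score followed by a word count, which is the raw material for a word cloud. A minimal sketch, assuming each tweet is a (text, sentiment) pair on the 0-to-1 scale; the tiny stop-word list and the function name are illustrative only, not Condor’s.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real word clouds use much longer lists.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "it", "this", "my"}


def negative_word_counts(tweets, threshold=0.5, top=10):
    """Hypothetical helper: count words in tweets whose sentiment is below `threshold`.

    tweets: iterable of (text, sentiment) pairs with sentiment in [0, 1].
    Returns the `top` most common words as (word, count) pairs.
    """
    counts = Counter()
    for text, score in tweets:
        if score < threshold:  # keep only tweets on the negative side of neutral
            words = re.findall(r"[a-z']+", text.lower())
            counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)


if __name__ == "__main__":
    # Invented example tweets and scores.
    demo = [
        ("adidas is killing the game", 0.3),
        ("adidas is killing it", 0.4),
        ("love my adidas", 0.9),
    ]
    print(negative_word_counts(demo))
```

Because a lexicon-based score only sees words such as “killing”, a phrase that is meant as praise can still land below the threshold, which is exactly what happens with “adidas is killing the game”.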
Finally, I used Condor’s Web fetcher to measure the strength of the 4 different brands, also including “Donald Trump” and “Hillary Clinton” in the analysis.
Pizzagate and Trump are the strongest brands, and “Dakota Access Pipeline” is the weakest. The news outlet giving the strongest boost is the Wall Street Journal.

So, coming back to our original question, can we automatically identify fake news through social media network analysis? 
Unfortunately, the answer is: not directly. However, we can quickly identify the echo chamber that is brewing the fake news. We can also easily identify the most influential conspiratorialists, the super spreaders responsible for the stickiness of the fake news. In addition, we can bring the conflict between promoters and detractors of a disputed fact into the open, clearly showing the size and dynamics of the two opposing camps.
And finally, strong popular brands like Adidas can also learn from the passion of pizzagate believers and Dakota Access Pipeline activists how not to dilute their brands.