Tuesday, March 04, 2014

How to Measure “Influence” In Social Networks

In social networks, new ideas and thoughts can spread quickly. Analyzing this diffusion can answer various questions, such as which topics are important or how quickly they propagate. It is of particular interest to find people who successfully share their own ideas and concepts. These people can influence others to change their behavior and introduce new terms into the communication network.

This post describes my master's thesis, whose goal was to find the most influential people in social networks. The thesis was written in a collaboration between MIT and the University of Applied Sciences Northwestern Switzerland, and its results are now implemented as new functionality in Condor 3. Various communication networks were used as test data to validate this new metric as a meaningful measure of influence.

Defining influence


Different applications use different definitions of the term “influence”. For example, the so-called Klout Score calculates influence based on the number of followers, the frequency of retweets and some other factors. Unfortunately, the exact calculation is proprietary, so the score cannot be compared with other values.

Despite these various definitions, each tries to measure whether a person can cause a certain behavior change in their environment. Often this behavior change is directly visible in the communication network, for example in the form of new discussion topics, retweets or changes in the structure of the network.

Observable behavior changes can be found in the language used by the people in the communication network, which changes over time. An influential person is able to introduce new ideas, beliefs and behavior patterns. Therefore, “influence” can be defined as the number of new terms, concepts and ideas that a person has introduced into the network and that are subsequently used by other members of the network.
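To make this definition concrete, here is a minimal sketch that counts, for each person, how many terms they were the first to use and that someone else used later. It is not the implementation from the thesis: the `Message` type, the naive whitespace tokenizer and the function names are hypothetical, and real text would also need stop-word filtering and stemming.

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    time: float  # any monotonically increasing timestamp
    text: str

def tokenize(text: str) -> set[str]:
    """Deliberately naive: split on whitespace, strip punctuation, lowercase."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return {w for w in words if w}

def introduced_terms_adopted(messages: list[Message]) -> dict[str, int]:
    """For each sender, count terms they used first that another person used later."""
    first_user: dict[str, str] = {}    # term -> person who used it first
    adopted: dict[str, set[str]] = {}  # person -> their terms picked up by others
    for msg in sorted(messages, key=lambda m: m.time):
        for term in tokenize(msg.text):
            if term not in first_user:
                first_user[term] = msg.sender
            elif first_user[term] != msg.sender:
                adopted.setdefault(first_user[term], set()).add(term)
    senders = {m.sender for m in messages}
    return {s: len(adopted.get(s, set())) for s in senders}
```

If Alice first writes "new showroom in Plymouth" and Bob and Carol later mention "showroom" and "Plymouth", Alice is credited with two adopted terms and the others with none.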

This definition of influence requires analyzing messages to measure their impact on the receiver. If the receiver of a message d writes new messages soon afterwards, he might have been influenced by the message he received. To determine whether or not a later message has been influenced by d, three things need to be checked:
  • Time difference to d
  • Similarity to d
  • Did the user’s behavior change in any way?
Influential messages provoke the receiver to send new messages soon afterwards, and those messages reuse some of the same words. In addition, it should be checked what kind of messages the person usually sends. For example, if someone always talks about Apple products and retweets nearly every tweet from Apple, then a new Apple-related tweet won’t have much influence on this person. It would be much more relevant if this person were to suddenly retweet a Google tweet about a new Android feature. In this case, the tweet by Google would be influential, as it even managed to get an Apple fan to tweet about the rival.
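The three checks can be combined into a single score per reply. The following is a minimal sketch under stated assumptions, not the formula used in the thesis: it uses exponential decay with a hypothetical 24-hour half-life for the time difference, Jaccard word overlap for similarity, and, for behavior change, the share of overlapping words the receiver had never used before. Multiplying the three factors is likewise an illustrative choice.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Word overlap between two messages: 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def influence_of(d_words: set[str], d_time: float,
                 reply_words: set[str], reply_time: float,
                 receiver_history: list[set[str]],
                 half_life: float = 24.0) -> float:
    """Hypothetical score for how strongly message d influenced a later reply.

    Times are in hours; receiver_history holds the word sets of messages
    the receiver sent before d.
    """
    dt = reply_time - d_time
    if dt < 0:
        return 0.0  # a message sent before d cannot have been influenced by it
    recency = 0.5 ** (dt / half_life)           # check 1: time difference to d
    similarity = jaccard(d_words, reply_words)  # check 2: similarity to d
    # check 3: behavior change = share of the shared words that are new for this receiver
    shared = d_words & reply_words
    seen = set().union(*receiver_history)
    novelty = len(shared - seen) / len(shared) if shared else 0.0
    return recency * similarity * novelty
```

With this rule, an Apple fan's reply to Google's Android tweet gets a positive score, while a reply to another Apple tweet that only reuses words the fan already uses scores 0.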

Test data

The new metric “influence” has been tested with various networks. The primary use-cases are Twitter and email networks. The following examples provide an overview of how the metric can be used to gain new information about a network.

Twitter: Swiss politicians


In Switzerland, approximately one-third of the members of Parliament have a Twitter account. But only some of them are interactive and involve many other people in the conversation. Others may have a large number of followers because of their political profile, but are not important in the Twitter network. The influence measure gives a good overview of who is active in the network and manages to introduce new topics and hashtags into it. The people who are influential in the network might not be the most famous politicians, but they are important in deciding which topics other politicians talk about.

The color indicates the political party and the node size the influence of the politician.

Twitter: BMW

By fetching all tweets about a given brand, it becomes possible to find important thought leaders who talk about the company or its products. For the brand BMW, we searched for the most important Twitter accounts over a short period of time (a single day in February 2014). In this time frame, the accounts @BMW and @BMW_Espana are very central in their subnetwork. However, the account BMW_Ocean was more influential, as it talked about a new showroom in Plymouth (England) where new BMW cars were presented. This caused a lot of discussion in the network about the showroom and the new models on display there. Even though BMW_Ocean is not very central to the network and doesn’t generate a lot of retweets, it was very successful in conveying its message. Only the metric “influence” captures this fact.
The image on the right shows the interesting part of the network, where Ocean_BMW managed to influence others.

Email: COINs Seminar

The course “Collaborative Innovation Networks”, or COINs for short, involves students from five universities who take the course at the same time: MIT, SCAD, Aalto University, University of Cologne and University of Bamberg. The students work together in cross-university project teams for the whole term/semester. A special feature of this course is that the students use the Condor software to analyze the email communication within their project teams. Throughout the course, all messages are cc’d to a dummy email address.

For the analysis, every member of the project teams in the 2012 and 2013 courses was asked the following question:

Who in your team had the greatest influence on the result of your project?

In total, 45 answers from 16 project teams with a total of 84 people were obtained. Since the question can be answered very subjectively, the answers in most teams are not unanimous. The data can be used for comparison with the calculated Influence values, but some uncertainty remains. Nevertheless, comparing the participants' responses against the calculated Influence scores for each project team does serve as a valid quality check.
Node size represents the amount of influence in the network.
The results showed a very strong correlation between the given answers and the results of the influence calculations. In 10 of the 16 teams, the person with the highest Influence score also received the most votes. In three other teams, the person with the highest Influence score received at least one vote, and only three teams showed no positive correlation between the number of votes and the Influence score. However, in one case not all communication was sent to the dummy Gmail address.
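The three outcomes used in this comparison can be written as a small rule per team. This sketch is hypothetical: the function name, the data layout (votes and scores keyed by member name) and the choice to count a tie for the most votes as a match are assumptions, not taken from the thesis.

```python
def classify_team(votes: dict[str, int], scores: dict[str, float]) -> str:
    """Compare peer votes with influence scores for one team.

    "match":   the member with the highest score also has the most votes
               (a tie for the most votes counts as a match here)
    "partial": that member received at least one vote
    "none":    that member received no votes
    """
    top_scored = max(scores, key=scores.get)
    most_votes = max(votes.values(), default=0)
    received = votes.get(top_scored, 0)
    if received > 0 and received == most_votes:
        return "match"
    if received > 0:
        return "partial"
    return "none"
```

Applied to all 16 teams, counting the three labels gives the 10/3/3 breakdown reported above.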

Simple network metrics such as betweenness centrality would not work in this case, because every member of a project team sent messages to everyone else. As a result, every team member would have a betweenness centrality of exactly 0. The influence calculation takes much more information into account and is therefore very accurate at identifying the important members of an email communication network.
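The zero-betweenness claim is easy to check for a fully connected team: every pair of members is linked directly, so no one lies on a shortest path between two others. The sketch below computes unnormalized betweenness centrality straight from its definition for a small directed graph given as an adjacency dict. It is a brute-force illustration with hypothetical member names, not Condor's implementation, and is only suitable for small graphs.

```python
from collections import deque
from itertools import permutations

def shortest_paths(adj: dict[str, list[str]], source: str):
    """BFS from source: hop distance and number of shortest paths to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:           # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj: dict[str, list[str]]) -> dict[str, float]:
    """Sum over ordered pairs (s, t) of the share of shortest s-t paths through v."""
    info = {v: shortest_paths(adj, v) for v in adj}
    score = {v: 0.0 for v in adj}
    for s, t in permutations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # t is unreachable from s
        for v in adj:
            if v in (s, t):
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path iff d(s,v) + d(v,t) == d(s,t)
            if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score
```

On a four-person team in which everyone emails everyone else, every score comes out as 0.0. In a star where two members write only to a hub, the hub scores 2.0 because it lies on both member-to-member paths.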

Conclusion


Including text analysis yields important insights into social networks. Calculating the influence of a single message and its direct impact on a receiver is a useful extension and generalization of existing approaches, which often work only for individual, predefined networks.

The biggest challenge is handling the variety of individual network properties that must be taken into account in order to convert the messages into a common schema for efficient analysis. However, this study demonstrates that these challenges can be overcome, and that it is possible to trace the diffusion of new ideas, words and concepts among users over time based on the content of their digital communication.

A disadvantage of the method is that it is not optimized for a particular network or for a specific language. In addition, the Influence calculation assumes that people have not used the identified keywords in earlier communication. This assumption may not always hold, because there is often not enough historical data reaching further back in time.

However, the selected test cases have demonstrated that a relatively wide range of applications can be covered with meaningful accuracy. In identifying the influential people in a communication network, the new content-based influence measure outperformed the common structural network measures.


Wednesday, February 19, 2014

Cowbird - it's all about love. Is it?

My friend Geoff Dutton just finished describing our analysis of Cowbird, an online community of storytellers, envisioned as an anti-Facebook by Jonathan Harris of wefeelfine fame.
We basically found that this is an intrinsically motivated group of writers who use writing as therapy to soothe the soul. It is a close-knit community, with writers doubling as readers of other people's stories, which they can "love". It seems that people write on Cowbird mostly to be "loved". Somewhat worryingly for Cowbird, the average number of loves per story seems to be going down, as are the overall positivity and emotionality of the stories.
Here is Geoff's story about Cowbird on Cowbird.

Tuesday, February 18, 2014

Wikihistory – Finding the World’s Leaders through the Ages through Wikipedia Social Networks


All software development has been done by Patrick de Boer

The goal of this project at the MIT Center for Collective Intelligence is to create an interactive history book of the most important people of all time, based on Wikipedia. As a first step towards that goal, we focus on the English Wikipedia and extract its 800,000 people pages. In future work we intend to repeat this process with Wikipedias in other languages to understand the key influencers over time in different cultures.

In this first prototype created from the English Wikipedia, all people pages are dated by extracting each individual's dates of birth and death. Moreover, the links originating from and pointing to their Wikipedia pages are gathered. Using this information, 4900 networks through history, from 3000 BC to 1900 CE, are calculated, as shown in figure 1. Of all the links originating from and pointing to a particular people page, only the links to and from people living at the same time as the person discussed on that page are included.
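The contemporaneity filter can be sketched as follows. This is a hypothetical illustration, not the project's code: the `Person` type, the convention of writing BC years as negative numbers and the function names are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str
    born: int  # hypothetical convention: BC years as negative numbers
    died: int

def lifetimes_overlap(a: Person, b: Person) -> bool:
    """True if the two people were alive at the same time for at least one year."""
    return a.born <= b.died and b.born <= a.died

def contemporary_links(people: list[Person],
                       links: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Keep only links (source page -> target page) between contemporaries."""
    by_name = {p.name: p for p in people}
    kept = set()
    for src, dst in links:
        a, b = by_name.get(src), by_name.get(dst)
        # links to undated pages are dropped as well
        if a is not None and b is not None and lifetimes_overlap(a, b):
            kept.add((src, dst))
    return kept
```

For example, with approximate life dates, a link from Plutarch (c. 46 to 120 CE) to Hadrian (76 to 138 CE) is kept, while a link to Pyrrhus (died 272 BC) is dropped.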


For instance, in the graph shown in figure 1 above, of all the links to the page about Plutarch, only the links from and to Hadrian, Caesar and Nero are kept, while the links to Pyrrhus, who died well before Plutarch was born, and to medieval historian Syncellus and modern historian Pisani are ignored. Repeating this process leads to 4900 unique networks. For each of these networks, the most central people are determined using the popular PageRank algorithm. As a second selection criterion among all the influencers, their indegree, i.e. the number of other people pages pointing to them, is used. The following list shows the top 50 most influential people of all time as ranked by Wikipedians, ordered by PageRank and then by indegree (the number in the indegree column).
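PageRank itself is a standard algorithm. The following is a minimal power-iteration sketch for one small network, together with an indegree count. The damping factor of 0.85, the fixed iteration count and the example links are assumptions for illustration; the post does not say which PageRank settings or implementation were used.

```python
def pagerank(links: set[tuple[str, str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power-iteration PageRank on a directed edge set of (source, target) pairs."""
    nodes = {name for edge in links for name in edge}
    if not nodes:
        return {}
    out = {v: [t for s, t in links if s == v] for v in nodes}
    size = len(nodes)
    rank = {v: 1.0 / size for v in nodes}
    for _ in range(iterations):
        # rank held by pages without out-links is spread evenly over all pages
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / size + damping * dangling / size for v in nodes}
        for v in nodes:
            for t in out[v]:
                new[t] += damping * rank[v] / len(out[v])
        rank = new
    return rank

def indegree(links: set[tuple[str, str]]) -> dict[str, int]:
    """Number of incoming links per page (pages with none are omitted)."""
    counts: dict[str, int] = {}
    for _, target in links:
        counts[target] = counts.get(target, 0) + 1
    return counts
```

On a three-page example where Hadrian and Nero both link to Plutarch and Plutarch links back to Hadrian, Plutarch gets the highest score and an indegree of 2.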


Rank  Name                               Indegree  PageRank
1     George_W._Bush                     4721      1
2     William_Shakespeare                3914      1
3     Sidney_Lee                         3093      1
4     Jesus                              2176      1
5     Charles_II_of_England              1519      1
6     Aristotle                          1400      1
7     Napoleon                           1361      1
8     Muhammad                           1123      1
9     Charlemagne                        949       1
10    Plutarch                           925       1
11    Julius_Caesar                      890       1
12    William_III_of_England             890       1
13    Homer                              820       1
14    Bede                               799       1
15    Athanasius_of_Alexandria           775       1
16    Dante_Alighieri                    755       1
17    Gautama_Buddha                     747       1
18    Tiberius                           697       1
19    Cyril_of_Alexandria                684       1
20    Bernard_of_Clairvaux               655       1
21    Moses                              645       1
22    Tacitus                            610       1
23    Edward_III_of_England              582       1
24    Justinian_I                        532       1
25    David                              522       1
26    Ashoka                             486       1
27    Origen                             337       1
28    Septimius_Severus                  334       1
29    Polybius                           307       1
30    Confucius                          302       1
31    Alexander_Severus                  278       1
32    Patriarch_Eutychius_of_Alexandria  276       1
33    Tutankhamun                        253       1
34    Akhenaten                          238       1
35    Ramesses_II                        228       1
36    Pope_Benjamin_I_of_Alexandria      172       1
37    Teti                               151       1
38    Amenemhat_II                       146       1
39    Pepi_II_Neferkare                  145       1
40    Merneith                           144       1
41    Terence                            142       1
42    Cato_the_Elder                     141       1
43    Charles_Martel                     116       1
44    Gilgamesh                          101       1
45    Deborah                            89        1
46    Lugalbanda                         68        1
47    Kubaba                             65        1
48    Fu_Xi                              12        1
49    Henry_I_of_England                 417       0.986383431
50    Petrarch                           254       0.981669694
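The ordering rule stated before the list (PageRank first, with indegree breaking ties) can be checked with a few of its rows. The sketch below is only an illustration: `rank_people` is a hypothetical helper, and the rows are the list's own values for four of the fifty people.

```python
# (name, indegree, PageRank) for four rows of the list above
rows = [
    ("Petrarch", 254, 0.981669694),
    ("Fu_Xi", 12, 1.0),
    ("Henry_I_of_England", 417, 0.986383431),
    ("Jesus", 2176, 1.0),
]

def rank_people(rows):
    """Sort by PageRank (descending), breaking ties by indegree (descending)."""
    return sorted(rows, key=lambda r: (-r[2], -r[1]))

for position, (name, indeg, pr) in enumerate(rank_people(rows), start=1):
    print(position, name, indeg, pr)
```

This reproduces the relative order these four entries have in the list: Jesus, Fu_Xi, Henry_I_of_England, Petrarch. Fu_Xi, with an indegree of only 12, still ranks ahead of Henry I because its PageRank is higher.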


These influencers are primarily politicians (kings and generals, in red), followed by religious leaders (black) and then by poets and historians (blue). It seems it pays to be a historian and write one's own place in history. This is clearly shown by Sidney Lee, a relatively minor Victorian professor of English and history, who wrote 800 biographies.

These networks can now be used to construct snapshots of the social networks of key leaders through the ages. The following picture shows the Wikipedia link network from 3000 BC to 2000 BC.

As we can see, the Egyptian Pharaohs dominate history in that age, complemented by a tight cluster of Sumerian kings, and a cluster of Chinese kings and princes.

Skipping 1000 years ahead, looking at 1000 BC to 0 BC, Alexander the Great is the dominant figure, surrounded by a tight cluster of patricians of the Roman Republic. The Chinese emperors form a group at the top, while the Indian emperor Ashoka is surrounded by other influencers from the Indian subcontinent.


Making another huge leap to 1800 CE and looking at the 19th century, the US takes center stage: Abraham Lincoln is the most influential person, surrounded by a roster of US poets and scientists. Queen Victoria and other European politicians and scientists form their own smaller, less tight-knit cluster, while Chinese and Southeast Asian kings occupy comparatively peripheral positions.



We can also combine these networks into a movie over the centuries. The movie below shows the world’s leaders from year 0 to year 500, calculated with Condor. As the movie shows, that age has two dominant clusters, with the Roman and the Chinese emperors at their centers. From 200 to 300 CE, the Chinese Golden Age of the Han dynasty, the Chinese cluster clearly surpasses the Roman cluster.



If there is one lesson from this preliminary experiment, it is the disproportionately large role of historians. Not only is a minor 19th-century biographer among the top 10 influencers of all time (which is of course more an artifact of our collection method), but classical historians like Polybius, Tacitus and Plutarch also get very high ranks. Treating biographers and historians well so they write positively about world leaders is of course no new insight. For instance, the Roman emperor Vespasian paid the historians Tacitus, Suetonius, Josephus and Pliny the Elder, and in return they speak suspiciously well about him, shaping his image in history. Caesar and Winston Churchill took this concept one step further by writing their history themselves. As today's history is written in Wikipedia, the conclusion seems obvious: treat Wikipedians well!