Friday, December 24, 2010

Latest News Through Wikipedia - Wikipedians are the real Citizen Journalists

People have long predicted the demise of traditional news media and the rise of citizen journalists. Various initiatives have tried to create new media outlets on the Web, in blogs, and on Twitter, powered by creative swarms of hobby journalists, but none of them has been a breakthrough success so far. Well, it turns out that such a citizen journalism Web site already exists: venerable old Wikipedia!

In a series of earlier projects we have analyzed collaboration among Wikipedia authors when creating new Wikipedia articles, for example studying how they collaborate as COINs in different cultures (http://www.ickn.org/documents/COINS2010_Nemoto_Gloor.pdf).
In our current project we are creating a map based on Who-works-with-whom-on-Wikipedia (the "W5-map"). We build a semantic network of concepts by constructing a link between two Wikipedia articles if the same author has worked on both articles. This W5-map shows us which kinds of articles the swarm flocks to. By repeating this process for every month in 2010 we are able to see how the W5-map changes over time.
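
The link construction can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code used in the project: the input format (a list of editor/article pairs), the function name `build_w5_edges`, and the toy editors are all hypothetical. The `min_shared_editors` parameter lets the threshold for drawing a link be raised above one shared author.

```python
from collections import defaultdict
from itertools import combinations


def build_w5_edges(edits, min_shared_editors=1):
    """Return {(article_a, article_b): number of shared editors}.

    edits: iterable of (editor, article) pairs (hypothetical input format).
    """
    # Collect the set of articles each editor has worked on.
    articles_by_editor = defaultdict(set)
    for editor, article in edits:
        articles_by_editor[editor].add(article)

    # Every pair of articles touched by the same editor gains one shared editor.
    shared = defaultdict(int)
    for articles in articles_by_editor.values():
        for a, b in combinations(sorted(articles), 2):
            shared[(a, b)] += 1

    # Keep only the pairs that reach the threshold.
    return {pair: n for pair, n in shared.items() if n >= min_shared_editors}


# Toy edit log with made-up editors.
edits = [
    ("alice", "WikiLeaks"), ("alice", "Julian Assange"),
    ("bob", "WikiLeaks"), ("bob", "Julian Assange"),
    ("bob", "2010 Asian Games"),
]
edges = build_w5_edges(edits)
strong_edges = build_w5_edges(edits, min_shared_editors=2)
```

In this toy log only the pair "Julian Assange" / "WikiLeaks" has two shared editors, so it is the only link that survives the stricter threshold.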

As the whole of Wikipedia includes millions of articles, drawing a complete map of Wikipedia in one step is not feasible. Instead we employed a "snowball sampling" method, which allows us to draw a partial map by selecting a start article or editor. For our first experiment, we used the article about "Wikipedia" as the starting point. We collected the top 10 editors based on the number of edits on this article, then we gathered the top 10 articles of each editor. We repeated these steps recursively up to 3 degrees of separation from the start point. Restricting this analysis to a certain period of time (e.g. one month starting Jan. 1, 2010) permits us to obtain a temporal W5 map from this start point. Applying this process repeatedly, we calculated 11 one-month snapshots from Jan. 2010 to Nov. 2010. Each node corresponds to an article in Wikipedia. We draw an edge between articles A and B if there are at least 2 editors who made edits on both article A and article B.
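
A minimal sketch of the snowball sampling step, assuming the edit counts for one time window are already available in memory. The data format, the function name `snowball_sample`, and the toy editors are hypothetical, and the sketch treats one "degree" as one hop from an article to its top editors and on to their top articles, which is an interpretation of the description above rather than a confirmed detail.

```python
from collections import Counter


def snowball_sample(edit_counts, start_article, top_k=10, max_degree=3):
    """Return the set of articles reached from start_article.

    edit_counts: {(editor, article): number of edits} for one time window
    (hypothetical input format).
    """
    editors_of = {}   # article -> Counter of edits per editor
    articles_of = {}  # editor -> Counter of edits per article
    for (editor, article), n in edit_counts.items():
        editors_of.setdefault(article, Counter())[editor] += n
        articles_of.setdefault(editor, Counter())[article] += n

    sampled = {start_article}
    frontier = [start_article]
    for _step in range(max_degree):
        next_frontier = []
        for article in frontier:
            # Top editors of this article, ranked by number of edits.
            for editor, _n in editors_of.get(article, Counter()).most_common(top_k):
                # Top articles of each of those editors.
                for other, _m in articles_of[editor].most_common(top_k):
                    if other not in sampled:
                        sampled.add(other)
                        next_frontier.append(other)
        frontier = next_frontier
    return sampled


# Toy data with made-up editors and edit counts.
edit_counts = {
    ("alice", "Wikipedia"): 5, ("alice", "WikiLeaks"): 3,
    ("bob", "WikiLeaks"): 4, ("bob", "Julian Assange"): 2,
    ("carol", "Julian Assange"): 1, ("carol", "Lady Gaga"): 7,
    ("dave", "Lady Gaga"): 2, ("dave", "Telephone (song)"): 1,
}
sample = snowball_sample(edit_counts, "Wikipedia")
one_hop = snowball_sample(edit_counts, "Wikipedia", max_degree=1)
```

In the toy data, "Telephone (song)" is four hops away from "Wikipedia" and is therefore left out of the three-degree sample.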

The pictures below show our results. Each map was drawn with Gephi, and the size of each article title was determined by the undirected PageRank score of the article in the W5 network. The major topics (based on PageRank score) for each month are shown below. Surprisingly, they reflect the major news items of the month:

Jan. 2010: 2010 Haiti earthquake
Feb. 2010: 2010 Winter Olympics
Mar. 2010: 2010 Polish Air Force Tu-154 crash
Apr. 2010: Telephone (song)
May 2010: Gaza flotilla raid
Jun. 2010: 2010 FIFA World Cup
Jul. 2010: 2010 FIFA World Cup
Aug. 2010: 2010 Israel-Lebanon border clash
Sep. 2010: 2010 Atlantic hurricane season
Oct. 2010: Copiapo mining accident
Nov. 2010: United States diplomatic cables leak
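
The ranking step can be illustrated with a small power-iteration PageRank on an undirected graph. The maps themselves were scored in Gephi; the sketch below is a stand-in under assumptions, with a hypothetical function name, a damping factor of 0.85 that the post does not state, and a made-up star-shaped toy graph.

```python
def undirected_pagerank(edges, damping=0.85, iterations=100):
    """PageRank by power iteration, treating every edge as bidirectional."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbors:
            # Each neighbor passes on an equal share of its current rank.
            incoming = sum(rank[nb] / len(neighbors[nb]) for nb in neighbors[node])
            new_rank[node] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Made-up toy graph: one hub article linked to three others.
toy_edges = [
    ("2010 Haiti earthquake", "Haiti"),
    ("2010 Haiti earthquake", "Port-au-Prince"),
    ("2010 Haiti earthquake", "Earthquake"),
]
scores = undirected_pagerank(toy_edges)
top_article = max(scores, key=scores.get)
```

The article with the highest score in a monthly snapshot would be the one reported as that month's major topic.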

Furthermore, we can also find clusters of articles, representing a group of similar topics (e.g. a cluster on Lady Gaga or on WikiLeaks).

This means that groups of similarly minded Wikipedians tend to aggregate around a set of articles on a topic they are most interested in.


Looking at Nov. 2010, the United States diplomatic cables leak was strongly connected to WikiLeaks and Julian Assange, which makes perfect sense because both of them are part of the WikiLeaks dispute. Bombardment of Yeonpyeong had many edges from the WikiLeaks cluster but no edges from the 2010 Asian Games cluster, which suggests that the Wikipedians working on the Bombardment of Yeonpyeong article are interested in the diplomatic problem rather than in Asian topics in general.

Our preliminary investigation suggests that looking at Wikipedia through the W5 map might be a new way to identify the latest news. We find the news of the world even if we start from a neutral article such as the one about "Wikipedia". The swarm of Wikipedians seems to be a perfect group of coolhunters and citizen journalists for reporting the latest news on politics, celebrities, and sports.

Sunday, December 19, 2010

How Much Are People Smiling in the US, Germany, and Switzerland?

Who is happier: people in Switzerland, in Germany, or in the US? To answer this question, I looked at the use of smileys in Twitter tweets. Smileys are those emoticons used to express one’s emotions, like
:) smile
:D big grin
:( sad, frown
:P sticking the tongue out, “raspberry”

My hypothesis is that the larger the fraction of happy smileys :) and :D in all tweets containing emoticons is, the happier people in this region are.
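
As a rough sketch, the happiness measure from this hypothesis could be computed as follows. The function name and the example tweets are made up, the emoticons are matched by plain substring search, and a tweet that contains both a happy and an unhappy emoticon is counted as happy; these are simplifying assumptions, not details taken from Condor.

```python
HAPPY = (":)", ":D")
ALL_EMOTICONS = (":)", ":D", ":(", ":P")


def happy_fraction(tweets):
    """Fraction of emoticon tweets that contain a happy emoticon, or None."""
    # Only tweets with at least one of the four emoticons are considered.
    with_emoticon = [t for t in tweets if any(e in t for e in ALL_EMOTICONS)]
    if not with_emoticon:
        return None
    happy = [t for t in with_emoticon if any(e in t for e in HAPPY)]
    return len(happy) / len(with_emoticon)


# Made-up example tweets.
tweets = [
    "Great day :)",
    "Missed my train :(",
    "lol :P",
    "Snow! :D",
    "no emoticon here",
]
fraction = happy_fraction(tweets)
```

Here two of the four emoticon tweets are happy, so the fraction is 0.5; the tweet without an emoticon is ignored.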

Using Condor’s Twitter collector, I collected 24 hours worth of tweets containing the smileys listed above in 6 cities in three countries: New York and Los Angeles (USA), Berlin and Hamburg (Germany) and Zurich and Berne (Switzerland). I collected all tweets inside a radius of 25 kilometers around the geocoordinates of these 6 cities returned by Google.
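
To make the 25 kilometer radius concrete, here is a small sketch of a great-circle distance filter using the haversine formula. How Condor's collector applies the radius internally is not described here, so this is only an illustration; the tweet format (a dict with `lat` and `lon` keys), the function names, and the approximate coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(tweets, center, radius_km=25.0):
    """Keep tweets whose coordinates lie within radius_km of center."""
    lat0, lon0 = center
    return [t for t in tweets
            if haversine_km(lat0, lon0, t["lat"], t["lon"]) <= radius_km]


# Approximate coordinates, for illustration only.
zurich = (47.3769, 8.5417)
tweets = [
    {"text": "near Zurich :)", "lat": 47.50, "lon": 8.72},
    {"text": "in Berne :D", "lat": 46.9480, "lon": 7.4474},
]
nearby = within_radius(tweets, zurich)
```

With these coordinates only the first tweet falls inside the 25 kilometer circle around Zurich; the second one is far outside it.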

The table below lists the results, showing the number of people using each emoticon in each city, as well as the betweenness centrality of the emoticon in the social network of people using it.

As we can see, far fewer people are tweeting in Berne than in New York, which makes perfect sense considering the number of inhabitants of Berne (130,000) compared to the roughly 19 million of the New York metropolitan area.

I constructed the retweet network in Condor, drawing a link from person A to person B, if B retweeted A (see network picture above). The table only lists the number of people, ignoring the number of tweets per person, as I was interested in the emotional state of each person.
The picture below visualizes the results. Percentages in the pie charts for each type of smiley are based on betweenness centrality of the people using these smileys. This also accounts for the influence of somebody who for example used two different types of smileys and is being retweeted a lot.
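
The weighting behind the pie charts can be sketched as follows, assuming the betweenness centrality of each person in the retweet network has already been computed (for example by Condor). The function name and the numbers are made up, and counting a person's full centrality toward every smiley type they used is an assumption about how people with several smileys are handled.

```python
from collections import defaultdict


def smiley_shares(betweenness, smileys_used):
    """Share of total betweenness attributed to each smiley type.

    betweenness: {person: betweenness centrality in the retweet network}
    smileys_used: {person: set of smileys that person tweeted}
    """
    totals = defaultdict(float)
    for person, smileys in smileys_used.items():
        for smiley in smileys:
            # Assumption: the full centrality counts toward each smiley used.
            totals[smiley] += betweenness.get(person, 0.0)

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {smiley: value / grand_total for smiley, value in totals.items()}


# Made-up centrality scores and smiley usage.
betweenness = {"ann": 0.6, "ben": 0.3, "cy": 0.1}
smileys_used = {"ann": {":)", ":D"}, "ben": {":("}, "cy": {":P"}}
shares = smiley_shares(betweenness, smileys_used)
```

In this toy example the heavily retweeted "ann" dominates the result: ":)" and ":D" each receive 37.5% of the total, although only one of the three people used them.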

A few things immediately stand out:

(1) The Europeans seem much happier than the Americans!
(2) Germans seem slightly happier than the Swiss, although not by much.
(3) People in Hamburg are the happiest (68% happy smileys ":)" and ":D"), followed by the people in Zurich.
(4) People in Berne have the biggest smile (30% have ":D").
(5) People in New York are the least happy (23% of ":("), by a large margin over all other cities.
(6) People in LA are the most skeptical (27% sticking their tongue out ":P").

When looking at the most active tweeter in each of the cities, it is striking that most are young girls and artists, mostly from Indonesia. For example, the most emotional person in Hamburg (130 tweets) is “Bijiganja”, an Indonesian singer and “sinner”, as can be read on his profile on Myspace. The most emotional tweeter in Berne is a girl from Brazil. This means that the good mood in Switzerland and Germany might actually be imported from other regions of the world, where people traditionally are more extroverted than the somewhat reserved Germans and Swiss.

This is very different in the US. In New York, the most active emotional tweeter is a disc jockey and radio host, mostly promoting himself, while actress and singer Ciara is the most active tweeter in LA. This suggests that Twitter in the US is used much more as a platform for (commercial) self-promotion, although not a particularly happy one!
Let’s hope that the mood will also pick up in the US. After all, there are a lot of people from Asia and Latin America here who might improve the collective mood!