Detecting Opinion Leaders in Twitter

Today, I want to share with you a project I worked on recently: detecting opinion leaders in Twitter.

Let me first introduce the concept of “opinion leaders”. An opinion leader is an influencer within their own social network who increases the traffic and usage of whatever they share on the web. For instance, on Twitter, when an opinion leader starts a hashtag or uses a hashtag already started by another user, that hashtag’s occurrence increases.

The concept of “opinion leaders” emerged in the 1940s and was further developed through the studies of Katz and Lazarsfeld. These authors developed the “two-step flow” model of communication. According to this model, mass media information is channeled to the “masses” through opinion leaders.

Today, opinion leaders in social media play a significant role in the world economy, especially in consumer economics, due to the following cycle:

– Consumers are connected through social media, which consists of word-of-mouth (WOM) communication links.

– The opinion leader shares his up-to-date consumption experience of a certain product with his network neighbors (his “followers” in the case of Twitter).

– “Bayesian belief networks”, graphical tools that aid decision making under uncertainty, depend on Bayes’ theorem, which states the following:

p(X|Y) = ( p(X) * p(Y|X) ) / p(Y)

In other words, the above equation gives the probability of X happening, given that Y has happened.

Suppose we have a hypothesis H, evidence E (for our hypothesis) and a context C (for the evidence). Bayes’ theorem then reads:

p(H|EC) = ( p(H|C) * p(E|HC) ) / p(E|C)

Expanding the denominator p(E|C) over the two cases H and ~H (not H) gives:

p(H|EC) = ( p(H|C) * p(E|HC) ) / ( p(E|HC) * p(H|C) + p(E|~HC) * p(~H|C) )

In this way, the dependency of the hypothesis on the evidence in the given context is measured. The Bayesian approach to social media therefore proposes that there are key actors (opinion leaders) who influence decision-making. As a result, there is a possibility that the followers’ purchase decisions will change according to the consumption experience of the opinion leader.
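
To make the formula concrete, here is a minimal Python sketch of the expanded form of Bayes’ theorem. The function name and all the probabilities are my own illustrative assumptions and are not taken from the project data.

```python
def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return p(H|EC) from p(H|C), p(E|HC) and p(E|~HC)."""
    # Total probability of the evidence in context C (the expanded denominator)
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    if p_e == 0.0:
        raise ValueError("the evidence has zero probability under both hypotheses")
    return prior_h * p_e_given_h / p_e


# Hypothetical numbers, chosen only for illustration:
# H = "a follower buys the product", E = "an opinion leader shared the product"
print(round(posterior(prior_h=0.10, p_e_given_h=0.60, p_e_given_not_h=0.20), 2))  # 0.25
```

With these made-up inputs, seeing the evidence raises the probability of the hypothesis from 0.10 to 0.25.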

Bearing in mind the significance of social media and my special interest in it, I started my project with the following hypothesis: “A keyword (hashtag) on Twitter experiences an increase in traffic when used by an opinion leader.”

Then I followed the methodology below to prove/disprove my hypothesis:

1. Collect data

2. Generate a traffic graph for each data set

3. According to the traffic graph, detect a peak before the instant the relevant hashtag became a trending topic, and determine a range just before that peak

4. For each data set, compare the Twitter users in the abovementioned ranges to find the common users.

The data is collected through the Twitter Search API. Here is the pseudocode for the first step:

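For each tweet in searchResult.ResponseObject
   // Log the raw tweet: time, user and text
   oSheet.Cells[querycounter,1] <- tweet.CreatedDate
   oSheet.Cells[querycounter,2] <- tweet.UserName
   oSheet.Cells[querycounter,3] <- tweet.Text
   minute <- minute of tweet.CreatedDate
   if minute = prevminute then
      // Same minute as the previous tweet: keep counting
      minutecounter <- minutecounter + 1
   else
      // Minute changed: write out the count for the minute that just ended
      // (+1 for its first tweet, which the counter does not include)
      oSheet.Cells[minuteindex,7] <- tweet.CreatedDate
      oSheet.Cells[minuteindex,8] <- minuteindex
      oSheet.Cells[minuteindex,9] <- minutecounter + 1
      minuteindex <- minuteindex + 1
      minutecounter <- 0
   prevminute <- minute
   lastId <- tweet.Id
   querycounter <- querycounter + 1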
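
For readers who prefer code to spreadsheets, the per-minute counting can also be sketched in Python. This is not the code I ran for the project; the function name and the timestamps are made up for illustration, and the sketch assumes the tweet times are already available as `datetime` objects.

```python
from collections import Counter
from datetime import datetime


def tweets_per_minute(timestamps):
    """Count tweets per calendar minute; returns (minute, count) pairs in time order."""
    # Truncate each timestamp to the minute, then tally identical minutes
    counts = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return sorted(counts.items())


# Made-up timestamps, for illustration only
sample = [
    datetime(2012, 5, 12, 20, 0, 5),
    datetime(2012, 5, 12, 20, 0, 40),
    datetime(2012, 5, 12, 20, 1, 10),
]
for minute, count in tweets_per_minute(sample):
    print(minute.strftime("%H:%M"), count)
```

Because it tallies by minute instead of comparing consecutive tweets, this version does not depend on the order in which the tweets arrive. Minutes without any tweets simply do not appear in the result.
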
In order to get concrete results, I chose a specific topic to collect related data.  Since soccer is a major topic of discussion among Twitter users in Turkey, I decided to work on a Turkish soccer team: Fenerbahçe.  Here is a data set:
The traffic data is plotted on a graph through Matlab as seen below: 

I developed two different methods for detecting a peak:

Method 1
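Find the time index x at which the traffic at least doubles compared with the previous minute, i.e. traffic(x) ≥ 2 · traffic(x − 1)
datalength = i;
int peakindex = 0;
// deneme ("trial" in Turkish) holds the per-minute tweet counts.
// Start at 1 so that deneme[j - 1] is always a valid index.
for (int j = 1; j < datalength; j++)
{
    // Traffic at least doubled; the last match wins
    if (deneme[j] >= deneme[j - 1] * 2)
        peakindex = j + 1;
}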
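
The same doubling rule can be sketched in a few lines of Python. The function name and the sample counts are hypothetical. Unlike the snippet above, this sketch returns a 0-based index, and it skips minutes with zero traffic, which is an assumption I added because any value is "at least double" zero.

```python
def doubling_index(counts):
    """Return the last index j with counts[j] >= 2 * counts[j - 1], or None."""
    peak = None
    for j in range(1, len(counts)):
        # Skip empty minutes: any value is "at least double" zero (my assumption)
        if counts[j - 1] > 0 and counts[j] >= 2 * counts[j - 1]:
            peak = j
    return peak


# Made-up per-minute tweet counts, for illustration only
print(doubling_index([3, 4, 5, 12, 14, 13]))  # 3, because 12 >= 2 * 5
```
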
Method 2
We first detect the highest peak in the traffic graph. The instant t of the peak is assigned the index x. Then the tweet data from the time t1, which corresponds to the index 0.9x, up to t is collected.
while (excelApp.Cells[i, 9].Value != null)
{
    deneme[i - 1] = excelApp.Cells[i, 9].Value;  // per-minute tweet count
    if (deneme[i - 1] > max)
    {
        maxindex = i - 1;
        max = deneme[i - 1];
    }
    i++;
}
// The range starts at 90% of the peak index (integer division)
halfstart = maxindex * 9 / 10;
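
Here is a comparable Python sketch of Method 2, assuming the per-minute counts are already in a list. The function name and the sample data are invented for illustration.

```python
def pre_peak_window(counts):
    """Return (start, peak): peak is the index of the maximum, start is 90% of it."""
    if not counts:
        raise ValueError("counts must not be empty")
    # max() returns the first maximum on ties, like the strict '>' comparison above
    peak = max(range(len(counts)), key=counts.__getitem__)
    start = peak * 9 // 10  # integer arithmetic, as in halfstart above
    return start, peak


# Made-up per-minute tweet counts with the highest value at index 20
counts = [2] * 20 + [50] + [3] * 4
print(pre_peak_window(counts))  # (18, 20)
```

The tweets between the two returned indices are the ones that would be examined for potential opinion leaders.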

The next step was to compare the data sets. I identified the users who used 2 or more of the 55 hashtags in the data sets.

The table on the left is a sample from the document “Memberlist”, which consists of the data collected from the time range determined in Step 3. The highlighted users have used at least 2 of the 55 hashtags.
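
The comparison itself can be sketched in Python as follows. The function name, user names and hashtags are invented for illustration, and the sketch assumes that the users found in each hashtag’s critical range have already been collected into a dictionary.

```python
from collections import Counter


def users_in_at_least(users_by_hashtag, k):
    """Return the users who appear in the critical range of at least k hashtags."""
    appearances = Counter()
    for users in users_by_hashtag.values():
        appearances.update(set(users))  # count each user once per hashtag
    return sorted(u for u, n in appearances.items() if n >= k)


# Invented user names and hashtags, for illustration only
sample = {
    "#tag_a": ["ali", "ayse", "ali"],
    "#tag_b": ["ayse", "mehmet"],
    "#tag_c": ["ayse", "ali"],
}
print(users_in_at_least(sample, 2))  # ['ali', 'ayse']
print(users_in_at_least(sample, 3))  # ['ayse']
```

Converting each user list to a set means that a user who tweets the same hashtag several times within a range is still counted once for that hashtag.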


In total, I analyzed 55 data sets, each covering a different hashtag, and 68,784 records, each consisting of the user, the time and the tweet.
Results of Method 1
•There are 280 users who have used at least 2 of the 55 hashtags during the critical time ranges.
•There are 83 users who have used at least 3 of the 55 hashtags during the critical time ranges.
•There are 14 users who have used at least 4 of the 55 hashtags during the critical time ranges.
•There are 4 users who have used at least 6 of the 55 hashtags during the critical time ranges.
Results of Method 2
•There are 239 users who have used at least 2 of the 55 hashtags during the critical time ranges.
•There are 49 users who have used at least 3 of the 55 hashtags during the critical time ranges.
•There are 14 users who have used at least 4 of the 55 hashtags during the critical time ranges.
•There are 4 users who have used at least 5 of the 55 hashtags during the critical time ranges.
•There are 2 users who have used 6 of the 55 hashtags during the critical time ranges.
With both methods, I came to the conclusion that there are 4 “most likely” opinion leaders, and 3 of them are common to both methods.

So, the results above support my hypothesis. The same analysis can be applied to any area of interest, including but not limited to politics in general, women’s rights, sports in other countries, the Arab Spring, TV series, shopping, etc. Detecting opinion leaders will also help companies with their marketing activities. For example, if work similar to what we have done so far is repeated on the consumption of cosmetics in Turkey, the companies concerned can use the data to reach out to the relevant opinion leaders, spread their campaigns and get significant customer feedback. Similarly, if you are planning a marketing activity for your new e-commerce website for a niche segment, this should definitely be your starting point. You’ll save money! You’ll save time and energy!


Pinar Bilgic
a novice poet who describes herself as an innovative and a creative thinker ; founder of; member of SOGLA (Social Entrepreneurs Young Leaders Academy); almost an entrepreneur


  1. Topic looks interesting but I don’t like math!! Could you please tell it more simple next time?? 🙂

  2. Great article, thanks! There would be further interesting findings if you could include some demographics in this exercise. Also, scope of this exercise could be enlarged by using the time (date) of signing up and number of followers that opinion leaders have.

  3. from beginning to end, i found the article very interesting. twitter definitely differs from facebook from what you’ve written here.. i don’t think it’s possible to come to such a concrete result with facebook.. twitter user have a lot more consciousness and a strategic approach.

  4. Companies will actually have a great use of such an analysis. I’m working for one of the biggest FMCG companies and unfortuantely, in our marketing activities through twitter, we fail to reach our target audience. It’s just a waste of money and waste of time. I’ll share this article with my colleagues at our IT department..

  5. Great! I liked it 🙂

  6. Any other solid examples?

  7. it’s funny that a female has done a project on soccer 🙂 cool!

  8. the twitter api you’ve used is sort of problematic. as far as i know, it doesn’t provide the full data, but gives a random set.

  9. serdar ertugrul

    +1 thomoson a

  10. if this was a school project, i wanna go to that school too
