Graph your Twitter network with Gephi

by Guido García on 21/01/2012

Gephi is a really cool open-source (GPL) project for visualizing and analyzing network graphs.

Getting started

If you want to start using Gephi, you have two choices:

  • The blue pill: the Gephi GUI, a desktop application that is pretty easy to use and offers many network metrics and statistical algorithms (clustering, etc.) to analyze your own graphs. The story ends.
  • The red pill: the Gephi Toolkit, a standard Java library that you can integrate with your own code when you need to analyze graphs programmatically. That is what we need.

Both of them are highly modular and can be extended with a wide variety of plugins contributed by third-party developers.
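If you take the blue pill instead, you still need to get your data into the GUI. One option, sketched here under our own assumptions, is to write a GEXF file (the XML graph format Gephi opens natively) from a list of follower edges. The `GexfWriter` class and its `{source, target}` edge representation are hypothetical; this is not how the project works, since it uses the toolkit directly.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: serializes a directed follower graph as GEXF 1.2 XML. */
class GexfWriter {
    // Escape the characters that are not allowed verbatim in XML attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Each edge is a {source, target} pair, e.g. {follower, followed}. */
    static String toGexf(List<String[]> edges) {
        // Collect distinct node ids in first-seen order.
        Set<String> nodes = new LinkedHashSet<>();
        for (String[] e : edges) {
            nodes.add(e[0]);
            nodes.add(e[1]);
        }
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<gexf xmlns=\"http://www.gexf.net/1.2draft\" version=\"1.2\">\n");
        sb.append("  <graph defaultedgetype=\"directed\">\n");
        sb.append("    <nodes>\n");
        for (String n : nodes) {
            String id = escape(n);
            sb.append("      <node id=\"").append(id)
              .append("\" label=\"").append(id).append("\"/>\n");
        }
        sb.append("    </nodes>\n");
        sb.append("    <edges>\n");
        int i = 0; // GEXF requires a unique id per edge
        for (String[] e : edges) {
            sb.append("      <edge id=\"").append(i++)
              .append("\" source=\"").append(escape(e[0]))
              .append("\" target=\"").append(escape(e[1])).append("\"/>\n");
        }
        sb.append("    </edges>\n");
        sb.append("  </graph>\n");
        sb.append("</gexf>\n");
        return sb.toString();
    }
}
```

Writing the result with `java.nio.file.Files.writeString(Path.of("network.gexf"), xml)` (Java 11+) gives you a file you can open from Gephi's File > Open menu.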

Crawling your Twitter network

Twitter offers a REST API. To access it I used Spring Social, an extension of the Spring Framework that simplifies connecting to social networks such as LinkedIn, Facebook or Twitter (docs).

For example, obtaining the list of followers of a given user is as simple as:

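import java.util.List;
import org.springframework.social.twitter.api.TwitterProfile;
import org.springframework.social.twitter.api.impl.TwitterTemplate;

TwitterTemplate twitter = new TwitterTemplate(); // no credentials: unauthenticated, rate-limited calls
List<TwitterProfile> result = twitter.friendOperations().getFollowers("palmerabollo");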
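Fetching one user's followers only gives you the first level of the network. A minimal sketch of crawling further, breadth-first and one level at a time, follows. It assumes a hypothetical `Function<String, List<String>>` that returns follower screen names (for example by wrapping `getFollowers` and mapping each `TwitterProfile` to `getScreenName()`), plus a hypothetical `maxRequests` budget standing in for the rate limit. It is an illustration, not the code in the repository.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical breadth-first crawler over "who follows whom". */
class FollowerCrawler {
    private final Function<String, List<String>> followersOf; // e.g. wraps getFollowers(screenName)
    private final int maxRequests;                             // stand-in for the API rate limit
    private int requests = 0;

    FollowerCrawler(Function<String, List<String>> followersOf, int maxRequests) {
        this.followersOf = followersOf;
        this.maxRequests = maxRequests;
    }

    /** Returns {follower, followed} edges for up to maxDepth levels around root. */
    List<String[]> crawl(String root, int maxDepth) {
        List<String[]> edges = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add(root);
        List<String> frontier = List.of(root);
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String user : frontier) {
                if (requests >= maxRequests) {
                    return edges; // budget exhausted: deeper levels stay unexplored
                }
                requests++; // one API call per account whose followers we fetch
                for (String follower : followersOf.apply(user)) {
                    edges.add(new String[] {follower, user});
                    if (visited.add(follower)) {
                        next.add(follower); // expand each account only once
                    }
                }
            }
            frontier = next;
        }
        return edges;
    }

    int requestsUsed() {
        return requests;
    }
}
```

Each level costs one request per account in the frontier, so the number of requests grows quickly with depth; with a small budget the crawl stops long before the outer levels are reached.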

The bad news is that the API is rate limited to 150 requests per hour; authenticating your requests with OAuth raises that limit. Even so, it is too low (I think Twitter is trying to protect its data from being extracted), which forced me to add a basic cache layer (just a Map) to save some requests. The cache is coupled with the application code, so it could certainly be improved.
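A Map-based cache in front of the API can be kept separate from the rest of the code by wrapping the lookup itself. The sketch below assumes follower lists are represented as screen names and that the real call is passed in as a hypothetical `Function<String, List<String>>`; the `CachedFollowers` class is ours, not the project's.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical Map-backed cache in front of a rate-limited lookup. */
class CachedFollowers implements Function<String, List<String>> {
    private final Function<String, List<String>> remote; // the real (rate-limited) API call
    private final Map<String, List<String>> cache = new HashMap<>();
    private int misses = 0;

    CachedFollowers(Function<String, List<String>> remote) {
        this.remote = remote;
    }

    @Override
    public List<String> apply(String screenName) {
        // Only hit the API on a cache miss; repeat lookups are served from memory.
        return cache.computeIfAbsent(screenName, name -> {
            misses++;
            return remote.apply(name);
        });
    }

    int misses() {
        return misses;
    }
}
```

Because it implements `Function`, the cache can wrap any lookup without the calling code knowing it exists, which is one way to remove the coupling mentioned above. Note that entries never expire, which is acceptable for a one-off crawl but not for a long-running service.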

Final result

You can find the code on GitHub. Remember that this is a proof of concept, so there is plenty of room for improvement. I am waiting for your pull requests.

Lessons learned and future work

  • The Twitter API is very restrictive. A cache mitigates the problem, but it is still not possible to retrieve the deepest levels of your network. It would be fun to store the data in a NoSQL graph database such as Neo4j, which even has a Gephi plugin available. This question I asked in the Gephi Forum is a good starting point if you are interested in it.
  • The Gephi Toolkit and its plugins are not available from a Maven repository, so I included them as system-scoped dependencies in my pom.xml. This is the first time I have done that, and it is probably bad practice. I would like to hear your opinion about this.
  • Gephi is a really interesting and powerful project, but its documentation and examples could be improved.
  • It would be very interesting to play with other advanced Gephi features such as filtering and clustering, for example to detect “communities” or the most influential nodes in your network.
  • Spring Social simplifies your life by offering a common interface to different social networks, saving you the pain and hassle of dealing with every different API out there.
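Gephi ships its own statistics for finding influential nodes. As a much simpler, toolkit-free stand-in, ranking accounts by in-degree (how many followers each one has inside the crawled graph) could look like the sketch below. The `InDegreeRanking` class and the `{follower, followed}` edge format are hypothetical, and in-degree is only a rough proxy for influence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: ranks accounts by in-degree within the crawled graph. */
class InDegreeRanking {
    /** Edges are {follower, followed} pairs; returns accounts sorted by descending in-degree. */
    static List<Map.Entry<String, Integer>> rank(List<String[]> edges) {
        Map<String, Integer> inDegree = new HashMap<>();
        for (String[] e : edges) {
            inDegree.putIfAbsent(e[0], 0);         // accounts nobody follows still appear
            inDegree.merge(e[1], 1, Integer::sum); // one more follower for the followed account
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(inDegree.entrySet());
        // Highest in-degree first; ties broken alphabetically for a stable order.
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return ranked;
    }
}
```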

By the way, if anyone else is interested in offering this service through a web interface, please let me know.

 

There are 4 comments in this article:

  1. 5/12/2012, Chris Kang says:

    I am interested in trying out your code. Could you e-mail me or write a detailed walkthrough on how to use your code from GitHub? Thanks. (I am still learning how to use Gephi.)

  2. 5/12/2012, Guido says:

    I have just added the three required steps to the README on GitHub. Please let me know if that answers your question. Thank you for your interest, and do not forget to contribute to the project if you can!

  3. 28/03/2013, John says:

    Hello! This is a great starting point for what I’m looking to do with Gephi. I’m taking a Social Network Analysis class on Coursera.org that might help you too, in terms of learning more about how to use the tool.

  4. 15/03/2014, Check Yo Self: 5 Things You Should Know About Data Science says:

    […] If you’re making Gephi graphs out of tweets, you’re probably doing more data science marketing than data science analytics. And stop […]
