Littera Deusto

Modern Languages, Basque Studies and Humanities

Information mining with Twitter

October 11th, 2010

People from all over the world are constantly posting on Twitter, sharing details of their personal lives, comments on recent events, their opinions on any subject, etc. The fact that all of this information is publicly available has many advantages, especially when it is organized by hashtags. Ross (2010) describes hashtags as “a simple way of grouping messages with a ‘#’ sign followed by a name or code which forms a unique tag for a specific purpose.”
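As a rough illustration of how hashtags organize messages, the Python sketch below groups a handful of tweets by tag. The sample tweets, the function name `group_by_hashtag` and the simplified pattern (a ‘#’ followed by letters, digits or underscores) are all assumptions made for this example; Twitter's own rules for recognizing hashtags may differ.

```python
import re
from collections import defaultdict

# Simplified pattern: '#' followed by letters, digits or underscores.
# This is an assumption for illustration, not Twitter's official rule.
HASHTAG_RE = re.compile(r"#(\w+)")


def group_by_hashtag(tweets):
    """Map each lowercased hashtag to the list of tweets that contain it."""
    groups = defaultdict(list)
    for text in tweets:
        # A set avoids listing the same tweet twice under one tag.
        for tag in {t.lower() for t in HASHTAG_RE.findall(text)}:
            groups[tag].append(text)
    return dict(groups)


# Invented sample tweets, for illustration only.
sample_tweets = [
    "Great keynote this morning #dh2010",
    "Off to the parks with the kids #Disney",
    "Slides from my talk are online #dh2010 #twitter",
]

groups = group_by_hashtag(sample_tweets)
for tag in sorted(groups):
    print(tag, len(groups[tag]))
```

Lowercasing the tags means that #Disney and #disney end up in the same group, which is usually what a reader following a topic would expect.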

Indeed, if we can locate the tweets connected to the topic we are interested in, we can keep up with the latest news or even follow a conference we are not attending. Researchers can study how people communicate with each other, what concerns they have and how they express them. Companies may look for hashtags related to a particular brand or item in order to access their customers’ responses to different products.

Although some hashtags are both easily identifiable and predictable (e.g. #disney), tags formed by confusing acronyms or made-up words might not become as successful. A popular site that lets you look up real-time statistics on the usage of any hashtag is hashtags.org. The picture below shows an example of the type of graph it generates:

Though hashtagging is a good way to collect tweets related by theme, there are other online tools which can help us analyze data by taking into account the entirety of the Twitter message. This includes emoticons, which are far from meaningless. Bifet (2010), writing about the opportunities that the online streaming of thoughts offers to opinion mining studies, considers that:

Sentiment analysis can be cast as a classification problem where the task is to classify messages into two categories depending on whether they convey positive or negative feelings. (…) Labeling tweets manually as positive or negative is a laborious and expensive, if not impossible, task. However, a significant advantage of Twitter data is that many tweets have author-provided sentiment indicators: changing sentiment is implicit in the use of various types of emoticons. Hence we may use these to label our training data. Smileys or emoticons are visual cues that are associated with emotional states.
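The labeling idea in the quotation can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure used in the cited study: the emoticon lists, the function name `label_by_emoticon` and the sample tweets are invented for the example, and tweets with no emoticon or with mixed emoticons are simply left unlabeled.

```python
# Hypothetical emoticon lists; a real study would use a longer inventory.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(", "=(")


def label_by_emoticon(text):
    """Return 'positive', 'negative', or None when the cue is absent or mixed."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None


# Invented sample tweets, for illustration only.
sample_tweets = [
    "Loved the concert :)",
    "Missed my train :(",
    "Reading the news",
    "Happy :) but tired :(",
]

# Keep only the tweets that received a label; these could serve as training data.
training_data = []
for tweet in sample_tweets:
    label = label_by_emoticon(tweet)
    if label is not None:
        training_data.append((tweet, label))

print(training_data)
```

Discarding the ambiguous cases keeps the automatically labeled set cleaner, at the cost of throwing away some tweets.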

Here are some user-friendly services which anybody can employ to investigate the contents posted on Twitter:

Twitter Search
Twitter’s official search engine, which looks for tweets containing any keyword you may enter. The advanced search form and operators are even more precise, allowing users to filter results by author, language, place, date, whether they contain links or not, and others.

Twitter Venn
It lets you enter two or three keywords and checks how frequently they occur together in the same tweet. It also shows which other words most often accompany each keyword.

Trendsmap
A world map that shows which topics are the most popular according to keywords, depending on the geographical position.

Twitter Spectrum
It lets you enter two different keywords so that you can compare which words each is usually associated with.

TweetVolume
It allows you to type up to three keywords and search how often they come up on Twitter. You can decide whether to examine the past day, past week or past year.

Twitt(url)y
A site that takes note of every URL tweeted and ranks the 100 most popular ones.
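For readers who would like to try this kind of analysis on their own collection of tweets, the Python sketch below imitates two of the ideas above: counting how often keywords appear together in the same tweet, and ranking the most frequently tweeted URLs. It is not how any of the listed services is actually implemented; the function names, the simple substring matching, the URL pattern and the sample tweets are all assumptions made for the example.

```python
import re
from collections import Counter

# Simplified URL pattern (an assumption): 'http://' or 'https://' up to the next space.
URL_RE = re.compile(r"https?://\S+")


def cooccurrence_count(tweets, keywords):
    """Count the tweets that contain every keyword (case-insensitive substring match)."""
    wanted = [k.lower() for k in keywords]
    return sum(all(k in t.lower() for k in wanted) for t in tweets)


def top_urls(tweets, n=100):
    """Return the n most frequently tweeted URLs with their counts."""
    counts = Counter(url for t in tweets for url in URL_RE.findall(t))
    return counts.most_common(n)


# Invented sample tweets, for illustration only.
sample_tweets = [
    "Coffee and cake at the library http://example.com/a",
    "Need more coffee before the exam",
    "Cake recipe: http://example.com/b",
    "Best coffee in town http://example.com/a",
]

print(cooccurrence_count(sample_tweets, ["coffee"]))
print(cooccurrence_count(sample_tweets, ["coffee", "cake"]))
print(top_urls(sample_tweets, n=1))
```

Substring matching is the crudest possible choice here (searching for "cake" would also match "pancake"), so a more careful version would split each tweet into words first.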

References
