It has been more than 2 months since I last blogged about the groovytweets status, and there have been numerous minor updates and improvements since then. The friends list (the Twitter users we collect tweets from) has been expanded to ~422 users (and likely more by the time you read this), the regular expressions used to decide if a tweet is ‘groovy’ have been adapted to the changing groovy universe (like gparallelizer being renamed to gpars, or following vmware news now), and so it goes on.
But the real meat is a bit behind the scenes. The features you’re likely to see soonest are language detection (filtering by language) and a new link ranking. I still have to improve the quality of the language detection, as tweets often use English terms even if the tweet itself is written in a different language. Larger texts submitted to the Google Translation API of course yield better results; with tweets limited to 140 characters, this is a bit harder.
The groovy link ranking feature can already be seen live in an early version. I am now collecting the links and tracking their usage in tweets, the same way I track retweets for tweets. The nice thing is that I am tracking the final URL, so if someone used bit.ly to create a short version of a URL, I actually follow the redirects to find the final destination. I am also prepared to limit the links shown in the UI to the last few weeks (currently 2), and in addition the relevancy of a link degrades over time. This means a link from today with 5 mentions in the groovy community can end up ranked higher than a link from yesterday with 6 mentions, simply because time is an important factor for relevancy.
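A rough sketch of how such a time-decayed ranking could look. This is not the groovytweets code: the exponential decay, the 3-day half-life, and the names `link_score` and `rank_links` are all assumptions chosen for illustration. Only the two-week window and the 5-versus-6 mentions example come from the text above.

```python
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 3.0           # assumed value, picked only for this sketch
MAX_AGE = timedelta(weeks=2)   # the two-week display window mentioned above


def link_score(mentions, first_seen, now):
    """Mention count discounted by the link's age (exponential decay)."""
    age_days = (now - first_seen).total_seconds() / 86400.0
    return mentions * 0.5 ** (age_days / HALF_LIFE_DAYS)


def rank_links(links, now):
    """Rank (url, mentions, first_seen) tuples, dropping links older than MAX_AGE."""
    recent = [link for link in links if now - link[2] <= MAX_AGE]
    return sorted(recent,
                  key=lambda link: link_score(link[1], link[2], now),
                  reverse=True)
```

With this half-life, a link first seen a day ago with 6 mentions scores roughly 4.8, so a fresh link with 5 mentions ranks above it. A longer half-life would let the older link keep its lead for longer.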
The real, real change for groovytweets is yet to come, though. As you might have heard, the new Twitter Retweet API is on its way. It has been changed multiple times now, based on a lot of user input flowing to Twitter, hopefully including mine. It will fundamentally change how Twitter aggregators and relevancy tools can count retweets. Until now, a retweet has been a community-agreed syntax, like RT @originaluser text. In the groovytweets code I have been analyzing each incoming tweet to decide if it fits one of the many retweet syntaxes, then trying to identify the original tweet, look it up, and increase its relevancy.
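To illustrate what matching such community syntaxes can look like, here is a minimal sketch. The two patterns (a leading "RT @user" and a trailing "via @user") and the function name `parse_retweet` are assumptions for this example; the actual rules used in groovytweets are not shown in this post and are likely more numerous.

```python
import re

# Hypothetical patterns covering two common community conventions.
RETWEET_PATTERNS = [
    # "RT @originaluser text" or "RT @originaluser: text"
    re.compile(r"^\s*RT:?\s*@(?P<user>\w+):?\s*(?P<text>.*)$",
               re.IGNORECASE | re.DOTALL),
    # "text (via @originaluser)" or "text via @originaluser"
    re.compile(r"^(?P<text>.+?)\s*\(?via\s+@(?P<user>\w+)\)?\s*$",
               re.IGNORECASE | re.DOTALL),
]


def parse_retweet(tweet_text):
    """Return (original_user, original_text), or None if no pattern matches."""
    for pattern in RETWEET_PATTERNS:
        match = pattern.match(tweet_text)
        if match:
            return match.group("user"), match.group("text").strip()
    return None
```

For example, `parse_retweet("RT @hansamann: Groovy 1.7 is out")` yields the user `hansamann` and the text `Groovy 1.7 is out`. Finding the original tweet from that pair is still a separate lookup, which is the fragile part of this approach.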
Well, now Twitter is making the retweet an official concept of Twitter. They even give you a new API method to look up the total retweets of a tweet, which sounds great. The downside is that each Twitter account may currently use 150 API calls per hour. If I wanted to update the 50 tweets displayed on the groovytweets homepage every minute, this means 50 tweets * 60 updates per hour = 3000 calls per hour. Well, I got 150. And that is not including the once-a-minute check for new tweets coming from groovytweets friends. So: we’re in trouble here. One solution would be to get whitelisted for more API calls, but there is a better one (or two).
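The same arithmetic as a small sketch, which also shows how few refreshes the limit would actually allow. It assumes one API call per tweet for the retweet lookup and one call per check for new tweets; both are assumptions of this example, and the constant and function names are made up.

```python
RATE_LIMIT_PER_HOUR = 150        # per account, as stated above
TWEETS_ON_HOMEPAGE = 50
NEW_TWEET_CHECKS_PER_HOUR = 60   # one check per minute, assumed to cost one call each


def calls_needed(refreshes_per_hour):
    """API calls per hour for refreshing every homepage tweet plus the new-tweet checks."""
    return TWEETS_ON_HOMEPAGE * refreshes_per_hour + NEW_TWEET_CHECKS_PER_HOUR


def max_refreshes_per_hour():
    """Full homepage refreshes that still fit into the hourly rate limit."""
    remaining = RATE_LIMIT_PER_HOUR - NEW_TWEET_CHECKS_PER_HOUR
    return max(remaining // TWEETS_ON_HOMEPAGE, 0)
```

Under these assumptions a refresh every minute needs 3060 calls per hour, while the limit leaves room for just one full refresh of the homepage per hour.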
The one solution I still have some hope for is that Twitter will simply include a retweet count with each tweet. The problem here, I guess, is that I am interested in the retweets within a specific community only. And providing the count only for *my* friends instead of a global retweet count (which is way less relevant, some might argue) might be a pretty resource-intensive task for them.
The next and more likely solution involves using the Twitter Streaming API. The good thing about this API is that it will show retweets. Although the API just changed again, making the retweet the top element instead of the tweet (and including a retweet_details element), it is then very easy to detect a retweet. The bad news: groovytweets is hosted on Google App Engine, and App Engine kills each request after about 30 seconds. So I invested some time finding a cheap vServer on which I can open a permanent streaming connection to Twitter. It will then call an API over on groovytweets to feed the retweet information into the app. This splits the system into two parts, which I wanted to avoid, but it looks like the best solution.
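A minimal sketch of the detection step on the streaming side. It assumes the stream delivers one JSON document per line and that a retweet carries a top-level `retweet_details` element as described above; since the API is still changing, the exact shape may differ. The `forward` callback stands in for the call to the groovytweets API, which is hypothetical here, as are all function names.

```python
import json


def parse_status(line):
    """Parse one line from the stream; return a dict, or None for blank or invalid lines."""
    line = line.strip()
    if not line:
        return None
    try:
        status = json.loads(line)
    except ValueError:
        return None
    return status if isinstance(status, dict) else None


def is_retweet(status):
    """Treat a status as a retweet if it carries a retweet_details element."""
    return isinstance(status.get("retweet_details"), dict)


def forward_retweets(lines, forward):
    """Pass every retweet found in `lines` to `forward`; return how many were passed on."""
    count = 0
    for line in lines:
        status = parse_status(line)
        if status is not None and is_retweet(status):
            forward(status)  # e.g. an HTTP POST to the app (hypothetical)
            count += 1
    return count
```

On the vServer, `lines` would be the long-lived streaming connection read line by line, and `forward` would post the retweet information to the app on App Engine in a short request that stays well within the 30-second limit.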
Follow me @hansamann to get the news as it happens.