Sup,
I have been collecting tweets for 4 days now, using an app that I have been coding for the last 5 months.
I built this app because I wanted to make user-based recommendations and do other types of data mining using Mahout, and I didn't find enough data for my experiments.
About the app
I am using a single CentOS server with 8GB of RAM, hosted on the Rackspace cloud, at a cost of less than 15 dollars per day. It can process around 100 (90 - 105) Twitter profiles per second. It runs at an average of 2GB of RAM and 90% CPU. It's completely fault tolerant. It can process other social networks as well using a simple parse-template.
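For a rough sense of scale, the figures quoted above can be turned into a back-of-the-envelope estimate. This is only a sketch: it assumes the quoted rate is sustained around the clock, which is an upper bound and not a measured figure, and the function names are made up for illustration.

```python
# Back-of-the-envelope capacity estimate from the figures quoted above.
# Assumption: the rate is sustained 24 hours a day (an upper bound).

SECONDS_PER_DAY = 24 * 60 * 60


def profiles_per_day(rate_per_second):
    """Profiles processed in one day at a constant rate."""
    return rate_per_second * SECONDS_PER_DAY


def cost_per_million_profiles(daily_cost_usd, rate_per_second):
    """Server cost in USD per one million profiles processed."""
    return daily_cost_usd / profiles_per_day(rate_per_second) * 1_000_000


if __name__ == "__main__":
    # 15 USD/day is the ceiling quoted above, so these costs are upper bounds.
    for rate in (90, 100, 105):
        per_day = profiles_per_day(rate)
        cost = cost_per_million_profiles(15, rate)
        print(f"{rate}/s -> {per_day} profiles/day, <= ${cost:.2f} per million")
```

At 100 profiles per second that is 8,640,000 profiles per day, so the 15 dollar daily ceiling works out to under roughly $1.74 per million profiles.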
I was able to collect 90+ million tweets from more than 6 million users (the db has 20MM users) using Java, memcache, MySQL, and PHP (for visualization), with a non-ACID architecture and an object-like storage structure (no-sql?).
I hope this dataset helps you get into the big data world.
The current SQL dump is too big (66GB) to put on one of my servers, so please skypeme:calufaxp or email me at calufa{a}gmail.com if you want the data. BTW, the data is FREE...
If anyone has a server where I can upload this SQL dump so others can download it, let me know.