Archive for the ‘code’ Category
Being Thrifty with Thrift
I’ve been using Thrift, an RPC framework from Facebook. It lets you define a service outline and generate template code in various languages, which you can then use to create the service. This means you can, say, use Haskell for your server and hook it up with a Ruby client, or vice versa.
I installed Thrift on my Mac from the Subversion repository. Here are the commands to install the prerequisites and Thrift itself, assuming you have MacPorts.
sudo port install boost libevent
svn co https://svn.apache.org/repos/asf/incubator/thrift/trunk thrift
cd thrift
./bootstrap
./configure --prefix=/opt/local/
make
make install
Now that’s settled, here is an example of a Thrift definition file, named myauth.thrift:
namespace rb MyAuth
namespace py myauth

struct User {
  1: string username,
  2: string password
}

enum LoginStatus {
  SUCCESS,
  FAIL
}

service Authentication {
  string say_hello(),
  LoginStatus login(1:User cred)
}
As you can see, this has several components. The first is the namespace definitions: these are the modules/packages/namespaces the service belongs to, and you can define one for each language you are going to code in. The second is the User struct, which holds the data the service works with. The third is the enum, and finally there is the service definition. The service definition has two methods: say_hello, which returns a string, and login, which takes a User struct as an argument and returns a LoginStatus.
This only uses a subset of the definition syntax to show a small example of what you can do with the service. To see a more thorough example of the definition file go to the Thrift Wiki.
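To make the shape of the definition more concrete, here is a rough plain-Python sketch of the same pieces. It does not use Thrift or anything Thrift generates: the dataclass, the enum and the handler class are stand-ins invented for illustration, the enum values are assumed to be numbered from 0 in declaration order, and the hard-coded credentials are only there so the example does something.

```python
from dataclasses import dataclass
from enum import IntEnum


@dataclass
class User:
    """Stand-in for the User struct: two string fields."""
    username: str = ""
    password: str = ""


class LoginStatus(IntEnum):
    """Stand-in for the LoginStatus enum (values assumed to start at 0)."""
    SUCCESS = 0
    FAIL = 1


class AuthenticationHandler:
    """Hypothetical handler exposing the two methods the service declares."""

    def say_hello(self) -> str:
        return "hello thrift client"

    def login(self, cred: User) -> LoginStatus:
        # Hard-coded credentials, purely for illustration.
        if cred.username == "hello" and cred.password == "world":
            return LoginStatus.SUCCESS
        return LoginStatus.FAIL


if __name__ == "__main__":
    handler = AuthenticationHandler()
    print(handler.say_hello())
    print(handler.login(User("hello", "world")).name)
```

The point is only that a service definition boils down to a handful of data types plus a set of method signatures; what Thrift adds on top is the generated code that moves those types between languages.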
Once you have written the definition, compile the file (here I’m going to use Ruby and Python):
thrift --gen rb --gen py myauth.thrift
Now write the server code (in ruby):
require 'thrift'
$:.push('gen-rb')
require 'Authentication'
require 'myauth_constants'

class AuthenticationHandler
  def say_hello
    puts "thrift client connected"
    "hello thrift client"
  end

  def login cred
    if cred.username == 'hello' && cred.password == 'world'
      puts "logged in"
      return MyAuth::LoginStatus::SUCCESS
    end
    puts "great pie of fail"
    MyAuth::LoginStatus::FAIL
  end
end

handler = AuthenticationHandler.new
processor = MyAuth::Authentication::Processor.new(handler)
transport = Thrift::ServerSocket.new(9090)
transportFactory = Thrift::BufferedTransportFactory.new()
server = Thrift::SimpleServer.new(processor, transport, transportFactory)
puts "Starting the server..."
server.serve()
puts "done."
Write the client code (in python):
import sys
sys.path.append('gen-py')
from myauth import Authentication
from myauth.constants import *
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
transport = TSocket.TSocket('localhost', 9090)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
auth = Authentication.Client(protocol)
transport.open()
print auth.say_hello()
user = User()
user.username = 'hello'
user.password = 'world'
print "Login: %s" % auth.login(user)
user2 = User()
user2.username = 'failed'
user2.password = 'world'
print "Login: %s" % auth.login(user2)
What thrift gives you is a transport layer as well as a protocol layer so you don’t have to worry about mucking around with sockets. There are also several protocol layers defined so you could use a JSON protocol for example.
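The split between a transport layer and a protocol layer can be sketched with a toy example. Nothing below is Thrift code or Thrift's actual wire format: the two protocol classes and the round_trip helper are hypothetical names, and an in-memory buffer stands in for the socket. The sketch only shows why the layering is handy: the same transport can carry either encoding, so swapping the protocol does not touch the code that moves bytes.

```python
import io
import json
import struct


class BinaryProtocol:
    """Toy encoding: 4-byte big-endian length followed by UTF-8 bytes."""

    def write_string(self, transport, value):
        data = value.encode("utf-8")
        transport.write(struct.pack(">I", len(data)) + data)

    def read_string(self, transport):
        (length,) = struct.unpack(">I", transport.read(4))
        return transport.read(length).decode("utf-8")


class JsonProtocol:
    """Toy encoding: one JSON document per line."""

    def write_string(self, transport, value):
        transport.write(json.dumps(value).encode("utf-8") + b"\n")

    def read_string(self, transport):
        return json.loads(transport.readline().decode("utf-8"))


def round_trip(protocol, value):
    """Write a value through the protocol, then read it back."""
    transport = io.BytesIO()  # in-memory stand-in for a socket transport
    protocol.write_string(transport, value)
    transport.seek(0)
    return protocol.read_string(transport)


if __name__ == "__main__":
    for protocol in (BinaryProtocol(), JsonProtocol()):
        print(type(protocol).__name__, round_trip(protocol, "hello thrift client"))
```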
All in all, this is a useful technology for using the right tool for the job, i.e., the right programming language for the job. By working across languages, Thrift lets the programmer write different services using the strengths of the appropriate language.
Hadoop Pig – IP Access Count Script
This is a Pig Latin script to count the number of times a person has accessed a site. It is a very simple example which took me a while to figure out… I guess being a History major my coding has gotten a bit rusty.
A = load 'access.log' using PigStorage(' ');
site_access = foreach A generate $1, $0, 1;
access_mapped = group site_access by ($0, $1);
access_reduced = foreach access_mapped generate $0, COUNT($1);
dump access_reduced;
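For anyone who doesn't read Pig Latin, the same group-and-count can be sketched in plain Python. This is an illustration of what the script computes, not how Pig runs it. The count_access name is made up, the field positions simply mirror $1 and $0 in the script, and skipping lines with fewer than two fields is an assumption of this sketch.

```python
from collections import Counter


def count_access(lines):
    """Count how often each (field 1, field 0) pair occurs."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        if len(fields) < 2:
            continue  # skip lines too short to have both fields
        counts[(fields[1], fields[0])] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; the real layout of access.log is not shown here.
    sample = [
        "10.0.0.1 example.com /index.html",
        "10.0.0.1 example.com /about.html",
        "10.0.0.2 example.com /index.html",
    ]
    for key, total in count_access(sample).items():
        print(key, total)
```

The Counter plays the role of the group followed by COUNT: every distinct pair becomes a key and its value is the number of lines that produced it.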
Google App Engine
Google makes its foray into web infrastructure with Google App Engine. So what’s cool and what’s not?
The cool thing is that it gives you access to Google’s Authentication System, a DataStore that is probably using BigTable, and there is a “free account [which] can use up to 500MB of persistent storage and enough CPU and bandwidth for about 5 million page views a month.” Also, it has support for Django, which is a pretty nice web framework.
The not so cool things are that it is currently limited to Python, which I don’t really mind, but Ruby and other languages would be nice to have. Further, Django’s model framework is useless if you are trying to deploy the same app both here and elsewhere, since it does not work on top of the DataStore. Hopefully, Django will add that abstraction sometime soon.
LaTeX for Humanity Papers
Long have I searched for the perfect document management system. That one jewel that would save me from the tortures of managing notes and thoughts side by side with my writing. A tool that’d let me manage many things and be as simple as a text file.
At long last, have I (re)found such a system. Maybe I didn’t get it the first time, maybe I didn’t know it would be so wonderful, but I have (re)discovered LaTeX and I love it!
Update: I was trying to figure out why I liked LaTeX so much, and it hit me that a system like LaTeX deals with text as if it were code. Whereas WYSIWYG editors hide away many of the hard details of what a document looks like, they also add much complexity: we are given way more information than is needed at any given time. However, I do agree that LaTeX was made for writing longer texts.
ShoqBox CommandLine on Mac or Linux
I recently got a ShoqBox as a gift, and since I couldn’t access it on my Mac without MusicMatch, I decided to write my own script. It is written in Ruby and requires the mp3info gem. So what needs to be done to run it?
- First place your mp3s in the /Volumes/FLASH/_system/media/audio folder.
- Run the program like so: shoqbox.rb > /Volumes/FLASH/_system/media/audio/songsql
- Next change to the /Volumes/FLASH/_system/media/audio/ directory
- Run: sqlite2 MyDB
- Under sqlite run: .read songsql
- Quit sqlite and the songs should be copied.
This is a really bare-minimum copy, since it doesn’t take the Artist and Song information into account at all. The songs should appear under All Songs in the player.
Here is the source for the file:
Mercurial, my brand spanking new SCM
I have decided to use Mercurial as my SCM, moving away from Subversion. I must say it is quite nice: it is very simple to use and the commands are mostly compatible with Subversion. So what prompted me to move to this SCM instead of sticking with trusty old Subversion?
Well, I was getting weary of having a centralized repository, as my server is quite unstable since I tend to run the latest and greatest software, so I thought it would be appropriate to move to a more distributed SCM. Also, since the repository is distributed, it is easier to work when I don’t have access to the internet. How is this possible? Well, with Subversion, committing changes means interacting with the central server. Not so with Mercurial, because any commits you make stay local. If you have a central-repository type of structure, you can “push” the changes to the “server.”
So what’s the big deal? Yet another SCM, right? Well, it seems the big shots are starting to use it, such as OpenSolaris, Mozilla and Xen. I will post a follow-up after working with it for a while.
Distributed revision control with Mercurial
Ruby and the Way Forward
Ruby has just won me over. I mean there are so many things to love about it and several to hate, but I must say the language itself is quite nice. I have always been a Python fan, but I feel Ruby is a better Python. It is what Python should have always been.
Things I like:
- Any object is extendable. I love this because it keeps things pure. Who wants to figure out whether it is len(blah_list) when you can do blah_list.length? Objects have properties, and those properties are, or should be, easily extendable to meet the needs of the developer. I’m not saying other languages don’t have something like this, but Ruby just makes it so easy. I know this is ripe for abuse by people who may want to make their Array objects do things arrays weren’t really meant to do, but overall it’s still a cool feature.
- Blocks! Every Ruby user uses blocks as an excuse and I’m also going to be one of them. However, blocks are wonderful. Python has them too and the recent with statement is going to make it even easier.
- Gems. Alright, not really part of the language itself, but might as well be. My pain with Python has always been installing third-party libraries. It has recently been getting better with easy_install and eggs, but it is great that the Ruby community dealt with the issue while the community was still growing, since it makes library dependencies easy to deal with. If future Ruby releases have gems built in, it will be even better. This is one idea from Perl that is just awesome, and Ruby implemented it pretty well.
- I love the Regex support as part of the language itself. I mean, Python was great and all, but I found it a pain to use Regex through libraries. Then again, maybe I’m just an idiot? But having full support in the language itself is just friggin awesome.
- ` for running system commands. Simple and returns a string so you can muck around with the output if you so choose. Great for those fast scripts.
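For comparison, here is roughly what the last two items look like through Python’s standard library, where both regex and running a command are library calls rather than syntax. This is only a sketch: the run helper is a made-up name, and the command it runs is an arbitrary example that calls back into the Python interpreter so that it works on any machine.

```python
import re
import subprocess
import sys


def run(command):
    """Run a command and return its standard output as a string."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Arbitrary example command: ask the current interpreter to print a line.
output = run([sys.executable, "-c", "print('hello from a subprocess')"])

# Regex support lives in the re module rather than in the language syntax.
match = re.search(r"from a (\w+)", output)
print(match.group(1))
```

It works fine, but it is a few imports and calls where Ruby needs a pair of backticks and a regex literal, which is more or less the point being made in the list.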
Things I dislike:
- Ruby has some weird method names in some of the standard libraries. Like, how the hell am I supposed to know that ‘test’.intern converts a string to a symbol?
- The global variables! It seems like these could be abused, and it would be best if they weren’t there. I mean, most of these things can be implemented without them, or with more logically named ones.
- Looking at the Ruby code, there seems to be a break of style within the code of various libraries. Python is amazingly well organized in a consistent style, while the Ruby code, mostly the core libraries, seems inconsistent in its style. I may be anal in that I hate seeing
def method param, param2
and
def method(param, param2)
in the same class for different methods. It just doesn’t look good! So I would think there should be a style set for the core language code itself.
- Documentation. Ruby’s documentation isn’t really its strong point, but it is getting better. Meanwhile, Programming Ruby is an amazing book and I recommend it to everyone learning Ruby.
Python vs Ruby for a simple script
I rewrote the TV Torrent downloading script in Python and Ruby and I must say for a simple script it doesn’t really matter which language you use. Although I would think Ruby has an advantage in that it doesn’t need an external module since RSS support is built into the core libraries.
Both scripts can be used like so in a cronjob:
20 * * * * /Users/abhi/bin/tvtorrent.py | xargs open
20 * * * * /Users/abhi/bin/tvtorrent.rb | xargs open
Here is the code for Python:
#!/usr/bin/env python
import feedparser
import urllib2

DownloadPath = '/Users/abhi/Downloads/Feeds/'
DownloadLog = '/Users/abhi/.tvdownloads'
FeedUrl = 'http://pipes.yahoo.com/pipes/GBPk1Pm82xGRSPpPJxOy0Q/run?_render=rss'

def is_downloaded(link):
    # True if the link already appears in the download log
    return link in open(DownloadLog, 'r+').read()

def write_log(link):
    # Append the link to the download log, one per line
    open(DownloadLog, 'a+').write('%s\n' % link)

def get_tvtorrents():
    parser = feedparser.parse(FeedUrl)
    for item in parser['items']:
        if not is_downloaded(item['link']):
            torrentfile = DownloadPath + item['title'].replace(' ', '_') + '.torrent'
            torrentdata = urllib2.urlopen(item['link']).read()
            open(torrentfile, 'wb').write(torrentdata)
            write_log(item['link'])
            # Print the path so it can be piped to xargs open
            print torrentfile

if __name__=='__main__':
    get_tvtorrents()
Here is the code for Ruby:
#!/usr/bin/env ruby
require 'rss'
require 'open-uri'

DownloadPath = '/Users/abhi/Downloads/Feeds/'
DownloadLog = '/Users/abhi/.tvdownloads'
FeedUrl = 'http://pipes.yahoo.com/pipes/GBPk1Pm82xGRSPpPJxOy0Q/run?_render=rss'

def is_downloaded?(link)
  # True if the link already appears in the download log
  open(DownloadLog, 'r+').read.to_s.include?(link)
end

def write_log(link)
  # Append the link to the download log, one per line
  open(DownloadLog, 'a+').write("#{link}\n")
end

def get_tvtorrents
  open(FeedUrl) do |rss|
    result = RSS::Parser.parse(rss.read, false)
    result.items.each do |item|
      unless is_downloaded?(item.link)
        torrentfile = DownloadPath + item.title.gsub(' ', '_') + '.torrent'
        torrentdata = open(item.link).read()
        open(torrentfile, 'w+').write(torrentdata)
        write_log(item.link)
        # Print the path so it can be piped to xargs open
        puts torrentfile
      end
    end
  end
end

get_tvtorrents
As you can see, not only are they both better scripts than my last TV torrent script, but they are about the same size and look pretty much the same. The only real reason the Ruby version is longer is the end statements. So which language do I like? I like both and will continue to learn both. Each has its strengths and weaknesses, both are evolving, and it is really hard to say based on such a short script anyway.
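Both versions share the same bookkeeping idea: keep a log of links already fetched and skip anything found in it. Here is a small sketch of just that part, written for a current Python 3 rather than the Python 2 used above. The function names are made up for this example, and it differs from the scripts in two ways that are assumptions of the sketch: it copes with a log file that doesn't exist yet, and it compares whole lines instead of searching the entire file for a substring.

```python
import os


def load_downloaded(log_path):
    """Return the set of links recorded in the log (empty if no log yet)."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path, "r", encoding="utf-8") as log:
        return {line.strip() for line in log if line.strip()}


def record_download(log_path, link):
    """Append one link to the log, one link per line."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(link + "\n")


def new_links(log_path, links):
    """Yield links not seen before, recording each one as it is yielded."""
    seen = load_downloaded(log_path)
    for link in links:
        if link not in seen:
            record_download(log_path, link)
            seen.add(link)
            yield link
```

Reading the log once into a set also avoids re-reading the whole file for every feed item, which hardly matters at this size but keeps the loop tidy.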
Using Yahoo Pipes to create TV Torrent RSS Feeds
Yahoo Pipes is pretty cool, allowing someone to create their own feed based on an aggregate of other feeds. It simplifies things immensely for creating nifty feeds of various things. More to the point, Yahoo Pipes is ripe for abuse by those of us who want to create a feed of TV show torrent files we would like to download.
Where before I was filtering feeds locally, I now choose to create a Pipe of the shows I want to download. This has the added benefit of letting you share the piped feed with other people, so they can download the same torrents as you if they wish.
So how do we go about creating this feed? Here is a slideshow of how to do just that.
After that it is the same deal as before, but now we no longer need to do any filtering.
So here is the new code:
#!/opt/local/bin/python2.5
import feedparser
import urllib2

DownloadPath = '/Users/abhi/Downloads/Feeds/'
DownloadLog = '/Users/abhi/.tvdownloads'
FeedUrl = 'http://pipes.yahoo.com/pipes/GBPk1Pm82xGRSPpPJxOy0Q/run?_render=rss'

def is_downloaded(link):
    return link in open(DownloadLog, 'r+').read()

def write_log(link):
    open(DownloadLog, 'a+').write('%s\n' % link)

def get_tvtorrents():
    parser = feedparser.parse(FeedUrl)
    for item in parser['items']:
        if not is_downloaded(item['link']):
            torrentfile = DownloadPath + item['title'].replace(' ', '_') + '.torrent'
            torrentdata = urllib2.urlopen(item['link']).read()
            open(torrentfile, 'wb').write(torrentdata)
            write_log(item['link'])

if __name__=='__main__':
    get_tvtorrents()
TV Show Torrent Downloader Python Script
Democracy Player was being annoying and using up a lot of my computer’s resources including giving me the infamous MacBook whine. So I did what any script hacker would do and wrote my own script.
#!/usr/bin/env python
import feedparser
import urllib2
import pickle
import os

FILE_PATH = '/Users/abhi/Downloads/Feeds/'
DOWNLOADS_LOG = '/Users/abhi/bin/tvdownloads.txt'
FEED_URL = 'http://tvrss.net/feed/eztv/'
filters = ('Daily Show', 'Colbert', 'The Simpsons', 'Family Guy', 'Top Gear', 'Office',)

def main():
    parser = feedparser.parse(FEED_URL)
    # Load the list of already downloaded item ids, if there is one
    try:
        downloaded = pickle.load(open(DOWNLOADS_LOG))
    except:
        downloaded = []
    for item in parser['items']:
        for filter in filters:
            torrent_link = item['enclosures'][0]['href']
            filename = FILE_PATH + item['title'].replace(' ', '_') + '.torrent'
            if filter in item['title']:
                if not item['id'] in downloaded:
                    open(filename, 'wb').write(urllib2.urlopen(torrent_link).read())
                    downloaded.append(item['id'])
                    # On a Mac, hand the torrent file to the default application
                    if os.uname()[0] == 'Darwin':
                        os.system("open " + filename)
    pickle.dump(downloaded, open(DOWNLOADS_LOG, 'w'))

if __name__=='__main__':
    main()
So what does this code do?
- FILE_PATH is where the torrent files are downloaded to.
- FEED_URL is the URL of the RSS feed containing the torrent files.
- DOWNLOADS_LOG is the txt file that records the already downloaded items.
- filters is a tuple containing parts of the names of the TV shows you want to download.
So make this a cron job and the feed is checked every n minutes/hours for new updates.
20 * * * * /Users/abhi/bin/tvtorrent.py > /dev/null
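The filter check and the filename rule described above can be pulled out into two small functions, which makes them easy to try without hitting the feed. This is a sketch in current Python 3, and the names wanted and torrent_filename are invented for it; the logic is the same substring match and space-to-underscore replacement the script uses.

```python
def wanted(title, filters):
    """Return True if any filter string appears somewhere in the title."""
    return any(name in title for name in filters)


def torrent_filename(base_path, title):
    """Build the local .torrent path the same way the script does."""
    return base_path + title.replace(" ", "_") + ".torrent"


if __name__ == "__main__":
    # Made-up titles, just to exercise the two helpers.
    shows = ("Daily Show", "Top Gear")
    for title in ("The Daily Show 2007-05-01", "Some Other Show 1x01"):
        if wanted(title, shows):
            print(torrent_filename("/tmp/", title))
```

Because the match is a plain substring test, a short filter such as 'Office' will also catch any other title that happens to contain that word.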