Archive for the ‘programming’ Category
Phusion Passenger for Traytwo
I moved to Phusion Passenger on traytwo. It looks like deploying Rails apps will be a whole lot easier. No longer do I need to worry about maintaining Mongrel ports and whatnot. Adding a vhost file and restarting Apache is all that is needed to launch a new app. Also, apparently it is a whole lot faster than Mongrel, so we’ll see if that is true.
Outsourcing Site Elements
I am working on a website where I am outsourcing key elements having to do with handling RSS feeds to the Google AJAX Feed API. Outsourcing this component is great for several reasons. First, I don’t have to worry about caching and updating the RSS feeds; that is handled by Google. Second, it allows me to focus more on the major components of the site, leaving the mundane task of updating RSS feeds to a company that is actually competent at it.
This move to a component architecture will probably be the way the web goes, especially with OpenID and OAuth making such headway. It is economical since sites can specialize, allowing the more mundane components that take up resources to be outsourced.
git
I’ve been content with Mercurial as it’s fast, lets you do development and for the most part keeps out of your way. It has few dependencies and is a great DVCS. However, the problem I am finding with Mercurial is branching, which is one of the great benefits of a DVCS. Mercurial, I think, gets its branching legacy from CVS/Subversion.
To branch in Mercurial you basically clone the repository into a new repository and work in the new repository. For example, say you have repository A. Branching it involves cloning A into B. You make your changes in B and merge them back into A when you are ready. This works, but it sucks, especially for the Rails development that I do.
The way one usually does Rails development is you have a checkout of a repo and you run a development server and work on those changes. However, when you want to branch in Mercurial you have to go to the new directory and start a new server instance and develop, merge, stop the server, start new server, etc. For some of you this may be fine, but for me it is a pain in the ass!
This is one of the things I like about git: all the branches are within the same directory tree, so you don’t need to do all that changing of directories. You can basically branch within the same tree and work with that. No need to clone all these repos to different directories.
Now the thing I dislike about git is that it’s written in C! When I went to install it on Debian it had n+1 different packages it had to install! Though I am biased toward scripting languages, I personally think a scripting language would have made the app easier to port in the long run.
However, another feature that I love about git is git-svn. It makes branched development so easy, as the branches are local, so you get the best of both worlds. This is especially useful since I work with a lot of Subversion code bases. If I need to make different patches it makes it simpler to do so, especially when requirements change unexpectedly and you want to start a new branch without checking out the trunk again.
So I’m going to play with git for a bit longer, but overall for now I like it. The documentation has clearly improved since I last looked at it.
Perlbal
I’ve long been a slave to Apache. I mean it’s a great server, but it is just so bloated. However, lately I’ve been noticing that I only really use my server for Rails and static file hosting both of which do not need a full blown Apache install. This is especially true since I have yet to get Rails and Apache to play nice.
So I have started using a different solution which basically is Perlbal and Mongrel. Perlbal is a reverse proxy and simple web server written by the guy who did LiveJournal. So far it is working well for what I am using it for and things seem to be running a lot faster.
A cool feature of Perlbal is that it can dynamically add or remove nodes. So, for example, you can use EC2 to dynamically start instances as load increases and just forward requests to those EC2 instances from Perlbal. Though this might not work well with SSL, for most insecure connections it works absolutely fine.
Update: I have stopped using the built-in web server and have moved to using lighttpd. I assume I just need a full-blown web server for some things which just can’t be handled by Perlbal, so I have added it.
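To make the idea of adding and removing nodes at runtime concrete, here is a minimal sketch of a round-robin backend pool. It only illustrates the concept: `BackendPool` and its methods are hypothetical names made up for this example and are not Perlbal's API or implementation, and the addresses are placeholders.

```ruby
# Illustrative only: a tiny round-robin pool whose members can change at runtime.
# BackendPool is a hypothetical class, not part of Perlbal.
class BackendPool
  def initialize(nodes = [])
    @nodes = nodes.dup
    @cursor = 0
  end

  # Register a backend such as "10.0.0.5:8000" (ignored if already present).
  def add(node)
    @nodes << node unless @nodes.include?(node)
  end

  # Drop a backend, for example when its EC2 instance is shut down.
  def remove(node)
    @nodes.delete(node)
  end

  # Return the next backend in rotation, or nil when the pool is empty.
  def next_node
    return nil if @nodes.empty?
    node = @nodes[@cursor % @nodes.size]
    @cursor += 1
    node
  end
end

pool = BackendPool.new(["127.0.0.1:8000"])
pool.add("10.0.0.5:8000")     # a new instance comes up under load
puts pool.next_node
puts pool.next_node
pool.remove("10.0.0.5:8000")  # load drops, the instance goes away
puts pool.next_node
```

The proxy only needs the current list of backends when it picks one, which is why nodes can come and go without a restart.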
New Job
I landed a new gig as a Web Developer for i5labs and fynspire, run by Jason Wong and Scott Thorpe respectively. It is a refreshing change and I get to play with Ruby on Rails as a full-time gig. It is also nice to be able to work in San Francisco, and right next to Union Square no less.
Besides that I am working on a startup with a couple of friends. Our code is coming along and the design is nice. I’ll write more as we come closer to launch.
Day with Amazon EC2
Today I created an Elastic Compute Cloud image on Ubuntu Feisty and wanted to share some of the oddities, weirdness and coolness that I experienced along the way to creating an AMI and actually making it run on EC2.
Amazon has been great lately, coming out with nifty technologies that help startups and the little guy, such as Simple Storage Service, Elastic Compute Cloud and Simple Queue Service. The recently released Amazon Flexible Payments Service also seems like a great competitor to Paypal, but I wish it had a non-cobranded version. However, those other services are not the story here; EC2 is.
EC2 is a virtualization technology based on Xen provided by Amazon where you pay per instance hour, that is, for each hour that the instance is running. It is great for sites that need to add servers on the fly as load increases. Although it is not cheap, as it runs about $70 a month, it does provide decent specs for the hardware.
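The monthly figure is easy to sanity check from the hourly billing. A minimal sketch, assuming a rate of 0.10 USD per instance hour and a 30-day month (both are my assumptions for illustration; the post does not state the rate):

```ruby
# Rough monthly cost of one instance under per-hour billing.
# The default rate of 0.10 USD per instance hour is an assumption.
def monthly_cost(hours_per_day: 24, days: 30, rate: 0.10)
  (hours_per_day * days * rate).round(2)
end

puts monthly_cost                    # instance left running all month
puts monthly_cost(hours_per_day: 4)  # instance started only for 4 hours a day
```

Under those assumptions an always-on instance comes to about $72 a month, in line with the figure above, while an instance that only runs during load spikes costs a fraction of that.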
What I realized is that EC2 is more like running a system from a LiveCD than an actual system install. What does this mean? It means that storage is temporary and as soon as the instance is shut down all data is lost. However, if you reboot the instance the files seem to stay alive.
Also, most importantly, the root filesystem gets very little space. That means you have to work with /dev/sda2, which is usually mounted at /mnt. You are provided with 150 GB of space on sda2. I ran into this snafu when I was trying to configure MySQL. Since MySQL uses /var/lib/mysql (on Ubuntu) to store its files, I kept getting “Storage out of space” errors. To solve this I wrote a startup script that moves the MySQL files to /mnt.
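A startup script along these lines could look like the sketch below. It is not the script from the post: `relocate_dir` is a hypothetical helper, and leaving a symlink at the old path is my assumption about how to keep MySQL's configured path working after the move.

```ruby
require "fileutils"

# Move data_dir onto the large volume and leave a symlink at the old path.
# Returns false (and does nothing) if the move already happened or the
# destination already exists; returns true after a successful move.
def relocate_dir(data_dir, target_root)
  return false if File.symlink?(data_dir)
  target = File.join(target_root, File.basename(data_dir))
  return false if File.exist?(target)
  FileUtils.mkdir_p(target_root)
  FileUtils.mv(data_dir, target)
  File.symlink(target, data_dir)
  true
end

# Intended use on the instance (as root, before MySQL starts):
# relocate_dir("/var/lib/mysql", "/mnt")
```

Because instance storage is wiped on shutdown, this only buys space, not durability; anything that must survive still has to be copied off the instance.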
Uploading a new image was relatively simple, however, I got annoyed that I had to type out the Private Key and Public Key on the command line. Bundling the image itself was relatively straightforward as it was just converting a loopback device.
Creating an Ubuntu image was relatively straightforward, especially with these directions and debootstrap taking care of the hard part. Remember to install libc6-xen if you are creating a Linux image. I wish Amazon would be less rpm specific and provide debs for the bundling apps. However, alien was great and got me through the process of installing the files.
Overall, I must say that if you are really into creating your own images/distributions to run various specific services, then EC2 is amazingly efficient at it. Though I wish they would provide persistent state, if you want to run a bunch of images to do processing such as video transcoding, Hadoop clusters or mass indexing, then Amazon is a gift from the web gods.
Plus I recommend everyone create an image and upload it. I learned quite a bit in the process.
Path taken to put something in SCM
This is the path I take to get something into an SCM.
- I work on a project.
- I don’t want to screw up the project I have been working on so I start organizing files in a directory.
- I put the directory under version control basically “hg init”.
- I add the files that I want under version control.
- I check in.
- I pull to another computer as a backup.
- I regularly push to the backup after doing a bunch of changes.
Ah, isn’t life simple with distributed version control? No setting up repository servers or anything!
Linus is talking about the benefits of git, which is another distributed version control system. You should definitely watch it since he makes many good points about distributed version control and is quite right. Although, I wish he were nicer to the Subversion folks. However, I totally recommend Mercurial as it is simple to use and quite nice.
Twitter as a Status Reminder
Twitter is a semi-useful service. I say semi because a lot of users just post crap such as, “I got up this morning.” I can only say who the hell cares? However, there is a use for Twitter and that is as a notification service for stuff happening on your computer.
Say I am living in the dorms and my floormates and I regularly download TV shows, and sometimes we download the same show, wasting bandwidth. I want to know if anyone is downloading that same show before I start downloading it. A Twitter notification would be great for this since all anyone has to do is check the notifications and see if people are downloading the same file. You can then ask your friend to let you copy the file once it has finished downloading. If this were done automatically it’d be awesome, which makes me wonder: how did such a simple service get so popular?
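The automatic version of that check is small. A minimal sketch, assuming each download client posts a status in a made-up `downloading: <file name>` format and that the recent statuses have already been fetched into an array (fetching them from Twitter is left out, and both helper names are hypothetical):

```ruby
# Hypothetical status format that every download client would post.
def status_for(file_name)
  "downloading: #{file_name}"
end

# True if any recent status says someone is already fetching file_name.
# The comparison ignores case and surrounding whitespace.
def already_downloading?(statuses, file_name)
  wanted = status_for(file_name).downcase
  statuses.any? { |status| status.strip.downcase == wanted }
end

recent = ["downloading: show-a.avi", "I got up this morning."]
puts already_downloading?(recent, "Show-A.avi")
puts already_downloading?(recent, "show-b.avi")
```

A client would run the check before starting, and post `status_for(file_name)` itself only when nobody else has.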
Create YouTube using Amazon AWS
It seems that the infrastructure cost of starting a startup is almost nil, and creating a site like YouTube probably takes a lot less initial investment than it used to. This is especially true for sites utilizing Amazon services like S3 and, hopefully once it is out of beta, EC2.
Think of what YouTube does. It takes in videos, encodes them as Flash and just shows them. It’s all very simple, and with Amazon services you can maximize the efficiency of your resources.
How would this be done?
- Have a public server. This should actually be a server you own and not one using Amazon. All user interactions happen through this site. Authentication and what not.
- User uploads file to S3. There doesn’t seem to be a way to do this directly through the browser, but possibly through the use of Flash?
- The uploaded data is put in a separate bucket. Preprocessed?
- Creates a queue with the file needing to be processed in Amazon SQS.
- On your server or Amazon’s using EC2 have a server listen for changes to the queue and distribute the encoding to other servers.
- Now the magic. You may have heavy loads at times where a bunch of people start uploading files and you need more processing power to encode files than you have, so what do you do? Well, using EC2 you create server instances on the fly based on the load on the queue. So you have a bunch of instances just encoding data, and when you don’t need them you can destroy the instances, saving you resources.
- You move the processed files into S3 and write to your database possibly via another queue so things happen in bulk. (Possibly MySQL Cluster nodes via EC2?)
Now getting all that to work is an exercise in itself, but the point is you have most of the infrastructure of servers taken care of so all it really takes is a good idea and good execution.
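The "instances on the fly based on the load on the queue" step comes down to a small decision function. Here is a rough sketch of that decision only; it does not talk to SQS or EC2, and the function names, the jobs-per-instance figure and the instance cap are all assumptions picked for illustration:

```ruby
# How many encoder instances we would like for a given queue depth.
# jobs_per_instance and max_instances are made-up tuning knobs.
def instances_needed(queue_depth, jobs_per_instance: 5, max_instances: 20)
  return 0 if queue_depth <= 0
  needed = (queue_depth.to_f / jobs_per_instance).ceil
  [needed, max_instances].min
end

# Compare the target with what is running and say what to do about it.
# Returns [:launch, n], [:terminate, n] or [:hold, 0].
def scaling_action(queue_depth, running, **opts)
  delta = instances_needed(queue_depth, **opts) - running
  if delta > 0
    [:launch, delta]
  elsif delta < 0
    [:terminate, -delta]
  else
    [:hold, 0]
  end
end

p scaling_action(12, 1)  # uploads are piling up
p scaling_action(0, 3)   # queue drained
```

The server watching the queue would run this on a timer and pass the result to whatever actually starts and stops instances.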
Ruby and the Way Forward
Ruby has just won me over. I mean there are so many things to love about it and several to hate, but I must say the language itself is quite nice. I have always been a Python fan, but I feel Ruby is a better Python. It is what Python should have always been.
Things I like:
- Any object is extendable. I love this because it keeps things pure. Who wants to figure out whether it is len(blah_list) when you can do blah_list.length? Objects have properties, and those properties are extendable, or should easily be so, to meet the needs of the developer. I’m not saying other languages don’t have something like this, but Ruby just makes it so easy. I know this is ripe for abuse by people who may want to make their Array objects do things arrays weren’t really meant to do, but overall it’s still a cool feature.
- Blocks! Every Ruby user uses blocks as an excuse and I’m also going to be one of them. However, blocks are wonderful. Python has them too and the recent with statement is going to make it even easier.
- Gems. Alright, not really part of the language itself, but might as well be. My pain with Python has always been installing third-party libraries. Recently it has been getting better with easy_install and eggs, but it is great that the Ruby community dealt with that issue while the community was still growing, since it makes it easy to deal with library dependencies. If future Ruby releases have gems built in it will be even better. This is one idea from Perl that is just awesome, and Ruby implemented it pretty well.
- I love the Regex support as part of the language itself. I mean Python was great and all, but I found it a pain to use Regex through libraries. Then again, maybe I’m just an idiot? But having full support in the language itself is just friggin awesome.
- ` for running system commands. Simple and returns a string so you can muck around with the output if you so choose. Great for those fast scripts.
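A few lines of Ruby show most of the points in the list above. This is just an illustration: `Array#second` and `twice` are names made up for the example, and the backtick line assumes an `echo` command is available on the system.

```ruby
# Reopen a core class to add a method (the "any object is extendable" point).
class Array
  def second
    self[1]
  end
end
puts [10, 20, 30].second

# A method that yields to whatever block the caller passes in.
def twice
  yield 1
  yield 2
end
collected = []
twice { |n| collected << n * 10 }
p collected

# Regex literals are part of the syntax; no library import needed.
puts "release 10.2.3 final"[/\d+\.\d+\.\d+/]

# Backticks run a system command and return its output as a String.
puts `echo hello`.strip.upcase
```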
Things I dislike:
- Ruby has some weird method names in some of the standard libraries. Like how the hell am I supposed to know that ‘test’.intern converts a string to a symbol?
- The global variables! It seems like these could be abused and would be best if they weren’t there. I mean all these things can be implemented mostly without them or with more logically named ones.
- Looking at the Ruby code, there seems to be a break of style within the code for various libraries. Python is amazingly well organized in a consistent style, while the Ruby code, mostly the core libraries, seems to be inconsistent in its style. I may be anal in that I hate seeing
def method param, param2
and
def method(param, param2)
in the same class for different methods. It just doesn’t look good! So I would think there should be a style set for the core language code itself.
- Documentation. Ruby’s documentation isn’t really its strong point, but it is getting better. Meanwhile, Programming Ruby is an amazing book and I recommend it to everyone learning Ruby.
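For the record, here is what the two naming and style complaints above look like in practice. The method names `add_bare` and `add_parens` are made up for the example:

```ruby
# String#intern and String#to_sym are two names for the same conversion.
a = "test".intern
b = "test".to_sym
p a
p a.equal?(b)  # symbols with the same name are the same object

# Both definition styles are valid Ruby and behave identically;
# only the look differs.
def add_bare a, b
  a + b
end

def add_parens(a, b)
  a + b
end

p add_bare(1, 2) == add_parens(1, 2)
```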