Archive for May 2006
pickle Me This
[tags]serializer, pickle, python[/tags]
Alright, so I finally figured out how to store serialized data in Python, after a bunch of "wtf are you talking about" moments and some intense Google searches. Yes, my search history is filled with "serialize this" and "serialize that". But let us get down to it!
Say you have an object instance that you want to use again in a later program. You would use pickle, as shown in the following code.
# Need to use pickle
import pickle
# Example Class
class Test:
    def __init__(self, input):
        self.input = input + " This is gonna get Serialized!"
# Class Instance
test = Test('test serialize!')
# test.input value
print test.input
# We want to save the above instance so we can run it later on!
# So to save it to a file we can do this (binary mode keeps the data intact on every platform):
pickle.dump(test, open('test.pickle', 'wb'))
# Delete test just so we know it doesn't exist!
del test
# The above writes the serialized data to disk
# Now to read that data back again and use it we do the following.
test = pickle.load(open('test.pickle', 'rb'))
# This should be the same as above!
print test.input
# Say you want to store the pickle dump elsewhere like a database
# Well we can get a pickle dump as a string
string_pickle_dump = pickle.dumps(test)
# Delete test again
del test
# Let's see what it has!
print "Pickle Dump Start"
print string_pickle_dump
print "Pickle Dump End"
# Now we can store the dump in a database at this point in time!
# Now let's get back the serialized data!
test = pickle.loads(string_pickle_dump)
# Let's see what we get
print test.input
You should get something like the following output:
test serialize! This is gonna get Serialized!
test serialize! This is gonna get Serialized!
Pickle Dump Start
(i__main__
Test
p0
(dp1
S'input'
p2
S'test serialize! This is gonna get Serialized!'
p3
sb.
Pickle Dump End
test serialize! This is gonna get Serialized!
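If you are on a newer Python, here is a minimal sketch of the same round trip, assuming Python 3, where print is a function and pickle.dumps returns bytes instead of a string. The temp directory and the file name test3.pickle are placeholders I made up, not part of the original example.

```python
import os
import pickle
import tempfile

# Same example class as above
class Test:
    def __init__(self, input):
        self.input = input + " This is gonna get Serialized!"

test = Test('test serialize!')
original = test.input

# Placeholder path in a temp directory (an assumption, any path works)
path = os.path.join(tempfile.mkdtemp(), 'test3.pickle')

# Pickle files must be opened in binary mode on Python 3
with open(path, 'wb') as f:
    pickle.dump(test, f)
del test

# Read the instance back from disk
with open(path, 'rb') as f:
    test = pickle.load(f)
print(test.input)

# dumps() gives bytes, which you could store in a binary database column
bytes_pickle_dump = pickle.dumps(test)
del test
test = pickle.loads(bytes_pickle_dump)
print(test.input)
```

One thing to keep in mind: only unpickle data you trust, since loading a pickle can run arbitrary code.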
Designing a Scheduling Database for CalLUG
[tags]database, berkeleydb, callug, python[/tags]
I’m creating a Scheduling App for CalLUG, but I’m running into a roadblock. The database is going to be read far more often than it is written to, so I’m using BerkeleyDB with B-trees. That should make it relatively fast compared to using an RDBMS.
Now the problem is how to design the keys and values. I’m thinking of several kinds of key/value pairs in the same database:
key1=controlnumber  value1=(all other info separated by commas)
key2=courseTitle    value2=key1
key3=departments    value3=key1
Now unless I find a pattern between the CourseNumbers and the Departments I am pretty much screwed and am going to be repeating myself a lot! This totally breaks the DRY principle.
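To make the layout above concrete, here is a rough sketch using plain Python dictionaries in place of BerkeleyDB. The function names, the course data, and the choice to keep a list of control numbers per secondary key are all assumptions of mine for illustration. The point is only that the secondary keys store key1 and never a copy of the record.

```python
# Primary "table": control number -> all other info, comma separated
primary = {}
# Secondary indexes: course title / department -> list of control numbers
by_title = {}
by_department = {}

def add_course(control_number, title, department, instructor):
    """Store the record once and point the secondary keys at it."""
    primary[control_number] = ",".join([title, department, instructor])
    by_title.setdefault(title, []).append(control_number)
    by_department.setdefault(department, []).append(control_number)

def courses_in(department):
    """Follow the secondary index back to the full records."""
    return [primary[cn] for cn in by_department.get(department, [])]

# Made-up sample data
add_course("12345", "Intro to Databases", "CS", "Smith")
add_course("12346", "Operating Systems", "CS", "Jones")
add_course("20001", "Linear Algebra", "MATH", "Lee")

print(courses_in("CS"))
```

The catch is that every write now has to touch three maps, which is exactly the bookkeeping an RDBMS index would do for you.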
So I am really wondering whether an RDBMS isn’t worth the slowness, at least initially. I am probably going to need to abstract the database access code from the main code anyway, so I could probably get away without Berkeley DB until I figure out how to build it to access these metrics.
I could probably do the secondary metric thing, but I would need to use the actual pybsddb code and figure out how it works. So what does all this mean? I need to learn C…
Update
I’ve decided, with Stephen’s help, that the best course of action is to use an RDBMS for this project. Unfortunately, the information needs to be accessed in a lot of different ways, so until I figure out how to support those accesses in BerkeleyDB I’m going to stick with MySQL.
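Here is a small sketch of why the RDBMS route is less work when there are many access paths. It uses sqlite3 from the Python standard library instead of MySQL so that it runs without a server, and the table, column names, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE courses (
        control_number TEXT PRIMARY KEY,
        title          TEXT,
        department     TEXT
    )
""")
# One index per extra access path; the database keeps them in sync
conn.execute("CREATE INDEX idx_title ON courses (title)")
conn.execute("CREATE INDEX idx_department ON courses (department)")

# Made-up sample rows
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [("12345", "Intro to Databases", "CS"),
     ("12346", "Operating Systems", "CS"),
     ("20001", "Linear Algebra", "MATH")],
)

# Look the same data up by department and by title
by_dept = conn.execute(
    "SELECT control_number FROM courses WHERE department = ? ORDER BY control_number",
    ("CS",),
).fetchall()
by_title = conn.execute(
    "SELECT control_number FROM courses WHERE title = ?",
    ("Linear Algebra",),
).fetchall()
print(by_dept)
print(by_title)
```

Each record is stored once, and adding another way to look it up is one more CREATE INDEX rather than another hand-maintained key.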
BerkeleyDB and RDBMS
[tags]berkeleydb, database, rdbms, python[/tags]
I have been researching databases lately. Why? I really don’t know; it’s just a current interest. Since databases are so important in our lives, what really interests me is how we can get the data we need without the overhead associated with the transaction.
I am still learning about this stuff, so some of it may be inaccurate; please don’t take my word as final on anything. Well, that’s just a general rule. As the Buddha said, “Question everything!”
So what is BerkeleyDB? Well, in its simplest form it is what a full-featured RDBMS like MySQL can use as a storage engine. Alright, that wasn’t simple, so let’s try again. BerkeleyDB basically allows you to store a value for a certain key, much like hashes or dictionaries! The beauty of this method is that it is fast.
When calling an RDBMS through SQL statements, the problem becomes speed as the number of users grows. That is, as more and more users are added to the system, the overhead of accessing the database becomes greater and greater! So reads become tediously slow unless there are some fast machines at your disposal. BerkeleyDB works best when things are read more often than they are saved, or when the saves happen in bulk.
An example is, say, Google’s database. Most of us will never have that much data, but if you consider it they are pretty friggin’ fast for each result. How do they do it? Well, they use databases like BerkeleyDB. In fact, on their site they say Google’s authentication system uses BerkeleyDB! People may try to access the database all the time, but how often do you update those changes? Monthly, yearly? That is why it is great for times when changes don’t happen a lot.
Now the question is how do I use it? Well here is some code!
>>> import bsddb
>>> db = bsddb.btopen('test') # B-Tree open
>>> db['test'] = 'test'
>>> db['test2'] = 'test2'
>>> db
{'test': 'test', 'test2': 'test2', 'hello': "['yo', 'no', 'do']"}
>>> db.keys()
['hello', 'test', 'test2']
>>> db.pop('hello')
"['yo', 'no', 'do']"
>>> db
{'test': 'test', 'test2': 'test2'}
See, just like a dictionary! Also note I am using a B-tree here (that is the bt in btopen), which is not the same thing as a binary tree and is the subject of another post.
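The bsddb module was removed from the standard library in Python 3, so here is a hedged sketch of the same dictionary-style session using the standard dbm module instead. This is a swapped-in stand-in, not BerkeleyDB: the dbm backends make no promise about B-tree key ordering, which is why the keys are sorted by hand. The temp directory path is a placeholder.

```python
import dbm
import os
import tempfile

# Placeholder path in a temp directory (an assumption, any path works)
path = os.path.join(tempfile.mkdtemp(), 'test')

# 'c' opens the database for reading and writing, creating it if needed
db = dbm.open(path, 'c')
db['test'] = 'test'
db['test2'] = 'test2'
db['hello'] = str(['yo', 'no', 'do'])  # values must be strings or bytes

# Keys and values come back as bytes; sort because order is not guaranteed
keys = sorted(db.keys())
print(keys)

# Remove one entry, like db.pop('hello') in the session above
del db['hello']
remaining = sorted(db.keys())
print(remaining)

value = db['test']
print(value)
db.close()
```

Since values have to be strings or bytes, anything richer (like the list under 'hello') needs to be serialized first, which is where pickle.dumps comes in handy.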
Design for Content
[tags]design, web, beauty, ads, content[/tags]
I thought I’d try something different now and talk about design. Please note I’m not an expert at all. I see a lot of sites on the net now that are unappealing even if the content is great. The content may be the one reason people keep going back to those sites time and time again. I don’t think anyone ever wants to see ad-ridden sites.
This is turning into a rant, so bear with me. When I go to a site I don’t want to see ads at all! The first impression is what keeps me in, and if there are ads I get driven away. I just don’t like seeing them. When I enter a site and all I get is a bunch of ads instead of content, I think, “Hmm. This person/company doesn’t have a lot of content, I guess.” When I come to a site I want to see the content!
Don’t get me wrong, you can have ads. This site has ads, but you don’t see them on the front page. When people come here I want them to see the content. The content is king! So your design should be promoting the content, not the ads!
Ad placement should complement the content. On this site, if you go to a post page you see the content, then ads, then comments, and then ads again. The ads following the comment form are there for you to maybe glimpse at while you are writing, but I don’t expect you to take the time to click on them unless you’re interested.
So what about all that extra fluff like links and pictures and random imagery? Well, I don’t have any of that. I want to keep it simple and focus on the content! So basically, focus on content and all else follows!
Context Based Tags
[tags]tags, python, yahoo[/tags]
Here is some code to get context-based tags from a piece of text. It uses Yahoo, but in the future I’m going to see if I can add other services and return only the tags that are present in all of them, so as to get the best result.
import urllib
import urllib2
import xml.dom.minidom
def make_tags(text, numtags=5):
    '''
    Connects to Yahoo and gets tags based on content.
    text - the text to extract tags from
    numtags - the number of tags to return.
    '''
    # Request parameters
    url = 'http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction'
    params = urllib.urlencode({
        'appid': 'upbylunch',
        'context': text})
    # Send request (passing params makes this a POST)
    response = urllib2.urlopen(url, params).read()
    # Make tag list from the text of every Result element
    doc = xml.dom.minidom.parseString(response)
    tags = [str(i.childNodes[0].nodeValue) for i in doc.getElementsByTagName('Result')]
    # Return at most numtags tags
    return tags[:numtags]
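If you want to try the parsing step without calling Yahoo, here is a small offline sketch, written for Python 3. The sample response is something I made up in the shape the code above expects (one Result element per term); the real service response may have looked different. parse_tags is a hypothetical helper name, not part of the script.

```python
import xml.dom.minidom

# Made-up response in the shape make_tags() expects: one <Result> per term
SAMPLE_RESPONSE = """<ResultSet>
  <Result>python</Result>
  <Result>pickle</Result>
  <Result>serializer</Result>
</ResultSet>"""

def parse_tags(response, numtags=5):
    """Pull the text out of every <Result> element, keeping at most numtags."""
    doc = xml.dom.minidom.parseString(response)
    results = doc.getElementsByTagName('Result')
    # Skip empty <Result/> elements, which have no text child
    tags = [node.firstChild.nodeValue for node in results if node.firstChild]
    return tags[:numtags]

print(parse_tags(SAMPLE_RESPONSE, numtags=2))
```

Splitting the parsing out like this also makes it easier to plug in another tag service later, since only the request part would change.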
Tags Vim Post
[tags]tags, vim, post, wordpress[/tags]
I finally figured out how to post tags to WordPress. It was really stupid of me since all I had to do was read the documentation for Ultimate Tag Warrior. I just had to enable embedded tag support. So if you want to use this new version you have to do that also.
New Features
- Tags support
- If no tags are entered, it goes to Yahoo and grabs some tags based on the content of your post
- Basically this is feature complete
So here is the new version of the script. Put this in your .vimrc.
python
In the code above, do a search and replace of [ to [ and ] to ].
Format of the post:
Title
tags, split, by, commas, or, left, blank, for, automatic, picking
Content
~
~
~
~
~
~
Features I may want to add in the future
- A way to add links automatically from text that is capitalized and is a noun.
- Better tag selection based on Tagyu
- Remove sensitive information from code posts, since I always seem to forget to remove it.
Search as the New Command Prompt
This article is very true. I really do prefer searching for things to actually locating them. It just makes things so much easier than having to organize everything and remember where things are.
The new Google Desktop makes it much easier to search through things, with Ctrl-Ctrl making the search box appear. Very nifty!
So what is the point of this ridiculous no-content-just-repeating-someone-else’s-post post? Well, nothing, just repeating someone else’s post! Sorry for wasting your time.
Google Calendar VIM Add Entry V2
Here is the new version to put in your .vimrc
python %s
%s
%s
''' % (title, name, google_email, start_date, end_date)
cal.insert_entry(event)
vim.command('set nomodified')
EOF
This is the brand spanking new version of Google Calendar Post for Vim. New features include:
- Add as many entries as you’d like
- New line format: Title#mm-dd-yyyy hh:mm#mm-dd-yyyy hh:mm
- extract_dates() is much, much cleaner!
- Less patchy feeling to it.
- Code easier to understand.
- Alright most of these aren’t really features!
Vim Buffer should be like so (example post):
Doctor's Meeting#05-26-2006 12:30#05-26-2006 13:00
Gonna Die#05-26-2006 13:00#05-26-2006 13:15
~
~
~
~
~
~
~
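For reference, here is a minimal sketch of how a line in that format could be taken apart, written for Python 3. This is not the script's own extract_dates(); parse_entry and DATE_FORMAT are hypothetical names of mine, and splitting from the right is my own choice so that a # in the title does not break things.

```python
from datetime import datetime

DATE_FORMAT = "%m-%d-%Y %H:%M"  # mm-dd-yyyy hh:mm

def parse_entry(line):
    """Split 'Title#start#end' into a title and two datetimes."""
    # rsplit so a '#' inside the title does not break the date fields
    title, start, end = line.rsplit("#", 2)
    return (title,
            datetime.strptime(start, DATE_FORMAT),
            datetime.strptime(end, DATE_FORMAT))

buffer_lines = [
    "Doctor's Meeting#05-26-2006 12:30#05-26-2006 13:00",
    "Gonna Die#05-26-2006 13:00#05-26-2006 13:15",
]
# One calendar entry per non-empty line
entries = [parse_entry(line) for line in buffer_lines if line.strip()]
for title, start, end in entries:
    print(title, start.isoformat(), end.isoformat())
```

A line with a malformed date raises ValueError from strptime, which is probably what you want before anything gets sent to the calendar.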
Enjoy!
Update: This will be the last version that I will create. I thought I was going to use this often, but it turned out I actually prefer using the browser. If anyone else wants to use it, go ahead. It’s released into the Public Domain now. Thank you.
Plans for new Comp
So I’ve decided to get a MacBook. Why? Cause they’re so damn great! I’m not even kidding about that. MacBooks have it all, even second-degree burns included in the package.
At the price it is selling for, this is the time to get a Mac, since it is pretty much at the price point of a normal laptop and very powerful. So what does this mean? Well, I’ll be able to use iTunes and keep programming at the same time. I’ll be able to use UNIX and just have fun.
Mind you, I have heard of the overheating, but I think it can be fixed easily by not putting the laptop in your lap. That defeats the name laptop, so we’ll call it a notebook. There, that wasn’t so hard, was it?
Google Calendar VIM Add Entry
Time for another VIM plugin! This one allows you to post to Google Calendar from a VIM Buffer. It is still not done as I have to refactor a lot of code. However, at this point in time it is good enough that it works. So here is the code that you need to put in your .vimrc:
python
<title>%s</title>
%s
%s
%s
''' % (title, content, name, google_email, start_date, end_date)
cal = CalendarService(google_email, google_password)
print cal.insert_entry(event)
vim.command('set nomodified')
EOF
Also put this in your .vimrc if you want to map an easy way to post the entry.
" Creates a map so pressing will post to Google Calendar. map :py gcal_new_event()
To post an entry the following format must be followed.
Line1: Title
Body: Description of Event
LastLine: Start and End Date
Format: mm-dd-yyyy hh:mm;mm-dd-yyyy hh:mm
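As a rough illustration of that buffer layout, here is a sketch in Python 3 that splits a buffer into title, description, and the two dates. It is not the plugin's actual code; parse_event is a hypothetical name and the sample event text is made up.

```python
from datetime import datetime

DATE_FORMAT = "%m-%d-%Y %H:%M"  # mm-dd-yyyy hh:mm

def parse_event(buffer_text):
    """First line is the title, last line the dates, the rest is the body."""
    lines = buffer_text.strip().splitlines()
    title = lines[0]
    body = "\n".join(lines[1:-1])
    # Start and end are separated by a semicolon on the last line
    start_text, end_text = lines[-1].split(";")
    start = datetime.strptime(start_text.strip(), DATE_FORMAT)
    end = datetime.strptime(end_text.strip(), DATE_FORMAT)
    return {"title": title, "content": body, "start": start, "end": end}

# Made-up sample buffer
event = parse_event("""Doctor's Meeting
Yearly checkup.
Bring the insurance card.
05-26-2006 12:30;05-26-2006 13:00""")
print(event["title"], event["start"], event["end"])
```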