Postgres collation errors on CITEXT fields when upgrading to 9.1
21 May 2012
0 comments
Web development
Just in case this hits you too: if you use CITEXT fields that were originally defined in a Postgres version before 9.1, you can run into this error:
ProgrammingError: could not determine which collation to use for string comparison
HINT: Use the COLLATE clause to set the collation explicitly.
This can happen if you use something like:
WHERE name='peter'
when field name is a case insensitive text field.
After some googling around and shooting in the dark I found that the only way to crack this is to run this command:
CREATE EXTENSION citext FROM unpackaged;
Hope that helps some poor schmuck with the same problem.
UPDATE
If you have problems applying this to new tables in Postgres 9.1 you might need to run this instead:
CREATE EXTENSION citext WITH SCHEMA public;
Are WebSockets faster than AJAX? ...with latency in mind?
22 April 2012
6 comments
Web development, JavaScript
The advantage with WebSockets (over AJAX) is basically that there's less HTTP overhead. Once the connection has been established, all future message passing is over a socket rather than new HTTP request/response calls. So, you'd assume that WebSockets can send and receive many more messages per unit time. Turns out that that's true. But there's a very bitter reality once you add latency into the mix.
So, I created a simple app that uses SockJS and an app that uses jQuery AJAX to see how they would perform under stress. Code is here. All it does, basically, is send a simple data structure to the server, which echoes it back. As soon as the response comes back, it starts over. Over and over till it's done X number of iterations.
Here's the output when I ran this on localhost here on my laptop:
# /ajaxtest (localhost)
start!
Finished 10 iterations in 0.128 seconds meaning 78.125 messages/second
start!
Finished 100 iterations in 0.335 seconds meaning 298.507 messages/second
start!
Finished 1000 iterations in 2.934 seconds meaning 340.832 messages/second

# /socktest (localhost)
Finished 10 iterations in 0.071 seconds meaning 140.845 messages/second
start!
Finished 100 iterations in 0.071 seconds meaning 1408.451 messages/second
start!
Finished 1000 iterations in 0.466 seconds meaning 2145.923 messages/second
Wow! It's so fast that the rate doesn't even settle down. A back-of-an-envelope calculation tells me the WebSocket version is roughly 5 to 6 times faster. Again: wow!
Now reality kicks in! It's obviously unrealistic to test against localhost because it doesn't take latency into account. I.e. it doesn't take into account the long distance the data has to travel from the client to the server.
So, I deployed this test application on my server in London, England and hit it from my Firefox here in California, USA. Same number of iterations, and I ran it a number of times to make sure I didn't get hit by sporadic hiccups on the line. Here are the results:
# /ajaxtest (sockshootout.peterbe.com)
start!
Finished 10 iterations in 2.241 seconds meaning 4.462 messages/second
start!
Finished 100 iterations in 28.006 seconds meaning 3.571 messages/second
start!
Finished 1000 iterations in 263.785 seconds meaning 3.791 messages/second

# /socktest (sockshootout.peterbe.com)
start!
Finished 10 iterations in 5.705 seconds meaning 1.752 messages/second
start!
Finished 100 iterations in 23.283 seconds meaning 4.295 messages/second
start!
Finished 1000 iterations in 227.728 seconds meaning 4.391 messages/second
Hmm... Not so cool. WebSockets are still slightly faster but the difference is negligible. WebSockets are roughly 10-20% faster than AJAX. With that small a difference I'm sure the benchmark is going to be vastly affected by other factors that make it unfair for one or the other, such as quirks in my particular browser or the slightest hiccup on the line.
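As a rough sanity check, here's a minimal Python sketch that redoes the arithmetic using only the 1000-iteration timings quoted above. The function names are made up for illustration; none of this is part of the benchmark code itself.

```python
# (iterations, seconds) copied from the 1000-iteration runs quoted above.
RESULTS = {
    "localhost": {"ajax": (1000, 2.934), "sock": (1000, 0.466)},
    "london": {"ajax": (1000, 263.785), "sock": (1000, 227.728)},
}


def rate(iterations, seconds):
    """Messages per second for one run."""
    return iterations / seconds


def speedup(run):
    """How many times more messages/second SockJS managed than AJAX."""
    return rate(*run["sock"]) / rate(*run["ajax"])


def seconds_per_message(iterations, seconds):
    """Average wall-clock time spent on one send-and-echo round trip."""
    return seconds / iterations


for name, run in RESULTS.items():
    print(
        name,
        "speedup: %.2fx" % speedup(run),
        "ajax: %.3f s/msg" % seconds_per_message(*run["ajax"]),
        "sock: %.3f s/msg" % seconds_per_message(*run["sock"]),
    )
```

On localhost a message costs a few milliseconds or less either way, so the HTTP overhead is a large share of it. Across the Atlantic each round trip takes roughly a quarter of a second in both cases, so the saved overhead barely shows.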
What can we learn from this? Well, latency kills all the fun. Also, it means that you don't necessarily need to re-write your already working AJAX heavy app just to gain speed because even though it's ever so slightly faster, the switch from AJAX to WebSocket comes with other risks and challenges such as authentication cookies, having to deal with channel concurrency, load balancing on the server etc.
Before you say it, yes I'm aware that WebSocket web apps come with other advantages such as being able to hold on to sockets and push data at will from the server. Those are juicy benefits but massive performance boosts ain't one.
Also, I bet that writing this means that peeps will come along and punch holes in my code and my argument. Something I welcome with open arms!
Secs sell! How frickin' fast this site is! (server side)
05 April 2012
0 comments
Linux, Web development, Django
This is part 2. Part 1 is here about how I managed to make this site fast.
The web framework powering this site is Django and in front of that is Nginx which serves all the static content (once before Amazon CloudFront CDN takes over) and all non-static traffic is passed on to a uWSGI daemon which is running 6 worker processes. The database that stores the content is PostgreSQL and all caching is done in Redis. Actually another Redis database is used for other things such as maintaining a quick look-up index of keywords to primary keys so that I can quickly mesh together blog posts by keywords.
However, as we all know, the deciding factor of a web site's server-side speed is effectively the speed of the database or any other disk-bound I/O device. To remedy this I've set up some practical caching strategies which I'm quite happy with.
So, how fast is it? Here's an ab stress test against the home page with 10,000 requests spread across 10 concurrent users:
Document Path: /
Document Length: 73272 bytes
Concurrency Level: 10
Time taken for tests: 4.426 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 734250000 bytes
HTML transferred: 732720000 bytes
Requests per second: 2259.59 [#/sec] (mean)
Time per request: 4.426 [ms] (mean)
Time per request: 0.443 [ms] (mean, across all concurrent requests)
Transfer rate: 162022.11 [Kbytes/sec] received
I could probably take that from 2,300 requests/second to 3,000 or 4,000 if I just increased the number of workers. However, that costs memory, and since I'm currently running 19 other uWSGI workers on this server, and all 25 together take up a steady 1.4 GB, I don't feel like increasing that number much more. Besides, since this site doesn't really get any traffic, I'm less concerned about massive throughput on concurrent benchmarks and more about serving each and every page as fast as possible the few times it's called.
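For anyone wondering how the figures in the ab report relate to each other, here's a small Python sketch that recomputes them from the three inputs in the report. The variable names are mine, and the tiny difference from ab's own requests/second is presumably because ab works with an unrounded elapsed time.

```python
# Inputs copied from the ab report above.
complete_requests = 10000
concurrency = 10
time_taken = 4.426  # seconds, as printed (already rounded by ab)

# Throughput: how many requests finished per second of wall-clock time.
requests_per_second = complete_requests / time_taken

# Mean time one simulated user waited for one request, in milliseconds.
time_per_request_ms = concurrency * time_taken / complete_requests * 1000

# Mean time per request when averaged across all concurrent users.
time_per_request_all_ms = time_taken / complete_requests * 1000

print("%.2f requests/second" % requests_per_second)
print("%.3f ms per request (mean)" % time_per_request_ms)
print("%.3f ms per request (across all concurrent requests)" % time_per_request_all_ms)
```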
Every single page on this site is behind some sort of internal cache. The only time PostgreSQL is involved in rendering a page is when it's first requested after a comment has been entered or I've added (or edited) a post. Thing is, I don't want to be inconvenienced by a stupid cache that forces me to wait an hour every time I change something. No, instead lots of Django database model signals are put in place that fire off cache invalidation when certain pieces of data are changed. You can see the code for that here.
So, for the home page for example: for each request, a small piece of Python code checks Redis for the latest comment add-date and, based on that, tells the Django page_cache decorator to either render the page as normal or serve the whole HTML payload from Redis. In other words, a successful cache "hit" actually needs two Redis look-ups. Even that could be improved by sparing these look-ups and blindly serving from the worker's allocated Python memory instead, but that would make things fragile and hard to unit test, and it would only make the benchmarks faster, which is not necessary.
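To make the two look-ups concrete, here's a minimal sketch of the idea. It is not the site's actual code: a plain dict stands in for Redis so it runs anywhere, and every name in it (fake_redis, get_page, on_comment_added, the key names) is hypothetical.

```python
import hashlib

# Assumption: a dict stands in for Redis. A real setup would use a Redis
# client's get/set calls instead of dict access.
fake_redis = {}

LATEST_COMMENT_KEY = "latest-comment-add-date"  # hypothetical key name


def page_cache_key(path, latest_comment_date):
    """Cache key that changes whenever the latest comment date changes."""
    raw = "%s:%s" % (path, latest_comment_date)
    return "page:" + hashlib.md5(raw.encode("utf-8")).hexdigest()


def get_page(path, render):
    """Serve cached HTML if it is still valid, otherwise render and store it."""
    # Look-up 1: when was the most recent comment added?
    latest = fake_redis.get(LATEST_COMMENT_KEY, "never")
    key = page_cache_key(path, latest)
    # Look-up 2: is there an HTML payload stored for that state?
    html = fake_redis.get(key)
    if html is None:
        html = render(path)  # the slow path that would hit the database
        fake_redis[key] = html
    return html


def on_comment_added(add_date):
    """Roughly what a model signal handler would do after a comment is saved."""
    fake_redis[LATEST_COMMENT_KEY] = add_date
```

The nice property of keying on the latest comment date is that nothing has to be deleted on invalidation: once the date moves, old entries are simply never asked for again (in real Redis you'd give them an expiry so they don't pile up).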
The most important thing to optimize on a web site is the static content. Well, there's little point in serving the static content fast if it takes 3 seconds to say what static content to serve. Also, a fast website is likely to appear more favorable to the Google bot, which effectively makes the site appear higher in Google searches.
In the next part, I'll try to share more in-depth technical bits and pieces of what I actually did. They're no secrets, but I think some of them are best practice and even senior web developers sometimes get them wrong.
Secs sell! How frickin' fast this site is! (client side)
30 March 2012
6 comments
Web development

After a lot of optimization work on this website I finally now get a score of 98 on YSlow! Phew! Finally!
I've managed to get near perfect scores in the past but never on something as "big" and mixed and "multimedia" as this, i.e. the home page. The home page on this site contains a lot of content. Lots of thumbnails and lots of code.
As always, it really helps if you can control the requirements. Meaning you can say "No, we don't want an embedded Flash widget with 30kb Javascript". In my case I didn't want content to be dynamic per user request, so the underlying HTML can be properly cached. Also, I don't need any Javascript for the home page because all it shows is static content.

My individual blog pages are the only pages that require Javascript. What I did there was let Google host a copy of the latest jQuery and I just add some minified code to handle the AJAX of the comment posting. It's pretty cool that the individual blog post pages get a score of 99 on YSlow even though they contain a decent amount of Javascript.
What I've also done is move every single image, css and javascript element to the Amazon CloudFront CDN. Yes, this costs money but certainly not much. My web server is located in London, England, which is a good location, but considering that 70% of my visitors are based in North America it's fairer that 90% of the web page content is served near them instead. This is clearly illustrated with this screenshot from Pingdom.

I'm quite aware that it's 100 times easier to build a fast website when you can simply disregard certain features such as fat picture galleries and massive blocks of Javascript stuff. But mind you, choosing not to add those features is a large part of making fast websites too. The number one rule of making a request fast is to not make it at all.
I'll soon blog more about how I made these things happen from a technical point of view.
Going real simple on HTML5 audio
14 October 2011
0 comments
Web development, JavaScript
http://donecal.com/testsound
More than 80% of DoneCal users are on Chrome or Firefox. Both Firefox and Chrome support the HTML <audio> element without any weird plugins and they both support the Ogg Vorbis (.ogg) file format. Change log here
So, I used to use the rather enterprisey plugin called SoundManager2, which attempts to abstract away all hacks into one single API. It uses a mix of browser sniffing, HTML5 and Flash. Although very promising, it is quite cumbersome. It doesn't work flawlessly despite their hard efforts. Unfortunately, using it also means a 30kb (optimized) Javascript file and a 3kb .swf file (if needed). So, instead of worrying about my very few Internet Explorer users I decided to go really dumb and simple on this.
The solution basically looks like this:
// somewhere.js
var SOUND_URLS = {
  foo: 'path/to/foo.ogg',
  egg: 'path/to/egg.ogg'
};

// play-sounds.js
/* Call to create and partially download the audio element.
 * You can call this as much as you like. */
function preload_sound(key) {
  var id = 'sound-' + key;
  if (!document.getElementById(id)) {
    if (!SOUND_URLS[key]) {
      throw "Sound for '" + key + "' not defined";
    } else if (SOUND_URLS[key].search(/\.ogg/i) == -1) {
      throw "Sound for '" + key + "' must be .ogg URL";
    }
    var a = document.createElement('audio');
    a.setAttribute('id', id);
    a.setAttribute('src', SOUND_URLS[key]);
    document.body.appendChild(a);
  }
  return id;
}

function play_sound(key) {
  document.getElementById(preload_sound(key)).play();
}

// elsewhere.js
$.lightbox.open({
  onComplete: function() {
    preload_sound('foo');
  }
});
$('#lightbox button').click(function() {
  play_sound('foo');
});
Basically, only Firefox, Chrome and Opera support .ogg but it's a good and open source encoding so I don't mind being a bit of an asshole about it. This little script could be slightly extended with some browser sniffing to work with Safari people but right now it doesn't feel like it's worth the effort.
This makes me happy and I feel lean and light. A good feeling!
New feature on Too Cool For Me: Everyone I follow
08 October 2011
0 comments
Web development
http://toocoolfor.me/screenshots#5
I've added a new feature to Too Cool For Me that lists all the users that you follow and splits them up into "Follows me" and "Too cool for me".
To try it you have to authenticate with Twitter (READ ONLY mode) then go to toocoolfor.me/everyone
This means you can use Too Cool For Me without having to use the Bookmarklet.
Google's new Page Speed Online hard to beat
04 April 2011
3 comments
Web development
I like the new Google Page Speed Online for its simplicity. However, I threw it the URL of my Crosstips site http://crosstips.org and it only gave me an 80 out of 100 even though there were no high priority suggestions.
Seems hard to beat. Surely, to win over the remaining 20 points I don't have to tick all the medium and low priority suggestions.
How I profile my Nginx + proxy pass server
16 February 2011
3 comments
Web development, Python
Like so many others you probably have an Nginx server sitting in front of your application server (Django, Zope, Rails). The Nginx server serves static files right off the filesystem and when it doesn't do that it proxy passes the request on to the backend. You might be using proxy_pass, uwsgi_pass or fastcgi_pass or at least something very similar. Most likely you have an Nginx site configured something like this:
server {
    access_log /var/log/nginx/mysite.access.log;
    location ^~ /static/ {
        root /var/lib/webapp;
        access_log off;
    }
    location / {
        proxy_pass http://localhost:8000;
    }
}
What I do is that I add an access log directive that times every request. This makes it possible to know how long every non-trivial request takes for the backend to complete:
# log_format is only allowed in the http context, so it goes outside server {}
log_format timed_combined '$remote_addr - $remote_user [$time_local] '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent" $request_time';
server {
    access_log /var/log/nginx/timed.mysite.access.log timed_combined;
    location ^~ /css/ {
        root /var/lib/webapp/static;
        access_log off;
    }
    location / {
        proxy_pass http://localhost:8000;
    }
}
If you do that your access log file will look the same as before except now it will have a last column that contains the timing. Now, let your site spin for a couple of days/weeks/months and later download the access log:
$ rsync -avzP root@myserver.com:/var/log/nginx/timed.mysite.access.log .
Excellent, now download this script and save it next to your log file. When you run it you get a nice little menu that should be sufficiently self-explanatory:
$ python analyze.py timed.mysite.access.log
What do you want to know?
1) Slowest performer
2) Most common
3) Total cumulative time
The most interesting one is probably the last one because that's where most time is spent by your web application, and perhaps that's the place to start if you want to make your site faster or if you want to know what is most important in terms of test coverage. Here's a sample output from my site on the "Total cumulative time":
GETS
537.582 /?replypath=/comment-20041222-x54i/comment-20041231-0gug
519.277 /?replypath=/comment-20041222-x54i
306.039 /
259.845 /rss.xml?oc=Django
251.064 /Bush-country/
233.165 /plog/blogitem-040601-1
224.459 /plog/blogitem-040601-1/?replypath=/c0610287425
186.032 /plog/interior-octopus/octopus.jpg?display=large
170.430 /plog/blogitem-040601-1/?replypath=/comment-20050714-12mf
POSTS
182.964 /plog/button-tag-in-IE
65.311 /plog/unicode-to-ascii
30.086 /plog/blogitem-040406-1/compressor
7.581 /plog/blogitem-040806-1/test-printsql
1.676 /gkc/callback
0.372 /plog/blogitem-040404-1/getCommentCookie
0.364 /plog/donecal-homepage-10k-req-per-sec/manage_editKeywords
0.363 /plog/blogitem-040627-1/previewComment
0.274 /plog/createelement-a
0.246 /plog/blogitem-20031027-2106/previewComment
I hope it helps. Perhaps other people can help me improve the script and later we can turn it into a package. One thing I would like to see, for example, is using the median to reduce the effect of crazy spikes (e.g. a URL that normally takes 10 milliseconds but just once takes 10 seconds).
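To show what the median idea could look like, here's a minimal sketch. It is not the script linked above; it only assumes log lines in the timed_combined format from the config, and the function names and the sample lines are made up for illustration.

```python
import re
import statistics
from collections import defaultdict

# Matches the tail of a timed_combined line:
# "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_time
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) '
    r'"[^"]*" "[^"]*" '
    r'(?P<seconds>\d+(?:\.\d+)?)$'
)


def timings_by_url(lines):
    """Group request times (in seconds) by (method, url)."""
    timings = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match:  # silently skip lines that don't fit the format
            key = (match["method"], match["url"])
            timings[key].append(float(match["seconds"]))
    return timings


def summarize(timings):
    """Total, median and count per (method, url)."""
    return {
        key: {
            "total": sum(values),
            "median": statistics.median(values),
            "count": len(values),
        }
        for key, values in timings.items()
    }


# Made-up sample lines: one URL that is normally fast but spiked once.
SAMPLE = [
    '1.2.3.4 - - [16/Feb/2011:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0" 0.010',
    '1.2.3.4 - - [16/Feb/2011:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0" 0.010',
    '1.2.3.4 - - [16/Feb/2011:10:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0" 10.000',
]

summary = summarize(timings_by_url(SAMPLE))
for (method, url), stats in summary.items():
    print(method, url, stats)
```

In the sample, the single 10 second spike dominates the total for /a while the median stays at the usual 10 milliseconds, which is exactly the distinction the cumulative view hides.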
How to book a ticket on the Royal Academy of Music's website
13 November 2010
1 comment
Web development
I've finally managed to book my ticket to see Zappa. It's the Royal Academy of Music Manson Ensemble who play about 10 Frank Zappa classics. It's here in London on Baker Street.
The Royal Academy of Music website sucks. Its ticket booking part is completely broken. Fortunately I found a way to "hack" it so that I could get a ticket. And it only cost me £1 extra.
On that note, why isn't the box office open on weekends? And why is no one answering any of their phones on a Saturday?
To make a purchase you need an account. (If I weren't lazy I would now link to multiple studies that have shown what a bad idea that is.) But you can't create an account because of a Javascript bug that pops up a message saying something like "- Field not valid format". Obviously it doesn't tell you which field failed to validate, and no, I didn't enter anything in an invalid format. So, use a web browser where you can disable Javascript and try to submit the form. (I use Firefox and the Web Developer extension.) Remember to re-enable Javascript after you've created the account.
Now, when you submit the form it will just become a blank page with nothing on it. Don't worry. At this point they will have emailed you your password. Pick up that email and go here http://tickets.ram.ac.uk/peo/crm_login.asp to log in. Now, you can try to buy the ticket as normal and proceed to checkout.
On the checkout page, even if you're logged in and type in everything correctly it will still respond with an error. One of those annoying errors that means you have to click the Back button with the risk of losing what you've typed in. The trick is that you have to select "I would like to donate". Something I genuinely don't mind but if they're going to endorse such crappy websites it hurts a little to be generous towards them. Anyway, select £1 as the donation and at this point you should be able to make the purchase.
Granted, these guys are awesome when it comes to music and I, a mortal web developer, can barely rip a CD. However, if selling tickets is something they intend to do more of, and if there's some sort of relationship between selling tickets, profit and happiness, I would urge them to re-evaluate their booking website.
PS. For the techy geekys, doing a W3C source validation on their site yields 184 errors and 8 warnings. Impressive!
wkhtmltopdf and font size shrinkage
10 September 2010
0 comments
Web development
wkhtmltopdf is by far the best tool available to make PDFs. Yes. I have tried ReportLab and PISA. ReportLab might be more powerful but what you gain in fine-grained control you lose in hours and hours of productivity.
Anyway, I've learned something about font-size shrinkage and using wkhtmltopdf.
Basically, if you use a percentage to change a font size (Arial in this case) you get a PDF where the letters are unevenly spaced. It took me a while to figure out what the hell was going on until I changed the font-size from 90% to exactly 11px.
('font-size:90%'; the spots of red are my highlights of the ugly spacings)
('font-size:11px'; not perfect but much better)
So, at first I thought this was the first time wkhtmltopdf has disappointed me but I guess I'll just have to remember not to use percentages and continue to favor wkhtmltopdf as my choice of weapon in the PDF production world.
