Peterbe.com

A blog and website by Peter Bengtsson

Filtered home page! Currently only showing blog entries under the category: Linux. Clear filter

This is how you check if a command (with or without any output) exited successfully or if it exited with something other than 0, in bash:

#!/bin/bash
./someprogram
WORKED=$?
if [ "$WORKED" != 0 ]; then
  echo "FAILED"
else
  echo "WORKED"
fi

But how do you inspect this on the command line? I actually don't know, until it hit me. The simplest possible solution:

$ ./someprogram && echo worked || echo failed

What a great low-tech solution. I just works. If you're on OSX, you can nerd it up a bit more:

$ ./someprogram && say worked || say failed

pgFouine in action on my server
pgFouine is a PostgreSQL log analyzer. You basically, configure your Postgres server to be very verbose about all statements. Then, you simply run the pgfouine.php command against the log file and it spits out a page like this:

www.peterbe.com/pgfouine.html

Running all this verbose logging will obviously slow down the database server a bit so I'm only going to be running this temporarily. The overhead is actually pretty small but it's also piling on quite a few bytes in terms of the size of the log file.

So, at the time of writing, it's been about 1 day running and it has captured about 70,000 queries (by the time you look at the file it might have gone up significantly). I haven't started actually looking at it in detail yet but it's clear that there's some use of the LIKE operator that Postgres spends most of its time on.

You can configure your pgFouine to filter on specific databases. I have not done so because I'm at the moment just interested in what the whole database server is getting up to. Most of these guilty queries comes from the Crosstips site. Maybe it's time to optimize the worst performing queries there a bit.

UPDATE

After running for 24 hours, I did some low-hanging fruit optimization to the biggest culprits and reset the logs. The first 24 hours report is still here: www.peterbe.com/pgfouine.1.html

UPDATE 2

I've stopped logging all queries now. The results are still there. I'm quite pleased with the results so far.

This is part 2. Part 1 is here about how I managed to make this site fast.

The web framework powering this site is Django and in front of that is Nginx which serves all the static content (once before Amazon CloudFront CDN takes over) and all non-static traffic is passed on to a uWSGI daemon which is running 6 worker processes. The database that stores the content is PostgreSQL and all caching is done in Redis. Actually another Redis database is used for other things such as maintaining a quick look-up index of keywords to primary keys so that I can quickly mesh together blog posts by keywords.

However, as we all know the deciding factor of a web sites server-side speed is effectively the speed of the database or any other disk-bound I/O device. To remedy this I've set up some practical caching strategies which I'm quite happy with.

So, how fast is it? Here's an ab stress test against home page with 10,000 requests spread across 10 concurrent users:

Document Path:          /
Document Length:        73272 bytes

Concurrency Level:      10
Time taken for tests:   4.426 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      734250000 bytes
HTML transferred:       732720000 bytes
Requests per second:    2259.59 [#/sec] (mean)
Time per request:       4.426 [ms] (mean)
Time per request:       0.443 [ms] (mean, across all concurrent requests)
Transfer rate:          162022.11 [Kbytes/sec] received

I could probably make that 2,300 requests/second to 3,000 or 4,000 if I just increase the number of workers. However, that costs memory and since I'm currently running 19 other uWSGI workers on this server that all (all 25) in total take up a steady 1.4 Gb I don't feel like increasing that number much more. Besides since this site doesn't really get any traffic, I'm not so concerned about massive throughput on concurrent benchmarks but more about serving each and every page as fast as possible the few times it's called.

Every single page on this site is behind some sort of internal cache. The only time the PostgreSQL is involved is in rendering a page is when it's first requested after a comment has been entered or I've added (or edited) a new post. Thing is, I don't want to be inconvenienced by a stupid cache that forces me to wait an hour every time I change something. No, instead lots of Django database model signals are put in place that fire off cache invalidation when certain pieces of data is changed. You can see the code for that here.

So, for the home page for example: For each request, a small piece of Python code checks the Redis for what the latest comment add-date is and based on that tells the Django page_cache decorator to either render the page as normal or to serve the whole HTML payload from Redis. In other words, on a successful cache "hit" it actually needs two Redis look-ups. Even that could be improved and blindly just spare these look-ups by serving from the workers allocated Python memory instead but that would make things fragile, hard to unit test and it would only make the benchmarks faster which is not necessary.

The most important thing to optimize on a web site is the static content. Well, there's little point in serving the static content fast if it takes 3 seconds to say what static content to serve. Also, a fast website is likely to appear more favorable on the Google bot which effectively makes the site appear higher on Google searches.

In the next part, I'll try to share more in-depth technical bits and pieces of what I actually did although they're no secrets I think some of them are best practice and even senior web developers sometimes get them wrong.

Short answer: about 5%

I had a few minutes and wanted to see if changing from Apache + mod_wsgi to Nginx + gunicorn would make the otherwise slow site any faster. It's not this site but another Django site for work (which, by the way, doesn't have to be fast). It's slow because it doesn't cache any of the SQL queries.

# with Apache + mod_wsgi
$ ab -n 1000 -c 10 http://thelocaldomain/
...
Requests per second:    39 [#/sec] (mean)
...
# Uses about 110 Mb

That's after running multiple times and roughly averaging the requests per seconds.

# with Nginx + guncorn --workers=4
$ ab -n 1000 -c 10 http://thelocaldomain/
...
Requests per second:    41 [#/sec] (mean)
...
# uses about 70 Mb

So, if you want to make a site fast forget about how the code is being served until all the slow db I/O is taken care of properly.

This took me a long time to figure out so I thought I'd share.

Basically, I'm a newbie supervisor administrator and I was setting up a new config and I kept getting these errors:

# supervisord -n
2011-04-04 17:25:11,700 CRIT Set uid to user 1000
2011-04-04 17:25:11,700 WARN Included extra file "/etc/supervisor/conf.d/gkc.conf" during parsing
Error: Cannot open an HTTP server: socket.error reported errno.ENOENT (2)
For help, use /usr/local/bin/supervisord -h

The reason was that in my config I had the line:

[unix_http_server]
file=/var/lib/tornado/run/gkc.sock

but the directory /var/lib/tornado/run didn't exist. Creating that solved the problem.

Lesson learned from all this is that when specifying locations of .sock files always make sure the directories exist and that the current user can write to them.

This is helping me sooo much that it would a crime not to share it. It's actually nothing fancy, just a very convenient thing that I've learned to get used to. ff is an executable script I use to find files in a git repository. Goes like this:

$ ff list
templates/operations/network-packing-list.html
templates/sales/list_orders.html
$ ff venue
templates/venues/venues-by-special.html
templates/venues/venues.html
templatetags/venue_extras.py
templatetags/venues_by_network_extras.py
tests/test_venues.py

It makes it easy to super quickly search for added files without having to use the slow find command which would also otherwise find backup files and other junk that isn't checked in.

To install it, create a file called ~/bin/ff and make it executable:

$ chmod +x ~/bin/ff

Then type this code in:

#!/usr/bin/python
import sys, os
args = sys.argv[1:]
i = False
if '-i' in args:
   i = True
   args.remove('-i')
pattern = args[-1]
extra_args = ''
if len(args) > 1:
   extra_args = ' '.join(args[:-1])
param = i and "-i" or ""
cmd = "git ls-files | grep %s %s '%s'" % (param, extra_args, pattern)
os.system(cmd)

A couple of days ago I wrote about how blazing fast the DoneCal API can be on HTTP (1,400 requests/second) and how much slower it becomes when doing the same benchmark over HTTPS. It was, as Chris Adams pointed out, possible to run ab with Keep-Alive on and after some reading up it's clear that it's a good idea to switch on shared ssl_session_cache so that Nginx's SSL TCP traffic can cache some handshakes.

With ssl_session_cache shared:SSL:10m :

 Requests per second:    112.14 [#/sec] (mean)

Same cache size but with -k on the ab loadtest:

Requests per second:    906.44 [#/sec] (mean)

I'm fairly sure that most browsers with use Keep-Alive connections so I guess it's realistic to use -k when running ab but since this is a test of an API it's perhaps more likely than not that clients (i.e. computer programs) don't use it. To be honest I'm not really sure but it never the less feels right to be able to use ssl_session_cache to boost my benchmark by 40%.

It's also worth noticing that when doing a HTTP benchmark it's CPU bound on the Tornado (Python) processes (I use 4). But when doing HTTPS it's CPU bound on the Nginx itself (I use 1 worker process).

This is fairly obvious stuff I guess but it has troubled me for a long time. Some programs on Linux don't spit out their results to stdout. Instead they start a little program similar to less. So what is a console nerd to do?

Pipe it cat! I don't know why I've never thought of this before:

$ psql -l | cat

So I upgraded to the latest Ubuntu Lucid Lynx 10.04 the other day and to my horror it removed Python 2.4 and Python 2.5. I rely more on those programs than I do on some silly Facebook connecting social widget crap. On my laptop I have lots of Zopes requiring Python 2.4 and I have about 10 active Django projects that rely on Python2.5. This fuckup by Ubuntu caused me to write this complaint.

So my estimeed colleague and Linux wiz Jan Kokoska helped me set things straight by showing me how to downgrade these packages to Karmic version and how to pin them in the apt preferences. First of all, make your /etc/apt/source.list look like this:

deb http://gb.archive.ubuntu.com/ubuntu/ karmic main restricted universe multiverse
deb-src http://gb.archive.ubuntu.com/ubuntu/ karmic main restricted universe multiverse

deb http://gb.archive.ubuntu.com/ubuntu/ karmic-updates main restricted universe multiverse
deb-src http://gb.archive.ubuntu.com/ubuntu/ karmic-updates main restricted universe multiverse

deb http://gb.archive.ubuntu.com/ubuntu/ karmic-backports main restricted universe multiverse
deb-src http://gb.archive.ubuntu.com/ubuntu/ karmic-backports main restricted universe multiverse

deb http://security.ubuntu.com/ubuntu karmic-security main restricted universe multiverse
deb-src http://security.ubuntu.com/ubuntu karmic-security main restricted universe multiverse

deb http://gb.archive.ubuntu.com/ubuntu/ lucid main restricted universe multiverse
deb-src http://gb.archive.ubuntu.com/ubuntu/ lucid main restricted universe multiverse

deb http://gb.archive.ubuntu.com/ubuntu/ lucid-updates main restricted universe multiverse
deb-src http://gb.archive.ubuntu.com/ubuntu/ lucid-updates main restricted universe multiverse

deb http://gb.archive.ubuntu.com/ubuntu/ lucid-backports main restricted universe multiverse
deb-src http://gb.archive.ubuntu.com/ubuntu/ lucid-backports main restricted universe multiverse

deb http://security.ubuntu.com/ubuntu lucid-security main restricted universe multiverse
deb-src http://security.ubuntu.com/ubuntu lucid-security main restricted universe multiverse

If you know what you're doing you might have other additional sources in there then keep those as is. Next thing to do is to update and upgrade:

# apt-get update
# apt-get dist-upgrade

You should now see that it's intending to upgrade a bunch of juicy packages like python2.4-dev for example. To check that python2.4 is now getting in from Karmic run this:

$ apt-cache madison python2.4

Now for the trick that really makes the difference:

# apt-get install python2.4=2.4.6-1ubuntu3.2.9.10.1 python2.4-dbg=2.4.6-1ubuntu3.2.9.10.1 \
python2.4-dev=2.4.6-1ubuntu3.2.9.10.1 python2.4-doc=2.4.6-1ubuntu3.2.9.10.1 \
python2.4-minimal=2.4.6-1ubuntu3.2.9.10.1

The command is quite self-explanatory. You use the equal sign to basically say what version you want to install. If you now for example want to install something like python-profiler for your Python 2.4 since this isn't available as a PyPi package. First, find out what version you have to install:

$ apt-cache madison python-profiler | grep karmic

From that list you'll get a bunch of versions. Chose the one from karmic-updates or karmic-security. Then install it:

# apt-get install python-profiler=2.6.4-0ubuntu1

Now, to avoid this causing a conflict and thus be removed the next time you do an upgrade you need to pin it. Create a file called /etc/apt/preferences and put the following into it:

Package: python-profiler
Pin: version 2.6.4-0ubuntu1
Pin-Priority: 999

And that concludes it. A word of warning from Jan:

"he slight problem is that with this setup, suppose a big security flaw was found in python-imaging and got patched in karmic that is still supported... you wouldn't get the package update. That is because it's pinned and while asterisks can be used in the version number, we don't know in advance what the version will match and what the Lucid version that we don't want will match"

"so you basically lose security upgrades for affected packages"

"minor annoyance when you have one or two packages on a laptop, but a big deal if you have a dozen packages on 100 VMs on server"

Having written about this helps me remember it myself for the next time I need it. Also, hopefully it will help other people who get bitten by this. Hopefully this will shame the Canonical guys into action so that the next time they don't haste their deprecation process and actually think about who's using their products. I bet a majority of Ubuntu's users care more about programming or something like that than they do about the ability to buy music on Ubuntu One or whatever it's called.

uwsgi is the latest and greatest WSGI server and promising to be the fastest possible way to run Nginx + Django. Proof here But! Is it that simple? Especially if you're involving Django herself.

So I set out to benchmark good old threaded fcgi and gunicorn and then with a source compiled nginx with the uwsgi module baked in I also benchmarked uwsgi. The first mistake I did was testing a Django view that was using sessions and other crap. I profiled the view to make sure it wouldn't be the bottleneck as it appeared to take only 0.02 seconds each. However, with fcgi, gunicorn and uwsgi I kept being stuck on about 50 requests per second. Why? 1/0.02 = 50.0!!! Clearly the slowness of the Django view was thee bottleneck (for the curious, what took all of 0.02 was the need to create new session keys and putting them into the database).

So I wrote a really dumb Django view with no sessions middleware enabled. Now we're getting some interesting numbers:

fcgi (threaded)              640 r/s
fcgi (prefork 4 processors)  240 r/s (*)
gunicorn (2 workers)         1100 r/s
gunicorn (5 workers)         1300 r/s
gunicorn (10 workers)        1200 r/s (?!?)
uwsgi (2 workers)            1800 r/s
uwsgi (5 workers)            2100 r/s
uwsgi (10 workers)           2300 r/s

(* this made my computer exceptionally sluggish as CPU when through the roof)

fcgi vs. gunicorn vs. uwsgi If you're wondering why the numbers appear to be rounded it's because I ran the benchmark multiple times and guesstimated an average (also obviously excluded the first run).

Misc notes

  • For gunicorn it didn't change the numbers if I used a TCP (e.g. 127.0.0.1:9000) or a UNIX socket (e.g. /tmp/wsgi.sock)
  • On the upstream directive in nginx it didn't impact the benchmark to set fail_timeout=0 or not.
  • fcgi on my laptop was unable to fork new processors automatically in this test so it stayed as 1 single process! Why?!!
  • when you get more than 2,000 requests/second the benchmark itself and the computer you run it on becomes wobbly. I managed to get 3,400 requests/second out of uwsgi but then the benchmark started failing requests.
  • These tests were done on an old 32bit dual core Thinkpad with 2Gb RAM :(
  • uwsgi was a bitch to configure. Most importantly, who the hell compiles source code these days when packages are so much much more convenient? (Fry-IT hosts around 100 web servers that need patching and love)
  • Why would anybody want to use sockets when they can cause permission problems? TCP is so much more straight forward.
  • changing the number of ulimits to 2048 did not improve my results on this computer
  • gunicorn is not available as a Debian package :(
  • Adding too many workers can actually damage your performance. See example of 10 workers on gunicorn.
  • I did not bother with mod_wsgi since I don't want to go near Apache and to be honest last time I tried I got really mysterious errors from mod_wsgi that I ran away screaming.

Conclusion

gunicorn is the winner in my eyes. It's easy to configure and get up and running and certainly fast enough and I don't have to worry about stray threads being created willy nilly like threaded fcgi. uwsgi definitely worth coming back to the day I need to squeeze few more requests per second but right now it just feels to inconvenient as I can't convince my sys admins to maintain compiled versions of nginx for the little extra benefit.

Having said that, the day uwsgi becomes available as a Debian package I'm all over it like a dog on an ass-flavored cookie.

And the "killer benefit" with gunicorn is that I can predict the memory usage. I found, on my laptop: 1 worker = 23Mb, 5 workers = 82Mb, 10 workers = 155Mb and these numbers stayed like that very predictably which means I can decide quite accurately how much RAM I should let Django (ab)use.

UPDATE:

Since this was publish we, in my company, have changed all Djangos to run over uWSGI. It's proven faster than any alternatives and extremely stable. We actually started using it before it was merged into core Nginx but considering how important this is and how many sites we have it's not been a problem to run our own Nginx package.

Hail uWSGI!

Voila! Now feel free to flame away about the inaccuracies and what multitude of more wheels and knobs I could/should twist to get even more juice out.