Thu, 08 Feb 2007

(Warning: Extreme nerdiness follows. You are not expected to understand this if you are a family member whose last name is not "Sambol". Feel free to move along.)

So instead of doing anything useful, I spent all of my free time today designing and implementing and measuring and optimizing and refining an algorithm for determining the prime factors of an integer. Here's the final result (written in Ruby):

def factor(n)
    def primes_up_to (n)
        r = (2..n).to_a
        primes = []
        while r.first**2 < n
            primes.push(r.first)
            r.reject! { |i| i % primes.last == 0 }
        end
        primes + r
    end
    factors = {}
    for p in primes_up_to((n**0.5).ceil)
        while n % p == 0
            factors[p] ? factors[p] += 1 : factors[p] = 1
            n /= p
        end
    end
    factors[n] = 1 if n != 1
    factors
end

I got the urge to do this after incidentally running across a largish number in the middle of a random sysadmin task, and idly wondering whether or not it was a power of 2. The actual factorization algorithm was fairly easy to come up with and not all that interesting: just test whether every eligible prime number divides the integer evenly and, if so, count how many times it does so. The interesting part is generating the list of prime numbers (which is the job of the "primes_up_to" subroutine).

I vaguely remembered that some ancient Greek guy had come up with a simple method for finding all prime numbers smaller than a given number, and I preferred to look this up rather than reinvent that particular wheel. Some googling revealed that the "sieve of Eratosthenes" was what I had in mind. It goes a little like this: start with the list of all the integers from 2 to whatever; note that the first member of this list is prime and remove it and all its multiples from the list; keep repeating that last step until the list is empty. The code for doing exactly that is actually a bit different from the function mentioned above:

def primes_up_to (n)
    r = (2..n).to_a
    primes = []
    while not r.empty?
        primes.push(r.first)
        r.reject! { |i| i % primes.last == 0 }
    end
    primes
end

The final version differs from this one primarily in the test for terminating the iteration. By tracing out the way that the list changed during each iteration of the sieve, I discovered that this initial implementation did quite a bit of unnecessary filtering. After a certain point relatively early on, there is nothing but prime numbers left in the list, so there's no point in wasting energy trying to weed out their multiples. A serendipitous consultation of my senior-level algorithms textbook unearthed the (unproved) assertion that, "Trial division by all integers up to B is guaranteed to factor completely any number up to B2." After ruminating on why this must be a true theorem [1], it quickly became obvious that a corollary is that an array of the integers from 2 to B2 will contain only primes if you eliminate multiples of the primes from 2 to B [2]. This meant that I could terminate the while loop in primes_up_to as soon as the first member in the list is larger than the square root of the largest integer in the list. At that point, all the numbers left in the list can be dumped into the list of primes that's been built up; no further testing needed.

My curiosity led me to another succinct and elegant implementation of Eratosthenes' sieve:

def primes_up_to (n)
    (2..(n**0.5).ceil).inject((2..n).to_a) { |r, i|
        r.select { |j| j == i || j % i != 0 }
    }
end

I'm charmed by the functional style exhibited here. I've yet to see any code that uses Ruby's inject properly that doesn't exude a fanciful cleverness. Unfortunately, measurements clearly showed that the running time for this version is much worse than that of my imperative version. Intractably so, as far as I can tell. It wastes time performing a filtering run for each of the integers up to square root of the maximum integer, regardless of whether they're actually prime.

Perhaps I'll more formally analyze the running time of these two implementations of Eratosthenes' sieve, but that will have to wait for another day.


[1] If B2 has a prime factor greater than B (let's call it A), then A can only form B2 by multiplying with a number smaller than B (let's call it C), else A*C would be too large. Since C is less than B, it's already been tested as a factor, which simultaneously reveals A as a factor (B2/C = A).

[2] If the list contained a composite number at this point (let's call it X), then that composite number would have only factors larger than B. Since the factors are larger than B, then the result of multiplying the factors (i.e. X) would have to be larger than B2. But the list doesn't contain numbers larger than B2, by definition, so X can't be in the list.

| last updated: 23:23 | show only this entry | printable version | category: /computing | 6 comments |
Wed, 03 May 2006

A while back, Becca and I swapped cell phones for a day just for the fun of seeing how they differed. Since it was quite plain that she loved one of the games that my phone had built in, I resolved to write an implementation of it for her that she could use without borrowing my phone. So here it is. I had a lot of fun brushing up on my JavaScript knowledge in the process.

Update 4:04 May 4, 2006: I tweaked the code to work on KHTML-based browsers like Safari and Konqueror.

| last updated: 14:13 | show only this entry | printable version | category: /computing | 0 comments |
Tue, 07 Feb 2006

Most of this past Sunday was spent finishing up a first workable stab at solving a small problem in Web development that I'd been thinking about for a long time. I don't ask for much: just a simple little template system that works well for a set of HTML documents which are mostly static, as opposed to a template system that's only really appropriate for writing highly dynamic Web applications. You know, just a little something that makes it easy to factor out the common bits of information that should be the same for all pages in a Web site so you can edit the common stuff in just one place and have the change propogate to all pages on the site.

This isn't a new problem by any stretch of the imagination, but all the solutions I'd ever seen (from HTML pre-processors to "include" directives in SSI or PHP) turned the pattern I wanted to use inside out. Instead of studding every single page in a big collection of HTML documents with commands to suck in a common header and footer, I wanted to keep my documents clean and free of their surrounding context as much as possible. The more generic container should include the specific document content, not the other way around.

Since I finally got around to treating myself to a serious education in Rails last week, I discovered that my prayers for a more beautiful alternative to the inclusion pattern were answered by its layout system. My only problem was that Rails is very heavily geared toward writing database-backed Web applications that follow the MVC pattern, and doing anything else with it goes against the many assumptions that are part and parcel of the framework. However, Ruby on Rails scores big points in my book by being very flexible about its assumptions. If you are patient enough and clever enough to figure out how, you can very cleanly override Rails' assumptions and I was thus able to scratch my Web development itch with a quite elegant solution.

Since I want to keep the bulk of the content for the Web sites I maintain in a directory-structured filesystem instead of a database, I was already violating Rails' most obvious assumption of database-centered storage. But it turns out that the more significant issue was that Rails assumes that each URL that it receives should correspond to a method in an instance of a controller class, while I wanted to retain the more boring, traditional custom of mapping from URLs into files in the filesystem tree. I certainly wasn't going to try to predict and enumerate the name of every file that I'd want to serve and hard-code a method for it in the URL-handling class for Rails. So I got around this at first with Rails' routing facility, specifically by using a PathComponent in a route that would direct everything to a single method in my document-serving controller class with the filename part of the URL passed handily as a parameter.

Very neat and clean: the only code I had to write was a very simple method that does a little preprocessing on the requested file name and calls a function to render the appropriate document with the right layout. But there's still a problem. When I actually deployed the site as a CGI program run by Apache HTTPD, it was horribly slow. The computer I use for my Web server might not have the most modern hardware in existence, but I still don't consider a 1.2 GHz CPU with 512 MB of RAM underpowered at all. And yet, the overhead for starting the Rails framework code is so large that it's completely impractical to wait for it for every single HTTP request. This is obviously why the Rails community is so big on FastCGI. So I tried using FastCGI. I really honestly tried. But neither mod_fastcgi nor mod_fcgid for Apache 1.x or Apache 2.x worked reliably for me. I spent enough time trying to debug the setup with completely insufficient error messages to get fed up with the affair.

It was easy for me to give up because I knew I had an alternative. I knew that I was only really using a small bit of what Rails provides, namely ActionPack, although that piece does happen to be one of the most significant ones. Since ActionPack can be used easily all by itself, I could use it to write a simple, fast, standalone CGI script that is nearly identical to the controller I wrote for use within the context of Rails. The main adaptation that had to be made was to compensate for the loss of the above-mentioned routing system. Without Rails' routing, I wasn't sure how to map all URL requests to the a single method within my controller class. I was back to working around the assumption that every different URL would be handled by a different method in the controller class. I knew there had to be a way to write a Ruby class that would respond to any method at all, without having to mention the name of the method in my code. I was goaded by the fact that I knew exactly how to do this simplistic sort of metaprogramming in Python, as I'd used this sort of trick in the past: just override the __getattr__ method, and the name of the attribute/method that the caller tried to invoke will be found as the first parameter. After doing enough looking around, I discovered that Ruby's equivalent is the "method_missing" method.

Voila! A CGI script that neatly renders any old HTML document that you've templated with embedded Ruby code, assuming that the name of the document is given in the "action" parameter passed either as a GET or POST variable. But we don't want URLs that look like "http://example.com/cgi-bin/document_controller.cgi?action=some/thing.html". We want "http://example.com/some/thing.html". So Apache's mod_rewrite comes to the rescue with the following .htaccess incantations:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /cgi-bin/document_controller.cgi?action=$1 [L]

And we're done. Documents that share a common layout without being riddled with superfluous repetition. Pretty URLs. All done with mere teeny snippets of code. I'm happy.

(Well, OK. The script certainly has room for improvement. Most significantly, different sets of documents on the same site should be able to use different layouts. Ideally, the decision of which layout to use would be based on which directory the document resides in. But that's something for another day.)

| last updated: 12:19 | show only this entry | printable version | category: /computing | 4 comments |
Tue, 20 Dec 2005

Hooray! After a night of letting it cool down, the old hard drive responded well to the repair procedure on the vendor's diagnostic/recovery CD. This means that I've been able to grab all the data I care about from the old disk. Since this is the second time that this drive has decided to fail within the past few months, I no longer trust it and I bought a replacement for it this morning. Since the failing drive is still under warranty I'm going to get the vendor to send me a replacement also. It will be nice to have two hard disks because I will then be able to take all the tedium out of making backups through the magic of RAID 1 (which does disk mirroring).

| last updated: 12:27 | show only this entry | printable version | category: /computing | 0 comments |
Mon, 19 Dec 2005

Bah! My hard drive seems to have died as of a few hours ago. Fortunately, I'm well-prepared, so the computer is back online again with a small replacement drive that I've got handy for just such an emergency. My last data backup was made a month ago, so the loss is stinging, but not staggering. You'll notice that the diary entries in the past 30 days or so have disappeared, and that is why. If anyone happens to have copies of those old diary entries saved somewhere, I'd be happy to accept them. If I'm lucky, I might be able to pry some data off of the old drive, but it's not to be counted on. This sort of thing is still very annoying, no matter how prepared you are.

| last updated: 20:26 | show only this entry | printable version | category: /computing | 2 comments |