I’ve got to admit: Python is pretty cool when it comes to quickly writing powerful scripts. I wanted to extract the number of all-time downloads from DrJava’s SourceForge statistics page, but the number wasn’t on the same line as the word “Total”, so a simple sed one-liner wasn’t enough.
Of course I could have written it in Java, but that would have involved compiling it and having class files in addition to the Java file. I probably could have written it as a bash script, but to be honest, bash is pretty clunky. Python did the job easily and well.
import urllib
import re

# Fetch the all-time download statistics page for DrJava.
filehandle = urllib.urlopen('http://sourceforge.net/project/stats/detail.php?group_id=44253&ugn=drjava&type=prdownload&mode=alltime&file_id=0')
found = False
for lines in filehandle.readlines():
    if found:
        # This is the line right after "Total": strip the HTML tags and print it.
        text = lines.strip()
        p = re.compile(r'<.*?>')
        text = p.sub('', text)
        # Uncomment to also remove the thousands separators:
        # p = re.compile(r',')
        # text = p.sub('', text)
        print text
        break
    if lines.find('Total') != -1:
        # The number is on the next line, so remember that we saw the marker.
        found = True
filehandle.close()
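The same idea can be sketched in Python 3, where `urllib.urlopen` has moved to `urllib.request.urlopen` and `print` is a function. This is only a sketch: the names `total_after_marker` and `fetch_total` and the sample HTML lines are made up for illustration, and the real statistics page may be laid out differently.

```python
import re
from urllib.request import urlopen

# Non-greedy match for anything that looks like an HTML tag.
TAG = re.compile(r'<.*?>')


def total_after_marker(lines, marker='Total'):
    """Return the tag-stripped text of the line that follows the first
    line containing `marker`, or None if there is no such line."""
    found = False
    for line in lines:
        if found:
            return TAG.sub('', line).strip()
        if marker in line:
            found = True
    return None


def fetch_total(url):
    """Download `url` and apply total_after_marker to its lines."""
    with urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')
    return total_after_marker(html.splitlines())


# Hypothetical sample input, not the actual SourceForge markup.
sample = [
    '<tr><td>Total</td>',
    '<td align="right">995,123</td></tr>',
]
print(total_after_marker(sample))
```

Keeping the parsing in its own function means it can be tried on a few sample lines without touching the network; `fetch_total` is only needed for the live page.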
There are probably better, more elegant ways of doing this, but Python is one of those languages that I use but never learned, just like Perl or PHP. Maybe it’s a “P” thing. No, it isn’t: I actually learned Pascal at my Gymnasium (German secondary school) and at university in Germany.
Anyway, using this script I have now integrated a download counter on the DrJava website that is updated every night at midnight. We’re seriously getting close to a million, faster than I expected. This is probably because of the new DrJava beta version we released.
With fewer than 5,000 downloads to go, we might already hit the million in early May!