sayap's blog

while 1: yield None


Mythtv's xmltv grabber for Malaysia channels

written by sayap, on Dec 30, 2008 11:08:00 PM.

Ever since I got mythtv up and running months ago, I have always wanted to use its Electronic Program Guide (EPG) feature. Unfortunately, getting TV schedules in a format mythtv understands (i.e. XMLTV) is not so easy.

A bit of googling turned up 2 (non-)solutions. The first uses tvxb under wine to screenscrape TV schedules from Astro. Apparently it doesn't work anymore, as the tvxb site shows the following message:

All Astro satellite channels (No longer works - needs updating. 2008/10/12)

The other is a perl script written by Shahada Abubakar that also screenscrapes the Astro listings. Like the first, it has stopped working, owing to the flaky nature of screenscraping.

Of course, the googling and testing were just unnecessary foreplay. I was set on coming up with my own solution from the beginning anyway. With the help of wonderful python libraries such as BeautifulSoup and lxml, I wrote an XMLTV grabber that:

  • can screenscrape either Astro or The Star listings for channels rtm1, rtm2, tv3, ntv7, 8tv, and tv9

  • is functioning as of 2008-12-31

Here's the script: grabmy.py

To get it to work, install the requirements first:

easy_install BeautifulSoup lxml httplib2 python_dateutil

Then, run the script to generate an XMLTV file:

python grabmy.py -f my.xml
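The file is in XMLTV format: a `<tv>` root holding `<channel>` elements (each with an `id` and a `<display-name>`) and `<programme>` elements whose `start`/`stop` timestamps look like `20081231020000 +0800`. Below is a minimal sketch of emitting such a file with only the standard library (Python 3). It is not the code in grabmy.py; the helper `write_xmltv`, the channel id `tv3.my` and the programme title are made up for illustration, and the `+0800` offset assumes Malaysia time.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Assumption: all listings are in Malaysia Time (UTC+8).
MYT = timezone(timedelta(hours=8))

def xmltv_time(dt):
    # XMLTV timestamps: YYYYMMDDhhmmss followed by the UTC offset.
    return dt.strftime('%Y%m%d%H%M%S %z')

def write_xmltv(channels, programmes):
    """Hypothetical helper that builds an XMLTV <tv> tree.

    channels:   list of (channel_id, display_name)
    programmes: list of (channel_id, start, stop, title), timezone-aware datetimes
    """
    tv = ET.Element('tv')
    for channel_id, name in channels:
        channel = ET.SubElement(tv, 'channel', id=channel_id)
        ET.SubElement(channel, 'display-name').text = name
    for channel_id, start, stop, title in programmes:
        prog = ET.SubElement(tv, 'programme', start=xmltv_time(start),
                             stop=xmltv_time(stop), channel=channel_id)
        ET.SubElement(prog, 'title').text = title
    return ET.ElementTree(tv)

if __name__ == '__main__':
    start = datetime(2008, 12, 31, 2, 0, tzinfo=MYT)
    tree = write_xmltv(
        [('tv3.my', 'TV3')],
        [('tv3.my', start, start + timedelta(minutes=30), 'Example Show')])
    print(ET.tostring(tree.getroot(), encoding='unicode'))
```

To write an actual file instead of printing, `tree.write('my.xml', encoding='utf-8', xml_declaration=True)` should do.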

Feed mythbackend with the file:

mythfilldatabase --file 1 my.xml

And finally, here's the EPG in its full glory if you channel-flip at 2am:

[Screenshot: EPG]

I zip, I slice, and I zip again

written by sayap, on Sep 21, 2008 3:24:00 AM.

Yesterday, James Reeves posted some Haskell code on Slashdot and sort of challenged others to come up with an equivalent solution in other languages:

listToForest :: Eq a => [[a]] -> Forest a
listToForest = map toBranch . groupBy ((==) `on` head) . filter (/= [])
           where toBranch = Node . (head . head) <*> (listToForest . map tail)

According to James, "assuming you know Haskell pretty well, [the code]'s fairly clear as well". He may be right, but to anyone who doesn't know Haskell, it looks downright scary. Anyway, what the code does is convert grid-shaped data (a list of lists in Python, e.g. the rows of a query result):

A I A
A I G
B D B
B W H

into a hierarchical form:

A -> I -> A
       -> G
B -> D -> B
  -> W -> H

Sounds easy? I thought so. After several failed attempts in Python, I finally realized that the key to the Haskell code's conciseness was groupBy. Sure enough, Python has an equivalent in itertools.groupby. Nice. With that, here's a version in Python that is (hopefully) understandable by mere mortals:

from itertools import groupby

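def make_tree(data):
    # Group consecutive rows by their first column; each group becomes
    # [node, subtree], where the subtree is built from the rest of each
    # row (rows with nothing left are dropped).
    return [[node, make_tree([x[1:] for x in iterator if x[1:]])]
            for node, iterator in groupby(data, lambda x: x[0]) if node]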

>>> data = [['A', 'I', 'A'], ['A', 'I', 'G'], ['B', 'D', 'B'], ['B', 'W', 'H']]
>>> print make_tree(data)
[['A', [['I', [['A', []], ['G', []]]]]], ['B', [['D', [['B', []]]], ['W', [['H', []]]]]]]

With groupby, slicing the data is a piece of cake. For each group that groupby returns, we chop everyone's head off and pass the bodies as a list to the next recursive call. Simple, and it gets the job done.
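To see exactly what groupby hands to make_tree, here is a small sketch (Python 3 syntax; `show_groups` is just an illustrative helper) that lists each key with its group. One thing worth knowing: like Haskell's groupBy, itertools.groupby only groups *consecutive* elements with the same key, so rows sharing a first column but not adjacent to each other end up in separate groups. Sorting by the first column first (here with `operator.itemgetter(0)`) is one way around that, assuming the original row order doesn't matter.

```python
from itertools import groupby
from operator import itemgetter

data = [['A', 'I', 'A'], ['A', 'I', 'G'], ['B', 'D', 'B'], ['B', 'W', 'H']]

def show_groups(rows):
    # Return (key, rows_in_group) pairs exactly as groupby splits them,
    # keyed on the first column.
    return [(key, list(group)) for key, group in groupby(rows, itemgetter(0))]

for key, group in show_groups(data):
    print(key, group)
# A [['A', 'I', 'A'], ['A', 'I', 'G']]
# B [['B', 'D', 'B'], ['B', 'W', 'H']]

# Non-adjacent 'A' rows produce two separate 'A' groups...
unsorted = [['A', 'I', 'A'], ['B', 'D', 'B'], ['A', 'I', 'G']]
print([key for key, _ in show_groups(unsorted)])  # ['A', 'B', 'A']

# ...unless the rows are sorted by the grouping key first.
print([key for key, _ in show_groups(sorted(unsorted, key=itemgetter(0)))])  # ['A', 'B']
```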

Looking further into the itertools documentation, I found izip and islice. Despite the names, they are not made by Apple, though they are still cool. They let you zip and slice an iterator as if it were a list:

from itertools import groupby, islice, izip

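def make_tree(data):
    # izip(*iterator) transposes the group into columns, islice(..., 1, None)
    # drops the first column, and the outer izip(*...) transposes back.
    return [[node, make_tree(izip(*islice(izip(*iterator), 1, None)))]
            for node, iterator in groupby(data, lambda x: x[0]) if node]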

I am not sure if this iterator version performs better than the initial version, but I am pretty sure it is more dangerous, especially if the dataset is large, if you catch my drift.
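For anyone on Python 3, where izip is gone and the built-in zip is already lazy, both versions port over directly. The sketch below does that (the names `make_tree_list` and `make_tree_iter` are mine) and checks that they agree on the sample data. It also shows one concrete behavioural difference: zip stops at the shortest input, so when rows have unequal lengths the transpose trick silently drops values that the list version keeps. Note too that `zip(*group)` has to unpack the whole group into arguments, so each group is materialised in memory anyway.

```python
from itertools import groupby, islice

def make_tree_list(data):
    # List version: chop the head off each row, drop empty remainders.
    return [[node, make_tree_list([x[1:] for x in group if x[1:]])]
            for node, group in groupby(data, lambda x: x[0]) if node]

def make_tree_iter(data):
    # Iterator version: transpose with zip(*...), drop the first column
    # with islice, then transpose back with another zip(*...).
    return [[node, make_tree_iter(zip(*islice(zip(*group), 1, None)))]
            for node, group in groupby(data, lambda x: x[0]) if node]

data = [['A', 'I', 'A'], ['A', 'I', 'G'], ['B', 'D', 'B'], ['B', 'W', 'H']]
print(make_tree_list(data) == make_tree_iter(data))  # True

# Rows of unequal length: zip truncates to the shortest row.
ragged = [['A', 'I', 'A'], ['A', 'I']]
print(make_tree_list(ragged))  # [['A', [['I', [['A', []]]]]]]
print(make_tree_iter(ragged))  # [['A', [['I', []]]]]
```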

Anyway, what James ultimately wanted was to transform the tree into XML. Here's how to do that with my pseudo-tree:

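indent = 4  # spaces per nesting level

def write_tag(tree, level=0):
    # Print each node as an XML tag: nodes with children get an opening and a
    # closing tag around their (further indented) children, leaves self-close.
    for node, children in tree:
        if children:
            print '%s<%s>' % (indent * level * ' ', node)
            write_tag(children, level+1)
            print '%s</%s>' % (indent * level * ' ', node)
        else:
            print '%s<%s/>' % (indent * level * ' ', node)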

>>> tree = make_tree(data)
>>> write_tag(tree)
<A>
    <I>
        <A/>
        <G/>
    </I>
</A>
<B>
    <D>
        <B/>
    </D>
    <W>
        <H/>
    </W>
</B>
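Printing tags by hand works fine for this demo, but it does no escaping and produces several top-level elements, which is not a well-formed XML document on its own. Here is a sketch of the same conversion using the standard library's xml.etree.ElementTree instead, assuming Python 3 and a made-up wrapper tag `root` to hold the forest (`add_children` and `tree_to_xml` are illustrative names).

```python
import xml.etree.ElementTree as ET
from itertools import groupby

def make_tree(data):
    # Same list-based make_tree as above, in Python 3 syntax.
    return [[node, make_tree([x[1:] for x in group if x[1:]])]
            for node, group in groupby(data, lambda x: x[0]) if node]

def add_children(parent, tree):
    # Attach each [node, children] pair as a sub-element, recursively.
    for node, children in tree:
        add_children(ET.SubElement(parent, node), children)

def tree_to_xml(tree, root_tag='root'):
    # 'root' is a hypothetical wrapper: XML needs a single root element.
    root = ET.Element(root_tag)
    add_children(root, tree)
    return ET.tostring(root, encoding='unicode')

data = [['A', 'I', 'A'], ['A', 'I', 'G'], ['B', 'D', 'B'], ['B', 'W', 'H']]
print(tree_to_xml(make_tree(data)))
```

This prints everything on one line; on Python 3.9 and later, calling `ET.indent(root)` before serialising should give indented output similar to write_tag's.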

It's late and I need to sleep. To recap, the Haskell solution took 3 lines, and the Python solution took 3 lines. It's a draw. Peace. Let's all point to Java and laugh.