I zip, I slice, and I zip again
Yesterday, James Reeves posted some Haskell code on slashdot and sort of challenged others to come up with equivalent solution in other languages:
listToForest :: Eq a => [[a]] -> Forest a listToForest = map toBranch . groupBy ((==) `on` head) . filter (/= []) where toBranch = Node . (head . head) <*> (listToForest . map tail)
According to James, "assuming you know Haskell pretty well, [the code]'s fairly clear as well". He may be right, but for anyone who doesn't know Haskell, it looks downright scary. Anyway, what the code does is to convert some grid form of data (a list of lists in Python, e.g. rows of query result):
A I A
A I G
B D B
B W H
into some hierarchical form:
A -> I -> A
-> G
B -> D -> B
-> W -> H
Sounds easy? I thought so. After several failed attempts with Python, I finally realized the key for the Haskell code to be so concise was groupBy. And sure enough, Python got the equivalent in itertools.groupby. Nice. With that, here's a version in Python that is (hopefully) understandable by a mere mortal:
from itertools import groupby def make_tree(data): return [[node, make_tree([x[1:] for x in iterator if x[1:]])] for node, iterator in groupby(data, lambda x: x[0]) if node] >>> data = [['A', 'I', 'A'], ['A', 'I', 'G'], ['B', 'D', 'B'], ['B', 'W', 'H']] >>> print make_tree(data) [['A', [['I', [['A', []], ['G', []]]]]], ['B', [['D', [['B', []]]], ['W', [['H' , []]]]]]]
With groupby, slicing the data is a piece of cake. We iterate through the iterator returned by groupby, chops everyone's head off, and passes the bodies as a list to the next recursive call. Simple, and get the job done.
Looking further into the documentation for itertools, I found izip and islice. Despite the naming, they are not made by Apple, though they are still cool. They basically allow you to zip and slice an iterator as if it's a list:
from itertools import groupby, islice, izip def make_tree(data): return [[node, make_tree(izip(*islice(izip(*iterator), 1, None)))] for node, iterator in groupby(data, lambda x: x[0]) if node]
I am not sure if this iterator version performs better than the initial version, but I am pretty sure it is more dangerous, especially if the dataset is large, if you catch my drift.
Anyway, what James wanted to achieve is to then transform the tree into xml. Here's how to do so with my pseudo-tree:
indent = 4 def write_tag(tree, level=0): for node, children in tree: if children: print '%s<%s>' % (indent * level * ' ', node) write_tag(children, level+1) print '%s</%s>' % (indent * level * ' ', node) else: print '%s<%s/>' % (indent * level * ' ', node) >>> tree = make_tree(data) >>> write_tag(tree) <A> <I> <A/> <G/> </I> </A> <B> <D> <B/> </D> <W> <H/> </W> </B>
It's late and I need to sleep. To recap, the Haskell solution took 3 lines, and the Python solution took 3 lines. It's a draw. Peace. Let's all point to Java and laugh.