Weighty Choices
Last week at work I was tasked with building a dataset of 2 million+ records for some load tests we’re running on our application. I had a day’s worth of production data that I needed to extrapolate to six months. Another opportunity to use my favorite Python module: random.
What I wanted was something like random.choice, but one to which I could pass a dictionary whose keys are the choices and whose values are their relative weights, so that I could get a representative distribution. Since the random module doesn’t offer one itself, I was left to my own devices, which are admittedly clumsy and slow. The situation begged for a stupid lambda trick, but I was pressed for time, so I just threw this together:
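import random

def weighted_choice(choice_dict):
    # Pick a point somewhere along the total weight, then walk the keys,
    # subtracting each weight until the point falls inside one of them.
    wsum = sum(choice_dict.values())
    n = random.uniform(0, wsum)
    for k, weight in choice_dict.items():
        if n < weight:
            break
        n -= weight
    return k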
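The function above re-sums the weights and walks the dictionary on every call, which adds up when generating millions of records. A minimal sketch of one common alternative, assuming Python 3.2+ (for itertools.accumulate): compute the running totals once, then find each draw with a binary search from the bisect module. The helper name make_weighted_sampler is hypothetical, chosen here for illustration.

```python
import bisect
import itertools
import random


def make_weighted_sampler(choice_dict):
    """Return a zero-argument function that draws keys of choice_dict
    with probability proportional to their non-negative weights.

    Hypothetical helper: the running totals are computed once, so each
    draw costs one random number plus an O(log n) binary search.
    """
    keys = list(choice_dict)
    # Running totals, e.g. weights [5, 3, 2] -> cum_weights [5, 8, 10].
    cum_weights = list(itertools.accumulate(choice_dict[k] for k in keys))
    total = cum_weights[-1]
    last = len(keys) - 1

    def draw():
        # random.random() is in [0.0, 1.0), so n is normally in [0, total).
        n = random.random() * total
        # bisect_right finds the first running total strictly greater than n;
        # the hi=last bound guards against a floating-point edge case at n == total.
        return keys[bisect.bisect_right(cum_weights, n, 0, last)]

    return draw


# Build the sampler once, then call it for every generated record.
sampler = make_weighted_sampler({"red": 5, "green": 3, "blue": 2})
records = [sampler() for _ in range(10)]
```

Using bisect_right rather than bisect_left means keys with zero weight are never returned, since a draw equal to a running total moves on to the next key.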
Afterward, I passed the lambda challenge off to a couple of my colleagues. Later in the day, I had a chance to rattle off my own:
from functools import reduce  # a builtin in Python 2; must be imported in Python 3
weighted_choice = lambda d: random.choice(reduce(list.__add__, [[k for n in range(d[k])] for k in d.keys()]))
But the Letterman spot goes to my colleague Leonard, for the conciseness and nice symmetry of his solution:
weighted_choice = lambda d: random.choice([k_ for s in [[k]*w for k,w in d.items()] for k_ in s])
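Both lambdas expand the dictionary into a list with one entry per unit of weight, so they only accept integer weights and build a list of sum(weights) items on every call. For readers on Python 3.6 or later (well after this was written), the standard library’s random.choices takes a weights argument directly. A minimal sketch, where the wrapper name weighted_choices is hypothetical:

```python
import random


def weighted_choices(choice_dict, k=1):
    """Draw k keys from choice_dict, with replacement, weighting each key
    by its value. Hypothetical wrapper around random.choices (Python 3.6+),
    which accepts float as well as integer weights."""
    keys = list(choice_dict)
    return random.choices(keys, weights=[choice_dict[key] for key in keys], k=k)


# A single call can produce a whole batch of records.
sample = weighted_choices({"red": 0.5, "green": 0.3, "blue": 0.2}, k=10)
```

Passing k lets one call generate a batch, which suits bulk dataset generation better than calling a single-draw function in a loop.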
Find a script testing these and a couple of other variations on the function here:
http://klenwell.googlecode.com/svn/trunk/pastebin/weighted_choice.py