I need to generate n percentages (integers between 0 and 100) such that all n numbers add up to 100.
If I just call nextInt() n times, each time passing 100 minus the sum accumulated so far as the parameter, then my percentages are biased (e.g. the first generated number will usually be the largest). How do I do this in an unbiased way?
Solution
A couple of answers suggest picking random percents and taking the differences between them. As Nikita Ryback points out, this will not give the uniform distribution over all possibilities; in particular, zeroes will be less frequent than expected.
To fix this, think of starting with 100 'percents' and inserting dividers. I will show an example with 10 percents instead of 100:
% % % % % % % % % %
There are eleven places we could insert a divider: between any two percents or at the beginning or end. So insert one:
% % % % / % % % % % %
This represents choosing four and six. Now insert another divider. This time there are twelve places, because the divider already inserted creates an extra one. In particular, there are two ways to get
% % % % / / % % % % % %
either by inserting before or after the previous divider. You can continue the process until you have as many dividers as you need (one fewer than the number of percentages you are generating, i.e. n - 1).
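A minimal sketch of the insertion process exactly as drawn, assuming the state is kept as a literal list of '%' and '/' tokens. The class `DividerSketch` and method `insertDividers` are hypothetical names used only for illustration; the only library call relied on is `java.util.Random.nextInt(bound)`, which returns a value in [0, bound).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: a literal simulation of the '%' / '/' picture.
final class DividerSketch {
    // Start with `total` '%' tokens, then insert `parts - 1` '/' tokens,
    // each into a uniformly chosen gap (both ends included).
    static String insertDividers(int total, int parts, Random rnd) {
        List<Character> tokens = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            tokens.add('%');
        }
        for (int d = 0; d < parts - 1; d++) {
            // A list of m tokens has m + 1 gaps: one before each token, plus the end.
            int gap = rnd.nextInt(tokens.size() + 1);
            tokens.add(gap, '/');
        }
        StringBuilder sb = new StringBuilder();
        for (char c : tokens) {
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, `DividerSketch.insertDividers(10, 7, new Random())` returns a 16-character string with ten '%' and six '/', in the same notation as the diagrams (without the spaces).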
% % / % / % / / % % % / % % % /
This corresponds to 2,1,1,0,3,3,0.
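Reading the numbers off a diagram means counting the '%' tokens between consecutive dividers, with the stretches before the first and after the last divider counting as parts too (possibly empty). A small sketch; `DividerParser.partSizes` is a hypothetical name, and any character other than '%' or '/' (such as the spaces in the diagrams) is simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts a '%' / '/' diagram into the part sizes.
final class DividerParser {
    static List<Integer> partSizes(String diagram) {
        List<Integer> sizes = new ArrayList<>();
        int run = 0; // number of '%' seen since the last divider
        for (char c : diagram.toCharArray()) {
            if (c == '%') {
                run++;
            } else if (c == '/') {
                sizes.add(run); // a divider closes the current part
                run = 0;
            }
            // any other character is ignored
        }
        sizes.add(run); // the part after the last divider (possibly empty)
        return sizes;
    }
}
```

Applied to the diagram above, `DividerParser.partSizes("% % / % / % / / % % % / % % % /")` gives `[2, 1, 1, 0, 3, 3, 0]`.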
We can prove that this gives the uniform distribution. Write k for the number of parts (the n in the question). The number of compositions of 100 into k parts, where parts are allowed to be zero, is the binomial coefficient 100+k-1 choose k-1. That is
$$\binom{100+k-1}{k-1} = \frac{(100+k-1)(100+k-2)\cdots 101}{(k-1)(k-2)\cdots 2 \cdot 1}$$
Thus the probability of choosing any particular composition should be the reciprocal of this. As we insert dividers one at a time, first we choose from 101 positions, then 102, 103, and so on up to 100+k-1 for the last of the k-1 dividers. So the probability of any particular sequence of insertions is 1 / ((100+k-1)*...*101). How many insertion sequences give rise to the same composition? The final composition contains k-1 dividers. Label each divider with the step at which it was inserted: each of the (k-1)! ways of assigning those labels corresponds to exactly one insertion sequence, so there are (k-1)! sequences that give rise to a given composition. The probability of any particular composition is therefore (k-1)! / ((100+k-1)*...*101), which is exactly the reciprocal of the count above.
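The counting argument can be checked by brute force for small sizes. This sketch (hypothetical class `InsertionCount`) enumerates every insertion sequence and tallies the composition each one ends in. Instead of a token list it uses the observation that a part of size s owns s + 1 gaps, and inserting at offset a splits it into (a, s - a); summed over all parts this gives exactly the (sum of sizes) + (number of dividers) + 1 gaps of the diagram.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: exhaustive check of the (k-1)! counting argument.
final class InsertionCount {
    // Maps each composition of `total` into `parts` parts to the number of
    // insertion sequences that produce it.
    static Map<List<Integer>, Integer> countCompositions(int parts, int total) {
        Map<List<Integer>, Integer> counts = new HashMap<>();
        List<Integer> start = new ArrayList<>();
        start.add(total); // before any divider: a single part holding everything
        enumerate(start, parts - 1, counts);
        return counts;
    }

    private static void enumerate(List<Integer> sizes, int dividersLeft,
                                  Map<List<Integer>, Integer> counts) {
        if (dividersLeft == 0) {
            counts.merge(sizes, 1, Integer::sum); // one more sequence reached this composition
            return;
        }
        // Try every gap: part i of size s offers the s + 1 splits (a, s - a).
        for (int i = 0; i < sizes.size(); i++) {
            int s = sizes.get(i);
            for (int a = 0; a <= s; a++) {
                List<Integer> next = new ArrayList<>(sizes);
                next.set(i, a);
                next.add(i + 1, s - a);
                enumerate(next, dividersLeft - 1, counts);
            }
        }
    }
}
```

With `countCompositions(3, 4)` there are 5 * 6 = 30 insertion sequences, and the argument predicts C(6, 2) = 15 distinct compositions, each reached 2! = 2 times; the map should have 15 keys, all with value 2.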
In actual code, you probably wouldn't represent your steps like this. You should be able to just hold on to numbers, rather than sequences of percents and dividers. I haven't thought about the complexity of this algorithm.
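One way to hold on to numbers only, sketched under the assumption that a list of part sizes is an acceptable state: pick a gap uniformly among the total + d + 1 available (d = dividers so far), walk the parts to find which one owns that gap (part i owns sizes.get(i) + 1 of them), and split it. This is the same process as the diagrams, just without the tokens. `DividerInsertion.generate` is a hypothetical name, not an existing API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: divider insertion tracked as part sizes only.
final class DividerInsertion {
    // Returns `parts` non-negative integers summing to `total` (parts >= 1).
    static List<Integer> generate(int parts, int total, Random rnd) {
        List<Integer> sizes = new ArrayList<>();
        sizes.add(total); // no dividers yet: one part holding all the percents
        for (int d = 0; d < parts - 1; d++) {
            // With d dividers placed there are total + d + 1 gaps.
            int gap = rnd.nextInt(total + d + 1);
            int i = 0;
            // Skip whole parts until the gap falls inside part i's s + 1 gaps.
            while (gap > sizes.get(i)) {
                gap -= sizes.get(i) + 1;
                i++;
            }
            int s = sizes.get(i);
            sizes.set(i, gap);          // percents before the new divider
            sizes.add(i + 1, s - gap);  // percents after it
        }
        return sizes;
    }
}
```

Usage: `DividerInsertion.generate(7, 100, new Random())` returns seven percentages summing to 100. Each insertion scans the list and shifts elements, so this sketch costs O(n^2) for n parts, which is negligible for typical n.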