Why do Excel and SciPy give different cumulative binomial p-values?

I have this table (NumSucc = number of successes, NumTrials = number of trials, Prob = probability of success):

```
Gene   NumSucc  NumTrials  Prob
Gene1  16       26         0.9548
Gene2  16       26         0.9548
Gene3  12       21         0.9548
Gene4  17       27         0.9548
Gene5  17       27         0.9548
Gene6  17       27         0.9548
Gene7  8        15         0.9548
Gene8  10       17         0.9548
```

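I want a cumulative binomial p-value for each row. When I put this exact table into Excel columns A to D and then enter the following function in column E (e.g., for row 2):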

```
=BINOMDIST(B2,C2,D2,1)
```

the output table looks like this:

```
Gene   NumSucc  NumTrials  Prob    Binomial
Gene1  16       26         0.9548  9.68009E-08
Gene2  16       26         0.9548  9.68009E-08
Gene3  12       21         0.9548  1.40794E-07
Gene4  17       27         0.9548  1.47463E-07
Gene5  17       27         0.9548  1.47463E-07
Gene6  17       27         0.9548  1.47463E-07
Gene7  8        15         0.9548  1.79741E-06
Gene8  10       17         0.9548  5.01334E-06
```

However, when I run this exact table through SciPy with the following code:

```python
import sys
from scipy.stats import binom

def WriteBinomial(InputFile, output):
    # `output` is accepted but unused: results are printed to stdout
    with open(InputFile) as fh:
        lines = fh.readlines()[1:]  # skip the header row
    for line in lines:
        fields = line.strip().split()
        GeneName, num_succ, num_trials, prob = fields[0], int(fields[1]), int(fields[2]), float(fields[3])
        # P(X <= num_succ - 1), i.e. P(X < num_succ)
        p_value = binom.cdf(num_succ - 1, num_trials, prob)
        print("\t".join([GeneName, str(num_succ), str(num_trials), str(prob), str(p_value)]))

WriteBinomial(sys.argv[1], sys.argv[2])
```

the output is:

```
GeneName  NumSucc  NumTrials  Prob    Binomial
Gene1     16       26         0.9548  6.59829603211e-09
Gene2     16       26         0.9548  6.59829603211e-09
Gene3     12       21         0.9548  7.92014917046e-09
Gene4     17       27         0.9548  1.06754559723e-08
Gene5     17       27         0.9548  1.06754559723e-08
Gene6     17       27         0.9548  1.06754559723e-08
Gene7     8        15         0.9548  8.41770305586e-08
Gene8     10       17         0.9548  2.93060582331e-07
```

Does anyone know why these two methods don't give the same results?

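Your Python code has `num_succ-1`, but your Excel formula doesn't have `B2-1`:

Python: `binom.cdf(num_succ-1, num_trials, prob)`
Excel: `=BINOMDIST(B2,C2,D2,1)`

Both `scipy.stats.binom.cdf(k, n, p)` and `BINOMDIST(k, n, p, TRUE)` return the cumulative probability P(X ≤ k). Subtracting 1 in Python therefore computes P(X ≤ num_succ - 1), the probability of strictly fewer than `num_succ` successes, which is smaller than what Excel reports.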
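Written out for X ~ Binomial(n, p), Excel's cumulative `BINOMDIST(x, n, p, 1)` is the first quantity below and `binom.cdf(x-1, n, p)` is the second; the two differ by exactly the point mass at x:

$$
P(X \le x) = \sum_{i=0}^{x} \binom{n}{i} p^{i} (1-p)^{n-i},
\qquad
P(X \le x-1) = P(X \le x) - \binom{n}{x} p^{x} (1-p)^{n-x}.
$$

With p = 0.9548 the probability mass is concentrated near n, so P(X = x) dominates the lower-tail sum, which is why the Excel values are roughly an order of magnitude larger than the SciPy ones.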

The code below should produce the same output as Excel:

```python
import sys
from scipy.stats import binom

def WriteBinomial(InputFile, output):
    # `output` is accepted but unused: results are printed to stdout
    with open(InputFile) as fh:
        lines = fh.readlines()[1:]  # skip the header row
    for line in lines:
        fields = line.strip().split()
        GeneName, num_succ, num_trials, prob = fields[0], int(fields[1]), int(fields[2]), float(fields[3])
        # P(X <= num_succ), the same quantity as Excel's BINOMDIST(B2,C2,D2,1)
        p_value = binom.cdf(num_succ, num_trials, prob)
        print("\t".join([GeneName, str(num_succ), str(num_trials), str(prob), str(p_value)]))

WriteBinomial(sys.argv[1], sys.argv[2])
```
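As a sanity check that doesn't depend on SciPy, here is a minimal sketch that sums the binomial probability mass directly with `math.comb` (Python 3.8+). `binom_pmf` and `binom_cdf` are hypothetical helper names for illustration, not SciPy functions. For Gene1 (16 successes out of 26, p = 0.9548), summing up to k = 16 should agree with Excel's 9.68009E-08, while summing up to k = 15 should agree with the original SciPy output of 6.59829603211e-09.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X <= k), the quantity Excel's BINOMDIST(k, n, p, TRUE)
    and scipy.stats.binom.cdf(k, n, p) both compute."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Gene1: 16 successes out of 26 trials, success probability 0.9548
x, n, p = 16, 26, 0.9548
at_most_x = binom_cdf(x, n, p)               # what the Excel formula computes
at_most_x_minus_1 = binom_cdf(x - 1, n, p)   # what binom.cdf(num_succ-1, ...) computed

print(f"P(X <= {x})  = {at_most_x:.6g}")
print(f"P(X <= {x - 1}) = {at_most_x_minus_1:.6g}")
# The two results differ by exactly the point mass at x
print(math.isclose(at_most_x - binom_pmf(x, n, p), at_most_x_minus_1))
```

Summing the mass function directly is fine for small n like these; for large n, SciPy's `binom.cdf` is the more robust choice.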