Exporting a large amount of combined data from R without cutting and pasting rows one at a time

I have a dataset with more than 40,000 observations and over 250 variables, covering companies and various counts of meetings, attendees, delegates, and so on.

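Using R, I created a new dataset containing only the company name plus the four variables whose descriptive statistics interest me and that I want to export to Excel: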

Subset.MergedEx.SO <- mergedex1.SO[, c(10, 72, 73, 120, 121 )] 

The column numbers correspond to the following column names:

 mergedex1.SO <- c("sn", "earntot", "earnctot", "meeting65", "meeting55") 

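"sn" is the company name; the rest are variables corresponding to various measures of the meetings, such as duration, number of speakers, and so on.

After that, using these five variables instead of the original 250, I made subsets of the 40,000-observation dataset, one for each specific company.

The code is as follows: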

    BroomeStreet <- Subset.MergedEx.SO[which(Subset.MergedEx.SO$sn == 'Broome Street'), ]
    CompanyA     <- Subset.MergedEx.SO[which(Subset.MergedEx.SO$sn == 'Company A'), ]
    CompanyB     <- Subset.MergedEx.SO[which(Subset.MergedEx.SO$sn == 'Company B'), ]
    CompanyC     <- Subset.MergedEx.SO[which(Subset.MergedEx.SO$sn == 'Company C'), ]
    CompanyBC    <- Subset.MergedEx.SO[which(Subset.MergedEx.SO$sn == 'Company BC'), ]
    CompanyCC    <- Subset.MergedEx.SO[which(Subset.MergedEx.SO$sn == 'Company CC'), ]

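And so on for more than 45 companies. [Later I will create subsets by company name and date, from 1965 to 1987, which is why I am asking about the general problem rather than just this isolated instance; the particular companies involved do not matter.]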

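My task is to extract descriptive statistics for every variable after the "sn" column. I am looking for the mean, standard deviation, minimum, maximum, and number of observations for the variable "earntot"; the same five statistics for "earnctot"; and the same descriptive statistics for the variables "meet55" and "meet65".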

I was able to do this using the code below and a specific formula:

    EarntotCompanyA  <- CompanyA$earntot
    EarnctotCompanyA <- CompanyA$earnctot
    meet55CompanyA   <- CompanyA$meet55
    meet65CompanyA   <- CompanyA$meet65
    CompanyA_ALL_INFORMATION <- cbind(EarntotCompanyA, EarnctotCompanyA,
                                      meet55CompanyA, meet65CompanyA)

    library(psych)
    info <- describe(CompanyA_ALL_INFORMATION)
    n    <- info[, 2] # vector of total number
    mean <- info[, 3] # vector of mean
    sd   <- info[, 4] # vector of sd
    min  <- info[, 8] # vector of min
    max  <- info[, 9] # vector of max

    # this is ordered by the naming function below
    value <- round(c(mean, sd, min, max, n), 2)
    col.names <- naming(CompanyA_ALL_INFORMATION)
    descriptives <- t(as.data.frame(value))
    colnames(descriptives) <- col.names
    rownames(descriptives) <- "Company A"

    library(xlsx)
    write.xlsx(descriptives, "descriptives.CompanyA.xlsx")

After running this, I get a single row in Excel with the information I need.

I then repeat the same steps with a different company to get another separate file, such as "descriptives.CompanyB.xlsx", "descriptives.CompanyC.xlsx", and so on.

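I then cut and paste the row from each of the 50+ open Excel windows and combine them in another, separate Excel window that holds all the information I want.

An example of a single row looks like this: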

              average.number.of.EarntotCompanyA average.number.of.EarnctotCompanyA average.number.of.meet55CompanyA average.number.of.meet65CompanyA
    Company A                             16.58                              22.91                                1                             1.85
              standard.deviation.of.EarntotCompanyA standard.deviation.of.EarnctotCompanyA standard.deviation.of.meet55CompanyA standard.deviation.of.meet65CompanyA
    Company A                                 15.68                                  16.81                                 1.75                                 2.34
              min.number.of.EarntotCompanyA min.number.of.EarnctotCompanyA min.number.of.meet55CompanyA min.number.of.meet65CompanyA
    Company A                             0                              0                            0                            0
              max.number.of.EarntotCompanyA max.number.of.EarnctotCompanyA max.number.of.meet55CompanyA max.number.of.meet65CompanyA
    Company A                         84.11                          92.11                            5                            9
              total.number.of.EarntotCompanyA total.number.of.EarnctotCompanyA total.number.of.meet55CompanyA total.number.of.meet65CompanyA
    Company A                             176                              176                             69                            229

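How can I get all of the rows into a single file, without having to produce each row separately, cut it from its own Excel file, and paste it into a combined file? I currently have more than 50 Excel files open in the background; each one contains exactly the information I need, but only for one company at a time.

Here is a reproducible example of the data: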

 > dput((head(Subset.MergedEx.SO, 120))) structure(list(sn = structure(c(2L, 2L, 3L, 5L, 2L, 7L, 1L, 9L, 1L, 9L, NA, 9L, 1L, 26L, 11L, 9L, 7L, NA, NA, 7L, 17L, 9L, NA, 21L, 7L, 17L, 7L, 7L, 16L, 7L, 7L, 7L, 7L, 26L, 7L, 6L, 26L, 22L, NA, NA, 11L, 23L, 23L, 26L, NA, 7L, 23L, 1L, NA, 1L, 7L, 11L, 12L, 13L, 9L, NA, 15L, NA, 20L, 15L, NA, 17L, 5L, NA, 22L, 15L, NA, NA, 5L, 8L, 32L, 29L, 23L, 33L, 1L, 23L, 14L, 6L, 7L, 15L, 15L, 29L, NA, 21L, 6L, 35L, 32L, 32L, 7L, 31L, 23L, 23L, 1L, 29L, 34L, 34L, 34L, 17L, 24L, 24L, 24L, 24L, 7L, 16L, 7L, 23L, 23L, 34L, 29L, 15L, NA, 35L, 24L, 27L, 33L, 35L, 10L, 34L, 33L, 34L), .Label = c("Broome Street", "Company A", "Company B", "Company BC", "Company C", "Company CC", "Company D Clinton", "Company DD", "Company E", "Company ED BroadCompany", "Company G", "Company H BroadCompany", "Company I BroadCompany", "Company I Studio", "Company J", "Company K", "Company L", "Company M", "Company M BroadCompany", "Company M HS BroadCompany", "Company MCC BroadCompany", "Company N", "Company P", "Company Q", "Company Q Company N", "Company Q Company ZZ", "Company R - Company ZZ", "Company SLab", "Company Z", "Company ZE", "Company ZED", "Company ZEQ", "Company ZZ", "Company ZZQ", "Company ZZQ Company N"), class = "factor"), earntot = c(21.85, 20.8, NA, 8.16, NA, NA, NA, NA, NA, NA, NA, NA, NA, 7.16, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 43.32, NA, 30.48, NA, NA, 34.9, NA, NA, NA, NA, NA, 25.82, 40.75, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 30, NA, NA, NA, NA, NA, NA, 39.1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 52.29, 44.32, NA, 7, 38.32, 0, NA, NA, 8.25, NA, NA, NA, NA, NA, 51.12, 39.9, NA, 37.48, 32.74, NA, NA, NA, 33.4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 30.82, NA, NA, NA, NA, NA, 5.74, NA, NA, NA, NA, NA, NA, NA, NA, 44.48, NA), earnctot = c(29.43, 20.8, NA, 8.16, NA, NA, NA, NA, NA, NA, NA, NA, NA, 7.16, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 49.9, NA, 37.56, NA, NA, 41.98, NA, NA, NA, NA, NA, 37.32, 49, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 37, NA, NA, NA, NA, NA, NA, 47.68, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 57.29, 48.48, NA, 7, 45.9, 0, NA, NA, 15.75, NA, NA, NA, NA, NA, 54.12, 46.65, NA, 45.56, 39.9, NA, NA, NA, 39.98, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 38.4, NA, NA, NA, NA, NA, 12.9, NA, NA, NA, NA, NA, NA, NA, NA, 52.06, NA), meet55 = c(0L, 0L, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 5L, NA, 0L, NA, 5L, NA, NA, NA, 0L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 4L, 0L, NA, NA, NA, 4L, 4L, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, NA, 1L, NA, NA, NA, NA, 1L, NA, NA, 0L, 4L, 0L, NA, NA, 0L, NA, NA, NA, NA, NA, 4L, 3L, 5L, NA, NA, NA, 1L, NA, 0L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 5L, NA, NA, NA, NA, NA, 0L, NA, 0L, NA, NA, NA, NA, NA, NA, NA, NA), meet65 = c(0L, 0L, 5L, 0L, 6L, NA, 0L, 5L, NA, 5L, NA, 6L, NA, 0L, 5L, 2L, NA, NA, NA, 0L, 5L, 5L, NA, NA, NA, 0L, NA, 1L, 4L, 7L, 5L, 5L, 7L, 0L, 5L, NA, 0L, 1L, NA, NA, NA, 2L, 0L, 6L, NA, 8L, 2L, 0L, NA, 4L, 0L, 1L, 3L, NA, NA, NA, NA, NA, 4L, 0L, NA, 5L, 7L, NA, 0L, NA, NA, NA, 5L, 0L, 5L, 4L, 0L, 2L, 0L, 0L, 7L, 0L, NA, 5L, NA, 8L, NA, 0L, 1L, 7L, 0L, 4L, 7L, 0L, 3L, 0L, NA, NA, 7L, 5L, 8L, 5L, 5L, 6L, 5L, 6L, 5L, 2L, 0L, 8L, 7L, 7L, 5L, 0L, NA, 0L, 6L, NA, 8L, 8L, 5L, 7L, 7L, 6L)), .Names = c("sn", "earntot", "earnctot", "meet55", "meet65" ), row.names = c(NA, 120L), class = "data.frame") 

I suggest

    # install.packages("dplyr") # uncomment and run if you have to
    library(dplyr)
    Subset.MergedEx.SO %>%
      group_by(sn) %>%
      summarise_each(funs(n(),
                          mean(., na.rm = TRUE),
                          sd(., na.rm = TRUE),
                          min(., na.rm = TRUE),
                          max(., na.rm = TRUE))) %>%
      write.csv2(tf <<- tempfile(fileext = ".csv"))
    cat(tf) # open that file in excel
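If you would rather not install an extra package, the same one-row-per-company table can be sketched in base R. This is only an illustration under some assumptions: `toy` is a made-up stand-in for `Subset.MergedEx.SO` with the same column names, and `describe_one` is a hypothetical helper, not something from the question. Unlike `n()`, which counts every row in a group, the count here covers non-missing values only, which seems closer to the numbers in the example row.

```r
# Toy stand-in for Subset.MergedEx.SO (made-up values, same column names)
toy <- data.frame(
  sn       = factor(c("Company A", "Company A", "Company B", "Company B", "Company B", NA)),
  earntot  = c(21.85, 20.80, NA, 8.16, 7.16, 30),
  earnctot = c(29.43, 20.80, NA, 8.16, 7.16, 37),
  meet55   = c(0L, 0L, NA, 1L, 5L, 4L),
  meet65   = c(0L, 0L, 5L, 0L, 6L, 1L)
)
vars <- c("earntot", "earnctot", "meet55", "meet65")

# Hypothetical helper: five statistics for one vector, ignoring missing values
describe_one <- function(x) {
  x <- x[!is.na(x)]
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x), n = length(x))
}

# split() drops rows whose company name is NA
by_company <- split(toy[vars], toy$sn, drop = TRUE)

rows <- lapply(by_company, function(d) {
  stats <- sapply(d, describe_one)  # statistics in rows, variables in columns
  out <- as.vector(t(stats))        # all means first, then all sds, and so on
  names(out) <- paste(rep(rownames(stats), each = ncol(stats)),
                      colnames(stats), sep = ".")
  out
})

# Stack the per-company rows into a single table and write it once
descriptives <- round(do.call(rbind, rows), 2)
out_file <- tempfile(fileext = ".csv")
write.csv2(descriptives, out_file)
```

Each company becomes one row of `descriptives`, with the columns ordered by statistic (means, standard deviations, minima, maxima, counts) as in the single-row example in the question. A company with no non-missing values for a variable would produce `Inf`/`-Inf` and a warning from `min()`/`max()`, so real data may need an extra check.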

You may need to adjust write.csv2 (for example, use write.csv, or write.table with sep="\t") depending on your Excel / OS configuration.
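To make the difference concrete, here is a small sketch of the three variants writing the same table. The data frame and file names are made up for illustration; which file Excel opens cleanly depends on your regional settings.

```r
# Small illustrative table (made-up values)
descriptives <- data.frame(sn = c("Company A", "Company B"),
                           earntot_mean = c(16.58, 7.66))

csv2_file <- tempfile(fileext = ".csv")
csv_file  <- tempfile(fileext = ".csv")
tsv_file  <- tempfile(fileext = ".txt")

# Semicolon as field separator, comma as decimal mark
write.csv2(descriptives, csv2_file, row.names = FALSE)

# Comma as field separator, period as decimal mark
write.csv(descriptives, csv_file, row.names = FALSE)

# Tab as field separator
write.table(descriptives, tsv_file, sep = "\t", row.names = FALSE)

# Inspect the raw text to see which layout your Excel expects
readLines(csv2_file)
readLines(csv_file)
readLines(tsv_file)
```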