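How to write R to loop over every sheet of every file in a directory

I have a script here that works well for grabbing a number from a certain column. Now I want to collect it not just from the first sheet of every file in the directory, but from every sheet of every file.

Right now the .csv file that R writes has 2 columns: column A is the file name and column B is the number R grabbed.

What should I add to the script below to get a CSV output with 3 columns, where A is the file name, B is the sheet name, and C is the number?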

    require(xlsx)

    # setwd
    setwd("D:\\Transferred Files\\")

    files <- (Sys.glob("*.xls"))
    f <- length(files)

    DF <- data.frame(txt=rep("", f), num=rep(NA, f), stringsAsFactors=FALSE)

    # files loop
    for (i in 1:f) {
      A <- read.xlsx(file=files[i], 1, startColumn=1, endColumn=20, startRow=1, endRow=60)

      # Find price
      B <- as.data.frame.matrix(A)
      P <- B[which(apply(B, 1, function(x) any(grepl("P", x)))),
             which(apply(B, 2, function(x) any(grepl("P", x)))) + 6]

      # fill price DF
      DF[i, ] <- c(files[i], P)
    }

    write.csv(DF, "prices.csv", row.names=FALSE)

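I tried XLConnect, but couldn't really get it to work for this.

You're off to a good start, but what you are asking is how to add a loop over the worksheets in a file. If you read ?read.xlsx, you'll see two arguments that you glossed over in your code (well, you used one and ignored the other):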

    Usage:

         read.xlsx(file, sheetIndex, sheetName=NULL, rowIndex=NULL,
           startRow=NULL, endRow=NULL, colIndex=NULL,
           as.data.frame=TRUE, header=TRUE, colClasses=NA,
           keepFormulas=FALSE, encoding="unknown", ...)

    Arguments:

        file: the path to the file to read.

        sheetIndex: a number representing the sheet index in the workbook.

        sheetName: a character string with the sheet name.

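You only need to supply one of them.

You might ask: "How do I know how many sheets there are in a workbook?" (for sheetIndex) or even "What are the sheet names?" (for sheetName). ?getSheets to the rescue: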

    Usage:

         getSheets(wb)

    Arguments:

        wb: a workbook object as returned by 'createWorksheet' or
            'loadWorksheet'.

    Value:

         'getSheets' returns a list of java object references each
         pointing to an worksheet. The list is named with the sheet names.

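You'll need to use loadWorkbook(file) rather than read.xlsx in order to get the sheet names, but a little reading of the manual will give you what you need to make the switch. (You could use something like getSheets(loadWorkbook(file)), but in my experience I try to avoid opening the same file multiple times in the same script, regardless of automatic closing.)

As an alternative, Hadley's readxl package shows promise in its simplicity, speed, and stability. It has excel_sheets() and read_excel(), which should cover what you need. (In fact, that's it … simple is "A Good Thing (tm)".)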
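As a rough sketch of how those two readxl functions could be combined, the helper below walks every workbook in a directory and every sheet in each workbook. The name `list_sheets`, the `rows` column (a stand-in for whatever number you actually want to extract), and the file pattern are assumptions for illustration, not part of the original script. It also assumes the readxl package is installed.

```r
# Hypothetical helper: one row per (file, sheet) pair found in `dir`.
# Functions are called as readxl::... so the package is only needed once
# a matching file is actually found.
list_sheets <- function(dir = ".", pattern = "\\.xlsx?$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)

  per_file <- lapply(files, function(fn) {
    sheets <- readxl::excel_sheets(fn)           # sheet names of this workbook
    do.call(rbind, lapply(sheets, function(sh) {
      dat <- readxl::read_excel(fn, sheet = sh)  # read a single sheet by name
      data.frame(file = basename(fn),
                 sheet = sh,
                 rows = nrow(dat),               # placeholder for the real value
                 stringsAsFactors = FALSE)
    }))
  })

  out <- do.call(rbind, per_file)
  if (is.null(out)) {
    # No matching files: return an empty frame with the same columns
    out <- data.frame(file = character(0), sheet = character(0),
                      rows = integer(0), stringsAsFactors = FALSE)
  }
  out
}

# Possible usage (path taken from the question):
# write.csv(list_sheets("D:/Transferred Files"), "sheets.csv", row.names = FALSE)
```

Replacing the `rows` column with the extracted number would give the three-column layout (file name, sheet name, number) asked about in the question.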

Edit:

    library(XLConnect)
    ## Loading required package: XLConnectJars
    ## XLConnect 0.2-11 by Mirai Solutions GmbH [aut],
    ##   Martin Studer [cre],
    ##   The Apache Software Foundation [ctb, cph] (Apache POI, Apache Commons
    ##     Codec),
    ##   Stephen Colebourne [ctb, cph] (Joda-Time Java library)
    ## http://www.mirai-solutions.com ,
    ## http://miraisolutions.wordpress.com
    ## Attaching package: 'XLConnect'
    ## The following objects are masked from 'package:xlsx':
    ##     createFreezePane, createSheet, createSplitPane, getCellStyle,
    ##     getSheets, loadWorkbook, removeSheet, saveWorkbook, setCellStyle,
    ##     setColumnWidth, setRowHeight

    wb1 <- loadWorkbook('Book1.xlsx')
    shts1 <- getSheets(wb1)
    shts1
    ## [1] "Orig"   "Sheet2" "Sheet8" "Sheet3" "Sheet4" "Sheet5" "Sheet6" "Sheet7"

    for (ws in shts1) {
      message(ws)                               # just announcing myself
      dat <- readWorksheet(wb1, ws)
      message(paste(dim(dat), collapse=' x '))  # do something meaningful, not this
    }
    ## Orig
    ## 128 x 11
    ## Sheet2
    ## 128 x 11
    ## Sheet8
    ## 128 x 19
    ## Sheet3
    ## 17 x 11
    ## Sheet4
    ## 128 x 11
    ## Sheet5
    ## 128 x 11
    ## Sheet6
    ## 128 x 11
    ## Sheet7
    ## 128 x 11

Edit #2:

As a more detailed example of iterating:

    library(XLConnect)

    # 'pattern' is a regular expression, so anchor on the extension
    for (fn in list.files(pattern="\\.xlsx$")) {
      message('Opening: ', fn)
      wb <- loadWorkbook(fn)
      shts <- getSheets(wb)
      message(sprintf(' %d Sheets: %s', length(shts), paste(shts, collapse=', ')))
      for (sh in shts) {
        dat <- readWorksheet(wb, sh)
        ## do something meaningful with the data
      }
    }

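I'm not sure what you are doing in your code (since you never said what any of the spreadsheets contain), but another approach (which I would use in place of the previous double-for example) is to enclose everything in lists: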

    # simplify=FALSE keeps the result a nested list even when every workbook
    # (or every sheet) happens to have the same shape
    dat <- sapply(list.files(pattern='\\.xlsx$'), function(fn) {
      wb <- loadWorkbook(fn)
      sapply(getSheets(wb), function(sh) readWorksheet(wb, sh), simplify=FALSE)
    }, simplify=FALSE)

    str(dat, list.len=2)
    ## List of 4
    ##  $ Book1.xlsx:List of 8
    ##   ..$ Orig  :'data.frame': 128 obs. of 11 variables:
    ##   .. ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   .. ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   .. .. [list output truncated]
    ##   ..$ Sheet2:'data.frame': 128 obs. of 11 variables:
    ##   .. ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   .. ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   .. .. [list output truncated]
    ##   .. [list output truncated]
    ##  $ Book2.xlsx:List of 8
    ##   ..$ Orig  :'data.frame': 128 obs. of 11 variables:
    ##   .. ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   .. ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   .. .. [list output truncated]
    ##   ..$ Sheet2:'data.frame': 128 obs. of 11 variables:
    ##   .. ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   .. ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   .. .. [list output truncated]
    ##   .. [list output truncated]
    ##   [list output truncated]

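If you don't care about distinguishing which workbook a particular sheet came from, and want to simplify the subsequent handling of the data, you can "flatten" the nested list into a single list: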

    flatdat <- unlist(dat, recursive=FALSE)

    str(flatdat, list.len=3)
    ## List of 555
    ##  $ Book1.xlsx.Orig  :'data.frame': 128 obs. of 11 variables:
    ##   ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   ..$ c1 : num [1:128] 1 1 1 1 1 1 1 1 1 1 ...
    ##   .. [list output truncated]
    ##  $ Book1.xlsx.Sheet2:'data.frame': 128 obs. of 11 variables:
    ##   ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   ..$ c1 : num [1:128] 1 1 1 1 1 1 1 1 1 1 ...
    ##   .. [list output truncated]
    ##  $ Book1.xlsx.Sheet8:'data.frame': 128 obs. of 19 variables:
    ##   ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   ..$ c1 : num [1:128] 1 1 1 1 1 1 1 1 1 1 ...
    ##   .. [list output truncated]
    ##   [list output truncated]

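Now, working with your data may be simpler. Your code for finding "P" is a little flawed, since you are assigning a data.frame to a cell inside another data.frame, which is generally frowned upon.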
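One way to avoid storing a data.frame inside a cell is to reduce each sheet to a single value first. The sketch below is only a guess at the intent of the original script (find a cell matching "P", then take the value some columns to its right). The helper name `find_right_of` and the tiny `sheet` data are made up for illustration, since the question never shows a real worksheet.

```r
# Hypothetical helper: return the value `offset` columns to the right of the
# first cell matching `pattern`. "First" means first in column-major order
# (top to bottom within a column, columns left to right).
# Returns NA if nothing matches or the target column does not exist.
find_right_of <- function(df, pattern = "P", offset = 6) {
  if (nrow(df) == 0 || ncol(df) == 0) return(NA)

  # Logical matrix with the same shape as df: TRUE where the cell matches
  hit <- matrix(
    unlist(lapply(df, function(col) grepl(pattern, as.character(col)))),
    nrow = nrow(df)
  )

  idx <- which(hit, arr.ind = TRUE)
  if (nrow(idx) == 0) return(NA)

  target_row <- unname(idx[1, "row"])
  target_col <- unname(idx[1, "col"]) + offset
  if (target_col > ncol(df)) return(NA)

  df[[target_col]][target_row]   # a single value, not a data.frame
}

# Made-up sheet: the label is in column 1, the number two columns over
sheet <- data.frame(a = c("x", "Price"), b = c(NA, NA), c = c(3, 4),
                    stringsAsFactors = FALSE)
num <- find_right_of(sheet, pattern = "P", offset = 2)
```

Because the result is a single value (or NA), it can be stored directly in a results column next to the file name and sheet name.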

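That may lead you to another question. For that, I strongly suggest you post a better, more detailed question, including what a sample worksheet looks like and what you expect the output to look like.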
