Automatically detecting the column types of an Excel sheet

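I have an Excel file with several sheets, each with several columns, so I would rather not specify the column types one by one but have them detected automatically. I want to read the sheets the way stringsAsFactors = FALSE would in read.csv, because that interprets the column types correctly. With my current method, a column containing "0.492 ± 0.6" is interpreted as numeric and returns NA, "because" the stringsAsFactors option is not available in read_excel. So I wrote the workaround below. It works more or less, but I cannot use it in real life because I am not allowed to create a new file. Note: I need the other columns to come back as numeric or integer, and the columns that contain only text as character, just as stringsAsFactors = FALSE does in my read.csv example.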

    library(readxl)

    file <- "myfile.xlsx"

    firstread <- read_excel(file, sheet = "mysheet", col_names = TRUE, na = "", skip = 0)
    # firstread has the problem of the column with "0.492 ± 0.6"
    # being interpreted as a number (returns NA)
    colna <- colnames(firstread)

    # read every column as character
    colnumt <- ncol(firstread)
    textcol <- rep("text", colnumt)
    secondreadchar <- read_excel(file, sheet = "mysheet", col_names = TRUE,
                                 col_types = textcol, na = "", skip = 0)
    # another column, with the number 0.532, is now 0.5319999999999999
    # and there are several other similar cases.

    # read again with stringsAsFactors
    # critical step: in real life, I "cannot" write a csv file.
    write.csv(secondreadchar, "allcharac.txt", row.names = FALSE)
    stringsasfactor <- read.csv("allcharac.txt", stringsAsFactors = FALSE)
    colnames(stringsasfactor) <- colna
    # the column with "0.492 ± 0.6" is now character, as desired,
    # and the others are numeric, as desired as well

Here is a script that imports all the data in your Excel file. It puts each sheet's data in a list called dfs:

    library(readxl)

    # Get all the sheets
    all_sheets <- excel_sheets("myfile.xlsx")

    # Loop through the sheet names and get the data in each sheet
    dfs <- lapply(all_sheets, function(x) {

      # Get the number of columns in the current sheet
      col_num <- NCOL(read_excel(path = "myfile.xlsx", sheet = x))

      # Get the dataframe with columns as text
      df <- read_excel(path = "myfile.xlsx", sheet = x,
                       col_types = rep('text', col_num))

      # Convert to data.frame
      df <- as.data.frame(df, stringsAsFactors = FALSE)

      # Get numeric fields by trying to convert them into
      # numeric values. If it returns NA then not a numeric field.
      # Otherwise numeric.
      cond <- apply(df, 2, function(col) {
        col <- col[!is.na(col)]
        all(suppressWarnings(!is.na(as.numeric(col))))
      })
      numeric_cols <- names(df)[cond]
      df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)

      # Return df in desired format
      df
    })

    # Just for convenience in order to remember
    # which sheet is associated with which dataframe
    names(dfs) <- all_sheets

The process works as follows:

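First, you get all the sheets in the file with excel_sheets, then loop through the sheet names to create one data frame per sheet. For each of these data frames, you initially import the data as text by setting the col_types parameter to text. Once you have the columns as text, you can convert the structure from a tibble to a data.frame. After that, you find the columns that are actually numeric and convert them to numeric values.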
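The conversion step can be tried without an Excel file. The sketch below builds a small all-character data frame by hand (the column names and values are made up for illustration) and applies the same test: a column is treated as numeric only if every non-missing entry parses with as.numeric. It also shows base R's type.convert, which read.csv relies on for its column types, as a possible way to get the read.csv behaviour without writing a temporary file. This assumes R 3.5.0 or later, where type.convert accepts a data frame.

```r
# Hypothetical stand-in for a sheet that was read with every column as text
df <- data.frame(
  value = c("0.532", "1.25", NA),
  range = c("0.492 ± 0.6", "0.1 ± 0.2", "0.3 ± 0.1"),
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# A column counts as numeric if every non-missing entry parses as a number
is_numeric_col <- vapply(df, function(col) {
  col <- col[!is.na(col)]
  !anyNA(suppressWarnings(as.numeric(col)))
}, logical(1))

# Convert only the columns that passed the test
converted <- df
converted[is_numeric_col] <- lapply(converted[is_numeric_col], as.numeric)
str(converted)

# Base R alternative: let type.convert() choose a type for each column;
# as.is = TRUE keeps text columns as character instead of factor
converted2 <- type.convert(df, as.is = TRUE)
str(converted2)
```

In both results the value column becomes numeric while range and label stay character, which is the outcome the question asks for.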

Edit:

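As of late April, a new version of readxl has been released, and the read_excel function gained two enhancements relevant to this question. First, you can have the function guess the column types for you by passing "guess" to the col_types parameter. The second enhancement (a corollary of the first) is the new guess_max parameter of read_excel, which lets you set the number of rows used to guess the column types. Essentially, what I wrote above can be shortened to: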

    library(readxl)

    # Get all the sheets
    all_sheets <- excel_sheets("myfile.xlsx")

    dfs <- lapply(all_sheets, function(sheetname) {
      suppressWarnings(read_excel(path = "myfile.xlsx",
                                  sheet = sheetname,
                                  col_types = 'guess',
                                  guess_max = Inf))
    })

    # Just for convenience in order to remember
    # which sheet is associated with which dataframe
    names(dfs) <- all_sheets

I recommend updating readxl to the latest version to shorten your script and avoid possible annoyances.

I hope this helps.
