Values shown as #### in Excel become NA when setting column class in an R data frame

I'm working with a CSV file that was originally formatted in Excel. I want to convert the Rate column to numeric and remove the "$" sign.

I read the file in with:

    NImp <- read.csv("National_TV_Spots 6_30_14 to 8_31_14.csv", sep=",", header=TRUE,
                     stringsAsFactors=FALSE, strip.white=TRUE,
                     na.strings=c("Not Monitored"))

The data frame looks like this:

      HH.IMP..000.       ISCI                                          Creative          Program  Rate
    1           NA     IT3896 Rising Costs30 (Opportunity Scholar - No Nursing)      NUVO CINEMA $0.00
    2           NA     IT3896 Rising Costs30 (Opportunity Scholar - No Nursing)      NUVO CINEMA $0.00
    3          141    IT14429 Rising Costs30 (Opportunity Scholar - No Nursing)            BONUS $0.00
    4          476 ITES15443H     Matthew Traina (B. EECT/A. CEET) :60 (no loc) Law & Order: SVU $0.00
    5           NA     IT3896 Rising Costs30 (Opportunity Scholar - No Nursing)      NUVO CINEMA $0.00

When I do the conversion, I get a warning message:

    NImp$Rate <- as.numeric(gsub("$","", NImp$Rate))
    Warning message:
    NAs introduced by coercion

and all of the values are coerced to NA.
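A likely reason every value became NA in this first attempt: in R's regular expressions an unescaped "$" is the end-of-string anchor, not a literal dollar sign, so gsub("$", "", ...) removes nothing. The sketch below uses two made-up strings (not taken from the file) to compare the unescaped pattern with an escaped one and with fixed = TRUE.

```r
x <- c("$0.00", "$1,200.00")  # hypothetical sample values

# Unescaped "$" matches the empty position at the end of each string,
# so replacing it with "" leaves every string unchanged
unchanged <- gsub("$", "", x)

# Escaping the dollar sign, or using fixed = TRUE, matches the literal character
escaped <- gsub("\\$", "", x)
literal <- gsub("$", "", x, fixed = TRUE)

print(unchanged)  # "$0.00" "$1,200.00"
print(escaped)    # "0.00" "1,200.00"
```

Note that even the escaped version still leaves the comma in "1,200.00", which matches the behavior described next.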

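I also tried NImp$Rate <- as.numeric(sub("\\$","", NImp$Rate)), but got the same warning message. This time, however, not all of the values became NA, only certain ones. I opened the CSV in Excel to check and realized that Excel had made the column too narrow, so some cells were displayed as "####". Those are the cells that R coerced to NA.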

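I tried opening the file in Notepad and reading the Notepad file into R, but I got the same result. All of the values display correctly in Notepad and when I read the file into R, but when I convert to numeric, everything that showed as "####" in Excel becomes NA.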
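One way to check what R actually receives, independent of how Excel displays it, is to read a small CSV snippet with read.csv(text = ...). The two rows below are hypothetical, chosen only to mimic the Rate column; the point of the sketch is that the quoted value arrives in R as text with its comma intact, and removing the dollar sign alone is not enough for as.numeric.

```r
# Hypothetical two-row CSV mimicking the Rate column (not the real file);
# the second Rate value is quoted because it contains a comma
csv_text <- 'Program,Rate
BONUS,$0.00
"Law & Order: SVU","$1,200.00"'

df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The raw strings keep the thousands separator that Excel hid behind "####"
print(df$Rate)           # "$0.00" "$1,200.00"

# Removing only the dollar sign leaves the comma behind
rate_no_dollar <- sub("\\$", "", df$Rate)
print(rate_no_dollar)    # "0.00" "1,200.00"

# The comma makes the second value unparseable, so it becomes NA
converted <- suppressWarnings(as.numeric(rate_no_dollar))
print(converted)         # 0 NA
```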

What should I do?

Adding str(NImp):

    'data.frame':	9859 obs. of 19 variables:
     $ Spot.ID         : int  13072903 13072904 13072898 13072793 13072905 13072899 13072397 13072476 13072398 13072681 ...
     $ Date            : chr  "6/30/2014" "6/30/2014" "6/30/2014" "6/30/2014" ...
     $ Hour            : int  0 0 0 0 0 0 1 1 1 2 ...
     $ Time            : chr  "12:08 AM" "12:20 AM" "12:29 AM" "12:30 AM" ...
     $ Local.Date      : chr  "6/30/2014" "6/30/2014" "6/30/2014" "6/30/2014" ...
     $ Broadcast.Week  : int  1 1 1 1 1 1 1 1 1 1 ...
     $ Local.Hour      : int  0 0 0 0 0 0 1 1 1 2 ...
     $ Local.Time      : chr  "12:08 AM" "12:20 AM" "12:29 AM" "12:30 AM" ...
     $ Market          : chr  "NATIONAL CABLE" "NATIONAL CABLE" "NATIONAL CABLE" "NATIONAL CABLE" ...
     $ Vendor          : chr  "NUVO" "NUVO" "AFAM" "USA" ...
     $ Station         : chr  "NUVO" "NUVO" "AFAM" "USA" ...
     $ M18.34.IMP..000.: int  NA NA 3 88 NA 3 NA 53 NA 37 ...
     $ W18.34.IMP..000.: int  NA NA 86 66 NA 86 NA 70 NA 60 ...
     $ A18.34.IMP..000.: int  NA NA 89 154 NA 89 NA 123 NA 97 ...
     $ HH.IMP..000.    : int  NA NA 141 476 NA 141 NA 461 NA 434 ...
     $ ISCI            : chr  "IT3896" "IT3896" "IT14429" "ITES15443H" ...
     $ Creative        : chr  "Rising Costs30 (Opportunity Scholar - No Nursing)" "Rising Costs30 (Opportunity Scholar - No Nursing)" "Rising Costs30 (Opportunity Scholar - No Nursing)" "Matthew Traina (B. EECT/A. CEET) :60 (no loc)" ...
     $ Program         : chr  "NUVO CINEMA" "NUVO CINEMA" "BONUS" "Law & Order: SVU" ...
     $ Rate            : chr  "$0.00" "$0.00" "$0.00" "$0.00" ...

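When a column is formatted as "Currency" in Excel, values in the thousands or larger get a comma as a thousands separator in addition to the dollar-sign prefix. For example, a value might look like $1,200.00. Excel shows "####" when a formatted number is too wide for its column, so the "####" cells you saw are exactly these larger values. Your problem is that you're removing the dollar sign but not the comma, so when you try to convert to numeric you get NA: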

    as.numeric(c("0", "0", "1,200"))
    [1]  0  0 NA
    Warning message:
    NAs introduced by coercion

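You can use gsub to remove both the dollar sign and the comma in one step. I found an example of how to do this in the comments on this answer. Inside the square brackets, "$" is treated as a literal character rather than an anchor, so the character class [$,] matches either a dollar sign or a comma: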

    as.numeric(gsub("[$,]", "", c("$0", "$0", "$1,200")))
    [1]    0    0 1200

So the code that should work for your data set is:

    NImp$Rate <- as.numeric(gsub("[$,]", "", NImp$Rate))
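If a warning still appears after this, a quick check is to list the entries that were present as text but still failed to convert. The vector below is a hypothetical stand-in for NImp$Rate (the "n/a" entry is invented purely to show a leftover failure); the sketch applies the same gsub("[$,]", ...) cleanup and then compares NA positions before and after conversion.

```r
# Hypothetical stand-in for NImp$Rate (not the asker's data)
rate <- c("$0.00", "$1,200.00", NA, "$35.50", "n/a")

# Strip dollar signs and thousands separators, then convert;
# suppressWarnings hides the coercion warning so failures can be inspected directly
cleaned <- suppressWarnings(as.numeric(gsub("[$,]", "", rate)))

# Entries that were not missing originally but still could not be parsed
leftover <- rate[is.na(cleaned) & !is.na(rate)]

print(cleaned)   # 0 1200 NA 35.5 NA
print(leftover)  # "n/a"
```

Values that were already NA (for example, cells matched by na.strings when reading) are excluded from leftover, so only genuine parsing failures are reported.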