以正确的编码将希腊字符的Excel文件导入到R中

我有一些麻烦导入以下文件: http : //www.kuleuven.be/bio/ento/temp/test.xlsx到R在正确的编码。 尤其是,

library("xlsx") read.xlsx("test.xlsx",1,header=F,colClasses=c("character"),encoding="UTF-8") 

给我

  X1 1 a-cadinol 2 a-calacorene 3 a-caryophyllene alcohol 4 a-curcumene 5 a-elemol 6 a-muurolene 7 a-terpineol acetate 8 ß-4-dimethyl-3-cyclohexane-1-ethanol acetate 9 ß-bisabolene 10 ß-bisabolol 11 ß-bourbonene 12 ß-caryophyllene alcohol 13 ß-cyclocitral 14 ß-farnesol 15 ß-selinene 16 ß-sesquiphellandrene 17 <U+03B3>-cadinene 18 <U+03B3>-Carboethoxy-<U+03B3>-butyrolactone 19 <U+03B3>-ethyl-<U+03B3>-butyrolactone 20 <U+03B3>-eudesmol 21 <U+03B3>-muurolene 22 <U+03B3>-nonalactone 23 <U+03B3>-octalactone 24 <U+03B3>-selinene 25 <U+03B3>-undecalactone 26 d-cadinene 27 d-cadinol 28 d-muurolene 29 d-undecalactone 

但是a-<U+03B3>-d-应该是alpha-gamma-delta-

任何想法如何我可以导入我的文件在正确的编码?

我在Windows上工作, iconvlist()给我

  [1] "437" "850" "852" "855" "857" [6] "860" "861" "862" "863" "865" [11] "866" "869" "ANSI_X3.4-1968" "ANSI_X3.4-1986" "ASCII" [16] "ASMO-708" "BIG-5" "BIG-FIVE" "big5" "BIG5" [21] "big5-hkscs" "BIG5-HKSCS" "big5hkscs" "BIG5HKSCS" "CP-GR" [26] "CP-IS" "cp1025" "CP1125" "CP1133" "CP1200" [31] "CP12000" "CP12001" "CP1201" "CP1250" "CP1251" [36] "CP1252" "CP1253" "CP1254" "CP1255" "CP1256" [41] "CP1257" "CP1258" "CP1361" "CP154" "CP367" [46] "CP437" "CP50221" "CP51932" "CP65001" "CP737" [51] "CP775" "CP819" "CP850" "CP852" "CP853" [56] "CP855" "CP857" "CP858" "CP860" "CP861" [61] "CP862" "CP863" "CP864" "CP865" "cp866" [66] "CP866" "CP869" "CP874" "cp875" "CP932" [71] "CP936" "CP949" "CP950" "CSASCII" "CSIBM855" [76] "CSIBM857" "CSIBM860" "CSIBM861" "CSIBM863" "CSIBM864" [81] "CSIBM865" "CSIBM866" "CSIBM869" "csISO2022JP" "CSISOLATIN1" [86] "CSPC775BALTIC" "CSPC850MULTILINGUAL" "CSPC862LATINHEBREW" "CSPC8CODEPAGE437" "CSPCP852" [91] "CSPTCP154" "CSWINDOWS31J" "CYRILLIC-ASIAN" "DOS-720" "DOS-862" [96] "EUC-CN" "euc-jp" "euc-kr" "EUC-KR" "EUCCN" [101] "eucjp" "euckr" "GB18030" "gb2312" "GBK" [106] "hz-gb-2312" "IBM-CP1133" "IBM-Thai" "IBM00858" "IBM00924" [111] "IBM01047" "IBM01140" "IBM01141" "IBM01142" "IBM01143" [116] "IBM01144" "IBM01145" "IBM01146" "IBM01147" "IBM01148" [121] "IBM01149" "IBM037" "IBM1026" "IBM273" "IBM277" [126] "IBM278" "IBM280" "IBM284" "IBM285" "IBM290" [131] "IBM297" "IBM367" "IBM420" "IBM423" "IBM424" [136] "IBM437" "IBM437" "IBM500" "ibm737" "ibm775" [141] "IBM775" "IBM819" "ibm850" "IBM850" "ibm852" [146] "IBM852" "IBM855" "IBM855" "ibm857" "IBM857" [151] "IBM860" "IBM860" "ibm861" "IBM861" "IBM862" [156] "IBM863" "IBM863" "IBM864" "IBM864" "IBM865" [161] "IBM865" "IBM866" "ibm869" "IBM869" "IBM870" [166] "IBM871" "IBM880" "IBM905" "iso-2022-jp" "iso-2022-jp" [171] "ISO-2022-JP" "ISO-2022-JP-MS" "iso-2022-kr" "ISO-8859-1" "iso-8859-13" [176] "iso-8859-15" "iso-8859-2" "iso-8859-3" "iso-8859-4" "iso-8859-5" [181] "iso-8859-6" "iso-8859-7" "iso-8859-8" "iso-8859-8-i" "iso-8859-9" [186] "ISO-IR-100" "ISO-IR-6" "ISO_646.IRV:1991" "ISO_8859-1" "ISO_8859-1:1987" [191] "ISO2022-JP" "ISO2022-JP-MS" "iso2022-kr" "ISO646-US" "iso8859-1" [196] "ISO8859-1" "iso8859-13" "iso8859-15" "iso8859-2" "iso8859-3" [201] "iso8859-4" "iso8859-5" "iso8859-6" "iso8859-7" "iso8859-8" [206] "iso8859-8-i" "iso8859-9" "Johab" "JOHAB" "koi8-r" [211] "koi8-u" "ks_c_5601-1987" "L1" "latin-9" "LATIN1" [216] "latin2" "latin3" "latin4" "latin5" "latin7" [221] "latin9" "mac" "mac-centraleurope" "mac-is" "macarabic" [226] "maccentraleurope" "maccroatian" "maccyrillic" "macgreek" "machebrew" [231] "maciceland" "macintosh" "macis" "macroman" "macromania" [236] "macthai" "macturkish" "macukraine" "macukrainian" "MS-ANSI" [241] "MS-ARAB" "MS-CYRL" "MS-EE" "MS-GREEK" "MS-HEBR" [246] "MS-TURK" "MS50221" "MS51932" "MS932" "MS936" [251] "PT154" "PTCP154" "SHIFFT_JIS" "SHIFFT_JIS-MS" "shift-jis" [256] "shift_jis" "SJIS" "SJIS-MS" "SJIS-OPEN" "SJIS-WIN" [261] "UCS-2" "UCS-2BE" "UCS-2LE" "UCS-4" "UCS-4BE" [266] "UCS-4BE" "UCS-4LE" "UCS-4LE" "UCS2" "UCS2BE" [271] "UCS2LE" "UCS4" "UCS4BE" "UCS4LE" "UHC" [276] "unicodeFFFE" "US" "US-ASCII" "UTF-16" "UTF-16BE" [281] "UTF-16LE" "UTF-32" "UTF-32BE" "UTF-32LE" "UTF-8" [286] "UTF16" "UTF16BE" "UTF16LE" "UTF32" "UTF32BE" [291] "UTF32LE" "UTF8" "WINBALTRIM" "windows-1250" "windows-1251" [296] "windows-1252" "windows-1253" "windows-1254" "windows-1255" "windows-1256" [301] "windows-1257" "windows-1258" "WINDOWS-31J" "WINDOWS-50221" "WINDOWS-51932" [306] "windows-874" "WINDOWS-932" "WINDOWS-936" "x-Chinese_CNS" "x-cp20001" [311] "x-cp20003" "x-cp20004" "x-cp20005" "x-cp20261" "x-cp20269" [316] "x-cp20936" "x-cp20949" "x-cp50227" "x-EBCDIC-KoreanExtended" "x-Europa" [321] "x-IA5" "x-IA5-German" "x-IA5-Norwegian" "x-IA5-Swedish" "x-iscii-as" [326] "x-iscii-be" "x-iscii-de" "x-iscii-gu" "x-iscii-ka" "x-iscii-ma" [331] "x-iscii-or" "x-iscii-pa" "x-iscii-ta" "x-iscii-te" "x-mac-arabic" [336] "x-mac-ce" "x-mac-chinesesimp" "x-mac-chinesetrad" "x-mac-croatian" "x-mac-cyrillic" [341] "x-mac-greek" "x-mac-hebrew" "x-mac-icelandic" "x-mac-japanese" "x-mac-korean" [346] "x-mac-romanian" "x-mac-thai" "x-mac-turkish" "x-mac-ukrainian" "x_Chinese-Eten" 

我尝试了很多这些,无济于事…不幸的是,我也不知道什么编码Excel保存我的文件在…

另外,在R中有没有简单的函数可以让我把所有的希腊字母,贝塔,伽玛和三angular洲(原始编码)转换为“alpha”,“beta”,“gamma”和“delta”充分)? 或者做相反的事,即将“alpha”,“beta”,“gamma”等全写成单个希腊字符?

编辑:关于我最后一个问题,我试过了

 togreek=function(compname) { n=as.character(compname,encoding="UTF-8") n=gsub("alpha","\u03B1",n) n=gsub("beta","\u03B2",n) n=gsub("gamma","\u03B3",n) n=gsub("delta","\u03B4",n) n=gsub("epsilon","\u03B5",n) n } tolatin=function(compname) { n=as.character(compname,encoding="UTF-8") n=gsub("\u03B1","alpha",n) n=gsub("\u03B2","beta",n) n=gsub("\u03B3","gamma",n) n=gsub("\u03B4","delta",n) n=gsub("\u03B5","epsilon",n) n } 

function明胶似乎工作:

 library("xlsx") test=read.xlsx("test.xlsx",1,header=F,colClasses=c("character"),encoding="UTF-8") tolatin(test$X1) [1] "alpha-cadinol" "alpha-calacorene" "alpha-caryophyllene alcohol" [4] "alpha-curcumene" "alpha-elemol" "alpha-muurolene" [7] "alpha-terpineol acetate" "beta-4-dimethyl-3-cyclohexane-1-ethanol acetate" "beta-bisabolene" [10] "beta-bisabolol" "beta-bourbonene" "beta-caryophyllene alcohol" [13] "beta-cyclocitral" "beta-farnesol" "beta-selinene" [16] "beta-sesquiphellandrene" "gamma-cadinene" "gamma-Carboethoxy-gamma-butyrolactone" [19] "gamma-ethyl-gamma-butyrolactone" "gamma-eudesmol" "gamma-muurolene" [22] "gamma-nonalactone" "gamma-octalactone" "gamma-selinene" [25] "gamma-undecalactone" "delta-cadinene" "delta-cadinol" [28] "delta-muurolene" "delta-undecalactone" 

但是,如果我再转换回希腊字符我再次遇到问题:

 togreek(tolatin(test$X1)) [1] "α-cadinol" "α-calacorene" "α-caryophyllene alcohol" [4] "α-curcumene" "α-elemol" "α-muurolene" [7] "α-terpineol acetate" "ß-4-dimethyl-3-cyclohexane-1-ethanol acetate" "ß-bisabolene" [10] "ß-bisabolol" "ß-bourbonene" "ß-caryophyllene alcohol" [13] "ß-cyclocitral" "ß-farnesol" "ß-selinene" [16] "ß-sesquiphellandrene" "<U+03B3>-cadinene" "<U+03B3>-Carboethoxy-<U+03B3>-butyrolactone" [19] "<U+03B3>-ethyl-<U+03B3>-butyrolactone" "<U+03B3>-eudesmol" "<U+03B3>-muurolene" [22] "<U+03B3>-nonalactone" "<U+03B3>-octalactone" "<U+03B3>-selinene" [25] "<U+03B3>-undecalactone" "d-cadinene" "d-cadinol" [28] "d-muurolene" "d-undecalactone" 

任何想法我做错了什么?

尝试这个:
Sys.setlocale(category = "LC_ALL", locale = "Greek")