Sorting Twitter data in Excel by index number

Question:

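I'm working on my thesis, and I have to admit I'm fairly new to the more advanced features of Excel, and I have never used R before. Here is what I did: I used R to connect to Twitter, searched for tweets matching a certain keyword, and saved them. Now I want to make sure my data is sorted correctly so that I can analyze it. However, I can't seem to get my data into the right shape, neither with R (because it doesn't read the data) nor with Excel. Currently my data looks like this: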

Data sample:

,"text","favorited","favoriteCount","replyToSN","created","truncated","replyToSID","id","replyToUID","statusSource","screenName","retweetCount","isRetweet","retweeted","longitude","latitude"
1,"RT @cdavandaag: De hashtag #ikstemCDA is deze maand al 7.500 (!) keer gebruikt, fantastisch. Op naar een mooi uitslag. #CDA #PS15 http://t.…",FALSE,0,NA,2015-03-17 23:58:23,FALSE,NA,"577982342775615488",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","Cecile2511",25,TRUE,FALSE,NA,NA
2,"RT @Matthijs85: Ligt het trouwens aan mij of wordt verschil CDA/VVD nu heel groot uitgelicht, terwijl ze feitelijk 92% hetzelfde stemmen? #…",FALSE,0,NA,2015-03-17 23:58:04,FALSE,NA,"577982262282698752",NA,"<a href=""http://twitter.com"" rel=""nofollow"">Twitter Web Client</a>","meneerharmsen",3,TRUE,FALSE,NA,NA
3,"@PuckPetrus bang makerij bemoei je niet met je buurman les 1 
wil jij de les gelezen worden ? #vvd #pvda #d66 #cda",FALSE,0,"PuckPetrus",2015-03-17 23:57:39,FALSE,"577980323885105152","577982156426899458","1378104055","<a href=""http://twitter.com"" rel=""nofollow"">Twitter Web Client</a>","pufpufpafpaf",0,FALSE,FALSE,NA,NA
4,"RT @FrankScholman: Het #CDA kiest #LagereLasten! Hier hebben we 7 goede redenen voor: http://t.co/utQt0LfEzl. #NOSdebat #PS15 #MeerBanen ht…",FALSE,0,NA,2015-03-17 23:57:36,FALSE,NA,"577982146582806528",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","gijsdupont",4,TRUE,FALSE,NA,NA
5,"RT @Jan_Slagter: In Hilversum werden de Buma awards uitgereikt, en Buma wint het #nosdebat #cda",FALSE,0,NA,2015-03-17 23:56:36,FALSE,NA,"577981895570546688",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","Ztrmarco",38,TRUE,FALSE,NA,NA
6,"RT @StSteenbakkers: Peiling Maurice de Hond: tweestrijd VVD en CDA! Stem CDA!!! 
#Lagerelasten #CDA #100pBrabant",FALSE,0,NA,2015-03-17 23:56:31,FALSE,NA,"577981871168090113",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","gijsdupont",5,TRUE,FALSE,NA,NA

And so on. When I use Text to Columns in Excel, the output looks like this:

  text favorited created id statusSource screenName retweetCount isRetweet retweeted 1 RT @cdavandaag: De hashtag #ikstemCDA is deze maand al 7.500 (!) keer gebruikt, fantastisch. Op naar een mooi uitslag. #CDA #PS15 http://t.… FALSE 17-3-2015 23:58 5,77982E+17 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Cecile2511 25 TRUE FALSE 2 RT @Matthijs85: Ligt het trouwens aan mij #…" FALSE 0 FALSE NA meneerharmsen 3 TRUE FALSE NA #vvd #pvda #d66 #cda" FALSE 0 FALSE 1378104055 pufpufpafpaf 0 FALSE FALSE NA 4 RT @FrankScholman: Het #CDA kiest #LagereLasten! Hier hebben we 7 goede redenen voor: http://t.co/utQt0LfEzl. #NOSdebat #PS15 #MeerBanen ht… FALSE 17-3-2015 23:57 5,77982E+17 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> gijsdupont 4 TRUE FALSE 

Conclusion:

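The program does not read the tweets correctly. Since I have a large number of tweets, cleaning them up by hand is not an option. I think it should be possible to sort the tweets based on the index numbers that are already in the first column. Is there a way to do this (in Excel)? Basically: jump to the next row whenever the next index number is found? Any help is much appreciated!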
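The idea in the question (start a new row whenever a line begins with the next index number) can also be done outside Excel. A minimal sketch in R, assuming every record starts with its index number followed by ," as in the sample; rejoin_records is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: glue physical lines back into one line per record.
# Assumption: a record starts with its index number followed by ,"
# (e.g. 1,"RT @..."); any other line continues the previous record.
rejoin_records <- function(lines) {
  if (length(lines) == 0) return(character(0))
  starts <- grepl('^[0-9]+,"', lines)
  starts[1] <- TRUE                # the first line (the header) always starts a record
  record_id <- cumsum(starts)      # a record and its continuation lines share one number
  groups <- split(lines, record_id)
  unname(vapply(groups, paste, character(1), collapse = " "))
}

# Small example shaped like the data sample: the second tweet spans two lines
raw <- c(',"text","id"',
         '1,"De hashtag #ikstemCDA #CDA","1"',
         '2,"Stem CDA!!!',
         '#Lagerelasten #CDA","2"')
fixed <- rejoin_records(raw)
print(fixed)
```

On the real file, something like writeLines(rejoin_records(readLines("text.csv")), "text_fixed.csv") would give one record per line. Two caveats: the line break inside a tweet becomes a space, and a continuation line that itself happens to begin with digits followed by ," would be mistaken for a new record.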

I was able to import your data using

 x <- read.table("text.csv", header = TRUE, comment.char = "Ł", sep = ",") 

The trick is to specify a non-default comment character, because the default one, #, clashes with Twitter hashtags.
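A small self-contained sketch of the same idea: two records shaped like the sample (hashtags in the text, and one tweet with a line break inside its quotes) are read with comment handling switched off entirely via comment.char = "". The quote = "\"" argument is an extra assumption, not from the answer: read.table's default also treats single quotes as field delimiters, which an apostrophe inside a tweet could trip over.

```r
# Sample in the same shape as the question's data: an unnamed index column,
# hashtags inside the text, and a tweet with a line break inside its quotes.
csv_text <- ',"text","id"
1,"De hashtag #ikstemCDA is deze maand al 7.500 (!) keer gebruikt #CDA #PS15","577982342775615488"
2,"Stem CDA!!!
#Lagerelasten #CDA #100pBrabant","577981871168090113"'

tweets <- read.table(text = csv_text, header = TRUE, sep = ",",
                     quote = "\"",          # only double quotes delimit fields
                     comment.char = "",     # "#" is ordinary text, not a comment
                     stringsAsFactors = FALSE)

str(tweets)
```

The embedded line break stays inside the text field (as the \n in the str() output shows for the real data), so each tweet remains a single row.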

 > str(x)
 'data.frame': 6 obs. of 17 variables:
  $ X : int 1 2 3 4 5 6
  $ text : Factor w/ 6 levels "@PuckPetrus bang makerij bemoei je niet met je buurman les 1 \nwil jij de les gelezen worde"| __truncated__,..: 2 5 1 3 4 6
  $ favorited : logi FALSE FALSE FALSE FALSE FALSE FALSE
  $ favoriteCount: int 0 0 0 0 0 0
  $ replyToSN : Factor w/ 1 level "PuckPetrus": NA NA 1 NA NA NA
  $ created : Factor w/ 6 levels "2015-03-17 23:56:31",..: 6 5 4 3 2 1
  $ truncated : logi FALSE FALSE FALSE FALSE FALSE FALSE
  $ replyToSID : num NA NA 5.78e+17 NA NA ...
  $ id : num 5.78e+17 5.78e+17 5.78e+17 5.78e+17 5.78e+17 ...
  $ replyToUID : int NA NA 1378104055 NA NA NA
  $ statusSource : Factor w/ 2 levels "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",..: 2 1 1 2 2 2
  $ screenName : Factor w/ 5 levels "Cecile2511","gijsdupont",..: 1 3 4 2 5 2
  $ retweetCount : int 25 3 0 4 38 5
  $ isRetweet : logi TRUE TRUE FALSE TRUE TRUE TRUE
  $ retweeted : logi FALSE FALSE FALSE FALSE FALSE FALSE
  $ longitude : logi NA NA NA NA NA NA
  $ latitude : Factor w/ 5 levels "NA ","NA ",..: 3 2 2 4 5 1
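The str() output also shows id read as num (5.78e+17), and Excel displays the same column as 5,77982E+17. Tweet IDs have 18 digits, more than a double can hold exactly (roughly 15 to 16 significant digits), so the last digits can be silently changed. A sketch that keeps the column as text instead, assuming a current R version in which read.table accepts a named colClasses vector:

```r
csv_text <- ',"text","id"
1,"example tweet","577981871168090113"'

# Default: the quoted id is converted to a double and loses its last digits
as_num <- read.table(text = csv_text, header = TRUE, sep = ",",
                     quote = "\"", comment.char = "")

# Keep id as character; the other columns are still detected automatically
as_chr <- read.table(text = csv_text, header = TRUE, sep = ",",
                     quote = "\"", comment.char = "",
                     colClasses = c(id = "character"))

format(as_num$id, scientific = FALSE)  # no longer equal to the original id
as_chr$id                              # "577981871168090113"
```

The same colClasses argument can be added to the read.table call above; replyToSID and replyToUID could be treated the same way if they are needed.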

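I managed to do it! Thanks, everyone, for your help. Copying the first column of the CSV data into Notepad++ did the trick. From there I was able to import it!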

For some reason, R kept reading "Ł" as "L", so it was cutting off the data at that point. Using comment.char = "" in the code solved the problem. Thanks, everyone!
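For reference, read.csv already defaults to comment.char = "" and quote = "\"", so it sidesteps the hashtag problem without extra arguments. A sketch that writes a tiny sample file (three rows from the data above with shortened tweet text, stored out of order, one of them spanning two lines), reads it back, and sorts it by the index column (which R names X because its header is empty) and by the created timestamp. The time zone tz = "UTC" is an assumption; the source does not state it, but it does not change the order.

```r
# Write a tiny sample file so the sketch runs on its own
path <- tempfile(fileext = ".csv")
writeLines(c(
  ',"text","created","id"',
  '2,"RT @Matthijs85: Ligt het trouwens aan mij","2015-03-17 23:58:04","577982262282698752"',
  '1,"RT @cdavandaag: De hashtag #ikstemCDA #CDA #PS15","2015-03-17 23:58:23","577982342775615488"',
  '6,"RT @StSteenbakkers: Stem CDA!!!',
  '#Lagerelasten #CDA #100pBrabant","2015-03-17 23:56:31","577981871168090113"'
), path)

# read.csv defaults: header = TRUE, sep = ",", quote = "\"", comment.char = ""
tweets <- read.csv(path, stringsAsFactors = FALSE,
                   colClasses = c(id = "character"))

# Parse the timestamps (time zone is an assumption)
tweets$created <- as.POSIXct(tweets$created,
                             format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

by_index <- tweets[order(tweets$X), ]        # sorted by the original index
by_time  <- tweets[order(tweets$created), ]  # oldest tweet first
print(by_index$X)
print(by_time$X)
```

If characters such as "Ł" still come out wrong, passing the file's actual encoding to read.csv (for example fileEncoding = "UTF-8", assuming the file is UTF-8) may help.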