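How do I reorganize R data that consists of 8 repeating rows and 24 columns?

I'm very new to R, so please bear with me (I'll also try to be as descriptive as possible to respect your time).

For some time I've been trying to correctly format a data set that describes 8 measurements taken every hour over a little more than a year. Because of the way I had to retrieve the data, the spreadsheet now lists it in a tabular format in which the 8 variable names appear as repeating rows and each hour of the day is a separate column, like this: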

    var1[0] var1[1] var1[2] var1[3] var1[4] var1[5] var1[6] var1[7] var1[8] var1[9] var1[10] var1[11] var1[12] var1[13] var1[14] var1[15] var1[16] var1[17] var1[18] var1[19] var1[20] var1[21] var1[22] var1[23]
    var2[0] var2[1] var2[2] var2[3] var2[4] var2[5] var2[6] var2[7] var2[8] var2[9] var2[10] var2[11] var2[12] var2[13] var2[14] var2[15] var2[16] var2[17] var2[18] var2[19] var2[20] var2[21] var2[22] var2[23]
    var3[0] var3[1] var3[2] var3[3] var3[4] var3[5] var3[6] var3[7] var3[8] var3[9] var3[10] var3[11] var3[12] var3[13] var3[14] var3[15] var3[16] var3[17] var3[18] var3[19] var3[20] var3[21] var3[22] var3[23]
    var4[0] var4[1] var4[2] var4[3] var4[4] var4[5] var4[6] var4[7] var4[8] var4[9] var4[10] var4[11] var4[12] var4[13] var4[14] var4[15] var4[16] var4[17] var4[18] var4[19] var4[20] var4[21] var4[22] var4[23]
    var5[0] var5[1] var5[2] var5[3] var5[4] var5[5] var5[6] var5[7] var5[8] var5[9] var5[10] var5[11] var5[12] var5[13] var5[14] var5[15] var5[16] var5[17] var5[18] var5[19] var5[20] var5[21] var5[22] var5[23]
    var6[0] var6[1] var6[2] var6[3] var6[4] var6[5] var6[6] var6[7] var6[8] var6[9] var6[10] var6[11] var6[12] var6[13] var6[14] var6[15] var6[16] var6[17] var6[18] var6[19] var6[20] var6[21] var6[22] var6[23]
    var7[0] var7[1] var7[2] var7[3] var7[4] var7[5] var7[6] var7[7] var7[8] var7[9] var7[10] var7[11] var7[12] var7[13] var7[14] var7[15] var7[16] var7[17] var7[18] var7[19] var7[20] var7[21] var7[22] var7[23]
    var8[0] var8[1] var8[2] var8[3] var8[4] var8[5] var8[6] var8[7] var8[8] var8[9] var8[10] var8[11] var8[12] var8[13] var8[14] var8[15] var8[16] var8[17] var8[18] var8[19] var8[20] var8[21] var8[22] var8[23]
    var1[24] var1[25] var1[26] var1[27] var1[28] var1[29] var1[30] var1[31] var1[32] var1[33] var1[34] var1[35] var1[36] var1[37] var1[38] var1[39] var1[40] var1[41] var1[42] var1[43] var1[44] var1[45] var1[46] var1[47]
    var2[24] var2[25] var2[26] var2[27] var2[28] var2[29] var2[30] var2[31] var2[32] var2[33] var2[34] var2[35] var2[36] var2[37] var2[38] var2[39] var2[40] var2[41] var2[42] var2[43] var2[44] var2[45] var2[46] var2[47]
    var3[24] var3[25] var3[26] var3[27] var3[28] var3[29] var3[30] var3[31] var3[32] var3[33] var3[34] var3[35] var3[36] var3[37] var3[38] var3[39] var3[40] var3[41] var3[42] var3[43] var3[44] var3[45] var3[46] var3[47]
    var4[24] var4[25] var4[26] var4[27] var4[28] var4[29] var4[30] var4[31] var4[32] var4[33] var4[34] var4[35] var4[36] var4[37] var4[38] var4[39] var4[40] var4[41] var4[42] var4[43] var4[44] var4[45] var4[46] var4[47]
    var5[24] var5[25] var5[26] var5[27] var5[28] var5[29] var5[30] var5[31] var5[32] var5[33] var5[34] var5[35] var5[36] var5[37] var5[38] var5[39] var5[40] var5[41] var5[42] var5[43] var5[44] var5[45] var5[46] var5[47]
    var6[24] var6[25] var6[26] var6[27] var6[28] var6[29] var6[30] var6[31] var6[32] var6[33] var6[34] var6[35] var6[36] var6[37] var6[38] var6[39] var6[40] var6[41] var6[42] var6[43] var6[44] var6[45] var6[46] var6[47]
    var7[24] var7[25] var7[26] var7[27] var7[28] var7[29] var7[30] var7[31] var7[32] var7[33] var7[34] var7[35] var7[36] var7[37] var7[38] var7[39] var7[40] var7[41] var7[42] var7[43] var7[44] var7[45] var7[46] var7[47]
    var8[24] var8[25] var8[26] var8[27] var8[28] var8[29] var8[30] var8[31] var8[32] var8[33] var8[34] var8[35] var8[36] var8[37] var8[38] var8[39] var8[40] var8[41] var8[42] var8[43] var8[44] var8[45] var8[46] var8[47]

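The original data contains more than this, but I've simplified it to focus on the problem I'm having. (What I'm trying to convey in the example above is that the variables (var1, var2, var3, etc.) are logged at each hour (t1, t2, t3, etc.).)

My goal is to reformat it so that it looks something like this: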

    var1[0] var2[0] var3[0] var4[0] var5[0] var6[0] var7[0] var8[0]
    var1[1] var2[1] var3[1] var4[1] var5[1] var6[1] var7[1] var8[1]
    var1[2] var2[2] var3[2] var4[2] var5[2] var6[2] var7[2] var8[2]
    var1[3] var2[3] var3[3] var4[3] var5[3] var6[3] var7[3] var8[3]
    .       .       .       .       .       .       .       .
    .       .       .       .       .       .       .       .
    .       .       .       .       .       .       .       .
    [all the way to 9216, which is the number of hours in 384 days]

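So far I've tried working with it in Excel and couldn't find a way to do this. I also looked into writing a C++ script, but I feel there is probably a simpler way. My most recent efforts have turned to R, since I've been trying to learn it and I've heard it is well suited to this kind of data manipulation. In R, I tried following an example I found about reshaping data into a matrix of different dimensions (which can be found here), but it produced incorrect data. (I believe I may be misusing the method.) I also looked at the solution discussed here, but I couldn't adapt the code to my situation. Perhaps I'm overlooking something simple?

Does anyone have any suggestions? As I said, at this point I'm trying to do this in R, but I'm open to suggestions for Excel, C, or Python. (I'd certainly accept suggestions in other languages too, but those might need a more thorough explanation :))

Thanks!

[Edit:]

The data sample above was meant to be descriptive. Here is what the first 25 rows of the actual data look like; the only change I made was replacing the variable names for confidentiality reasons: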

    Metric,Year,Month,Day,DOW,12am,1am,2am,3am,4am,5am,6am,7am,8am,9am,10am,11am,12pm,1pm,2pm,3pm,4pm,5pm,6pm,7pm,8pm,9pm,10pm,11pm
    varA,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,9,22,10,18,24,26,11,21,24,10,0,0,0,0,0
    varB,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,10,13,18,28,26,25,25,21,23,13,0,0,0,0,0
    varC,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,0,1,7,9,5,1,4,4,1,7,1,0,0,0,0
    varD,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,9,23,17,27,29,27,15,25,25,17,1,0,0,0,0
    varE,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,44,32,33,65,37,42,62,75,71,50,0,0,0,0,0
    varF,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,89,82,83,94,37,77,100,100,90,60,0,0,0,0,0
    varG,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,100,100,100,100,95,100,100,100,100,100,0,0,0,0,0
    varH,2013,1,20,Sun,0,0,0,0,0,0,0,0,0,9,10,92,12,101,34,14,64,29,86,0,0,0,0,0
    varA,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,5,12,23,20,22,24,9,19,15,12,13,9,0,0,0
    varB,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,6,14,21,27,26,23,19,22,16,16,16,12,0,0,0
    varC,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,2,5,4,10,6,10,2,7,7,4,5,5,0,0,0
    varD,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,7,18,27,30,28,34,12,26,22,16,18,14,0,0,0
    varE,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,0,50,20,15,67,33,71,47,36,64,58,67,0,0,0
    varF,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,60,70,45,70,90,67,100,100,79,91,92,89,0,0,0
    varG,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,100,100,100,100,100,94,100,100,100,91,100,100,0,0,0
    varH,2013,1,21,Mon,0,0,0,0,0,0,0,0,0,20,12,31,20,29,16,12,12,16,16,34,41,0,0,0
    varA,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,9,14,18,25,16,20,22,11,23,13,9,4,0,0,0
    varB,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,20,23,17,28,14,18,30,17,27,17,17,6,0,0,0
    varC,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,4,8,2,3,2,6,7,2,4,1,2,1,0,0,0
    varD,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,13,22,20,29,18,26,29,13,27,14,11,5,0,0,0
    varE,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,83,90,43,30,29,17,32,60,71,54,89,100,0,0,0
    varF,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,100,100,86,65,43,56,74,90,90,73,100,100,0,0,0
    varG,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,100,100,100,100,100,100,100,100,100,100,100,100,0,0,0
    varH,2013,1,22,Tue,0,0,0,0,0,0,0,0,0,14,23,17,30,16,14,12,8,9,13,14,6,0,0,0

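As you can see, in the full data set there are five additional columns at the start, holding the variable name and the date information.

Assuming your data is in a matrix M, this should work: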

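    output <- NULL
    # M has 8 rows per day, so the number of 8-row blocks is nrow(M)/8
    # (9216/24 = 384 days for the full data set)
    last.count <- nrow(M)/8 - 1
    for (i in 0:last.count) {
        # transpose each day's 8 x 24 block into 24 rows (hours) x 8 columns (variables)
        output <- rbind(output, t(M[8*i + 1:8,]))
    }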

P.S.: rbind can be slow (depending on the size of the data); in that case you can preallocate the output matrix.
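A minimal sketch of that preallocation, assuming M is a numeric matrix with 8 rows per day and 24 hourly columns. The small random M and the names n.vars, n.hours and n.days are made up for illustration; they are not the asker's data.

```r
# Stand-in data (hypothetical): 3 days x 8 variables = 24 rows, 24 hourly columns
set.seed(1)
n.vars  <- 8
n.hours <- 24
n.days  <- 3
M <- matrix(rnorm(n.days * n.vars * n.hours),
            nrow = n.days * n.vars, ncol = n.hours)

# Preallocate the result: one row per hour, one column per variable
output <- matrix(NA_real_, nrow = n.days * n.hours, ncol = n.vars)

for (i in seq_len(n.days) - 1) {
  src <- n.vars  * i + seq_len(n.vars)    # the 8 rows of M holding day i
  dst <- n.hours * i + seq_len(n.hours)   # the 24 output rows for day i
  output[dst, ] <- t(M[src, ])            # hours become rows, variables become columns
}

dim(output)
```

Filling a matrix of known size avoids copying the growing result on every iteration, which is what makes repeated rbind calls slow on large inputs.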

You can use Hadley's reshape2 package to make this easy! First let's make some data, since you didn't give us any. Take this as a guide for future posts.

    foo <- matrix(rnorm(8*9216), nrow=8)    # matrix of 8 rows
                                            # (8 variables and 9216 = 384 x 24 columns)
    rownames(foo) <- paste0("V", 1:nrow(foo))   # giving rownames,
                                                # you can use "var" here if you want
    foo <- data.frame(foo)                  # making it a data.frame
    names(foo)[1:9216] <- paste0("t", 0:(ncol(foo)-1))   # time points,
                                                         # starting at 0: t0, t1, ..., t9215
    foo <- data.frame(id=rownames(foo), foo)    # making sure id column is first

    # load the reshape2 library
    library(reshape2)
    foo.wide <- recast(foo, id ~ variable)
    # we use the variable id as the id column,
    # play with melt and cast to understand what's going on here
    # do ?melt, ?cast and look at the examples
    # (swapping the formula to variable ~ id puts the time points in rows
    #  and the 8 variables in columns)

    # foo.wide is a list with data and labels.
    # code below to transform the list in foo.wide to a data.frame
    # (if your reshape2 version already returns a data.frame from recast,
    #  these three lines are not needed)
    foo.wide.df <- foo.wide$data
    names(foo.wide.df) <- unlist(foo.wide$labels[[2]])
    row.names(foo.wide.df) <- unlist(foo.wide$labels[[1]])

Hope this helps.

Update: I just saw the sample data you posted. With the 5 additional id columns, you can recast using the code below:

 foo.wide.df <-recast(foo, id ~ variable, id.var=1:5)
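For data shaped like the sample in the question (a Metric column, four date columns, then hourly columns), the same reshaping can also be sketched in base R without extra packages. This swaps in a different technique (manual stacking plus base `reshape()`) and is not the reshape2 method above. The `csv` string is an excerpt of the posted sample cut down to two metrics and three hours, and `dat`, `long`, `wide`, `id.cols` and `hour.cols` are hypothetical names chosen for illustration.

```r
# Excerpt in the same layout as the posted sample (two metrics, three hours)
csv <- "Metric,Year,Month,Day,DOW,9am,10am,11am
varA,2013,1,20,Sun,9,22,10
varB,2013,1,20,Sun,10,13,18
varA,2013,1,21,Mon,5,12,23
varB,2013,1,21,Mon,6,14,21"
# check.names = FALSE keeps column names such as "9am" unchanged
dat <- read.csv(text = csv, check.names = FALSE, stringsAsFactors = FALSE)

id.cols   <- c("Metric", "Year", "Month", "Day", "DOW")
hour.cols <- setdiff(names(dat), id.cols)

# Step 1: stack the hourly columns into long form
# (one row per metric / day / hour combination)
long <- data.frame(
  dat[rep(seq_len(nrow(dat)), times = length(hour.cols)), id.cols],
  Hour  = rep(hour.cols, each = nrow(dat)),
  value = unlist(dat[hour.cols], use.names = FALSE),
  row.names = NULL
)
# Keep the hours in their original column order rather than alphabetical order
long$Hour <- factor(long$Hour, levels = hour.cols)

# Step 2: spread the metrics into columns, one row per day and hour
wide <- reshape(long,
                idvar     = c("Year", "Month", "Day", "DOW", "Hour"),
                timevar   = "Metric",
                direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))   # value.varA -> varA

# Sort chronologically: by date, then by hour of day
wide <- wide[order(wide$Year, wide$Month, wide$Day, wide$Hour), ]
rownames(wide) <- NULL
wide
```

With the full file, replacing the `text = csv` argument with the path to your own CSV should give one row per hour and one column per metric, assuming every metric is present for every day.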