R: Is there an equivalent of Stata's codebookout command?

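In Stata, I can use the codebookout command to create an Excel workbook that saves the name, label, and storage type of every variable in an existing dataset, along with their corresponding values and value labels.

I would like to find an equivalent function in R. So far I have come across the memisc library, which has a function called codebook, but it does not do the same thing as the Stata command.

For example, in Stata the codebook output looks like this (see below; this is what I want):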

    Variable Name   Variable Label   Answer Label   Answer Code   Variable Type
    hhid            hhid             Open ended                   String
    inter_month     inter_month      Open ended                   long
    year            year             Open ended                   long
    org_unit        org_unit                                      long
                                     Balaka         1
                                     Blantyre       2
                                     Chikwawa       3
                                     Chiradzulu     4

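That is, each column of the data frame is evaluated to produce values for 5 different columns:

  • Variable Name is the name of the column
  • Variable Label is also the name of the column
  • Answer Label is the set of unique values in the column. If there are no unique values, the column is considered open ended
  • Answer Code is the number assigned to each category in Answer Label. It is blank if the answer label is not categorical
  • Variable Type: int, str, long (date)…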
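For readers less familiar with how R stores categorical data, here is a minimal sketch of where each of those pieces could come from in base R. The toy data frame and its values are invented for illustration; only `levels()`, `as.integer()`, and `class()` are relied on. Note that R's classes (character, factor, integer) do not map one for one onto Stata's storage types (str, long).

```r
# Toy data frame; the column names echo the question, the values are invented.
df <- data.frame(
  hhid     = c("A01", "A02", "A03"),
  org_unit = factor(c("Blantyre", "Balaka", "Blantyre")),
  stringsAsFactors = FALSE
)

# Answer labels: the factor's levels (sorted alphabetically by default).
answer_labels <- levels(df$org_unit)

# Answer codes: the integer position of each level.
answer_codes <- seq_along(answer_labels)

# Each observation stores the code of its level.
stored_codes <- as.integer(df$org_unit)

# Variable types: the class of each column.
variable_types <- sapply(df, class)

print(data.frame(answer_labels, answer_codes))
print(variable_types)
```

A non-factor column such as `hhid` has no levels, which is why it would be reported as open ended with a blank answer code.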

Here is my attempt:

    CreateCodebook <- function(dF){
      numbercols <- length(colnames(dF))
      table <- data.frame()
      for (i in 1:length(colnames(dF))){
        AnswerCode <- if (sapply(dF, is.factor)[i]) 1:nrow(unique(dF[i])) else ""
        AnswerLabel <- if (sapply(dF, is.factor)[i]) unique(dF[order(dF[i]),][i]) else "Open ended"
        VariableName <- if (length(AnswerCode) - 1 > 1) c(colnames(dF)[i], rep("",length(AnswerCode) - 1)) else colnames(dF)[i]
        VariableLabel <- if (length(AnswerCode) - 1 > 1) c(colnames(dF)[i], rep("",length(AnswerCode) - 1)) else colnames(dF)[i]
        VariableType <- if (length(AnswerCode) - 1 > 1) c(sapply(dF, class)[i], rep("",length(AnswerCode) - 1)) else sapply(dF, class)[i]
        df = data.frame(VariableName, VariableLabel, AnswerLabel, AnswerCode, VariableType)
        names(df) <- c("Variable Name", "Variable Label", "Variable Type", "Answer Code", "Answer Label")
        table <- rbind(table, df)
      }
      return(table)
    }

Unfortunately, I get the following warning messages:

    Warning messages:
    1: In `[<-.factor`(`*tmp*`, ri, value = 1:3) :
      invalid factor level, NA generated
    2: In `[<-.factor`(`*tmp*`, ri, value = 1:2) :
      invalid factor level, NA generated

And the output I produce gets muddled in the answer code and answer label columns:

                   Variable Name  Variable Label  Variable Type  Answer Code  Answer Label
    hhid           hhid           hhid            Open ended                  character
    month          month          month           Open ended                  integer
    year           year           year            Open ended                  integer
    org_unit       org_unit       org_unit        Open ended                  character
    v000           v000           v000            Open ended                  character
    v001           v001           v001            Open ended                  integer
    v002           v002           v002            Open ended                  integer
    v003           v003           v003            Open ended                  integer
    v005           v005           v005            Open ended                  integer
    v006           v006           v006            Open ended                  integer
    v007           v007           v007            Open ended                  integer
    v021           v021           v021            Open ended                  numeric
    2285           v024           v024            central        <NA>         factor
    1                                             north          <NA>
    7119                                          south          <NA>
    11             v025           v025            rural          <NA>         factor
    1048           v025           v025            urban          <NA>         factor
    district_name  district_name  district_name   Open ended                  character
    coords_x1      coords_x1      coords_x1       Open ended                  numeric
    coords_x2      coords_x2      coords_x2       Open ended                  numeric
    itn_color      itn_color      itn_color       Open ended                  numeric
    piped          piped          piped           Open ended                  numeric
    sanit          sanit          sanit           Open ended                  numeric
    sanit_cd       sanit_cd       sanit_cd        Open ended                  numeric
    water          water          water           Open ended                  numeric

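I took a crack at this for my own amusement, using the built-in Titanic data set. However, I have a problem with one of your definitions. You say: "if there are no unique values, then it is considered open ended". But every variable of length > 0 has some unique values: do you mean "if every value is unique"? Even that definition does not necessarily work as expected: in the Titanic data set, the response is an integer with only 22 unique values out of 32 total. I didn't think you would really want that enumerated, so I tested for type factor instead (if you really want the other behaviour, you can substitute the commented-out length(u)==length(x) line below).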

    ## utility function: pad vector with blanks to specified length
    pad <- function(x,n,p="") {
      return(c(x,rep(p,n-length(x))))
    }
    ## process a single column
    proc_col <- function(x,nm) {
      u <- unique(x)
      ## if (length(u)==length(x)) {
      if (!is.factor(x)) {
        n <- 1
        u <- "open ended"
        cc <- ""
      } else {
        cc <- as.numeric(u)
        n <- length(u)
      }
      dd <- data.frame(`Variable Name`=pad(nm,n),
                       `Variable Label`=pad(nm,n),
                       `Answer Label`=u,
                       `Answer Code`=cc,
                       `Variable Type`=pad(class(x),n),
                       stringsAsFactors=FALSE)
      return(dd)
    }
    ## process all columns
    proc_df <- function(x) {
      L <- Map(proc_col,x,names(x))
      dd <- do.call(rbind,L)
      rownames(dd) <- NULL
      return(dd)
    }

Example:

    xx <- as.data.frame.table(Titanic)
    proc_df(xx)
    ##    Variable.Name Variable.Label Answer.Label Answer.Code Variable.Type
    ## 1          Class          Class          1st           1        factor
    ## 2                                        2nd           2
    ## 3                                        3rd           3
    ## 4                                       Crew           4
    ## 5            Sex            Sex         Male           1        factor
    ## 6                                     Female           2
    ## 7            Age            Age        Child           1        factor
    ## 8                                      Adult           2
    ## 9       Survived       Survived           No           1        factor
    ## 10                                       Yes           2
    ## 11          Freq           Freq   open ended                   numeric

I haven't left a blank row before each list of code values, but you can make those adjustments yourself…
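The two candidate tests discussed in this answer ("every value is unique" versus "is a factor") can be compared directly on the same data. This is only a quick check using base R; the variable names `all_unique` and `factor_cols` are introduced here for illustration.

```r
xx <- as.data.frame.table(Titanic)

# Freq is a numeric count column, not a factor.
is.factor(xx$Freq)

# Its values repeat, so an "every value is unique" test would not
# treat it as open ended.
all_unique <- length(unique(xx$Freq)) == nrow(xx)
all_unique

# Testing for factor type separates the categorical columns from Freq.
factor_cols <- sapply(xx, is.factor)
factor_cols
```

Under the uniqueness test, `Freq` would be enumerated value by value, which is why the factor test is used in `proc_col`.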

Here is my hack at a solution:

    CreateCodebook <- function(dF){
      numbercols <- length(colnames(dF))
      table <- data.frame()
      for (i in 1:length(colnames(dF))){
        AnswerCode <- if (sapply(dF, is.factor)[i]) 1:nrow(unique(dF[i])) else ""
        AnswerLabel <- if (sapply(dF, is.factor)[i]) unique(dF[order(dF[i]),][i]) else "Open ended"
        VariableName <- if (length(AnswerCode) > 1) c(colnames(dF)[i], rep("",length(AnswerCode) - 1)) else colnames(dF)[i]
        VariableLabel <- if (length(AnswerCode) > 1) c(colnames(dF)[i], rep("",length(AnswerCode) - 1)) else colnames(dF)[i]
        VariableType <- if (length(AnswerCode) > 1) c(sapply(dF, class)[i], rep("",length(AnswerCode) - 1)) else sapply(dF, class)[i]
        df = data.frame(VariableName, VariableLabel, AnswerLabel, AnswerCode, VariableType, stringsAsFactors = FALSE)
        names(df) <- c("Variable Name", "Variable Label", "Variable Type", "Answer Code", "Answer Label")
        table <- rbind(table, df)
      }
      rownames(table) <- 1:nrow(table)
      return(table)
    }

Output:

        Variable Name  Variable Label  Variable Type  Answer Code  Answer Label
    1   brid           brid            Open ended                  character
    2   month          month           Open ended                  integer
    3   year           year            Open ended                  integer
    4   org_unit       org_unit        Open ended                  character
    5   v000           v000            Open ended                  character
    6   v001           v001            Open ended                  integer
    7   v002           v002            Open ended                  integer
    8   v003           v003            Open ended                  integer
    9   v005           v005            Open ended                  integer
    10  v006           v006            Open ended                  integer
    11  v007           v007            Open ended                  integer
    12  v021           v021            Open ended                  numeric
    13  v024           v024            central        1            factor
    14                                 north          2
    15                                 south          3
    16  v025           v025            rural          1            factor
    17                                 urban          2
    18  bidx           bidx            Open ended                  integer
    19  district_name  district_name   Open ended                  character
    20  coords_x1      coords_x1       Open ended                  numeric
    21  coords_x2      coords_x2       Open ended                  numeric
    22  anc4           anc4            Open ended                  numeric
    23  antimal_48     antimal_48      Open ended                  numeric
    24  carep          carep           Open ended                  numeric
    25  csec           csec            Open ended                  numeric
    26  dptv           dptv            Open ended                  numeric
    27  ebreast        ebreast         Open ended                  numeric
    28  fans_48        fans_48         Open ended                  numeric
    29  ideliv         ideliv          Open ended                  numeric
    30  iptp           iptp            Open ended                  numeric
    31  iron90         iron90          Open ended                  numeric
    32  measlesv       measlesv        Open ended                  numeric
    33  ors            ors             Open ended                  numeric
    34  ort            ort             Open ended                  numeric
    35  pncwm          pncwm           Open ended                  numeric
    36  sstools        sstools         Open ended                  numeric
    37  tt             tt              Open ended                  numeric
    38  vita           vita            Open ended                  numeric
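Compared with the original attempt, this version passes `stringsAsFactors = FALSE` to `data.frame()` and pads whenever there is more than one answer code. A likely reason the first change silences the "invalid factor level" warning is that, in R versions where strings became factors by default, the blank answer codes were stored as a factor, and later integer codes were not among its levels. The sketch below reproduces that mechanism in isolation; the names `answer_code`, `answer_code_chr`, and `warned` are made up for the illustration.

```r
# A column that started life as a factor (as "" would when strings are
# converted to factors) rejects values that are not among its levels.
answer_code <- factor(c("", ""))
warned <- FALSE
withCallingHandlers(
  {
    answer_code[2] <- 1L  # 1 is not an existing level
  },
  warning = function(w) {
    warned <<- TRUE                  # record that a warning was raised
    invokeRestart("muffleWarning")   # and carry on without printing it
  }
)
warned
is.na(answer_code[2])

# With a character column the same assignment simply stores "1".
answer_code_chr <- c("", "")
answer_code_chr[2] <- 1L
answer_code_chr
```

Keeping the intermediate columns as character vectors avoids the NA values that appeared in the muddled output.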