展开每个列单元格的列单元格
我有3组不同的数据(不同的列)
- A栏动物(5种)
- B栏的水果(1000种)
- C栏中的国家(10种)
有了这3个数据集合,我想得到5×1000×10共计50k对应的元素在列。 EFG(与每个水果和每个国家对应的每个动物)。
可以通过手动复制和粘贴值来完成,但这需要很长时间。 有什么办法通过VBA代码或自动化它吗?
有没有像上面介绍的无限数据集的通用公式? 请让我知道,如果有什么不清楚。
这里是一个较小的数据例子,结果应该如何变化:
我的第一个解决这个问题的方法与@Jeeped发布的类似:
- 加载input列以对每列中的行进行数组和计数
- 用所有组合填充数组
- 将数组分配给输出范围
使用MicroTimer我已经计算了上述algorithm的每个部分所花费的平均时间。 第3部分占总执行时间的90%-93%用于更大的input数据。
以下是我尝试提高写入数据到工作表的速度。 我定义了一个常数iMinRSize=17
。 一旦可以使用相同的值填充多个iMinRSize
连续行,代码就会停止填充数组并直接写入工作表范围。
Sub CrossJoin(rSrc As Range, rTrg As Range) Dim vSrc() As Variant, vTrgPart() As Variant Dim iLengths() As Long Dim iCCnt As Integer, iRTrgCnt As Long, iRSrcCnt As Long Dim i As Integer, j As Long, k As Long, l As Long Dim iStep As Long Const iMinRSize As Long = 17 Dim iArrLastC As Integer On Error GoTo CleanUp Application.ScreenUpdating = False Application.EnableEvents = False vSrc = rSrc.Value2 iCCnt = UBound(vSrc, 2) iRSrcCnt = UBound(vSrc, 1) iRTrgCnt = 1 iArrLastC = 1 ReDim iLengths(1 To iCCnt) For i = 1 To iCCnt j = iRSrcCnt While (j > 0) And IsEmpty(vSrc(j, i)) j = j - 1 Wend iLengths(i) = j iRTrgCnt = iRTrgCnt * iLengths(i) If (iRTrgCnt < iMinRSize) And (iArrLastC < iCCnt) Then iArrLastC = iArrLastC + 1 Next i If (iRTrgCnt > 0) And (rTrg.row + iRTrgCnt - 1 <= rTrg.Parent.Rows.Count) Then ReDim vTrgPart(1 To iRTrgCnt, 1 To iArrLastC) iStep = 1 For i = 1 To iArrLastC k = 0 For j = 1 To iRTrgCnt Step iStep k = k + 1 If k > iLengths(i) Then k = 1 For l = j To j + iStep - 1 vTrgPart(l, i) = vSrc(k, i) Next l Next j iStep = iStep * iLengths(i) Next i rTrg.Resize(iRTrgCnt, iArrLastC) = vTrgPart For i = iArrLastC + 1 To iCCnt k = 0 For j = 1 To iRTrgCnt Step iStep k = k + 1 If k > iLengths(i) Then k = 1 rTrg.Resize(iStep).Offset(j - 1, i - 1).Value2 = vSrc(k, i) Next j iStep = iStep * iLengths(i) Next i End If CleanUp: Application.ScreenUpdating = True Application.EnableEvents = False End Sub Sub test() CrossJoin Range("a2:f10"), Range("k2") End Sub
如果我们将iMinRSize
设置为iMinRSize
,则所有数据都写入数组。 以下是我的样本testing结果:
如果input列的行数最高,那么代码的效果最好,但修改代码以按正确顺序对列和进程进行排名并不是一个大问题。
我通过普遍收集,你希望这容纳任何数量的列和任何数量的条目在每个。 一些变体数组应该提供必要的尺寸来计算每个值的重复周期。
Option Explicit Sub main() Call for_each_in_others(rDATA:=Worksheets("Sheet3").Range("A3"), bHDR:=True) End Sub Sub for_each_in_others(rDATA As Range, Optional bHDR As Boolean = False) Dim v As Long, w As Long Dim iINCROWS As Long, iMAXROWS As Long, sErrorRng As String Dim vVALs As Variant, vTMPs As Variant, vCOLs As Variant On Error GoTo bm_Safe_Exit appTGGL bTGGL:=False With rDATA.Parent With rDATA(1).CurrentRegion 'Debug.Print rDATA(1).Row - .Cells(1).Row With .Resize(.Rows.Count - (rDATA(1).Row - .Cells(1).Row), .Columns.Count).Offset(2, 0) sErrorRng = .Address(0, 0) vTMPs = .Value2 ReDim vCOLs(LBound(vTMPs, 2) To UBound(vTMPs, 2)) iMAXROWS = 1 'On Error GoTo bm_Output_Exceeded For w = LBound(vTMPs, 2) To UBound(vTMPs, 2) vCOLs(w) = Application.CountA(.Columns(w)) iMAXROWS = iMAXROWS * vCOLs(w) Next w 'control excessive or no rows of output If iMAXROWS > Rows.Count Then GoTo bm_Output_Exceeded ElseIf .Columns.Count = 1 Or iMAXROWS = 0 Then GoTo bm_Nothing_To_Do End If On Error GoTo bm_Safe_Exit ReDim vVALs(LBound(vTMPs, 1) To iMAXROWS, LBound(vTMPs, 2) To UBound(vTMPs, 2)) iINCROWS = 1 For w = LBound(vVALs, 2) To UBound(vVALs, 2) iINCROWS = iINCROWS * vCOLs(w) For v = LBound(vVALs, 1) To UBound(vVALs, 1) vVALs(v, w) = vTMPs((Int(iINCROWS * ((v - 1) / UBound(vVALs, 1))) Mod vCOLs(w)) + 1, w) Next v Next w End With End With .Cells(2, UBound(vVALs, 2) + 2).Resize(1, UBound(vVALs, 2) + 2).EntireColumn.Delete If bHDR Then rDATA.Cells(1, 1).Offset(-1, 0).Resize(1, UBound(vVALs, 2)).Copy _ Destination:=rDATA.Cells(1, UBound(vVALs, 2) + 2).Offset(-1, 0) End If rDATA.Cells(1, UBound(vVALs, 2) + 2).Resize(UBound(vVALs, 1), UBound(vVALs, 2)) = vVALs End With GoTo bm_Safe_Exit bm_Nothing_To_Do: MsgBox "There is not enough data in " & sErrorRng & " to perform expansion." & Chr(10) & _ "This could be due to a single column of values or one or more blank column(s) of values." & _ Chr(10) & Chr(10) & "There is nothing to expand.", vbInformation, _ "Single or No Column of Raw Data" GoTo bm_Safe_Exit bm_Output_Exceeded: MsgBox "The number of expanded values created from " & sErrorRng & _ " (" & Format(iMAXROWS, "\> #, ##0") & " rows × " & UBound(vTMPs, 2) & _ " columns) exceeds the rows available (" & Format(Rows.Count, "#, ##0") & ") on this worksheet.", vbCritical, _ "Too Many Entries" bm_Safe_Exit: appTGGL End Sub Sub appTGGL(Optional bTGGL As Boolean = True) Application.EnableEvents = bTGGL Application.ScreenUpdating = bTGGL End Sub
将列标题标签放在第2行开始列A和直接下面的数据。
我已经添加了一些错误控制来警告超过工作表上的行数。 通常情况下这不是一个可以考虑的事情,但是将不确定数量的列中的值相乘可以快速产生大量的结果。 不可预见的是你会超过1,048,576行。
经典的非连接selectSQL语句示例,返回列表中所有组合结果的笛卡尔乘积。
SQL数据库解决scheme
只需将动物,水果,国家作为单独的表导入到任何SQL数据库,如MS Access,SQLite,MySQL等,并列出没有包括隐式( WHERE
)和显式( JOIN
)连接的连接:
SELECT Animals.Animal, Fruits.Fruit, Countries.Country FROM Animals, Countries, Fruits;
Excel解决scheme
在VBA中使用包含动物,国家和水果范围的工作簿的ODBC连接运行非连接SQL语句的概念是相同的。 在示例中,每个数据分组都在其自己的同名工作表中。
Sub CrossJoinQuery() Dim conn As Object Dim rst As Object Dim sConn As String, strSQL As String Set conn = CreateObject("ADODB.Connection") Set rst = CreateObject("ADODB.Recordset") sConn = "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _ & "DBQ=C:\Path To\Excel\Workbook.xlsx;" conn.Open sConn strSQL = "SELECT * FROM [Animals$A1:A3], [Fruits$A1:A3], [Countries$A1:A3] " rst.Open strSQL, conn Range("A1").CopyFromRecordset rst rst.Close conn.Close Set rst = Nothing Set conn = Nothing End Sub
您可以使用工作表公式来做到这一点。 如果你有NAME的范围 – 动物,水果和国家,“诀窍”是生成索引进入该arrays提供所有的各种组合。
例如:
=CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)
将生成一个基于1的数字序列,重复水果*国家的数字条目 – 这给你每个动物需要多less行。
=MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1
将产生一个基于1的系列,重复每个水果的国家数量。
=MOD(ROWS($1:1)-1,ROWS(Countries))+1))
生成1..n的重复序列,其中n是国家数量。
把这些放入公式(有一些错误检查)
D3: =IFERROR(INDEX(Animals,CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)),"") E3: =IF(E3="","",INDEX(Fruits,MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1)) F3: =IF(E3="","",INDEX(Countries,MOD(ROWS($1:1)-1,ROWS(Countries))+1))
其实,我想修改我的旧答案。 但是,我的新答案完全不同于旧的答案。 因为,旧的答案是针对特定的栏目,而这个针对的是通用栏目。 提问者回答旧的回答后,提出了他希望通用的新要求。 对于固定列,我们可以认为固定循环和无限列,我们需要从另一个angular度思考。 所以,我也这样做。 而且用户也可以看到代码的差异,我想这对初学者会有帮助。
这个新代码并不像以前那样简单。 如果你想知道关于代码的清楚,我build议逐行debugging代码。
不要担心代码。 我已经一步一步地testing过了。 它完全适合我。 如果不适合你,请告诉我。 有一件事是,这段代码可能会导致空行(它没有数据)的错误。 因为,目前,我没有添加检查。
这是我的问题的普遍方法:
Public Sub matchingCell() Dim startRawColumn, endRawColumn, startResultColumn, endResultColumn, startRow As Integer Dim index, row, column, containerIndex, tempIndex As Integer Dim columnCount, totalCount, timesCount, matchingCount, tempCount As Integer Dim isExist As Boolean Dim arrayContainer() As Variant 'Actually, even it is for universal, we need to know start column and end column of raw data. 'And also start row. And start column for write result. 'I set them for my test data. 'You need to modify them(startRawColumn, endRawColumn, startRow, startResultColumn). 'Set the start column and end column for raw data startRawColumn = 1 endRawColumn = 3 'Set the start row for read data and write data startRow = 2 'Set the start column for result data startResultColumn = 4 'Get no of raw data column columnCount = endRawColumn - startRawColumn 'Set container index containerIndex = 0 'Re-create array container for count of column ReDim arrayContainer(0 To columnCount) With Sheets("sheetname") 'Getting data from sheet 'Loop all column for getting data of each column For column = startRawColumn To endRawColumn Step 1 'Create tempArray for column Dim tempArray() As Variant 'Reset startRow row = startRow 'Reset index index = 0 'Here is one things. I looped until to blank. 'If you want anymore, you can modify the looping type. 'Don't do any changes to main body of looping. 'Loop until the cell is blank Do While .Cells(row, column) <> "" 'Reset isExist flag isExist = False 'Remove checking for no data If index > 0 Then 'Loop previous data for duplicate checking For tempIndex = 0 To index - 1 Step 1 'If found, set true to isExist and stop loop If tempArray(tempIndex) = .Cells(row, column) Then isExist = True Exit For End If Next tempIndex End If 'If there is no duplicate data, store data If Not isExist Then 'Reset tempArray ReDim Preserve tempArray(index) tempArray(index) = .Cells(row, column) 'Increase index index = index + 1 End If 'Increase row row = row + 1 Loop 'Store column with data arrayContainer(containerIndex) = tempArray 'Increase container index containerIndex = containerIndex + 1 Next column 'Now, we got all data column including data which has no duplicate 'Show result data on sheet 'Getting the result row count totalCount = 1 'Get result row count For tempIndex = 0 To UBound(arrayContainer) Step 1 totalCount = totalCount * (UBound(arrayContainer(tempIndex)) + 1) Next tempIndex 'Reset timesCount timesCount = 1 'Get the last column for result endResultColumn = startResultColumn + columnCount 'Loop array container For containerIndex = UBound(arrayContainer) To 0 Step -1 'Getting the counts for looping If containerIndex = UBound(arrayContainer) Then duplicateCount = 1 timesCount = totalCount / (UBound(arrayContainer(containerIndex)) + 1) Else duplicateCount = duplicateCount * (UBound(arrayContainer(containerIndex + 1)) + 1) timesCount = timesCount / (UBound(arrayContainer(containerIndex)) + 1) End If 'Reset the start row row = startRow 'Loop timesCount For countIndex = 1 To timesCount Step 1 'Loop data array For index = 0 To UBound(arrayContainer(containerIndex)) Step 1 'Loop duplicateCount For tempIndex = 1 To duplicateCount Step 1 'Write data to cell .Cells(row, endResultColumn) = arrayContainer(containerIndex)(index) 'Increase row row = row + 1 Next tempIndex Next index Next countIndex 'Increase result column index endResultColumn = endResultColumn - 1 Next containerIndex End With End Sub
好吧,所以你只需要一个所有可能的组合的清单。 这是我会做的:
- 首先select原始数据并逐列删除重复项。
- 然后将这3列读入3个独立的数组。
- 计算所有数组的总长度。
- 然后循环粘贴国家数组的第一个值的次数,因为有动物和水果的组合,所以这些数组的长度相乘。
- 在循环内创build另一个循环,发布水果的所有选项。 有许多重复的行数等于动物的最大数量。
- 然后将动物粘贴在一起,直到表格的最后一行。
在这里,我的方法为你的问题。
Public Sub matchingCell() Dim animalRow, fruitRow, countryRow, checkRow, resultRow As Long Dim isExist As Boolean 'Set the start row animalRow = 2 resultRow = 2 'Work with data sheet With Sheets("sheetname") 'Loop until animals column is blank Do While .Range("A" & animalRow) <> "" 'Set the start row fruitRow = 2 'Loop until fruits column is blank Do While .Range("B" & fruitRow) <> "" 'Set the start row countryRow = 2 'Loop until country column is blank Do While .Range("C" & countryRow) <> "" 'Set the start row checkRow = 2 'Reset flag isExist = False 'Checking for duplicate row 'Loop all result row until D is blank Do While .Range("D" & checkRow) <> "" 'If duplicate row found If .Range("D" & checkRow) = .Range("A" & animalRow) And _ .Range("E" & checkRow) = .Range("B" & fruitRow) And _ .Range("F" & checkRow) = .Range("C" & countryRow) Then 'Set true for exist flag isExist = True End If checkRow = checkRow + 1 Loop 'If duplicate row not found If Not isExist Then .Range("D" & resultRow) = .Range("A" & animalRow) .Range("E" & resultRow) = .Range("B" & fruitRow) .Range("F" & resultRow) = .Range("C" & countryRow) 'Increase resultRow resultRow = resultRow + 1 End If 'Increase countryRow countryRow = countryRow + 1 Loop 'Increase fruitRow fruitRow = fruitRow + 1 Loop 'Increase fruitRow animalRow = animalRow + 1 Loop End With End Sub
我已经testing过了。 它运作良好。 祝你今天愉快。
这是一个recursion的版本。 它假定数据不包含任何内部选项卡,因为核心函数返回制表符分隔的产品string 。 主子需要传递一个由数据和输出范围的左上angular单元组成的范围。 这可能会稍微调整一下,但足以用于testing目的。
ColumnProducts Range("A:C"), Range("E1")
是解决OP问题的呼叫。 这里是代码:
'the following function takes a collection of arrays of strings 'and returns a variant array of tab-delimited strings which 'comprise the (tab-delimited) cartesian products of 'the arrays in the collection Function CartesianProduct(ByVal Arrays As Collection) As Variant Dim i As Long, j As Long, k As Long, m As Long, n As Long Dim head As Variant Dim tail As Variant Dim product As Variant If Arrays.Count = 1 Then CartesianProduct = Arrays.Item(1) Exit Function Else head = Arrays.Item(1) Arrays.Remove 1 tail = CartesianProduct(Arrays) m = UBound(head) n = UBound(tail) ReDim product(1 To m * n) k = 1 For i = 1 To m For j = 1 To n product(k) = head(i) & vbTab & tail(j) k = k + 1 Next j Next i CartesianProduct = product End If End Function Sub ColumnProducts(data As Range, output As Range) Dim Arrays As New Collection Dim strings As Variant, product As Variant Dim i As Long, j As Long, n As Long, numRows As Long Dim col As Range, cell As Range Dim outRange As Range numRows = Range("A:A").Rows.Count For Each col In data.Columns n = col.EntireColumn.Cells(numRows).End(xlUp).Row i = col.Cells(1).Row ReDim strings(1 To n - i + 1) For j = 1 To n - i + 1 strings(j) = col.Cells(i + j - 1) Next j Arrays.Add strings Next col product = CartesianProduct(Arrays) n = UBound(product) Set outRange = Range(output, output.Offset(n - 1)) outRange.Value = Application.WorksheetFunction.Transpose(product) outRange.TextToColumns Destination:=output, DataType:=xlDelimited, Tab:=True End Sub