Large array with a user-defined number of dimensions

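I recently posted a question asking for help with counting the number of occurrences of each unique pair of allergies within a population. The solution I got was great, but I now need to look at combinations of 3 or more allergies, and doing all of that with Excel tables would take forever.

I decided to write a VBA script to do this, and it works great for pairs. I have also gone back and changed the format of the source data, so that the AllergenIDs associated with each ExceptionID are now stored in a single comma-delimited string.

I am now considering moving to a 3D or higher-dimensional array. Because we don't know how many dimensions we might need (possibly 10 or 15), I would rather avoid a series of Case or nested If/Then statements.

My research turned up this post, from which I gathered that what I'm asking for is essentially impossible, but I wanted to ask about this statement by its OP:

"I was thinking that it could be done if I could build the ReDim statement as a string at runtime and execute the string, but that doesn't seem to be possible."

I had basically the same idea. The code below raises a Type Mismatch error, but is there some variation of it that might work? Can other functions not be passed inside ReDim?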

    Sub testroutine()
        Dim x As Integer, y As Integer 'just a counter
        Dim PairCount() As String
        Dim AllergenRef As Object 'Object to store a reference to each AllergenID using AllergenKey as key
        Set AllergenRef = CreateObject("Scripting.Dictionary")

        For x = 1 To 20
            AllergenRef.Add x, (x * 10) + (2 ^ x) 'dummy data for my dictionary
        Next x

        Dim N_tuple As Integer
        N_tuple = 5 'this value would be provided by a user form at runtime

        Dim ArrayDim() As String
        ReDim ArrayDim(1 To N_tuple)
        For x = 1 To N_tuple
            ArrayDim(x) = "1 to " & AllergenRef.Count
        Next x

        ReDim PairCount(Join(ArrayDim, ",")) 'This is the line that throws an error
    End Sub

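This post makes it sound as though what I'm doing may be possible in Java, but I don't speak any Javanese, so I can't tell how similar it is to what I'm trying to achieve, or whether there is a way to apply that method to VBA…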

======== UPDATE ============
Here is a sample of the data I'm working with (the values are in separate columns; I added the dashes for clarity):

ExceptionID – ExcAllergens
035 – 100380
076 – 100107,100392,100345,100596,100141,100151,100344
200 – 100123,100200
325 – 100381
354 – 100381,100123
355 – 100381,100123
360 – 100586
390 – 100151,100344,100345,100349
441 – 100380,100368
448 – 100021,100181,100345,100200,100344,100295
491 – 100381
499 – 100333
503 – 100333
507 – 100331,100346,100596,100345,100344,100269,100283

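Here is an excerpt from the allergen definition table (the AllergenKey is something I just added so that I would have smaller numbers to work with; the 6-digit numbers are what our database uses).

AllergenKey – Allergen – Allergen Tag
01 – 100011 – Açai Berry
02 – 100012 – Acetic Acid
03 – 100013 – Agar Agar
04 – 100014 – Agave
05 – 100015 – Alcohol
06 – 100016 – Allspice
07 – 100017 – Ammonium Bicarbonate
08 – 100018 – Amylase
09 – 100019 – Annatto
10 – 100020 – Apple
11 – 100021 – Apple, Raw
12 – 100022 – Apricot
13 – 100023 – Arrowroot
14 – 100025 – Ascorbic Acid
15 – 100027 – Asparagus
16 – 100028 – Avocado
17 – 100029 – Bacterial Culture
18 – 100030 – Baking Powder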

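Note that there are 6,810 exception profiles, each with anywhere from 1 to 51 separate allergies (about 4 or 5 on average), and 451 different allergens. Here are the results of my analysis of allergen pairs (by the way, when I say "allergen", that also includes dietary preferences such as vegetarian):

Top 10 pairs – Pair count – Allergen 1 – Allergen 2
1 – 245 – Dairy – Gluten
2 – 232 – Eggs – Nuts
3 – 190 – Dairy – Eggs
4 – 173 – Gluten – Oats
5 – 146 – Soy (May Contain) – Soy
6 – 141 – Dairy – Nuts
7 – 136 – Beef – Pork
8 – 120 – Dairy – Soy
9 – 114 – Sesame (May Contain) – Nuts
10 – 111 – Vegetarian 1 – Pork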

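I wouldn't worry about the maximum possible number of combinations with your moderately sized dataset. You won't be able to generate every possible combination anyway, and a great many of them will never occur in the sample population. Don't try to enumerate them all up front and then count how often each one occurs.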
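To put rough numbers on that, here is a minimal sketch that compares the number of cells in a full N-dimensional counting array with the number of distinct unordered N-tuples. It uses the 451 allergens quoted in the question. The procedure name `ShowCombinationCounts` is made up for illustration, and the code assumes it runs inside Excel so that `WorksheetFunction.Combin` is available.

```vba
Option Explicit

' Hypothetical helper (not from the original post): prints, for each tuple size,
' how many cells a full N-dimensional array would need versus how many
' distinct unordered combinations of allergens exist.
Public Sub ShowCombinationCounts()
    Const ALLERGEN_COUNT As Long = 451   ' figure quoted in the question
    Dim n As Long

    For n = 2 To 5
        Debug.Print n & "-tuples: " & _
            Format(ALLERGEN_COUNT ^ n, "#,##0") & " array cells vs " & _
            Format(Application.WorksheetFunction.Combin(ALLERGEN_COUNT, n), "#,##0") & _
            " distinct combinations"
    Next n
End Sub
```

For triples alone, 451^3 is 91,733,851 cells and there are 15,187,425 distinct unordered triples. The 6,810 profiles, averaging 4 or 5 allergens each, can only ever touch a small fraction of those.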

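Instead, walk through your sample population and build the tuples of data items into a worksheet "array". I suggest using the 3-digit allergen keys as the identifying numbers and combining the numbers of each tuple into a single Long (a Decimal may be needed for larger numbers).

The approach I suggest is to combine each tuple into one long number that can easily be decomposed again later, and then to use the FREQUENCY function to count the occurrences of each tuple "number". So if we have allergens with keys 1, 17 and 451, they form the number 1,017,451 (and 451, 17 and 1 give the same number), because we force every tuple into smallest-key-to-largest-key order. The largest possible triple is therefore 449,450,451 and the smallest is 1,002,003. Note that you can never get 3,002,001, because that would duplicate 1,002,003.

The module I have been playing with is below. Edit: better code.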
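As an illustration of the "easily decomposed later" part, here is a minimal sketch of how such a code number could be split back into its 3-digit keys. The function name `SplitTupleCode` and its parameters are hypothetical. It assumes the code was built smallest key first, with exactly three digits per key, as described above.

```vba
Option Explicit

' Hypothetical helper (not from the original post): splits a code number built
' from 3-digit keys back into an array of keys, in smallest-to-largest order.
Public Function SplitTupleCode(ByVal code As Variant, ByVal tupleSize As Long) As Variant
    Dim keys() As Long
    Dim digits As String
    Dim i As Long

    ReDim keys(1 To tupleSize)

    ' Pad with leading zeros so that every key occupies exactly three digits.
    digits = Format(code, String(3 * tupleSize, "0"))

    For i = 1 To tupleSize
        keys(i) = CLng(Mid$(digits, 3 * (i - 1) + 1, 3))
    Next i

    SplitTupleCode = keys
End Function
```

For example, `SplitTupleCode(1017451, 3)` returns the keys 1, 17 and 451. Keep in mind that a VBA Long only holds values up to 2,147,483,647, so codes for tuples of four or more keys need a Double or a Decimal (via `CDec`) rather than a Long.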

    Option Explicit
    Option Base 1

    Public Function concID(paramArr() As Variant) As Variant
        ' this function takes an array of numbers and arranges the array into
        ' one long code number - with order of smallest to largest
        ' the code number generated has each individual array entry as a 3-digit component
        Dim wsf As WorksheetFunction
        Dim decExp As Integer
        Dim i As Long, j As Long
        Dim bigNum As Variant ' may need to cast to Decimal??

        Set wsf = WorksheetFunction
        'may use cDec if necessary here??
        For i = 1 To UBound(paramArr)
            'determine the position of the component by multiplying by a multiple of 10^3
            decExp = 3 * (UBound(paramArr) - i)
            bigNum = bigNum + wsf.Small(paramArr, i) * 10 ^ decExp
        Next i
        concID = bigNum
    End Function

    Public Sub runAllergen()
        Dim ws As Worksheet
        Dim dataRange As Range, tupleRange As Range, uniqueList As Range, freqRange As Range, r As Range
        Dim i As Long, j As Long, counter As Long
        Dim dataArray As Variant, arr As Variant, tempholder As Long
        Dim bigArray(1 To 10 ^ 6, 1 To 1) As Variant ' the array which will hold all the generated combinations from the data
        Dim tuple As Long

        tuple = 3 'this will come in as a user input.
        Set ws = Sheet1
        Set dataRange = ws.Range("A2:A10001") 'I have 10k people in my dataset, and this is just the allergen data vector

        Application.ScreenUpdating = False 'IMPORTANT for efficiency
        tempholder = 1 'this is the array index which the next combi entry is to be put into bigArray
        dataArray = dataRange.Value 'write entire worksheet column to internal array for efficiency

        For i = 1 To UBound(dataArray)
            'obtain array of allergen values in each data row to obtain tuples from
            arr = Split(dataArray(i, 1), ",")
            If UBound(arr) + 1 >= tuple Then
                'give over the array of row data to make tuples from and write to bigArray
                'return the next available index of bigArray to store data
                tempholder = printCombinations(arr, tuple, bigArray(), tempholder)
            End If
        Next i

        Set r = ws.Range("B2")
        'write entire list of tuples from data population to worksheet for efficiency - MASSIVE performance boost
        r.Resize(tempholder - 1, 1).Value = bigArray

        'copy tuple output over to another column to remove duplicates and get unique list
        Set tupleRange = ws.Range(r, r.End(xlDown))
        tupleRange.Copy
        Set r = ws.Range("D2")
        r.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks:=False, Transpose:=False

        'remove duplicates from copied tuple output to get a unique list of codes to serve as bins in FREQUENCY function
        ws.Range(r, r.End(xlDown)).RemoveDuplicates Columns:=1, Header:=xlNo
        Set uniqueList = ws.Range(r, r.End(xlDown))
        Application.CutCopyMode = False

        'set the frequency output range which is always 1 more row than the bins array
        Set freqRange = uniqueList.Offset(0, 1).Resize(uniqueList.Rows.Count + 1, 1)

        'get the frequency of each tuple
        freqRange.FormulaArray = "=FREQUENCY(R2C" & tupleRange.Column & ":R" & tupleRange.Rows.Count + 1 & _
                                 "C" & tupleRange.Column & _
                                 ",R2C" & uniqueList.Column & ":R" & uniqueList.Rows.Count + 1 & "C" & uniqueList.Column & ")"

        Application.ScreenUpdating = True
    End Sub

    Public Function printCombinations(pool As Variant, r As Long, printVector As Variant, _
                                      tempPosition As Long) As Long
        'this function writes the data row arrays as tuples/combis to the bigArray,
        'and returns the next available index in bigArray
        Dim i As Long, j As Long, n As Long
        Dim tempholder() As Variant
        Dim idx() As Long

        ReDim tempholder(1 To r)
        ReDim idx(1 To r)
        n = UBound(pool) - LBound(pool) + 1

        For i = 1 To r
            idx(i) = i
        Next i

        Do
            For j = 1 To r
                tempholder(j) = CLng(pool(idx(j) - 1))
            Next j
            'we now have an array of size tuple from the row data, so construct our code number,
            'and write to the next available index in bigArray
            printVector(tempPosition, 1) = concID(tempholder)
            tempPosition = tempPosition + 1

            ' Locate last non-max index
            i = r
            While (idx(i) = n - r + i)
                i = i - 1
                If i = 0 Then
                    'the algorithm has ended with the last index exhausted
                    'return the next available index of bigArray
                    printCombinations = tempPosition
                    Exit Function
                End If
            Wend

            idx(i) = idx(i) + 1
            For j = i + 1 To r
                idx(j) = idx(i) + j - i
            Next j
        Loop
    End Function

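Initial setup (the original screenshot is not available): as the code above assumes, the comma-delimited allergen data sits in column A of Sheet1, starting at A2.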

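You can also copy your frequency range and paste it back as values, and so on.

To expand on my comment, here is some modified code that uses an array of arrays sized from the supplied N_tuple variable. I have a hard time imagining a scenario in which this wouldn't work for you: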

    Sub testroutine()
        Dim x As Integer, y As Integer 'just a counter
        Dim ArrayTemp() As Variant
        Dim PairCount() As Variant
        Dim AllergenRef As Object 'Object to store a reference to each AllergenID using AllergenKey as key
        Set AllergenRef = CreateObject("Scripting.Dictionary")

        For x = 1 To 20
            AllergenRef.Add x, (x * 10) + (2 ^ x) 'dummy data for my dictionary
        Next x

        Dim N_tuple As Integer
        N_tuple = 5 'this value would be provided by a user form at runtime

        'Now that you have your N_tuple, redim your paircount array
        ReDim PairCount(1 To N_tuple)

        'For each N_tuple, create an array and add it to the PairCount array
        'Note that you could easily have a 2-dimensional array for a table of values as ArrayTemp
        For x = 1 To N_tuple
            ReDim ArrayTemp(1 To AllergenRef.Count)
            PairCount(x) = ArrayTemp
        Next x

        'Now you have an array of arrays, which can be easily accessed.
        'For example: PairCount(2)(3)
        'Or if the subarrays are 2-dimensional: PairCount(4)(6, 12)

        'This simply loops through the PairCount array and shows the ubound of its subarrays
        For x = 1 To UBound(PairCount)
            MsgBox UBound(PairCount(x))
        Next x
    End Sub