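Interquartile range: lower, median and upper

I am trying to calculate the interquartile range from a series of numbers that can be of any length, e.g.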

1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9 

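What I need to work out from this interquartile range is:

  • Upper quartile
  • Median
  • Lower quartile

If I put the above array of numbers into Microsoft Excel (columns A:M), I can use the following formulas: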

  • =QUARTILE.INC(A1:M1,1)
  • =QUARTILE.INC(A1:M1,2)
  • =QUARTILE.INC(A1:M1,3)

to get my answers:

  • 4
  • 7
  • 9

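I now need to work out these three values in either SQL Server or VB.NET. I can get the array values in any format or object in either of these languages, but I cannot find any function like the QUARTILE.INC function that Excel has.

Does anyone know how this could be achieved in either SQL Server or VB.NET?

There may be an easier way, but to get quartiles you can use NTILE (Transact-SQL):

"Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs."

So for your data: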

    SELECT 1 Val
    INTO #temp
    UNION ALL SELECT 1
    UNION ALL SELECT 5
    UNION ALL SELECT 6
    UNION ALL SELECT 7
    UNION ALL SELECT 8
    UNION ALL SELECT 2
    UNION ALL SELECT 4
    UNION ALL SELECT 7
    UNION ALL SELECT 9
    UNION ALL SELECT 9
    UNION ALL SELECT 9
    UNION ALL SELECT 9

    -- NTILE(4) specifies you require 4 partitions (quartiles)
    SELECT NTILE(4) OVER ( ORDER BY Val ) AS Quartile ,
           Val
    INTO #tempQuartiles
    FROM #temp

    SELECT *
    FROM #tempQuartiles

    DROP TABLE #temp
    DROP TABLE #tempQuartiles

This produces:

    Quartile  Val
    1         1
    1         1
    1         2
    1         4
    2         5
    2         6
    2         7
    3         7
    3         8
    3         9
    4         9
    4         9
    4         9

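From here you can work out the values you are after.

So changing the final SELECT (before the temp tables are dropped) to the following does it: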

    SELECT Quartile, MAX(Val) MaxVal
    FROM #tempQuartiles
    WHERE Quartile <= 3
    GROUP BY Quartile

Producing:

    Quartile  MaxVal
    1         4
    2         7
    3         9
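The grouping that NTILE performs can be checked outside SQL Server. The following is only a rough Python sketch of the same idea (the helper names `ntile` and `max_per_group` are made up for illustration): it splits the sorted values into groups the way NTILE is documented to, with larger groups first, then takes the maximum of the first three groups.

```python
def ntile(sorted_values, groups):
    """Mimic NTILE: number the groups from 1; when the rows do not divide
    evenly, the earlier groups get one extra row each."""
    base, extra = divmod(len(sorted_values), groups)
    rows, start = [], 0
    for group in range(1, groups + 1):
        size = base + (1 if group <= extra else 0)
        rows.extend((group, v) for v in sorted_values[start:start + size])
        start += size
    return rows


def max_per_group(rows, keep):
    """Largest value in each group numbered 1..keep (the MAX ... GROUP BY step)."""
    best = {}
    for group, value in rows:
        if group <= keep:
            best[group] = max(best.get(group, value), value)
    return [best[g] for g in sorted(best)]


data = [1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9]
rows = ntile(sorted(data), 4)
print(max_per_group(rows, 3))  # [4, 7, 9]
```

Note that this picks existing values and never interpolates, so it will not always agree with QUARTILE.INC. For the list 1, 2, 3, 4 the same steps give 1, 2, 3, whereas inclusive interpolation gives 1.75, 2.5 and 3.25.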

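We created a user-defined table type so that the data can be passed as a function parameter, and then used it as follows.

Our implementation uses the same calculation as the Excel PERCENTILE function.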

    CREATE TYPE [dbo].[floatListType] AS TABLE (
        [value] FLOAT NOT NULL
    );
    GO

    CREATE FUNCTION [dbo].[getPercentile]
    (
        @data floatListType readonly,
        @percentile float
    )
    RETURNS float
    AS
    BEGIN
        -- copy the input with a zero-based index in ascending order
        declare @values table
        (
            value float,
            idx int
        );

        insert into @values
        select value, ROW_NUMBER() OVER (order by value) - 1 as idx
        from @data;

        declare @cnt int = (select count(*) from @values)
            , @n float = (@cnt - 1) * @percentile + 1   -- one-based fractional position
            , @k int = FLOOR(@n)                        -- whole part of the position
            , @d float = @n - @k;                       -- fractional part used to interpolate

        if (@k = 0)
            return (select value from @values where idx = 0)

        if (@k = @cnt)
            return (select value from @values where idx = @cnt - 1)

        if (@k > 0 AND @k < @cnt)
            return (select value from @values where idx = @k - 1)
                + @d * ((select value from @values where idx = @k)
                - (select value from @values where idx = @k - 1))

        return null;
    END

You can use it like this to get the median and the quartiles (Q1 is simply the 0.25 percentile), for example:

    declare @values floatListType;

    insert into @values
    select value from #mytable

    -- scalar functions have to be called with at least a two-part name
    select
        dbo.getPercentile(@values, 0.25) as Q1,
        dbo.getPercentile(@values, 0.5) as median,
        dbo.getPercentile(@values, 0.75) as Q3
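For anyone who wants to sanity-check the interpolation before deploying the function, here is a minimal Python sketch of the same arithmetic. The name `percentile_inc` is just an illustrative choice, and it assumes a non-empty list and 0 <= p <= 1.

```python
import math


def percentile_inc(values, p):
    """Interpolate linearly at the one-based position (count - 1) * p + 1."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    n = (len(ordered) - 1) * p + 1   # one-based fractional position
    k = math.floor(n)                # whole part
    d = n - k                        # fractional part
    if k >= len(ordered):            # p = 1 lands on the last element
        return float(ordered[-1])
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])


data = [1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9]
quartiles = [percentile_inc(data, p) for p in (0.25, 0.5, 0.75)]
print(quartiles)  # [4.0, 7.0, 9.0]
```

For the sample data in the question this gives the same 4, 7 and 9 that Excel returns.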

Apologies if I have misunderstood you, but this can be done using NTILE() followed by ROW_NUMBER().

SQL code:

    ;WITH FirstStep (NT, N) AS (
        SELECT NTILE(3) OVER (ORDER BY T.column1), T.column1
        FROM dbo.GetTableFromList_Int('1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9', ',') AS T
    ), SecondStep (RN, NT, N) AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY T.NT ORDER BY T.N DESC), T.NT, T.N
        FROM FirstStep AS T
    )
    SELECT N
    FROM SecondStep
    WHERE RN = 1

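Explanation:

  • The dbo.GetTableFromList_Int() TVF splits my string into rows (DISTINCT)
  • We use NTILE(3) to divide the rows into three groups, ordered by the list values (IIRC, you need to order your list to get the correct values)
  • ROW_NUMBER() is then used to pick the correct value (the largest) in each group.

In your case it returns the expected result.

If this isn't what you need, it can be modified to get the correct output.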
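The DISTINCT step appears to matter for this particular data set. The Python sketch below is only an approximation of the query: it assumes the TVF returns each value once, as the explanation says (its implementation is not shown here), and `top_of_each_group` is a hypothetical helper name.

```python
def top_of_each_group(values, groups):
    """Sort, split into NTILE-style groups (larger groups first),
    and return the largest value of each non-empty group."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), groups)
    tops, start = [], 0
    for group in range(groups):
        size = base + (1 if group < extra else 0)
        if size:
            tops.append(ordered[start + size - 1])
        start += size
    return tops


data = [1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9]
with_distinct = top_of_each_group(set(data), 3)
without_distinct = top_of_each_group(data, 3)
print(with_distinct)     # [4, 7, 9]
print(without_distinct)  # [5, 8, 9]
```

So the match with Excel's 4, 7 and 9 depends on the duplicates being removed first. With all 13 rows the three groups end at 5, 8 and 9 instead.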

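If you want a SQL Server solution, a few years ago I posted an Interquartile Range procedure on my blog. It is based on dynamic SQL, so you can plug in any column you have access to. It is not well tested, I was still learning at the time, and the code is a bit old by now, but it may meet your needs as it stands or at least give you a starting point for your own solution. Here is the gist of the code; follow the link to my blog for an in-depth discussion.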

    CREATE PROCEDURE [Calculations].[InterquartileRangeSP]
        @DatabaseName as nvarchar(128) = NULL,
        @SchemaName as nvarchar(128),
        @TableName as nvarchar(128),
        @ColumnName AS nvarchar(128),
        @PrimaryKeyName as nvarchar(400),
        @OrderByCode as tinyint = 1,
        @DecimalPrecision AS nvarchar(50)
    AS

    SET @DatabaseName = @DatabaseName + '.'
    DECLARE @SchemaAndTableName nvarchar(400)
    SET @SchemaAndTableName = ISNull(@DatabaseName, '') + @SchemaName + '.' + @TableName

    DECLARE @SQLString nvarchar(max)

    SET @SQLString = 'DECLARE @OrderByCode tinyint,
    @Count bigint,
    @LowerPoint bigint,
    @UpperPoint bigint,
    @LowerRemainder decimal(38,37), -- use the maximum precision and scale for these two variables to make the procedure flexible enough to handle large datasets; I suppose I could use a float
    @UpperRemainder decimal(38,37),
    @LowerQuartile decimal(' + @DecimalPrecision + '),
    @UpperQuartile decimal(' + @DecimalPrecision + '),
    @InterquartileRange decimal(' + @DecimalPrecision + '),
    @LowerInnerFence decimal(' + @DecimalPrecision + '),
    @UpperInnerFence decimal(' + @DecimalPrecision + '),
    @LowerOuterFence decimal(' + @DecimalPrecision + '),
    @UpperOuterFence decimal(' + @DecimalPrecision + ')

    SET @OrderByCode = ' + CAST(@OrderByCode AS nvarchar(50)) + '

    SELECT @Count=Count(' + @ColumnName + ')
    FROM ' + @SchemaAndTableName + '
    WHERE ' + @ColumnName + ' IS NOT NULL

    SELECT @LowerPoint = (@Count + 1) / 4,
    @LowerRemainder = ((CAST(@Count AS decimal(' + @DecimalPrecision + ')) + 1) % 4) /4,
    @UpperPoint = ((@Count + 1) *3) / 4,
    @UpperRemainder = (((CAST(@Count AS decimal(' + @DecimalPrecision + ')) + 1) *3) % 4) / 4; -- multiply by 3 for the left side on the upper point to get 75 percent

    WITH TempCTE
    (' + @PrimaryKeyName + ', RN, ' + @ColumnName + ')
    AS (SELECT ' + @PrimaryKeyName + ', ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY ' + @ColumnName + ' ASC) AS RN, ' + @ColumnName + '
    FROM ' + @SchemaAndTableName + '
    WHERE ' + @ColumnName + ' IS NOT NULL),
    TempCTE2 (QuartileValue)
    AS (SELECT TOP 1 ' + @ColumnName + ' + ((Lead(' + @ColumnName + ', 1) OVER (ORDER BY ' + @ColumnName + ') - ' + @ColumnName + ') * @LowerRemainder) AS QuartileValue
    FROM TempCTE
    WHERE RN BETWEEN @LowerPoint AND @LowerPoint + 1

    UNION

    SELECT TOP 1 ' + @ColumnName + ' + ((Lead(' + @ColumnName + ', 1) OVER (ORDER BY ' + @ColumnName + ') - ' + @ColumnName + ') * @UpperRemainder) AS QuartileValue
    FROM TempCTE
    WHERE RN BETWEEN @UpperPoint AND @UpperPoint + 1)

    SELECT @LowerQuartile = (SELECT TOP 1 QuartileValue FROM TempCTE2 ORDER BY QuartileValue ASC),
    @UpperQuartile = (SELECT TOP 1 QuartileValue FROM TempCTE2 ORDER BY QuartileValue DESC)

    SELECT @InterquartileRange = @UpperQuartile - @LowerQuartile
    SELECT @LowerInnerFence = @LowerQuartile - (1.5 * @InterquartileRange),
    @UpperInnerFence = @UpperQuartile + (1.5 * @InterquartileRange),
    @LowerOuterFence = @LowerQuartile - (3 * @InterquartileRange),
    @UpperOuterFence = @UpperQuartile + (3 * @InterquartileRange)

    --SELECT @LowerPoint AS LowerPoint, @LowerRemainder AS LowerRemainder, @UpperPoint AS UpperPoint, @UpperRemainder AS UpperRemainder -- uncomment this line to debug the inner calculations

    SELECT @LowerQuartile AS LowerQuartile, @UpperQuartile AS UpperQuartile, @InterquartileRange AS InterQuartileRange,
    @LowerInnerFence AS LowerInnerFence, @UpperInnerFence AS UpperInnerFence,
    @LowerOuterFence AS LowerOuterFence, @UpperOuterFence AS UpperOuterFence

    SELECT ' + @PrimaryKeyName + ', ' + @ColumnName + ', OutlierDegree
    FROM (SELECT ' + @PrimaryKeyName + ', ' + @ColumnName + ',
    ''OutlierDegree'' = CASE WHEN (' + @ColumnName + ' < @LowerInnerFence AND ' + @ColumnName + ' >= @LowerOuterFence) OR (' + @ColumnName + ' > @UpperInnerFence AND ' + @ColumnName + ' <= @UpperOuterFence) THEN 1
    WHEN ' + @ColumnName + ' < @LowerOuterFence OR ' + @ColumnName + ' > @UpperOuterFence THEN 2
    ELSE 0 END
    FROM ' + @SchemaAndTableName + '
    WHERE ' + @ColumnName + ' IS NOT NULL) AS T1
    ORDER BY CASE WHEN @OrderByCode = 1 THEN ' + @PrimaryKeyName + ' END ASC,
    CASE WHEN @OrderByCode = 2 THEN ' + @PrimaryKeyName + ' END DESC,
    CASE WHEN @OrderByCode = 3 THEN ' + @ColumnName + ' END ASC,
    CASE WHEN @OrderByCode = 4 THEN ' + @ColumnName + ' END DESC,
    CASE WHEN @OrderByCode = 5 THEN OutlierDegree END ASC,
    CASE WHEN @OrderByCode = 6 THEN OutlierDegree END DESC'

    --SELECT @SQLString -- uncomment this to debug string errors
    EXEC (@SQLString)
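To see what the procedure's arithmetic does with the sample data from the question, here is a small Python sketch of the quartile points and fences as I read them from the code above. The function names are made up for illustration, and clamping positions to the ends of the list is an assumption of the sketch, not something the procedure does.

```python
def quartile_points(values):
    """Lower and upper quartile from (count + 1) / 4 style positions,
    interpolating towards the next value by the remainder."""
    ordered = sorted(values)
    count = len(ordered)

    def at(position, remainder):
        # Assumption of this sketch: clamp one-based positions into range.
        position = min(max(position, 1), count)
        value = ordered[position - 1]
        following = ordered[min(position, count - 1)]
        return value + (following - value) * remainder

    lower = at((count + 1) // 4, ((count + 1) % 4) / 4)
    upper = at(((count + 1) * 3) // 4, (((count + 1) * 3) % 4) / 4)
    return lower, upper


def outlier_degree(value, lower, upper):
    """0 = inside the inner fences, 1 = between inner and outer, 2 = beyond outer."""
    iqr = upper - lower
    if value < lower - 3 * iqr or value > upper + 3 * iqr:
        return 2
    if value < lower - 1.5 * iqr or value > upper + 1.5 * iqr:
        return 1
    return 0


data = [1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9]
lower, upper = quartile_points(data)
print(lower, upper, upper - lower)  # 3.0 9.0 6.0
```

For this data the lower quartile comes out as 3 rather than the 4 that QUARTILE.INC returns, because the positions are based on count + 1. If the Excel values are required exactly, the procedure would need adjusting.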