How to do a conditional count based on row values in SAS / SQL?

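Re-uploading, since my last post had some issues: I didn't know we were supposed to post sample data. I'm fairly new to SAS, and I have a problem that I know how to solve in Excel but not in SAS. However, the dataset is too large to work with reasonably in Excel.

I have four variables: id, year_start, group_name, test_score.

Sample data: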

    id  year_start  group_name  test_score
    1   19931231    Red         90
    1   19941230    Red         89
    1   19951231    Red         91
    1   19961231    Red         92
    2   19930630    Red         85
    2   19940629    Red         87
    2   19950630    Red         95
    3   19950931    Blue        90
    3   19960931    Blue        90
    4   19930331    Red         95
    4   19940331    Red         97
    4   19950330    Red         98
    4   19960331    Red         95
    5   19931231    Red         96
    5   19941231    Red         97

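My goal is to get a table of fractional ranks by test_score for each year. I was hoping to use PROC RANK with the FRACTION option to do this. It orders the observations by test_score (highest is 1, second highest is 2, and so on) and then divides by the total number of observations to give a fractional rank. Unfortunately, year_start varies a lot from row to row. For each id/year combination, I want to look back one year from year_start and rank that observation against all other ids whose year_start falls within that one-year window. I'm not interested in comparing by calendar year; each id's rank should be relative to its own year_start. To add another layer of complexity, I want the ranking done by group_name.

PROC SQL is perfectly fine if anyone has a SQL solution.

Using the data above, the ranks would look like this: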

    id  year_start  group_name  test_score  rank
    1   19931231    Red         90          0.75
    1   19941230    Red         89          0.8
    1   19951231    Red         91          1
    1   19961231    Red         92          1
    2   19930630    Red         85          1
    2   19940629    Red         87          0.8
    2   19950630    Red         95          0.75
    3   19950931    Blue        90          1
    3   19960931    Blue        90          1
    4   19930331    Red         95          1
    4   19940331    Red         97          0.2
    4   19950330    Red         98          0.2
    4   19960331    Red         95          0.333
    5   19931231    Red         96          0.25
    5   19941231    Red         97          0.667

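To compute the rank for row 1:

We first exclude the Blue observations.
Then we count the observations whose year_start falls within the one year up to and including 19931231 (so we have 4 observations).
We count how many of those observations have a higher test_score, then add 1 to get the order of the current observation (so it is the third highest).
Then we divide this order by the total count to get the rank (3/4 = 0.75).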
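
The four steps above can be checked outside SAS with a short Python sketch. This is only an illustration under stated assumptions: the helper names `day_number` and `fraction_rank` and the tuple layout are made up for the example, and the one-year window is taken as 365.25 days with an open lower bound, mirroring the Excel formula in this question rather than any SAS date interval.

```python
from datetime import date

# Sample data from the question: (id, year_start as yyyymmdd, group_name, test_score).
ROWS = [
    (1, 19931231, "Red", 90), (1, 19941230, "Red", 89),
    (1, 19951231, "Red", 91), (1, 19961231, "Red", 92),
    (2, 19930630, "Red", 85), (2, 19940629, "Red", 87),
    (2, 19950630, "Red", 95),
    (3, 19950931, "Blue", 90), (3, 19960931, "Blue", 90),
    (4, 19930331, "Red", 95), (4, 19940331, "Red", 97),
    (4, 19950330, "Red", 98), (4, 19960331, "Red", 95),
    (5, 19931231, "Red", 96), (5, 19941231, "Red", 97),
]


def day_number(yyyymmdd):
    """Convert a yyyymmdd integer to a day count (hypothetical helper)."""
    return date(yyyymmdd // 10000, yyyymmdd // 100 % 100, yyyymmdd % 100).toordinal()


def fraction_rank(rows, target, window_days=365.25):
    """Return (order, total, order / total) for one target row."""
    _, t_start, t_group, t_score = target
    t_day = day_number(t_start)
    # Step 1: keep only observations in the same group.
    same_group = [r for r in rows if r[2] == t_group]
    # Step 2: keep those whose year_start lies in the year ending at the target's.
    in_window = [r for r in same_group
                 if t_day - window_days < day_number(r[1]) <= t_day]
    # Step 3: count strictly higher scores, then add 1 for the target's order.
    order = 1 + sum(1 for r in in_window if r[3] > t_score)
    # Step 4: divide the order by the number of observations in the window.
    total = len(in_window)
    return order, total, order / total


print(fraction_rank(ROWS, ROWS[0]))  # (3, 4, 0.75)
```

The group filter runs before any date is parsed, so the Blue rows with the nonexistent date 9/31 are never converted when a Red row is ranked.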

In Excel, the formula for this variable would look like this. Assume the formula is for row 1 and there are 100 rows, with id = A, year_start = B, group_name = C, test_score = D:

  =(1+countifs(D1:D100,">"&D1, B1:B100,"<="&B1, B1:B100,">"&B1-365.25, C1:C100, C1))/ countifs(B1:B100,"<="&B1, B1:B100,">"&B1-365.25, C1:C100, C1)

Thanks so much for your help!

ahammond428

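If I'm reading this correctly, your example is incorrect, so it's hard to know exactly what you're trying to do. But try the query below and see whether it works. You may need to make the inequalities open or closed depending on whether you want to include the date exactly one year back. Note that your year_start column needs to be imported as a SAS date for this to work. Otherwise, you can convert it with input(year_start, yymmdd8.) if it is stored as text, or input(put(year_start, z8.), yymmdd8.) if it is stored as a number.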

    proc sql;
        /* Self-join: pair each row (a) with the same-group rows (b)
           whose year_start lies in the year ending at a.year_start */
        select distinct
            a.id, a.year_start, a.group_name, a.test_score,
            1 + sum(case when b.test_score > a.test_score then 1 else 0 end) as rank_num,
            count(b.id) as rank_denom,
            calculated rank_num / calculated rank_denom as rank
        from testdata a
        left join testdata b
            on a.group_name = b.group_name
            and intnx('year', a.year_start, -1, 's') le b.year_start le a.year_start
        group by a.id, a.year_start, a.group_name, a.test_score
        order by id, year_start;
    quit;

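Note that I changed the 9/31 dates to 9/30 (since there is no 9/31), but left 3/30, 6/29 and 12/30 alone, since those may be intentional, even though the other dates appear to be quarter-ends.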
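
Nonexistent dates such as 9/31 could be caught before the data is imported. A minimal Python sketch, assuming the raw year_start values are available as yyyymmdd strings (the helper name `invalid_dates` is made up for illustration):

```python
from datetime import datetime


def invalid_dates(values):
    """Return the yyyymmdd strings that are not real calendar dates."""
    bad = []
    for value in values:
        try:
            # strptime raises ValueError when the day does not exist in the month.
            datetime.strptime(value, "%Y%m%d")
        except ValueError:
            bad.append(value)
    return bad


# One Red and both Blue year_start values from the question's sample data.
print(invalid_dates(["19931231", "19950931", "19960931"]))
# ['19950931', '19960931']
```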

Consider correlated count subqueries in SQL:

Data

    data ranktable;
        infile datalines missover;
        input id year_start group_name $ test_score;
        datalines;
    1 19931231 Red 90
    1 19941230 Red 89
    1 19951231 Red 91
    1 19961231 Red 92
    2 19930630 Red 85
    2 19940629 Red 87
    2 19950630 Red 95
    3 19950930 Blue 90
    3 19960930 Blue 90
    4 19930331 Red 95
    4 19940331 Red 97
    4 19950330 Red 98
    4 19960331 Red 95
    5 19931231 Red 96
    5 19941231 Red 97
    ;
    run;

    data ranktable;
        set ranktable;
        format year_start date9.;
        year_start = input(put(year_start, z8.), yymmdd8.);
    run;

PROC SQL

Additional fields are included for your review.

    proc sql;
        select r.id, r.year_start, r.group_name, r.test_score,
            put(intnx('year', r.year_start, -1, 's'), yymmdd10.) as year_ago,
            /* same-group rows in the window scoring at least as high as this row */
            (select count(*) from ranktable sub
             where sub.test_score >= r.test_score
               and sub.group_name = r.group_name
               and sub.year_start <= r.year_start
               and sub.year_start >= intnx('year', r.year_start, -1, 's')) as num_rank,
            /* all same-group rows in the window */
            (select count(*) from ranktable sub
             where sub.group_name = r.group_name
               and sub.year_start <= r.year_start
               and sub.year_start >= intnx('year', r.year_start, -1, 's')) as denom_rank,
            calculated num_rank / calculated denom_rank as rank
        from ranktable r;
    quit;

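Output

You'll notice that the results may differ slightly from your expected ones. This probably has to do with your applying a year of 365.25 days to every year, whereas SAS's intnx steps back a full calendar year, whose length varies from year to year.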

(Screenshot: PROC SQL output)
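
The effect of the two window definitions can be seen on a single row of the sample data. The Python sketch below is an illustration only: the helper names are made up, only the Red rows are included, and `year_back_same_day` is assumed to approximate intnx('year', d, -1, 's') rather than reproduce it exactly. It counts strictly higher scores plus one, with an open 365.25-day lower bound for the Excel-style window and a closed calendar-year lower bound for the SAS-style window.

```python
from datetime import date

# Red rows from the sample data: (id, year_start, test_score).
RED = [
    (1, date(1993, 12, 31), 90), (1, date(1994, 12, 30), 89),
    (1, date(1995, 12, 31), 91), (1, date(1996, 12, 31), 92),
    (2, date(1993, 6, 30), 85), (2, date(1994, 6, 29), 87),
    (2, date(1995, 6, 30), 95),
    (4, date(1993, 3, 31), 95), (4, date(1994, 3, 31), 97),
    (4, date(1995, 3, 30), 98), (4, date(1996, 3, 31), 95),
    (5, date(1993, 12, 31), 96), (5, date(1994, 12, 31), 97),
]


def year_back_same_day(d):
    """Same month and day one year earlier; Feb 29 falls back to Feb 28.

    Assumed to approximate intnx('year', d, -1, 's').
    """
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def excel_window(d, t):
    # Open lower bound 365.25 days back, as in the COUNTIFS formula.
    return t.toordinal() - 365.25 < d.toordinal() <= t.toordinal()


def calendar_window(d, t):
    # Closed lower bound one calendar year back, as in the PROC SQL queries.
    return year_back_same_day(t) <= d <= t


def rank_parts(rows, target, in_window):
    """Return (1 + number of higher scores, observations) in the target's window."""
    _, t_start, t_score = target
    window = [r for r in rows if in_window(r[1], t_start)]
    return 1 + sum(1 for r in window if r[2] > t_score), len(window)


target = RED[3]  # id 1, year_start 19961231, test_score 92
print(rank_parts(RED, target, excel_window))     # (2, 2)
print(rank_parts(RED, target, calendar_window))  # (2, 3)
```

For this row the year back from 19961231 spans a leap day, so 19951231 lies 366 days earlier: the 365.25-day window leaves it out (rank 2/2 = 1), while the calendar-year window includes it (rank 2/3).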
