{"id":129,"date":"2014-03-26T14:47:02","date_gmt":"2014-03-26T13:47:02","guid":{"rendered":"http:\/\/candrea.ch\/blog\/?p=129"},"modified":"2014-03-26T14:52:10","modified_gmt":"2014-03-26T13:52:10","slug":"select-pupils-by-number-of-pupils-per-group","status":"publish","type":"post","link":"https:\/\/candrea.ch\/blog\/select-pupils-by-number-of-pupils-per-group\/","title":{"rendered":"Select Pupils by Number of Pupils per Group"},"content":{"rendered":"<p>Selecting rows conditioned on values in columns is easy, for example selecting people aged over 33. But what about selecting rows conditioned on statistics computed over multiple rows of the data frame, for example selecting pupils by the number of pupils in their group?<\/p>\n<p>That is where the very nice dplyr package comes in.<\/p>\n<p>We load dplyr, then build and print the data frame:<\/p>\n<pre class=\"lang:default decode:true\">library(dplyr)\r\n\r\n# math is drawn at random with runif(), so your values will differ\r\ndf &lt;- data.frame(id=1:9, classid=c(1,1,1,2,2,3,3,3,3), math=round(runif(9,1,6),1))<\/pre>\n<pre class=\"lang:default decode:true\">&gt; print(df)\r\n  id classid math\r\n1  1       1  5.4\r\n2  2       1  4.0\r\n3  3       1  1.1\r\n4  4       2  2.2\r\n5  5       2  3.9\r\n6  6       3  2.7\r\n7  7       3  6.0\r\n8  8       3  2.0\r\n9  9       3  1.6<\/pre>\n<p>Now we want to select (in dplyr terms, &#8220;filter&#8221;) the pupils who belong to groups\/classes with more than two pupils.
In dplyr there are three different syntaxes to achieve this:<\/p>\n<pre class=\"lang:default decode:true\"># step by step: group by class, then keep only groups with more than two rows\r\n# (n() returns the number of rows, i.e. pupils, in the current group)\r\ndf.g &lt;- group_by(df, classid)\r\ndf.n &lt;- filter(df.g, n() &gt; 2)\r\n\r\n# or nested syntax\r\ndf.n &lt;- filter(group_by(df, classid), n() &gt; 2)\r\n\r\n# or chained with the pipe operator %&gt;% (early dplyr versions used %.%, now deprecated)\r\ndf.n &lt;- df %&gt;%\r\n  group_by(classid) %&gt;%\r\n  filter(n() &gt; 2)<\/pre>\n<p>The result is the same in all three cases:<\/p>\n<pre class=\"lang:default decode:true\">&gt; print(df.n)\r\nSource: local data frame [7 x 3]\r\nGroups: classid\r\n\r\n  id classid math\r\n1  1       1  5.4\r\n2  2       1  4.0\r\n3  3       1  1.1\r\n4  6       3  2.7\r\n5  7       3  6.0\r\n6  8       3  2.0\r\n7  9       3  1.6<\/pre>\n<p>Of course, you can also do this in base R,<\/p>\n<pre class=\"lang:default decode:true\">&gt; # xtabs() counts pupils per class; names() turns the qualifying counts into class IDs\r\n&gt; df.c &lt;- df[df$classid %in% names(which(xtabs(~classid, df) &gt; 2)), ]\r\n&gt; print(df.c)\r\n  id classid math\r\n1  1       1  5.4\r\n2  2       1  4.0\r\n3  3       1  1.1\r\n6  6       3  2.7\r\n7  7       3  6.0\r\n8  8       3  2.0\r\n9  9       3  1.6<\/pre>\n<p>but I think the dplyr version reads quite a bit nicer.<\/p>\n<p>Happy dplyr!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Selecting rows conditioned on values in columns is easy, for example selecting people aged over 33. But what about selecting rows conditioned on statistics computed over multiple rows of the data frame, for example selecting pupils by the number of pupils in their group?
That is where the very nice dplyr package &hellip; <a href=\"https:\/\/candrea.ch\/blog\/select-pupils-by-number-of-pupils-per-group\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Select Pupils by Number of Pupils per Group<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-129","post","type-post","status-publish","format-standard","hentry","category-r"],"_links":{"self":[{"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/posts\/129"}],"collection":[{"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/comments?post=129"}],"version-history":[{"count":8,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/posts\/129\/revisions"}],"predecessor-version":[{"id":138,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/posts\/129\/revisions\/138"}],"wp:attachment":[{"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/media?parent=129"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/categories?post=129"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/candrea.ch\/blog\/wp-json\/wp\/v2\/tags?post=129"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}