select only rows with duplicate id and specific value from another column in R

select only rows with duplicate id and specific value from another column in R

I have the following data with ID and value:

id <- c("1103-5","1103-5","1104-2","1104-2","1104-4","1104-4","1106-2","1106-2","1106-3","1106-3","2294-1","2294-1","2294-2","2294-2","2294-2","2294-3","2294-3","2294-3","2294-4","2294-4","2294-5","2294-5","2294-5","2300-1","2300-1","2300-2","2300-2","2300-4","2300-4","2321-1","2321-1","2321-2","2321-2","2321-3","2321-3","2321-4","2321-4","2347-1","2347-1","2347-2","2347-2") value <- c(6,3,6,3,6,3,6,3,6,3,3,6,9,3,6,9,3,6,3,6,9,3,6,9,6,9,6,9,6,9,3,9,3,9,3,9,3,9,6,9,6)

If you notice, there are multiple values for the same id. What I'd like to do is get the value that are only 3 and 6 only if the IDs are the same. for eg. ID "1103-5" has both 3 and 6, so it should be in the list, but not "2347-2"

I'm using R

One method I tried is the following, but it gives me everything with value 3 and 6.

d <- data.frame(id, value) group36 <- d[d$value == 3 | d$value == 6,]

and

d %>% group_by(id) %>% filter(3 == value | 6 == value)

The output should be like this:

id value 1103-5 6 1103-5 3 1104-2 6 1104-2 3 1104-4 6 1104-4 3 1106-2 6 1106-2 3 1106-3 6 1106-3 3 2294-1 3 2294-1 6 2294-2 3 2294-2 6 2294-3 3 2294-3 6 2294-4 3 2294-4 6 2294-5 3 2294-5 6

I'm not clear on your expected output; first you say that you want to keep only entries where value = 6 or value = 3; your last sentence 'but it gives me everything with 3 or 6" suggests that this is not what you want. Can you clarify? Can you add the specific expected output for the sample data you give?
– Maurits Evers
Jul 2 at 0:12

value = 6

value = 3

@MauritsEvers Sorry for the confusion, I meant to say, in the last sentence, that it also gives me the rows where the IDs do not match with the value of 3.
– user6401955
Jul 2 at 0:17

I think this could also be thought of as a setequal problem: d %>% group_by(id) %>% filter(value %in% c(3,6)) %>% filter(setequal(c(3,6), value))
– thelatemail
Jul 2 at 1:07

setequal

d %>% group_by(id) %>% filter(value %in% c(3,6)) %>% filter(setequal(c(3,6), value))

2 Answers
2

d<-group_by(d,id) filter(d,any(value==3),any(value==6))

This gives you all the IDs where there is both a value of 3 (somewhere) AND a value of 6 (somewhere). Mind you, your data contains some IDs with THREE values. In these cases, if both 3 and 6 are present, it will be included in the result.

If you want to exclude those lines that remain which done equal 3 or 6, add this:

filter(d,value==3 | value==6)

If you want to exclude IDs that also have 3 and 6 as values but also have OTHER values, use this:

filter(d,any(value==3),any(value==6),value==3 | value==6)

That didn't work. It also gives me the value 9
– user6401955
Jul 2 at 0:27

That's because there are cases with three values (3, 6, 9). What do you want to do with these cases? Exclude them completely? Or just remove the lines with value==9?
– doviod
Jul 2 at 0:28

I added the output in the question. I only want to select the cases where the ID with the value 3 and 6 is listed. I'll have to do the same thing later for 3 and 9.
– user6401955
Jul 2 at 0:31

See my edited answer. You'll be wanting the last option.
– doviod
Jul 2 at 0:35

Glad it worked for you. It's a matter of figuring out the conditions. You wanted only cases where both 3 and 6 showed for the same ID, so that means selecting only lines where any of the cases for that ID had 3 AND 6, and then selecting of those only cases where the value is either 3 OR 6.
– doviod
Jul 2 at 4:35

Not sure if this is what you want. We can filter rows that equal to either 3 or 6 then convert from long to wide format and keep only columns which have both 3 and 6 values. After that, convert back to long format.

3

6

3

6

library(dplyr) library(tidyr) id <- c("1103-5","1103-5","1104-2","1104-2","1104-4","1104-4","1106-2","1106-2", "1106-3","1106-3","2294-1","2294-1","2294-2","2294-2","2294-2", "2294-3","2294-3","2294-3","2294-4","2294-4","2294-5","2294-5","2294-5", "2300-1","2300-1","2300-2","2300-2","2300-4","2300-4","2321-1","2321-1", "2321-2","2321-2","2321-3","2321-3","2321-4","2321-4","2347-1","2347-1","2347-2","2347-2") value <- c(6,3,6,3,6,3,6,3,6,3,3,6,9,3,6,9,3,6,3,6,9,3,6,9,6,9,6,9,6,9,3,9,3,9,3,9,3,9,6,9,6) d <- data.frame(id, value) d %>% group_by(id) %>% filter(value %in% c(3, 6)) %>% mutate(rows = 1:n()) %>% spread(key = id, value) %>% select_if(~ all(!is.na(.))) #> # A tibble: 2 x 11 #> rows `1103-5` `1104-2` `1104-4` `1106-2` `1106-3` `2294-1` `2294-2` #> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 1 6 6 6 6 6 3 3 #> 2 2 3 3 3 3 3 6 6 #> # ... with 3 more variables: `2294-3` <dbl>, `2294-4` <dbl>, #> # `2294-5` <dbl> d %>% group_by(id) %>% filter(value %in% c(3, 6)) %>% mutate(rows = 1:n()) %>% spread(key = id, value) %>% select_if(~ all(!is.na(.))) %>% select(-rows) %>% gather(id, value) #> # A tibble: 20 x 2 #> id value #> <chr> <dbl> #> 1 1103-5 6 #> 2 1103-5 3 #> 3 1104-2 6 #> 4 1104-2 3 #> 5 1104-4 6 #> 6 1104-4 3 #> 7 1106-2 6 #> 8 1106-2 3 #> 9 1106-3 6 #> 10 1106-3 3 #> 11 2294-1 3 #> 12 2294-1 6 #> 13 2294-2 3 #> 14 2294-2 6 #> 15 2294-3 3 #> 16 2294-3 6 #> 17 2294-4 3 #> 18 2294-4 6 #> 19 2294-5 3 #> 20 2294-5 6

Created on 2018-07-01 by the reprex package (v0.2.0.9000).

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

EX5 zn4M1ANAy4,5bTANHNRZCXd po1goObk7vNd5 ivOO,pPEc0NpNG,A9OzXoGWpk58kGCqOcR62 N,6o Ij4M6

搜尋此網誌

Gtjkyu