subset a dataframe based on a matrix of row numbers and save the result in one list


I have a data frame called df that looks like:


> df
Date A B C
1 2001 1 12 14
2 2002 2 13 15
3 2003 3 14 16
4 2004 4 15 17
5 2005 5 16 18
6 2006 6 17 19
7 2007 7 18 20
8 2008 8 19 21
9 2009 9 20 22
10 2010 10 21 23



and a matrix called index that looks like:


> index
Resample01 Resample02 Resample03 Resample04 Resample05
[1,] 1 7 1 2 7
[2,] 3 9 2 3 8
[3,] 5 1 3 8 1
[4,] 8 3 4 9 4
[5,] 10 4 5 10 9



The numbers in each column are the row numbers to be selected.



The aim is to split the data frame into two mutually exclusive groups, "train" and "test", according to the row numbers in each column of the matrix "index". For example, for "Resample01" the result should look like:


> train
Date A B C
1 2001 1 12 14
3 2003 3 14 16
5 2005 5 16 18
8 2008 8 19 21
10 2010 10 21 23



and


> test
Date A B C
2 2002 2 13 15
4 2004 4 15 17
6 2006 6 17 19
7 2007 7 18 20
9 2009 9 20 22



This process should be repeated for each column of "index", and the results should be saved in two lists, "train" and "test", where "train" looks like:


$train1
Date A B C
1 2001 1 12 14
3 2003 3 14 16
5 2005 5 16 18
8 2008 8 19 21
10 2010 10 21 23

$train2
:
:
$train5



and "test" should be in the same format.



Note that my df actually contains 43,000 observations, and the index matrix has 2000 columns and more than 20,000 rows. I know that subsetting for one column is easy:


test = df[-c(index[,1]),]



but I don't know how to do this for multiple columns (for example in a loop), and saving the results as a list also seems difficult.




2 Answers



You could try something like this. The result should be a list of length ncol(index), and each element should hold two list elements: the training and the testing dataset.




apply(index, MARGIN = 2, FUN = function(x, data) {
# x is "demoted" from a column to a vector
list(train = data[x, ], test = data[-x, ])
}, data = df)
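
To try this on the small example from the question, a self-contained sketch might look like the following. The data are rebuilt by hand from the printed output in the question, and the name result is only an illustration, not something from the original post.

```r
# Rebuild the example data frame and index matrix from the question
df <- data.frame(Date = 2001:2010, A = 1:10, B = 12:21, C = 14:23)
index <- cbind(
  Resample01 = c(1, 3, 5, 8, 10),
  Resample02 = c(7, 9, 1, 3, 4),
  Resample03 = c(1, 2, 3, 4, 5),
  Resample04 = c(2, 3, 8, 9, 10),
  Resample05 = c(7, 8, 1, 4, 9)
)

# One list element per column of index, each holding a train/test pair.
# Because FUN returns a list, apply() returns a list named by column.
result <- apply(index, MARGIN = 2, FUN = function(x, data) {
  list(train = data[x, ], test = data[-x, ])
}, data = df)

result$Resample01$train  # rows 1, 3, 5, 8, 10
result$Resample01$test   # rows 2, 4, 6, 7, 9
```

Negative indexing with -x drops the listed rows, so each test set is exactly the complement of its train set.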





Not clear why you are using a "data" argument. It's not a named formal in apply and you make no reference to it in your FUN argument. (I still think your code will succeed, although I would have used sapply.)
– 42-
Jun 30 at 15:22








Thank you, the code works. However, the result is one list of the form: sample 01 (train)(test), sample 02 (train)(test), ..., sample2000 (train)(test). What I hope for are two lists: train01 (sample01), train02 (sample02), ..., train2000 (sample2000) and test01 (sample01), test02 (sample02), ..., test2000 (sample2000). Maybe I can do it by running train = apply(train.index, MARGIN = 1, FUN = function(x, data) { list(train = sample_N_new_env[x, ])}, data = sample_N_new_env) and test = apply(train.index, MARGIN = 1, FUN = function(x, data) { list(test = sample_N_new_env[-x, ])}, data = sample_N_new_env)
– Weiwei Gu
Jun 30 at 16:08







@42- it's a typo, df --> data. I have corrected it.
– Roman Luštrik
Jul 1 at 11:32



The solution from akrun solves my problem.



Using the code from @Roman Luštrik:


listofsample = apply(index, MARGIN = 2, FUN = function(x, data) {
# use the data argument rather than the global df
list(train = data[x, ], test = data[-x, ])
}, data = df)



followed by the code from akrun:


train = sapply(listofsample, `[`,1)
test = sapply(listofsample, `[`,2)



This produces the two lists that I wanted.
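
As a possible variation, the two lists can also be extracted by name with lapply and `[[`, which returns the data frames themselves rather than one-element lists. The sketch below rebuilds the example data from the question so that it runs on its own; renaming the elements to train1, train2, ... is an assumption based on the output format shown in the question.

```r
# Example data rebuilt from the question
df <- data.frame(Date = 2001:2010, A = 1:10, B = 12:21, C = 14:23)
index <- cbind(
  Resample01 = c(1, 3, 5, 8, 10),
  Resample02 = c(7, 9, 1, 3, 4),
  Resample03 = c(1, 2, 3, 4, 5),
  Resample04 = c(2, 3, 8, 9, 10),
  Resample05 = c(7, 8, 1, 4, 9)
)

# Nested list: one train/test pair per column of index
listofsample <- apply(index, MARGIN = 2, FUN = function(x, data) {
  list(train = data[x, ], test = data[-x, ])
}, data = df)

# `[[` with a name pulls the data frame out of each pair
train <- lapply(listofsample, `[[`, "train")
test  <- lapply(listofsample, `[[`, "test")

# Optional: rename the elements to match the format in the question
names(train) <- paste0("train", seq_along(train))
names(test)  <- paste0("test", seq_along(test))

train$train1  # rows 1, 3, 5, 8, 10
```

Extracting by name instead of by position keeps the code working even if the order of train and test inside each pair changes.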





