Error in running randomForest : object not found

I am trying to fit a random forest classifier to my dataset. I am very new to R and I imagine this is a simple formatting issue.
I read in a text file and transform my dataset so that it has this format (confidential info removed):
>head(df.train,2)
GOLGA8A ITPR3 GPR174 SNORA63 GIMAP8 LEF1 PDE4B LOC100507043 TGFB1I1 SPINT1
Sample1 3.726046 3.4013711 3.794364 4.265287 -1.514573 7.725775 2.162616 -1.514573 -1.5145732 -1.514573
Sample2 4.262779 0.9261892 4.744096 7.276971 -1.514573 4.694769 4.707387 2.031476 -0.8325444 2.615991
...
...
CD8B FECH PYCR1 MGC12916 KCNA3 resp
Sample1 -1.514573 2.099336 3.427928 1.542951 -1.514573 1
Sample2 -1.145806 1.204241 2.846832 1.523808 1.616791 1
In essence, the columns are my features and the rows are my samples; the last column, resp, is my response vector, a column of factors.
Then I use:
set.seed(1) # Set the seed for reproducibility
RF1 = randomForest(resp ~ ., data = df.train, ntree = 1000, importance = TRUE, mtry = 3)
I am simply trying to train the RF on my column resp, using the other columns as features.
But I obtain the error:
Error in eval(expr, envir, enclos) : object 'PCNA-AS1' not found
However, looking into my training set I can clearly find that column, e.g. with:
sort(unique(colnames(df.train)))
So I don't really understand the error or where to go from here. My apologies if I haven't posed the question in the correct way, thanks for any and all help!
2 Answers
I suspect this comes from having an illegal variable name in your data frame. Let's consider a data frame that has just a response variable resp and a variable (illegally) named PCNA-AS1:
(dat <- structure(list(`PCNA-AS1` = c(1, 2, 3), resp = structure(c(2L, 2L, 1L), .Label = c("0", "1"), class = "factor")), .Names = c("PCNA-AS1", "resp"), row.names = c(NA, -3L), class = "data.frame"))
# PCNA-AS1 resp
# 1 1 1
# 2 2 1
# 3 3 0
Now when we train a random forest we get the indicated error:
library(randomForest)
mod <- randomForest(resp~., data=dat)
# Error in eval(expr, envir, enclos) : object 'PCNA-AS1' not found
A natural solution to this problem is to convert all of your variable names to legal ones:
names(dat) <- make.names(names(dat))
dat
# PCNA.AS1 resp
# 1 1 1
# 2 2 1
# 3 3 0
mod <- randomForest(resp~., data=dat)
Now the model trains with no error.
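To see which column names would trip up the formula interface before renaming anything, one option is to compare each name with what make.names() would turn it into. This is a minimal sketch in base R; the small data frame and the variable name bad are made up for illustration and are not taken from the question's data.

```r
# Hypothetical example data: one syntactically invalid column name.
# check.names = FALSE keeps the invalid name so we can detect it.
dat <- data.frame(`PCNA-AS1` = c(1, 2, 3),
                  resp = factor(c("1", "1", "0")),
                  check.names = FALSE)

# A name is syntactically valid if make.names() leaves it unchanged.
bad <- names(dat)[names(dat) != make.names(names(dat))]
print(bad)

# Repair the names; unique = TRUE guards against two names
# collapsing to the same string after repair.
names(dat) <- make.names(names(dat), unique = TRUE)
print(names(dat))
```

If bad comes back empty, illegal names are probably not the cause and it is worth checking the class of the object being passed as data instead.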
Thanks for your comment josliber, I tried converting to legal names but that wasn't the problem. The error was actually that I gave randomForest a matrix (rather than a data frame); I assumed it didn't matter and that randomForest could easily convert between the two. I was mistaken, and I have solved the issue now.
– AHawks
Jan 29 '16 at 3:20
@AHawks OK, then all the more reason for you to edit your question to make it reproducible! (aka including the code and data needed to replicate the issue). Try cutting down the columns in your data frame to the smallest number where you can reproduce the issue, and then post that dataset (if you haven't already figured out what's going on first).
– josliber♦
Jan 29 '16 at 3:21
Yes, you are definitely correct, that would have been better and I will do that for future questions. I am just getting used to presenting problems here on Stack Overflow, so thanks for your advice!
– AHawks
Jan 31 '16 at 22:02
In short, it was a rookie mistake: I was passing a matrix rather than a data.frame, which caused this error. I still don't understand why it complained about that particular column (which was not the first) rather than another.
Thanks for all the help.
Cheers,
Anthony
When creating/casting a data.frame, check.names=TRUE. So passing a data.frame could have fixed the problem, as illegal column names would have been edited. In general, randomForest gives far fewer problems with a data.frame than with a matrix.
– Soren Havelund Welling
Jan 29 '16 at 12:29
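The conversion described in the comment above can be sketched as follows, using base R only. The matrix m, its values, and the resp vector are hypothetical stand-ins for the question's data. Because a matrix holds a single type, the factor response is assumed here to be kept separately and attached during the conversion.

```r
# Hypothetical numeric matrix standing in for the expression data.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
            dimnames = list(paste0("Sample", 1:3), c("PCNA-AS1", "LEF1")))

# Hypothetical response, stored as a factor outside the matrix.
resp <- factor(c("1", "1", "0"))

# data.frame() uses check.names = TRUE by default, so the invalid
# name "PCNA-AS1" is repaired while the matrix columns are split out.
df.train <- data.frame(m, resp = resp)

print(class(df.train))
print(names(df.train))
```

The resulting df.train could then be passed to the formula call from the question, randomForest(resp ~ ., data = df.train), assuming the randomForest package is loaded.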
This is not an answer but rather a comment to the real answer, which should be marked as accepted.
– Cath
Jul 2 at 7:49
Could you make this a reproducible example (aka provide sample data for df.train that causes the error)?
– josliber♦
Jan 29 '16 at 1:04