Spark's Column.isin function does not take List
I am trying to filter rows of my Spark DataFrame by a list of values.
val sequence = Seq(1,2,3,4,5)
df.filter(df("column").isin(sequence))
Unfortunately, I get an "Unsupported literal type" error:
java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.$colon$colon List(1,2,3,4,5)
According to the documentation, isin takes a scala.collection.Seq.
I guess it doesn't want a literal? Then what can I pass in, some sort of wrapper class?
2 Answers
@JustinPihony's answer is correct but incomplete. The isin function takes a repeated parameter (varargs) as its argument, so you need to expand the sequence with the : _* type ascription when passing it:
scala> val df = sc.parallelize(Seq(1,2,3,4,5,6,7,8,9)).toDF("column")
// df: org.apache.spark.sql.DataFrame = [column: int]
scala> val sequence = Seq(1,2,3,4,5)
// sequence: Seq[Int] = List(1, 2, 3, 4, 5)
scala> val result = df.filter(df("column").isin(sequence : _*))
// result: org.apache.spark.sql.DataFrame = [column: int]
scala> result.show
// +------+
// |column|
// +------+
// |     1|
// |     2|
// |     3|
// |     4|
// |     5|
// +------+
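To see why the : _* matters without needing a Spark session, here is a minimal plain-Scala sketch. Note that isIn and IsinSplatDemo are hypothetical names made up for illustration; isIn only mimics the shape of Column.isin (a repeated Any* parameter) and is not Spark code.

```scala
object IsinSplatDemo {
  // Hypothetical stand-in for Column.isin: like isin, it declares a
  // repeated parameter (Any*), but it is NOT the Spark implementation.
  def isIn(value: Any, candidates: Any*): Boolean =
    candidates.contains(value)

  val sequence: Seq[Int] = Seq(1, 2, 3, 4, 5)

  // The whole Seq arrives as ONE argument, so 3 is compared
  // against the list itself rather than against its elements.
  val withoutSplat: Boolean = isIn(3, sequence)

  // ": _*" expands the Seq so each element becomes its own argument.
  val withSplat: Boolean = isIn(3, sequence: _*)

  def main(args: Array[String]): Unit = {
    println(s"without splat: $withoutSplat")
    println(s"with splat:    $withSplat")
  }
}
```

In this sketch the un-expanded call just returns false. With the real isin, the single List argument is what ends up being treated as a literal, which is consistent with the "Unsupported literal type" error in the question.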
It's all in the Scala Language Specification. :)
– eliasah
Apr 12 '16 at 12:54
This happens because the underlying Scala implementation uses varargs, so the signature shown in the Java documentation is not quite correct. The method carries the @varargs annotation, so from Java you can just pass in an array.
This also helped me understand what was going on: stackoverflow.com/questions/6051302/…
– Jake Fund
Apr 12 '16 at 12:37