Under the hood SPARK Dataframe optimization




Leaving aside the database-connection aspects that usually come up when mapPartitions is discussed for RDDs, and noting that I find what a DataFrame does under the hood harder to follow than the RDD abstraction:





I didn't understand the first part. The answer you are looking for to the second question is partially available at this link: stackoverflow.com/a/29012187/3213772
– puru
Jul 2 at 8:59





You mean bullet 1? If so: mapPartitions is seen as a performance booster for RDDs. If DataFrames are so good, how does what happens under the hood match or beat the performance of an RDD that uses mapPartitions?
– thebluephantom
Jul 2 at 9:02
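A minimal plain-Python sketch of why mapPartitions is usually described as a performance booster, assuming the expensive part is per-call setup such as opening a database connection. It does not use Spark: FakeConnection, map_per_record and map_per_partition are hypothetical names that only mimic the calling pattern of RDD.map (the function runs once per record) versus RDD.mapPartitions (the function runs once per partition and receives an iterator over its records). It says nothing about how DataFrames are optimized internally.

```python
from typing import Iterable, Iterator, List


class FakeConnection:
    """Hypothetical stand-in for an expensive resource such as a database connection."""

    opened = 0  # total number of connections created so far

    def __init__(self) -> None:
        FakeConnection.opened += 1

    def lookup(self, key: int) -> int:
        # Pretend to enrich a record through the connection.
        return key * 10


def map_per_record(partitions: List[List[int]]) -> List[List[int]]:
    # Mimics rdd.map: the function is called once per record,
    # so a naive implementation opens one connection per record.
    return [[FakeConnection().lookup(x) for x in part] for part in partitions]


def map_per_partition(partitions: List[List[int]]) -> List[List[int]]:
    # Mimics rdd.mapPartitions: the function gets an iterator over one
    # whole partition, so one connection is opened and reused for all its records.
    def process(records: Iterable[int]) -> Iterator[int]:
        conn = FakeConnection()
        for x in records:
            yield conn.lookup(x)

    return [list(process(part)) for part in partitions]


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]  # 3 partitions, 9 records

    FakeConnection.opened = 0
    per_record = map_per_record(data)
    print("connections opened per record:", FakeConnection.opened)

    FakeConnection.opened = 0
    per_partition = map_per_partition(data)
    print("connections opened per partition:", FakeConnection.opened)

    print("same result:", per_record == per_partition)
```

With three partitions holding nine records, the per-record version opens nine connections and the per-partition version opens three, while both return the same values. In PySpark the same pattern would be rdd.mapPartitions(process), where process takes an iterator and returns an iterable.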





@puru But that link is about RDDs; I get that. I am wondering what it means when a DataFrame is loaded. With all this under-the-hood optimization, it is not clear how default partitioning applies to a DataFrame.
– thebluephantom
Jul 2 at 9:51





I have edited the question and left this out.
– thebluephantom
Jul 2 at 9:57




1 Answer



From Spark 2.0 onwards, a DataFrame is a Dataset organized into named columns (in Scala, DataFrame is simply an alias for Dataset[Row]). To answer your question: there is no need to convert DataFrames back to RDDs to get performance and optimization, because Datasets and DataFrames are already very efficient compared to primitive RDDs, for the reasons below.





I have read those things but find them a bit thin, and I wonder whether it is true, as I have seen some posts that say otherwise. That said, there are so many variables in a multi-use system that I will assume it is true for now. Not everything is relational, though.
– thebluephantom
Jul 3 at 8:33





