Showing posts with the label Pyspark Sql

How To Apply The Describe Function After Grouping A Pyspark Dataframe?

I want to find the cleanest way to apply the describe function to a grouped DataFrame (this questio…

Identify Partition Key Column From A Table Using Pyspark

I need help to find the unique partitions column names for a Hive table using PySpark. The table mi…

Replace Column Values In Spark Dataframe Based On Dictionary Similar To Np.where

My data frame looks like - no city amount 1 Kenora 56% 2 …

Pyspark, Compare Two Rows In Dataframe

I'm attempting to compare one row in a dataframe with the next to see the difference in timesta…

Check If Two Pyspark Rows Are Equal

I am writing unit tests for a Spark job, and some of the outputs are named tuples: pyspark.sql.Row …

How To Use Matplotlib To Plot Pyspark Sql Results

I am new to PySpark. I want to plot the result using matplotlib, but I am not sure which function to use…

Convert A Pandas Dataframe To A PySpark Dataframe

I have a script with the below setup. I am using: 1) Spark dataframes to pull data in 2) Converting…

Spark Request Max Count

I'm a beginner with Spark, and I am trying to write a query that lets me retrieve the most visited web p…