
Splitting A Column In Pyspark

I am trying to split a column of a DataFrame in PySpark. This is the data I have:

df = sc.parallelize([[1, 'Foo|10'], [2, 'Bar|11'], [3, 'Car|12']]).toDF(['Key', 'Value'])
df = df.withColumn('Splitted', split(df['Value'], '|')[0])

Solution 1:

You forgot the escape character. The second argument of split is treated as a regular expression, so the literal pipe must be escaped:

from pyspark.sql.functions import split
df = df.withColumn('Splitted', split(df['Value'], r'\|')[0])
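The escape is needed because an unescaped | in a regex means alternation between two empty patterns, which matches the empty string at every position. The same behavior can be demonstrated with Python's own re module, without Spark:

```python
import re

# Unescaped '|' is regex alternation between two empty alternatives,
# so it matches everywhere and shatters the string into characters.
print(re.split('|', 'Foo|10'))

# Escaping the pipe matches the literal '|' delimiter instead.
print(re.split(r'\|', 'Foo|10'))  # ['Foo', '10']
```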

If you want output as

+---+-----+--------+
|Key|Value|Splitted|
+---+-----+--------+
|1  |10   |Foo     |
|2  |11   |Bar     |
|3  |12   |Car     |
+---+-----+--------+

You should do

from pyspark.sql import functions as F
df = (df.withColumn('Splitted', F.split(df['Value'], r'\|'))
        .withColumn('Value', F.col('Splitted')[1])
        .withColumn('Splitted', F.col('Splitted')[0]))
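Per row, those chained withColumn calls are equivalent to splitting on the literal pipe and swapping the two pieces. A plain-Python sketch of that per-row logic (illustration only, not the Spark API):

```python
rows = [(1, 'Foo|10'), (2, 'Bar|11'), (3, 'Car|12')]

def rearrange(key, value):
    # Split on the literal pipe, then swap the pieces: the numeric
    # part becomes the new Value, the text part becomes Splitted.
    splitted, num = value.split('|')
    return key, num, splitted

print([rearrange(k, v) for k, v in rows])
# [(1, '10', 'Foo'), (2, '11', 'Bar'), (3, '12', 'Car')]
```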
