Splitting A Column In Pyspark
I am trying to split a column in a PySpark dataframe. This is the data I have:
df = sc.parallelize([[1, 'Foo|10'], [2, 'Bar|11'], [3, 'Car|12']]).toDF(['Key', 'Value'])
df = df.withColumn('Sp
Solution 1:
You forgot the escape character. split takes a regular expression, and | is the regex alternation operator, so the pipe must be escaped:
df = df.withColumn('Splitted', split(df['Value'], r'\|')[0])
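The escaping issue is easy to see in plain Python, since Spark's split uses regex semantics much like re.split. This small sketch (standard library only, no Spark needed) shows why the unescaped pipe fails:

```python
import re

value = 'Foo|10'

# Escaped: the pattern matches a literal '|', so the value splits as intended.
escaped = re.split(r'\|', value)
print(escaped)  # ['Foo', '10']

# Unescaped: '|' is regex alternation between two empty patterns,
# so it does NOT split on the literal pipe character.
unescaped = re.split('|', value)
print(unescaped != ['Foo', '10'])  # True
```

The same distinction applies inside pyspark.sql.functions.split, because its pattern argument is interpreted as a Java regular expression.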
If you want output as
+---+-----+--------+
|Key|Value|Splitted|
+---+-----+--------+
|1 |10 |Foo |
|2 |11 |Bar |
|3 |12 |Car |
+---+-----+--------+
you should split once into an array column, then pull out each piece:
from pyspark.sql import functions as F
df = (df
      .withColumn('Splitted', F.split(df['Value'], r'\|'))
      .withColumn('Value', F.col('Splitted')[1])
      .withColumn('Splitted', F.col('Splitted')[0]))