
Replacing Newlines With Spaces For Str Columns Through Pandas Dataframe

Given an example dataframe with the 2nd and 3rd columns of free text, e.g.

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\

Solution 1:

Use replace: first strip leading and trailing whitespace, then replace \n with a space:

df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n',  ' ', regex=True)
print (df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
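
For reference, the two chained replace calls can be tried end to end as in the sketch below. The sample rows are hypothetical, modelled on the question; the trailing newline in the last cell is an assumption, because the original snippet is cut off. The name `cleaned` is only illustrative.

```python
import pandas as pd

# Hypothetical sample data modelled on the question; the trailing '\n' in the
# last cell is an assumption, since the original snippet is truncated.
rows = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(rows)

# Step 1: remove leading and trailing whitespace from every string cell.
# Step 2: turn the remaining (inner) newlines into single spaces.
cleaned = (
    df.replace({r'^\s+': '', r'\s+$': ''}, regex=True)
      .replace(r'\n', ' ', regex=True)
)

print(cleaned)
```

Doing the strip first matters: a trailing newline is removed outright rather than being turned into a trailing space.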

Solution 2:

You can use select_dtypes to select the columns of type object and then call applymap on those columns.

Because these functions have no inplace argument, assign the result back to the dataframe as a workaround:

# Note: on pandas 2.1+, DataFrame.applymap is deprecated in favour of DataFrame.map.
strs = df.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
df[strs.columns] = strs
df
#    0  1         2        3
# 0  1  2       abc  foo bar
# 1  3  1  def haha  love it
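
If the text columns may also hold missing values or other non-string objects, the lambda above would fail on them. One possible variant, sketched here under that assumption, is to put the type check inside the function and map it over every column. `clean_cell` and the sample rows are hypothetical names and data, not part of the original answer.

```python
import pandas as pd

def clean_cell(value):
    """Collapse newlines to spaces and trim; leave non-strings untouched."""
    if isinstance(value, str):
        return value.replace('\n', ' ').strip()
    return value

# Hypothetical sample data modelled on the question (the trailing '\n' is assumed).
df = pd.DataFrame([[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']])

# Series.map applies clean_cell to every element of each column in turn.
cleaned = df.apply(lambda col: col.map(clean_cell))

print(cleaned)
```

This trades speed for robustness: every cell goes through a Python function call, so it is likely slower than the dtype-based selection on large frames.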

Solution 3:

Adding to the other nice answers, this is a vectorized version of your initial idea:

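columns = [2, 3]
# Apply the string methods column by column. Assigning a list of Series
# instead would lay them out row-wise and transpose the values.
df.iloc[:, columns] = df.iloc[:, columns].apply(
    lambda s: s.str.strip().str.replace('\n', ' '))

Details:

In [49]: df.iloc[:, columns] = df.iloc[:, columns].apply(
             lambda s: s.str.strip().str.replace('\n', ' '))

In [50]: df
Out[50]: 
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it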
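
The same per-column idea can be wrapped in a small helper that returns a cleaned copy instead of modifying the frame in place. This is only a sketch: `clean_text_columns` and the sample rows are hypothetical, and the columns are addressed by label rather than by position.

```python
import pandas as pd

def clean_text_columns(frame, columns):
    """Return a copy of `frame` with the given columns stripped and newline-free."""
    out = frame.copy()
    for col in columns:
        # The .str methods operate on the whole column at once.
        out[col] = out[col].str.strip().str.replace('\n', ' ', regex=False)
    return out

# Hypothetical sample data modelled on the question (the trailing '\n' is assumed).
df = pd.DataFrame([[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']])
cleaned = clean_text_columns(df, [2, 3])

print(cleaned)
```

Passing regex=False makes it explicit that '\n' is matched as a literal character, whatever the default is in the installed pandas version.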

Solution 4:

You may use the following approach, which combines two regex replacements in one call:

>>> df.replace({ r'\A\s+|\s+\Z': '', '\n' : ' '}, regex=True, inplace=True)
>>> df
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
>>>

Details

  • '\A\s+|\s+\Z' -> '' will act like strip() removing all leading and trailing whitespace:
    • \A\s+ - matches 1 or more whitespace symbols at the start of the string
    • | - or
    • \s+\Z - matches 1 or more whitespace symbols at the end of the string
  • '\n' -> ' ' will replace any newline with a space.
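
To see what the two patterns do on a single value, here is a minimal sketch using Python's standard re module rather than pandas. `strip_and_flatten` is a hypothetical helper name introduced only for this illustration.

```python
import re

def strip_and_flatten(text):
    """Mimic the two replacements on one string."""
    # \A and \Z anchor to the very start and very end of the string,
    # so only leading and trailing whitespace is removed here.
    text = re.sub(r'\A\s+|\s+\Z', '', text)
    # Any newline left at this point is an inner one.
    return text.replace('\n', ' ')

print(repr(strip_and_flatten('  def\nhaha \n')))
```

The inner newline survives the first step and is only converted by the second, which is why both replacements are needed.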
