
Keep Elements With Pattern In Pandas Series Without Converting Them To List

I have the following dataframe: df = pd.DataFrame(['Air type:1, Space kind:2, water', 'something, Space blu:3, somethingelse'], columns = ['A']) and I want to create a new column

Solution 1:

You can use pd.Series.str.findall here; it returns a list of all matches for each row.

df['new'] = df['A'].str.findall(r'\w+:\w+')

                                 A               new
0            type:1, kind:2, water  [type:1, kind:2]
1  something, blu:3, somethingelse           [blu:3]

EDIT:

When the keys contain multiple words, match everything up to the colon instead and join the matches into a single string:

df['new'] = df['A'].str.findall(r'[^\s,][^:,]+:[^:,]+').str.join(', ')

                                      A                       new
0        Air type:1, Space kind:2, water  Air type:1, Space kind:2
1  something, Space blu:3, somethingelse               Space blu:3
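To see what the multi-word pattern captures, here is a minimal sketch using only Python's re module on the two sample strings from the question. The names MULTI_WORD, rows, joined and short_key are illustrative and not part of the original answer. It also shows one edge case: the pattern requires at least two characters before the colon.

```python
import re

# Pattern from the EDIT above, written as a raw string.
MULTI_WORD = re.compile(r'[^\s,][^:,]+:[^:,]+')

rows = [
    'Air type:1, Space kind:2, water',
    'something, Space blu:3, somethingelse',
]

# Join the matches per row, mirroring .str.findall(...).str.join(', ').
joined = [', '.join(MULTI_WORD.findall(row)) for row in rows]
print(joined)  # ['Air type:1, Space kind:2', 'Space blu:3']

# Edge case: [^\s,] followed by [^:,]+ needs two characters before the
# colon, so a single-character key such as 'a' in 'a:1' is skipped.
short_key = MULTI_WORD.findall('a:1, bb:2')
print(short_key)  # ['bb:2']
```

If single-character keys can occur in your data, the pattern would need adjusting; the sample data here does not contain any.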

Solution 2:

You can use findall with join:

import pandas as pd
df = pd.DataFrame(["type:1, kind:2, water", "something, blu:3, somethingelse"], columns = ['A'])
df['new'] = df['A'].str.findall(r'[^\s:,]+:[^\s,]+').str.join(', ')
df['new']
# => 0    type:1, kind:2
# => 1             blu:3

The regex matches

  • [^\s:,]+ - one or more chars other than whitespace, : and ,
  • : - a colon
  • [^\s,]+ - one or more chars other than whitespace and ,.

See the regex demo.
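The three parts of the regex can also be checked locally with Python's re module. This is a small sketch; KEY_VALUE, single_word and multi_word are illustrative names. Note that because whitespace is excluded from the key, a multi-word key such as "Air type" is reduced to its last word.

```python
import re

# key (no whitespace, colon or comma), a colon, value (no whitespace or comma)
KEY_VALUE = re.compile(r'[^\s:,]+:[^\s,]+')

# Single-word keys, as in this solution's sample data.
single_word = KEY_VALUE.findall('type:1, kind:2, water')
print(single_word)  # ['type:1', 'kind:2']

# Multi-word keys: only the word directly before the colon is kept.
multi_word = KEY_VALUE.findall('Air type:1, Space kind:2, water')
print(multi_word)  # ['type:1', 'kind:2']
```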

The .str.join(', ') call concatenates the matches found in each row, separated by a comma and a space, so the column holds strings rather than lists.
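One thing worth checking is what happens to rows with no matches at all. The sketch below assumes a hypothetical second row, 'no pairs here', that is not in the original data: findall returns an empty list for it, and joining an empty list gives an empty string.

```python
import pandas as pd

# The second row is a made-up example with no key:value pairs.
s = pd.Series(['type:1, kind:2, water', 'no pairs here'])

matches = s.str.findall(r'[^\s:,]+:[^\s,]+')  # one list per row
joined = matches.str.join(', ')               # one string per row

print(joined.tolist())  # ['type:1, kind:2', '']
```

If you would rather have a missing value than an empty string for such rows, something like joined.replace('', pd.NA) is one option.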
