
How To Test String Contains Elements In List And Assign The Target Element To Another Column Via Pandas

I have a single-column list of company names. Some of those names contain a country name (e.g., 'China' in 'China A1', 'Finland' in 'C1 in Finland'). I want to extract

Solution 1:

Here's one way using str.extract:

df['Country'] = df['Company name'].str.extract('('+'|'.join(country_list)+')')

       Company name    Country
0          China A1      China
1      Australia-A2  Australia
2        Belgium_C1    Belgium
3   C1  in  Finland    Finland
4    D1  of  Greece     Greece
5  E2  for Pakistan   Pakistan
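
A self-contained sketch of the same idea, for anyone who wants to run it directly. The DataFrame is rebuilt from the output shown above, and the last row ('F3 Unknown') is a made-up entry added only to show what happens when nothing matches. Wrapping each name in re.escape is an extra precaution not in the original answer; it only matters if a country name contains regex metacharacters.

```python
import re
import pandas as pd

country_list = ['China', 'America', 'Greece', 'Pakistan', 'Finland',
                'Belgium', 'Japan', 'British', 'Australia']

# Rows 0-5 mirror the example output; row 6 is a hypothetical non-matching name.
df = pd.DataFrame({'Company name': ['China A1', 'Australia-A2', 'Belgium_C1',
                                    'C1  in  Finland', 'D1  of  Greece',
                                    'E2  for Pakistan', 'F3 Unknown']})

# Build one alternation pattern with a single capture group.
# re.escape guards against names that contain regex metacharacters.
pattern = '(' + '|'.join(re.escape(c) for c in country_list) + ')'

# expand=False returns a Series, which can be assigned directly as a column.
df['Country'] = df['Company name'].str.extract(pattern, expand=False)
print(df)
```

Rows with no matching country end up as NaN in the new column, so they can be filtered or filled afterwards with the usual isna / fillna tools.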

Solution 2:

You need Series.str.extract() here:

pat = r'({})'.format('|'.join(country_list))
# pat-->'(China|America|Greece|Pakistan|Finland|Belgium|Japan|British|Australia)'
df['Country']=df['Company name'].str.extract(pat, expand=False)
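
To see what expand=False changes, here is a minimal sketch on a two-row Series (the sample values are taken from the question's data; the variable names are only for illustration). With a single capture group, the default expand=True gives back a one-column DataFrame, while expand=False gives back a Series.

```python
import pandas as pd

country_list = ['China', 'America', 'Greece', 'Pakistan', 'Finland',
                'Belgium', 'Japan', 'British', 'Australia']

s = pd.Series(['China A1', 'C1  in  Finland'], name='Company name')
pat = r'({})'.format('|'.join(country_list))

as_frame = s.str.extract(pat)                 # expand=True is the default
as_series = s.str.extract(pat, expand=False)  # one group -> Series

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.tolist())
```

Either result can be assigned to a new column here, but the Series form is usually the less surprising one when only a single group is captured.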

Solution 3:

Maybe use findall in case a cell contains more than one country name. findall returns a list of all matches per row, and .str[0] then keeps only the first one:

df["Company name"].str.findall('|'.join(country_list)).str[0]
Out[758]: 
0        China
1    Australia
2      Belgium
3      Finland
4       Greece
5     Pakistan
Name: Company name, dtype: object
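
If the goal is to keep every country found in a cell rather than just the first, one possible variation is to join the list that findall returns. The sample rows below are hypothetical (the second one is invented to contain two countries, the third to contain none), so treat this as a sketch rather than part of the original answer.

```python
import pandas as pd

country_list = ['China', 'America', 'Greece', 'Pakistan', 'Finland',
                'Belgium', 'Japan', 'British', 'Australia']

# Hypothetical data: one match, two matches, no match.
s = pd.Series(['China A1', 'Japan office of Greece B2', 'No country here'],
              name='Company name')

# One list of matches per row, in order of appearance.
matches = s.str.findall('|'.join(country_list))

first = matches.str[0]            # first match only; NaN when the list is empty
joined = matches.str.join(', ')   # all matches as one comma-separated string

print(matches.tolist())
print(joined.tolist())
```

Note that a row without any match gives an empty list from findall, so .str[0] yields NaN there while the joined version yields an empty string.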

Solution 4:

Using str.extract with a regex.

Example:

import pandas as pd
country_list = ['China','America','Greece','Pakistan','Finland','Belgium','Japan','British','Australia']

df = pd.read_csv(filename)  # filename: path to your CSV, which has a Company_name column
df["Country"] = df["Company_name"].str.extract("("+"|".join(country_list)+ ")")
print(df)

Output:

       Company_name    Country
0          China A1      China
1      Australia-A2  Australia
2        Belgium_C1    Belgium
3   C1  in  Finland    Finland
4    D1  of  Greece     Greece
5  E2  for Pakistan   Pakistan
