How To Test String Contains Elements In List And Assign The Target Element To Another Column Via Pandas
I have a one column list presenting some company names. Some of those names contain the country names (e.g., 'China' in 'China A1', 'Finland' in 'C1 in Finland'). I want to extract
Solution 1:
Here's one way using str.extract
:
df['Country'] = df['Company name'].str.extract('('+'|'.join(country_list)+')')
Company name Country
0 China A1 China
1 Australia-A2 Australia
2 Belgium_C1 Belgium
3 C1 in Finland Finland
4 D1 of Greece Greece
5 E2 for Pakistan Pakistan
Solution 2:
You need series.str.extract()
here:
pat = r'({})'.format('|'.join(country_list))
# pat-->'(China|America|Greece|Pakistan|Finland|Belgium|Japan|British|Australia)'
df['Country']=df['Company name'].str.extract(pat, expand=False)
Solution 3:
Maybe using findall
in case you have more than one country name in one cell
df["Company name"].str.findall('|'.join(country_list)).str[0]
Out[758]:
0 China
1 Australia
2 Belgium
3 Finland
4 Greece
5 Pakistan
Name: Company name, dtype: object
Solution 4:
Using str.extract
with Regex
Ex:
import pandas as pd
country_list = ['China','America','Greece','Pakistan','Finland','Belgium','Japan','British','Australia']
df = pd.read_csv(filename)
df["Country"] = df["Company_name"].str.extract("("+"|".join(country_list)+ ")")
print(df)
Output:
Company_name Country
0 China A1 China
1 Australia-A2 Australia
2 Belgium_C1 Belgium
3 C1 in Finland Finland
4 D1 of Greece Greece
5 E2 for Pakistan Pakistan
Post a Comment for "How To Test String Contains Elements In List And Assign The Target Element To Another Column Via Pandas"