How To Remove Extra Commas From Data In Python
I have a CSV file from which I am trying to load data into a SQL table containing 2 columns. The fields are separated by commas, but some of the values also contain commas, which are then mistaken for field separators.
Solution 1:
If all your data follows the same format, such as Number Address "000" , "000 abc street, Unit 000", you can split the string, remove the extra comma, and join the pieces back into a string. For example, using the data you gave:
# Original address string in the fixed format shown above.
ori_addr = "Number Address \"12345\" , \"123 abc street, Unit 345\""
# Split on whitespace; the token with the extra comma ("street,") is at index 6.
addr = ori_addr.split()
addr[6] = addr[6].replace(",", "")
# Join the tokens back into a single string.
together_addr = " ".join(addr)
together_addr is now equal to Number Address "12345" , "123 abc street Unit 345". Note that there is no longer a comma between "street" and "Unit".
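If the extra comma does not always land in the same position, hard-coding addr[6] will break. A minimal, more general sketch (not part of the original answer; it assumes the field-separating comma always appears as a standalone "," token, as in the sample line) strips commas from every other token:
# Sketch: remove commas from every token except the standalone field separator.
# Assumption: the field-separating comma is surrounded by spaces, as in the sample data.
def remove_extra_commas(line):
    tokens = line.split()
    cleaned = [t if t == "," else t.replace(",", "") for t in tokens]
    return " ".join(cleaned)

ori_addr = "Number Address \"12345\" , \"123 abc street, Unit 345\""
print(remove_extra_commas(ori_addr))
# Output: Number Address "12345" , "123 abc street Unit 345"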
Solution 2:
Edits:
- Following the user's comments, added the previously failing address to this test. This address loads into the database without issue.
- Added code to store CSV addresses into MySQL.
Answer:
The code below performs the following actions:
- A MySQL database engine (connection) is created.
- Address data (number, address) is read from the CSV file.
- Non-field-separating commas are removed from the source data, and extra whitespace is stripped.
- The edited data is fed into a DataFrame.
- The DataFrame is used to store the data into MySQL.
import csv
import pandas as pd
from sqlalchemy import create_engine
# Set database credentials.
creds = {'usr': 'admin',
         'pwd': '1tsaSecr3t',
         'hst': '127.0.0.1',
         'prt': 3306,
         'dbn': 'playground'}
# MySQL connection string.
connstr = 'mysql+mysqlconnector://{usr}:{pwd}@{hst}:{prt}/{dbn}'
# Create sqlalchemy engine for MySQL connection.
engine = create_engine(connstr.format(**creds))
# Read addresses from the CSV file.
text = list(csv.reader(open('comma_test.csv'), skipinitialspace=True))
# Replace all commas which are not used as field separators.
# Remove additional whitespace.
for idx, row in enumerate(text):
    text[idx] = [i.strip().replace(',', '') for i in row]
# Store data into a DataFrame.
df = pd.DataFrame(data=text, columns=['number', 'address'])
# Write DataFrame to MySQL using the engine (connection) created above.
df.to_sql(name='commatest', con=engine, if_exists='append', index=False)
Source File (comma_test.csv):
"12345" , "123 abc street, Unit 345""10101" , "111 abc street, Unit 111""20202" , "222 abc street, Unit 222""30303" , "333 abc street, Unit 333""40404" , "444 abc street, Unit 444""50505" , "abc DR, UNIT# 123 UNIT 123"
Unedited Data:
['12345 ', '123 abc street, Unit 345']
['10101 ', '111 abc street, Unit 111']
['20202 ', '222 abc street, Unit 222']
['30303 ', '333 abc street, Unit 333']
['40404 ', '444 abc street, Unit 444']
['50505 ', 'abc DR, UNIT# 123 UNIT 123']
Edited Data:
['12345', '123 abc street Unit 345']
['10101', '111 abc street Unit 111']
['20202', '222 abc street Unit 222']
['30303', '333 abc street Unit 333']
['40404', '444 abc street Unit 444']
['50505', 'abc DR UNIT# 123 UNIT 123']
Queried from MySQL:
number  address
12345   123 abc street Unit 345
10101   111 abc street Unit 111
20202   222 abc street Unit 222
30303   333 abc street Unit 333
40404   444 abc street Unit 444
50505   abc DR UNIT# 123 UNIT 123
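For reference, a minimal sketch of how the stored rows could be read back into pandas for verification (assuming the engine and commatest table created in the code above):
# Read the stored rows back from MySQL (assumes the engine and
# commatest table created in the code above).
result = pd.read_sql('SELECT number, address FROM commatest', con=engine)
print(result)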
Acknowledgement:
This is a long-winded approach; however, each step has been broken out intentionally to show clearly what is involved.