Find The Median From A CSV File Using Python
I have a CSV file named 'salaries.csv'. The content of the file is as follows:
City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800
Solution 1:
You can use a defaultdict to collect all the salaries for each profession, then take the median of each list.
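import csv
from collections import defaultdict

with open("C:/Users/jimenez/Desktop/a.csv", "r") as f:
    d = defaultdict(list)  # maps each job to the list of its salaries
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        d[row[1]].append(float(row[2]))

for k, v in d.items():
    # sorted(v)[len(v) // 2] is the middle element; for an even number of
    # values it is the upper of the two middle elements, not their mean
    print("{} median is {}".format(k, sorted(v)[len(v) // 2]))
    print("{} average is {}".format(k, sum(v) / len(v)))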
Output (the order of the lines depends on the Python version):
Plumbers median is 500.0
Plumbers average is 475.0
Lawyers median is 700.0
Lawyers average is 628.571428571
Dog catchers median is 400.0
Dog catchers average is 400.0
Doctors median is 800.0
Doctors average is 787.5
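Note that for an even number of values, sorted(v)[len(v) // 2] picks the upper of the two middle values rather than their mean. If you want the conventional median, a minimal sketch using statistics.median from the standard library could look like this. It assumes Python 3 and reads only the four rows shown in the question from an in-memory string, not the full file; median_by_job is a made-up helper name.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Only the rows shown in the question, used here in place of the real file.
SAMPLE = """City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800
"""


def median_by_job(f):
    """Return {job: median salary} for a CSV file object with a header row."""
    salaries = defaultdict(list)
    for row in csv.DictReader(f):  # DictReader consumes the header itself
        salaries[row["Job"]].append(float(row["Salary"]))
    # statistics.median averages the two middle values for even-length lists
    return {job: median(values) for job, values in salaries.items()}


result = median_by_job(io.StringIO(SAMPLE))
for job, value in sorted(result.items()):
    print("{} median is {}".format(job, value))
```

With the real file you would pass the object returned by open() instead of the io.StringIO wrapper.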
Solution 2:
It is easy if you use pandas (http://pandas.pydata.org):
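import pandas as pd

# header=0 marks the first line (City,Job,Salary) as a header so it is not read as data
df = pd.read_csv('test.csv', header=0, names=['City', 'Job', 'Salary'])
# select Salary so the text column (City) is left out of the aggregation
df.groupby('Job')[['Salary']].median()
#               Salary
# Job
# Doctors          800
# Dog catchers     400
# Lawyers          700
# Plumbers         450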
If you want the average instead of the median:
df.groupby('Job')[['Salary']].mean()
#                   Salary
# Job
# Doctors       787.500000
# Dog catchers  400.000000
# Lawyers       628.571429
# Plumbers      475.000000
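If you need both statistics at once, one possible sketch is to pass a list of aggregation names to agg. It assumes pandas is installed and, to stay self-contained, reads only the four rows shown in the question from an in-memory string rather than the full file.

```python
import io

import pandas as pd

# Only the rows shown in the question, used here in place of the real file.
SAMPLE = """City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800
"""

df = pd.read_csv(io.StringIO(SAMPLE))  # the first line supplies the column names

# One row per job, one column per statistic.
summary = df.groupby("Job")["Salary"].agg(["median", "mean"])
print(summary)
```

Selecting the Salary column before aggregating keeps the text column City out of the calculation.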
Solution 3:
If your problem is computing the median, and not inserting everything into a SQL database and scrambling it about, it is just a matter of reading all the lines, grouping the salaries into one list per profession, and getting the median from there. This reduces your hundred-line-magnitude script to:
import csv

professions = {}
with open("sal.csv") as data:
    reader = csv.reader(data)
    next(reader)  # skip the header row (City,Job,Salary)
    for city, profession, salary in reader:
        professions.setdefault(profession.strip(), []).append(int(salary.strip()))

for profession, salaries in sorted(professions.items()):
    # middle element of the sorted list (the upper middle one for an even count)
    print("{}: {}".format(profession, sorted(salaries)[len(salaries) // 2]))
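(For an even number of salaries this picks the upper of the two middle values; give or take "1" on the index and average the two middle values to get the proper median.)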
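To make the index adjustment concrete, here is a small sketch of a median function that handles both odd and even counts. The name proper_median is made up for illustration, and the division assumes Python 3.

```python
def proper_median(values):
    """Median of a non-empty sequence of numbers (hypothetical helper)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        # odd count: there is a single middle element
        return ordered[mid]
    # even count: mean of the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2


print(proper_median([100, 400, 500]))       # 400
print(proper_median([100, 400, 500, 800]))  # 450.0
```

Swapping this in for sorted(salaries)[len(salaries) // 2] only changes the result for professions with an even number of salaries.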