
Find The Median From A Csv File Using Python

I have a CSV file named 'salaries.csv'. The content of the file is as follows:

City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800

Solution 1:

You can use a defaultdict to collect all the salaries for each profession, then take the median of each list.

import csv
from collections import defaultdict
from statistics import median

with open("C:/Users/jimenez/Desktop/a.csv", "r", newline="") as f:
    d = defaultdict(list)
    reader = csv.reader(f)
    next(reader)  # skip the header row (City,Job,Salary)
    for row in reader:
        # row is [city, job, salary]; group salaries by job
        d[row[1]].append(float(row[2]))

# median() averages the two middle values when a job has an even count
for k, v in sorted(d.items()):
    print("{} median is {:.2f}".format(k, median(v)))
    print("{} average is {:.2f}".format(k, sum(v) / len(v)))

Output:

Doctors median is 800.00
Doctors average is 787.50
Dog catchers median is 400.00
Dog catchers average is 400.00
Lawyers median is 700.00
Lawyers average is 628.57
Plumbers median is 450.00
Plumbers average is 475.00
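To try the defaultdict approach without a file on disk, the sketch below feeds the four sample rows from the question through `io.StringIO` (standing in for salaries.csv) and uses `csv.DictReader`, which consumes the header row automatically. The function name `job_stats` is hypothetical, chosen only for this illustration; `statistics.median` averages the two middle values when a job has an even number of salaries.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, median

# Sample rows from the question; io.StringIO stands in for the real salaries.csv.
SAMPLE = """City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800
"""


def job_stats(f):
    """Return {job: (median, mean)} for a City,Job,Salary CSV file object (hypothetical helper)."""
    salaries = defaultdict(list)
    # DictReader reads the header row and keys each data row by column name.
    for row in csv.DictReader(f):
        salaries[row["Job"].strip()].append(float(row["Salary"]))
    # Sort by job name so the output order is predictable.
    return {job: (median(v), mean(v)) for job, v in sorted(salaries.items())}


for job, (med, avg) in job_stats(io.StringIO(SAMPLE)).items():
    print("{}: median {}, average {}".format(job, med, avg))
# Doctors: median 650.0, average 650.0
# Lawyers: median 400.0, average 400.0
# Plumbers: median 100.0, average 100.0
```

With only these four rows, Doctors has two salaries, so its median is the average of 500 and 800.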

Solution 2:

It is easy if you use pandas (http://pandas.pydata.org):

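import pandas as pd

# header=0 drops the file's own City,Job,Salary header row so the names below replace it
df = pd.read_csv('test.csv', header=0, names=['City', 'Job', 'Salary'])
# Select only the numeric Salary column; pandas 2.x raises a TypeError on the text City column otherwise
df.groupby('Job')[['Salary']].median()

#               Salary
# Job
# Doctors        800.0
# Dog catchers   400.0
# Lawyers        700.0
# Plumbers       450.0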

If you want the average instead of the median:

df.groupby('Job')[['Salary']].mean()

#                   Salary
# Job
# Doctors       787.500000
# Dog catchers  400.000000
# Lawyers       628.571429
# Plumbers      475.000000

Solution 3:

If your problem is computing the median, and not inserting everything into a SQL database and shuffling it around, it is just a matter of reading all the lines, grouping the salaries for each profession in a list, and taking the median from there. This reduces your hundred-line script to:

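import csv

professions = {}

with open("sal.csv", newline="") as data:
    reader = csv.reader(data)
    next(reader)  # skip the City,Job,Salary header row, as in the question's file
    for city, profession, salary in reader:
        # group salaries by profession, trimming stray whitespace
        professions.setdefault(profession.strip(), []).append(int(salary.strip()))

for profession, salaries in sorted(professions.items()):
    print("{}: {}".format(profession, sorted(salaries)[len(salaries) // 2]))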

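(This indexes the upper of the two middle elements, so for an even number of salaries it is off by one position: the proper median is the average of the two middle values of the sorted list.)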
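The off-by-one issue only matters for even-length lists. A minimal sketch of a median that handles both cases is shown here; `median_of` is a hypothetical name used for illustration, and the standard library's `statistics.median` (Python 3.4+) applies the same rule.

```python
def median_of(values):
    """Median of a non-empty list of numbers (hypothetical helper)."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2:  # odd count: the single middle element
        return s[mid]
    # even count: average the two middle elements, s[mid - 1] and s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(median_of([100, 400, 500]))       # → 400
print(median_of([100, 400, 500, 800]))  # → 450.0
```

In the script above, replacing `sorted(salaries)[len(salaries) // 2]` with `median_of(salaries)` gives the exact median for every profession.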
