Creating A Matrix From Pandas Dataframe To Display Connectedness - 2
This is a follow-up question to Creating a matrix from Pandas dataframe to display connectedness. The difference is in the matrix. I have my data in this format in a pandas datafra
Solution 1:
Just for completeness, here's the modified version of my previous answer. Basically, you add a condition when updating the matrix: if edge > node:
import pandas as pd

# I'm assuming you can get your data into a pandas data frame:
data = {'Customer_ID': [1, 1, 1, 2, 2, 2], 'Location': ['A', 'B', 'C', 'A', 'B', 'D']}
df = pd.DataFrame(data)

# Initialize an empty matrix
matrix_size = len(df.groupby('Location'))
matrix = [[0 for col in range(matrix_size)] for row in range(matrix_size)]

# To make life easier, I made a map to go from locations
# to row/col positions in the matrix
location_set = list(set(df['Location'].tolist()))
location_set.sort()
location_map = dict(zip(location_set, range(len(location_set))))

# Group data by customer, and create an adjacency list (dyct) for each
# Update the matrix accordingly
for name, group in df.groupby('Customer_ID'):
    locations = set(group['Location'].tolist())
    dyct = {}
    for i in locations:
        # All other locations visited by this customer
        # (set difference with {i}, so multi-character names also work)
        dyct[i] = list(locations - {i})

    # Loop through the adjacency list and update matrix
    for node, edges in dyct.items():
        for edge in edges:
            # Add this condition to create bottom half of the symmetric matrix
            if edge > node:
                matrix[location_map[edge]][location_map[node]] += 1
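As a cross-check on the loop above, the same lower-triangle counts can be sketched with a customer-by-location indicator table. This is an alternative technique, not part of the answer itself; the names `indicator`, `co_visits` and `lower` are made up for illustration, and the sketch assumes each customer should be counted at most once per location pair.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Customer_ID': [1, 1, 1, 2, 2, 2],
    'Location': ['A', 'B', 'C', 'A', 'B', 'D'],
})

# 1 if the customer was seen at the location at all, else 0
indicator = (pd.crosstab(df['Customer_ID'], df['Location']) > 0).astype(int)

# Entry (i, j) counts the customers seen at both location i and location j
co_visits = indicator.T.dot(indicator)

# Keep only the entries strictly below the diagonal
lower = pd.DataFrame(
    np.tril(co_visits.to_numpy(), k=-1),
    index=co_visits.index,
    columns=co_visits.columns,
)
print(lower)
```

Because `pd.crosstab` sorts its labels, the rows and columns come out in the same A, B, C, D order as `location_map` in the loop version.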
Solution 2:
The change is 2 characters in this line:
overlaps += [(l2, l1, 0) for l1, l2, _ in overlaps]
from
overlaps += [(l2, l1, c) for l1, l2, c in overlaps]
The purpose of this line in the first version was to populate the symmetric values. If you want a lower triangular matrix instead, simply fill the mirrored keys with zeros.
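To see what the two variants of that line do, here is a minimal sketch on a made-up list of `(Loc1, Loc2, Count)` tuples. The list `pairs` and the names `symmetric` and `one_sided` are hypothetical and only serve the illustration.

```python
# Hypothetical (Loc1, Loc2, Count) tuples, for illustration only
pairs = [('A', 'B', 2), ('A', 'C', 1)]

# First version: mirror each pair and keep its count (symmetric entries)
symmetric = pairs + [(l2, l1, c) for l1, l2, c in pairs]

# Modified version: mirror each pair but store a zero (one-sided entries)
one_sided = pairs + [(l2, l1, 0) for l1, l2, _ in pairs]

print(symmetric)
print(one_sided)
```

In both cases every ordered pair of distinct locations gets a key, so the later reshaping step has a value for each off-diagonal cell; only the mirrored values differ.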
Full code with the change applied:
import pandas as pd
from collections import Counter

df = pd.DataFrame({
    'Customer_ID': ['Alpha', 'Alpha', 'Alpha', 'Beta', 'Beta', 'Beta'],
    'Location_ID': ['A', 'B', 'C', 'A', 'B', 'D'],
})

# Count how often each customer appears at each location
ctrs = {location: Counter(gp.Customer_ID) for location, gp in df.groupby('Location_ID')}
# ctrs is now:
# {'A': Counter({'Alpha': 1, 'Beta': 1}),
#  'B': Counter({'Alpha': 1, 'Beta': 1}),
#  'C': Counter({'Alpha': 1}),
#  'D': Counter({'Beta': 1})}

ctrs = list(ctrs.items())
# For every pair of distinct locations, count the customers they share
overlaps = [(loc1, loc2, sum(min(ctr1[k], ctr2[k]) for k in ctr1))
            for i, (loc1, ctr1) in enumerate(ctrs, start=1)
            for (loc2, ctr2) in ctrs[i:] if loc1 != loc2]
# Fill the mirrored (loc2, loc1) keys with zeros instead of the counts
overlaps += [(l2, l1, 0) for l1, l2, _ in overlaps]

df2 = pd.DataFrame(overlaps, columns=['Loc1', 'Loc2', 'Count'])
df2 = df2.set_index(['Loc1', 'Loc2'])
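# The counts are stored under (Loc1, Loc2) with Loc1 < Loc2, so move Loc1
# to the columns to make them land below the diagonal
df2 = df2.unstack(level='Loc1').fillna(0).astype(int)
#       Count
# Loc1      A  B  C  D
# Loc2
# A         0  0  0  0
# B         2  0  0  0
# C         1  1  0  0
# D         1  1  0  0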