
Table Of Pairwise Frequency Counts In Python

I'm completely new to Python; most of my work has been done in R. I would like to know how to solve this problem in Python. Please refer to the link for a clear understanding.

Solution 1:

This can be set up with a dictionary, or with collections.Counter doing the counting. However, I will show the analysis using the simplest dictionary-and-loop approach. The code could certainly be made shorter; I am deliberately showing the expanded version. My Python installation does not have Pandas available, so I am using only core Python.
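The Counter route mentioned above can be sketched as follows. This is a minimal sketch, assuming the data is a list of (id, tag) tuples as in the expanded code; pair_counts and the sample lst are hypothetical names for illustration. Instead of nested dictionaries it keys the counter on (tag1, tag2) tuples.

```python
from collections import Counter
from itertools import groupby, permutations


def pair_counts(records):
    """Count ordered pairs of distinct tags that share an id.

    records: iterable of (id, tag) tuples (hypothetical helper for illustration).
    """
    counts = Counter()
    # Sort so tuples with the same id are adjacent, then group them by id
    for _, group in groupby(sorted(records), key=lambda rec: rec[0]):
        tags = [tag for _, tag in group]
        # permutations yields every ordered pair of positions i != j
        for a, b in permutations(tags, 2):
            if a != b:  # like the expanded version, skip identical tags
                counts[(a, b)] += 1
    return counts


# Hypothetical sample data
lst = [(1, "x"), (1, "y"), (2, "x"), (2, "y"), (2, "z")]
print(pair_counts(lst))
```

Looking up a pair that never occurred, such as counts[("x", "z")] for a new dataset, simply returns 0, so no "if key not in dict" checks are needed.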

# Assume you have a list of (id, tag) tuples called lst
lst.sort()  # sort by id so tuples with the same id are adjacent
mydict = {}
current_id = None
tags = []
for ids in lst:
    if ids[0] == current_id:
        # Same id as the previous tuple: pick up the current tag
        tags.append(ids[1])
    else:
        # This is a new id: count the pairs among the previous id's tags
        for elem1 in tags:
            for elem2 in tags:
                if elem1 != elem2:
                    if elem1 not in mydict:
                        mydict[elem1] = {}
                    if elem2 not in mydict[elem1]:
                        mydict[elem1][elem2] = 0
                    mydict[elem1][elem2] += 1
        # Reset the indicators for the new id (tags must stay a list)
        current_id = ids[0]
        tags = [ids[1]]
else:
    # The for loop's else clause runs after the loop finishes,
    # so the tags of the last id are processed here
    for elem1 in tags:
        for elem2 in tags:
            if elem1 != elem2:
                if elem1 not in mydict:
                    mydict[elem1] = {}
                if elem2 not in mydict[elem1]:
                    mydict[elem1][elem2] = 0
                mydict[elem1][elem2] += 1
# At this point, mydict holds the full pairwise counts
for tag in mydict:
    print(tag, mydict[tag])

This gives each tag together with the counts of the other tags it shares an id with. You can format the output by looping over the final dictionary and printing the keys and counts as needed.
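One way to do that formatting without Pandas is sketched below. print_matrix is a hypothetical helper that assumes the nested {tag: {other_tag: count}} structure built above and string tags; pairs that never co-occurred are missing from the dictionary, so it prints 0 for them.

```python
def print_matrix(mydict):
    """Print a nested {tag: {other_tag: count}} dict as a square table and return it."""
    # Every tag that appears as a row key or a column key, in sorted order
    tags = sorted(set(mydict) | {t for row in mydict.values() for t in row})
    width = max((len(t) for t in tags), default=0) + 2
    # Header line: blank corner cell followed by the column tags
    lines = [" " * width + "".join(t.rjust(width) for t in tags)]
    for row_tag in tags:
        row = mydict.get(row_tag, {})
        # Pairs absent from mydict never co-occurred, so show 0
        cells = "".join(str(row.get(col_tag, 0)).rjust(width) for col_tag in tags)
        lines.append(row_tag.ljust(width) + cells)
    table = "\n".join(lines)
    print(table)
    return table


# Hypothetical counts in the same nested shape as mydict
sample = {"x": {"y": 2}, "y": {"x": 2, "z": 1}, "z": {"y": 1}}
print_matrix(sample)
```

Sorting the tags makes the row and column order reproducible, which plain dictionary iteration over the keys would not guarantee across different inputs.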

Solution 2:

Here is one way of doing this in Pandas, whose DataFrames are similar to R's data frames. I am assuming you have a DataFrame df containing your data. (You can read the data from a file using pandas.read_table; see this: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_table.html).
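A minimal sketch of loading such a df with pandas.read_table, which splits on tabs by default. The column names id and featureCode come from the code in this answer; the rows are invented for illustration (chosen so they reproduce the groups and counts printed in this answer), and the asker's real data may differ. A file path can be passed instead of the StringIO buffer.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample data
raw = """id\tfeatureCode
5\tPCLI
5\tPPLC
6\tPCLI
6\tPPLC
7\tPCLI
7\tPPLC
7\tPPL
8\tPPL
9\tPCLI
10\tPPLC
"""

# read_table uses a tab as the default separator
df = pd.read_table(io.StringIO(raw))
print(df)
```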

First, use groupby to group the rows by id.

gps = df.groupby("id")
print(gps.groups)
Out: {5: [0, 1], 6: [2, 3], 7: [4, 5, 6], 8: [7], 9: [8], 10: [9]}

groups maps each id to the row labels (index values) of the rows that share that id.
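To see what each group actually contains, the sketch below iterates over gps.groups and looks up the codes for each id's rows, then collects the same information in one step with groupby(...).apply(list). The small df is hypothetical sample data laid out like the groups output above.

```python
import pandas as pd

# Hypothetical sample data with the same id layout as the groups output
df = pd.DataFrame({
    "id": [5, 5, 6, 6, 7, 7, 7, 8, 9, 10],
    "featureCode": ["PCLI", "PPLC", "PCLI", "PPLC", "PCLI",
                    "PPLC", "PPL", "PPL", "PCLI", "PPLC"],
})

gps = df.groupby("id")
# Each value in gps.groups is an Index of row labels for one id
for key, rows in gps.groups.items():
    print(key, rows.tolist(), df.loc[rows, "featureCode"].tolist())

# The list of codes for every id, as a Series indexed by id
tags_per_id = gps["featureCode"].apply(list)
print(tags_per_id)
```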

Next, create the target matrix, whose row and column labels are the unique values in the featureCode column.

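import pandas

unqFet = list(set(df["featureCode"]))  # unique feature codes (set order is arbitrary)
# Square integer matrix of zeros, one row and one column per feature code
final = pandas.DataFrame(0, columns=unqFet, index=unqFet)
print(final)
Out:
      PCLI  PPLC  PPL
PCLI     0     0    0
PPLC     0     0    0
PPL      0     0    0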

Finally, loop over the groups and increment the matching cells in the final matrix.

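for g in gps.groups.values():
    # g holds the row labels of all rows that share one id
    for i in range(len(g)):
        for j in range(len(g)):
            if i != j:
                # .loc[row, column] updates the cell in place (avoids chained assignment)
                final.loc[df.loc[g[i], "featureCode"], df.loc[g[j], "featureCode"]] += 1
print(final)
Out:
      PCLI  PPLC  PPL
PCLI     0     3    1
PPLC     3     0    1
PPL      1     1    0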
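If the per-group Python loop turns out to be slow on large data, a vectorized alternative (a different technique from the loop above, not part of the original answer) is to join the table to itself on id and count code pairs with pd.crosstab. This is a sketch assuming a df with id and featureCode columns and a unique row index; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data in the same shape as df
df = pd.DataFrame({
    "id": [5, 5, 6, 6, 7, 7, 7, 8, 9, 10],
    "featureCode": ["PCLI", "PPLC", "PCLI", "PPLC", "PCLI",
                    "PPLC", "PPL", "PPL", "PCLI", "PPLC"],
})

# Keep each row's original label in an "index" column
rows = df.reset_index()
# Self-join on id: every ordered pair of rows sharing an id, including self-pairs
pairs = rows.merge(rows, on="id", suffixes=("_a", "_b"))
# Drop self-pairs, matching the i != j test in the loop
pairs = pairs[pairs["index_a"] != pairs["index_b"]]

# Count each (code_a, code_b) combination; codes that never pair get 0
codes = sorted(df["featureCode"].unique())
final = (
    pd.crosstab(pairs["featureCode_a"], pairs["featureCode_b"])
    .reindex(index=codes, columns=codes, fill_value=0)
)
print(final)
```

The self-join creates k * k rows for an id with k rows, so memory grows quadratically with group size; for many small groups that is usually fine. Sorting the codes also gives a reproducible row and column order, unlike list(set(...)).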
