Table Of Pairwise Frequency Counts In Python

March 20, 2024

I'm completely new to Python; most of my work has been done in R. I would like to know how to solve this problem in Python. Please refer to the link for a clear understanding.

Solution 1:

This can be done with a dictionary, and collections.Counter could handle the counting. However, I will show an analysis using only plain dictionaries and loops. The code could of course be made shorter; I am deliberately showing the expanded version. My Python installation does not have Pandas available, so I am using only basic Python.

# Assume you have a list of (id, tag) tuples called lst
lst.sort()  # sort the list by id
mydict = {}
current_id = None
tags = []
for ids in lst:
    if ids[0] == current_id:
        # Same id as before: pick up the current entry
        tags.append(ids[1])
    else:
        # This is a new id: count the pairs among the previous id's tags
        for elem1 in tags:
            for elem2 in tags:
                if elem1 != elem2:
                    if elem1 not in mydict:
                        mydict[elem1] = {}
                    if elem2 not in mydict[elem1]:
                        mydict[elem1][elem2] = 0
                    mydict[elem1][elem2] += 1
        # Reset the indicators for the next id
        current_id = ids[0]
        tags = [ids[1]]  # must be a list, not a bare string
else:
    # The loop has finished; the tags of the last id still have to be counted
    for elem1 in tags:
        for elem2 in tags:
            if elem1 != elem2:
                if elem1 not in mydict:
                    mydict[elem1] = {}
                if elem2 not in mydict[elem1]:
                    mydict[elem1][elem2] = 0
                mydict[elem1][elem2] += 1
# At this point, mydict holds the full pairwise counts
for tag in mydict.keys():
    print(tag, mydict[tag])

This now gives the tags with the counts and you can format your output by looping over the final dictionary, printing the keys and counts appropriately.
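The Counter approach mentioned at the start of this solution might look roughly like the sketch below. It is not the answerer's code: the function name pairwise_counts and the sample list are made up for illustration, with the sample values chosen to resemble the ids and feature codes that appear in Solution 2. It collects the tags of each id into a set, so a tag repeated under the same id is counted once, which may differ from the loop version above.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical sample input: (id, tag) tuples, invented for illustration.
lst = [
    (5, "PCLI"), (5, "PPLC"),
    (6, "PCLI"), (6, "PPLC"),
    (7, "PCLI"), (7, "PPLC"), (7, "PPL"),
    (8, "PPL"), (9, "PPL"), (10, "PPL"),
]


def pairwise_counts(pairs):
    """Count, for each ordered pair of distinct tags, how many ids have both."""
    tags_by_id = defaultdict(set)
    for item_id, tag in pairs:
        tags_by_id[item_id].add(tag)

    counts = Counter()
    for tags in tags_by_id.values():
        # permutations yields both (a, b) and (b, a), so the table is symmetric
        counts.update(permutations(sorted(tags), 2))
    return counts


counts = pairwise_counts(lst)
labels = sorted({tag for _, tag in lst})

# Print a simple square table, one row per tag.
print("".ljust(6) + "".join(label.rjust(6) for label in labels))
for row in labels:
    print(row.ljust(6) + "".join(str(counts[(row, col)]).rjust(6) for col in labels))
```

A Counter returns 0 for pairs it has never seen, which is why the table can be printed without checking whether each key exists.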

Solution 2:

Here is one way of doing this in Pandas, which uses DataFrames similar to those in R. I am assuming you have a DataFrame df containing your data. (You can read the data from a file using pandas.read_table; see: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_table.html).
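As a rough sketch of the loading step, assuming a tab-separated file with a header row naming the id and featureCode columns: the rows below are invented for illustration (they are not the asker's data) and are read from an in-memory buffer so the example is self-contained. In practice you would pass a file path instead.

```python
import io

import pandas

# Hypothetical tab-separated content, invented for illustration.
rows = [
    "id\tfeatureCode",
    "5\tPCLI", "5\tPPLC",
    "6\tPCLI", "6\tPPLC",
    "7\tPCLI", "7\tPPLC", "7\tPPL",
    "8\tPPL", "9\tPPL", "10\tPPL",
]
raw = "\n".join(rows)

# read_table splits on tabs by default; pass sep=... for other delimiters.
df = pandas.read_table(io.StringIO(raw))
print(df.head())
```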

First, use groupby to group the rows by id.

gps = df.groupby("id")
print(gps.groups)
Out: {5: [0, 1], 6: [2, 3], 7: [4, 5, 6], 8: [7], 9: [8], 10: [9]}

gps.groups maps each id to the row numbers that belong to it.

Next, create the target matrix, using the unique values of featureCode as both the row and the column names.

import pandas

unqFet = list(set(df["featureCode"]))
# Square table of zeros with the unique feature codes as row and column labels
final = pandas.DataFrame(0, index=unqFet, columns=unqFet)
print(final)
Out:
      PCLI  PPLC  PPL
PCLI     0     0    0
PPLC     0     0    0
PPL      0     0    0

Finally, loop over the groups and increment the corresponding cells of the final matrix.

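for g in gps.groups.values():
    for i in range(len(g)):
        for j in range(len(g)):
            if i != j:
                row = df.loc[g[i], "featureCode"]
                col = df.loc[g[j], "featureCode"]
                # Update through .loc; chained indexing such as
                # final[a][b] += 1 may not modify final in recent pandas
                final.loc[row, col] += 1
print(final)
Out:
      PCLI  PPLC  PPL
PCLI     0     3    1
PPLC     3     0    1
PPL      1     1    0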
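The same table can also be built without explicit loops. The sketch below is an alternative technique, not the answerer's method: it builds an id-by-featureCode presence matrix with pandas.crosstab and multiplies it by its own transpose. The DataFrame is made-up sample data, chosen to match the groups and counts shown above; the names presence and cooc are illustrative.

```python
import numpy as np
import pandas

# Hypothetical sample data, invented to match the groups shown in this answer.
df = pandas.DataFrame({
    "id": [5, 5, 6, 6, 7, 7, 7, 8, 9, 10],
    "featureCode": ["PCLI", "PPLC", "PCLI", "PPLC", "PCLI",
                    "PPLC", "PPL", "PPL", "PPL", "PPL"],
})

# Rows are ids, columns are feature codes; a cell is 1 if the id has that code.
presence = (pandas.crosstab(df["id"], df["featureCode"]) > 0).astype(int)

# (codes x ids) times (ids x codes): each cell counts the ids shared by two codes.
cooc = presence.T.dot(presence)

# The diagonal holds the number of ids per code; zero it to match the loop version.
values = cooc.to_numpy().copy()
np.fill_diagonal(values, 0)
cooc = pandas.DataFrame(values, index=cooc.index, columns=cooc.columns)
print(cooc)
```

Because the presence matrix is binary, a feature code that occurs more than once under the same id is counted once here, whereas the loop version compares row positions and would count each occurrence.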
