
1 Column Has An Int. Another Has A List Of Ints. How To Convert Dataframe Into A Numpy Rec Array Of These Pairs?

This is a follow-up to the question "Best data type (in terms of speed/RAM) for millions of pairs of a single int paired with a batch (2 to 100) of ints", which asks what's the best...

Solution 1:

In [106]: d = pd.DataFrame([[1, [1,2,3]],[2,[3,4]], [3, [0,3,2,4]]],columns=['x','y'])                       
In [107]: d                                                                                                  
Out[107]: 
   x             y
0  1     [1, 2, 3]
1  2        [3, 4]
2  3  [0, 3, 2, 4]

Using pandas' own method:

In [108]: d.to_records()                                                                                     
Out[108]: 
rec.array([(0, 1, list([1, 2, 3])), (1, 2, list([3, 4])),
           (2, 3, list([0, 3, 2, 4]))],
          dtype=[('index', '<i8'), ('x', '<i8'), ('y', 'O')])

and without the index:

In [110]: d.to_records(index=False)                                                                          
Out[110]: 
rec.array([(1, list([1, 2, 3])), (2, list([3, 4])),
           (3, list([0, 3, 2, 4]))],
          dtype=[('x', '<i8'), ('y', 'O')])

In [111]: _['y']
Out[111]: array([list([1, 2, 3]), list([3, 4]), list([0, 3, 2, 4])], dtype=object)
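
As a script rather than an interactive session, the same conversion could look like the minimal sketch below. The names `rec`, `xs`, `ys` and `pairs` are illustrative and not part of the original answer; the sketch only shows that each field of the rec array can be read by key or by attribute.

```python
import pandas as pd

d = pd.DataFrame([[1, [1, 2, 3]], [2, [3, 4]], [3, [0, 3, 2, 4]]],
                 columns=['x', 'y'])

# Convert to a NumPy rec array, dropping the DataFrame index.
rec = d.to_records(index=False)

# Each field is reachable by key or by attribute.
xs = rec['x']
ys = rec.y

# Rebuild the (int, list) pairs from the two fields.
pairs = [(int(x), y) for x, y in zip(xs, ys)]
print(pairs)
```

The `y` field has object dtype, so each element is still an ordinary Python list of whatever length the row had.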

Solution 2:

Data:

import numpy as np
import pandas as pd

data = np.rec.array([( 2955637, np.array([ 2557706,  7612432,  9348232,   462772,  8018521,  1811275,
        9230331,  7023852,  9392270,  4693741,  7854644,  5233547,
       12446986,  9534800,  2133753,  5971332,  2156690, 12031365,
        4433539, 11607217,  3461811,  5361706, 11282946, 14548809,
        8109194,  1199299,  7576507, 12035216,  6635766,  4158077,
        5403991,   212711,  1703853,  2094248,  7005438,   951244,
        6314059, 11616582, 13002385,   761714, 14016603, 14981654,
        8946411, 10050035,   658239,  1693614], dtype=np.int32)),
           (  822302, np.array([ 2579065, 14360524,  4489101, 14753709,  7440511,  2202626,
         504487,  8539709,  6309347,  9028007,  4103133,  6899943,
        9391766,  1104058, 10155666,  2845288, 10488737,  1728141,
        3976034, 13648527,  6125367, 14690826,  7387347,  7766092,
        8717468,  4088448,  2051190,  7914318, 14346922, 13792566,
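       10343601], dtype=np.int32))],
       # Explicit dtype: recent NumPy releases raise on ragged input when left to infer one.
       dtype=[('f0', '<i8'), ('f1', 'O')])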

DataFrame:

df = pd.DataFrame(data)


To np.rec.array:

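d2 = list(zip(df.f0.tolist(), df.f1.tolist()))
# Pass the dtype explicitly; the arrays differ in length, so recent NumPy cannot infer one.
d2 = np.rec.array(d2, dtype=[('f0', '<i8'), ('f1', 'O')])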

Final:


print(type(d2))
# <class 'numpy.recarray'>
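
For comparison, here is a compact end-to-end sketch that runs both routes (zipping the columns, and pandas' `to_records`) on a shortened copy of the data, using only the first few values of each array from above. The names `pair_dtype`, `via_zip`, `via_pandas` and `same` are illustrative assumptions, and the `int64`/`object` field types are a choice made for this sketch rather than something the original answer specifies.

```python
import numpy as np
import pandas as pd

# Shortened copy of the data: first values of each array from the example.
df = pd.DataFrame({
    'f0': [2955637, 822302],
    'f1': [np.array([2557706, 7612432, 9348232], dtype=np.int32),
           np.array([2579065, 14360524], dtype=np.int32)],
})

# One integer field and one object field holding the variable-length array.
pair_dtype = [('f0', np.int64), ('f1', object)]

# Route 1: zip the columns into tuples and build the rec array directly.
via_zip = np.rec.array(list(zip(df['f0'].tolist(), df['f1'].tolist())),
                       dtype=pair_dtype)

# Route 2: let pandas do the conversion.
via_pandas = df.to_records(index=False)

# Both routes should hold the same ints and the same arrays.
same = (via_zip.f0.tolist() == via_pandas.f0.tolist()
        and all(np.array_equal(a, b) for a, b in zip(via_zip.f1, via_pandas.f1)))
print(type(via_zip).__name__, type(via_pandas).__name__, same)
```

In both cases the second field only stores references to the separate `int32` arrays, so the rec array itself stays small while each batch keeps its own length.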
