1 Column Has An Int. Another Has A List Of Ints. How To Convert Dataframe Into A Numpy Rec Array Of These Pairs?
This is a follow-up to the question "Best data type (in terms of speed/RAM) for millions of pairs of a single int paired with a batch (2 to 100) of ints", which asks what's the best ...
Solution 1:
In [106]: d = pd.DataFrame([[1, [1,2,3]],[2,[3,4]], [3, [0,3,2,4]]],columns=['x','y'])
In [107]: d
Out[107]:
   x             y
0  1     [1, 2, 3]
1  2        [3, 4]
2  3  [0, 3, 2, 4]
Using pandas' own method, DataFrame.to_records:
In [108]: d.to_records()
Out[108]:
rec.array([(0, 1, list([1, 2, 3])), (1, 2, list([3, 4])),
           (2, 3, list([0, 3, 2, 4]))],
          dtype=[('index', '<i8'), ('x', '<i8'), ('y', 'O')])
and without the index:
In [110]: d.to_records(index=False)
Out[110]:
rec.array([(1, list([1, 2, 3])), (2, list([3, 4])),
           (3, list([0, 3, 2, 4]))],
          dtype=[('x', '<i8'), ('y', 'O')])
In [111]: _['y']
Out[111]: array([list([1, 2, 3]), list([3, 4]), list([0, 3, 2, 4])], dtype=object)
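The same steps can be written as a standalone script. This is only a minimal sketch built on the toy frame above; the names rec, xs and pairs are made up for illustration. It shows that the fields of the record array can be read either by key or as attributes, and that the (int, list) pairs can be recovered from it.

```python
import pandas as pd

# Same toy frame as above: one int column, one column of Python lists.
d = pd.DataFrame([[1, [1, 2, 3]], [2, [3, 4]], [3, [0, 3, 2, 4]]],
                 columns=['x', 'y'])

# Drop the index so the records hold only the (x, y) pairs.
rec = d.to_records(index=False)

# Fields are reachable by key (rec['x']) or as attributes (rec.x).
xs = rec.x
ys = rec.y  # object array: each element is still a plain Python list

# Recover the pairs as ordinary Python tuples.
pairs = [(int(x), list(y)) for x, y in zip(xs, ys)]
print(pairs)
```

Because 'y' is stored with object dtype, the record array holds references to the original lists rather than a packed integer buffer.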
Solution 2:
Data:
import numpy as np
import pandas as pd
data = np.rec.array([( 2955637, np.array([ 2557706, 7612432, 9348232, 462772, 8018521, 1811275,
9230331, 7023852, 9392270, 4693741, 7854644, 5233547,
12446986, 9534800, 2133753, 5971332, 2156690, 12031365,
4433539, 11607217, 3461811, 5361706, 11282946, 14548809,
8109194, 1199299, 7576507, 12035216, 6635766, 4158077,
5403991, 212711, 1703853, 2094248, 7005438, 951244,
6314059, 11616582, 13002385, 761714, 14016603, 14981654,
8946411, 10050035, 658239, 1693614], dtype=np.int32)),
( 822302, np.array([ 2579065, 14360524, 4489101, 14753709, 7440511, 2202626,
504487, 8539709, 6309347, 9028007, 4103133, 6899943,
9391766, 1104058, 10155666, 2845288, 10488737, 1728141,
3976034, 13648527, 6125367, 14690826, 7387347, 7766092,
8717468, 4088448, 2051190, 7914318, 14346922, 13792566,
10343601], dtype=np.int32))])
DataFrame:
df = pd.DataFrame(data)
To np.rec.array:
d2 = list(zip(df.f0.tolist(), df.f1.tolist()))
d2 = np.rec.array(d2)
Final:
print(type(d2))
>>> <class 'numpy.recarray'>
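A variation on the conversion above spells out the dtype instead of leaving NumPy to infer one from variable-length arrays, which some NumPy versions may refuse to do. This is a hedged sketch, not the answer's own method: the shortened data, the int64/object field types and the names pair_dtype, batches and rec are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the data above (values taken from its first entries).
df = pd.DataFrame({
    "f0": [2955637, 822302],
    "f1": [[2557706, 7612432, 9348232], [2579065, 14360524]],
})

# Assumed layout: one int64 field plus one object field for the batch.
pair_dtype = np.dtype([("f0", np.int64), ("f1", object)])

# Build the object column by hand so every batch becomes an int32 array
# and NumPy never has to guess a shape for the ragged data.
batches = np.empty(len(df), dtype=object)
for i, values in enumerate(df["f1"]):
    batches[i] = np.asarray(values, dtype=np.int32)

out = np.empty(len(df), dtype=pair_dtype)
out["f0"] = df["f0"].to_numpy()
out["f1"] = batches

# View the structured array as a record array for attribute access.
rec = out.view(np.recarray)
print(type(rec), rec.f0, rec.f1[0].dtype)
```

Storing each batch as an int32 array keeps the per-pair memory small, while the object field only holds a reference to it.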