How to manage Numpy arrays in Pandas DataFrames
Let's assume one has a DataFrame with some integer values, plus some arrays defined as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])
array_a = np.arange(5)
array_b = np.arange(7)
df['array_a'] = df['rand_int'].apply(lambda x: array_a[:x])
df['array_b'] = df['rand_int'].apply(lambda x: array_b[:x])
Some questions which can help me understand how to manage Numpy arrays with Pandas DataFrames:
array_diff
Not only that, but also define another column in df which is the np.setdiff1d between the rows in array_a and array_b. Thank you
– espogian
Jul 2 at 7:24
1 Answer
I'd say it's better to work with NumPy and import data into the dataframe as a last step.
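A minimal sketch of that NumPy-first approach, assuming the outer product really is what is wanted. The variable names `a`, `b` and `diff` and the seed are only illustrative; the DataFrame is built once, at the end.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)  # illustrative seed, only for reproducibility
rand_int = rng.randint(0, 100, size=5)

# Do all the array work in plain NumPy first.
a = np.outer(rand_int, np.arange(5))  # shape (5, 5)
b = np.outer(rand_int, np.arange(7))  # shape (5, 7)

# Row-wise set difference: values present in a row of b but not in the same row of a.
diff = [np.setdiff1d(row_b, row_a) for row_a, row_b in zip(a, b)]

# Only now wrap everything in a DataFrame; list(a) yields one 1-D array per row.
df = pd.DataFrame({'rand_int': rand_int, 'a': list(a), 'b': list(b), 'd': diff})
```

Each cell of `a`, `b` and `d` then holds a 1-D array, and the columns have dtype object.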
Anyway, here's a solution that stores the arrays in the dataframe step by step. I'm not sure you actually want the outer product; it would be great if you could post the expected result.
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])
>>> df
   rand_int
0        51
1        92
2        14
3        71
4        60
df['a'] = np.split(np.outer(df['rand_int'], np.arange(5)), 5)
df['b'] = np.split(np.outer(df['rand_int'], np.arange(7)), 5)
>>> df
   rand_int                         a                                   b
0        51  [[0, 51, 102, 153, 204]]  [[0, 51, 102, 153, 204, 255, 306]]
1        92  [[0, 92, 184, 276, 368]]  [[0, 92, 184, 276, 368, 460, 552]]
2        14     [[0, 14, 28, 42, 56]]       [[0, 14, 28, 42, 56, 70, 84]]
3        71  [[0, 71, 142, 213, 284]]  [[0, 71, 142, 213, 284, 355, 426]]
4        60  [[0, 60, 120, 180, 240]]  [[0, 60, 120, 180, 240, 300, 360]]
df['d'] = df.b.combine(df.a, func=np.setdiff1d)
>>> df['d']
0    [255, 306]
1    [460, 552]
2      [70, 84]
3    [355, 426]
4    [300, 360]
Name: d, dtype: object
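To make the `combine` step easier to follow, here is a small self-contained sketch on toy data (the two Series below are made up for illustration). `Series.combine` calls the given function on each pair of aligned elements, so with `np.setdiff1d` each result holds the sorted, unique values that are in the first array but not in the second.

```python
import numpy as np
import pandas as pd

# Toy object-dtype Series whose elements are NumPy arrays (illustrative data).
b = pd.Series([np.array([0, 1, 2, 3]), np.array([5, 6, 7])])
a = pd.Series([np.array([0, 1]), np.array([7])])

# combine calls func(b_elem, a_elem) for every pair of aligned elements.
d = b.combine(a, func=np.setdiff1d)

# The same thing written as an explicit loop, for comparison.
d_loop = [np.setdiff1d(x, y) for x, y in zip(b, a)]
```

Here `d[0]` contains 2 and 3, and `d[1]` contains 5 and 6. The order of the arguments matters, since `np.setdiff1d(x, y)` is not symmetric.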
Note that np.split leaves an extra dimension; I'm not sure whether this can be avoided. You might want to remove it with np.squeeze:
>>> df['a'].apply(np.squeeze)
0    [0, 51, 102, 153, 204]
1    [0, 92, 184, 276, 368]
2       [0, 14, 28, 42, 56]
3    [0, 71, 142, 213, 284]
4    [0, 60, 120, 180, 240]
Name: a, dtype: object
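One possible way to avoid the extra dimension in the first place, sketched here as an alternative rather than as part of the original solution, is to turn the 2-D outer product into a list of its rows instead of splitting it. Iterating over a 2-D array yields 1-D rows, so no squeeze is needed afterwards. The name `outer` is only illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])

outer = np.outer(df['rand_int'], np.arange(5))  # shape (5, 5)

# np.split(outer, 5) yields five arrays of shape (1, 5);
# list(outer) yields five 1-D arrays of shape (5,), one per row.
df['a'] = list(outer)
```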
Really helpful, thanks! This addresses my need. Unfortunately I did not get the notification; apologies for the late reply.
– espogian
Jul 5 at 9:37
So you want to multiply both array a and array b by the corresponding value in rand_int?
– user3483203
Jul 2 at 7:20