How to split column data and create new DataFrame with multiple columns

I'd like to split the data in the following DataFrame

    import numpy as np
    import pandas as pd

    # list(range(8)) is needed on Python 3, where a range cannot be multiplied
    df = pd.DataFrame(data={'per': np.repeat([10,20,30], 32),
                            'r': 12*list(range(8)),
                            'cnt': np.random.randint(300, 400, 96)})
    df

        cnt  per  r
    0   355   10  0
    1   359   10  1
    2   347   10  2
    3   390   10  3
    4   304   10  4
    5   306   10  5
    ..  ...  ... ..
    87  357   30  7
    88  371   30  0
    89  396   30  1
    90  357   30  2
    91  353   30  3
    92  306   30  4
    93  301   30  5
    94  329   30  6
    95  312   30  7

    [96 rows x 3 columns]

such that for each r value a new column cnt_r{r} exists in a new DataFrame, while also keeping the corresponding per column.

The following piece of code almost does what I want, except that it loses the per column:

    pd.DataFrame({'cnt_r{}'.format(i): df[df.r==i].reset_index()['cnt'] for i in range(8)})

        cnt_r0  cnt_r1  cnt_r2  cnt_r3  cnt_r4  cnt_r5  cnt_r6  cnt_r7
    0      355     359     347     390     304     306     366     310
    1      394     331     384     312     380     350     318     396
    2      340     336     360     389     352     370     353     319
    ...
    9      341     300     386     334     386     314     358     326
    10     357     386     311     382     356     339     375     357
    11     371     396     357     353     306     301     329     312

I need a way to build the following DataFrame:

        per  cnt_r0  cnt_r1  cnt_r2  cnt_r3  cnt_r4  cnt_r5  cnt_r6  cnt_r7
    0    10     355     359     347     390     304     306     366     310
    1    10     394     331     384     312     380     350     318     396
    2    10     340     336     360     389     352     370     353     319
    ...
    7    20     384     385     376     323     345     339     339     347
    9    30     341     300     386     334     386     314     358     326
    10   30     357     386     311     382     356     339     375     357
    11   30     371     396     357     353     306     301     329     312

Note that, by construction, my dataset has the same number of values for each r within every per value. My real dataset is also much larger than this example (about 800 million records).

Many thanks for your time.

1 Answer

If possible, use reshape to build a 2D array from the cnt values and then insert the new per column:

    np.random.seed(1256)
    df = pd.DataFrame(data={'per': np.repeat([10,20,30], 32),
                            'r': 12*list(range(8)),
                            'cnt': np.random.randint(300, 400, 96)})

    df1 = pd.DataFrame(df['cnt'].values.reshape(-1, 8)).add_prefix('cnt_r')
    df1.insert(0, 'per', np.repeat([10,20,30], 4))
    print (df1)

        per  cnt_r0  cnt_r1  cnt_r2  cnt_r3  cnt_r4  cnt_r5  cnt_r6  cnt_r7
    0    10     365     358     305     311     393     343     340     313
    1    10     393     319     358     351     322     387     316     359
    2    10     360     301     337     333     322     337     393     396
    3    10     320     344     325     310     338     381     314     339
    4    20     323     305     342     340     343     319     332     371
    5    20     398     308     350     320     340     319     305     369
    6    20     344     340     345     332     373     334     304     331
    7    20     323     349     301     334     344     374     300     336
    8    30     357     375     396     354     309     391     304     334
    9    30     311     395     372     359     370     342     351     330
    10   30     378     302     306     341     308     392     387     332
    11   30     350     373     316     376     338     351     398     304
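The reshape approach depends on the row order: r has to cycle through all its values in order, and per has to stay constant within each cycle. A minimal sketch of how that assumption could be checked before reshaping is shown here, with the per column read from the data instead of being hard-coded. The names n_r, r_blocks, per_blocks, layout_ok and wide are illustrative only and do not come from the answer.

```python
import numpy as np
import pandas as pd

np.random.seed(1256)
df = pd.DataFrame({'per': np.repeat([10, 20, 30], 32),
                   'r': 12 * list(range(8)),
                   'cnt': np.random.randint(300, 400, 96)})

# Number of distinct r values; every output row holds one full cycle of them.
n_r = df['r'].nunique()

# View r and per as blocks of n_r consecutive rows.
r_blocks = df['r'].to_numpy().reshape(-1, n_r)
per_blocks = df['per'].to_numpy().reshape(-1, n_r)

# Assumption being tested: each block reads 0, 1, ..., n_r - 1 in r,
# and per does not change inside a block.
layout_ok = bool((r_blocks == np.arange(n_r)).all()
                 and (per_blocks == per_blocks[:, [0]]).all())
if not layout_ok:
    raise ValueError('rows are not laid out in regular r cycles; reshape would misalign')

# Same reshape as in the answer, with per taken from the first column of each block.
wide = pd.DataFrame(df['cnt'].to_numpy().reshape(-1, n_r)).add_prefix('cnt_r')
wide.insert(0, 'per', per_blocks[:, 0])
print(wide.head())
```

If the check fails on real data, the rows would need to be sorted first, or a label-based reshape would be the safer choice.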

Or use cumcount to create new groups, then reshape with set_index and unstack:

    df = (df.set_index([df.groupby('r').cumcount(), 'per','r'])['cnt']
            .unstack()
            .add_prefix('cnt_r')
            .reset_index(level=1)
            .rename_axis(None, axis=1))
    print (df)

        per  cnt_r0  cnt_r1  cnt_r2  cnt_r3  cnt_r4  cnt_r5  cnt_r6  cnt_r7
    0    10     365     358     305     311     393     343     340     313
    1    10     393     319     358     351     322     387     316     359
    2    10     360     301     337     333     322     337     393     396
    3    10     320     344     325     310     338     381     314     339
    4    20     323     305     342     340     343     319     332     371
    5    20     398     308     350     320     340     319     305     369
    6    20     344     340     345     332     373     334     304     331
    7    20     323     349     301     334     344     374     300     336
    8    30     357     375     396     354     309     391     304     334
    9    30     311     395     372     359     370     342     351     330
    10   30     378     302     306     341     308     392     387     332
    11   30     350     373     316     376     338     351     398     304
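To make the role of cumcount more concrete, here is a small sketch that builds the row counter separately and then cross-checks the unstack result against a plain reshape of cnt. The names row_id, wide and same are illustrative, and the comparison assumes the regular row order of the example data.

```python
import numpy as np
import pandas as pd

np.random.seed(1256)
df = pd.DataFrame({'per': np.repeat([10, 20, 30], 32),
                   'r': 12 * list(range(8)),
                   'cnt': np.random.randint(300, 400, 96)})

# cumcount numbers the occurrences of each r value: 0 for the first row
# with a given r, 1 for the second, and so on. That number becomes the
# row label of the wide table.
row_id = df.groupby('r').cumcount()

wide = (df.set_index([row_id, 'per', 'r'])['cnt']
          .unstack()                      # r values become columns
          .add_prefix('cnt_r')
          .reset_index(level=1)           # move per back into a column
          .rename_axis(None, axis=1))     # drop the leftover 'r' columns name

# Cross-check: with regularly ordered rows, both approaches give the same values.
same = np.array_equal(wide.filter(like='cnt_r').to_numpy(),
                      df['cnt'].to_numpy().reshape(-1, 8))
print(same)
```

Because this variant aligns values by label instead of by position, it should tolerate rows that are not stored in strict r cycles, at the cost of building a MultiIndex.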

Thank you so much for the enlightening answers! The problem I faced (on my dataset) with the first answer is with the insert. I had to create the per column with the following code: df1.insert(0, 'per', df['per'].values.reshape(-1,8).transpose()[0]). Otherwise the second solution works like a charm.
– Claudio
Jul 2 at 14:16

@Claudio - Hmmm, it depends on the real data. Is it not possible to use np.repeat?
– jezrael
Jul 2 at 14:18

you are too fast... :D
– Claudio
Jul 2 at 14:18

@Claudio - OK, I can check the first solution
– jezrael
Jul 2 at 14:20

Hmmm, for me your code works, and so does the similar df1.insert(0, 'per', df['per'].values.reshape(-1,8)[:, [0]])
– jezrael
Jul 2 at 14:22
