How to collapse multiple columns into one in pandas





I have a pandas DataFrame of users and categories, but the values for those categories are spread across multiple columns.


| user | category | val1 | val2 | val3 |
| ------ | ------------------| -----| ---- | ---- |
| user 1 | c1 | 3 | NA | None |
| user 1 | c2 | NA | 4 | None |
| user 1 | c3 | NA | NA | 7 |
| user 2 | c1 | 5 | NA | None |
| user 2 | c2 | NA | 7 | None |
| user 2 | c3 | NA | NA | 2 |



I want to collapse the values into a single column:


| user | category | value|
| ------ | ------------------| -----|
| user 1 | c1 | 3 |
| user 1 | c2 | 4 |
| user 1 | c3 | 7 |
| user 2 | c1 | 5 |
| user 2 | c2 | 7 |
| user 2 | c3 | 2 |



Ultimately, I want to get a matrix like the following:


np.array([[3, 4, 7], [5, 7, 2]])





How do you get 2 for the final row, since all values are null there?
– jpp
Jun 29 at 14:40







Pretty sure that was a typo and the 2 in the val3 column should be dropped down.
– piRSquared
Jun 29 at 14:41







Edited it. yes it was a typo
– hedebyhedge
Jul 2 at 10:29




3 Answers



You can use pd.DataFrame.bfill to backfill values across the selected columns and then take the first column. However, I'm not sure how you derive 2 for the final value, since the final row has no non-null values.


val_cols = ['val1', 'val2', 'val3']

df['value'] = pd.to_numeric(df[val_cols].bfill(axis=1).iloc[:, 0], errors='coerce')

print(df)

user category val1 val2 val3 value
0 user 1 c1 3.0 NaN None 3.0
1 user 1 c2 NaN 4.0 None 4.0
2 user 1 c3 NaN NaN 7 7.0
3 user 2 c1 5.0 NaN None 5.0
4 user 2 c2 NaN 7.0 2 7.0
5 user 2 c3 NaN NaN None NaN
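For reference, here is a minimal self-contained sketch of the bfill approach, run against the data as it stands after the typo fix in the question. It assumes the missing cells are real NaN values rather than the strings "NA" or "None", and the final pivot step is one possible way to reach the matrix the question asks for, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question (after the typo fix); missing cells assumed to be NaN.
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

val_cols = ["val1", "val2", "val3"]

# Backfill along each row, then keep the first column:
# that column now holds the first non-null value of every row.
df["value"] = df[val_cols].bfill(axis=1).iloc[:, 0]

# One row per user, one column per category, converted to an integer array.
matrix = (
    df.pivot(index="user", columns="category", values="value")
    .to_numpy()
    .astype(int)
)
print(matrix)
```

If a row has no non-null value at all, `value` stays NaN for that row and the final `astype(int)` would fail, so such rows need to be handled first.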





bfill is a nice way to do it.
– piRSquared
Jun 29 at 14:39






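d = df.set_index(['user', 'category'])
# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0.
# idxmin on isna() gives the label of the first non-null column in each row.
pd.Series(d.lookup(d.index, d.isna().idxmin(1)), d.index).reset_index(name='value')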

user category value
0 user 1 c1 3
1 user 1 c2 4
2 user 1 c3 7
3 user 2 c1 5
4 user 2 c2 7
5 user 2 c3 2



You can skip resetting the index and unstack instead to get your final result:


d = df.set_index(['user', 'category'])
pd.Series(d.lookup(d.index, d.isna().idxmin(1)), d.index).unstack()

category c1 c2 c3
user
user 1 3 4 7
user 2 5 7 2
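Since DataFrame.lookup is no longer available in pandas 2.0 and later, the same idea can be sketched with plain NumPy indexing. This is an adaptation rather than the original answer's code: it finds the position (not the label) of the first non-null column per row with argmax, and it assumes missing cells are real NaN values.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing cells assumed to be NaN.
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

d = df.set_index(["user", "category"])

# Position of the first non-null column in each row
# (argmax returns the index of the first True).
col_pos = d.notna().to_numpy().argmax(axis=1)

# Pick exactly one value per row: row i, column col_pos[i].
values = d.to_numpy()[np.arange(len(d)), col_pos]

# Rebuild a Series on the (user, category) index and pivot categories into columns.
result = pd.Series(values, index=d.index, name="value").unstack()
print(result)
```

A row that is entirely null gets position 0 from argmax, so its value comes out as NaN rather than raising an error.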



You can simply fill the missing values with 0 (df2 = df.fillna(0)) and use the | operator. This works because each row holds exactly one non-null value, and x | 0 == x.



Convert to int first:


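# Plain column assignment replaces the columns, so the int dtype is kept
# (in recent pandas, .loc[:, cols] = ... writes in place and can leave them as float).
df2[['val1','val2','val3']] = df2[['val1','val2','val3']].astype(int)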



Then


df2['val4'] = df2.val1.values | df2.val2.values | df2.val3.values
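Put together, the steps above might look like the following sketch. It assumes missing cells are real NaN values and that every row has exactly one non-null value, and it computes the integer columns in a separate frame (`ints`, a name introduced here for illustration) to avoid any dtype surprises on assignment.

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing cells assumed to be NaN.
df = pd.DataFrame({
    "user": ["user 1"] * 3 + ["user 2"] * 3,
    "category": ["c1", "c2", "c3"] * 2,
    "val1": [3, np.nan, np.nan, 5, np.nan, np.nan],
    "val2": [np.nan, 4, np.nan, np.nan, 7, np.nan],
    "val3": [np.nan, np.nan, 7, np.nan, np.nan, 2],
})

val_cols = ["val1", "val2", "val3"]

# Replace missing cells with 0 and cast to int so bitwise OR is defined.
ints = df[val_cols].fillna(0).astype(int)

# x | 0 == x, so OR-ing the columns keeps the single non-zero value per row.
df2 = df.assign(val4=ints["val1"] | ints["val2"] | ints["val3"])
print(df2[["user", "category", "val4"]])
```

The trick breaks down if a row has more than one non-null value (the bits get mixed) or if the values are not integers, in which case bfill is the safer choice.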





Interesting approach (-:
– piRSquared
Jun 29 at 14:39





