Searching for element in a column by iterating over the column, pandas




In my data frame I need to remove columns that contain a specific character. To find those columns, I am trying to write a for loop in Python that iterates over each column and, if it finds a column with the unwanted character, drops that column.
My data frame looks like this, and I need to drop col3 and col5, which contain 'f' and 't':


col1  col2   col3  col4  col5  col6
1245  pink   f     Mar   f     f
245   green  f     Feb   t     f
1237  grey   t     Apr   f     f
267   black  f     Sep   t     f



I am trying to write a script similar to this


for col in df.items():
    if df[col] == 'f'
        df = df.drop([col], axis=1)





Does the column 6 have to stay?
– Joe
1 min ago





2 Answers



Using pd.DataFrame.loc and pd.DataFrame.any:




In [196]: df
Out[196]:
   col1   col2 col3 col4 col5
0  1245   pink    t  Mar    f
1   245  green    f  Feb    t
2  1237   grey    f  Apr    f
3   267  black    f  Sep    f
4   111    red    t  Aug    t

In [197]: df.loc[:, ~((df == 'f') | (df == 't')).any(axis=0)]
Out[197]:
   col1   col2 col4
0  1245   pink  Mar
1   245  green  Feb
2  1237   grey  Apr
3   267  black  Sep
4   111    red  Aug
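
For anyone who wants to reproduce this outside an interactive session, here is a minimal, self-contained sketch built on the sample frame printed above. It swaps the two `==` comparisons for `DataFrame.isin`, which is a substitution and not part of the answer as posted; the assumption is that a per-cell membership test is acceptable and that it copes with a numeric col1 next to string columns.

```python
import pandas as pd

# Sample frame modeled on the one printed above; col1 is numeric on purpose.
df = pd.DataFrame({
    "col1": [1245, 245, 1237, 267, 111],
    "col2": ["pink", "green", "grey", "black", "red"],
    "col3": ["t", "f", "f", "f", "t"],
    "col4": ["Mar", "Feb", "Apr", "Sep", "Aug"],
    "col5": ["f", "t", "f", "f", "t"],
})

# True for every column that holds at least one 'f' or 't'.
flag_mask = df.isin(["f", "t"]).any(axis=0)

# Keep only the columns that were not flagged.
cleaned = df.loc[:, ~flag_mask]
print(cleaned.columns.tolist())
```

The mask is a boolean Series indexed by column name, so it can also be printed on its own to see which columns were flagged before anything is dropped.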





Thanks a lot. I got this error: 'Could not compare ['f'] with block values'
– Annalix
51 mins ago





@Annalix, see my update, where I've printed the initial df structure, and check whether you posted the actual dataframe
– RomanPerekhrest
47 mins ago







My actual dataframe is much more complex, with 200 columns, which is why I need to write a script. Anyway, the dataframe posted is the same... what does 'block values' mean?
– Annalix
36 mins ago





@Annalix, I suppose you have some other values in columns col3 and col5 besides f, right?
– RomanPerekhrest
28 mins ago







I have converted all columns to object dtype, and now both scripts, yours and @Joe's, work. In doing so I discovered that you are right: there is also a 't'. Sorry, but I have hundreds of columns, and many of them are similar, containing 't' and 'f', which I need to IDENTIFY and then remove.
– Annalix
19 mins ago
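
If the goal is to identify only the columns made up entirely of 't' and 'f' values, one possible variation is to test with `all` instead of `any`. This is a sketch under that assumption, not something either answer proposes; the small frame and the stray 'f' placed in col2 are hypothetical and exist only to show the difference.

```python
import pandas as pd

# Hypothetical frame: col2 has one stray 'f' among ordinary values.
df = pd.DataFrame({
    "col1": [1245, 245, 1237, 267],
    "col2": ["pink", "f", "grey", "black"],
    "col3": ["f", "f", "t", "f"],
    "col5": ["f", "t", "f", "t"],
})

# True only for columns in which every value is 'f' or 't'.
only_flags = df.isin(["f", "t"]).all(axis=0)

# Names of the pure flag columns, listed first so they can be inspected.
flag_cols = only_flags[only_flags].index.tolist()
print(flag_cols)

# Drop them once the list looks right.
cleaned = df.drop(columns=flag_cols)
print(cleaned.columns.tolist())
```

With `any`, col2 would be dropped as well because of its single 'f'; with `all`, it stays.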



You can create a boolean mask of the columns that contain f or t and then apply the mask to the df:


mask = ((df == 'f') | (df=='t')).any(0)
df = df[df.columns[~mask]]
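
For comparison, the explicit loop attempted in the question can be written as below. The original attempt fails for two reasons: `df.items()` yields (name, Series) pairs instead of column names, and `df[col] == 'f'` produces a whole boolean Series, which an `if` statement cannot reduce to a single True or False. This is only a sketch, using the frame from the question; the names `unwanted` and `to_drop` are made up for illustration.

```python
import pandas as pd

# Frame from the question, including col6.
df = pd.DataFrame({
    "col1": [1245, 245, 1237, 267],
    "col2": ["pink", "green", "grey", "black"],
    "col3": ["f", "f", "t", "f"],
    "col4": ["Mar", "Feb", "Apr", "Sep"],
    "col5": ["f", "t", "f", "t"],
    "col6": ["f", "f", "f", "f"],
})

unwanted = {"f", "t"}  # hypothetical name for the values to look for
to_drop = []           # hypothetical name for the columns collected so far

# Iterate over column names and reduce each comparison with .any().
for col in df.columns:
    if df[col].isin(unwanted).any():
        to_drop.append(col)

# Drop everything in one call instead of reassigning df inside the loop.
df = df.drop(columns=to_drop)
print(df.columns.tolist())
```

Note that col6 is dropped as well, since it contains 'f'. The vectorized mask does the same work in one expression, so the loop is mainly useful when each column needs extra per-column logic.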





Thanks @Joe. I got the same error as with the other suggestion: 'Could not compare ['f'] with block values'
– Annalix
50 mins ago





Can you post some of the real df?
– Joe
45 mins ago





@Annalix, maybe you are trying to compare strings with numerical values. You could try df = df.astype(object) so that all columns have the same type. Check the types of the columns with df.info()
– Joe
38 mins ago
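
A small sketch of that dtype check and conversion, assuming a frame that mixes an integer column with string columns. It passes the builtin `object` to `astype`, because recent NumPy releases no longer provide the `np.object` alias; the three-column frame is made up for illustration.

```python
import pandas as pd

# Hypothetical frame mixing a numeric column with string columns.
df = pd.DataFrame({
    "col1": [1245, 245],
    "col2": ["pink", "green"],
    "col3": ["f", "t"],
})

# Inspect the per-column dtypes before converting.
print(df.dtypes)

# Cast every column to object so that all comparisons are elementwise
# Python comparisons.
as_object = df.astype(object)

# The mask from the answers, applied to the converted frame.
mask = ((as_object == "f") | (as_object == "t")).any(axis=0)
cleaned = as_object.loc[:, ~mask]
print(cleaned.columns.tolist())
```

Casting to object trades speed and memory for uniform comparisons, so on a wide frame it may be preferable to convert a copy used only for building the mask and then apply that mask to the original frame.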








Fixed the type problem. Thanks so much! Now, as explained, I have just discovered that my columns also contain 't'. See the updated dataframe in the post.
– Annalix
11 mins ago





