Searching for element in a column by iterating over the column, pandas

In my data frame I need to remove columns that contain a specific character. To find those columns, I am trying to write a for loop in Python that iterates over each column and drops any column that contains the unwanted character.
My data frame looks like this, and I need to drop col3 and col5, which contain 'f' and 't':
col1 col2 col3 col4 col5 col6
1245 pink f Mar f f
245 green f Feb t f
1237 grey t Apr f f
267 black f Sep t f
I am trying to write a script similar to this:
for col in df.items():
    if df[col] == 'f':
        df = df.drop([col], axis=1)
2 Answers
With pd.DataFrame.loc and pd.DataFrame.any:
In [196]: df
Out[196]:
col1 col2 col3 col4 col5
0 1245 pink t Mar f
1 245 green f Feb t
2 1237 grey f Apr f
3 267 black f Sep f
4 111 red t Aug t
In [197]: df.loc[:, ~((df == 'f') | (df == 't')).any(axis=0)]
Out[197]:
col1 col2 col4
0 1245 pink Mar
1 245 green Feb
2 1237 grey Apr
3 267 black Sep
4 111 red Aug
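The session above can be reproduced as a standalone script. The following is only a sketch: it rebuilds the sample frame by hand and swaps the two `==` comparisons for `DataFrame.isin`, which is an alternative formulation and not what the answer itself uses. The names `flag_cols` and `result` are made up for illustration.

```python
import pandas as pd

# Sample frame copied from the answer's printed output
df = pd.DataFrame({
    "col1": [1245, 245, 1237, 267, 111],
    "col2": ["pink", "green", "grey", "black", "red"],
    "col3": ["t", "f", "f", "f", "t"],
    "col4": ["Mar", "Feb", "Apr", "Sep", "Aug"],
    "col5": ["f", "t", "f", "f", "t"],
})

# True for every column that holds at least one 'f' or 't'
flag_cols = df.isin(["f", "t"]).any(axis=0)

# Keep only the columns that were not flagged
result = df.loc[:, ~flag_cols]
print(result.columns.tolist())
```

Passing a list to `isin` also makes it easy to extend the set of unwanted values later without chaining more `|` conditions.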
Thanks a lot. I got this error: 'Could not compare ['f'] with block values'
– Annalix
51 mins ago
@Annalix, see my update where I've printed the initial df structure; check whether you posted the actual dataframe.
– RomanPerekhrest
47 mins ago
My actual dataframe is much more complex, with 200 columns; that's why I need to write a script. Anyway, the dataframe posted is the same... What does "block values" mean?
– Annalix
36 mins ago
@Annalix, I suppose you have some other values in columns col3 and col5 besides f, right?
– RomanPerekhrest
28 mins ago
I have updated the dtype of all columns to object, and now both scripts suggested by you and @Joe work. Doing this, I discovered that you are right: there is also a 't'. Sorry, but I have hundreds of columns and a lot of them are similar, containing 't' and 'f', which I need to IDENTIFY and then remove.
– Annalix
19 mins ago
You can create a boolean mask of the columns that contain 'f' or 't' and then apply the mask to the df:
mask = ((df == 'f') | (df=='t')).any(0)
df = df[df.columns[~mask]]
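For comparison with the loop attempted in the question, the same selection can be written column by column. This is a minimal sketch, assuming the sample data from the question; `unwanted` and `to_drop` are hypothetical names, and collecting the column names first avoids modifying the frame while iterating over it.

```python
import pandas as pd

# Sample frame copied from the question
df = pd.DataFrame({
    "col1": [1245, 245, 1237, 267],
    "col2": ["pink", "green", "grey", "black"],
    "col3": ["f", "f", "t", "f"],
    "col4": ["Mar", "Feb", "Apr", "Sep"],
    "col5": ["f", "t", "f", "t"],
    "col6": ["f", "f", "f", "f"],
})

unwanted = ["f", "t"]

# Collect the names of columns that contain any unwanted value
to_drop = []
for name in df.columns:
    if df[name].isin(unwanted).any():
        to_drop.append(name)

# Drop all flagged columns in one call
df = df.drop(columns=to_drop)
print(to_drop)
print(df.columns.tolist())
```

Under this rule col6 is dropped as well, because every value in it is 'f'. If that column has to stay, the condition inside the loop would need to be changed.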
Thanks @Joe. I got the same error as with the other suggestion: 'Could not compare ['f'] with block values'
– Annalix
50 mins ago
Can you post some of the real df?
– Joe
45 mins ago
@Annalix maybe you are trying to compare strings with numerical values. You could try:
df = df.astype(object)
so they are all the same type. Check the types of the columns with df.info()
– Joe
38 mins ago
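The dtype check and cast suggested in this comment might look like the sketch below. It uses a made-up two-column frame, not the asker's real data, and the names `df_obj` and `mask` are assumptions for illustration. The built-in `object` is used as the target dtype because the `np.object` alias is no longer available in recent NumPy releases.

```python
import pandas as pd

# Tiny illustrative frame: one numeric column, one string column
df = pd.DataFrame({
    "col1": [1245, 245],
    "col3": ["f", "t"],
})

# Inspect the column types before comparing against a string
print(df.dtypes)

# Cast every column to the generic object dtype
df_obj = df.astype(object)
print(df_obj.dtypes)

# The element-wise comparison now runs over columns of a single dtype
mask = (df_obj == "f").any(axis=0)
print(mask)
```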
Fixed the type problem. Thanks so much! Now, as explained, I have just discovered that my columns also contain 't'. See the updated dataframe in the post.
– Annalix
11 mins ago
Does column 6 have to stay?
– Joe
1 min ago