Create two dataframes based on regex

Here is a small sample of the dataframe I would like to split into two separate dataframes:

   No    Code         Name  Rem  Last Done   LACP     Chg  % Chg  Vol ('00)
0   1    0012       3A [S]    s      0.940  0.940       -      -         20
1   2    7054    AASIA [S]    s          -  0.205       -      -          -
2   3    5238      AAX [S]    s      0.345  0.340   0.005  +1.47     37,806
3   4  5238WA   AAX-WA [S]    s      0.135  0.135       -      -        590
4   5    7086  ABLEGRP [S]    s      0.095  0.100  -0.005  -5.00        300

I want to filter on the "Code" column based on whether or not it matches the following Python regular expression:

"^[0-9]{1,5}$"
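A quick way to see what the pattern accepts, sketched with Python's standard `re` module on the Code values from the sample (nothing pandas-specific is assumed here): only codes made entirely of one to five digits match, so `5238WA` is the one that falls out.

```python
import re

# The pattern from the question: one to five digits, anchored at start (^) and end ($).
PATTERN = re.compile(r"^[0-9]{1,5}$")

codes = ["0012", "7054", "5238", "5238WA", "7086"]

# re.search respects the anchors, so a code with trailing letters does not match.
matches = {code: bool(PATTERN.search(code)) for code in codes}
print(matches)
# {'0012': True, '7054': True, '5238': True, '5238WA': False, '7086': True}
```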

1 Answer

Use str.contains with boolean indexing; ~ inverts the boolean mask:

# Boolean mask: True where Code consists of 1 to 5 digits
m = df['Code'].str.contains("^[0-9]{1,5}$")
# Rows whose Code matches
df1 = df[m]
print(df1)
   No    Code         Name  Rem  Last Done   LACP     Chg  % Chg  Vol ('00)
0   1    0012       3A [S]    s      0.940  0.940       -      -         20
1   2    7054    AASIA [S]    s          -  0.205       -      -          -
2   3    5238      AAX [S]    s      0.345  0.340   0.005  +1.47     37,806
4   5    7086  ABLEGRP [S]    s      0.095  0.100  -0.005  -5.00        300
# Rows whose Code does not match (~ inverts the mask)
df2 = df[~m]
print(df2)
   No    Code         Name  Rem  Last Done   LACP     Chg  % Chg  Vol ('00)
3   4  5238WA   AAX-WA [S]    s      0.135  0.135       -      -        590
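For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It assumes a trimmed version of the sample (only the No, Code and Name columns) with Code held as strings. If the data is loaded with read_csv and every code happens to be numeric, pandas would parse the column as integers, dropping leading zeros like 0012 and making `.str` unavailable; passing `dtype={'Code': str}` avoids that. The sketch also shows `str.fullmatch` (available since pandas 1.1) as an alternative that anchors the pattern itself, so `^` and `$` can be dropped.

```python
import pandas as pd

# Trimmed copy of the sample: only the columns needed to show the split.
# Code values are strings so that leading zeros such as "0012" are preserved.
df = pd.DataFrame(
    {
        "No": [1, 2, 3, 4, 5],
        "Code": ["0012", "7054", "5238", "5238WA", "7086"],
        "Name": ["3A [S]", "AASIA [S]", "AAX [S]", "AAX-WA [S]", "ABLEGRP [S]"],
    }
)

# Boolean mask: True where Code is one to five digits.
m = df["Code"].str.contains(r"^[0-9]{1,5}$")

df1 = df[m]    # rows whose Code matches
df2 = df[~m]   # rows whose Code does not match

# Same mask via str.fullmatch, which must match the whole string (pandas >= 1.1).
m_full = df["Code"].str.fullmatch(r"[0-9]{1,5}")

print(df1)
print(df2)
```

Because `df[m]` and `df[~m]` use complementary masks, every row lands in exactly one of the two frames.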

Detail:

print(m)
0     True
1     True
2     True
3    False
4     True
Name: Code, dtype: bool
print(~m)
0    False
1    False
2    False
3     True
4    False
Name: Code, dtype: bool
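The mask above is purely True/False because every Code in the sample is a non-missing string. If the column could contain missing values, then depending on the dtype and pandas version the mask may hold NaN for those rows, which boolean indexing rejects. A hedged sketch passing `na=False`, which makes a missing Code count as "does not match" so that `m` and `~m` still split the frame cleanly; the frame with a `None` code is hypothetical.

```python
import pandas as pd

# Hypothetical frame: the second Code is missing.
df = pd.DataFrame({"Code": ["0012", None, "5238WA"]})

# na=False fills the result for missing values with False,
# so the mask contains only True/False.
m = df["Code"].str.contains(r"^[0-9]{1,5}$", na=False)

df1 = df[m]    # only "0012"
df2 = df[~m]   # the missing Code and "5238WA"
print(m.tolist())  # [True, False, False]
```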

I was having trouble wrapping my head around how boolean indexing would work for this application. Thank you for the illustration.
– Timothy Lombard
Jul 2 at 5:35

Is it possible to use an "or" condition? Something like m = df['Code'].str.contains("^[0-9]{1,5}$" | "5235SS" )
– Timothy Lombard
2 days ago

@TimothyLombard - You can use m = df['Code'].str.contains("^[0-9]{1,5}$|5235SS")
– jezrael
2 days ago
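One detail about the combined pattern `^[0-9]{1,5}$|5235SS`: `|` has the lowest precedence in a regex, so `^` and `$` bind only to the first alternative. The second alternative is unanchored, and `str.contains` will also match any code that merely contains `5235SS` (for example a hypothetical `5235SSX`). If exact matches are intended, grouping both alternatives under the anchors fixes that; a non-capturing group `(?:...)` is used because `str.contains` warns about patterns with capture groups. Another option is to combine two boolean masks with the element-wise `|` operator, which is roughly what the question's attempted syntax was reaching for. A minimal sketch; the codes below are illustrative.

```python
import pandas as pd

# Illustrative codes; "5235SSX" is hypothetical, added to show the anchoring difference.
codes = pd.Series(["0012", "5238WA", "5235SS", "5235SSX"], name="Code")

# Pattern from the comment: the anchors apply only to the first alternative.
loose = codes.str.contains(r"^[0-9]{1,5}$|5235SS")

# Non-capturing group: both alternatives must match the whole string.
strict = codes.str.contains(r"^(?:[0-9]{1,5}|5235SS)$")

# Same result as strict: element-wise "or" of two boolean masks.
combined = codes.str.contains(r"^[0-9]{1,5}$") | codes.eq("5235SS")

print(loose.tolist())     # [True, False, True, True]
print(strict.tolist())    # [True, False, True, False]
print(combined.tolist())  # [True, False, True, False]
```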
