Regex: get part of text from URL data




I have many URLs of this form:


http://www.example.com/some-text-to-get/jkl/another-text-to-get



I want to extract these two parts:


["some-text-to-get", "another-text-to-get"]



I tried this:


re.findall(".*([[a-z]*-[a-z]*]*).*", "http://www.example.com/some-text-to-get/jkl/another-text-to-get")



but it's not working. Any ideas?
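For reference, here is a minimal sketch of what that pattern actually returns when run unchanged (it is not a fix). The leading greedy .* first swallows the whole URL and then gives characters back one at a time until the group can match, so the group ends up matching at the last possible position near the end of the string. On top of that, [[a-z]* is parsed as a single character class containing "[" and a-z, and the trailing ]* matches literal "]" characters, so the group is not the repeated "word-word" structure it appears to be. Newer Python versions may also emit a FutureWarning about a possible nested set for this pattern.

```python
import re

url = "http://www.example.com/some-text-to-get/jkl/another-text-to-get"

# The pattern from the question, unchanged. The greedy leading .* backtracks
# from the end of the string, so the group matches at the last position where
# it can: the final "-" plus the lowercase letters after it.
result = re.findall(".*([[a-z]*-[a-z]*]*).*", url)
print(result)  # ['-get']
```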




4 Answers



You can use a lookbehind and a lookahead:


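import re
s = 'http://www.example.com/some-text-to-get/jkl/another-text-to-get'
# first alternative: a segment preceded by ".com/" (a dot, three word characters, a slash)
# second alternative: the segment at the very end of the string
final_result = re.findall(r'(?<=\.\w{3}/)[a-z-]+|[a-z-]+(?=$)', s)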



Output:


['some-text-to-get', 'another-text-to-get']
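One caveat worth checking: the lookbehind, written with its escapes as \.\w{3}/, is fixed-width (Python requires that for lookbehinds), so it only fires after a dot, exactly three word characters and a slash, i.e. a three-letter TLD such as .com. The sketch below uses a hypothetical .io URL, for illustration only, to show that the first segment is then missed.

```python
import re

# Lookbehind pattern from the answer above, as a raw string so the
# backslashes reach the regex engine.
pattern = r'(?<=\.\w{3}/)[a-z-]+|[a-z-]+(?=$)'

com_url = 'http://www.example.com/some-text-to-get/jkl/another-text-to-get'
# Hypothetical URL with a two-letter TLD, used only to show the limitation.
io_url = 'http://www.example.io/some-text-to-get/jkl/another-text-to-get'

com_result = re.findall(pattern, com_url)
io_result = re.findall(pattern, io_url)
print(com_result)  # ['some-text-to-get', 'another-text-to-get']
print(io_result)   # ['another-text-to-get']
```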





I want only lowercase words; is that possible? I can't make it work with [a-z].
– Mohamed AL ANI





@MohamedALANI Please see my recent edit.
– Ajax1234



You could capture the two parts in capturing groups:



^https?://[^/]+/([^/]+).*/(.*)$





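That would match:

- ^ asserts the start of the string
- https?:// matches http or https followed by ://
- [^/]+/ matches the host (one or more characters that are not a forward slash) followed by a forward slash
- ([^/]+) captures the first path segment in group 1
- .* matches any characters, greedily
- / matches the last forward slash
- (.*)$ captures everything after it in group 2, up to the end of the string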



Your matches are in the first and second capturing groups.
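In Python, one minimal way to read the two groups is re.match followed by .groups(). This sketch assumes the URL has at least two path segments; otherwise the pattern does not match and re.match returns None, which the code handles by returning an empty list.

```python
import re

url = "http://www.example.com/some-text-to-get/jkl/another-text-to-get"

# Group 1: first path segment; group 2: everything after the last "/".
pattern = re.compile(r"^https?://[^/]+/([^/]+).*/(.*)$")

match = pattern.match(url)
if match is None:
    # The URL did not have the expected shape (e.g. only one path segment).
    parts = []
else:
    parts = list(match.groups())
print(parts)  # ['some-text-to-get', 'another-text-to-get']
```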






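Or you could parse the URL, take its path, split it on / and pick the parts by index:


from urllib.parse import urlparse  # Python 3; on Python 2: from urlparse import urlparse

o = urlparse('http://www.example.com/some-text-to-get/jkl/another-text-to-get')
# o.path is '/some-text-to-get/jkl/another-text-to-get'; drop the empty strings left by
# the leading slash (filter() returns an iterator on Python 3, so it cannot be indexed)
parts = [p for p in o.path.split('/') if p]
print(parts[0])  # some-text-to-get
print(parts[2])  # another-text-to-get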



Given:


>>> s
'http://www.example.com/some-text-to-get/jkl/another-text-to-get'



You can use this regex:


>>> re.findall(r"/([a-z-]+)(?:/|$)", s)
['some-text-to-get', 'another-text-to-get']
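Note that (?:/|$) consumes the slash after each match, and that is what makes this pattern skip "jkl": the slash in front of it has already been used by the previous match. A lookahead that leaves the slash in place returns every lowercase segment instead, as this small comparison sketch shows (same s as above).

```python
import re

s = "http://www.example.com/some-text-to-get/jkl/another-text-to-get"

# (?:/|$) consumes the trailing slash, so the next match cannot start on it.
consuming = re.findall(r"/([a-z-]+)(?:/|$)", s)

# (?=/|$) only checks for the slash, so "/jkl" is still available to match.
lookahead = re.findall(r"/([a-z-]+)(?=/|$)", s)

print(consuming)  # ['some-text-to-get', 'another-text-to-get']
print(lookahead)  # ['some-text-to-get', 'jkl', 'another-text-to-get']
```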



Of course you can do this with Python string methods and a list comprehension:


>>> [e for e in s.split('/') if '-' in e]
['some-text-to-get', 'another-text-to-get']



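You could capture the parts using this regular expression:


((?:[a-z]+-)+[a-z]+)


[a-z]+ matches one or more lowercase letters


(?:[a-z]+-) is a non-capturing group: it matches a run of lowercase letters followed by a hyphen without creating a group of its own, and the + after it repeats it one or more times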
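Used with re.findall, which returns the text of the single capturing group for each match, the pattern picks out every run of lowercase words joined by hyphens. A minimal sketch, assuming the wanted parts are exactly the hyphenated segments; a plain segment such as "jkl" is not picked up.

```python
import re

url = "http://www.example.com/some-text-to-get/jkl/another-text-to-get"

# One capturing group, so findall returns a list of that group's text.
# The non-capturing (?:[a-z]+-)+ requires at least one "word-" before the
# final [a-z]+, so segments without a hyphen (www, example, com, jkl) are skipped.
parts = re.findall(r"((?:[a-z]+-)+[a-z]+)", url)
print(parts)  # ['some-text-to-get', 'another-text-to-get']
```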





