Regex: get part of text from URL data

I have many URLs of this type:
http://www.example.com/some-text-to-get/jkl/another-text-to-get
I want to be able to get this:
["some-text-to-get", "another-text-to-get"]
I tried this:
re.findall(".*([[a-z]*-[a-z]*]*).*", "http://www.example.com/some-text-to-get/jkl/another-text-to-get")
but it's not working. Any ideas?
4 Answers
You can use a lookbehind and lookahead:
import re
s = 'http://www.example.com/some-text-to-get/jkl/another-text-to-get'
final_result = re.findall(r'(?<=\.\w{3}/)[a-z-]+|[a-z-]+(?=$)', s)
Output:
['some-text-to-get', 'another-text-to-get']
@MohamedALANI Please see my recent edit.
– Ajax1234
18 mins ago
You could capture the two parts in two capturing groups:
^https?://[^/]+/([^/]+).*/(.*)$
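That would match:
^ asserts the start of the string
https?:// matches http or https followed by ://
[^/]+/ matches the host (one or more characters other than /) and the / after it
([^/]+) captures the first path segment in group 1
.* matches any characters, as many as possible
/ matches the last / in the string
(.*)$ captures everything after that / up to the end of the string in group 2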
Your matches are in the first and second capturing group.
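As a minimal sketch of how that pattern could be used from Python (the helper name extract_first_and_last is only an illustration, not part of the answer):

```python
import re

# The pattern from the answer above, compiled once for reuse.
URL_PARTS = re.compile(r'^https?://[^/]+/([^/]+).*/(.*)$')

def extract_first_and_last(url):
    """Return [first path segment, last path segment], or None if no match."""
    m = URL_PARTS.match(url)
    if m is None:
        return None
    return [m.group(1), m.group(2)]

s = 'http://www.example.com/some-text-to-get/jkl/another-text-to-get'
print(extract_first_and_last(s))
# ['some-text-to-get', 'another-text-to-get']
```

Because the pattern requires a / after the first segment, a URL with only one path segment does not match and the helper returns None.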
Or you could parse the URL, get the path, split it on / and get your parts by index:
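try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2
o = urlparse('http://www.example.com/some-text-to-get/jkl/another-text-to-get')
# Drop the empty string produced by the leading slash of the path
parts = [p for p in o.path.split('/') if p]
print(parts[0])  # some-text-to-get
print(parts[2])  # another-text-to-get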
Given:
>>> s
"http://www.example.com/some-text-to-get/jkl/another-text-to-get"
You can use this regex:
>>> re.findall(r"/([a-z-]+)(?:/|$)", s)
['some-text-to-get', 'another-text-to-get']
Of course you can do this with Python string methods and a list comprehension:
>>> [e for e in s.split('/') if '-' in e]
['some-text-to-get', 'another-text-to-get']
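One thing to be aware of with the regex above: it consumes the trailing slash, so two hyphenated segments directly next to each other would not both be found (in the example URL this is also what makes it skip jkl). A possible variant, offered only as a sketch, requires at least one hyphen in the segment and uses a lookahead for the delimiter instead. The name HYPHENATED_SEGMENT and the second test URL are made up for illustration:

```python
import re

# Hypothetical variant: the segment must contain at least one hyphen, and the
# following "/" (or end of string) is only looked at, not consumed.
HYPHENATED_SEGMENT = re.compile(r"/([a-z]+(?:-[a-z]+)+)(?=/|$)")

s = "http://www.example.com/some-text-to-get/jkl/another-text-to-get"
adjacent = "http://www.example.com/a-b/c-d/e-f"  # made-up URL for comparison

original_on_adjacent = re.findall(r"/([a-z-]+)(?:/|$)", adjacent)
variant_on_s = HYPHENATED_SEGMENT.findall(s)
variant_on_adjacent = HYPHENATED_SEGMENT.findall(adjacent)

print(original_on_adjacent)  # ['a-b', 'e-f']
print(variant_on_s)          # ['some-text-to-get', 'another-text-to-get']
print(variant_on_adjacent)   # ['a-b', 'c-d', 'e-f']
```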
You could capture it using this regular expression:
((?:[a-z]+-)+[a-z]+)
[a-z]+ matches one or more lowercase letters.
(?:[a-z]+-) is a non-capturing group: one or more lowercase letters followed by a hyphen. The + after the group repeats it one or more times.
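A short sketch of how this expression could be applied in Python with re.findall (the variable names are only illustrative):

```python
import re

# The expression from this answer: one or more "letters-" groups followed by
# a final run of lowercase letters, all inside a single capturing group.
HYPHENATED = re.compile(r"((?:[a-z]+-)+[a-z]+)")

s = "http://www.example.com/some-text-to-get/jkl/another-text-to-get"
matches = HYPHENATED.findall(s)
print(matches)
# ['some-text-to-get', 'another-text-to-get']
```

Since the pattern needs at least one hyphen, plain segments such as www, example or jkl are not returned.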
I want only lowercase words. Is that possible? I can't make it work with [a-z].
– Mohamed AL ANI
21 mins ago