I don't know any Python, but I need to customize a script a little bit.
The script parses strings and puts them in a list (I guess).
These strings are then filtered based on whether they start with "http". I want to add a filter based on their file extension as well: only links ending in html or xml should be kept.
This is the code that filters all hyperlinks:
links = filter(lambda x: x.startswith("http://"), links)
I don't know the proper syntax for an OR operator in something like .endswith(".html") OR .endswith(".xml")
I know this would keep all links ending in .html, but I also need the .xml links.
links = filter (lambda x:x.startswith("http://") , links)
links = filter (lambda x:x.endswith(".html") , links)
filter() is a very useful built-in function in Python. It can select one or more values from any iterable, such as a string, list, or dictionary, based on a condition. It keeps an item when the condition returns true and discards it when the condition returns false.
Python has a built-in function called filter() that lets you filter a list (or a tuple) concisely. filter() iterates over the elements and applies the function fn() to each one. In Python 3 it returns an iterator over the elements for which fn() returns a true value; in Python 2 it returns a list.
filter() in Python: filter() tests each element of a sequence with a function and keeps the elements for which the test is true. Syntax: filter(function, sequence). Parameters: function is the function that tests each element of the sequence; sequence is the sequence to be filtered.
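The behaviour described above can be sketched as follows. The sample URLs are made up for illustration and are not from the original script. Under Python 3, filter() returns a lazy iterator, so the sketch wraps it in list() to get an actual list back.

```python
# Hypothetical sample data, not taken from the original script.
links = [
    "http://example.com/index.html",
    "ftp://example.com/file.xml",
    "http://example.com/feed.xml",
    "http://example.com/logo.png",
]

# filter() calls the lambda on every element and keeps the ones
# for which it returns a true value.
http_links = filter(lambda x: x.startswith("http://"), links)

# In Python 3 the result is a lazy iterator; list() materializes it.
http_links = list(http_links)
print(http_links)
```

If the rest of the script indexes into links or takes its length, the list() call matters on Python 3; on Python 2 it is harmless.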
In pandas, Series.str.contains() tests whether a pattern or regex is contained within each string of a Series or Index. It returns a boolean Series or Index indicating, for each string, whether the pattern or regex was found.
If you're on at least Python 2.5, you can pass a tuple of suffixes to endswith. Thanks to @hcwhsa for pointing that out:
links = filter(lambda x:x.endswith((".html", ".xml")), links)
If you're using an earlier version, you can use the or operator:
links = filter(lambda x:x.endswith(".html") or x.endswith(".xml"), links)
Though you will want to lowercase x before the check if you're not sure it's already lowercase.
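A minimal sketch of the lowercasing step, assuming the links may use mixed-case extensions. The sample URLs and the names suffixes and matching are made up for illustration.

```python
# Hypothetical sample data with mixed-case extensions.
links = [
    "http://example.com/INDEX.HTML",
    "http://example.com/feed.Xml",
    "http://example.com/logo.PNG",
]

suffixes = (".html", ".xml")

# Lowercase only for the comparison, so the original link text is kept.
matching = [x for x in links if x.lower().endswith(suffixes)]
print(matching)
```

Only the test is case-insensitive here; the links in the result keep their original capitalization.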
I would probably do this with a list comprehension rather than filter, and certainly without successive calls to filter:
links = [link for link in links if link.startswith('http://') and link.endswith(('.html', '.xml'))]
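The same condition can also be pulled into a small helper, which may be easier to extend later. This is only a sketch: is_wanted, its parameters, and the sample URLs are hypothetical names chosen for illustration. It also shows how the bracketed form (a list comprehension, which builds the list immediately) differs from the parenthesized form (a generator expression, which is evaluated lazily).

```python
def is_wanted(link, prefix="http://", suffixes=(".html", ".xml")):
    """Return True if the link has the prefix and one of the suffixes."""
    return link.startswith(prefix) and link.endswith(suffixes)


# Hypothetical sample data, not taken from the original script.
links = [
    "http://example.com/index.html",
    "http://example.com/feed.xml",
    "http://example.com/logo.png",
    "ftp://example.com/page.html",
]

# List comprehension: the filtered list is built right away.
as_list = [link for link in links if is_wanted(link)]

# Generator expression: items are produced only when iterated over.
as_generator = (link for link in links if is_wanted(link))

print(as_list)
```

A generator expression is useful when the links are consumed once in a loop; a list is the safer choice if the script indexes into the result or iterates over it more than once.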