Python script to filter a list of strings based on ending

I don't know any Python, but I need to customize a script a little. The script parses strings and puts them in a list (I guess). These strings are then filtered based on whether they start with "http". I want to add a filter based on their file extension as well: only links ending in html or xml should pass the filter.

This is the code that filters all hyperlinks:

links = filter(lambda x: x.startswith("http://"), links)

I don't know the proper syntax for an OR operator in something like .endswith(".html") OR .endswith(".xml").

I know this would filter all links ending in .html, but I also need the .xml links.

links = filter(lambda x: x.startswith("http://"), links)
links = filter(lambda x: x.endswith(".html"), links)
tzippy asked Nov 26 '13 08:11

1 Answer

If you're on Python 2.5 or later, you can pass a tuple of suffixes to endswith. Thanks to @hcwhsa for pointing that out:

links = filter(lambda x: x.endswith((".html", ".xml")), links)
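As a quick check of the tuple form, here is a minimal sketch using made-up sample URLs (the list contents are an assumption, not taken from the original script). On Python 3, filter returns a lazy iterator rather than a list, so it is wrapped in list() here; on Python 2 the wrapper is harmless.

```python
# Hypothetical sample data; the real script builds `links` by parsing.
links = [
    "http://example.com/index.html",
    "http://example.com/feed.xml",
    "http://example.com/logo.png",
]

# str.endswith accepts a tuple of suffixes and returns True if any one matches.
# list() is needed on Python 3, where filter() returns an iterator.
links = list(filter(lambda x: x.endswith((".html", ".xml")), links))
print(links)
```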

If you're using an earlier version, you can use the or operator:

links = filter(lambda x: x.endswith(".html") or x.endswith(".xml"), links)

Though you will want to lowercase x first (for example with x.lower()) if you're not sure it's already lowercased.
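A minimal sketch of that lowercasing step, assuming the links may contain mixed-case extensions (the sample URLs and the name `matched` are made up for illustration). Only the comparison uses the lowercased copy; the links themselves keep their original spelling.

```python
# Hypothetical sample data with mixed-case extensions.
links = [
    "http://example.com/INDEX.HTML",
    "http://example.com/feed.Xml",
    "http://example.com/logo.png",
]

# Lowercase only for the test, so the stored link is left unchanged.
matched = [link for link in links if link.lower().endswith((".html", ".xml"))]
print(matched)
```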

I would probably do this with a list comprehension rather than filter, and certainly without successive calls to filter:

links = [link for link in links if link.startswith('http://') and link.endswith(('.html', '.xml'))]
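To illustrate that the single pass is equivalent to the two successive filter calls from the question, here is a small sketch under assumed sample data (the URLs and the names `chained` and `single` are hypothetical):

```python
# Hypothetical sample data; only two entries satisfy both conditions.
links = [
    "http://example.com/a.html",
    "ftp://example.com/b.xml",
    "http://example.com/c.xml",
    "http://example.com/d.png",
]

# Two successive passes, as in the question (list() for Python 3).
chained = list(filter(lambda x: x.startswith("http://"), links))
chained = list(filter(lambda x: x.endswith((".html", ".xml")), chained))

# One pass with both conditions combined.
single = [
    link for link in links
    if link.startswith("http://") and link.endswith((".html", ".xml"))
]

print(single == chained)
```

The combined form walks the list once and keeps both conditions in one place, which is the main reason to prefer it.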
Peter DeGlopper answered Sep 27 '22 23:09