The purpose of my code is to extract all the integers from the text and sum them.
I have been looking for ways to pull all the integers out of a line of text. I saw some solutions suggesting \D and \b, but I have only just started with regular expressions and am still unsure how they fit into my code. Please help :(
import re
import urllib2
data = urllib2.urlopen("http://python-data.dr-chuck.net/regex_sum_179860.txt")
aList = []
for word in data:
    data = (str(w) for w in data)
    s = re.findall(r'[\d]+', word)
    if len(s) != 1: continue
    num = int(s[0])
    aList.append(num)
print aList
To test whether a string is a number in Python, you can use the isdigit() method. isdigit() returns True if the string is non-empty and every character in it is a digit, and False otherwise. It only tests the string; it does not extract digits from it.
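A minimal sketch of how isdigit() behaves, written in Python 3 syntax. The sample tokens are made up for illustration and are not taken from the data file.

```python
# Sample tokens are illustrative only.
tokens = ["3524", "abc", "12a", "", "42"]

# isdigit() is True only for non-empty strings made up entirely of digits.
flags = [t.isdigit() for t in tokens]

# isdigit() only tests a string; converting and collecting is a separate step.
numbers = [int(t) for t in tokens if t.isdigit()]

print(flags)
print(numbers)
```

Because the test applies to the whole string, a token such as "12a" is rejected outright rather than yielding 12.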
The finditer function of the re module finds each occurrence of a pattern in the target string, and the start() method of each match object returns the index where that match begins.
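As a small illustration of finditer and start(), the sketch below lists each run of digits with its starting index. The sample string is an assumption chosen for readability.

```python
import re

text = "a1 bb22 ccc333"  # illustrative sample string

# Each match object gives the matched text via group() and its position via start().
matches = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]

print(matches)
```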
Python's str.count() method returns the number of non-overlapping occurrences of a substring in the given string.
Call read on the return value of urllib2.urlopen; the return value of urllib2.urlopen is not a string, but a connection object (a file-like object).
Apply re.findall to the data.
The square brackets around \d are not necessary.
import re
import urllib2
data = urllib2.urlopen("http://python-data.dr-chuck.net/regex_sum_179860.txt").read()
int_list = map(int, re.findall(r'\d+', data))
>>> int_list
[3524, 9968, 6177, 3133, 6508, 7940, 3738, 1112, 6179, 4570, 6127, 9150,
9883, 418, 3538, 2992, 8527, 1150, 2049, 2834, 2630, 3840, 2638, 3800,
9144, 5866, 6742, 588, 6918, 7802, 8229, 7947, 8992, 1339, 2119, 846,
3820, 4070, 9356, 9708, 3238, 9380, 5572, 9491, 3038, 7434, 7771, 288,
8632, 3962, 9136, 8106, 7295, 3699, 4136, 3459, 8120, 6018, 8963, 5779,
3635, 3984, 4850, 9633, 2588, 7631, 9591, 1067, 7182, 1301, 8041, 1361,
5425, 8326, 7094, 8155, 2581, 7199, 6125, 42]
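Since the question asks for the sum, the same findall approach can be wrapped in a small function. This is only a sketch: sum_integers is a hypothetical helper name, and the sample string stands in for the downloaded file contents so that the example runs without network access.

```python
import re

def sum_integers(text):
    """Return the sum of every run of digits found in text."""
    return sum(int(n) for n in re.findall(r"\d+", text))

# Illustrative sample standing in for the text returned by read().
sample = "Why should you learn 3524 to write\nprograms? 9968 and 42"
total = sum_integers(sample)

print(total)
```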
You can also do it line by line: call findall with the pattern "\d+" (one or more digits) on each line and extend your output list:
import re
import urllib2
data = urllib2.urlopen("http://python-data.dr-chuck.net/regex_sum_179860.txt")
# Raw string so that \d reaches the regex engine unchanged.
r = re.compile(r"\d+")
l = []
for line in data:
    l.extend(map(int,r.findall(line)))
Output:
[3524, 9968, 6177, 3133, 6508, 7940, 3738, 1112, 6179, 4570, 6127, 9150, 9883, 418, 3538, 2992, 8527, 1150, 2049, 2834, 2630, 3840, 2638, 3800, 9144, 5866, 6742, 588, 6918, 7802, 8229, 7947, 8992, 1339,
2119, 846, 3820, 4070, 9356, 9708, 3238, 9380, 5572, 9491, 3038,
7434, 7771, 288, 8632, 3962, 9136, 8106, 7295, 3699, 4136, 3459, 8120,
6018, 8963, 5779, 3635, 3984, 4850, 9633, 2588, 7631, 9591, 1067,
7182, 1301, 8041, 1361, 5425, 8326, 7094, 8155, 2581, 7199, 6125, 42]
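The line-by-line loop can be tried without a network connection by substituting any file-like object. In the sketch below, io.StringIO and its made-up contents are assumptions standing in for the urlopen result. Note that in Python 3 a real response yields bytes, so each line would need decoding first.

```python
import io
import re

digits = re.compile(r"\d+")

# io.StringIO is a stand-in for the file-like object returned by urlopen;
# the text inside it is invented for this example.
fake_response = io.StringIO("3524 apples\nno numbers here\n9968 and 6177\n")

numbers = []
for line in fake_response:
    # findall returns the digit runs as strings; convert each to int.
    numbers.extend(int(n) for n in digits.findall(line))

print(numbers)
```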
You could also use str.isdigit:
l = []
for line in data:
    l.extend(map(int,(w for w in line.split() if w.isdigit())))
If you just want to sum the numbers, you don't need to store all the numbers at all:
print(sum(sum(map(int,(w for w in line.split() if w.isdigit()))) for line in data))
Output:
435239
Or using a regex:
print(sum(sum(map(int,r.findall(line))) for line in data))
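The isdigit and regex variants agree only when every number is its own whitespace-separated token. The sketch below uses an invented line to show where they differ: digits attached to other characters are skipped by isdigit but still found by the regex.

```python
import re

line = "width=42 costs 7 dollars, or 15."  # illustrative sample

# Only tokens consisting entirely of digits pass isdigit().
by_isdigit = [int(w) for w in line.split() if w.isdigit()]

# The regex picks up every run of digits, wherever it sits in a token.
by_regex = [int(n) for n in re.findall(r"\d+", line)]

print(by_isdigit)
print(by_regex)
```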
This is probably irrelevant in your case, but if you wanted to avoid any intermediate lists in Python 2, you could use itertools.imap:
from itertools import imap
print(sum(sum(imap(int,r.findall(line))) for line in data))
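In Python 3, itertools.imap no longer exists because the built-in map is already lazy. As a sketch of the same idea there, the version below swaps in re.finditer for findall so that no per-line list is built either. sum_numbers and the sample lines are hypothetical names and data used only for illustration.

```python
import re

digits = re.compile(r"\d+")

def sum_numbers(lines):
    """Sum all integers found in an iterable of text lines."""
    # finditer yields match objects one at a time, so nothing is stored per line.
    return sum(int(m.group()) for line in lines for m in digits.finditer(line))

sample_lines = ["3524 apples", "none", "9968 and 42"]  # illustrative data
total = sum_numbers(sample_lines)

print(total)
```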