I have been stuck on this for a few hours.
I have a DataFrame containing a list of email addresses. For each address I want to check whether it contains a number, e.g. [email protected]; if it does, I want that number to be appended to an array.
I have tried both with a DataFrame and with a NumPy ndarray, but neither works. This is what I am trying to do:
mail_addresses = pd.DataFrame(customers_df.iloc[:,0].values)
mail_addresses = mail_addresses.dropna(axis = 0, how= 'all')
mail_addresses_toArray = mail_addresses.values
for i in mail_addresses:
    dates = []
    if any(i.isdigit()) == True:
        dates.append(i)
    print(dates)
I think my problem is that I don't know how to convert all the elements in this array to strings so that the isdigit() method works while iterating through all the elements (825 mail addresses).
When running the code above, this is the error I get:
AttributeError: 'numpy.int64' object has no attribute 'isdigit'
Meanwhile, if I try with the NumPy array (mail_addresses_toArray), this is the error:
AttributeError: 'numpy.ndarray' object has no attribute 'isdigit'
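A minimal sketch of what seems to be going wrong, using made-up addresses (the `email` column name and the `example.com` addresses are assumptions, not the asker's data): iterating over a DataFrame yields its column labels rather than its cell values, so `isdigit()` ends up being called on a label. Taking the first column as a Series of strings and testing each character avoids both errors.

```python
import pandas as pd

# Hypothetical sample data; the real DataFrame is not shown in the question.
customers_df = pd.DataFrame(
    {"email": ["anna123@example.com", "bob@example.com", None, "carl7x9@example.com"]}
)

# Iterating over a DataFrame gives column labels, not the addresses.
labels = list(pd.DataFrame(customers_df.iloc[:, 0].values))
print(labels)  # the single column label of the rebuilt frame

# Work on the column itself: drop missing values and force strings.
addresses = customers_df.iloc[:, 0].dropna().astype(str)

# Keep every address that contains at least one digit character.
dates = [a for a in addresses if any(ch.isdigit() for ch in a)]
print(dates)
```

Note that `any(i.isdigit())` in the original loop would also fail on a plain string, because `isdigit()` returns a single Boolean rather than something iterable; the generator expression above checks one character at a time instead.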
Python string isdigit(): the isdigit() method returns True if all characters in a string are digits (and the string is not empty); otherwise it returns False.
The isreal() method returns a Boolean value: True means the value is numeric and False means it is non-numeric. This makes it possible to check whether a row holds numeric or non-numeric values.
To check for numeric columns, you could use df[c].dtype.kind in 'iufcb', where c is any given column name. The comparison yields a True or False Boolean output.
Series.str.isnumeric() checks whether all characters in each string are numeric. This is equivalent to running the Python string method str.isnumeric() for each element of the Series/Index.
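The checks described above test whether a whole string (or a whole column) is numeric, which is not quite the same as asking whether a string contains a digit somewhere. A small sketch of the difference, with hypothetical data and column names, and using `Series.str.contains` as one possible "contains a digit" test:

```python
import pandas as pd

# Hypothetical data: a text column and an integer column.
df = pd.DataFrame(
    {
        "text": ["2019", "anna123@example.com", "bob@example.com"],
        "count": [4, 5, 6],
    }
)

# Whole-string check: True only when every character is a digit.
all_digits = df["text"].str.isdigit().tolist()

# Substring check: True when at least one digit appears anywhere.
has_digit = df["text"].str.contains(r"\d").tolist()

# Column-level check based on the dtype kind.
text_is_numeric = df["text"].dtype.kind in "iufcb"
count_is_numeric = df["count"].dtype.kind in "iufcb"

print(all_digits, has_digit, text_is_numeric, count_is_numeric)
```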
Use str.extract if each address contains only one number, or str.findall if there may be several:
import numpy as np
import pandas as pd

customers_df = pd.DataFrame({'A':['[email protected]','[email protected]',
                                  '[email protected]','[email protected]'],
                             'B':[4,5,4,5],
                             'C':[7,8,9,4]})
print (customers_df)
                   A  B  C
0  [email protected]  4  7
1  [email protected]  5  8
2  [email protected]  4  9
3  [email protected]  5  4
L = customers_df.iloc[:,0].str.extract(r'(\d+)', expand=False).dropna().astype(int).tolist()
print (L)
[123, 123, 23]
L = np.concatenate(customers_df.iloc[:,0].str.findall(r'(\d+)')).astype(int).tolist()
print (L)
[123, 123, 23, 55]
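Since the addresses in the example above are obscured, here is a self-contained variant with made-up `example.com` addresses (an assumption for illustration only). It also swaps `np.concatenate` for `Series.explode`, which is just one alternative way to flatten the lists returned by `str.findall`.

```python
import pandas as pd

# Hypothetical addresses, including one without digits and one missing value.
emails = pd.Series(
    ["anna123@example.com", "bob@example.com", "carl7x9@example.com", None]
)

# First run of digits per address; addresses without digits become NaN and are dropped.
first_numbers = (
    emails.str.extract(r"(\d+)", expand=False).dropna().astype(int).tolist()
)

# Every run of digits per address; explode() turns each list item into its own row,
# and empty lists become NaN, which dropna() removes.
all_numbers = (
    emails.dropna().str.findall(r"\d+").explode().dropna().astype(int).tolist()
)

print(first_numbers)
print(all_numbers)
```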
Here is one way.
import pandas as pd
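df = pd.DataFrame({'A': ['[email protected]', '[email protected]',
                         '[email protected]', None]})
# Drop missing addresses, then keep only the digit characters of each one
s = df['A'].dropna()
t = s.map(lambda x: ''.join([i for i in x if i.isdigit()]).strip())
# Discard addresses without any digits and convert the rest to int
res = t.loc[t != ''].map(int).tolist()
# [123, 43]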
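One thing to keep in mind with this approach is that joining every digit character merges separate numbers in the same address into one. A rough illustration in plain Python, with hypothetical addresses and the standard `re` module as the alternative that keeps each run of digits apart:

```python
import re

# Hypothetical addresses; None stands in for a missing value.
addresses = ["anna123@example.com", None, "bob@example.com", "carl7x9@example.com"]

# Joining all digit characters: "carl7x9" becomes the single number 79.
joined = [
    int("".join(ch for ch in a if ch.isdigit()))
    for a in addresses
    if a and any(ch.isdigit() for ch in a)
]

# Matching runs of digits: "carl7x9" gives 7 and 9 as separate numbers.
separate = [int(m) for a in addresses if a for m in re.findall(r"\d+", a)]

print(joined)
print(separate)
```

Which behaviour is preferable depends on whether an address is expected to hold a single number or several.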