I'm looking for the easiest way in Python to convert all non-numeric data (including blanks) to zeros. Take the following as an example:
someData = [[1.0,4,'7',-50],['8 bananas','text','',12.5644]]
I would like the output to be as follows:
desiredData = [[1.0,4,7,-50],[0,0,0,12.5644]]
So '7' should be 7, but '8 bananas' should be converted to 0.
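To encode non-numeric data as numbers, you can use scikit-learn's LabelEncoder. It encodes each category (such as the values a, b and c in a column COL1) as an integer; enc.fit() learns the corresponding integer values. Note that this gives each distinct string its own integer rather than mapping non-numeric values to 0.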
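To show why label encoding does not give the zeros asked for here, this is a minimal pure-Python sketch of the same idea. `label_encode` is a hypothetical helper, not scikit-learn's API; it assumes all values are strings (so they can be sorted) and orders the categories alphabetically before numbering them.

```python
def label_encode(values):
    # Hypothetical helper, NOT scikit-learn's API.
    # Number each distinct value in sorted order, then look every value up.
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values]

codes = label_encode(['8 bananas', 'text', '', 'text'])
print(codes)  # [1, 2, 0, 2]
```

Only the blank string happens to get 0 here; '8 bananas' and 'text' get their own nonzero codes.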
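Use the re.sub() method to remove all non-numeric characters from a string, e.g. result = re.sub(r'[^0-9]', '', my_str). This keeps the digits inside a string (and also strips minus signs and decimal points), so on its own it does not produce the desired output above.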
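A quick check of what that re.sub() call actually does to a few inputs; `strip_non_digits` is just an illustrative wrapper name.

```python
import re

def strip_non_digits(my_str):
    # Delete every character that is not 0-9; '-' and '.' are removed too
    return re.sub(r'[^0-9]', '', my_str)

print(strip_non_digits('8 bananas'))  # 8
print(strip_non_digits('-12.5'))      # 125
print(strip_non_digits('text'))       # prints an empty line
```

So '8 bananas' becomes '8' instead of 0, and '-12.5' loses both its sign and its decimal point.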
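import numbers

def mapped(x):
    # Values that are already numeric (int, float, Decimal, ...) pass through unchanged
    if isinstance(x, numbers.Number):
        return x
    # Otherwise try int first, then float
    for tpe in (int, float):
        try:
            return tpe(x)
        except (ValueError, TypeError):
            # TypeError covers values such as None
            continue
    # Neither conversion worked: treat the value as non-numeric
    return 0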
for sub in someData:
    # Replace each row's contents in place
    sub[:] = map(mapped, sub)
print(someData)
[[1.0, 4, 7, -50], [0, 0, 0, 12.5644]]
It also works with other numeric types, such as Decimal:
In [4]: from decimal import Decimal
In [5]: someData = [[1.0,4,'7',-50 ,"99", Decimal("1.5")],["foobar",'8 bananas','text','',12.5644]]
In [6]: for sub in someData:
   ...:     sub[:] = map(mapped, sub)
   ...:
In [7]: someData
Out[7]: [[1.0, 4, 7, -50, 99, Decimal('1.5')], [0, 0, 0, 0, 12.5644]]
The check if isinstance(x, numbers.Number) catches subelements that are already numeric (floats, ints, etc.). If an element is not a numeric type, we first try casting it to int, then to float; if neither succeeds, we simply return 0.
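If you would rather not modify someData in place, a sketch of the same int-then-float fallback can build a new nested list with a comprehension. This version assumes you also want TypeError caught, so that values such as None become 0; `cleaned` is an illustrative name.

```python
import numbers

def mapped(x):
    # Already numeric (int, float, Decimal, bool, ...): keep as-is
    if isinstance(x, numbers.Number):
        return x
    # Try int first so '7' stays an int; fall back to float for '1.5', '1e3'
    for tpe in (int, float):
        try:
            return tpe(x)
        except (ValueError, TypeError):
            continue
    return 0

someData = [[1.0, 4, '7', -50], ['8 bananas', 'text', '', 12.5644]]
# Build a new nested list instead of modifying someData in place
cleaned = [[mapped(x) for x in row] for row in someData]
print(cleaned)        # [[1.0, 4, 7, -50], [0, 0, 0, 12.5644]]
print(mapped('1e3'))  # 1000.0 (int() fails, float() succeeds)
print(mapped(None))   # 0
```

Because float() does the parsing, scientific notation is handled, but strings such as 'nan' or 'inf' are also accepted by float() and will not become 0.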
Another solution, using regular expressions:
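import re

def toNumber(e):
    # Leave non-string values (ints, floats, ...) unchanged
    if not isinstance(e, str):
        return e
    # Decimal number such as '-3.14'
    if re.match(r"^-?\d+\.\d+$", e):
        return float(e)
    # Integer such as '7' or '-50'
    if re.match(r"^-?\d+$", e):
        return int(e)
    # Anything else (text, blanks, mixed) becomes 0
    return 0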
someData = [[1.0,4,'7',-50],['8 bananas','text','',12.5644]]
# In Python 3, map() returns an iterator, so wrap it in list()
someData = [list(map(toNumber, row)) for row in someData]
print(someData)
You get:
[[1.0, 4, 7, -50], [0, 0, 0, 12.5644]]
Note: this doesn't work for numbers in scientific notation (for example, '1e5' becomes 0).
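One way around the scientific-notation limitation is to widen the float pattern. The pattern below is my own assumption, not part of the answer above: an optional sign, digits with an optional fraction (or a bare fraction like '.5'), and an optional exponent such as 'e-3'. It is a sketch, not an exhaustive number grammar.

```python
import re

# Assumed patterns: optional sign, then an integer, or a decimal with an optional exponent
INT_RE = re.compile(r"^[-+]?\d+$")
FLOAT_RE = re.compile(r"^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$")

def toNumber(e):
    # Leave non-string values unchanged
    if not isinstance(e, str):
        return e
    if INT_RE.match(e):
        return int(e)
    if FLOAT_RE.match(e):
        return float(e)
    # Text, blanks and mixed strings become 0
    return 0

print(toNumber('1e5'))        # 100000.0
print(toNumber('-2.5E-3'))    # -0.0025
print(toNumber('7'))          # 7
print(toNumber('8 bananas'))  # 0
```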