import pandas as pd
path1 = "/home/supertramp/Desktop/100&life_180_data.csv"
mydf = pd.read_csv(path1)
numcigar = {"Never":0 ,"1-5 Cigarettes/day" :1,"10-20 Cigarettes/day":4}
print mydf['Cigarettes']
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
print mydf['CigarNum']
mydf.to_csv('/home/supertramp/Desktop/powerRangers.csv')
The csv file "100&life_180_data.csv" contains columns like Age, BMI, Cigarettes, Alcohol, etc.
No int64
Age int64
BMI float64
Alcohol object
Cigarettes object
dtype: object
The Cigarettes column contains "Never", "1-5 Cigarettes/day", and "10-20 Cigarettes/day". I want to assign weights to these objects (Never, 1-5 Cigarettes/day, ...).
The expected output is a new column, CigarNum, that contains only numbers (0, 1, 2). CigarNum is as expected for the first 8 rows and then shows NaN until the last row.
0 Never
1 Never
2 1-5 Cigarettes/day
3 Never
4 Never
5 Never
6 Never
7 Never
8 Never
9 Never
10 Never
11 Never
12 10-20 Cigarettes/day
13 1-5 Cigarettes/day
14 Never
...
167 Never
168 Never
169 10-20 Cigarettes/day
170 Never
171 Never
172 Never
173 Never
174 Never
175 Never
176 Never
177 Never
178 Never
179 Never
180 Never
181 Never
Name: Cigarettes, Length: 182, dtype: object
The output I get shouldn't give NaN after the first few rows.
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 NaN
11 NaN
12 NaN
13 NaN
14 0
...
167 NaN
168 NaN
169 NaN
170 NaN
171 NaN
172 NaN
173 NaN
174 NaN
175 NaN
176 NaN
177 NaN
178 NaN
179 NaN
180 NaN
181 NaN
Name: CigarNum, Length: 182, dtype: float64
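OK, the first problem is that some values carry stray leading or trailing whitespace, so they do not match the dictionary keys and the lookup returns NaN for them.
Fix this using the vectorised str methods (strip() removes only the surrounding whitespace, so the spaces inside the keys of numcigar are preserved):
mydf['Cigarettes'] = mydf['Cigarettes'].str.strip()
Creating your new column should now just work:
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)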
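To check whether whitespace really is the culprit before changing anything, it can help to look at the distinct labels and at which of them have no key in the dictionary. The following is only a sketch: the sample Series is made up for illustration (the real file is not available here), and it uses Series.map rather than apply for the lookup.

```python
import pandas as pd

# Hypothetical sample data: some labels carry stray leading/trailing spaces.
cigs = pd.Series(["Never", "Never ", " 1-5 Cigarettes/day", "10-20 Cigarettes/day"])
numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}

# repr() makes stray whitespace visible in the distinct labels.
raw_labels = [repr(v) for v in cigs.unique()]
print(raw_labels)

# Labels without a matching key are the ones that end up as NaN.
unmatched = sorted(set(cigs.unique()) - set(numcigar))
print(unmatched)

# After stripping the surrounding whitespace every label matches a key.
cleaned = cigs.str.strip()
cigar_num = cleaned.map(numcigar).astype(float)
print(cigar_num)
```

If unmatched is empty on the real data, whitespace is not the cause and the labels themselves need a closer look.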
UPDATE
Thanks to @Jeff as always for pointing out superior ways to do things:
So you can call replace instead of calling apply:
mydf['CigarNum'] = mydf['Cigarettes'].replace(numcigar)
# now convert the types
mydf['CigarNum'] = mydf['CigarNum'].convert_objects(convert_numeric=True)
You can also use the factorize method.
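For reference, a minimal sketch of what factorize does, using a made-up Series in place of the real column. Note that factorize numbers the labels in order of first appearance, so it yields arbitrary integer codes rather than the chosen weights from numcigar.

```python
import pandas as pd

# Hypothetical sample data standing in for mydf['Cigarettes'].
cigs = pd.Series(["Never", "1-5 Cigarettes/day", "Never", "10-20 Cigarettes/day"])

# codes: one integer per row; uniques: the label behind each code.
codes, uniques = pd.factorize(cigs)
print(codes)
print(list(uniques))
```

This is handy when any consistent numbering will do; when specific weights matter, the dictionary approach above is the better fit.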
Thinking about it why not just set the dict values to be floats anyway and then you avoid the type conversion?
So:
numcigar = {"Never":0.0 ,"1-5 Cigarettes/day" :1.0,"10-20 Cigarettes/day":4.0}
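A small sketch of that idea, assuming a made-up Series and using Series.map for the lookup (my choice for the illustration, not something the answer above prescribes). With float values in the dictionary the mapped column should come out as float64 without a separate conversion step.

```python
import pandas as pd

# Hypothetical sample data standing in for mydf['Cigarettes'].
cigs = pd.Series(["Never", "1-5 Cigarettes/day", "10-20 Cigarettes/day", "Never"])

# Float weights, so no type conversion is needed afterwards.
numcigar = {"Never": 0.0, "1-5 Cigarettes/day": 1.0, "10-20 Cigarettes/day": 4.0}

cigar_num = cigs.map(numcigar)
print(cigar_num.dtype)
print(cigar_num.tolist())
```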
Version 0.17.0 or newer
convert_objects is deprecated since 0.17.0; it has been replaced with to_numeric:
mydf['CigarNum'] = pd.to_numeric(mydf['CigarNum'], errors='coerce')
Here errors='coerce' will return NaN where the values cannot be converted to a numeric value; without it, an exception is raised.
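To make the difference concrete, here is a short sketch with a made-up Series that mixes numeric strings with a label that cannot be parsed. The variable names are only for illustration.

```python
import pandas as pd

# Hypothetical column: numeric strings plus one label that is not a number.
mixed = pd.Series(["0", "1", "4", "Never"], dtype=object)

# With errors='coerce' the unparseable entry becomes NaN.
coerced = pd.to_numeric(mixed, errors="coerce")
print(coerced)

# With the default errors='raise' the same input raises a ValueError.
try:
    pd.to_numeric(mixed)
    raised = False
except ValueError:
    raised = True
print(raised)
```

Counting the NaN values after coercion, for example with coerced.isna().sum(), is a quick way to see how many entries failed to convert.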
Try using this function for all problems of this kind:
import numpy as np

def get_series_ids(x):
    '''Function returns a pandas series consisting of ids,
    corresponding to objects in input pandas series x
    Example:
    get_series_ids(pd.Series(['a','a','b','b','c']))
    returns Series([0,0,1,1,2], dtype=int)'''
    # Sorted distinct values of the input series
    values = np.unique(x)
    # Map each distinct value to its position in the sorted order
    values2nums = dict(zip(values, range(len(values))))
    return x.replace(values2nums)