I am fairly new to pandas and come from a statistics background, and I am struggling with a conceptual problem: pandas has columns that contain values, but sometimes those values have a special meaning. In statistical programs like SPSS or R, these are called "value labels".
Imagine a column rain with two values, 0 (meaning: no rain) and 1 (meaning: raining). Is there a way to assign these labels to those values in pandas, too, as one can in SPSS or R? I need this mainly for plotting and visualisation.
The pandas documentation uses the term "label" as if it were obvious what it means, for example in "Indexing and selecting data": "The axis labeling information in pandas objects serves many purposes ... pandas provides a suite of methods in order to have purely label based indexing." In that sense, labels are axis labels (the row index and the column names), not labels attached to the values inside a column.
A DataFrame can be thought of as a dict-like container for Series objects; it is the primary data structure of pandas. The DataFrame.values attribute returns a NumPy representation of the given DataFrame.
You can get the value of a single cell from a pandas DataFrame using df.iat[0, 0].
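To make the distinction concrete, here is a minimal sketch with a made-up frame (the column names, values and index labels are illustrative only). The labels the documentation talks about are the index and column names; the 0/1 values in rain carry no labels of their own.

```python
import pandas as pd

# Toy frame: "mon"/"tue"/"wed" are row (index) labels, "rain"/"temp" are column labels
df = pd.DataFrame({"rain": [0, 1, 1], "temp": [12.5, 9.0, 10.1]},
                  index=["mon", "tue", "wed"])

# Dict-like access: a column label returns a Series
rain = df["rain"]

# Label-based selection with .loc: row label "tue", column label "rain"
print(df.loc["tue", "rain"])  # 1

# Position-based scalar access with .iat: first row, first column
print(df.iat[0, 0])  # 0

# NumPy representation; to_numpy() is the spelling recommended in newer pandas versions over .values
arr = df.to_numpy()
print(arr.shape)  # (3, 2)
```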
There's no need to use a map anymore. Since version 0.15, pandas supports a categorical data type for its columns.
The stored data usually takes less space (especially when there are few distinct values), some operations on it can be faster, and you can use labels.
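Applied to the rain column from the question, a minimal sketch could look like this (the data is made up). It converts the 0/1 column to a categorical and renames the categories to the desired labels; passing a dict to rename_categories maps old category values to new names.

```python
import pandas as pd

# Hypothetical 0/1 rain column, as described in the question
df = pd.DataFrame({"rain": [0, 1, 1, 0, 1]})

# Convert to categorical (categories inferred in sorted order: [0, 1]),
# then attach labels by renaming the categories
df["rain_label"] = df["rain"].astype("category").cat.rename_categories(
    {0: "no rain", 1: "raining"}
)

print(df["rain_label"].cat.categories.tolist())  # ['no rain', 'raining']
print(df["rain_label"].tolist())
# ['no rain', 'raining', 'raining', 'no rain', 'raining']

# The underlying integer codes are still available
print(df["rain_label"].cat.codes.tolist())  # [0, 1, 1, 0, 1]
```

For plotting, something like df["rain_label"].value_counts().plot(kind="bar") should then show the labels on the axis (this requires matplotlib to be installed).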
I'm taking an example from the pandas docs:
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ["a", "b", "b", "a", "a", "e"]})
# Recast grade as a categorical variable
df["grade"] = df["raw_grade"].astype("category")
df["grade"]
# Gives this:
0 a
1 b
2 b
3 a
4 a
5 e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
You can also rename categories and add missing categories.
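Continuing the grade example, a short sketch of both operations: rename_categories with a list renames the existing categories positionally (a, b, e), and add_categories appends categories that do not occur in the data yet. The new category names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})
df["grade"] = df["raw_grade"].astype("category")

# Rename positionally: a -> "very good", b -> "good", e -> "very bad"
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

# Add categories that are not (yet) present in the data
df["grade"] = df["grade"].cat.add_categories(["bad", "medium"])

# value_counts on a categorical also lists unused categories (with count 0)
counts = df["grade"].value_counts()
print(counts["medium"])  # 0
```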
You could have a separate dictionary which maps values to labels:
d = {0: "no rain", 1: "raining"}
and then you could access the labelled data by doing
df.rain_column.apply(lambda x: d[x])
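A variant of the dictionary approach: Series.map accepts the dict directly, so no lambda is needed. A minimal sketch with a made-up rain column; note that map turns values missing from the dict into NaN, whereas the apply/lambda version raises a KeyError for them.

```python
import pandas as pd

# Hypothetical 0/1 rain column
df = pd.DataFrame({"rain": [0, 1, 1, 0]})

# Separate dictionary mapping stored values to labels
d = {0: "no rain", 1: "raining"}

# Series.map with a dict: each value is looked up in d
labels = df["rain"].map(d)
print(labels.tolist())  # ['no rain', 'raining', 'raining', 'no rain']

# Same result as the apply/lambda version
print(labels.equals(df["rain"].apply(lambda x: d[x])))  # True
```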