Pandas for Python: Exception: Data must be 1-dimensional

Tags: python, pandas, one-hot-encoding, scikit-learn

Here's the code I got from a tutorial:

# Data Preprocessing

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Taking care of missing data
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])

# Encoding categorical data
# Encoding the Independent Variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()
# Encoding the Dependent Variable
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
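The Imputer class used above was removed in scikit-learn 0.22. Below is a minimal sketch of the same mean-imputation step using its replacement, sklearn.impute.SimpleImputer. The Age/Salary values are rebuilt from the X matrix in this question. The sketch assumes that the two cells the imputer filled in (index 4 Salary, index 6 Age) were the missing ones.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Age and Salary columns rebuilt from the X matrix in the question (assumption);
# np.nan marks the two cells that were imputed there.
X_num = np.array([
    [44.0, 72000.0],
    [27.0, 48000.0],
    [30.0, 54000.0],
    [38.0, 61000.0],
    [40.0, np.nan],
    [35.0, 58000.0],
    [np.nan, 52000.0],
    [48.0, 79000.0],
    [50.0, 83000.0],
    [37.0, 67000.0],
])

# SimpleImputer always imputes column by column (there is no axis argument),
# and missing values are matched with np.nan rather than the string 'NaN'.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_num = imputer.fit_transform(X_num)

print(X_num[6, 0])  # mean of the nine known ages
print(X_num[4, 1])  # mean of the nine known salaries
```

The two printed means (349/9 and 574000/9) correspond to the 3.8777...e+01 and 6.3777...e+04 entries in the matrix below.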

This is the X matrix with the encoded dummy variables:

1.000000000000000000e+00    0.000000000000000000e+00    0.000000000000000000e+00    4.400000000000000000e+01    7.200000000000000000e+04
0.000000000000000000e+00    0.000000000000000000e+00    1.000000000000000000e+00    2.700000000000000000e+01    4.800000000000000000e+04
0.000000000000000000e+00    1.000000000000000000e+00    0.000000000000000000e+00    3.000000000000000000e+01    5.400000000000000000e+04
0.000000000000000000e+00    0.000000000000000000e+00    1.000000000000000000e+00    3.800000000000000000e+01    6.100000000000000000e+04
0.000000000000000000e+00    1.000000000000000000e+00    0.000000000000000000e+00    4.000000000000000000e+01    6.377777777777778101e+04
1.000000000000000000e+00    0.000000000000000000e+00    0.000000000000000000e+00    3.500000000000000000e+01    5.800000000000000000e+04
0.000000000000000000e+00    0.000000000000000000e+00    1.000000000000000000e+00    3.877777777777777857e+01    5.200000000000000000e+04
1.000000000000000000e+00    0.000000000000000000e+00    0.000000000000000000e+00    4.800000000000000000e+01    7.900000000000000000e+04
0.000000000000000000e+00    1.000000000000000000e+00    0.000000000000000000e+00    5.000000000000000000e+01    8.300000000000000000e+04
1.000000000000000000e+00    0.000000000000000000e+00    0.000000000000000000e+00    3.700000000000000000e+01    6.700000000000000000e+04

The problem is that there are no column labels. I tried

something = pd.get_dummies(X)

But I get the following exception:

Exception: Data must be 1-dimensional
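pd.get_dummies is designed to work on a Series or a DataFrame. By the time it is called here, X is a 2-D NumPy array that no longer has any column names to carry over. One workaround is to call it on the DataFrame before .values strips the labels, and to name the column to encode. The sketch below assumes the categorical column in Data.csv is called Country, and it copies three rows from the matrix above.

```python
import pandas as pd

# Small frame with the assumed Data.csv column names; values taken from the
# first three rows of the encoded X matrix above.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany"],
    "Age": [44.0, 27.0, 30.0],
    "Salary": [72000.0, 48000.0, 54000.0],
})

# Pass the DataFrame (not the 2-D .values array) and name the categorical
# column; dtype=float gives 1.0/0.0 like the sklearn output.
encoded = pd.get_dummies(df, columns=["Country"], dtype=float)
print(encoded.columns.tolist())
```

The columns that are not encoded come first. The dummy columns (Country_France, Country_Germany, Country_Spain) are appended after them.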


asked Aug 22 '17 23:08

Tyler L

1 Answer

Most sklearn methods don't care about column names, as they're mainly concerned with the math behind the ML algorithms they implement. You can add column names back onto the OneHotEncoder output after fit_transform(), if you can figure out the label encoding ahead of time.

First, grab the column names of your predictors from the original dataset. Exclude the first column (the categorical one, which goes through LabelEncoder) and the last column (the dependent variable):

X_cols = dataset.columns[1:-1]
X_cols
# Index(['Age', 'Salary'], dtype='object')

Now get the order of the encoded labels. LabelEncoder stores classes_ as the sorted unique values, so for strings its integer mapping is alphabetical:

labels = labelencoder_X.fit(X[:, 0]).classes_
labels
# ['France' 'Germany' 'Spain']
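You can inspect the mapping on its own. In this sketch, the country column is rebuilt from the dummy columns of the OP's matrix, which is an assumption about the row order of Data.csv. The integer codes are positions in classes_, and inverse_transform turns codes back into names.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Country column rebuilt from the dummy columns of the question's X matrix (assumption).
countries = np.array(["France", "Spain", "Germany", "Spain", "Germany",
                      "France", "Spain", "France", "Germany", "France"])

le = LabelEncoder()
codes = le.fit_transform(countries)

print(le.classes_)                      # sorted unique values
print(codes)                            # each code is an index into classes_
print(le.inverse_transform([0, 1, 2]))  # codes back to country names
```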

Combine these column names, and then add them to X when you convert to DataFrame:

# Get labels (above) before this line runs: afterwards X[:, 0] holds integer codes, not country names
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
encoded_cols = np.append(labels, X_cols)
# ...
X = onehotencoder.fit_transform(X).toarray()
encoded_df = pd.DataFrame(X, columns=encoded_cols)

encoded_df
   France  Germany  Spain        Age        Salary
0     1.0      0.0    0.0  44.000000  72000.000000
1     0.0      0.0    1.0  27.000000  48000.000000
2     0.0      1.0    0.0  30.000000  54000.000000
3     0.0      0.0    1.0  38.000000  61000.000000
4     0.0      1.0    0.0  40.000000  63777.777778
5     1.0      0.0    0.0  35.000000  58000.000000
6     0.0      0.0    1.0  38.777778  52000.000000
7     1.0      0.0    0.0  48.000000  79000.000000
8     0.0      1.0    0.0  50.000000  83000.000000
9     1.0      0.0    0.0  37.000000  67000.000000
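OneHotEncoder(categorical_features=[0]) no longer works on scikit-learn 0.22 and later. However, OneHotEncoder accepts string columns directly and exposes the sorted categories in categories_, so the same naming idea can be sketched without LabelEncoder. In this sketch, dataset is a hypothetical stand-in built from the encoded_df output above, with the dependent variable left out.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the dataset (values from encoded_df above,
# dependent variable omitted), so X_cols is everything after Country.
dataset = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44.0, 27.0, 30.0, 38.0, 40.0, 35.0, 38.777778, 48.0, 50.0, 37.0],
    "Salary": [72000.0, 48000.0, 54000.0, 61000.0, 63777.777778,
               58000.0, 52000.0, 79000.0, 83000.0, 67000.0],
})
X_cols = dataset.columns[1:]

# OneHotEncoder encodes the string column directly; its default output is a
# sparse matrix, hence .toarray().
onehotencoder = OneHotEncoder()
dummies = onehotencoder.fit_transform(dataset[["Country"]]).toarray()

# categories_ holds one sorted array per encoded column, in the same order
# as the dummy columns produced by the transform.
labels = onehotencoder.categories_[0]
encoded_cols = np.append(labels, X_cols)

# Dummy columns first, then the remaining predictors, as in the answer.
X = np.hstack([dummies, dataset[X_cols].to_numpy()])
encoded_df = pd.DataFrame(X, columns=encoded_cols)
print(encoded_df)
```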

NB: For example data I'm using this dataset, which seems to be very similar or identical to the one the OP used. Note that the output is identical to the OP's X matrix.


answered Oct 20 '22 20:10

andrew_reece
