
Pandas DataFrame and Keras

I'm trying to perform a sentiment analysis in Python using Keras. To do so, I need to do a word embedding of my texts. The problem appears when I try to fit the data to my model:

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

model_1 = Sequential()
model_1.add(Embedding(1000, 32, input_length=X_train.shape[0]))
model_1.add(Flatten())
model_1.add(Dense(250, activation='relu'))
model_1.add(Dense(1, activation='sigmoid'))
model_1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

The shape of my train data is

(4834,)

It is a pandas Series object. When I try to fit my model and validate it against some other data, I get this error:

model_1.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=2, batch_size=64, verbose=2)

ValueError: Error when checking model input: expected embedding_1_input to have shape (None, 4834) but got array with shape (4834, 1)

How can I reshape my data to make it suited for Keras? I've been trying with np.reshape but I cannot place None elements with that function.

Thanks in advance

Gonzalo Donoso asked May 09 '17 17:05




4 Answers

None stands for the number of rows (the batch size) fed into training, so you cannot set it yourself. Keras also needs a NumPy array as input, not a pandas object. First convert the data to a NumPy array with df.values, then call .reshape((-1, 4834)) on the result. Note that you should cast it to np.float32; this matters if you train on a GPU.
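The conversion described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Series here is a hypothetical stand-in for X_train filled with dummy numbers, and the float32 cast simply follows the suggestion in this answer. It shows what -1 does in a reshape and why the two target shapes mean different things.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train: a pandas Series with 4834 entries.
X_train = pd.Series(np.arange(4834))

# Convert the Series to a NumPy array and cast to float32, as suggested above.
arr = X_train.values.astype(np.float32)

# -1 tells NumPy to infer that dimension from the total number of elements.
one_row = arr.reshape((-1, 4834))  # one sample with 4834 values
one_col = arr.reshape((-1, 1))     # 4834 samples with one value each

print(arr.shape, one_row.shape, one_col.shape)
```

Note that reshaping to (-1, 4834) yields a single row, so Keras would see one sample rather than 4834. An Embedding layer generally expects integer token indices of shape (batch, sequence_length), so raw text would probably need to be tokenized and padded first; that step is not shown here.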

Dat Tran answered Oct 24 '22 00:10


https://pypi.org/project/keras-pandas/

The easiest way is to use the keras_pandas package to fit a pandas DataFrame to Keras. The code below is a general example from the package docs.

from keras import Model
from keras.layers import Dense

from keras_pandas.Automater import Automater
from keras_pandas.lib import load_titanic

observations = load_titanic()

# Transform the data set, using keras_pandas
categorical_vars = ['pclass', 'sex', 'survived']
numerical_vars = ['age', 'siblings_spouses_aboard', 'parents_children_aboard', 'fare']
text_vars = ['name']

auto = Automater(categorical_vars=categorical_vars, numerical_vars=numerical_vars, text_vars=text_vars,
                 response_var='survived')
X, y = auto.fit_transform(observations)

# Start model with provided input nub
x = auto.input_nub

# Fill in your own hidden layers
x = Dense(32)(x)
x = Dense(32, activation='relu')(x)
x = Dense(32)(x)

# End model with provided output nub
x = auto.output_nub(x)

model = Model(inputs=auto.input_layers, outputs=x)
model.compile(optimizer='Adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train model
model.fit(X, y, epochs=4, validation_split=.2)

Pardhu answered Oct 24 '22 00:10


You need a specific version of pandas for this to work. If you use the current version (as of 20th Aug 2018), this will fail.

Roll back your pandas and Keras (pip uninstall ....) and then install a specific version like this:

python -m pip install pandas==0.19.2

Tim Seed answered Oct 24 '22 00:10


Use tf.data.Dataset.from_tensor_slices to read the values from a pandas dataframe.

See https://www.tensorflow.org/tutorials/load_data/pandas_dataframe for a reference on how to do this properly in TF 2.x.
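A minimal sketch of that approach, assuming TensorFlow 2.x is installed. The DataFrame, its column names (feature_a, feature_b, label), and the batch size are hypothetical and only serve the illustration; they are not taken from the question.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical toy DataFrame: two numeric features and a binary label.
df = pd.DataFrame({
    "feature_a": [0.1, 0.2, 0.3, 0.4],
    "feature_b": [1.0, 2.0, 3.0, 4.0],
    "label": [0, 1, 0, 1],
})

# Separate the label column from the features.
labels = df.pop("label")

# Build a dataset of (features, label) pairs from the underlying NumPy arrays.
features = df.values.astype("float32")
dataset = tf.data.Dataset.from_tensor_slices((features, labels.values))

# Group the rows into batches of two.
dataset = dataset.batch(2)

# Materialize the batches as NumPy arrays to inspect their shapes.
batches = list(dataset.as_numpy_iterator())
for batch_features, batch_labels in batches:
    print(batch_features.shape, batch_labels.shape)
```

A batched dataset like this can be passed directly to model.fit in place of separate X and y arrays, in which case the batch_size argument is left out.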

Aleksey Vlasenko answered Oct 23 '22 22:10