I am working on a project that deals with sensor data. I collected the data from 3 places and stored it as a CSV file. There are about 4000 samples. The goal is to build a classification model on the dataset. Since it is time series data, I chose a Bidirectional LSTM:
model = keras.Sequential()
model.add(
    keras.layers.Bidirectional(
        keras.layers.LSTM(
            units=128,
            input_shape=[X_train.shape[1], X_train.shape[2]]
        )
    )
)
model.add(keras.layers.Dropout(rate=0.2))
model.add(keras.layers.Dense(units=128, activation='relu'))
model.add(keras.layers.Dense(y_train.shape[1], activation='softmax'))
model.compile(
    loss='categorical_crossentropy',
    optimizer='adam',
    metrics=['acc']
)
Then I train the model:
history3 = model.fit(
    X_train, y_train,
    epochs=35,
    batch_size=100,
    validation_split=0.1,
    shuffle=False
)
Here are the training and validation accuracy reported for the last epoch:
Epoch 35/35
4002/4002 [==============================] - 3s 858us/step - loss: 0.0216 - acc: 0.9948 - val_loss: 0.3026 - val_acc: 0.9056
When I use model.evaluate(X_test, y_test), it returns a list of two values: [5.144028138408701, 0.43551796674728394]
So the question is: what are those two values?
My guess is that the first value is MSE and the second is accuracy. If I am right, why is the accuracy so low when I use .evaluate? What should I do to improve the model?
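For reference, Keras's evaluate returns the loss first, followed by the compiled metrics in order, so with the compile call above the pair would be the categorical cross-entropy and the accuracy. Below is a minimal pure-Python sketch of those two quantities. The function names and the toy labels and predictions are made up for illustration; this only approximates what Keras computes internally and is not taken from the model above.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(t * log(p)); y_true is one-hot encoded."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        # Clip probabilities away from 0 so log() stays finite.
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of samples whose argmax prediction matches the one-hot label."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Toy values, hypothetical: 4 samples, 3 classes.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print([loss, acc])  # same [loss, metric] ordering as the list from evaluate
```

Unlike accuracy, the cross-entropy is not bounded by 1: confident wrong predictions push it up quickly, which is how a value above 1 can appear next to an accuracy between 0 and 1.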
P.S. More information:
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(4447, 24, 3) (4447, 3) (473, 24, 3) (473, 3)
The data is ordered, so I use shuffle = True during the split:
df_train, df_test = train_test_split(df, test_size = 0.1, shuffle = True)
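To make the effect of that flag concrete, here is a minimal pure-Python sketch contrasting a shuffled 90/10 split with an ordered one on ordered rows. The helpers shuffled_split and ordered_split are hypothetical stand-ins written for this illustration; they are not scikit-learn's implementation, and the integer rows stand in for the rows of df.

```python
import random

def shuffled_split(rows, test_size=0.1, seed=0):
    """Draw the test rows from random positions across the whole sequence."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(rows) * test_size))
    test_idx = sorted(idx[:n_test])
    train_idx = sorted(idx[n_test:])
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]

def ordered_split(rows, test_size=0.1):
    """Keep the original order and hold out the last rows as the test set."""
    n_test = int(round(len(rows) * test_size))
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]

rows = list(range(20))  # stand-in for ordered sensor readings
train_s, test_s = shuffled_split(rows)
train_o, test_o = ordered_split(rows)

print("shuffled test rows:", test_s)
print("ordered test rows: ", test_o)
```

With the shuffled variant the test rows are scattered between training rows, so any fixed-length windows built afterwards are cut from non-contiguous data; with the ordered variant the test set is one contiguous block at the end.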