休息的周末

2018/06/10 周末

Posted by WangXiaoDong on June 10, 2018
    今天早晨,来到实验室,开始学习深度学习的知识,看了一上午,感觉有点理解怎么编程了,但是缺的东西
在于如何运用日志相关的数据进行分析。
    下午,通过尝试,终于进行了一下简单数据的操作。下面总结一下。

首先是最近的数据读取,我们可以使用Keras自带的数据集:

# -*- coding: utf-8 -*-
"""
Spyder Editor
This is a temporary script file.
"""

# Data                                                                   1. 数据
Your data needs to be stored as NumPy arrays or as a list of NumPy arrays. 
Ideally, you split the data in training and test sets, for which you can also 
resort to the train_test_split module of sklearn.cross_validation.

## Keras Data Sets                                                       1.1 Keras数据集
from keras.datasets import boston_housing, mnist, cifar10, imdb
(x_train,y_train),(x_test,y_test) = mnist.load_data()
(x_train2,y_train2),(x_test2,y_test2) = boston_housing.load_data()
(x_train3,y_train3),(x_test3,y_test3) = cifar10.load_data()
(x_train4,y_train4),(x_test4,y_test4) = imdb.load_data(num_words=20000)
num_classes = 10

当然也可以自己通过URL连接下载自己需要的网上开源的数据集:

##Other   the URL below can not found, so comment the code temporary   1.2 其他数据集
#from urllib.request import urlopen
#data = np.loadtxt(urlopen("http://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"),delimiter=",")
#X = data[:,0:8]
#y = data [:,8]

然后就是数据预处理模块了:

#Preprocessing          2. 预处理
##Sequence Padding      2.1 序列填充
from keras.preprocessing import sequence
x_train4 = sequence.pad_sequences(x_train4,maxlen=80)
x_test4 = sequence.pad_sequences(x_test4,maxlen=80)
##One-Hot Encoding      2.2 独热编码
from keras.utils import to_categorical
Y_train = to_categorical(y_train, num_classes)
Y_test = to_categorical(y_test, num_classes)
Y_train3 = to_categorical(y_train3, num_classes)
Y_test3 = to_categorical(y_test3, num_classes)
##Train And Test Sets   2.3 训练和测试集合
#from sklearn.model_selection import train_test_split
#X_train5, X_test5, y_train5, y_test5 = train_test_split(X, y, test_size=0.33, random_state=42)

之后常用的方法是标准化数据:

#Standardization/Normalization    3. 标准化
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(x_train2)
standardized_X = scaler.transform(x_train2)
standardized_X_test = scaler.transform(x_test2)

下面就是重点,建立模型的模块:

#Model Architecture               4. 模型结构
##Sequential Model                4.1 序列模型
from keras.models import Sequential
model = Sequential()
##Multi-Layer Perceptron (MLP)    4.2 多层感知机
###Binary Classification          4.2.1 二分类
from keras.layers import Dense
#model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
#model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
#model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
###Multi-Class Classification     4.2.2 多分类
from keras.layers import Dropout
#model.add(Dense(512,activation='relu',input_shape=(784,)))
#model.add(Dropout(0.2))
#model.add(Dense(512,activation='relu'))
#model.add(Dropout(0.2))
#model.add(Dense(10,activation='softmax'))
###Regression                    4.2.3 回归
model.add(Dense(64, activation='relu', input_dim=train_data.shape[1]))
model.add(Dense(1))

当然,模型建立好之后我们需要有一些工具可以检测模型的结构,防止建立错误:

#Inspect Model                           5. 模型检查
##Model output shape                     5.1 模型输出形状 
model.output_shape
##Model summary representation           5.2 模型总结表示
model.summary()
##Model configuratio                     5.3 模型配置
model.get_config()
##List all weight tensors in the model   5.4 列出模型所有的权重张量
model.get_weights()

最后编译模型,就可以慢慢等待最后的结果啦,真的是非常简单方便:

#Compile Model                        6. 编译模型
##Multi-Layer Perceptron (MLP)        6.1 多层感知机
###MLP: Binary Classification         6.1.1 二分类
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
###MLP: Multi-Class Classification    6.1.2 多分类
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
###MLP: Regression                    6.1.3 回归 
model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])