Theano: The Road to Logistic Regression
From basic symbol manipulation in Theano, through basic derivatives, all the way to Logistic Regression.
Basics
Installation
- On Ubuntu 14.04:
$ sudo pip install theano
- Check the version:
$ python -c "import theano; print theano.__version__"
0.6.0
Imports
- Modules to import up front:
import numpy as np
import theano
import theano.tensor as T
from theano import function
from theano import shared
Basic Usage
1. Create symbols with T.xxx
2. Build a function with function
- inputs: symbols
- outputs: a symbol composed from the inputs
3. Feed actual values to the compiled function
Although I wrote "symbol", the actual type is basically theano.tensor.var.TensorVariable.
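For instance, a quick check (a minimal sketch):
x = T.dscalar("x")
print type(x)  # <class 'theano.tensor.var.TensorVariable'>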
Function
- To build a function:
f = function([symbol_x, symbol_y, symbol_z, ...], returned_symbol)
Assorted Examples
print "##### Basic Examples #####" print "Scala Addition" x = T.dscalar("x") y = T.dscalar("y") z = x + y f = function([x, y], z) print f(2, 1.3) print "Vector Manipulation" x = T.dvector("x") y = T.dvector("y") z = x.dot(y) f = function([x, y], z) x_ = np.random.rand(10) y_ = np.random.rand(10) print f(x_, y_) print "Logistic Function for Vector" x = T.dvector("x") s = 1 / (1 + T.exp(-x)) f = function([x], s) x_ = np.random.rand(10) print f(x_) print "Matrix Vecotr Product" A = T.dmatrix("A") b = T.dvector("b") c = A.dot(b) f = function([A, b], c) A_ = np.random.rand(10, 10) b_ = np.random.rand(10) print f(A_, b_) print "Broadcast" A = T.dmatrix("A") f = function([A], A * 5) A_ = np.ones((5, 5)) print f(A_)
Shared Variables
A function can hold internal state:
state = shared(0)
- This state can also be shared between functions.
- It can be used just like an object created with T.dxxx.
- Note, however, that shared variables cannot be used as explicit inputs to function.
Using a shared variable, it can be updated through function's updates argument:
function(inputs=[...], outputs=[...], updates=[(shared_var, new_value_expr), ...])
- An accumulator example (a usage sketch follows the code):
state = shared(0.0)  # float initial value, to match the type of the dscalar update
x = T.dscalar("x")
accumulator = function([x], state, updates=[(state, state + x)])
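A brief usage sketch (my addition): the function outputs the state as it was before the update, and get_value reads the shared state directly.
print accumulator(1)     # 0.0: the value of state before the update
print accumulator(10)    # 1.0
print state.get_value()  # 11.0: read the shared state directly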
I don't fully understand givens; it looks like symbol substitution (unverified; a toy sketch below).
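Here is my rough understanding as a toy sketch (the variable names are mine): givens replaces a symbol in the graph with another expression at compile time, so the replaced symbol no longer needs to be an input.
x = T.dscalar("x")
y = x * 2
u = T.dscalar("u")
# x is replaced by u + 1 when compiling, so x itself is not an input
f = function([u], y, givens=[(x, u + 1)])
print f(3)  # 8.0 = (3 + 1) * 2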
Derivatives
Basics
T.grad(objective_function_symbol, with_respect_to_symbol_variable(s))
Examples
Five examples that come up constantly in optimization mathematics:
print "##### Derivatives #####" print "Derivative of Vector-Vector Product w.r.t Vector" x = T.dvector("x") y = T.dvector("y") z = T.dot(x, y) gz_wrt_x = T.grad(z, x) f = function([x, y], gz_wrt_x) # d f(x) / dx x_ = np.random.rand(5) y_ = np.random.rand(5) print f(x_, y_) print y_ # recalculation print "Derivative of Quadratic Form w.r.t. Vector" x = T.dvector("x") A = T.dmatrix("A") z = x.dot(A).dot(x) gz_wrt_x = T.grad(z, x) f = function([x, A], gz_wrt_x) # d f(x) / dx x_ = np.random.rand(5) A_ = np.random.rand(5, 5) A_ = A_ + A_.T print f(x_, A_) print 2 * np.dot(A_, x_) # recalculation print "Derivative of 2-norm" w = T.dvector("w") z = T.grad(T.square(w.norm(L=2)) / 2, w) f = function([w], z) w_ = np.random.rand(5) print f(w_, ) print w_ # recalculation print "Derivative Trace of Matrix-Matrix Product" A = T.dmatrix("A") X = T.dmatrix("X") z = A.dot(X).trace() gz_wrt_X = T.grad(z, X) f = function([A, X], gz_wrt_X) A_ = np.random.rand(5, 5) X_ = np.random.rand(5, 5) print f(A_, X_) print A_ # recalculation print "Derivative of Log Determinant of Matrix" X = T.dmatrix("X") z = T.log(theano.sandbox.linalg.det(X)) gz_wrt_X = T.grad(z, X) f = function([X], gz_wrt_X) X_ = np.random.rand(5, 5) X_ = (X_ + X_) / 2 + np.diag(np.ones(5) * 10) X_ = np.random.rand(5, 5) print f(X_) print np.linalg.inv(X_) # recalculation
It seems the function returning gz_wrt_x has to take as inputs all the arguments that z depends on.
At present, the objective (the first argument of T.grad) apparently has to be a scalar.
Reading the proposal, I thought matrix derivatives were impossible, but they seem fine as long as the objective is a scalar.
det was found in theano.sandbox.linalg.
Whether matrix derivatives of complex objectives like those in The Matrix Cookbook or on Wikipedia work is unverified; a sketch for checking one such identity follows.
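As a first try, here is a minimal sketch (not verified in this post) that checks one standard identity, d tr(AXB) / dX = A^T B^T; since the trace is a scalar objective, it should fit the restriction above.
A = T.dmatrix("A")
X = T.dmatrix("X")
B = T.dmatrix("B")
z = A.dot(X).dot(B).trace()  # scalar objective, so T.grad applies
f = function([A, X, B], T.grad(z, X))
A_ = np.random.rand(4, 4)
X_ = np.random.rand(4, 4)
B_ = np.random.rand(4, 4)
print f(A_, X_, B_)
print A_.T.dot(B_.T)  # recalculation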
Logistic Regression
With the material so far, we can implement at least Logistic Regression, the most basic of linear models.
This is a sample of the simplest two-class Logistic Regression.
Here, Logistic Regression is solved with Gradient Descent, which uses only first derivatives.
Since a two-class dataset is needed, the Breast Cancer Dataset is taken from the UCI ML Repo.
Objective Function
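As implemented in the code below, the L2-regularized logistic loss is minimized:

J(w, b) = \frac{c}{2} \|w\|^2 + \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i (w^\top x_i + b)\right)\right)

and Gradient Descent iterates

w \leftarrow w - \eta \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial J}{\partial b}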
Notation
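Matching the code:
- x_i \in \mathbb{R}^d: the i-th feature vector (the i-th row of X)
- y_i \in \{+1, -1\}: the label of the i-th sample
- w \in \mathbb{R}^d, b \in \mathbb{R}: weight vector and bias
- c: regularization coefficient (lambda), \eta: step size (step_size)
- n: number of samples, d: number of features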
Code
print "##### Logistic Regression solved with Gradient Descent #####" # Loading dataset data = np.loadtxt("/home/kzk/datasets/uci_csv/breast_cancer.csv") y_ = data[:, 0] X_ = data[:, 1:] set_y = list(set(y_)) for i, label in enumerate(y_): if label == set_y[0]: y_[i] = 1 else: y_[i] = -1 pass # Learning setting ## param max_iter = 1000 w_threshold = 0.001 step_size = 0.01 d = X_.shape[1] n = X_.shape[0] ## variables w = shared(np.random.rand(d), name="w") b = shared(np.random.rand(1)[0], name="b") X = T.dmatrix("X") y = T.dvector("y") c = 0.1 # lambda regularizer = (w ** 2).sum() / 2 loss = T.log((1 + T.exp(-y * (X.dot(w) + b)))).sum() obj_func = c * regularizer + loss / n grad_obj_func_wrt_w = T.grad(obj_func, w) grad_obj_func_wrt_b = T.grad(obj_func, b) ## train/predict train = function( inputs=[X, y], outputs=[w, b], updates=[(w, w - step_size * grad_obj_func_wrt_w), (b, b - step_size * grad_obj_func_wrt_b)] ) x = T.dvector("x") y_pred = T.dscalar("y") prob = 1 / (1 + T.exp(- y_pred * (w.dot(x) + b))) predict = function( inputs=[x, y_pred], outputs=prob ) print "Learn" w_prev = w.get_value() cnt = 0 while True: cnt += 1 train(X_, y_) w_cur = w.get_value() diff_w = np.linalg.norm(w_cur) - np.linalg.norm(w_prev) print cnt, np.abs(diff_w) if np.abs(diff_w) < w_threshold or cnt > max_iter: break pass w_prev = w_cur print "Predict" hit = 0 for i, z in enumerate(X_): pred_1 = predict(z, 1) pred__1 = predict(z, -1) print "1 probability is", pred_1 print "-1 probability is", pred__1 print "label is ", y_[i] pred_value = 1 if pred_1 >= pred__1 else -1 hit += 1 if pred_value == y_[i] else 0 print "Accuracy is ", (100.0 * hit/len(y_)), " %"
The training/validation/test datasets are not split at all, but the change in the parameter norm shrinks and the classification works, so overfitting is ignored for now.
References
- http://deeplearning.net/software/theano/
- http://deeplearning.net/software/theano/tutorial/index.html#tutorial
- http://deeplearning.net/software/theano/library/index.html#libdoc
- http://deeplearning.net/software/theano/library/tensor/basic.html
- http://deeplearning.net/software/theano/tutorial/gradients.html
- http://deeplearning.net/software/theano/library/tensor/nlinalg.html
- http://deeplearning.net/software/theano/library/tensor/nnet/index.html
- http://deeplearning.net/software/theano/tutorial/loop.html
- http://deeplearning.net/software/theano/faq.html#faq