KZKY memo

A personal memo.

Theano: The Road to Logistic Regression

From basic symbol manipulation in Theano, through basic differentiation, to Logistic Regression.

Basics

installation

  • On Ubuntu 14.04
$ sudo pip install theano
  • Check the version
$ python -c "import theano; print theano.__version__"
0.6.0

import

  • Modules to import up front
import numpy as np
import theano
import theano.tensor as T
from theano import function
from theano import shared

Basic Usage

1. Create symbols with T.xxx
2. Build a function with function

  • inputs: symbols
  • outputs: a symbol composed from the inputs

3. Call the compiled function with actual values

Although I write "symbol", the underlying object is basically a theano.tensor.var.TensorVariable.
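A minimal sketch of these three steps (the values are arbitrary):

x = T.dscalar("x")    # 1. create a symbol
y = x * 2 + 1         # combine symbols into a new symbol
f = function([x], y)  # 2. compile a function
print f(3.0)          # 3. call it with an actual value -> 7.0
print type(x)         # <class 'theano.tensor.var.TensorVariable'>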

Function

  • To build a function
f = function([symbol_x, symbol_y, symbol_z, ...], returned_symbol)

Various Examples

print "##### Basic Examples #####"
print "Scala Addition"
x = T.dscalar("x")
y = T.dscalar("y")
z = x + y
f = function([x, y], z)
print f(2, 1.3)

print "Vector Manipulation"
x = T.dvector("x")
y = T.dvector("y")
z = x.dot(y)
f = function([x, y], z)
x_ = np.random.rand(10)
y_ = np.random.rand(10)
print f(x_, y_)

print "Logistic Function for Vector"
x = T.dvector("x")
s = 1 / (1 + T.exp(-x))
f = function([x], s)
x_ = np.random.rand(10)
print f(x_)

print "Matrix Vecotr Product"
A = T.dmatrix("A")
b = T.dvector("b")
c = A.dot(b)
f = function([A, b], c)
A_ = np.random.rand(10, 10)
b_ = np.random.rand(10)
print f(A_, b_)

print "Broadcast"
A = T.dmatrix("A")
f = function([A], A * 5)
A_ = np.ones((5, 5))
print f(A_)

Shared Variables

A function can hold state.

state = shared(0)
  • This state can also be shared across functions.
  • It can be used just like an object created with T.dxxx.
  • Note, however, that shared variables cannot be used as explicit inputs to function.
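A small sketch of the points above (the values are arbitrary): a shared variable appears in the graph without being listed in inputs, and its value is read and written with get_value / set_value.

state = shared(1.0)             # a shared variable holding a float
x = T.dscalar("x")
f = function([x], x + state)    # state is used implicitly, not as an explicit input
print f(3)                      # 4.0
state.set_value(10.0)
print f(3)                      # 13.0; the current value of state is picked up
print state.get_value()         # 10.0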

Using a shared variable, you can update its state via updates.

function(inputs=[], outputs=[], updates=[(), (), ...])
  • An accumulator example
state = shared(0.0)  # a float, so that the update state + x keeps the same dtype
x = T.dscalar("x")
accumulator = function([x], state, updates=[(state, state + x)])
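Calling the accumulator looks like this (note that outputs returns the value of state as it was before the updates are applied):

print accumulator(1)     # 0.0: the value of state before the update
print accumulator(5)     # 1.0
print state.get_value()  # 6.0: the updates have been applied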

I do not really understand givens yet; it seems to be a substitution mechanism (unverified).
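A minimal sketch of how givens seems to work (unverified, as noted above): at compile time it substitutes one variable in the graph with another, leaving the original untouched.

state = shared(np.float64(1.0), name="state")
x = T.dscalar("x")
y = T.dscalar("y")
z = state * 2 + x
f = function([x, y], z, givens=[(state, y)])  # use y wherever state appears
print f(1, 3)            # 7.0
print state.get_value()  # still 1.0; state itself is untouched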

Derivatives

Basics

T.grad(objective_function_symbol, with_respect_to_symbol_variable(s))

Five examples that appear frequently in optimization

{ \displaystyle
\frac{d \mathbf{x}^T \mathbf{y}}{d \mathbf{x}} \\
\frac{d \mathbf{x}^T A \mathbf{x}}{d \mathbf{x}} \\
\frac{d \|\mathbf{w}\|_2^2} {d \mathbf{w}} \\
\frac{d \mathrm{Tr}(AX)}{d X} \\
\frac{d \log \det(X)}{d X} \\
}
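For reference, the closed forms of these five derivatives (standard matrix-calculus identities) are:

{ \displaystyle
\frac{d \mathbf{x}^T \mathbf{y}}{d \mathbf{x}} = \mathbf{y} \\
\frac{d \mathbf{x}^T A \mathbf{x}}{d \mathbf{x}} = (A + A^T) \mathbf{x} \\
\frac{d \|\mathbf{w}\|_2^2}{d \mathbf{w}} = 2 \mathbf{w} \\
\frac{d \mathrm{Tr}(AX)}{d X} = A^T \\
\frac{d \log \det(X)}{d X} = (X^{-1})^T \\
}

The "recalculation" lines in the code below print these right-hand sides; the 2-norm example divides by 2 (giving w), and the quadratic-form example symmetrizes A (so (A + A^T)x = 2Ax).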

print "##### Derivatives #####"
print "Derivative of Vector-Vector Product w.r.t Vector"
x = T.dvector("x")
y = T.dvector("y")
z = T.dot(x, y)
gz_wrt_x = T.grad(z, x)
f = function([x, y], gz_wrt_x)  # d f(x) / dx
x_ = np.random.rand(5)
y_ = np.random.rand(5)
print f(x_, y_)
print y_  # recalculation

print "Derivative of Quadratic Form w.r.t. Vector"
x = T.dvector("x")
A = T.dmatrix("A")
z = x.dot(A).dot(x)
gz_wrt_x = T.grad(z, x)
f = function([x, A], gz_wrt_x)  # d f(x) / dx
x_ = np.random.rand(5)
A_ = np.random.rand(5, 5)
A_ = A_ + A_.T
print f(x_, A_)
print 2 * np.dot(A_, x_)  # recalculation

print "Derivative of 2-norm"
w = T.dvector("w")
z = T.grad(T.square(w.norm(L=2)) / 2, w)
f = function([w], z)
w_ = np.random.rand(5)
print f(w_)
print w_  # recalculation

print "Derivative Trace of Matrix-Matrix Product"
A = T.dmatrix("A")
X = T.dmatrix("X")
z = A.dot(X).trace()
gz_wrt_X = T.grad(z, X)
f = function([A, X], gz_wrt_X)
A_ = np.random.rand(5, 5)
X_ = np.random.rand(5, 5)
print f(A_, X_)
print A_.T  # recalculation

print "Derivative of Log Determinant of Matrix"
X = T.dmatrix("X")
z = T.log(theano.sandbox.linalg.det(X))
gz_wrt_X = T.grad(z, X)
f = function([X], gz_wrt_X)
X_ = np.random.rand(5, 5)
X_ = (X_ + X_.T) / 2 + np.diag(np.ones(5) * 10)  # make X_ symmetric and well-conditioned
print f(X_)
print np.linalg.inv(X_)  # recalculation

It seems the function that returns gz_wrt_x has to take all the arguments that z depends on.
At present, the objective function apparently has to be a scalar.

Looking at the proposal, I thought matrix derivatives were not supported, but they seem fine as long as the objective is a scalar.
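A quick check of the scalar-cost constraint (a rough sketch; the exact exception type and message are assumptions):

x = T.dvector("x")
z = x * 2  # vector-valued, i.e. not a scalar
try:
    T.grad(z, x)
except Exception as e:
    print e  # expected to complain that the cost must be a scalar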

det was found in theano.sandbox.linalg.

I have not verified whether matrix derivatives of more complex objective functions, like the ones in The Matrix Cookbook or on Wikipedia, can be computed.

Logistic Regression

Having come this far, we can implement Logistic Regression, the most basic of linear models.

A sample of the simplest two-class Logistic Regression.
Here, Logistic Regression is solved with Gradient Descent, which uses only first derivatives.

Since a two-class dataset is needed, the Breast Cancer Dataset is taken from the UCI ML Repo.

Objective Function

{ \displaystyle
L(X, \mathbf{y}, \mathbf{w}, b) = \frac{\lambda}{2} \|\mathbf{w}\|_2^2 + \frac{1}{n} \sum_{i=1}^{n} \ln (1 + \exp(- y_i f(\mathbf{x}_i)))
}

Notation

{ \displaystyle
X = (\mathbf{x}_1 \ldots \mathbf{x}_n) \\
\mathbf{y} = (y_1 \ldots y_n) \\
y \in \{-1, 1\} \\
\mathbf{w}, \mathbf{x} \in \mathbb{R}^d \\
f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b
}
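Gradient descent then just iterates the following updates, where \eta corresponds to step_size in the code:

{ \displaystyle
\mathbf{w} \leftarrow \mathbf{w} - \eta \frac{\partial L}{\partial \mathbf{w}} \\
b \leftarrow b - \eta \frac{\partial L}{\partial b}
}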

Code

print "##### Logistic Regression solved with Gradient Descent #####"
# Loading dataset
data = np.loadtxt("/home/kzk/datasets/uci_csv/breast_cancer.csv")
y_ = data[:, 0]
X_ = data[:, 1:]
set_y = list(set(y_))
for i, label in enumerate(y_):
    if label == set_y[0]:
        y_[i] = 1
    else:
        y_[i] = -1
    pass

# Learning setting
## param
max_iter = 1000
w_threshold = 0.001
step_size = 0.01
d = X_.shape[1]
n = X_.shape[0]

## variables
w = shared(np.random.rand(d), name="w")
b = shared(np.random.rand(1)[0], name="b")
X = T.dmatrix("X")
y = T.dvector("y")
c = 0.1 # lambda
regularizer = (w ** 2).sum() / 2
loss = T.log((1 + T.exp(-y * (X.dot(w) + b)))).sum()
obj_func = c * regularizer + loss / n
grad_obj_func_wrt_w = T.grad(obj_func, w)
grad_obj_func_wrt_b = T.grad(obj_func, b)

## train/predict
train = function(
    inputs=[X, y],
    outputs=[w, b],
    updates=[(w, w - step_size * grad_obj_func_wrt_w), (b, b - step_size * grad_obj_func_wrt_b)]
)

x = T.dvector("x")
y_pred = T.dscalar("y")
prob = 1 / (1 + T.exp(- y_pred * (w.dot(x) + b)))
predict = function(
    inputs=[x, y_pred],
    outputs=prob
)

print "Learn"
w_prev = w.get_value()
cnt = 0
while True:
    cnt += 1
    train(X_, y_)
    w_cur = w.get_value()
    diff_w = np.linalg.norm(w_cur) - np.linalg.norm(w_prev)
    print cnt, np.abs(diff_w)
    if np.abs(diff_w) < w_threshold or cnt > max_iter:
        break
    w_prev = w_cur

print "Predict"
hit = 0
for i, z in enumerate(X_):
    pred_1 = predict(z, 1)
    pred__1 = predict(z, -1)
    
    print "1 probability is", pred_1
    print "-1 probability is", pred__1
    print "label is ", y_[i]
    pred_value = 1 if pred_1 >= pred__1 else -1
    hit += 1 if pred_value == y_[i] else 0

print "Accuracy is ", (100.0 * hit/len(y_)), " %"

I did not split the data into training/validation/test sets at all, but the norm of the parameters is getting smaller and the classification works, so I am not worrying about overfitting for now.