KZKY memo

A personal memo.

Theano: The Road to Logistic Regression

From basic symbol manipulation in Theano, through basic differentiation, to Logistic Regression.

Basics

installation

  • On Ubuntu 14.04
$ sudo pip install theano
  • Check the version
$ python -c "import theano; print theano.__version__"
0.6.0

import

  • Modules to import up front
import numpy as np
import theano
import theano.tensor as T
from theano import function
from theano import shared

Basic Usage

1. Create symbols with T.xxx
2. Build a function with function

  • inputs: symbols
  • outputs: a symbol composed from the inputs

3. Call the compiled function with actual values

Although I write "symbol", the underlying object is basically a theano.tensor.var.TensorVariable.
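A minimal sketch of these three steps (the values are arbitrary):

x = T.dscalar("x")    # 1. create a symbol
y = x * 2 + 1         # combine symbols into a new symbol
f = function([x], y)  # 2. compile a function
print f(3.0)          # 3. call it with an actual value -> 7.0
print type(x)         # <class 'theano.tensor.var.TensorVariable'>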

Function

  • To build a function
f = function([symbol_x, symbol_y, symbol_z, ...], returned_symbol)

Various Examples

print "##### Basic Examples #####"
print "Scala Addition"
x = T.dscalar("x")
y = T.dscalar("y")
z = x + y
f = function([x, y], z)
print f(2, 1.3)

print "Vector Manipulation"
x = T.dvector("x")
y = T.dvector("y")
z = x.dot(y)
f = function([x, y], z)
x_ = np.random.rand(10)
y_ = np.random.rand(10)
print f(x_, y_)

print "Logistic Function for Vector"
x = T.dvector("x")
s = 1 / (1 + T.exp(-x))
f = function([x], s)
x_ = np.random.rand(10)
print f(x_)

print "Matrix Vecotr Product"
A = T.dmatrix("A")
b = T.dvector("b")
c = A.dot(b)
f = function([A, b], c)
A_ = np.random.rand(10, 10)
b_ = np.random.rand(10)
print f(A_, b_)

print "Broadcast"
A = T.dmatrix("A")
f = function([A], A * 5)
A_ = np.ones((5, 5))
print f(A_)

Shared Variables

A function can hold state.

state = shared(0)
  • This state can also be shared across functions.
  • It can be used just like an object created with T.dxxx.
  • Note, however, that shared variables cannot be used as explicit inputs to function.
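A small sketch of the points above (the values are arbitrary): a shared variable appears in the graph without being listed in inputs, and its value is read and written with get_value / set_value.

state = shared(1.0)             # a shared variable holding a float
x = T.dscalar("x")
f = function([x], x + state)    # state is used implicitly, not as an explicit input
print f(3)                      # 4.0
state.set_value(10.0)
print f(3)                      # 13.0; the current value of state is picked up
print state.get_value()         # 10.0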

Using a shared variable, you can update its state via updates.

function(inputs=[], outputs=[], updates=[(), (), ...])
  • An accumulator example
state = shared(0.0)  # a float, so that the update state + x keeps the same dtype
x = T.dscalar("x")
accumulator = function([x], state, updates=[(state, state + x)])
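Calling the accumulator looks like this (note that outputs returns the value of state as it was before the updates are applied):

print accumulator(1)     # 0.0: the value of state before the update
print accumulator(5)     # 1.0
print state.get_value()  # 6.0: the updates have been applied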

I do not really understand givens yet; it seems to be a substitution mechanism (unverified).
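A minimal sketch of how givens seems to work (unverified, as noted above): at compile time it substitutes one variable in the graph with another, leaving the original untouched.

state = shared(np.float64(1.0), name="state")
x = T.dscalar("x")
y = T.dscalar("y")
z = state * 2 + x
f = function([x, y], z, givens=[(state, y)])  # use y wherever state appears
print f(1, 3)            # 7.0
print state.get_value()  # still 1.0; state itself is untouched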

Derivatives

Basics

T.grad(objective_function_symbol, with_respect_to_symbol_variable(s))

Five examples that appear frequently in optimization

{ \displaystyle
\frac{d \mathbf{x}^T \mathbf{y}}{d \mathbf{x}} \\
\frac{d \mathbf{x}^T A \mathbf{x}}{d \mathbf{x}} \\
\frac{d \|\mathbf{w}\|_2^2} {d \mathbf{w}} \\
\frac{d \mathrm{Tr}(AX)}{d X} \\
\frac{d \log \det(X)}{d X} \\
}
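For reference, the closed forms of these five derivatives (standard matrix-calculus identities) are:

{ \displaystyle
\frac{d \mathbf{x}^T \mathbf{y}}{d \mathbf{x}} = \mathbf{y} \\
\frac{d \mathbf{x}^T A \mathbf{x}}{d \mathbf{x}} = (A + A^T) \mathbf{x} \\
\frac{d \|\mathbf{w}\|_2^2}{d \mathbf{w}} = 2 \mathbf{w} \\
\frac{d \mathrm{Tr}(AX)}{d X} = A^T \\
\frac{d \log \det(X)}{d X} = (X^{-1})^T \\
}

The "recalculation" lines in the code below print these right-hand sides; the 2-norm example divides by 2 (giving w), and the quadratic-form example symmetrizes A (so (A + A^T)x = 2Ax).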

print "##### Derivatives #####"
print "Derivative of Vector-Vector Product w.r.t Vector"
x = T.dvector("x")
y = T.dvector("y")
z = T.dot(x, y)
gz_wrt_x = T.grad(z, x)
f = function([x, y], gz_wrt_x)  # d f(x) / dx
x_ = np.random.rand(5)
y_ = np.random.rand(5)
print f(x_, y_)
print y_  # recalculation

print "Derivative of Quadratic Form w.r.t. Vector"
x = T.dvector("x")
A = T.dmatrix("A")
z = x.dot(A).dot(x)
gz_wrt_x = T.grad(z, x)
f = function([x, A], gz_wrt_x)  # d f(x) / dx
x_ = np.random.rand(5)
A_ = np.random.rand(5, 5)
A_ = A_ + A_.T
print f(x_, A_)
print 2 * np.dot(A_, x_)  # recalculation

print "Derivative of 2-norm"
w = T.dvector("w")
z = T.grad(T.square(w.norm(L=2)) / 2, w)
f = function([w], z)
w_ = np.random.rand(5)
print f(w_)
print w_  # recalculation

print "Derivative Trace of Matrix-Matrix Product"
A = T.dmatrix("A")
X = T.dmatrix("X")
z = A.dot(X).trace()
gz_wrt_X = T.grad(z, X)
f = function([A, X], gz_wrt_X)
A_ = np.random.rand(5, 5)
X_ = np.random.rand(5, 5)
print f(A_, X_)
print A_.T  # recalculation

print "Derivative of Log Determinant of Matrix"
X = T.dmatrix("X")
z = T.log(theano.sandbox.linalg.det(X))
gz_wrt_X = T.grad(z, X)
f = function([X], gz_wrt_X)
X_ = np.random.rand(5, 5)
X_ = (X_ + X_.T) / 2 + np.diag(np.ones(5) * 10)  # make X_ symmetric and well-conditioned
print f(X_)
print np.linalg.inv(X_)  # recalculation

It seems the function that returns gz_wrt_x has to take all the arguments that z depends on.
At present, the objective function apparently has to be a scalar.

Looking at the proposal, I thought matrix derivatives were not supported, but they seem fine as long as the objective is a scalar.
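A quick check of the scalar-cost constraint (a rough sketch; the exact exception type and message are assumptions):

x = T.dvector("x")
z = x * 2  # vector-valued, i.e. not a scalar
try:
    T.grad(z, x)
except Exception as e:
    print e  # expected to complain that the cost must be a scalar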

det was found in theano.sandbox.linalg.

I have not verified whether matrix derivatives of more complex objective functions, like the ones in The Matrix Cookbook or on Wikipedia, can be computed.

Logistic Regression

Having come this far, we can implement Logistic Regression, the most basic of linear models.

A sample of the simplest two-class Logistic Regression.
Here, Logistic Regression is solved with Gradient Descent, which uses only first derivatives.

Since a two-class dataset is needed, the Breast Cancer Dataset is taken from the UCI ML Repo.

Objective Function

{ \displaystyle
L(X, \mathbf{y}, \mathbf{w}, b) = \frac{\lambda}{2} \|\mathbf{w}\|_2^2 + \frac{1}{n} \sum_{i=1}^{n} \ln (1 + \exp(- y_i f(\mathbf{x}_i)))
}

Notation

{ \displaystyle
X = (\mathbf{x}_1 \ldots \mathbf{x}_n) \\
\mathbf{y} = (y_1 \ldots y_n) \\
y \in \{-1, 1\} \\
\mathbf{w}, \mathbf{x} \in \mathbb{R}^d \\
f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b
}
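Gradient descent then just iterates the following updates, where \eta corresponds to step_size in the code:

{ \displaystyle
\mathbf{w} \leftarrow \mathbf{w} - \eta \frac{\partial L}{\partial \mathbf{w}} \\
b \leftarrow b - \eta \frac{\partial L}{\partial b}
}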

Code

print "##### Logistic Regression solved with Gradient Descent #####"
# Loading dataset
data = np.loadtxt("/home/kzk/datasets/uci_csv/breast_cancer.csv")
y_ = data[:, 0]
X_ = data[:, 1:]
set_y = list(set(y_))
for i, label in enumerate(y_):
    if label == set_y[0]:
        y_[i] = 1
    else:
        y_[i] = -1
    pass

# Learning setting
## param
max_iter = 1000
w_threshold = 0.001
step_size = 0.01
d = X_.shape[1]
n = X_.shape[0]

## variables
w = shared(np.random.rand(d), name="w")
b = shared(np.random.rand(1)[0], name="b")
X = T.dmatrix("X")
y = T.dvector("y")
c = 0.1 # lambda
regularizer = (w ** 2).sum() / 2
loss = T.log((1 + T.exp(-y * (X.dot(w) + b)))).sum()
obj_func = c * regularizer + loss / n
grad_obj_func_wrt_w = T.grad(obj_func, w)
grad_obj_func_wrt_b = T.grad(obj_func, b)

## train/predict
train = function(
    inputs=[X, y],
    outputs=[w, b],
    updates=[(w, w - step_size * grad_obj_func_wrt_w), (b, b - step_size * grad_obj_func_wrt_b)]
)

x = T.dvector("x")
y_pred = T.dscalar("y")
prob = 1 / (1 + T.exp(- y_pred * (w.dot(x) + b)))
predict = function(
    inputs=[x, y_pred],
    outputs=prob
)

print "Learn"
w_prev = w.get_value()
cnt = 0
while True:
    cnt += 1
    train(X_, y_)
    w_cur = w.get_value()
    diff_w = np.linalg.norm(w_cur) - np.linalg.norm(w_prev)
    print cnt, np.abs(diff_w)
    if np.abs(diff_w) < w_threshold or cnt > max_iter:
        break
    w_prev = w_cur

print "Predict"
hit = 0
for i, z in enumerate(X_):
    pred_1 = predict(z, 1)
    pred__1 = predict(z, -1)
    
    print "1 probability is", pred_1
    print "-1 probability is", pred__1
    print "label is ", y_[i]
    pred_value = 1 if pred_1 >= pred__1 else -1
    hit += 1 if pred_value == y_[i] else 0

print "Accuracy is ", (100.0 * hit/len(y_)), " %"

I did not split the data into training/validation/test sets at all, but the norm of the parameters is getting smaller and the classification works, so I am not worrying about overfitting for now.