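By default (keep_dims=False), reduce_xxx lowers the tensor rank by one for each reduced dimension. With keep_dims=True, the reduced dimensions are kept with length 1 instead. reduction_indices selects the dimensions to reduce; its default is None, which reduces over all dimensions and returns a scalar. reduce_all and reduce_any are the "logical and" and "logical or" reductions. accumulate_n is the exception in this group: it is an element-wise sum over a list of tensors, not a reduction along a dimension.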
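The rank behaviour described above can be sketched without TensorFlow. The following is a minimal pure-Python model for 2-D nested lists only; reduce_sum_2d and its axis argument are hypothetical names standing in for tf.reduce_sum and reduction_indices, not the library's implementation.

```python
def reduce_sum_2d(rows, axis=None, keep_dims=False):
    """Sum a 2-D nested list the way the text describes reduce_xxx (hypothetical helper)."""
    if axis is None:
        # Reduce over every dimension: a scalar, or a 1x1 matrix with keep_dims
        total = sum(sum(row) for row in rows)
        return [[total]] if keep_dims else total
    if axis == 0:
        # Collapse dimension 0 (sum each column)
        sums = [sum(col) for col in zip(*rows)]
        return [sums] if keep_dims else sums
    if axis == 1:
        # Collapse dimension 1 (sum each row)
        sums = [sum(row) for row in rows]
        return [[s] for s in sums] if keep_dims else sums
    raise ValueError("axis must be None, 0 or 1")


x = [[1, 2, 3],
     [4, 5, 6]]
print(reduce_sum_2d(x))                          # 21
print(reduce_sum_2d(x, axis=0))                  # [5, 7, 9]
print(reduce_sum_2d(x, axis=1, keep_dims=True))  # [[6], [15]]
```

With keep_dims=True the result is still 2-D, and the reduced dimension has length 1.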

sample codes

Segmentation

tf.segment_sum
tf.segment_prod
tf.segment_min
tf.segment_max
tf.segment_mean
tf.unsorted_segment_sum
tf.sparse_segment_sum
tf.sparse_segment_mean

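Each of these applies an operation element-wise within segments. Segmentation runs along dimension 0 of the tensor. segment_ids must have the same length as dimension 0 of the tensor, and its entries must not exceed that length. The entries must be sorted and use consecutive values, with repeats allowed (e.g., [0, 0, 1, 2, 2]).

Some functions do not require sorted, consecutive ids (unsorted_segment_xxx).

Others first select rows along dimension 0 by index and then apply segment_xxx to the selected rows (sparse_segment_xxx).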
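As a rough illustration of what "segmenting along dimension 0" means, here is a pure-Python sketch of a segment sum over a 2-D nested list. The function below is a hypothetical stand-in written for this note, not tf.segment_sum itself, and it assumes sorted ids as described above.

```python
def segment_sum(data, segment_ids):
    """Sum the rows of `data` that share the same segment id (hypothetical sketch)."""
    if len(data) != len(segment_ids):
        raise ValueError("segment_ids must be as long as dimension 0 of data")
    if list(segment_ids) != sorted(segment_ids):
        raise ValueError("segment_ids must be sorted")
    num_segments = segment_ids[-1] + 1 if segment_ids else 0
    width = len(data[0]) if data else 0
    out = [[0] * width for _ in range(num_segments)]
    for row, seg in zip(data, segment_ids):
        for j, value in enumerate(row):
            out[seg][j] += value  # element-wise accumulation inside the segment
    return out


data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(segment_sum(data, [0, 0, 1, 2, 2]))  # [[4, 6], [5, 6], [16, 18]]
```

Rows 0 and 1 form segment 0, row 2 forms segment 1, and rows 3 and 4 form segment 2.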

The figures here are worth a look.

sample codes

Sequence Comparison and Indexing

tf.argmin
tf.argmax
tf.listdiff
tf.where
tf.unique
tf.edit_distance
tf.invert_permutation

tf.argmin and tf.argmax are really reductions, but for some reason they are listed in this section.

sample codes

References

https://www.tensorflow.org/versions/v0.6.0/api_docs/python/math_ops.html

2016-02-08

Theano: Scan Op

python theano

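This was written against Theano 0.7.0. It is rather late for it, but I decided to take a proper look at the scan op. Its strengths are that it handles variable-length inputs and outputs, and that the loop can carry a stopping condition expressed on symbolic variables.

I basically go through this page in order.

The sample code is here.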

Simple loop with accumulation: Computing A^k

Without scan, ${ \displaystyle A^k }$ (an element-wise power here) would normally be coded like this:

non-scan power code

result = 1
for i in range(k):
    result = result * A

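Looking at this loop, there are three ingredients:

the initialization of result
the unchanging variable A
the function (here, an accumulation)

These map onto the arguments of scan as follows:

initialization of result: outputs_info
unchanging variable A: non_sequences
function (here, an accumulation): lambda or named function

This is the simplest case.

The same code rewritten with scan: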

import theano
import theano.tensor as T

k = T.iscalar("k")
A = T.vector("A")

# Symbolic description of the result
result, updates = theano.scan(fn=lambda prior_result, A: prior_result * A,
                              outputs_info=T.ones_like(A),
                              non_sequences=A,
                              n_steps=k)

# Optimization saving memory.
final_result = result[-1]

# Compiled function that returns A**k
power = theano.function(inputs=[A, k], outputs=final_result, updates=updates)

print power(range(10), 2)
print power(range(10), 4)

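result is a symbolic tensor that stacks the output of every step; its first dimension is the time step. In this example only result[-1] is handed to theano.function when the computational graph is built, which lets Theano optimize memory use by not keeping the intermediate steps. Also note that the number of iterations is given explicitly through n_steps here, so this is not yet a variable-length loop driven by a sequence.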

Iterating over the first dimension of a tensor: Calculating a polynomial

Just like an ordinary for x in a_list loop (assuming a_list is a list of lists of ...), scan can iterate over the first dimension of a tensor. For that, use the sequences argument of scan.

Polynomial sample

import numpy
import theano
import theano.tensor as T

coefficients = theano.tensor.vector("coefficients")
x = T.scalar("x")

max_coefficients_supported = 10000

# Generate the components of the polynomial
components, updates = theano.scan(fn=lambda coefficient, power, free_variable: coefficient * (free_variable ** power),
                                  outputs_info=None,
                                  sequences=[coefficients, theano.tensor.arange(max_coefficients_supported)],
                                  non_sequences=x)
# Sum them up
polynomial = components.sum()

# Compile a function
calculate_polynomial = theano.function(inputs=[coefficients, x], outputs=polynomial)

# Test
test_coefficients = numpy.asarray([1, 0, 2], dtype=numpy.float32)
test_value = 3
print calculate_polynomial(test_coefficients, test_value)
print 1.0 * (3 ** 0) + 0.0 * (3 ** 1) + 2.0 * (3 ** 2)

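There are four things to check in this sample.

The sample returns every term of the polynomial and calls components.sum() at the end; accumulating the sum inside scan and taking the last element would use less memory
With outputs_info=None, fn does not take a previous result as an argument
theano.tensor.arange(n) is used to simulate Python's enumerate; presumably this is what to use whenever the iteration index is needed
When sequences of different lengths are passed to sequences, they are truncated to the shortest one

The important part is how the arguments of scan are passed on to fn. They arrive in this order:

sequences (if any), prior result(s) (if needed), non-sequences (if any)

An easy way to remember it: inputs, outputs, everything else.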
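The argument order can be made concrete with a small pure-Python model. scan_sketch below is a hypothetical, heavily simplified stand-in for theano.scan (one output, no taps, eager evaluation on plain Python values); it only mimics the calling convention and the truncation to the shortest sequence.

```python
def scan_sketch(fn, sequences=(), outputs_info=None, non_sequences=(), n_steps=None):
    """Hypothetical pure-Python model of theano.scan's calling convention."""
    if sequences:
        steps = list(zip(*sequences))  # zip stops at the shortest sequence
        if n_steps is not None:
            steps = steps[:n_steps]
    elif n_steps is not None:
        steps = [()] * n_steps
    else:
        raise ValueError("need sequences or n_steps")

    outputs = []
    prior = outputs_info
    for items in steps:
        args = list(items)              # 1. current element of each sequence
        if outputs_info is not None:
            args.append(prior)          # 2. prior result, only if requested
        args.extend(non_sequences)      # 3. non-sequences, unchanged every step
        prior = fn(*args)
        outputs.append(prior)
    return outputs


# A^k by accumulation: prior result + non-sequence
print(scan_sketch(lambda prior, a: prior * a,
                  outputs_info=1, non_sequences=[3], n_steps=4))  # [3, 9, 27, 81]

# Polynomial terms: two sequences of different length + non-sequence
terms = scan_sketch(lambda c, p, x: c * x ** p,
                    sequences=[[1, 0, 2], range(10000)], non_sequences=[3])
print(terms, sum(terms))  # [1, 0, 18] 19

# Running sum: sequence + prior result
print(scan_sketch(lambda v, acc: acc + v,
                  sequences=[range(5)], outputs_info=0))  # [0, 1, 3, 6, 10]
```

The three calls correspond to the three Theano samples in this post, reduced to plain numbers.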

Simple accumulation into a scalar, ditching lambda

Named Function Sample

import numpy as np
import theano
import theano.tensor as T

up_to = T.iscalar("up_to")

# define a named function, rather than using lambda
def accumulate_by_adding(arange_val, sum_to_date):
    return sum_to_date + arange_val
seq = T.arange(up_to)

# An unauthorized implicit downcast from the dtype of 'seq', to that of
# 'T.as_tensor_variable(0)' which is of dtype 'int8' by default would occur
# if this instruction were to be used instead of the next one:
# outputs_info = T.as_tensor_variable(0)

outputs_info = T.as_tensor_variable(np.asarray(0, seq.dtype))
scan_result, scan_updates = theano.scan(fn=accumulate_by_adding,
                                        outputs_info=outputs_info,
                                        sequences=seq)
triangular_sequence = theano.function(inputs=[up_to], outputs=scan_result)

# test
some_num = 15
print triangular_sequence(some_num)
print [n * (n + 1) // 2 for n in xrange(some_num)]

Points to note:

outputs_info must have the same shape as the output of the function
An implicit downcast that is not allowed in outputs_info raises an error

Another simple example

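This sample is (apparently) hard to reproduce without scan.

Nothing particularly difficult is going on: matrix, location and values are prepared as symbolic variables.
Inside scan's inner function, a matrix with the value written at the given location is returned at each time step.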

another simple example

import numpy as np
import theano
import theano.tensor as T

location = T.imatrix("location")
values = T.vector("values")
output_model = T.matrix("output_model")

def set_value_at_position(a_location, a_value, output_model):
    zeros = T.zeros_like(output_model)
    zeros_subtensor = zeros[a_location[0], a_location[1]]
    return T.set_subtensor(zeros_subtensor, a_value)

result, updates = theano.scan(fn=set_value_at_position,
                              outputs_info=None,
                              sequences=[location, values],
                              non_sequences=output_model)

assign_values_at_positions = theano.function(inputs=[location, values, output_model], outputs=result)

# test
test_locations = np.asarray([[1, 1], [2, 3]], dtype=np.int32)
test_values = np.asarray([42, 50], dtype=np.float32)
test_output_model = np.zeros((5, 5), dtype=np.float32)
print assign_values_at_positions(test_locations, test_values, test_output_model)

Using shared variables - Gibbs sampling

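From here on, the relationship between shared variables and updates comes into the explanation.
The visible and hidden variables share parameters. Between the two layers, the sample is linearly transformed with the parameters, the result is used as the mean of a binomial distribution to draw a sample, that sample is linearly transformed with the parameters again, and the result is again used as the mean of a binomial distribution to draw the next sample (probably an RBM-style example).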

Gibbs Sampling Sample

import theano
import numpy as np
from theano import tensor as T

# Parameter initial values (10 visible units, 5 hidden units)
W_values = np.random.rand(10, 5).astype(np.float32)
bvis_values = np.random.rand(10).astype(np.float32)
# The hidden bias must match the hidden size (second axis of W)
bhid_values = np.random.rand(5).astype(np.float32)

# Parameter symbols
W = theano.shared(W_values)
bvis = theano.shared(bvis_values)
bhid = theano.shared(bhid_values)

trng = T.shared_randomstreams.RandomStreams(1234)

def OneStep(vsample):
    # visible -> hidden
    hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)
    hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)
    # hidden -> visible
    vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)
    return trng.binomial(size=vsample.shape, n=1, p=vmean,
                         dtype=theano.config.floatX)

sample = theano.tensor.vector()

values, updates = theano.scan(OneStep, outputs_info=sample, n_steps=10)

gibbs10 = theano.function([sample], values[-1], updates=updates)

# The initial visible sample is a vector with one entry per visible unit
print(gibbs10(np.random.rand(10).astype(theano.config.floatX)))

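The updates dictionary can also be used later
If updates is not passed to function, the variables updated inside scan (here the state of the random number generator) are not updated

In the example above the shared variables are not passed to scan, but scan detects them by itself. For the sake of optimization it is better to pass them explicitly, via non_sequences.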

Using shared variables - the strict flag

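For optimization it is better to pass shared variables to scan as arguments, and there is an argument that enforces this. With scan(..., strict=True, ...), scan checks that every shared variable needed by fn is listed in non_sequences, and raises an error if one is missing.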

Multiple outputs, several taps values - Recurrent Neural Network with Scan

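A more complex example.
So far sequences, outputs_info and non_sequences were given plain variables or lists of them; here a dict or a list of dicts is passed instead. sequences and outputs_info appear to accept a list whose entries are either symbolic variables or dictionaries.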

${ \displaystyle x(n) = tanh(W x(n - 1) + W^{in}_1 u(n) + W^{in}_2 u(n - 4) + W^{feedback} y(n - 1)) \\ y(n) = W^{out} x(n - 3) }$

An RNN for the equations above (a sample with no meaning whatsoever)

import theano
import numpy as np
from theano import tensor as T

def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1, W, W_in_1, W_in_2, W_feedback, W_out):

    x_t = T.tanh(theano.dot(x_tm1, W) +
                 theano.dot(u_t, W_in_1) +
                 theano.dot(u_tm4, W_in_2) +
                 theano.dot(y_tm1, W_feedback))
    y_t = theano.dot(x_tm3, W_out)

    return [x_t, y_t]

W_ = np.random.rand(10, 10).astype(np.float32)
W_in_1_ = np.random.rand(10, 10).astype(np.float32)
W_in_2_ = np.random.rand(10, 10).astype(np.float32)
W_feedback_ = np.random.rand(10, 10).astype(np.float32)
W_out_ = np.random.rand(10, 10).astype(np.float32)
    
W = theano.shared(W_)
W_in_1 = theano.shared(W_in_1_)
W_in_2 = theano.shared(W_in_2_)
W_feedback = theano.shared(W_feedback_)
W_out = theano.shared(W_out_)

u = T.matrix()    # it is a sequence of vectors
x0 = T.matrix()  # initial state of x has to be a matrix, since it has to cover x[-3]
y0 = T.vector()  # y0 is just a vector since scan has only to provide y[-1]

([x_vals, y_vals], updates) = theano.scan(fn=oneStep,
                                          sequences=dict(input=u, taps=[-4, -0]),
                                          outputs_info=[dict(initial=x0, taps=[-3, -1]), y0],
                                          non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out],
                                          strict=True)
# for second input y, scan adds -1 in output_taps by default

func = theano.function(inputs=[u, x0, y0], outputs=[x_vals, y_vals], updates=updates)

u_ = np.random.rand(50, 10).astype(np.float32)
x0_ = np.random.rand(5, 10).astype(np.float32)
y0_ = np.random.rand(10).astype(np.float32)

print func(u_, x0_, y0_)

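Note that because the step looks back as far as x(n - 3), the initial state x0 must be a matrix with at least 3 rows along dimension 0 (covering x(n - 3) through x(n - 1)). Presumably scan then feeds the past outputs (x_t) back in after x0?

Also note that with sequences=dict(input=u, taps=[-4, 0]) and, say, uvals = [0,1,2,3,4,5,6,7,8], the first step sees u[-4] as uvals[0] and u[0] as uvals[4].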
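The tap alignment is easy to get wrong, so here is a small pure-Python sketch that lists which sequence elements each step would see. tap_windows is a hypothetical helper written for this note, and it assumes all taps are zero or negative, as in the example.

```python
def tap_windows(seq, taps):
    """For each step, return the elements of `seq` selected by `taps` (hypothetical helper)."""
    lo, hi = min(taps), max(taps)
    n_steps = len(seq) - (hi - lo)  # steps for which every tap is in range
    start = -lo                     # index in seq that acts as "time 0" at the first step
    return [tuple(seq[start + t + k] for k in taps) for t in range(n_steps)]


uvals = [0, 1, 2, 3, 4, 5, 6, 7, 8]
for step, (u_tm4, u_t) in enumerate(tap_windows(uvals, [-4, 0])):
    print(step, u_tm4, u_t)
# 0 0 4
# 1 1 5
# 2 2 6
# 3 3 7
# 4 4 8
```

Nine inputs with taps [-4, 0] therefore give only five steps, which matches the 50-row u producing 46 steps in the RNN sample.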

Conditional ending of Scan

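A stopping condition can be placed on symbolic values. This is a repeat-until example.

The condition can use the arguments of fn
The condition is returned as the last element of fn's return value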

import theano
from theano import tensor as T

def power_of_2(previous_power, max_value):
    return previous_power*2, theano.scan_module.until(previous_power*2 > max_value)

max_value = T.scalar()
values, _ = theano.scan(power_of_2,
                        outputs_info=T.constant(1.),
                        non_sequences=max_value,
                        n_steps=1024)

f = theano.function([max_value], values)

print(f(45))
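For comparison, the same repeat-until loop in plain Python. This is only a sketch of the control flow, under the assumption that the step on which the condition becomes true is still included in the output; powers_of_2_until is a hypothetical name.

```python
def powers_of_2_until(max_value, n_steps=1024):
    """Plain-Python model of the repeat-until scan above (hypothetical helper)."""
    values = []
    previous = 1.0
    for _ in range(n_steps):      # n_steps is only an upper bound
        current = previous * 2
        values.append(current)    # the step's output is kept ...
        if current > max_value:   # ... and then the until-condition is checked
            break
        previous = current
    return values


print(powers_of_2_until(45))  # [2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
```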

Optimizing Scan’s performance

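Avoid using scan where possible
Pass inputs to fn explicitly
To keep scan from running garbage collection, use "config.scan.allow_gc" or "allow_gc"
As in LSTMs, concatenating into one large matrix and doing a single matrix multiplication is faster than doing many small matrix multiplications, at the cost of more memory.

References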

2016-01-23

Celery with Multiprocessing and SQLAlchemy

python celery sqlalchemy mysql

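I wanted to know how multiprocessing behaves inside Celery, so I looked into it.

1. Plain multiprocessing
2. Creating a SQLAlchemy session first, then multiprocessing

These are the two cases tested.

The code for 1 is here
The code for 2 is here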

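Plain multiprocessing

AssertionError: daemonic processes are not allowed to have children

is raised.

According to a GitHub issue,

$ export PYTHONOPTIMIZE=1

works around it for now. PYTHONOPTIMIZE enables the optimization mode in which assert statements are disabled.
The error raised is an AssertionError, so this suppresses it for the time being. What happens when the worker is managed by supervisord (not investigated)?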
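That asserts really disappear in optimized mode can be checked with a small experiment. The sketch below runs a one-line script in a child interpreter with and without -O, which has the same effect as PYTHONOPTIMIZE=1; run_snippet and SNIPPET are hypothetical names, and nothing here involves Celery itself.

```python
import os
import subprocess
import sys

SNIPPET = "assert False, 'boom'\nprint('assert was skipped')"


def run_snippet(optimize):
    """Run SNIPPET in a child interpreter, with or without -O (hypothetical helper)."""
    env = dict(os.environ)
    env.pop("PYTHONOPTIMIZE", None)  # make sure only the -O flag decides
    cmd = [sys.executable] + (["-O"] if optimize else []) + ["-c", SNIPPET]
    proc = subprocess.run(cmd, env=env, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stdout.strip()


print(run_snippet(optimize=False))  # non-zero exit code: the assert fires
print(run_snippet(optimize=True))   # (0, 'assert was skipped')
```

Since this silences every assert in the process, it is a workaround rather than a fix.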

Creating a SQLAlchemy session, then multiprocessing

The above Session is associated with our SQLite-enabled Engine, but it hasn’t opened any connections yet. When it’s first used, it retrieves a connection from a pool of connections maintained by the Engine, and holds onto it until we commit all changes and/or close the session object.

So a connection pool is involved, but calling

session = Session()

before the fork did not seem to cause any problem. Conversely, calling

session = Session()

inside the worker only wrote as many records as there were workers, so it looks as if the connection gets dropped.

So call session = Session() before forking.

References

2016-01-10

TensorFlow: Control Dependency

tensorflow python deeplearning

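I tried out control dependencies.

The conclusion first: rather than fine-grained flow control like an if/then on the CPU, it is used for things like comparing the values of one Tensor with another element-wise and taking the larger one.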
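The "compare element-wise and take the larger one" pattern can be sketched in plain Python. select below is a hypothetical list-based stand-in for the tf.greater plus tf.select combination used in this post, not the TensorFlow ops themselves.

```python
def select(condition, t, e):
    """Pick t[i] where condition[i] is true, else e[i] (hypothetical list-based sketch)."""
    return [a if c else b for c, a, b in zip(condition, t, e)]


h = [0.2, 0.9, 0.5]
g = [0.4, 0.1, 0.5]

condition = [a > b for a, b in zip(h, g)]  # element-wise "greater"
f = select(condition, h, g)

print(condition)  # [False, True, False]
print(f)          # [0.4, 0.9, 0.5]
```

The decision is made per element inside the graph, which is why a Python-level if on a Tensor cannot express it.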

Sample

#!/usr/bin/env python
import numpy as np
import tensorflow as tf

b = tf.constant(np.random.rand(5), name="b")
x = tf.Variable(np.random.rand(10, 5), name="x")
W = tf.Variable(np.random.rand(10, 10), name="W")
h = tf.matmul(W, x) + b

c = tf.constant(np.random.rand(5), name="c")
y = tf.Variable(np.random.rand(10, 5), name="y")
V = tf.Variable(np.random.rand(10, 10), name="V")
g = tf.matmul(V, y) + c

with tf.control_dependencies([h, g]):
    #h_sum = tf.reduce_sum(h)
    #g_sum = tf.reduce_sum(g)
    # 
    #if tf.greater(h_sum, g_sum): # can not execute eval here, so that this is always True
    #    f = tf.Variable(1)
    #else:
    #    f = tf.Variable(0)

    condition = tf.greater(h, g)
    f = tf.select(condition, h, g)

init_op = tf.initialize_all_variables()

with tf.Session() as sess:
    sess.run(init_op)
    
    #ret_h = sess.run(h)
    #ret_g = sess.run(g)
    #print ret_h, ret_g

    #ret_f, ret_h_sum, ret_g_sum = sess.run([f, h_sum, g_sum])
    ret_f = sess.run(f)

    #print ret_h_sum, ret_g_sum
    print ret_f

References

https://www.tensorflow.org/versions/master/api_docs/python/control_flow_ops.html#control-flow

2016-01-10

TensorFlow: tf.get_collection() and tf.add_to_collection()

python tensorflow deeplearning

I looked into the mysterious methods tf.get_collection() and tf.add_to_collection().

The API docs say:

tf.Graph.get_collection(name, scope=None)
Returns a list of values in the collection with the given name.
Args:
key: The key for the collection. For example, the GraphKeys class contains many standard names for collections.
scope: (Optional.) If supplied, the resulting list is filtered to include only items whose name begins with this string.
Returns:
The list of values in the collection with the given name, or an empty list if no value has been added to that collection. The list contains the values in the order under which they were collected

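That is what it says. It seems safe to read key as name. Note that scope refers to name_scope, not variable_scope.

In the official samples it is used in cifar10_multi_gpu_train.py and cifar10.py.

GraphKeys is said to hold the standard key names, and looking at it,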

'QUEUE_RUNNERS',
'SUMMARIES',
'TABLE_INITIALIZERS',
'TRAINABLE_VARIABLES',
'VARIABLES'

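are its class attributes. Judging from the API description and the names, creating a variable or the like adds it to these map<key, list> collections. If nothing special is done (e.g. calling tf.Variable() without explicitly creating a Graph object), it goes into the collections of the default graph.

Sample code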

#!/usr/bin/env python

import tensorflow as tf
import numpy as np


# Collection keys
print "### Collection keys ###"
print dir(tf.GraphKeys)
print ""

# Show values in VARIABLES collections
print "### Show values in VARIABLES collections ###"
print tf.get_collection(tf.GraphKeys.VARIABLES)
print ""

# Create Variables and show the collections again
print "### Create Variables and show the collections again ###"
x = tf.Variable(np.random.rand(5, 5, 5), name="x")
y = tf.Variable(np.random.rand(5, 5, 5), name="y")
c = tf.constant(np.random.rand(5, 5, 5), name="c")
init_op = tf.initialize_all_variables()
w = tf.get_variable("w", shape=(5, 5), initializer=tf.random_normal_initializer())

print "VARIABLES", tf.get_collection(tf.GraphKeys.VARIABLES)
print "TRAINABLE_VARIABLES", tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
print "TABLE_INITIALIZERS", tf.get_collection(tf.GraphKeys.TABLE_INITIALIZERS)
print "SUMMARIES", tf.get_collection(tf.GraphKeys.SUMMARIES)
print "QUEUE_RUNNERS", tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS)
print ""

# Add something to any collection and show that
print "### Add something to any collection ###"
sample = tf.get_collection("sample")
sample.append(x)
print tf.get_collection("sample")

tf.add_to_collection("sample", x)
tf.add_to_collection("sample", y)
print tf.get_collection("sample")
print ""

# Add something to any collection and show that with scope filter
print "### Add something to any collection and show that with scope filter ###"
tf.add_to_collection("sample", x)
with tf.name_scope("name_scope") as scope:
    print tf.get_collection("sample", scope)
    z = tf.Variable(np.random.rand(5, 5, 5), name="z")
    tf.add_to_collection("sample", z)
    
print len(tf.get_collection("sample"))
print len(tf.get_collection("sample", scope))

print x.name
print y.name
print z.name

The main thing to check: create a name_scope, create a variable inside it, add it to some collection there, and see whether filtering by scope works.
This way, even a single collection can be filtered by scope.
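The map<key, list> picture and the scope filter can be modelled in a few lines of plain Python. CollectionStore is a hypothetical class written for this note; it stores names as strings and filters by name prefix, following the API description quoted earlier, and is not how TensorFlow implements collections internally.

```python
from collections import defaultdict


class CollectionStore(object):
    """Hypothetical stand-in for a graph's collections: a map from key to list."""

    def __init__(self):
        self._collections = defaultdict(list)

    def add_to_collection(self, key, name):
        self._collections[key].append(name)

    def get_collection(self, key, scope=None):
        items = self._collections.get(key, [])  # unknown key -> empty list
        if scope is None:
            return list(items)                   # assumption: return a copy
        return [n for n in items if n.startswith(scope)]


store = CollectionStore()
store.add_to_collection("sample", "x:0")
store.add_to_collection("sample", "y:0")
store.add_to_collection("sample", "name_scope/z:0")

print(store.get_collection("sample"))                 # all three names
print(store.get_collection("sample", "name_scope/"))  # ['name_scope/z:0']
print(store.get_collection("missing"))                # []
```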

TABLE_INITIALIZERS remains a mystery; personally, this is unresolved.

References

2016-01-06

TensorFlow: Understanding Graph a Little Better

tensorflow python deeplearning

Understanding Graph a little better.

Default graph
Create another graph in this thread (main thread)
Graph in multi thread
Write graph as protobuf to disk
Read graph from disk and to Graph

The sample covers these five cases. Restoring from a checkpoint is not covered.

Restoring from a checkpoint should be doable by following this.

When importing with tf.import_graph_def(graph_def), the ops are added to the default graph. To add them to a different graph, create g = tf.Graph(), enter g.as_default(), and call tf.import_graph_def(graph_def) inside that context.

#!/usr/bin/env python

import tensorflow as tf
import threading
import numpy as np

class GraphWorker(threading.Thread):
    """
    """
    
    def __init__(self, index):
        """
        """
        super(GraphWorker, self).__init__()
        self.index = index
        
        pass

    def run(self):

        g = tf.get_default_graph()
        #print "Graph in main thread", g

        g_local = tf.Graph()
        #print "Graph in this thread", g_local

def main():
    # Default graph
    """
    Confirm that a default Graph is always registered, and accessible by calling tf.get_default_graph(). To add an operation to the default graph, simply call one of the functions that defines a new Operation:
    """
    c = tf.constant(4.0)
    assert c.graph is tf.get_default_graph()

    # Create another graph in this thread (main thread)
    """
    Confirm that tf.Graph.as_default() method should be used if you want to create multiple graphs in the same process.
    """
    with tf.Graph().as_default() as g:
        c = tf.constant(30.0)
        assert c.graph is g
        d = tf.constant(40.0)
        assert d.graph is g
        e = tf.Variable(np.random.rand(10, 10), name="e")
        
    assert tf.get_default_graph() != g

    # Graph in multi thread
    """
    The default graph is a property of the current thread. If you create a new thread, and wish to use the default graph in that thread, you must explicitly add a with g.as_default(): in that thread's function.
    """
    n = 4
    graph_workers = []
    for i in xrange(n):
        worker = GraphWorker(i)
        worker.start()
        graph_workers.append(worker)

    for i in xrange(n):
        graph_workers[i].join()

    # Write graph as protobuf to disk
    print g.get_operations()
    print len(g.get_operations())
    tf.train.write_graph(g.as_graph_def(), "./graph_dir_text", "./graph.pbtxt")
    tf.train.write_graph(g.as_graph_def(), "./graph_dir", "./graph.pb", as_text=False)

    # Read graph from disk and to Graph
    print tf.get_default_graph()
    with open("./graph_dir/graph.pb", "rb") as fp:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(fp.read())
        tf.import_graph_def(graph_def)
    print tf.get_default_graph()

    print "----- ops in g -----"
    for op in g.get_operations():
        print op.name

    print "----- ops in default graph after import_graph_def -----"
    for op in tf.get_default_graph().get_operations():
        print op.name
    
    pass


if __name__ == '__main__':
    main()