python: multiprocessing 2
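Last time I unzipped a zipfile with Python multithreading and confirmed that it did not get any faster. This time I looked into what happens in the following two cases:
- image 2 ndarray
- pickle 2 ndarray

The pickle files are what ndarray.dump produces.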
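For reference, here is a minimal sketch of how such a pickle can be written with ndarray.dump and read back with pickle.load. It is written for Python 3 (the scripts in this post are Python 2.7), the array is a made-up stand-in for one decoded 32x32 RGB image, and the file name `sample.pkl` is hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for one decoded CIFAR-10 image (32x32 RGB, uint8); values are arbitrary.
image = (np.arange(32 * 32 * 3) % 256).astype(np.uint8).reshape(32, 32, 3)

# Hypothetical output path inside a fresh temporary directory.
pkl_path = os.path.join(tempfile.mkdtemp(), "sample.pkl")

# ndarray.dump writes a pickle of the array to the given file.
image.dump(pkl_path)

# Read it back; pickle files must be opened in binary mode.
with open(pkl_path, "rb") as f:
    restored = pickle.load(f)

print(restored.shape, restored.dtype)
```

The "pickle 2 ndarray" case in the experiments below corresponds to the `pickle.load` step only.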
Experimental setup and environment
- OS
  - Ubuntu 14.04
- CPU
  - Intel(R) Core(TM) i5-2540M CPU @ 2.60GHz
- Disk
  - Hitachi HTS54756
- FS
- Library
  - scipy: 0.13.3
  - opencv: 2.4.8
  - pil: 1.1.7
- Data
  - CIFAR-10 (obtained from Kaggle, so the images are JPEG)
  - 50000 images
  - Total size: 197M
  - Average size: 3.94K
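Experiments
Experiment 1
- Compare the imread of scipy, PIL, and opencv2
- Measure the image 2 ndarray time for each
Experiment 2
- Pick the library that was fastest in Experiment 1
- Measure the speed when multithreaded
- Compare the speed of the two cases, image 2 ndarray and pickle 2 ndarray
Experiment 1
Code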
- comp_pil_imread.py
    #!/usr/bin/env python
    """
    To check if a difference in speed exists.
    """

    from PIL import Image
    from scipy.misc import imread
    import cv2
    import numpy as np
    import glob
    import time


    # `profile` is injected by kernprof (line_profiler) at run time.
    @profile
    def read_with_pil():
        """Read every training image into an ndarray with PIL."""
        base_dirpath = "/home/kzk/datasets/cifar10/train"
        filepaths = glob.glob("{}/*".format(base_dirpath))
        for i, filepath in enumerate(filepaths):
            if i % 1000 == 0:
                print i  # progress
            I = np.asarray(Image.open(filepath))


    def main():
        read_with_pil()


    if __name__ == '__main__':
        main()
- comp_cv2_imread.py
    #!/usr/bin/env python
    """
    To check if a difference in speed exists.
    """

    from PIL import Image
    from scipy.misc import imread
    import cv2
    import numpy as np
    import glob
    import time


    # `profile` is injected by kernprof (line_profiler) at run time.
    @profile
    def read_with_cv2():
        """Read every training image into an ndarray with OpenCV."""
        base_dirpath = "/home/kzk/datasets/cifar10/train"
        filepaths = glob.glob("{}/*".format(base_dirpath))
        for i, filepath in enumerate(filepaths):
            if i % 1000 == 0:
                print i  # progress
            I = cv2.imread(filepath)


    def main():
        read_with_cv2()


    if __name__ == '__main__':
        main()
- comp_scipy_imread.py
    #!/usr/bin/env python
    """
    To check if a difference in speed exists.
    """

    from PIL import Image
    from scipy.misc import imread
    import cv2
    import numpy as np
    import glob
    import time


    # `profile` is injected by kernprof (line_profiler) at run time.
    @profile
    def read_with_scipy():
        """Read every training image into an ndarray with scipy."""
        base_dirpath = "/home/kzk/datasets/cifar10/train"
        filepaths = glob.glob("{}/*".format(base_dirpath))
        for i, filepath in enumerate(filepaths):
            if i % 1000 == 0:
                print i  # progress
            I = imread(filepath)


    def main():
        read_with_scipy()


    if __name__ == '__main__':
        main()
- comp_imread.sh
- Shell script that runs the experiment
    #!/bin/sh
    # Must be run as root: writing to drop_caches requires root privileges.

    # scipy (457.126190)
    echo 1 > /proc/sys/vm/drop_caches
    kernprof -l comp_scipy_imread.py

    # pil (457.224693)
    echo 1 > /proc/sys/vm/drop_caches
    kernprof -l comp_pil_imread.py

    # opencv (441.201759)
    echo 1 > /proc/sys/vm/drop_caches
    kernprof -l comp_cv2_imread.py
The disk cache is dropped before every run.
Results
From the line_profiler output, only the line that reads the image is recorded here.
method | total time [s] | time per file [s] |
---|---|---|
scipy | 457.13 | 0.0091426 |
pil | 457.22 | 0.0091444 |
opencv | 441.20 | 0.008824 |
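The per-file column is just the total divided by the 50000 images. A small sketch of that arithmetic, using only the numbers from the table (the variable names are my own):

```python
NUM_FILES = 50000  # number of training images

# Total seconds per library, copied from the table above.
total_sec = {"scipy": 457.13, "pil": 457.22, "opencv": 441.20}

# Seconds per file for each library.
per_file = {name: sec / NUM_FILES for name, sec in total_sec.items()}

# Fastest library and how far it is ahead of the slowest, as a fraction.
fastest = min(total_sec, key=total_sec.get)
slowest = max(total_sec, key=total_sec.get)
gap = (total_sec[slowest] - total_sec[fastest]) / total_sec[slowest]

for name in sorted(per_file):
    print("{:7s} {:.7f} s/file".format(name, per_file[name]))
print("fastest: {} (gap to slowest: {:.1%})".format(fastest, gap))
```

The gap between the fastest and the slowest library comes out at only a few percent of the total.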
There is no particular difference in speed. Do they all share the same core underneath?
Looking at the source of scipy's imread,
    ...
    def imread(fname, flatten=False, mode=None):
        """
        Load an image from file.

        Parameters
    ...
you can see that it uses PIL.Image.
As for opencv:
- /usr/lib/python2.7/dist-packages/cv.py
from cv2.cv import *
cv2 is a compiled shared object (.so), so it presumably uses the Python interface to the imread of OpenCV itself (written in C/C++).
For now, I pick opencv, which was the fastest.
Experiment 2
- Run with 3 threads.
Code
- thread_sample_imread.py
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    # url
    ## http://ja.pymotw.com/2/threading/
    ## http://docs.python.jp/2.7/library/threading.html

    import threading
    import glob
    import time
    import Queue
    import cv2
    import pickle

    # http://docs.python.jp/2/library/queue.html#module-Queue

    # read jpeg to ndarray
    def read_jpeg_2_ndarray(filepath):
        return cv2.imread(filepath)

    # queue
    queue = Queue.Queue()

    # worker
    def worker():
        """Take paths off the queue forever and decode each one."""
        while True:
            path = queue.get()
            read_jpeg_2_ndarray(path)
            queue.task_done()

    # put task
    base_dirpath = "/home/kzk/datasets/cifar10/train"
    filepaths = glob.glob("{}/*".format(base_dirpath))
    for filepath in filepaths:
        queue.put(filepath)

    # create/start thread
    st = time.time()
    num_worker_threads = 3
    for i in range(num_worker_threads):
        t = threading.Thread(target=worker)
        t.daemon = True  # let the process exit while workers block on get()
        t.start()

    queue.join()  # wait until every queued path has been processed
    et = time.time()
    print "total execution time with threading: ", (et - st), "[s]"
- thread_sample_pklread.py
    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    # url
    ## http://ja.pymotw.com/2/threading/
    ## http://docs.python.jp/2.7/library/threading.html

    import threading
    import glob
    import time
    import Queue
    import cv2
    import pickle

    # http://docs.python.jp/2/library/queue.html#module-Queue

    # read pickle to ndarray
    def read_pkl_2_ndarray(filepath):
        # Open in binary mode and close the file when done.
        with open(filepath, "rb") as f:
            return pickle.load(f)

    # queue
    queue = Queue.Queue()

    # worker
    def worker():
        """Take paths off the queue forever and unpickle each one."""
        while True:
            path = queue.get()
            read_pkl_2_ndarray(path)
            queue.task_done()

    # put task
    base_dirpath = "/home/kzk/datasets/cifar10/train_pkl"
    filepaths = glob.glob("{}/*".format(base_dirpath))
    for filepath in filepaths:
        queue.put(filepath)

    # create/start thread
    st = time.time()
    num_worker_threads = 3
    for i in range(num_worker_threads):
        t = threading.Thread(target=worker)
        t.daemon = True  # let the process exit while workers block on get()
        t.start()

    queue.join()  # wait until every queued path has been processed
    et = time.time()
    print "total execution time with threading: ", (et - st), "[s]"
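As a side note, the same queue-plus-workers pattern can be sketched in Python 3 with concurrent.futures.ThreadPoolExecutor. This is not what I measured above, only an illustration: `time_threaded_load`, `fake_load`, and the file names are hypothetical, and `fake_load` is a stand-in for cv2.imread or pickle.load so that the sketch runs without the dataset.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def time_threaded_load(load_fn, filepaths, num_workers=3):
    """Apply load_fn to every path on num_workers threads.

    Returns (results, elapsed_seconds).
    """
    start = time.time()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() hands the paths to the worker threads and keeps input order.
        results = list(pool.map(load_fn, filepaths))
    return results, time.time() - start


def fake_load(path):
    """Stand-in loader: sleeps briefly instead of reading a file."""
    time.sleep(0.001)
    return len(path)


# Hypothetical file names; in the real experiment these come from glob.glob.
paths = ["img_{}.jpg".format(i) for i in range(30)]
results, elapsed = time_threaded_load(fake_load, paths)
print("loaded {} items in {:.3f} s".format(len(results), elapsed))
```

Unlike the scripts above, this sketch keeps every result in a list, so with real images it would hold all the decoded arrays in memory at once.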
Results (total execution time in seconds)
cv2.imread | pickle | cv2.imread (disk cached) | pickle (disk cached) |
---|---|---|---|
332.17 | 358.31 | 3.43 | 21.45 |
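In terms of read speed (">" meaning "faster than"), the result was
image (jpeg) 2 ndarray > pickle 2 ndarray
(332.17 s vs. 358.31 s), but this looks like it is within measurement error (I will not pursue it further).
With the disk cache warm, the result was also
image (jpeg) 2 ndarray > pickle 2 ndarray
and here the gap is clear (3.43 s vs. 21.45 s).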
If you download cifar10 from Kaggle and use the train set, you should get similar results.
As for the speedup from multithreading:
441.201759/332.17 =~ 1.328240837522955
About 1.3x, I guess...
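The ratios can be recomputed from the measurements as follows. This is only arithmetic on the numbers above, with variable names of my own. Keep in mind that the single-thread time was measured under kernprof while the 3-thread time was measured with time.time(), so the comparison is approximate.

```python
# Measurements copied from the results above (seconds).
single_thread_sec = 441.201759   # cv2.imread, Experiment 1 (under kernprof)
three_thread_sec = 332.17        # cv2.imread, 3 threads, cold disk cache
cached_jpeg_sec = 3.43           # cv2.imread, disk cached
cached_pickle_sec = 21.45        # pickle, disk cached
num_threads = 3

# Speedup of the 3-thread run over the single-thread run.
speedup = single_thread_sec / three_thread_sec

# Speedup divided by thread count: 1.0 would mean perfect scaling.
speedup_per_thread = speedup / num_threads

# How much slower pickle loading is than jpeg decoding once the disk is out of the picture.
cached_ratio = cached_pickle_sec / cached_jpeg_sec

print("speedup: {:.3f}x".format(speedup))
print("speedup per thread: {:.3f}".format(speedup_per_thread))
print("pickle / jpeg (cached): {:.2f}x".format(cached_ratio))
```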