2017-05-19

How to use pandas (3): Index

pandas library python

Notes on pandas Index objects.

Row and column labels are stored as Index objects.
Rows: DataFrame.index
Columns: DataFrame.columns
　

import numpy as np
import pandas as pd

data = {"year": [2010, 2011, 2012, 2013],
        "pref":["千葉", "山口", "岐阜", "東京"]}

f = pd.DataFrame(data)
print(f)
#   pref  year
# 0   千葉  2010
# 1   山口  2011
# 2   岐阜  2012
# 3   東京  2013

print(f.index)   # RangeIndex(start=0, stop=4, step=1)
print(f.columns) # Index(['pref', 'year'], dtype='object')

　
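An Index object behaves somewhat like a fixed-size set: it is immutable and supports membership tests (unlike a set, it may contain duplicate labels).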

print(0 in f.index) # True
print(4 in f.index) # False

　
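・Methods and attributes of Index objects

append	concatenate another Index, returning a new Index
difference	set difference
intersection	set intersection (common labels)
union	set union
isin	boolean array telling whether each element is contained in the given values
delete	remove the element at position i
drop	remove the given values
insert	insert an element at a position
is_monotonic_increasing	True if each element is greater than or equal to the previous one
is_unique	True if there are no duplicates
unique	return the distinct values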
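The set-like methods in the table can be tried on two small indexes. This is a minimal sketch; the names `a` and `b` and their values are made up for illustration. Every call returns a new object and leaves `a` unchanged.

```python
import pandas as pd

a = pd.Index([0, 1, 2, 3])
b = pd.Index([2, 3, 4])

union = a.union(b)            # labels found in either index
common = a.intersection(b)    # labels found in both
only_a = a.difference(b)      # labels in a but not in b
mask = a.isin([1, 3, 10])     # boolean array, one entry per label of a
without_first = a.delete(0)   # remove by position
without_three = a.drop([3])   # remove by value
longer = a.append(b)          # concatenation; duplicates are kept

print(list(union), list(common), list(only_a))
print(mask, longer.is_unique)
```

Note that `append` does not deduplicate, so `is_unique` becomes False for `longer`.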

print(f.index)   # RangeIndex(start=0, stop=4, step=1)
ind = f.index.insert(2, 10)
print(ind)       # Int64Index([0, 1, 10, 2, 3], dtype='int64')

print(f.index.isin([1, 3, 10])) # [False  True False  True]
f2 = f.reindex(ind)
print(f2)
#   pref    year
# 0    千葉  2010.0
# 1    山口  2011.0
# 10  NaN     NaN
# 2    岐阜  2012.0
# 3    東京  2013.0

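Reindexing: reindex

Rebuilds the object so that it conforms to a new index.

Arguments of reindex

index	new index for the rows
columns	new index for the columns
method	fill method: ffill / pad fill forward, bfill / backfill fill backward
fill_value	value used for the missing entries introduced by reindexing
limit	maximum number of consecutive entries to fill when method is used
level	level to match on for a hierarchical index
copy	if True, return a copy even when the new index equals the old one; the default is True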
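The effect of `method`, `fill_value` and `limit` is easiest to see on a small Series. This is a minimal sketch with made-up values; `method` requires the original index to be sorted, which holds here.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=[0, 2, 4])

# Forward fill: each new label takes the value of the previous existing label.
filled = s.reindex(range(6), method="ffill")

# Constant fill: new labels get fill_value instead of NaN.
const = s.reindex(range(6), fill_value=0.0)

# limit=1: at most one consecutive new label is filled, the rest stay NaN.
limited = s.reindex(range(7), method="ffill", limit=1)

print(filled.tolist())
print(const.tolist())
print(limited.tolist())
```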

2017-05-18

Test data (7): Annual mean temperature in Tokyo

test-data python

Mean temperature in Tokyo (from the Japan Meteorological Agency website)

# coding: utf-8

import pandas as pd

url = 'http://python-remrin.hatenadiary.jp/entry/2017/05/18/142423'
## Get a list of DataFrames, one per table found on the page.
fetched = pd.read_html(url)
print(len(fetched))  # number of DataFrames in the list
df = fetched[0]

# Convert to a 2-D list
data = []
for i in range(len(df)):
    x = []
    for j in range(len(df.iloc[0])):
        x.append(df.iloc[i, j])
    data.append(x)
print(data)

# Print as a Hatena-notation table
for i in range(len(df)):
    s = "|"
    for j in range(len(df.iloc[0])):
        s += str(df.iloc[i, j]) + "|"
    print(s)
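The two nested loops above can also be written more compactly. This is a sketch on a tiny stand-in DataFrame with dummy values (not the real temperature data), so it runs without network access.

```python
import pandas as pd

# Dummy stand-in for the fetched table.
df = pd.DataFrame([["a", 1.0], ["b", 2.0]])

# 2-D list: one inner list per row.
data = df.values.tolist()

# Hatena table notation: cells separated and enclosed by "|".
rows = ["|" + "|".join(str(v) for v in row) + "|" for row in data]

print(data)
print("\n".join(rows))
```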

2017-05-18

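Least squares

python algorithm

Notes on the method of least squares.

Consider fitting a first-degree polynomial by least squares to the three data points (2, 3), (4, 7), (9, 11).
f:id:rare_Remrin:20170518140226p:plain
With this model, the quantity to consider is
f:id:rare_Remrin:20170518140256p:plain
We look for its minimum. Taking partial derivatives, the vector that satisfies
f:id:rare_Remrin:20170518140345p:plain
is the coefficient vector.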
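The equation images are not reproduced in this text. As a reconstruction, assuming the model is y = px + q, the derivation can be written as follows. The symbols p, q, E, A, b and x are chosen here to match the code and are not taken from the images.

```latex
\begin{align}
y &= p x + q \\
E(p, q) &= (2p + q - 3)^2 + (4p + q - 7)^2 + (9p + q - 11)^2
         = \lVert A\mathbf{x} - \mathbf{b} \rVert^2 \\
A &= \begin{pmatrix} 2 & 1 \\ 4 & 1 \\ 9 & 1 \end{pmatrix}, \quad
\mathbf{x} = \begin{pmatrix} p \\ q \end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} 3 \\ 7 \\ 11 \end{pmatrix} \\
\frac{\partial E}{\partial p} = \frac{\partial E}{\partial q} = 0
  &\iff A^{\mathsf T} A \,\mathbf{x} = A^{\mathsf T} \mathbf{b} \\
\mathbf{x} &= (A^{\mathsf T} A)^{-1} A^{\mathsf T} \mathbf{b}
  = \begin{pmatrix} 101 & 15 \\ 15 & 3 \end{pmatrix}^{-1}
    \begin{pmatrix} 133 \\ 21 \end{pmatrix}
  = \begin{pmatrix} 14/13 \\ 21/13 \end{pmatrix}
\end{align}
```

So the fitted line is roughly y = 1.08x + 1.62.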
　

# coding: utf-8

import numpy as np
import matplotlib.pyplot as plt

data = [(2, 3), (4, 7), (9, 11)]
a = np.matrix(data)
b = a[:, 1].copy()
a[:, 1] = 1

# Solve for the coefficient vector
x = ((a.T * a)**-1) * a.T * b
x= np.array(x)

# Plot with matplotlib
datax, datay = np.split(np.array(data), 2, axis=1)
maxx, maxy = max(datax), max(datay)
plt.xlim(0, maxx*1.1)
plt.ylim(0, maxy*1.1)
plt.scatter(datax, datay)
plt.plot([0, maxx*1.1], [x[1], maxx*x[0]*1.1 + x[1]], color="r")
plt.text(1, maxy, "y={:.2f}x+{:.2f}".format(*x[:, 0]), fontsize=14)
plt.show()
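As a cross-check of the coefficients, the same fit can be done with plain ndarrays. This is a sketch, not the code used for the plot: it solves the normal equations with `np.linalg.solve` instead of forming an explicit inverse, and compares the result with `np.linalg.lstsq`.

```python
import numpy as np

data = np.array([(2, 3), (4, 7), (9, 11)], dtype=float)

# Design matrix: one column for x, one column of ones for the intercept.
A = np.column_stack([data[:, 0], np.ones(len(data))])
b = data[:, 1]

# Normal equations: (A^T A) x = A^T b
coef_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares solver for comparison.
coef_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(coef_normal)  # slope, intercept
print(coef_lstsq)
```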

　
f:id:rare_Remrin:20170518140520p:plain
　
I also wrote a program that can fit polynomials of degree 2 or higher by least squares.

# coding: utf-8

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Fetch the mean temperature data for Tokyo over the past 140 years
url = 'http://python-remrin.hatenadiary.jp/entry/2017/05/18/142423'
## Get a list of DataFrames. With header=0 the first row is treated as the header.
fetched = pd.read_html(url, header=0)
df = fetched[0]

# Convert to a 2-D list
data = []
for i in range(len(df)):
    x = []
    for j in range(len(df.iloc[0])):
        x.append(df.iloc[i, j])
    data.append(x)

data = np.array(data)
xlabel = data[0:, 0]
datasize = data.shape[0]
datax = np.arange(datasize)
datay = np.array(data[:, -1], dtype=float)

# Build the matrix for a degree-n polynomial fit
n = 2
a = np.empty((n + 1, datasize), dtype=float)  # float avoids integer overflow in the high powers
for i in range(n + 1):
    a[i] = datax**(n - i)
a = np.matrix(a.T)
b = np.matrix(datay).T

## Solve for the coefficient vector
x = ((a.T * a)**-1) * a.T * b
x= np.array(x)

def f(a):
    fa = 0
    for i in range(n + 1):
        fa += x[i] * a**(n-i)   
    return fa

def textf():
    txt = "y="
    for i in range(n):
        txt +="{:+.3f}x^{}".format(x[i][0], n-i)
    txt += "{:+.1f}".format(x[n][0])
    return txt

print(x)
print(textf())

# Plot with matplotlib
maxx, maxy = max(datax), max(datay)
plt.xlim(0, maxx*1)
plt.ylim(10, maxy*1.1)
plt.scatter(datax, datay)
plt.plot(datax, f(datax), color="r")                # fitted curve or line
plt.text(1, maxy, textf(), fontsize=14)             # fitted formula
plt.xticks(np.arange(datasize)[::20], xlabel[::20]) # one x-axis tick every 20 rows
plt.title("mean temperature in Tokyo")
plt.show()

　
Least-squares fit of Tokyo's mean temperature over the past 140 years.
Least squares with a first-degree polynomial
　
f:id:rare_Remrin:20170518163251p:plain

Least squares with a second-degree polynomial
　
f:id:rare_Remrin:20170518163301p:plain
　
However, it does not work well for degree 3 or higher, so I am still investigating the cause.
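One possible cause, not confirmed for the program above, is that the normal-equation matrix becomes badly conditioned when powers of x = 0 to 139 are used directly. The sketch below uses synthetic data (the coefficients in `true_coef` are made up, not the real temperatures). It compares the condition numbers with and without rescaling x, and fits a cubic with `np.polyfit` on the rescaled variable.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(140, dtype=float)
true_coef = [2e-6, -1e-4, 0.02, 13.5]   # made-up cubic, highest power first
y = np.polyval(true_coef, x) + rng.normal(0, 0.3, x.size)

# Rescale x to zero mean and unit variance before taking powers.
t = (x - x.mean()) / x.std()

A_raw = np.vander(x, 4)      # columns: x^3, x^2, x, 1
A_scaled = np.vander(t, 4)   # same powers of the rescaled variable
cond_raw = np.linalg.cond(A_raw.T @ A_raw)
cond_scaled = np.linalg.cond(A_scaled.T @ A_scaled)

# Cubic fit on the rescaled variable.
coef_scaled = np.polyfit(t, y, 3)
fitted = np.polyval(coef_scaled, t)
rms = np.sqrt(np.mean((fitted - y) ** 2))

print(cond_raw, cond_scaled, rms)
```

The coefficients in `coef_scaled` refer to t, not to x, so the curve has to be evaluated with `np.polyval(coef_scaled, t)`.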
　
Reference:
正規方程式の導出と計算例 | 高校数学の美しい物語 (derivation of the normal equations with a worked example)

2017-05-17

How to use NumPy (18): Dimension operations

numpy library python

Notes on operations that change the number of dimensions in NumPy.
　

Effect on ndim	kept	reduced	increased
reshape	○	○	○
resize	○	○	○
flatten		○	×
ravel		○	×
any, all	×	○	×
sort	○	△	×	1-D if axis=None
slicing	○	×	×
integer indexing	×	○	×
fancy indexing	○	○	×
take	○	△	×	1-D if axis is omitted
mean, sum	×	○	×
min, max	×	○	×
concatenate	○	×	×
split	○	×	×
repeat	○	△	×	1-D if axis is omitted
tile	○	×	○
expand_dims	×	×	○
broadcasting	○	×	○
iteration	×	○	×	ndim decreases by 1
compress	○	△	×	1-D if axis is omitted
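A few rows of the table can be checked directly by looking at `ndim` and `shape`. This is a small sketch on a made-up 2x3 array.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)       # ndim == 2

flat = a.ravel()                     # reduced to 1-D
col_sum = a.sum(axis=0)              # a reduction removes the given axis
row = a[0]                           # integer indexing drops one dimension
sliced = a[:1]                       # slicing keeps ndim
taken_flat = np.take(a, [0, 2])      # axis omitted: works on the flattened array
taken_axis = np.take(a, [0, 2], axis=1)
repeated = np.repeat(a, 2)           # axis omitted: flattened result
tiled = np.tile(a, (2, 1, 1))        # can add a dimension

for name, arr in [("ravel", flat), ("sum", col_sum), ("a[0]", row),
                  ("a[:1]", sliced), ("take", taken_flat),
                  ("take axis=1", taken_axis), ("repeat", repeated),
                  ("tile", tiled)]:
    print(name, arr.ndim, arr.shape)
```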

　
・np.expand_dims()
　Adds a dimension at the specified axis.

import numpy as np

a1 = np.array([0, 1, 2])
a2 = np.expand_dims(a1, axis=0)
print(a2)
# [[0 1 2]]

# The following is equivalent
a3 = a1[np.newaxis, :]
print(a3)
# [[0 1 2]]

a4 =np.expand_dims(a1, axis=1)
print(a4)
# [[0]
#  [1]
#  [2]]

# The following is equivalent
a5 = a1[:, np.newaxis]
print(a5)
#[[0]
# [1]
# [2]]

2017-05-16

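How to use NumPy (17): Structured arrays

numpy python library

Notes on NumPy structured arrays.

By defining your own dtype, you can combine fields of different types in one array.
Elements can be accessed by field name.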

import numpy as np

dtype = [("x", int), ("y", float)]
a1 = np.array([(1, 2), (3, 4), (5, 6)], dtype=dtype)

print(a1)      # [(1, 2.0) (3, 4.0) (5, 6.0)]
print(a1["x"]) # [1 3 5]
print(a1["y"]) # [ 2.  4.  6.]

　
A field can also be given a size, making it a fixed-length sub-array.

dtype = [("x", int, 3), ("y", float)]

a2= np.array(([1, 2, 3], 4), dtype=dtype)
print(a2["y"]) # 4.0

a3 = np.zeros(5, dtype=dtype)
print(a3["x"])
# [[0 0 0]
#  [0 0 0]
#  [0 0 0]
#  [0 0 0]
#  [0 0 0]]

print(a3["y"]) # [ 0.  0.  0.  0.  0.]

　
Fields can also be nested.

dtype = [("x", [("y", float), ("z", int)]), ("a", int)]
a4 = np.array([((1, 2), 3), ((4, 5), 6)], dtype=dtype)
print(a4["x"])      # [(1.0, 2) (4.0, 5)]
print(a4["x"]["z"]) # [2 5]
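Field access combines naturally with boolean masks and with sorting by field. This is a minimal sketch; the field names `name` and `score` and the values are made up for illustration.

```python
import numpy as np

dtype = [("name", "U10"), ("score", float)]
people = np.array([("a", 72.5), ("b", 90.0), ("c", 65.0)], dtype=dtype)

# Boolean mask built from one field selects whole records.
high = people[people["score"] > 70]

# Sort the records by the value of one field.
ranked = np.sort(people, order="score")

print(high["name"])
print(ranked["name"])
```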

2017-05-16

Mahjong (1)

mahjong python

I don't know how many years it will take, but as programming practice I'm going to try building a mahjong program.
　

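m	manzu (萬子, characters)	0-8
p	pinzu (筒子, dots)	9-17
s	souzu (索子, bamboo)	18-26
h	honor tiles (字牌)	東 (East): 27 to 中 (red dragon): 33

pair	pair (対子)
triplet	concealed triplet (暗刻)
Triplet	open triplet (明刻)
seq	sequence (順子)
kan	concealed kan (暗槓)
Kan	open kan (明槓)

prevailing_wind	round wind (場風)
seat_wind	seat wind (自風)
dealer	dealer (親)

wall_seq	sequence of the 136 tiles after shuffling
hand	the hand: a 1-D ndarray of size 34 holding the count of each tile kind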
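With this representation, building a shuffled wall and dealing a starting hand takes only a few lines. This is a sketch under assumptions: the fixed seed and the idea of dealing the first 13 tiles of the wall are illustrative choices, not part of the design above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only to make the sketch repeatable

# 34 tile kinds (indices 0-33), four copies each: 136 tiles in random order.
wall_seq = rng.permutation(np.repeat(np.arange(34), 4))

# Deal 13 tiles and count how many of each kind the hand holds.
hand = np.bincount(wall_seq[:13], minlength=34)

print(wall_seq[:13])
print(hand)
```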

　
Converting a string into a hand array.

import numpy as np

ji_dic2= dict((s, i + 27) for i, s in enumerate("東南西北白發中"))

def str_to_hand(string):
    hand = np.zeros(34, dtype=int)
    suit = np.zeros(9, dtype=int)
    while string:
        if string[0] in "123456789":
            suit[int(string[0]) - 1] +=1
        elif string[0] in "mps":
            mps = (string[0]=="p")*9 + (string[0]=="s")*18
            hand[mps:mps+9] += suit
            suit.fill(0)
        else:
            hand[ji_dic2[string[0]]] += 1
        string = string[1:]
    return hand

data = ["123m123p123789s中中"]
data.append("1133557799m1133p")
data.append("19m19p19s東南西北白發中")
data.append("123m123p123789s白中")
data.append("111222333999m11s")
data.append("112233777999m11p")
data.append("白1m1s白白1m1m222m11s中中")
for i, hand in enumerate(data):
    print(i, str_to_hand(hand))
        
# 0 [1 1 1 ..., 0 0 2]
# 1 [2 0 2 ..., 0 0 0]
# 2 [1 0 0 ..., 1 1 1]
# 3 [1 1 1 ..., 1 0 1]
# 4 [3 3 3 ..., 0 0 0]
# 5 [2 2 2 ..., 0 0 0]
# 6 [3 3 0 ..., 3 0 2]

　
Written as a class:

import numpy as np

ji_dic2= dict((s, i + 27) for i, s in enumerate("東南西北白發中"))

class Hand():
    def __init__(self, hand):
        if isinstance(hand, str):
            hand = self.str_to_hand(hand)
        self.hand = hand
        
    def str_to_hand(self, string):
        hand = np.zeros(34, dtype=int)
        suit = np.zeros(9, dtype=int)
        string = string.replace(" ", "").replace("[", "").replace("]", "")
        while string:
            if string[0] in "123456789":
                suit[int(string[0]) - 1] +=1
            elif string[0] in "mps":
                mps = (string[0]=="p")*9 + (string[0]=="s")*18
                hand[mps:mps+9] += suit
                suit.fill(0)
            elif string[0] in "東南西北白發中":
                hand[ji_dic2[string[0]]] += 1
            string = string[1:]
        return hand

    def hand_to_str(self):
        pass
               
        
if __name__ == "__main__":
    data = ["123m123p123789s中中"]
    data.append("1133557799m1133p")
    data.append("19m19p19s東南西北白發中中")
    data.append("123m123p123789s白中")
    data.append("111222333999m11s")
    data.append("112233777999m11p")
    data.append("白1m1s白白1m1m222m11s 中中")
    for i, hand in enumerate(data):
        h = Hand(hand).hand
        print(i, h)
# 0 [1 1 1 ..., 0 0 2]
# 1 [2 0 2 ..., 0 0 0]
# 2 [1 0 0 ..., 1 1 2]
# 3 [1 1 1 ..., 1 0 1]
# 4 [3 3 3 ..., 0 0 0]
# 5 [2 2 2 ..., 0 0 0]
# 6 [3 3 0 ..., 3 0 2]

　
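Flow from yaku detection to score calculation (tentative)
・Yaku that can be judged from the shape of the hand alone
　・Is it thirteen orphans (kokushi musou)?
　・Is it seven pairs (chiitoitsu)?
　・Otherwise: does it form one pair and four melds?
　　　・First branch on the choice of pair
　　　・Then branch on the triplets
　　　・Check whether the rest forms sequences
・Yaku judged from the shape plus the state of the hand (for example, whether it is concealed)
・Wind tiles
・Yaku judged from the game state (tenhou, chiihou, haitei, rinshan, riichi, ippatsu, tsumo, etc.)
・Add dora
・Score calculation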

# coding: utf-8

import numpy as np
from itertools import combinations

ji_dic = dict((i + 27, s) for i, s in enumerate("東南西北白發中"))
ji_dic2= dict((s, i + 27) for i, s in enumerate("東南西北白發中"))
yaku_dic = {0:"7 pairs", 1:"13 orphans", 2:"Concealed four pongs"}

def hand_to_str(hand):
    string = str(np.repeat(np.arange(1, 10), hand[:9]))
    string += "m " * any(hand[:9])    
    string += str(np.repeat(np.arange(1, 10), hand[9:18]))
    string += "p " * any(hand[9:18])    
    string += str(np.repeat(np.arange(1, 10), hand[18:27]))
    string += "s " * any(hand[18:27])
    for i in range(27, 34):
        string += ji_dic[i] * hand[i]
    return string

def is_winhand(hand):
    # seven pairs
    seven_pairs = sum(hand == 2) == 7
    yaku = "seven_pairs" if seven_pairs else ""
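    # 13 orphans: every terminal and honor is present and the hand holds no other tiles
    orphans = [0, 8, 9, 17, 18, 26, 27, 28, 29, 30, 31, 32, 33]
    if np.all(hand[orphans] > 0) and hand[orphans].sum() == hand.sum():
        return "13 orphans"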
    else:
        result = melding(hand)
    return yaku if yaku else result if result else "not a winning hand"

# parsing
def melding(tiles):
    # find sequences
    def find_seq(tiles, melds):
        while sum(tiles):
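            where = np.where(tiles > 0)[0] # indices of the tile kinds still present
            # three consecutive indices inside one suit (no honors, no run across suits)
            if (len(where) > 2 and where[0] + 2 == where[1] + 1 == where[2]
                    and where[0] < 27 and where[0] % 9 <= 6):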
                melds.append((where[0], "seq"))
                tiles[where[0]] -= 1
                tiles[where[1]] -= 1
                tiles[where[2]] -= 1
            else:
                melds = []
                break
        return melds 
    
    # find triplets
    def find_triplet(tiles, pair):
        items = np.where(tiles > 2)[0]
        triplets = []
        # enumerate all subsets of the possible triplets
        for i in range(len(items) + 1):
            for c in combinations(items, i):
                triplets.append(c)
        
        # for each subset of triplets, try to split the rest into sequences
        result=[]
        for triplet in triplets:
            melds = [pair]
            tile = tiles.copy()
            for t in triplet:
                melds.append((t, "triplet"))
                tile[t] -= 3
            seq = find_seq(tile, melds)
            if seq:
                result.append(seq)
        return result

    # find the pair
    m = []
    for i in np.where(tiles > 1)[0]:
        hand = tiles.copy()
        meld = (i, "pair")
        hand[i] -= 2
        meld = find_triplet(hand, meld)
        if meld:
            m.append(meld)
    return m if m else ""

class Hand():
    def __init__(self, hand):
        # if the hand is given as a string rather than an array, convert it
        if isinstance(hand, str):
            hand = self.str_to_hand(hand)
        self.hand = hand
    
    # convert the string input to a NumPy array
    def str_to_hand(self, string):
        hand = np.zeros(34, dtype=int)
        suit = np.zeros(9, dtype=int)
        string = string.replace(" ", "").replace("[", "").replace("]", "")
        while string:
            if string[0] in "123456789":
                suit[int(string[0]) - 1] +=1
            elif string[0] in "mps":
                mps = (string[0]=="p")*9 + (string[0]=="s")*18
                hand[mps:mps+9] += suit
                suit.fill(0)
            elif string[0] in "東南西北白發中":
                hand[ji_dic2[string[0]]] += 1
            string = string[1:]
        return hand               
        
if __name__ == "__main__":   
    data = ["123m123p123789s中中"]
    data.append("1133557799m1133p")        # 7 pairs
    data.append("19m19p19s東南西北白發中中") # 13 orphans
    data.append("123m123p123789s白中")      # not a winning hand
    data.append("111222333999m11s")
    data.append("112233777999m11p")
    data.append("白1m1s白白1m1m222m11s 中中")
    for i, hand in enumerate(data):
        h = Hand(hand).hand
        print(i, is_winhand(h))

        
# 0 [[[(33, 'pair'), (0, 'seq'), (9, 'seq'), (18, 'seq'), (24, 'seq')]]]
# 1 seven_pairs
# 2 13 orphans
# 3 not a winning hand
# 4 [[[(18, 'pair'), (8, 'triplet'), (0, 'seq'), (0, 'seq'), (0, 'seq')], [(18, 'pair'), (0, 'triplet'), (1, 'triplet'), (2, 'triplet'), (8, 'triplet')]]]
# 5 [[[(9, 'pair'), (6, 'triplet'), (8, 'triplet'), (0, 'seq'), (0, 'seq')]]]
# 6 [[[(33, 'pair'), (0, 'triplet'), (1, 'triplet'), (18, 'triplet'), (31, 'triplet')]]]

2017-05-16

How to use NumPy (16): matrix

numpy python library

Notes on NumPy's matrix class.
　

class	kind	dimensions
ndarray	array	any number of dimensions
matrix	matrix	fixed at 2 dimensions

　
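With ndarray, the matrix product is computed by calling dot(),
whereas with matrix the operators * and ** express the matrix product and matrix power.
The rest of the usage seems to be the same as for ndarray.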
　

import numpy as np

a1 = np.array([[1, 2], [3, 4]])
a2 = np.array([[1, 1], [1, 1]])

print(a1 * a2)    # element-wise product
# [[1 2]
#  [3 4]]

print(a1.dot(a2)) # matrix product
# [[3 3]
#  [7 7]]

m1 = np.matrix([[1, 2], [3, 4]])
m2 = np.matrix([[1, 1], [1, 1]])

print(m1 * m2)
# [[3 3]
#  [7 7]]

print(m1 ** -1)  # inverse matrix (1)
# [[-2.   1. ]
#  [ 1.5 -0.5]]

print(m1.I)      # inverse matrix (2)
# [[-2.   1. ]
#  [ 1.5 -0.5]]

m3 = np.matrix([[1, 3], [-2, -4]])
P = (np.linalg.eig(m3))[1]         # diagonalizing matrix built from the normalized eigenvectors
print(P)
# [[ 0.83205029 -0.70710678]
#  [-0.5547002   0.70710678]]

PiAP = P.I * m3 * P  # the diagonalized matrix
PiAP = PiAP.round(3) # rounded for readability
print(PiAP)
# [[-1.  0.]
#  [-0. -2.]]

print((PiAP)**2)
# [[ 1.  0.]
#  [ 0.  4.]]

print((PiAP)**3)
# [[-1.  0.]
#  [-0. -8.]]

# The same product written with ndarray
a3 = np.array(PiAP)
print(a3.dot(a3).dot(a3).dot(a3))
# [[  1.   0.]
#  [  0.  16.]]

　
In a program that uses matrix products heavily, matrix seems to make the code easier to read.
　

operation	ndarray	matrix
*	element-wise product	matrix product
**	element-wise power	matrix power
**-1	element-wise reciprocal	inverse matrix
I attribute	none	inverse matrix
H attribute	none	conjugate transpose
　
Because matrix is fixed at 2 dimensions, some dimension-reducing operations behave differently.
mean() with an axis specified still returns a 2-D result (ndim=2).

m1 = np.matrix([5, 1])
print(m1)         # [[5 1]]
print(m1.shape)   # (1, 2)
print(m1.mean(1)) # [[ 3.]]      # still 2-D
print(m1.mean(0)) # [[ 5.  1.]]
print(m1.mean())  # 3.0

　
Unless you have a particular reason to use it, it seems ndarray alone can do everything that is needed, without matrix.
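For comparison, here is a sketch of the same operations using only ndarray. It relies on the `@` operator (Python 3.5+), `np.linalg.inv` and `np.linalg.matrix_power`; `keepdims=True` reproduces the 2-D result that matrix gives for mean().

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
ones = np.array([[1, 1], [1, 1]])

prod = a @ ones                          # matrix product, same as a.dot(ones)
inv = np.linalg.inv(a)                   # inverse, like m1.I or m1 ** -1
cube = np.linalg.matrix_power(a, 3)      # matrix power, like m1 ** 3
row_mean = a.mean(axis=1, keepdims=True) # stays 2-D, shape (2, 1)

print(prod)
print(inv)
print(cube)
print(row_mean.shape)
```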

Remrinのpython攻略日記

I've started learning Python 3. Assorted notes and sample code about Python 3.
