2019-01-14

Paired t-test

Results from studying for a 10-point test. The two samples have the same size and are paired, so I run a paired t-test.

from scipy import stats
b = [9, 8, 10, 7, 5, 9, 10, 10, 8, 10, 10, 6, 8, 9, 10, 9, 10, 9]
a = [9, 9, 10, 7, 6, 10, 10, 9, 8, 10, 7, 8, 10, 10, 10, 10, 10, 10]
print(len(a), len(b))  # the lengths must match for a paired test
print(stats.ttest_rel(b, a))

Ttest_relResult(statistic=-1.243163121016122, pvalue=0.23068052813068232)

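If there were truly no difference, a result at least this extreme would turn up about 23% of the time. The null hypothesis cannot be rejected at the 5% significance level.

So the "no difference" hypothesis stands (which is not the same as proving there is no difference).

No detectable difference (effect) from studying. Sad.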
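To see where the statistic comes from, here is a minimal sketch that recomputes it by hand using only the standard library. It assumes the usual paired t formula (mean of the pairwise differences divided by their standard error); the variable names are my own.

```python
import math
import statistics

b = [9, 8, 10, 7, 5, 9, 10, 10, 8, 10, 10, 6, 8, 9, 10, 9, 10, 9]
a = [9, 9, 10, 7, 6, 10, 10, 9, 8, 10, 7, 8, 10, 10, 10, 10, 10, 10]

# A paired test works on the per-pair differences, not on the two samples separately.
diffs = [x - y for x, y in zip(b, a)]
n = len(diffs)

# t = mean difference / (sample standard deviation of the differences / sqrt(n))
t_value = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
print(t_value)
```

The value should agree with the statistic reported by ttest_rel above.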

2019-01-14

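Confidence interval

From this sample, find the interval that should contain the population mean at the 95% confidence level. That is the confidence interval.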

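import numpy as np
from scipy import stats
x = [45, 39, 42, 57, 28, 33, 40, 52]
se = stats.sem(x)  # standard error of the mean
# 0.95 is passed positionally because its keyword name differs across SciPy versions (alpha, later confidence)
stats.t.interval(0.95, df=len(x) - 1, loc=np.mean(x), scale=se)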

(34.106667079288194, 49.893332920711806)

The result comes back as a tuple: (lower bound, upper bound).
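The interval is just mean ± t × standard error. A rough sketch with the standard library only. The critical value 2.364624 is an assumption on my part: it is the two-sided 95% value for 7 degrees of freedom as read from a t-table, hard-coded because the standard library has no t distribution.

```python
import math
import statistics

x = [45, 39, 42, 57, 28, 33, 40, 52]
n = len(x)

mean = statistics.mean(x)
se = statistics.stdev(x) / math.sqrt(n)  # standard error of the mean

# Two-sided 95% critical value of t with n - 1 = 7 degrees of freedom (t-table value).
t_crit = 2.364624

lower = mean - t_crit * se
upper = mean + t_crit * se
print(lower, upper)
```

Up to rounding of the table value, this should reproduce the tuple above.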

2019-01-14

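Unpaired t-test

$t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$

Here $\bar{x}$ is the sample mean, $s^2$ the unbiased variance, and $n$ the sample size of each group.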

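import numpy as np
from scipy import stats
b = [7, 8, 10, 5, 8, 7, 9, 5, 6, 9, 10, 6, 7, 8, 7, 9, 10, 10]
a = [9, 9, 6, 10, 9, 8, 10, 7, 9, 10, 6, 8, 9, 9, 10, 7, 8, 8, 10, 9]
# difference of the means over the combined standard error (ddof=1 gives the unbiased variance)
t_value = (np.mean(a) - np.mean(b)) / np.sqrt(np.var(b, ddof=1) / len(b) + np.var(a, ddof=1) / len(a))
t_value

1.463060101595841

As a one-liner:

t_value, p_value = stats.ttest_ind(a, b, equal_var=False)
t_value

1.463060101595841

stats.ttest_ind(a, b, equal_var=False)

on its own also shows the p-value.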

Ttest_indResult(statistic=1.463060101595841, pvalue=0.15335375977552246)

The unpaired t-test works fine even when the sample sizes differ.
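With equal_var=False this is Welch's t-test, and the p-value is taken from a t distribution whose degrees of freedom are estimated from the data. A small sketch of that estimate, assuming the standard Welch-Satterthwaite formula; the variable names are mine, not SciPy's.

```python
import statistics

b = [7, 8, 10, 5, 8, 7, 9, 5, 6, 9, 10, 6, 7, 8, 7, 9, 10, 10]
a = [9, 9, 6, 10, 9, 8, 10, 7, 9, 10, 6, 8, 9, 9, 10, 7, 8, 8, 10, 9]

# Per-group contribution to the squared standard error: unbiased variance / sample size.
va = statistics.variance(a) / len(a)
vb = statistics.variance(b) / len(b)

# Welch-Satterthwaite approximation of the degrees of freedom.
dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
print(dof)
```

The result is generally not an integer, and it lies between the smaller group's n - 1 and the pooled n_A + n_B - 2.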

2018-08-26

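Loaded Kansai Electric Power usage data into a DataFrame

Plotting straight from pandas is nice and easy. Kansai Electric Power lets you download your power consumption as a CSV file, so I tried read_csv on it as pandas practice.

It is not that the electricity bill goes up when the temperature is high. The EcoCute (heat-pump water heater) is expensive to run in winter, so the bill came out negatively correlated with temperature.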
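A rough sketch of that workflow. Everything about the data here is hypothetical: the column names and the numbers are made up for illustration and are not the real Kansai Electric Power CSV layout or my actual usage.

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the downloaded CSV; the real file's columns will differ.
csv_text = """month,mean_temp_c,usage_kwh
2018-01,5,620
2018-03,8,580
2018-05,15,410
2018-07,22,300
2018-08,28,280
"""

usage = pd.read_csv(StringIO(csv_text))

# Pearson correlation between temperature and consumption.
r = usage["mean_temp_c"].corr(usage["usage_kwh"])
print(r)

# usage.plot.scatter(x="mean_temp_c", y="usage_kwh") would draw it (needs matplotlib).
```

With real data, pass the downloaded file's path to read_csv instead of the StringIO object. A negative r corresponds to the "more usage when it is cold" pattern described above.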

2018-08-12

json.dump()

import json
from io import StringIO
io = StringIO()
json.dump(['streaming API'], io)  # dump() writes to a file-like object
io.getvalue()

'["streaming API"]'
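For contrast, a short sketch of the neighbouring functions: json.dumps returns the string directly instead of writing to a file-like object, and json.load reads one back. The variable names are just for illustration.

```python
import json
from io import StringIO

data = ['streaming API']

# dumps() (with an "s") returns the JSON text as a str.
text = json.dumps(data)

# load() is the counterpart of dump(): it parses JSON from a file-like object.
buf = StringIO(text)
restored = json.load(buf)

print(text)
print(restored)
```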

2018-08-12

rjust(), ljust(), zfill() and friends

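In [49]: s = 'Hello, world'
    ...: print(str(s))
    ...: print(repr(s))  # produces a representation the interpreter can read back
    ...: print(s.rjust(30))  # right-justify within 30 characters
    ...: print(s.ljust(30))  # left-justify within 30 characters
    ...: print(s.zfill(30))  # pad on the left with zeros up to 30 characters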
    ...:
    ...:
Hello, world
'Hello, world'
                  Hello, world
Hello, world
000000000000000000Hello, world
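The same alignment can also be written with format specs, and zfill has one special behaviour with signs. A small sketch:

```python
s = 'Hello, world'

# Format-spec equivalents of rjust() and ljust().
right = f"{s:>30}"
left = f"{s:<30}"

# rjust() also accepts a fill character.
starred = s.rjust(30, '*')

# zfill() inserts the zeros after a leading sign, which rjust(width, '0') does not do.
signed = '-42'.zfill(6)

print(right)
print(left)
print(starred)
print(signed)
```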

2018-08-03

What np.where does

import numpy as np
i = np.min(np.where(y > 0.5))

What does this mean? Let's look at where:

In [1]: import numpy as np
In [2]: a = np.arange(8).reshape((2, 4))
In [3]: np.where(a > 2) 

Out[3]: (array([0, 1, 1, 1, 1], dtype=int64), array([3, 0, 1, 2, 3], dtype=int64))

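I didn't get it at first, but this is returning indices. Pair up the two arrays element by element: the first elements give (0, 3), the second give (1, 0), and so on.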

In [5]: a >2
Out[5]:
array([[False, False, False,  True],
       [ True,  True,  True,  True]])

Five elements are greater than 2, at (0, 3), (1, 0), (1, 1), (1, 2), and (1, 3).
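Putting it together, a minimal sketch that pairs the indices explicitly and then returns to the original one-liner. The values in y are made up for illustration. For a 1-D array, np.min(np.where(y > 0.5)) gives the first index whose value exceeds 0.5.

```python
import numpy as np

a = np.arange(8).reshape((2, 4))

# np.where returns one index array per axis; zip them to get (row, col) pairs.
rows, cols = np.where(a > 2)
pairs = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(pairs)

# 1-D case, as in the original snippet (y is a made-up example array).
y = np.array([0.1, 0.3, 0.6, 0.9, 0.4])
i = np.min(np.where(y > 0.5))  # smallest index where y > 0.5
print(i)
```

Note that np.min raises an error on an empty result, so this form assumes at least one element passes the condition.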