💡 WIDA/DACON Classification-Regression

[DACON/김세연] EDA with Python

Unknown user 2023. 4. 7. 14:08
#1 import pandas as pd
#2 import numpy as np
#3 import matplotlib.pyplot as plt
#4 import seaborn as sns

#5 df=pd.read_csv("C:/Users/lucy8/PycharmProjects/test2/DSOB/train.csv")
#6 print(df.head(3))

#7 print(df.shape)
#8 print(df.isnull().sum())
#9 print(df.info())

Output of line 6

Shows the first 3 rows of the data loaded into df, to get a sense of how the data is structured.

 

 

Output of lines 7, 8, and 9

Left (lines 7 and 8): shows the size of the data as (rows, columns) and the number of null values in each column. Any null values must be handled, for example by filling them with the mean or dropping them.

Right (line 9): shows the data type of each column.
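
As a rough illustration of the two null-handling options mentioned above (mean imputation and deletion), here is a minimal sketch. It uses a small made-up DataFrame rather than train.csv, and the column names `mag_a` and `mag_b` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data standing in for train.csv; the column names are made up.
toy = pd.DataFrame({
    "mag_a": [1.0, np.nan, 3.0, 4.0],
    "mag_b": [10.0, 20.0, np.nan, 40.0],
})

# Count missing values per column, as df.isnull().sum() does in line 8.
null_counts = toy.isnull().sum()
print(null_counts)

# Option 1: replace each missing value with the mean of its own column.
filled = toy.fillna(toy.mean())
print(filled)

# Option 2: drop every row that contains at least one missing value.
dropped = toy.dropna()
print(dropped)
```

Filling keeps every row but pulls the column toward its mean, while dropping keeps only complete rows. Which one is appropriate depends on how many values are missing.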

 

#10 num_type = df['type'].unique()
#11 print(num_type)
#12 print(len(num_type))

#13 fiberid_type = df['fiberID'].unique()
#14 print(fiberid_type)
#15 print(len(fiberid_type))

#16 numerical_columns=['psfMag_u','psfMag_g','psfMag_r','psfMag_i','psfMag_z','fiberMag_u','fiberMag_g','fiberMag_r','fiberMag_i','fiberMag_z','petroMag_u','petroMag_g','petroMag_r','petroMag_i','petroMag_z','modelMag_u','modelMag_g']

#17 corr = df[numerical_columns].corr(method = 'pearson')
print(corr)

#18 fig = plt.figure(figsize = (12, 8))
ax = fig.gca()

#19 sns.set(font_scale = 1.5)  # set the font size inside the heatmap
heatmap = sns.heatmap(corr.values, annot = True, fmt='.2f', annot_kws={'size':15},
                      yticklabels = numerical_columns, xticklabels = numerical_columns, ax=ax, cmap = "RdYlBu")
plt.tight_layout()

plt.show()

#20 plt.boxplot(df['fiberMag_u'])
plt.show()

#21 plt.boxplot(df['petroMag_u'])
plt.show()

#22 numerical_columns=['psfMag_u','psfMag_g','psfMag_r','psfMag_i','psfMag_z','fiberMag_u','fiberMag_g','fiberMag_r','fiberMag_i','fiberMag_z','petroMag_u','petroMag_g','petroMag_r','petroMag_i','petroMag_z','modelMag_u','modelMag_g']

fig =plt.figure(figsize = (20, 20))
ax = fig.gca()

df[numerical_columns].hist(ax=ax)
plt.show()

Output of line 17

The meaningful columns are collected into a list, and their pairwise Pearson correlations are printed as a matrix of values from -1 to 1.

Output of line 19

Heatmap visualizing the correlations
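
To make the -1 to 1 scale concrete, here is a small sketch of `DataFrame.corr(method="pearson")` on made-up data. The columns `a`, `b`, `c`, and `d` are hypothetical and are not part of the DACON dataset.

```python
import pandas as pd

# Toy columns (hypothetical names) with known relationships to "a".
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],  # b = 2 * a, rises exactly with a
    "c": [5.0, 4.0, 3.0, 2.0, 1.0],   # c = 6 - a, falls exactly as a rises
    "d": [2.0, 1.0, 4.0, 3.0, 5.0],   # tends to rise with a, but not exactly
})

# Same call as line 17, applied to the toy columns.
corr = toy.corr(method="pearson")
print(corr.round(2))
```

A coefficient near 1 means two columns rise together, near -1 means one falls as the other rises, and near 0 means there is little linear relationship. In the heatmap these values are what the colors encode.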

 

Box plots of the columns that show red and blue values in the heatmap

 

Output of line 20
Output of line 21
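
As an aid to reading box plots like these: by default, `plt.boxplot` draws the box from the first to the third quartile, extends the whiskers to the most extreme points within 1.5 times the interquartile range (IQR) of the box, and plots anything beyond that as individual points. The sketch below computes those numbers directly. `boxplot_summary` is a hypothetical helper written for this illustration, and the data is made up.

```python
import pandas as pd

def boxplot_summary(values: pd.Series, whis: float = 1.5) -> dict:
    """Hypothetical helper: the numbers behind a box plot of one column."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    # Points outside these fences are drawn as individual outlier markers.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "n_outliers": len(outliers),
    }

# Made-up values with one obvious outlier.
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])
summary = boxplot_summary(toy)
print(summary)
```

Calling `boxplot_summary(df['fiberMag_u'])` on the real data would give the same kind of summary for the column plotted in line 20.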

 

Output of line 22