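1. Loading data
- df = pd.read_<extension>(source, sep, encoding): choose the reader that matches the file type, e.g. pd.read_csv for .csv files
- sep: the field delimiter; use '\t' for tab-separated files
- encoding: the file's character encoding, e.g. "euc-kr" for older Korean-language files, or "utf-8" (the default)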
import pandas as pd
DataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/lol.csv'
df = pd.read_csv(DataUrl, sep='\t')  # lol.csv is tab-separated, so split fields on '\t'
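The URL above needs network access, so here is a minimal offline sketch of how sep and encoding work together. The sample rows and the Korean column name 승자 ("winner") are made up for illustration; the bytes are deliberately encoded as euc-kr so that the encoding argument is actually needed.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample with a Korean header, encoded as euc-kr bytes
raw = "gameId\t승자\n1\t1\n2\t2\n".encode("euc-kr")

# sep="\t" splits each line on tabs; encoding="euc-kr" decodes the Korean header
df = pd.read_csv(io.BytesIO(raw), sep="\t", encoding="euc-kr")
print(df.columns.tolist())  # ['gameId', '승자']
print(df.shape)             # (2, 2)
```

Reading the same bytes without encoding="euc-kr" would typically fail with a UnicodeDecodeError, because pandas assumes UTF-8 by default. That error is usually the first hint that a Korean file needs "euc-kr" (or its superset "cp949").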
2. Printing the first and last rows
- df.head(): the first n rows (default n=5)
- df.tail(): the last n rows (default n=5)
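A minimal sketch of head() and tail() with an explicit n, using a small hypothetical DataFrame in place of the lol.csv data so the returned rows are easy to check:

```python
import pandas as pd

# Hypothetical 10-row frame standing in for the lol.csv data
df = pd.DataFrame({"gameId": range(100, 110)})

print(df.head())    # first 5 rows (default n=5)
print(df.head(3))   # first 3 rows: gameId 100, 101, 102
print(df.tail(2))   # last 2 rows: gameId 108, 109
```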
3. Inspecting the data structure
- df.index: information about the index (the row labels)
df.index
RangeIndex(start=0, stop=51490, step=1)
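When no index is specified, pandas assigns a RangeIndex. As with Python's range, stop is exclusive, so stop=51490 above means row labels 0 through 51489, one per row. The sketch below uses a hypothetical four-row frame to show the start/stop/step attributes and what happens once a column is promoted to the index with set_index.

```python
import pandas as pd

# Hypothetical 4-row frame; no index was given, so pandas creates a RangeIndex
df = pd.DataFrame({"gameId": [11, 12, 13, 14]})

idx = df.index
print(idx)                            # RangeIndex(start=0, stop=4, step=1)
print(idx.start, idx.stop, idx.step)  # 0 4 1
print(len(idx))                       # 4, one label per row

# Using a column as the index replaces the RangeIndex with that column's values
print(df.set_index("gameId").index.tolist())  # [11, 12, 13, 14]
```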
- df.shape: the number of rows and columns, as a (rows, columns) tuple
df.shape
(51490, 61)
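Because shape is a plain (rows, columns) tuple, it can be unpacked directly. A minimal sketch on a hypothetical 3 x 2 frame:

```python
import pandas as pd

# Hypothetical 3x2 frame
df = pd.DataFrame({"winner": [1, 2, 1], "firstBlood": [2, 1, 1]})

n_rows, n_cols = df.shape               # shape is a (rows, columns) tuple
print(n_rows, n_cols)                   # 3 2
print(len(df) == df.shape[0])           # True: len() counts rows
print(len(df.columns) == df.shape[1])   # True: one entry per column
```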
- df.info(): each column's name, non-null count, and dtype; useful for spotting missing values
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51490 entries, 0 to 51489
Data columns (total 61 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   gameId              51490 non-null  int64
 1   creationTime        51490 non-null  int64
 2   gameDuration        51490 non-null  int64
 3   seasonId            51490 non-null  int64
 4   winner              51490 non-null  int64
 5   firstBlood          51490 non-null  int64
 6   firstTower          51490 non-null  int64
 7   firstInhibitor      51490 non-null  int64
 8   firstBaron          51490 non-null  int64
 9   firstDragon         51490 non-null  int64
10   firstRiftHerald     51490 non-null  int64
11   t1_champ1id         51490 non-null  int64
12   t1_champ1_sum1      51490 non-null  int64
13   t1_champ1_sum2      51490 non-null  int64
14   t1_champ2id         51490 non-null  int64
15   t1_champ2_sum1      51490 non-null  int64
16   t1_champ2_sum2      51490 non-null  int64
17   t1_champ3id         51490 non-null  int64
18   t1_champ3_sum1      51490 non-null  int64
19   t1_champ3_sum2      51490 non-null  int64
20   t1_champ4id         51490 non-null  int64
21   t1_champ4_sum1      51490 non-null  int64
22   t1_champ4_sum2      51490 non-null  int64
23   t1_champ5id         51490 non-null  int64
24   t1_champ5_sum1      51490 non-null  int64
25   t1_champ5_sum2      51490 non-null  int64
26   t1_towerKills       51490 non-null  int64
27   t1_inhibitorKills   51490 non-null  int64
28   t1_baronKills       51490 non-null  int64
29   t1_dragonKills      51490 non-null  int64
30   t1_riftHeraldKills  51490 non-null  int64
31   t1_ban1             51490 non-null  int64
32   t1_ban2             51490 non-null  int64
33   t1_ban3             51490 non-null  int64
34   t1_ban4             51490 non-null  int64
35   t1_ban5             51490 non-null  int64
36   t2_champ1id         51490 non-null  int64
37   t2_champ1_sum1      51490 non-null  int64
38   t2_champ1_sum2      51490 non-null  int64
39   t2_champ2id         51490 non-null  int64
40   t2_champ2_sum1      51490 non-null  int64
41   t2_champ2_sum2      51490 non-null  int64
42   t2_champ3id         51490 non-null  int64
43   t2_champ3_sum1      51490 non-null  int64
44   t2_champ3_sum2      51490 non-null  int64
45   t2_champ4id         51490 non-null  int64
46   t2_champ4_sum1      51490 non-null  int64
47   t2_champ4_sum2      51490 non-null  int64
48   t2_champ5id         51490 non-null  int64
49   t2_champ5_sum1      51490 non-null  int64
50   t2_champ5_sum2      51490 non-null  int64
51   t2_towerKills       51490 non-null  int64
52   t2_inhibitorKills   51490 non-null  int64
53   t2_baronKills       51490 non-null  int64
54   t2_dragonKills      51490 non-null  int64
55   t2_riftHeraldKills  51490 non-null  int64
56   t2_ban1             51490 non-null  int64
57   t2_ban2             51490 non-null  int64
58   t2_ban3             51490 non-null  int64
59   t2_ban4             51490 non-null  int64
60   t2_ban5             51490 non-null  int64
dtypes: int64(61)
memory usage: 24.0 MB
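Every column above reports 51490 non-null values out of 51490 entries, so nothing is missing. The sketch below, on a hypothetical frame with one NaN, shows what a gap looks like: the non-null count drops below the entry count, and the integer column is stored as float64 because the default NumPy int64 dtype cannot hold NaN. It also uses info(buf=...) to capture the report as a string, and df.count(), which returns the same non-null counts as a Series.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in firstBaron
df = pd.DataFrame({
    "gameId": [1, 2, 3],
    "firstBaron": [1, np.nan, 2],
})

# info() prints to stdout by default; buf= redirects the report into a string
buf = io.StringIO()
df.info(buf=buf)
report = buf.getvalue()
print(report)

# df.count() gives the same per-column non-null counts as info()
non_null = df.count()
print(non_null[non_null < len(df)])  # columns with at least one missing value
```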
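* Checking for missing values
- df.isnull().sum(): isnull() marks each missing cell True and every other cell False; sum() then counts True as 1 and False as 0, giving the number of missing values per column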
df.isnull().sum()
gameId          0
creationTime    0
gameDuration    0
seasonId        0
winner          0
                ..
t2_ban1         0
t2_ban2         0
t2_ban3         0
t2_ban4         0
t2_ban5         0
Length: 61, dtype: int64
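Since lol.csv has no missing values, every count above is 0. This sketch uses a hypothetical frame with a few NaN entries to show the True = 1 / False = 0 summing in action, plus two common follow-ups: the total missing count and the list of columns that contain any gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with missing values (the lol.csv data above has none)
df = pd.DataFrame({
    "winner": [1, 2, np.nan, 1],
    "t1_ban1": [np.nan, 5, np.nan, 7],
})

mask = df.isnull()          # True where a value is missing
per_column = mask.sum()     # True counts as 1, False as 0
print(per_column)           # per-column missing counts
print(per_column.sum())     # 3
print(per_column[per_column > 0].index.tolist())  # ['winner', 't1_ban1']
```

df.isna() is an alias of df.isnull() and gives the same result.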