2024. 7. 29. 20:29 · Programming (Extended)/Python-Pandas
Pandas
Importing Pandas
import pandas as pd
Creating DataFrames
# From a dictionary
data = {'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame(data)
# From a CSV file
df = pd.read_csv('file_path.csv')
# From an Excel file
df = pd.read_excel('file_path.xlsx', sheet_name='Sheet1')
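The readers above need a file on disk. As a minimal sketch that runs without one, the example below feeds `pd.read_csv` an in-memory buffer; the CSV text is made up for illustration and stands in for the contents of 'file_path.csv'.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'file_path.csv'
csv_text = "Column1,Column2\n1,a\n2,b\n3,c\n4,d\n"

# read_csv accepts a path or any file-like object
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# The same table built from a dictionary
data = {'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']}
df_from_dict = pd.DataFrame(data)

print(df_from_csv.shape)
print(df_from_dict.columns.tolist())
```

Both routes should give a 4-row, 2-column table with the same column names.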
https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html
Intro to data structures — pandas 2.2.2 documentation
Viewing Data
df.head() # first 5 rows
df.tail() # last 5 rows
df.sample(n=5) # n randomly sampled rows
df.info() # column dtypes, non-null counts, memory usage
df.describe() # descriptive statistics
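A short sketch of these inspection calls on a made-up 10-row table (the data and the `random_state` seed are assumptions for illustration):

```python
import pandas as pd

# Hypothetical 10-row table
df = pd.DataFrame({'Column1': range(1, 11), 'Column2': list('abcdefghij')})

first_three = df.head(3)                  # head/tail accept a row count
last_two = df.tail(2)
sampled = df.sample(n=5, random_state=0)  # fixed seed makes the sample repeatable
stats = df.describe()                     # numeric columns only by default

print(first_three)
print(stats)
```

Because `Column2` holds strings, `describe()` reports on `Column1` only unless `include='all'` is passed.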
https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#viewing-data
Essential basic functionality — pandas 2.2.2 documentation
Selecting Data
# Column selection
df['Column1'] # returns a Series
df[['Column1', 'Column2']] # returns a DataFrame
# Row selection by index
df.iloc[0] # first row (integer-location based)
df.loc[0] # row labeled 0 (label-based); the first row only with the default index
# Conditional selection
df[df['Column1'] > 2] # rows that satisfy the condition
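The difference between `iloc` and `loc` only shows up once the index is not the default 0, 1, 2, ... The sketch below uses a made-up index of 10, 20, 30, 40 to illustrate it, and also combines two conditions in one filter.

```python
import pandas as pd

df = pd.DataFrame(
    {'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']},
    index=[10, 20, 30, 40],  # hypothetical non-default labels
)

first_by_position = df.iloc[0]  # row at position 0
row_by_label = df.loc[10]       # row whose index label is 10
# df.loc[0] would raise KeyError here: no row carries the label 0

# Combine conditions with & (and) or | (or); wrap each one in parentheses
filtered = df[(df['Column1'] > 1) & (df['Column2'] != 'd')]

print(first_by_position)
print(filtered)
```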
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
Indexing and selecting data — pandas 2.2.2 documentation
Data Cleaning
# Handling missing values
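df.dropna() # drop rows with missing values (returns a new DataFrame)
df.fillna(value) # fill missing values; value is a scalar or per-column dict you supply
# Duplicates
df.drop_duplicates() # remove duplicate rows (returns a new DataFrame)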
# Renaming columns
df.rename(columns={'OldName': 'NewName'}, inplace=True)
# Changing data types
df['Column1'] = df['Column1'].astype('int')
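The cleaning calls can be chained on a small table, sketched below. The data is made up, and the fill value 0 is an arbitrary choice for illustration. Note that `dropna`, `fillna`, and `drop_duplicates` leave the original untouched unless the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value, and the last row repeats the third
df = pd.DataFrame({
    'OldName': [1.0, np.nan, 3.0, 3.0],
    'Column2': ['a', 'b', 'c', 'c'],
})

dropped = df.dropna()               # removes the row containing NaN
filled = df.fillna({'OldName': 0})  # per-column fill value
deduped = df.drop_duplicates()      # removes the repeated last row

renamed = filled.rename(columns={'OldName': 'NewName'})
# astype('int') fails on NaN, so convert only after filling
renamed['NewName'] = renamed['NewName'].astype('int')

print(renamed)
```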
https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html
Working with missing data — pandas 2.2.2 documentation
Data Manipulation
# Adding a new column
df['NewColumn'] = df['Column1'].astype(str) + df['Column2'] # cast first: adding an int column to a str column raises TypeError
# Applying functions
df['NewColumn'] = df['Column1'].apply(lambda x: x * 2)
# Grouping data
grouped = df.groupby('Column1').sum()
# Merging DataFrames
df_merged = pd.merge(df1, df2, on='KeyColumn', how='inner') # how can also be 'left', 'right', or 'outer'; df1 and df2 are two DataFrames sharing 'KeyColumn'
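`df1` and `df2` are not defined in the snippet above. A minimal sketch with two made-up tables shows how `how` changes the merge result, followed by a grouped sum; all column names other than `KeyColumn` are assumptions.

```python
import pandas as pd

# Hypothetical tables sharing a 'KeyColumn'
df1 = pd.DataFrame({'KeyColumn': ['x', 'y', 'z'], 'Value1': [1, 2, 3]})
df2 = pd.DataFrame({'KeyColumn': ['x', 'y', 'w'], 'Value2': [10, 20, 40]})

inner = pd.merge(df1, df2, on='KeyColumn', how='inner')  # keys present in both
outer = pd.merge(df1, df2, on='KeyColumn', how='outer')  # keys present in either

# Grouping: total Amount per Group
sales = pd.DataFrame({'Group': ['a', 'a', 'b'], 'Amount': [1, 2, 3]})
totals = sales.groupby('Group')['Amount'].sum()

print(inner)
print(outer)
print(totals)
```

In the outer merge, keys found in only one table get NaN in the other table's columns.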
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
Merge, join, concatenate and compare — pandas 2.2.2 documentation
Saving Data
df.to_csv('file_path.csv', index=False)
df.to_excel('file_path.xlsx', sheet_name='Sheet1', index=False)
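To see what `index=False` changes, the sketch below writes CSV to a string instead of a file (`to_csv` returns the text when no path is given). Excel output is left out here because `to_excel` needs an extra engine package such as openpyxl.

```python
import io

import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']})

csv_without_index = df.to_csv(index=False)  # header row is Column1,Column2
csv_with_index = df.to_csv()                # adds an unnamed leading index column

# Round trip: reading the index-free text back recovers the same columns
restored = pd.read_csv(io.StringIO(csv_without_index))

print(csv_without_index.splitlines()[0])
print(csv_with_index.splitlines()[0])
```

Writing the index and then reading the file back without `index_col=0` is a common source of a stray `Unnamed: 0` column.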
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
IO tools (text, CSV, HDF5, …) — pandas 2.2.2 documentation
Useful Functions
# Sorting
df.sort_values(by='Column1', ascending=False)
# Pivot Table (assumes df also has a numeric 'Column3')
pivot = df.pivot_table(index='Column1', columns='Column2', values='Column3', aggfunc='sum')
# Reset index
df.reset_index(drop=True, inplace=True)
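The pivot call needs a third column that the earlier sample data does not have. A sketch with a made-up three-column table (region, quarter, and amount values are assumptions) shows the pivot, then a sort followed by an index reset.

```python
import pandas as pd

# Hypothetical data with the three columns the pivot call expects
df = pd.DataFrame({
    'Column1': ['north', 'north', 'south', 'south'],
    'Column2': ['q1', 'q2', 'q1', 'q1'],
    'Column3': [10, 20, 30, 40],
})

# One row per Column1 value, one column per Column2 value, cells summed
pivot = df.pivot_table(index='Column1', columns='Column2',
                       values='Column3', aggfunc='sum')

# sort_values returns a new DataFrame; reset_index renumbers the rows from 0
sorted_df = df.sort_values(by='Column3', ascending=False).reset_index(drop=True)

print(pivot)
print(sorted_df)
```

Combinations with no data (here south/q2) come out as NaN unless `fill_value` is set.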
https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html
Essential basic functionality — pandas 2.2.2 documentation
Time Series
# Converting to datetime
df['Date'] = pd.to_datetime(df['Date'])
# Setting index
df.set_index('Date', inplace=True)
# Resampling
df.resample('ME').mean() # 'D': day, 'W': week, 'ME': month end, 'QE': quarter end, 'YE': year end (pandas 2.2+; older versions use 'M', 'Q', 'Y', which 2.2 deprecates)
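The three steps above, put together on made-up data. This sketch uses the 'MS' (month start) alias because it is accepted unchanged by both older and newer pandas releases; the only difference from a month-end alias is that each bin is labeled with the first day of the month.

```python
import pandas as pd

# Hypothetical daily readings spread over two months
df = pd.DataFrame({
    'Date': ['2024-01-10', '2024-01-20', '2024-02-05'],
    'Value': [1.0, 3.0, 5.0],
})

df['Date'] = pd.to_datetime(df['Date'])  # strings -> datetime64
df = df.set_index('Date')                # resample needs a datetime index

# Monthly mean, each bin labeled by the first day of its month
monthly = df.resample('MS').mean()

print(monthly)
```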
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html
Time series / date functionality — pandas 2.2.2 documentation
Pandas cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
10 minutes to pandas: https://pandas.pydata.org/docs/user_guide/10min.html