Pandas Basics

2024. 7. 29. 20:29

Pandas

Importing Pandas

import pandas as pd

Creating DataFrames

# From a dictionary
data = {'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame(data)

# From a CSV file
df = pd.read_csv('file_path.csv')

# From an Excel file (needs an Excel engine such as openpyxl installed)
df = pd.read_excel('file_path.xlsx', sheet_name='Sheet1')
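To try both constructors without a file on disk, the sketch below replaces the CSV path with an in-memory io.StringIO buffer. This is an illustrative assumption: read_csv accepts any file-like object. The sketch then checks that both routes produce the same DataFrame.

```python
import io

import pandas as pd

# Build a DataFrame from a dictionary: keys become column names
data = {'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']}
df_dict = pd.DataFrame(data)

# read_csv accepts any file-like object, so an in-memory buffer stands in for a CSV file
csv_text = "Column1,Column2\n1,a\n2,b\n3,c\n4,d\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_dict.shape)           # (4, 2)
print(df_csv.equals(df_dict))  # True: same values, dtypes and labels
```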

https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html

Viewing Data

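df.head()  # first 5 rows
df.tail()  # last 5 rows
df.sample(n=5)  # random sample of n rows (n may not exceed the row count unless replace=True)
df.info()  # summary: index, columns, dtypes, non-null counts, memory usage
df.describe()  # descriptive statistics (numeric columns by default)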
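Here is a quick sketch of the viewing methods on the four-row example data. Calling df.sample(n=5) on only four rows raises ValueError, so the sketch samples two rows instead. It also passes random_state (an arbitrary seed chosen here) so the sample is reproducible.

```python
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']})

first_two = df.head(2)                   # head/tail accept a row count (default 5)
picked = df.sample(n=2, random_state=0)  # fixed seed -> same rows on every run
stats = df.describe()                    # only the numeric column 'Column1' is summarized

print(first_two)
print(stats.loc['mean', 'Column1'])      # 2.5

# Sampling more rows than exist (without replace=True) raises ValueError
sample_error = None
try:
    df.sample(n=5)
except ValueError as err:
    sample_error = err
print('sample(n=5) failed:', sample_error)
```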

https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#viewing-data

Selecting Data

# Column selection
df['Column1']  # returns a Series
df[['Column1', 'Column2']]  # returns a DataFrame

# Row selection by index
df.iloc[0]  # first row (integer-location based)
df.loc[0]  # row whose index label is 0 (label-based; the first row only with the default RangeIndex)

# Conditional selection
df[df['Column1'] > 2]  # rows that satisfy the condition
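With the default RangeIndex, positions and labels coincide, which hides the difference between iloc and loc. The sketch below gives the example data a custom index so the two accessors diverge (the labels 10 to 40 are arbitrary). It also shows how to combine conditions.

```python
import pandas as pd

# Custom labels make position-based and label-based selection differ
df = pd.DataFrame(
    {'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']},
    index=[10, 20, 30, 40],
)

by_position = df.iloc[0]  # first row by position (its label is 10)
by_label = df.loc[20]     # row labeled 20, i.e. the second row

# There is no row labeled 0 here, so loc[0] raises KeyError
label_error = None
try:
    df.loc[0]
except KeyError as err:
    label_error = err

# Combine conditions with & (and) / | (or); each condition needs parentheses
subset = df[(df['Column1'] > 1) & (df['Column2'] != 'd')]  # labels 20 and 30

# loc takes a boolean mask for rows plus a column label
names = df.loc[df['Column1'] > 2, 'Column2']               # 'c', 'd'
```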

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html

Data Cleaning

# Handling missing values
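df.dropna()  # drop rows that contain missing values (returns a new DataFrame)
df.fillna(0)  # fill missing values with a given value, here 0 (returns a new DataFrame)

# Duplicates
df.drop_duplicates()  # remove duplicate rows (returns a new DataFrame)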

# Renaming columns
df.rename(columns={'OldName': 'NewName'}, inplace=True)

# Changing data types
df['Column1'] = df['Column1'].astype('int')
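The sketch below runs the cleaning steps on a small made-up DataFrame with gaps and a duplicated row. It makes two points explicit that the one-liners leave implicit. First, these methods return new objects rather than modifying df. Second, astype('int') fails while NaN is present, but pandas' nullable 'Int64' dtype can hold missing values.

```python
import numpy as np
import pandas as pd

# Made-up data: one missing number, two missing strings, rows 2 and 3 identical
df = pd.DataFrame({
    'Column1': [1.0, np.nan, 3.0, 3.0],
    'Column2': ['a', 'b', None, None],
})

no_missing = df.dropna()                                   # keeps only row 0
filled = df.fillna({'Column1': 0, 'Column2': 'unknown'})   # one fill value per column
deduped = df.drop_duplicates()                             # drops row 3 (copy of row 2)

# Casting a column that still contains NaN to plain int raises ValueError
cast_error = None
try:
    df['Column1'].astype('int')
except ValueError as err:
    cast_error = err

# The nullable 'Int64' dtype stores the gap as <NA> instead
as_nullable_int = df['Column1'].astype('Int64')

print(df['Column1'].isna().sum())  # 1: df itself was not modified
```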

https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html

Data Manipulation

# Adding a new column
df['NewColumn'] = df['Column1'].astype(str) + df['Column2']  # '1a', '2b', ...; adding the int and str columns directly raises TypeError

# Applying functions
df['NewColumn'] = df['Column1'].apply(lambda x: x * 2)

# Grouping data
grouped = df.groupby('Column1').sum()

# Merging DataFrames
df_merged = pd.merge(df1, df2, on='KeyColumn', how='inner')  # df1, df2: DataFrames sharing 'KeyColumn'; how can also be 'left', 'right', or 'outer'
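Because df1, df2 and KeyColumn are placeholders, here is a self-contained sketch built on two tiny made-up tables (all names hypothetical). It contrasts an inner and an outer merge and adds a groupby that aggregates one column. It also compares apply with plain vectorized arithmetic. For a simple expression like x * 2 both give the same result, and the vectorized form is usually faster.

```python
import pandas as pd

# Hypothetical tables that share the join key 'KeyColumn'
df1 = pd.DataFrame({'KeyColumn': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
df2 = pd.DataFrame({'KeyColumn': [2, 3, 4], 'right_val': ['x', 'y', 'z']})

inner = pd.merge(df1, df2, on='KeyColumn', how='inner')  # keys present in both: 2, 3
outer = pd.merge(df1, df2, on='KeyColumn', how='outer')  # union of keys: 1-4, gaps become NaN

# Group by 'team' and compute the sum and count of 'score' for each group
scores = pd.DataFrame({'team': ['A', 'B', 'A', 'B'], 'score': [10, 20, 30, 40]})
summary = scores.groupby('team')['score'].agg(['sum', 'count'])

# Vectorized arithmetic and apply produce the same values here
scores['double'] = scores['score'] * 2
same = scores['double'].tolist() == scores['score'].apply(lambda x: x * 2).tolist()

print(summary)
```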

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

Saving Data

df.to_csv('file_path.csv', index=False)
df.to_excel('file_path.xlsx', sheet_name='Sheet1', index=False)  # needs an Excel writer engine such as openpyxl
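To see what index=False changes, note that to_csv returns the CSV text as a string when called without a path. The sketch below does an in-memory round trip without writing any files. Keeping the index adds an unnamed first column, which comes back as 'Unnamed: 0' when the text is read again.

```python
import io

import pandas as pd

df = pd.DataFrame({'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']})

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()               # first column holds the row labels 0..3
without_index = df.to_csv(index=False)

# Reading back: the saved index turns into an extra 'Unnamed: 0' column
extra_col = pd.read_csv(io.StringIO(with_index))
round_trip = pd.read_csv(io.StringIO(without_index))

print(list(extra_col.columns))  # ['Unnamed: 0', 'Column1', 'Column2']
print(round_trip.equals(df))    # True
```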

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html

Useful Functions

# Sorting
df.sort_values(by='Column1', ascending=False)  # returns a sorted copy; df is unchanged

# Pivot Table (assumes a numeric 'Column3' column)
pivot = df.pivot_table(index='Column1', columns='Column2', values='Column3', aggfunc='sum')

# Reset index
df.reset_index(drop=True, inplace=True)
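The pivot_table call refers to a 'Column3' that the earlier example data does not have. This sketch therefore builds a small long-format DataFrame with made-up values in Column1 to Column3 and applies the three functions to it.

```python
import pandas as pd

# Made-up long-format data; 'Column3' holds the numbers to aggregate
df = pd.DataFrame({
    'Column1': ['x', 'x', 'y', 'y', 'y'],
    'Column2': ['a', 'b', 'a', 'a', 'b'],
    'Column3': [1, 2, 3, 4, 5],
})

# One row per Column1 value, one column per Column2 value, cells = sum of Column3
pivot = df.pivot_table(index='Column1', columns='Column2', values='Column3', aggfunc='sum')

# sort_values returns a sorted copy; the original row labels move with the rows
ranked = df.sort_values(by='Column3', ascending=False)
before_reset = ranked.index.tolist()  # [4, 3, 2, 1, 0]

# reset_index(drop=True) renumbers rows 0..n-1 and discards the old labels
ranked = ranked.reset_index(drop=True)

print(pivot)
```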

https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html

Time Series

# Converting to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Setting index
df.set_index('Date', inplace=True)

# Resampling
df.resample('ME').mean()  # 'D': day, 'W': week, 'ME': month end, 'QE': quarter end, 'YE': year end (spelled 'M', 'Q', 'Y' before pandas 2.2)
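Here is an end-to-end sketch of the three steps on made-up daily values. It resamples with 'MS' (month start) rather than a month-end alias. 'MS' is accepted by both older and newer pandas releases, while month end is spelled 'M' before pandas 2.2 and 'ME' from 2.2 on.

```python
import pandas as pd

# Made-up daily readings with the date stored as a string column
df = pd.DataFrame({
    'Date': ['2024-01-30', '2024-01-31', '2024-02-01', '2024-02-02'],
    'value': [1.0, 3.0, 10.0, 20.0],
})

df['Date'] = pd.to_datetime(df['Date'])  # string -> datetime64
df = df.set_index('Date')                # resample needs a DatetimeIndex

# 'MS' = month start: each bin is labeled with the first day of its month
monthly = df.resample('MS').mean()       # January -> 2.0, February -> 15.0

# A DatetimeIndex also supports partial-string selection, e.g. one month
february = df.loc['2024-02']
```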

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

Pandas cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

10 minutes to pandas: https://pandas.pydata.org/docs/user_guide/10min.html