Pandas Basics

2024. 7. 29. 20:29 · Programming (Extended)/Python-Pandas

Pandas

Importing Pandas

import pandas as pd

Creating DataFrames

# From a dictionary
data = {'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']}
df = pd.DataFrame(data)

# From a CSV file
df = pd.read_csv('file_path.csv')

# From an Excel file
df = pd.read_excel('file_path.xlsx', sheet_name='Sheet1')
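
As a small runnable sketch of the constructors above: the dictionary is the one from the snippet, while the inline CSV text is a made-up stand-in for 'file_path.csv' (read_csv also accepts a file-like object, so no file on disk is needed here).

```python
import io
import pandas as pd

# Same dictionary as in the snippet above
data = {'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']}
df_from_dict = pd.DataFrame(data)

# Hypothetical CSV content standing in for 'file_path.csv'
csv_text = "Column1,Column2\n1,a\n2,b\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_dict.shape)         # (rows, columns)
print(list(df_from_csv.columns))  # header row becomes the column names
```

Each dictionary key becomes a column name, and each list becomes that column's values, so all lists need the same length.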

https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html


Viewing Data

df.head()  # first 5 rows
df.tail()  # last 5 rows
df.sample(n=5)  # n random rows (n must not exceed len(df))
df.info()  # column dtypes, non-null counts, memory usage
df.describe()  # descriptive statistics

https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#viewing-data


Selecting Data

# Column selection
df['Column1']  # returns a Series
df[['Column1', 'Column2']]  # returns a DataFrame

# Row selection by index
df.iloc[0]  # first row (integer-location based)
df.loc[0]  # row whose index label is 0 (label-based); the first row only with the default index

# Conditional selection
df[df['Column1'] > 2]  # select rows that satisfy the condition
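
The difference between iloc and loc only shows up when the index is not the default 0, 1, 2, ... The sketch below assumes a hypothetical index of 10, 20, 30, 40 to make that visible; the column names follow the earlier example.

```python
import pandas as pd

df = pd.DataFrame(
    {'Column1': [1, 2, 3, 4], 'Column2': ['a', 'b', 'c', 'd']},
    index=[10, 20, 30, 40],  # hypothetical non-default labels
)

first_by_position = df.iloc[0]  # position 0, i.e. the row labeled 10
row_by_label = df.loc[20]       # label 20, i.e. the second row

# A boolean mask and a column list can be combined in one .loc call
filtered = df.loc[df['Column1'] > 2, ['Column2']]
```

With this index, df.loc[0] would raise a KeyError because no row carries the label 0.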

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html


Data Cleaning

# Handling missing values
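df.dropna()  # drop rows containing missing values (returns a new DataFrame)
df.fillna(value)  # fill missing values with `value` (a scalar or a per-column dict)

# Duplicates
df.drop_duplicates()  # remove duplicate rows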

# Renaming columns
df.rename(columns={'OldName': 'NewName'}, inplace=True)

# Changing data types
df['Column1'] = df['Column1'].astype('int')
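
The cleaning calls above return new objects unless inplace=True is passed, so they chain naturally. A minimal sketch on made-up data (the values and the fill value 0 are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value and one duplicate row
raw = pd.DataFrame({
    'OldName': [1.0, np.nan, 3.0, 3.0],
    'Column2': ['a', 'b', 'c', 'c'],
})

dropped = raw.dropna()               # new DataFrame; raw is unchanged
filled = raw.fillna({'OldName': 0})  # fill per column with a dict

cleaned = (
    filled
    .drop_duplicates()                       # remove the repeated (3.0, 'c') row
    .rename(columns={'OldName': 'NewName'})
    .astype({'NewName': 'int64'})            # safe now that no NaN remains
)
```

Converting to an integer dtype comes last here because astype('int') fails on a column that still contains NaN.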

https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html


Data Manipulation

# Adding a new column (cast to str first: in the example data Column1 is numeric and Column2 is text)
df['NewColumn'] = df['Column1'].astype(str) + df['Column2']

# Applying functions
df['NewColumn'] = df['Column1'].apply(lambda x: x * 2)

# Grouping data
grouped = df.groupby('Column1').sum()

# Merging DataFrames
df_merged = pd.merge(df1, df2, on='KeyColumn', how='inner')  # how can also be 'left', 'right', or 'outer'
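
The snippets above leave df1 and df2 undefined, so here is a rough sketch with two hypothetical tables that share a 'KeyColumn', plus a small group-wise sum. All table contents and the column names 'Left', 'Right', 'Group', and 'Value' are invented for illustration.

```python
import pandas as pd

# Hypothetical tables sharing a 'KeyColumn'
df1 = pd.DataFrame({'KeyColumn': ['x', 'y', 'z'], 'Left': [1, 2, 3]})
df2 = pd.DataFrame({'KeyColumn': ['y', 'z', 'w'], 'Right': [20, 30, 40]})

inner = pd.merge(df1, df2, on='KeyColumn', how='inner')  # keys present in both
outer = pd.merge(df1, df2, on='KeyColumn', how='outer')  # keys present in either

# Group-wise sum of one column
sales = pd.DataFrame({'Group': ['a', 'a', 'b'], 'Value': [1, 2, 5]})
totals = sales.groupby('Group')['Value'].sum()
```

The inner join keeps only 'y' and 'z'; the outer join keeps all four keys and fills the unmatched side with NaN.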

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html


Saving Data

df.to_csv('file_path.csv', index=False)
df.to_excel('file_path.xlsx', sheet_name='Sheet1', index=False)
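
To see what index=False does without touching the disk, to_csv can be called without a path, in which case it returns the CSV text as a string. This is only a sketch on made-up data:

```python
import io
import pandas as pd

df = pd.DataFrame({'Column1': [1, 2], 'Column2': ['a', 'b']})

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
header = csv_text.splitlines()[0]  # no leading index column in the header

# Reading the text back gives the same columns and values
restored = pd.read_csv(io.StringIO(csv_text))
```

Without index=False, the row index would be written as an extra unnamed first column and would come back as 'Unnamed: 0' on reading.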

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html


Useful Functions

# Sorting
df.sort_values(by='Column1', ascending=False)

# Pivot Table
pivot = df.pivot_table(index='Column1', columns='Column2', values='Column3', aggfunc='sum')

# Reset index
df.reset_index(drop=True, inplace=True)
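
The pivot table above refers to a 'Column3' that the earlier example data does not have, so the sketch below assumes a small three-column frame to show what the calls produce.

```python
import pandas as pd

df = pd.DataFrame({
    'Column1': ['x', 'x', 'y', 'y'],
    'Column2': ['a', 'b', 'a', 'a'],
    'Column3': [1, 2, 3, 4],  # hypothetical numeric column
})

# Rows: Column1 values, columns: Column2 values, cells: sum of Column3
pivot = df.pivot_table(index='Column1', columns='Column2',
                       values='Column3', aggfunc='sum')

# Sort descending, then renumber the rows 0..n-1
sorted_df = df.sort_values(by='Column3', ascending=False).reset_index(drop=True)
```

Combinations that never occur in the data (here 'y' with 'b') appear as NaN in the pivot table; passing fill_value=0 to pivot_table would replace them.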

https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html


Time Series

# Converting to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Setting index
df.set_index('Date', inplace=True)

# Resampling
df.resample('ME').mean()  # 'D': day, 'W': week, 'ME': month end, 'QE': quarter end, 'YE': year end (pandas 2.2+; older versions use 'M', 'Q', 'Y')
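
The three steps above can be tried end to end on a tiny frame. The dates and values are made up, and the 'MS' (month start) alias is used only because it is spelled the same across pandas versions; it labels each bin by the first day of the month rather than the last.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2024-01-10', '2024-01-20', '2024-02-05'],  # hypothetical dates
    'Value': [10, 20, 30],
})

df['Date'] = pd.to_datetime(df['Date'])  # strings -> datetime64
df = df.set_index('Date')                # resample needs a datetime index

# One row per calendar month, holding the mean of that month's values
monthly = df.resample('MS').mean()
```

January averages its two values and February keeps its single value, so the result has two rows.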

https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html


Pandas cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

10 minutes to pandas: https://pandas.pydata.org/docs/user_guide/10min.html
