Introduction to pandas 🐼
Prerequisites: Python Lists, Python pip basics
Mentor's Note: Before pandas, analysing tabular data in Python often meant writing long, error-prone loops by hand. pandas reduces most everyday tasks to a few lines. Once you understand what it does, you'll wonder how anyone worked without it. 💡
What You'll Learn
By the end of this tutorial, you'll know:
- What pandas is and why it exists (not just "it's a library")
- How to install pandas and import it correctly, including the pd alias
- How to create your first DataFrame from a Python dictionary
- When to use pandas instead of plain Python lists
The Scenario: The Excel Superpower
Your school has just handed you a spreadsheet with 10,000 student records: names, marks, attendance, and city. Finding the class average in Excel takes a few clicks and a formula. But what if you need to:
- Find the average marks for students only from Surat?
- Identify who scored below 35 in more than two subjects?
- Export a cleaned version without absent students?
With pandas, each of these takes fewer than 10 lines of Python. That's why pandas is a standard tool for data analysts and data scientists, and why CBSE Class 12 students learn it too.
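To make the scenario concrete, here is a minimal sketch of all three tasks. It assumes a tiny made-up dataset rather than the real 10,000-row spreadsheet: the column names (name, city, status, maths, science, english) and the 'Absent' status label are hypothetical, and a real file would normally be loaded with pd.read_csv instead.

```python
import pandas as pd

# Hypothetical sample data and column names; a real spreadsheet
# would normally be loaded with pd.read_csv(...)
students = pd.DataFrame({
    'name':    ['Vishnu', 'Ankit', 'Priya', 'Meera', 'Rohan'],
    'city':    ['Surat', 'Mumbai', 'Surat', 'Delhi', 'Surat'],
    'status':  ['Present', 'Present', 'Absent', 'Present', 'Present'],
    'maths':   [95, 30, 70, 20, 88],
    'science': [90, 25, 65, 33, 91],
    'english': [85, 40, 72, 34, 79],
})
subjects = ['maths', 'science', 'english']

# 1. Average marks for students from Surat only:
#    each student's average across subjects, then the mean of those
surat = students[students['city'] == 'Surat']
surat_average = surat[subjects].mean(axis=1).mean()

# 2. Students who scored below 35 in more than two subjects:
#    compare every mark to 35, then count the True values in each row
fail_counts = (students[subjects] < 35).sum(axis=1)
struggling = students.loc[fail_counts > 2, 'name'].tolist()

# 3. A cleaned copy without absent students, exported as CSV text
#    (pass a filename such as 'cleaned.csv' to write a file instead)
present = students[students['status'] != 'Absent']
cleaned_csv = present.to_csv(index=False)

print(surat_average)
print(struggling)
print(cleaned_csv)
```

With only three subjects, "more than two" means all three, so Ankit (two marks below 35) is not flagged while Meera is.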
Concept Explanation
1. What is pandas?
pandas is a Python library for data manipulation and analysis. It provides two core data structures:
- Series β a labelled 1D array (like a single column from a spreadsheet)
- DataFrame β a labelled 2D table (like an entire spreadsheet)
It was created by Wes McKinney in 2008 and is built on top of NumPy.
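A minimal sketch of the two structures, using illustrative names and marks:

```python
import pandas as pd

# A Series: one labelled column of values (labels default to 0, 1, 2, ...)
marks = pd.Series([95, 82, 70], name='marks')

# The same values with custom labels instead of the default integer index
marks_by_name = pd.Series([95, 82, 70], index=['Vishnu', 'Ankit', 'Priya'])

# A DataFrame: a 2D table whose columns each behave like a Series
df = pd.DataFrame({'name': ['Vishnu', 'Ankit', 'Priya'], 'marks': [95, 82, 70]})

print(marks_by_name['Ankit'])  # 82 (look up a value by its label)
print(df.shape)                # (3, 2): rows, columns
```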
2. Why use pandas over plain Python lists?
| Feature | Python List | pandas |
|---|---|---|
| Column labels | ❌ No | ✅ Yes |
| Filter rows by condition | Verbose loop | df[df['marks'] > 80] |
| Handle missing data | Manual | Built-in dropna(), fillna() |
| Read CSV in one line | Needs csv module | pd.read_csv('file.csv') |
| Statistical summaries | Manual | df.describe() |
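The table rows can be compared side by side. This sketch uses made-up records, with None standing in for a missing mark; pandas stores that gap as NaN, which is why the filter and describe() skip it without extra code.

```python
import pandas as pd

# Plain Python: each record is a dict, and filtering needs an explicit loop
records = [
    {'name': 'Vishnu', 'marks': 95},
    {'name': 'Ankit', 'marks': 82},
    {'name': 'Priya', 'marks': None},  # a missing mark
]
toppers_loop = []
for r in records:
    if r['marks'] is not None and r['marks'] > 80:
        toppers_loop.append(r['name'])

# pandas: the same data as a labelled table; the missing mark becomes NaN
df = pd.DataFrame(records)
toppers_pd = df[df['marks'] > 80]['name'].tolist()

# Missing data: drop the incomplete row, or fill the gap with a default value
complete = df.dropna()
filled = df.fillna({'marks': 0})

# Count, mean, std, min, quartiles and max of every numeric column
summary = df.describe()
print(summary)
```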
3. Where is pandas used?
- CBSE Class 12 Python practical exams
- Finance and banking (stock analysis, risk models)
- Government data portals and journalism
- Machine learning pipelines (data preprocessing)
🎨 Visual Logic
mindmap
  root((pandas))
    Series
      1D labelled array
      Index + Values
    DataFrame
      2D table
      Rows and Columns
    I/O
      CSV
      Excel
      JSON
      SQL
    Operations
      GroupBy
      Merge/Join
      Pivot Table
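The Operations branch is easiest to picture with an example. A minimal GroupBy sketch on made-up data, computing the average marks per city (the same kind of question the scenario asks about Surat):

```python
import pandas as pd

# Illustrative data: two cities, two students each
df = pd.DataFrame({
    'name':  ['Vishnu', 'Ankit', 'Priya', 'Rohan'],
    'city':  ['Surat', 'Mumbai', 'Surat', 'Mumbai'],
    'marks': [95, 82, 70, 60],
})

# Split the rows by city, then take the mean of 'marks' within each group
city_avg = df.groupby('city')['marks'].mean()
print(city_avg)
```

By default, groupby sorts the group keys, so Mumbai is listed before Surat in the result.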
💻 Implementation
Run this in your terminal (not inside Python):
pip install pandas
Verify installation:
import pandas as pd # pd is the universal alias
import numpy as np # often used alongside pandas
print(pd.__version__) # e.g., 2.1.0
print(np.__version__) # e.g., 1.26.0
# Output:
# 2.1.0
# 1.26.0
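If the import fails with ModuleNotFoundError, pandas was usually installed into a different Python than the one you are running. A minimal sketch that checks both packages without crashing, using only the standard-library importlib module; the helper name check_packages is hypothetical:

```python
import importlib
import importlib.util


def check_packages(names):
    """Return {package: version string, or None if it cannot be found}."""
    found = {}
    for name in names:
        # find_spec returns None when the package is not installed for this Python
        if importlib.util.find_spec(name) is None:
            found[name] = None
        else:
            found[name] = importlib.import_module(name).__version__
    return found


status = check_packages(['pandas', 'numpy'])
for name, version in status.items():
    if version is None:
        print(f'{name} is missing: run "pip install {name}" in a terminal')
    else:
        print(f'{name} {version} is installed')
```

Running python -m pip install pandas, with the same python you use to run your scripts, is a common way to avoid that mismatch.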
Why as pd?
import pandas as pd is the industry-standard alias. Almost every tutorial, book, and codebase uses pd, and CBSE exams expect it too.
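The alias changes nothing about pandas itself; it only adds a shorter name for the same module. A quick check:

```python
import pandas
import pandas as pd

# Both names point to the very same module object
same_module = pandas is pd

# So the long and short spellings build identical DataFrames
long_form = pandas.DataFrame({'marks': [95, 82]})
short_form = pd.DataFrame({'marks': [95, 82]})
print(same_module, long_form.equals(short_form))  # True True
```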
Open a terminal, type python3 (python or py on Windows), and try these lines one at a time; each result appears immediately.
>>> import pandas as pd
>>> pd.__version__
'2.1.0'
>>> import numpy as np
>>> np.__version__
'1.26.0'
>>> data = {'name': ['Vishnu', 'Ankit'], 'marks': [95, 82]}
>>> df = pd.DataFrame(data)
>>> df
     name  marks
0  Vishnu     95
1   Ankit     82
>>> df['marks'].mean()
88.5
New to the REPL?
Each >>> line is something you type. The lines without >>> are what Python prints back. Type exit() to leave.
import pandas as pd
# Create a dictionary of student data
data = {
    'name': ['Vishnu', 'Ankit', 'Priya'],
    'marks': [95, 82, 70],
    'city': ['Surat', 'Mumbai', 'Delhi']
}
# Convert to a DataFrame (a 2D table)
df = pd.DataFrame(data)
print(df)
# Output:
#      name  marks    city
# 0  Vishnu     95   Surat
# 1   Ankit     82  Mumbai
# 2   Priya     70   Delhi
# One-line average:
print(df['marks'].mean()) # 82.33333...
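Once a DataFrame exists, a few attributes tell you what you built. A short sketch using the same data; the passed column and its pass mark of 35 (borrowed from the scenario) are illustrative:

```python
import pandas as pd

data = {
    'name': ['Vishnu', 'Ankit', 'Priya'],
    'marks': [95, 82, 70],
    'city': ['Surat', 'Mumbai', 'Delhi'],
}
df = pd.DataFrame(data)

# Quick checks worth running on any new DataFrame
print(df.shape)           # (3, 3): 3 rows, 3 columns
print(list(df.columns))   # ['name', 'marks', 'city']: the dictionary keys became columns
print(df['marks'].max())  # 95

# Add a column computed from an existing one (applied to every row at once)
df['passed'] = df['marks'] >= 35
```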
🎯 Practice Lab 🧪
Task: Your First pandas Program
- Install pandas in your Python environment.
- Import it and print its version.
- Create a dictionary with 3 student names and their ages.
- Convert it to a DataFrame and print it.
- Print the average age using .mean().
Hint: The column name for ages is just a dictionary key. Access it with df['age'].
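Try the task on your own first. If you get stuck, here is one possible solution; the names and ages are made up, and the version printed depends on your installation:

```python
import pandas as pd

print(pd.__version__)  # your version number may differ

# Hypothetical names and ages; replace them with your own
students = {
    'name': ['Asha', 'Kabir', 'Neha'],
    'age': [16, 17, 18],
}
df = pd.DataFrame(students)
print(df)

# The 'age' key became a column, so .mean() averages it
print(df['age'].mean())  # 17.0
```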
Frequently Asked Questions
Q: Why do we write import pandas as pd and not just import pandas?
You can write import pandas, and it works. But then every call becomes pandas.DataFrame(...), pandas.read_csv(...), and so on. The alias pd is the shorthand used in almost every textbook, job interview, Stack Overflow answer, and CBSE exam, so writing pandas instead of pd makes your code look unfamiliar to other developers.
Q: What's the difference between a Series and a DataFrame?
A Series is one column: a 1D labelled array. A DataFrame is a full table: multiple Series sharing the same index. Think of a Series as one column from a spreadsheet, and a DataFrame as the entire sheet.
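You can check this relationship directly. In this sketch the row labels s1 to s3 are illustrative; selecting a column returns a Series that carries the DataFrame's index:

```python
import pandas as pd

df = pd.DataFrame(
    {'name': ['Vishnu', 'Ankit', 'Priya'], 'marks': [95, 82, 70]},
    index=['s1', 's2', 's3'],  # illustrative row labels
)

# Selecting one column gives back a Series...
marks = df['marks']
print(type(marks).__name__)  # Series

# ...and that Series keeps the DataFrame's row labels as its index
print(marks['s2'])  # 82
```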
Q: Does pandas work without NumPy?
No. pandas is built on top of NumPy and requires it. When you pip install pandas, NumPy is installed automatically as a dependency. You don't need to install it separately, but many data science workflows import both.
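The NumPy layer is visible from inside pandas. A minimal sketch with illustrative marks:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'marks': [95, 82, 70]})

# A numeric column hands back its values as a NumPy array
values = df['marks'].to_numpy()
print(type(values))  # <class 'numpy.ndarray'>

# NumPy functions accept pandas objects directly and return a Series
root = np.sqrt(df['marks'])

# Arithmetic applies to the whole column at once, with no Python loop
bonus = df['marks'] + 5
print(bonus.tolist())  # [100, 87, 75]
```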
Q: CBSE exam β what is the standard import statement for pandas?
import pandas as pd: this exact line is what CBSE Class 12 textbooks and practical exams use, so stick to the pd alias rather than inventing your own.
Summary
In this tutorial, you've learned:
- ✅ pandas is a Python library built on NumPy for data manipulation; it gives your data a labelled table structure
- ✅ The two core structures are Series (1D) and DataFrame (2D)
- ✅ import pandas as pd is the industry standard: write pd.DataFrame(...), not pandas.DataFrame(...)
- ✅ You can create a DataFrame from a dictionary in one line with pd.DataFrame(data)
- ✅ Use pandas when you need column labels, filtering by condition, or built-in statistics like .mean()
💡 Interview & Exam Tips
Q: What is the standard alias for importing pandas?
import pandas as pd: always use pd. CBSE exams and virtually every industry tutorial expect this exact alias.
Q: What are the two main data structures in pandas?
Series (1D labelled array) and DataFrame (2D labelled table). A DataFrame is a collection of Series sharing the same index.
Q: Who created pandas and in which year?
Wes McKinney, 2008. He created it while working at AQR Capital Management to solve financial data analysis problems.
Q: Which library is pandas built on top of?
NumPy. Numeric pandas columns are stored in NumPy arrays by default, so operations like df['marks'] * 2 run as fast, vectorised NumPy code instead of slow Python loops.
Further Reading
Continue your learning path:
- Data Science Roadmap: overview of the full pandas learning path
- Next: pandas Series, to learn the 1D building block before tackling DataFrames
Go deeper:
- Official pandas docs, "10 minutes to pandas": the official quick-start guide
- DataFrame Basics: once you know Series, this is your next stop
- CSV with pandas: read real data from files instead of typing it by hand