
Introduction to pandas 🐼


Prerequisites: Python Lists, Python pip basics

Mentor's Note: Before pandas, analysing tabular data in Python often meant writing long loops by hand. pandas reduces many of those tasks to a few lines. Once you understand what it does, you'll wonder how anyone worked without it. 💡

What You'll Learn

By the end of this tutorial, you'll know:

  • What pandas is and why it exists (not just "it's a library")
  • How to install pandas and import it correctly, including the pd alias
  • How to create your first DataFrame from a Python dictionary
  • When to use pandas instead of plain Python lists

🌟 The Scenario: The Excel Superpower

Your school has just handed you a spreadsheet with 10,000 student records: names, marks, attendance, and city. Finding the class average in Excel takes a few clicks and a formula. But what if you need to:

  • Find the average marks for students only from Surat?
  • Identify who scored below 35 in more than two subjects?
  • Export a cleaned version without absent students?

With pandas, each of the three takes fewer than 10 lines of Python. That's why data analysts, data scientists, and CBSE Class 12 students all learn pandas.
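
Here is a rough sketch of those three tasks on a tiny stand-in table. The column names (maths, science, english, attendance) and the four sample rows are invented for illustration; a real school spreadsheet will be laid out differently.

```python
import pandas as pd

# Hypothetical sample standing in for the 10,000-row spreadsheet
students = pd.DataFrame({
    "name": ["Vishnu", "Ankit", "Priya", "Rahul"],
    "city": ["Surat", "Mumbai", "Surat", "Delhi"],
    "maths": [95, 30, 70, 20],
    "science": [88, 32, 65, 28],
    "english": [91, 34, 72, 60],
    "attendance": ["present", "present", "absent", "present"],
})
subjects = ["maths", "science", "english"]

# 1. Average marks (across all subjects) for students from Surat only
surat = students[students["city"] == "Surat"]
surat_average = surat[subjects].mean().mean()

# 2. Students who scored below 35 in more than two subjects
fail_counts = (students[subjects] < 35).sum(axis=1)
struggling = students.loc[fail_counts > 2, "name"].tolist()

# 3. A cleaned copy without absent students, exported as CSV text
cleaned = students[students["attendance"] != "absent"]
csv_text = cleaned.to_csv(index=False)

print(surat_average)
print(struggling)
print(csv_text)
```

To write a file instead of a string, pass a path such as cleaned.to_csv('cleaned.csv', index=False).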


📖 Concept Explanation

1. What is pandas?

pandas is a Python library for data manipulation and analysis. It provides two core data structures:

  • Series: a labelled 1D array (like a single column from a spreadsheet)
  • DataFrame: a labelled 2D table (like an entire spreadsheet)

It was created by Wes McKinney in 2008 and is built on top of NumPy.
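
A minimal sketch of the two structures side by side. The student names and values are sample data chosen for illustration.

```python
import pandas as pd

# A Series: one labelled column of values
marks = pd.Series([95, 82, 70], index=["Vishnu", "Ankit", "Priya"], name="marks")

# A DataFrame: several columns sharing one index
table = pd.DataFrame({
    "marks": marks,
    "city": pd.Series(["Surat", "Mumbai", "Delhi"], index=marks.index),
})

print(marks["Ankit"])          # look up a value by its label
print(type(table["marks"]))    # selecting one column gives back a Series
print(table.shape)             # (rows, columns)
```

Selecting a single column from the DataFrame returns a Series, which is why the two structures are usually described together.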

2. Why use pandas over plain Python lists?

| Feature | Python list | pandas |
|---|---|---|
| Column labels | ❌ No | ✅ Yes |
| Filter rows by condition | Verbose loop | df[df['marks'] > 80] |
| Handle missing data | Manual | Built-in dropna(), fillna() |
| Read CSV in one line | Needs csv module | pd.read_csv('file.csv') |
| Statistical summaries | Manual | df.describe() |
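
To make the comparison concrete, here is a small sketch that filters the same data both ways and then tries the missing-data helpers. The names and marks are sample values, and None stands in for a missing mark.

```python
import pandas as pd

names = ["Vishnu", "Ankit", "Priya"]
marks = [95, 82, None]   # None marks a missing value

# Plain lists: filter with an explicit loop and handle gaps by hand
high_scorers_list = []
for name, mark in zip(names, marks):
    if mark is not None and mark > 80:
        high_scorers_list.append(name)

# pandas: the same filter as one boolean-mask expression
df = pd.DataFrame({"name": names, "marks": marks})
high_scorers_df = df[df["marks"] > 80]

# Built-in helpers for missing data and summaries
without_gaps = df.dropna()               # drop rows with missing values
filled = df.fillna({"marks": 0})         # or replace them with a default
summary = df["marks"].describe()         # count, mean, std, min, quartiles, max

print(high_scorers_list)
print(high_scorers_df)
print(summary)
```

Note that the missing mark is simply excluded by the comparison and by describe(), with no extra check needed.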

3. Where is pandas used?

  • CBSE Class 12 Python practical exams
  • Finance and banking (stock analysis, risk models)
  • Government data portals and journalism
  • Machine learning pipelines (data preprocessing)

🎨 Visual Logic

mindmap
  root((pandas))
    Series
      1D labelled array
      Index + Values
    DataFrame
      2D table
      Rows and Columns
    I/O
      CSV
      Excel
      JSON
      SQL
    Operations
      GroupBy
      Merge/Join
      Pivot Table
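
A few branches of the map above, sketched in code as a preview. The CSV text and the states lookup table are made up for illustration; later tutorials are the place to learn these operations properly.

```python
import io
import pandas as pd

# I/O: read CSV text (pd.read_csv accepts a file path in the same way)
csv_text = "name,city,marks\nVishnu,Surat,95\nAnkit,Mumbai,82\nPriya,Surat,70\n"
df = pd.read_csv(io.StringIO(csv_text))

# GroupBy: average marks per city
city_means = df.groupby("city")["marks"].mean()

# Merge/Join: attach a state to each city from a second (hypothetical) table
states = pd.DataFrame({"city": ["Surat", "Mumbai"],
                       "state": ["Gujarat", "Maharashtra"]})
merged = df.merge(states, on="city", how="left")

# Pivot table: one row per city holding the mean of marks
pivot = df.pivot_table(index="city", values="marks", aggfunc="mean")

print(city_means)
print(merged)
print(pivot)
```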

💻 Implementation

Run this in your terminal (not inside Python):

pip install pandas
# Optional: name NumPy explicitly (pip installs it with pandas anyway)
pip install pandas numpy

Verify installation:

pip show pandas

import pandas as pd      # pd is the universal alias
import numpy as np       # often used alongside pandas

print(pd.__version__)    # e.g., 2.1.0
print(np.__version__)    # e.g., 1.26.0

# Example output (your version numbers will likely differ):
# 2.1.0
# 1.26.0

Why as pd?

import pandas as pd is the standard alias across the industry. Tutorials, books, and workplace code almost universally use pd, and CBSE exams expect it too.

Open a terminal, type python3, and try these line by line. Results appear instantly.

>>> import pandas as pd
>>> pd.__version__
'2.1.0'
>>> import numpy as np
>>> np.__version__
'1.26.0'
>>> data = {'name': ['Vishnu', 'Ankit'], 'marks': [95, 82]}
>>> df = pd.DataFrame(data)
>>> df
     name  marks
0  Vishnu     95
1   Ankit     82
>>> df['marks'].mean()
88.5

New to the REPL?

Each >>> is something you type. The line below it is what Python prints back. Type exit() to leave.

import pandas as pd

# Create a dictionary of student data
data = {
    'name':   ['Vishnu', 'Ankit', 'Priya'],
    'marks':  [95, 82, 70],
    'city':   ['Surat', 'Mumbai', 'Delhi']
}

# Convert to a DataFrame (a 2D table)
df = pd.DataFrame(data)

print(df)
# Output:
#      name  marks    city
# 0  Vishnu     95   Surat
# 1   Ankit     82  Mumbai
# 2   Priya     70   Delhi

# One-line average:
print(df['marks'].mean())   # 82.33333...

🎯 Practice Lab 🧪

Task: Your First pandas Program

  1. Install pandas in your Python environment.
  2. Import it and print its version.
  3. Create a dictionary with 3 student names and their ages.
  4. Convert it to a DataFrame and print it.
  5. Print the average age using .mean().

Hint: The column name for ages is just a dictionary key. Access it with df['age'].


❓ Frequently Asked Questions

Q: Why do we write import pandas as pd and not just import pandas?

You can write import pandas, and it works. But then every call becomes pandas.DataFrame(...), pandas.read_csv(...), and so on. The alias pd is the shorthand you will see in textbooks, job interviews, Stack Overflow answers, and CBSE exams. Spelling out pandas instead of pd is valid Python, but it will look unusual to other developers.
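
A quick way to convince yourself that the alias changes nothing but the name (a small sketch with sample marks):

```python
import pandas
import pandas as pd

# Both names refer to the same module object
print(pd is pandas)

# So these two calls are equivalent; the alias only saves typing
a = pandas.DataFrame({"marks": [95, 82]})
b = pd.DataFrame({"marks": [95, 82]})
print(a.equals(b))
```

Both print statements show True.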

Q: What's the difference between a Series and a DataFrame?

A Series is one column: a 1D labelled array. A DataFrame is a full table: multiple Series sharing the same index. Think of a Series as one column from a spreadsheet, and a DataFrame as the entire sheet.

Q: Does pandas work without NumPy?

No. pandas is built on top of NumPy and requires it. When you pip install pandas, NumPy is installed automatically as a dependency. You don't need to install it separately, but many data science workflows import both.
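
The link between the two libraries is easy to see in a short sketch (sample marks again): a numeric column can be handed to NumPy, and NumPy functions accept pandas objects.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"marks": [95, 82, 70]})

# A numeric column can be converted to a NumPy ndarray
values = df["marks"].to_numpy()
print(type(values))

# NumPy functions accept pandas objects directly
roots = np.sqrt(df["marks"])   # the result is still a Series with the same index
print(roots)
```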

Q: For the CBSE exam, what is the standard import statement for pandas?

import pandas as pd. This exact line is expected in CBSE Class 12 practical exams. Using any other alias will be marked incorrect in most board examinations.


✅ Summary

In this tutorial, you've learned:

  • ✅ pandas is a Python library built on NumPy for data manipulation. It gives your data a labelled table structure
  • ✅ The two core structures are Series (1D) and DataFrame (2D)
  • ✅ import pandas as pd is the industry standard: use pd, not pandas
  • ✅ You can create a DataFrame from a dictionary in one line with pd.DataFrame(data)
  • ✅ Use pandas when you need column labels, filtering by condition, or built-in statistics like .mean()

💡 Interview & Exam Tips

Q: What is the standard alias for importing pandas?

import pandas as pd. Always use pd. CBSE exams and every industry tutorial expect this exact alias.

Q: What are the two main data structures in pandas?

Series (1D labelled array) and DataFrame (2D labelled table). A DataFrame is a collection of Series sharing the same index.

Q: Who created pandas and in which year?

Wes McKinney, 2008. He created it while working at AQR Capital Management to solve financial data analysis problems.

Q: Which library is pandas built on top of?

NumPy. By default, pandas stores column data in NumPy arrays, which is why vectorised operations on whole columns are fast.

