
Introduction to pandas 🐼

Mentor's Note: Before pandas, analysing tabular data in Python meant writing loops by hand for every filter, average, and summary. pandas collapses most of those jobs into a line or two. Once you understand what it does, you'll wonder how anyone worked without it. 💡

What You'll Learn

By the end of this tutorial, you'll know:

  • What pandas is and why it exists (not just "it's a library")
  • How to install pandas and import it correctly, including the pd alias
  • How to create your first DataFrame from a Python dictionary
  • When to use pandas instead of plain Python lists

🌟 The Scenario: The Excel Superpower

Your school has just handed you a spreadsheet with 10,000 student records: names, marks, attendance, and city. Finding the class average in Excel takes a few clicks and a formula. But what if you need to:

  • Find the average marks of students from Surat only?
  • Identify who scored below 35 in more than two subjects?
  • Export a cleaned version without absent students?

With pandas, all three tasks together take fewer than 10 lines of Python. That's why data analysts and data scientists rely on pandas, and why CBSE Class 12 students learn it too.
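Here is a minimal sketch of those three tasks. The column names (city, four subject columns, status), the four sample rows, and the convention that absent students have status "absent" with blank marks are all hypothetical. "Average marks" is taken here to mean the mean of each student's subject average, which is one reasonable reading. The data is read from a string with io.StringIO so the sketch runs without a file; with a real spreadsheet saved as CSV you would pass its filename to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical sample data; a real file would be loaded with pd.read_csv("students.csv")
csv_text = """name,city,maths,science,english,hindi,status
Asha,Surat,82,30,28,40,present
Ravi,Pune,91,88,76,80,present
Meera,Surat,33,31,29,52,present
Kabir,Surat,,,,,absent
"""
df = pd.read_csv(io.StringIO(csv_text))
subjects = ["maths", "science", "english", "hindi"]

# Drop absent students first so their blank marks do not skew the results
present = df[df["status"] != "absent"]

# 1. Average marks for students from Surat only (mean of each student's subject average)
surat_avg = present.loc[present["city"] == "Surat", subjects].mean(axis=1).mean()

# 2. Students who scored below 35 in more than two subjects
below_35 = (present[subjects] < 35).sum(axis=1)
struggling = present.loc[below_35 > 2, "name"].tolist()

# 3. Export the cleaned table; with no path, to_csv returns the CSV text
#    (pass a filename such as "students_clean.csv" to write it to disk instead)
cleaned_csv = present.to_csv(index=False)

print(surat_avg)   # 40.625
print(struggling)  # ['Meera']
```

Filtering out absent students before the other two steps is a design choice: it keeps missing marks out of both the average and the below-35 count.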


📖 Concept Explanation

1. What is pandas?

pandas is a Python library for data manipulation and analysis. It provides two core data structures:

  • Series: a labelled 1D array (like a single column from a spreadsheet)
  • DataFrame: a labelled 2D table (like an entire spreadsheet)

It was created by Wes McKinney in 2008 and is built on top of NumPy.
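A minimal sketch of both structures, using hypothetical student names and marks. It shows what "labelled" means in practice: values are looked up by name rather than by position, and the Series has one dimension while the DataFrame has two.

```python
import pandas as pd

# Series: one labelled column (hypothetical marks, labelled by student name)
marks = pd.Series([85, 72, 91], index=["Asha", "Ravi", "Meera"])
print(marks["Ravi"])            # look up a value by its label: 72
print(marks.ndim, marks.shape)  # 1 (3,)

# DataFrame: a table with row labels (the index) and column labels
df = pd.DataFrame(
    {"maths": [85, 72, 91], "science": [78, 88, 69]},
    index=["Asha", "Ravi", "Meera"],
)
print(df.loc["Meera", "science"])  # row label + column label: 69
print(df.ndim, df.shape)           # 2 (3, 2)
```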

2. Why use pandas over plain Python lists?

| Feature | Python list | pandas |
|---|---|---|
| Column labels | ❌ No | ✅ Yes |
| Filter rows by condition | Verbose loop | df[df['marks'] > 80] |
| Handle missing data | Manual | Built-in dropna(), fillna() |
| Read a CSV file in one line | Needs the csv module | pd.read_csv('file.csv') |
| Statistical summaries | Manual | df.describe() |
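To make the comparison concrete, here is a small side-by-side sketch. The four student records are hypothetical, and one mark is deliberately missing so the missing-data helpers have something to do.

```python
import pandas as pd

# Hypothetical records as a plain Python list of dicts; Meera's mark is missing
students = [
    {"name": "Asha", "marks": 85},
    {"name": "Ravi", "marks": 72},
    {"name": "Meera", "marks": None},
    {"name": "Kabir", "marks": 91},
]

# Plain Python: an explicit loop, skipping the missing value by hand
toppers_list = []
for s in students:
    if s["marks"] is not None and s["marks"] > 80:
        toppers_list.append(s["name"])

# pandas: the same filter as one boolean-mask expression (None becomes NaN)
df = pd.DataFrame(students)
toppers_df = df[df["marks"] > 80]

# Missing-data helpers
filled = df.fillna({"marks": 0})  # replace NaN in 'marks' with 0
cleaned = df.dropna()             # drop every row that has a missing value

print(toppers_list)                # ['Asha', 'Kabir']
print(toppers_df["name"].tolist()) # ['Asha', 'Kabir']
print(df.describe())               # count, mean, std, min, quartiles, max of 'marks'
```

NaN compares as False against 80, which is why the pandas filter needs no special case for the missing mark.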

3. Where is pandas used?

  • CBSE Class 12 Python practical exams
  • Finance and banking (stock analysis, risk models)
  • Government data portals and journalism
  • Machine learning pipelines (data preprocessing)


💻 Implementation

Run this in your terminal (not inside Python):

pip install pandas
# Alternative: name NumPy explicitly (optional, since pandas already installs it as a dependency)
pip install pandas numpy

Verify installation:

pip show pandas
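Once pandas is installed, the import and a first DataFrame look roughly like this. Run it inside Python (a script or the interactive shell), not the terminal. The student names, cities, and marks are hypothetical sample data.

```python
import pandas as pd  # 'pd' is the conventional alias

# Print the installed version to confirm the import works
print(pd.__version__)

# Each dictionary key becomes a column label; each list holds that column's values
data = {
    "name": ["Asha", "Ravi", "Meera"],
    "city": ["Surat", "Pune", "Surat"],
    "marks": [85, 72, 91],
}
df = pd.DataFrame(data)

print(df)                   # rows get a default index of 0, 1, 2
print(df.shape)             # (3, 3): 3 rows, 3 columns
print(df.columns.tolist())  # ['name', 'city', 'marks']
```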

🎯 Practice Lab 🧪

Task: Your First pandas Program
  1. Install pandas in your Python environment.
  2. Import it and print its version.
  3. Create a dictionary with 3 student names and their ages.
  4. Convert it to a DataFrame and print it.
  5. Print the average age using .mean().

Hint: The column name for ages is just a dictionary key. Access it with df['age'].


❓ Frequently Asked Questions

Q: Why do we write import pandas as pd and not just import pandas?

You can write import pandas, and it works. But then every call becomes pandas.DataFrame(...), pandas.read_csv(...), and so on. The alias pd is the near-universal shorthand: the official pandas documentation, textbooks, job interviews, Stack Overflow answers, and CBSE exams all use it. Spelling out pandas in full is not an error, but it makes your code look unfamiliar to other developers.
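A quick check makes the point: the alias is just a second name bound to the same module object, so pd.DataFrame and pandas.DataFrame are the same class. The tiny age table is made up purely for the comparison.

```python
import pandas
import pandas as pd

# Both names refer to the very same module object; 'pd' is only a shorter name
print(pd is pandas)  # True

# So these two calls build identical DataFrames; the alias just saves typing
df_long = pandas.DataFrame({"age": [15, 16]})
df_short = pd.DataFrame({"age": [15, 16]})
print(df_long.equals(df_short))  # True
```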

Q: What's the difference between a Series and a DataFrame?

A Series is one column: a 1D labelled array. A DataFrame is a full table: multiple Series sharing the same index. Think of a Series as one column from a spreadsheet, and a DataFrame as the entire sheet.
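The "multiple Series sharing the same index" idea can be seen directly in a short sketch with hypothetical data: two Series built on the same labels combine into one DataFrame, and pulling a single column or row back out gives you a Series again.

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical data)
idx = ["Asha", "Ravi", "Meera"]
marks = pd.Series([85, 72, 91], index=idx)
city = pd.Series(["Surat", "Pune", "Surat"], index=idx)

# A DataFrame puts Series side by side as columns, aligned on the shared index
df = pd.DataFrame({"marks": marks, "city": city})

one_col = df["marks"]      # single column name: returns a Series
sub_table = df[["marks"]]  # list of column names: returns a DataFrame

print(type(one_col).__name__)    # Series
print(type(sub_table).__name__)  # DataFrame
print(df.loc["Ravi"])            # one row, also returned as a Series
```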

Q: Does pandas work without NumPy?

No. pandas is built on top of NumPy and requires it. When you pip install pandas, NumPy is installed automatically as a dependency. You don't need to install it separately, but many data science workflows import both.
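One way to see the NumPy foundation for yourself, as a small sketch with made-up values: the numbers inside a numeric Series come back as a NumPy array, and NumPy functions can be applied to a Series directly while keeping its labels.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# The values of a numeric Series are held in a NumPy array
arr = s.to_numpy()
print(type(arr))  # <class 'numpy.ndarray'>

# A NumPy function applied to a Series returns a Series with the same labels
roots = np.sqrt(s)
print(roots)
```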

Q: CBSE exam โ€” what is the standard import statement for pandas?

import pandas as pd: this exact line is what CBSE Class 12 practical exams expect. Using a different alias risks being marked incorrect in most board examinations.


✅ Summary

In this tutorial, you've learned:

  • ✅ pandas is a Python library built on NumPy for data manipulation: it gives your data a labelled table structure
  • ✅ The two core structures are Series (1D) and DataFrame (2D)
  • ✅ import pandas as pd is the industry standard: use pd, not pandas
  • ✅ You can create a DataFrame from a dictionary in one line with pd.DataFrame(data)
  • ✅ Use pandas when you need column labels, filtering by condition, or built-in statistics like .mean()

💡 Interview & Exam Tips

Q: What is the standard alias for importing pandas?

import pandas as pd. Always use pd: CBSE exams and industry tutorials alike expect this exact alias.

Q: What are the two main data structures in pandas?

Series (1D labelled array) and DataFrame (2D labelled table). A DataFrame is a collection of Series sharing the same index.

Q: Who created pandas and in which year?

Wes McKinney, 2008. He created it while working at AQR Capital Management to solve financial data analysis problems.

Q: Which library is pandas built on top of?

NumPy. Numeric pandas columns are stored as NumPy arrays under the hood, so arithmetic and statistics on them run in NumPy's fast, compiled code.

