pandas Series 📈

Prerequisites: Introduction to pandas, Python Lists and Dictionaries

Mentor's Note: A Series is the simplest building block in pandas. Before you can understand a DataFrame (a full table), you need to understand a Series (a single column). Think of it as a Python list that knows the name of every item. 💡

What You'll Learn

By the end of this tutorial, you'll know:

- What a pandas Series is and how it differs from a Python list
- How to create a Series from a list (default index) and from a dictionary (custom index)
- How to filter, slice, and do math on a Series without writing any loops
- Why s[0] is not a reliable way to get the first item when the Series has a custom index

🌟 The Scenario: The Marks Register Column

Your teacher has a register book. The left column has student names, the right column has marks. If you tear out just the marks column and keep the names as labels — that's a pandas Series.

The Logic:
- Values: [95, 82, 70] — the actual data
- Index: ['Vishnu', 'Ankit', 'Priya'] — the labels
- Name: 'marks' — what this column represents
The Result: A 1D array where every value has a meaningful label. ✅
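
The register column above can be sketched directly in code. This is a minimal example, assuming the three students and marks from the scenario; it passes the labels through the `index` parameter and the column name through the `name` parameter of `pd.Series`, rather than using a dictionary:

```python
import pandas as pd

# Values, index labels and name taken from the register scenario
marks = pd.Series([95, 82, 70], index=['Vishnu', 'Ankit', 'Priya'], name='marks')

print(marks.tolist())        # the values
print(marks.index.tolist())  # the labels
print(marks.name)            # what the column represents
```

Passing `index=` alongside a list is an alternative to the dictionary form used later in this tutorial; both produce the same labelled Series.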

📖 Concept Explanation

1. What is a Series?

A pandas Series is a one-dimensional labelled array. It has:

- Values: the data (numbers, strings, booleans, etc.)
- Index: labels for each value (default: 0, 1, 2... or custom strings)
- dtype: the data type of the values (int64, float64, object, etc.)
- name: an optional name for the Series

2. Default Index vs Custom Index

Aspect	Default Index	Custom Index
How	Created from a list	Created from a dictionary
Labels	0, 1, 2, ...	'Vishnu', 'Ankit', ...
Access	`s[0]`	`s['Vishnu']`

3. Series vs Python List

Feature	Python List	pandas Series
Labels	❌ Integer positions only	✅ Custom labels
Vectorised math	❌ Needs loop	✅ `s * 2`, `s + 10`
Built-in stats	❌ Needs code	✅ `.mean()`, `.max()`
Filter by condition	❌ Needs a loop or comprehension	✅ `s[s > 80]`
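
The rows of the comparison can be checked side by side. The sketch below assumes a small hypothetical list named `raw` and performs the same three operations first on the list, then on a Series built from it:

```python
import pandas as pd

raw = [95, 82, 70]  # hypothetical sample data

# Plain list: each operation needs a loop or comprehension
doubled_list = [m * 2 for m in raw]
above_80_list = [m for m in raw if m > 80]
mean_list = sum(raw) / len(raw)

# Series: the same operations are single expressions
s = pd.Series(raw)
doubled = s * 2
above_80 = s[s > 80]
mean_series = s.mean()

print(doubled.tolist())    # [190, 164, 140]
print(above_80.tolist())   # [95, 82]
```

Note that `raw * 2` on a plain list repeats the list instead of doubling each value, which is why the comprehension is needed there.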

🎨 Visual Logic

graph LR
    subgraph Default Index
        A0["Index: 0"] --> V0["Value: 95"]
        A1["Index: 1"] --> V1["Value: 82"]
        A2["Index: 2"] --> V2["Value: 70"]
    end
    subgraph Custom Index
        B0["Index: 'Vishnu'"] --> W0["Value: 95"]
        B1["Index: 'Ankit'"] --> W1["Value: 82"]
        B2["Index: 'Priya'"] --> W2["Value: 70"]
    end

💻 Implementation

1. From a List (Default Index)

import pandas as pd

# Create a Series from a list
marks = pd.Series([95, 82, 70])

print(marks)
# 0    95
# 1    82
# 2    70
# dtype: int64

print(marks.index)   # RangeIndex(start=0, stop=3, step=1)
print(marks.values)  # [95 82 70]
print(marks.dtype)   # int64

2. From a Dictionary (Custom Index)

import pandas as pd

# Dictionary keys become the index (labels)
marks = pd.Series({'Vishnu': 95, 'Ankit': 82, 'Priya': 70})

print(marks)
# Vishnu    95
# Ankit     82
# Priya     70
# dtype: int64

# Access by label
print(marks['Vishnu'])  # 95

# Give the Series a name
marks.name = 'marks'

3. Key Methods & Operations

import pandas as pd

marks = pd.Series({'Vishnu': 95, 'Ankit': 82, 'Priya': 70, 'Sara': 88})

# Statistical methods
print(marks.mean())   # 83.75
print(marks.max())    # 95
print(marks.min())    # 70
print(marks.sum())    # 335

# Vectorised operations (no loop needed!)
bonus = marks + 5     # Add 5 to every value
scaled = marks * 1.1  # Increase every mark by 10%

# Boolean filtering — select marks above 80
toppers = marks[marks > 80]
print(toppers)
# Output:
# Vishnu    95
# Ankit     82
# Sara      88
# dtype: int64
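
The block above assigns `bonus` and `scaled` without printing them. The following sketch, using the same four marks, shows what the two vectorised results look like and adds `idxmax()`, which returns the label of the largest value:

```python
import pandas as pd

marks = pd.Series({'Vishnu': 95, 'Ankit': 82, 'Priya': 70, 'Sara': 88})

bonus = marks + 5      # integer + integer stays integer
scaled = marks * 1.1   # multiplying by a float gives float64

print(bonus.tolist())  # [100, 87, 75, 93]
print(scaled.dtype)    # float64

# Floating-point results can carry tiny rounding errors, so round for display
print(scaled.round(1))

# idxmax() returns the label, not the value
print(marks.idxmax())  # Vishnu
```

Neither operation changes `marks` itself; each returns a new Series with the same index.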

4. Interactive REPL

Try this in your Python shell. No file is needed.

>>> import pandas as pd
>>> marks = pd.Series({'Vishnu': 95, 'Ankit': 82, 'Priya': 70})
>>> marks
Vishnu    95
Ankit     82
Priya     70
dtype: int64
>>> marks.mean()
82.33333333333333
>>> marks[marks > 80]
Vishnu    95
Ankit     82
dtype: int64
>>> marks['Vishnu']
95
>>> marks + 5
Vishnu    100
Ankit      87
Priya      75
dtype: int64

New to the REPL?

Type python3 in your terminal to start. Each >>> is what you type; the line below is Python's response. Type exit() to quit.

5. Slicing

import pandas as pd

marks = pd.Series({'Vishnu': 95, 'Ankit': 82, 'Priya': 70, 'Sara': 88})

# Slice by position (like a list)
print(marks[0:2])
# Vishnu    95
# Ankit     82

# Slice by label (inclusive on both ends)
print(marks['Ankit':'Sara'])
# Ankit    82
# Priya    70
# Sara     88

📊 Sample Dry Run

Series: marks = pd.Series({'Vishnu': 95, 'Ankit': 82, 'Priya': 70, 'Sara': 88})

Expression: marks[marks > 80]

Step	Action	Result
1	Evaluate `marks > 80`	`Vishnu: True, Ankit: True, Priya: False, Sara: True`
2	Use boolean mask to filter	Keep rows where mask is `True`
3	Return filtered Series	`Vishnu: 95, Ankit: 82, Sara: 88`
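
The three steps of the dry run can be reproduced by storing the boolean mask in its own variable before filtering. This is a minimal sketch; the names `mask` and `result` are chosen here for illustration:

```python
import pandas as pd

marks = pd.Series({'Vishnu': 95, 'Ankit': 82, 'Priya': 70, 'Sara': 88})

# Step 1: the comparison produces a boolean Series with the same index
mask = marks > 80
print(mask.tolist())     # [True, True, False, True]

# Steps 2 and 3: indexing with the mask keeps only the rows marked True
result = marks[mask]
print(result.to_dict())  # {'Vishnu': 95, 'Ankit': 82, 'Sara': 88}
```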

🎯 Practice Lab 🧪

Task: City Population Analysis

Create a pandas Series of 5 Indian city populations (in millions):

Surat: 7.8, Mumbai: 20.7, Delhi: 31.2, Chennai: 10.9, Kolkata: 14.9

Then answer these questions using Series methods:

- Which city has the highest population? (.idxmax())
- What is the average population? (.mean())
- Filter cities with population above 12 million.
- Add 0.5 million to every city (vectorised addition).

Hint: Pass a dictionary to pd.Series() with city names as keys.
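
Try the task yourself first. If you get stuck, here is one possible solution sketch; the variable names (`population`, `largest`, and so on) are illustrative choices, and the figures are the ones given in the task:

```python
import pandas as pd

# Populations in millions, as given in the task
population = pd.Series({
    'Surat': 7.8, 'Mumbai': 20.7, 'Delhi': 31.2,
    'Chennai': 10.9, 'Kolkata': 14.9,
})

largest = population.idxmax()             # label of the largest value
average = population.mean()               # mean of all five values
big_cities = population[population > 12]  # boolean filtering
updated = population + 0.5                # vectorised addition

print(largest)
print(round(average, 2))
print(big_cities.index.tolist())
print(updated.round(1).to_dict())
```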

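❓ Frequently Asked Questions

Q: I created a Series with string labels. Why does s[0] give a KeyError or a warning?

When you create a Series from a dictionary, the keys become the index. If your index is ['Vishnu', 'Ankit', 'Priya'], then s[0] asks for a label 0, which doesn't exist. What happens next depends on your pandas version: older releases quietly fall back to the position and return the first value (recent 2.x releases do this with a FutureWarning), while versions that treat integer keys strictly as labels raise a KeyError. Use s.iloc[0] to access by position, or s['Vishnu'] to access by label. This is one of the most common beginner mistakes.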
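
A short sketch of the explicit alternatives, assuming the three-student Series used earlier. It avoids calling `s[0]` itself, so it behaves the same way regardless of pandas version:

```python
import pandas as pd

marks = pd.Series({'Vishnu': 95, 'Ankit': 82, 'Priya': 70})

# 0 is not one of the labels, so a label lookup for 0 cannot succeed
print(0 in marks.index)       # False

# Explicit accessors say which kind of lookup you mean
first = marks.iloc[0]         # by position
vishnu = marks.loc['Vishnu']  # by label
print(first, vishnu)          # 95 95
```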

Q: What's the difference between label-based slicing and position-based slicing?

With a custom string index, s['Ankit':'Sara'] includes both endpoints (inclusive). With integer position slicing s[0:2], the end is exclusive (like a Python list). This inconsistency trips up many beginners — use .loc[] for label-based and .iloc[] for position-based access to be explicit.
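
The difference in endpoints is easiest to see with the explicit accessors. A minimal sketch, assuming the four-student Series from the Slicing example:

```python
import pandas as pd

marks = pd.Series({'Vishnu': 95, 'Ankit': 82, 'Priya': 70, 'Sara': 88})

by_position = marks.iloc[0:2]         # positions 0 and 1; the stop is excluded
by_label = marks.loc['Ankit':'Sara']  # 'Ankit' through 'Sara'; the stop is included

print(by_position.index.tolist())  # ['Vishnu', 'Ankit']
print(by_label.index.tolist())     # ['Ankit', 'Priya', 'Sara']
```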

Q: Can a Series hold mixed data types?

Yes, but the dtype will become object (like a Python list). Performance and memory use suffer because pandas can't use optimised numeric operations. Try to keep Series columns uniform — all integers, all floats, or all strings.
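
To see the dtype change, compare a uniform Series with a mixed one. The sample values here, including the string 'absent', are hypothetical:

```python
import pandas as pd

uniform = pd.Series([95, 82, 70])
mixed = pd.Series([95, 'absent', 70.5])  # int, str and float together

print(uniform.dtype)  # int64
print(mixed.dtype)    # object

# Numeric methods work directly on the uniform Series
print(uniform.sum())  # 247
```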

Q: CBSE exam — how do you create a Series from a dictionary?

s = pd.Series({'key1': val1, 'key2': val2}) — dictionary keys become the index labels. This is the standard CBSE answer.

✅ Summary

In this tutorial, you've learned:

✅ A pandas Series is a 1D labelled array — values plus an index
✅ Create from a list (default 0-based index) or a dictionary (keys become labels)
✅ Vectorised operations like s + 5 or s * 1.1 work on every element — no loop needed
✅ Boolean filtering s[s > 80] returns a new Series with only matching values
✅ s[0] is unreliable with a custom string index (a KeyError or a deprecation warning, depending on the pandas version); use s.iloc[0] for position, s['label'] for labels

💡 Interview & Exam Tips

Q: What is the default index of a Series created from a list?

Integer-based RangeIndex starting from 0: 0, 1, 2, ...

Q: What is the difference between a Python list and a pandas Series?

A Series has a labelled index, supports vectorised math (s * 2), and has built-in statistical methods (.mean(), .max()). A list has none of these.

Q: How do you create a Series with custom labels?

Pass a dictionary: pd.Series({'Vishnu': 95, 'Ankit': 82}) — keys become the index labels.

Q: What does s[s > 80] return?

A new Series containing only the values where the condition is True — this is called boolean indexing.

📚 Further Reading

Continue your learning path:

← Introduction to pandas — install pandas and understand why it exists
Next: DataFrame Basics → — a DataFrame is just multiple Series sharing an index

Go deeper:

Official pandas docs — Series — full method reference
Indexing & Selection — master .loc[] and .iloc[] to avoid index confusion
CSV with pandas — load real data and each column becomes a Series automatically