Indexing & Selection in pandas 🔍

Prerequisites: DataFrame Basics, Python Boolean Logic

Mentor's Note: Selecting the right data is a large part of everyday data analysis. Master loc and iloc, and you can answer a wide range of business questions from a dataset. This is also one of the most frequently tested topics in CBSE Class 12 practicals. 💡

What You'll Learn

By the end of this tutorial, you'll know:

The difference between loc (label-based) and iloc (position-based) — and when to use each
How to filter rows using boolean conditions and combine them with & and |
How to select one column (Series) vs multiple columns (DataFrame) — the double-bracket trap
Why loc slices are inclusive at both ends while iloc slices are exclusive at the end

🌟 The Scenario: The Hotel Room Booking System

A hotel has 100 rooms listed in a register. A staff member can look up a room in two ways:

By room name (e.g., "Deluxe Suite A"): this is loc (label-based)
By its position in the register (e.g., the 12th entry): this is iloc (integer position-based)

Both give you the same room. The difference is how you describe which room you want. pandas works the same way with DataFrame rows and columns.

📖 Concept Explanation

1. `loc` — Label-Based Selection

df.loc[row_label, column_label]

Uses actual index labels and column names
The slice endpoint is inclusive (unlike Python slices)
Works with custom string indexes
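To see label-based selection on a custom string index, here is a minimal sketch that reuses the tutorial's student data. The `set_index('name')` step is an assumption added for illustration (the Implementation examples keep the default 0, 1, 2... index). Note that the label slice keeps its end label.

```python
import pandas as pd

# Student data indexed by name (set_index('name') is added for illustration)
df = pd.DataFrame({
    'name':  ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks': [95, 82, 70, 88],
    'grade': ['A', 'B', 'C', 'A'],
}).set_index('name')

# Single cell: row label 'Ankit', column label 'marks'
ankit_marks = df.loc['Ankit', 'marks']
print(ankit_marks)               # 82

# Label slice: the end label 'Priya' IS included
middle = df.loc['Ankit':'Priya', ['marks']]
print(middle.index.tolist())     # ['Ankit', 'Priya']
```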

2. `iloc` — Integer Position-Based Selection

df.iloc[row_position, column_position]

Uses integer positions (0, 1, 2...)
The slice endpoint is exclusive (like Python slices)
Always works regardless of the index labels
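As a sketch of "regardless of the index labels", the snippet below indexes the student data by name (again an illustrative `set_index('name')`, not something the tutorial's examples do) and shows that iloc still counts rows and columns from 0, with an exclusive slice end.

```python
import pandas as pd

# Same data indexed by name (set_index is added for illustration)
df = pd.DataFrame({
    'name':  ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks': [95, 82, 70, 88],
    'grade': ['A', 'B', 'C', 'A'],
}).set_index('name')

# Position 0 is the first row, whatever its label is
first_label = df.iloc[0].name
print(first_label)               # Vishnu

# Positions 1:3 stop BEFORE position 3, so rows at positions 1 and 2
rows = df.iloc[1:3]
print(rows.index.tolist())       # ['Ankit', 'Priya']

# Columns are positional too: after set_index, position 0 is 'marks'
print(df.iloc[-1, 0])            # 88
```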

3. Boolean Filtering

df[condition]

condition is a boolean Series (True/False per row)
Returns only rows where condition is True
Multiple conditions: use & (and) and | (or), not Python's and/or, and wrap each condition in parentheses
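The sketch below makes the "boolean Series" idea concrete: each comparison produces one True/False per row, and & and | combine two masks row by row. It also uses `~`, pandas' element-wise NOT, which the list above does not mention; the data is a trimmed version of the Implementation example.

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['Vishnu', 'Ankit', 'Priya', 'Sara', 'Raj'],
    'marks': [95, 82, 70, 88, 60],
    'city':  ['Surat', 'Mumbai', 'Surat', 'Delhi', 'Mumbai'],
})

# Each comparison builds a boolean Series with one True/False per row
high = df['marks'] > 80
in_surat = df['city'] == 'Surat'
print(high.tolist())                   # [True, True, False, True, False]

# & and | combine the masks row by row; ~ flips a mask (NOT)
both = high & in_surat
either = high | in_surat
not_surat = ~in_surat

print(df[both]['name'].tolist())       # ['Vishnu']
print(df[either]['name'].tolist())     # ['Vishnu', 'Ankit', 'Priya', 'Sara']
print(df[not_surat]['name'].tolist())  # ['Ankit', 'Sara', 'Raj']
```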

4. Selecting Columns

| Syntax | Returns | When to use |
|---|---|---|
| `df['name']` | Series | Single column |
| `df[['name', 'marks']]` | DataFrame | Multiple columns (double brackets!) |
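A quick way to tell the two results apart is their `.shape`: a Series has a one-element shape tuple, while a DataFrame always has (rows, columns), even with a single column. A small sketch using a subset of the tutorial's data:

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['Vishnu', 'Ankit', 'Priya'],
    'marks': [95, 82, 70],
})

one_col = df['marks']      # single brackets -> Series
as_frame = df[['marks']]   # a list with one name -> DataFrame

print(type(one_col).__name__, one_col.shape)    # Series (3,)
print(type(as_frame).__name__, as_frame.shape)  # DataFrame (3, 1)
```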

🎨 Visual Logic

graph LR
    A[DataFrame] --> B["df.loc[label, label]"]
    A --> C["df.iloc[int, int]"]
    A --> D["df[condition]"]
    B --> E[Label-based subset]
    C --> F[Position-based subset]
    D --> G[Filtered rows]

💻 Implementation

1. loc — Label-Based

import pandas as pd

df = pd.DataFrame({
    'name':   ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks':  [95, 82, 70, 88],
    'grade':  ['A', 'B', 'C', 'A']
})

# Single cell — row label 0, column 'name'
print(df.loc[0, 'name'])        # Vishnu

# Single row — all columns
print(df.loc[1])
# name     Ankit
# marks       82
# grade        B
# Name: 1, dtype: object

# Slice rows 0 to 2 (INCLUSIVE), specific columns
print(df.loc[0:2, ['name', 'marks']])
# Output:
#      name  marks
# 0  Vishnu     95
# 1   Ankit     82
# 2   Priya     70

2. iloc — Position-Based

import pandas as pd

df = pd.DataFrame({
    'name':   ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks':  [95, 82, 70, 88],
    'grade':  ['A', 'B', 'C', 'A']
})

# Single cell — row 0, column 1 (marks)
print(df.iloc[0, 1])            # 95

# First 3 rows, first 2 columns (EXCLUSIVE end)
print(df.iloc[0:3, 0:2])
# Output:
#      name  marks
# 0  Vishnu     95
# 1   Ankit     82
# 2   Priya     70

# Last row
print(df.iloc[-1])
# name     Sara
# marks      88
# grade       A
# Name: 3, dtype: object

3. Boolean Filtering

import pandas as pd

df = pd.DataFrame({
    'name':     ['Vishnu', 'Ankit', 'Priya', 'Sara', 'Raj'],
    'marks':    [95, 82, 70, 88, 60],
    'city':     ['Surat', 'Mumbai', 'Surat', 'Delhi', 'Mumbai'],
    'passed':   [True, True, True, True, False]
})

# Single condition
toppers = df[df['marks'] > 80]
print(toppers)
# Output:
#      name  marks    city  passed
# 0  Vishnu     95   Surat    True
# 1   Ankit     82  Mumbai    True
# 3    Sara     88   Delhi    True

# Multiple conditions — use & and |, with parentheses!
surat_toppers = df[(df['marks'] > 80) & (df['city'] == 'Surat')]
print(surat_toppers)
# Output:
#      name  marks   city  passed
# 0  Vishnu     95  Surat    True

# Filter by a boolean column: it is already a mask, so '== True' is not needed
passed_students = df[df['passed']]
print(passed_students['name'].tolist())
# Output: ['Vishnu', 'Ankit', 'Priya', 'Sara']

4. Column Selection

import pandas as pd

df = pd.DataFrame({
    'name':   ['Vishnu', 'Ankit', 'Priya'],
    'marks':  [95, 82, 70],
    'grade':  ['A', 'B', 'C'],
    'city':   ['Surat', 'Mumbai', 'Delhi']
})

# Single column → returns a Series
names = df['name']
print(type(names))   # <class 'pandas.core.series.Series'>

# Multiple columns → returns a DataFrame (double brackets!)
subset = df[['name', 'marks']]
print(type(subset))  # <class 'pandas.core.frame.DataFrame'>
print(subset)
# Output:
#      name  marks
# 0  Vishnu     95
# 1   Ankit     82
# 2   Priya     70

5. Interactive REPL

Open a terminal, type python3, and explore indexing line by line.

>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Vishnu', 'Ankit', 'Priya', 'Sara'], 'marks': [95, 82, 70, 88]})
>>> df.loc[0, 'name']
'Vishnu'
>>> df.iloc[0, 1]
95
>>> df[df['marks'] > 80]
     name  marks
0  Vishnu     95
1   Ankit     82
3    Sara     88
>>> df[['name', 'marks']]
     name  marks
0  Vishnu     95
1   Ankit     82
2   Priya     70
3    Sara     88
>>> df.loc[0:2, ['name', 'marks']]
     name  marks
0  Vishnu     95
1   Ankit     82
2   Priya     70
>>> df.iloc[0:2, 0:2]
     name  marks
0  Vishnu     95
1   Ankit     82

New to the REPL?

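Type python3 in your terminal. Each line starting with >>> is what you type; the lines below it are Python's response. Notice that loc[0:2] gives 3 rows (inclusive) but iloc[0:2] gives 2 rows (exclusive): that's the key difference to remember! Depending on your NumPy version, a single number such as df.iloc[0, 1] may display as np.int64(95) instead of 95 (NumPy 2.x shows the scalar type); the value is the same.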

📊 Sample Dry Run

DataFrame: df with columns name, marks — 4 rows (index 0–3)

Expression: df.loc[1:2, ['name', 'marks']]

| Step | Action | Result |
|---|---|---|
| 1 | `loc` uses labels | Row labels 1 and 2 (inclusive) |
| 2 | Column filter | Keep only `'name'` and `'marks'` |
| 3 | Return subset | Row 1: Ankit, 82 / Row 2: Priya, 70 |

    name  marks
1  Ankit     82
2  Priya     70
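You can check the dry run in code. This sketch assumes the default 0, 1, 2... index, where labels and positions coincide; under that assumption the inclusive label slice 1:2 picks the same rows as the exclusive position slice 1:3.

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks': [95, 82, 70, 88],
})

by_label = df.loc[1:2, ['name', 'marks']]   # labels 1..2, end label included
by_position = df.iloc[1:3, 0:2]             # positions 1..2, end 3 excluded

# With the default index, both selections contain exactly the same data
print(by_label.equals(by_position))         # True
```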

🎯 Practice Lab 🧪

Task: Product Selection

Create a DataFrame of 6 products:

import pandas as pd

products = pd.DataFrame({
    'name':     ['Laptop', 'Phone', 'TV', 'Headphones', 'Tablet', 'Camera'],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Audio', 'Electronics', 'Photography'],
    'price':    [55000, 25000, 40000, 3000, 30000, 20000],
    'stock':    [10, 50, 8, 200, 15, 5]
})

Then:

Use loc to select the rows labelled 1 to 3 (inclusive), columns name and price.
Use iloc to select the first 3 rows and first 2 columns.
Filter products with price > 25000.
Filter products in category 'Electronics' with stock < 15.
Select only the name and category columns as a DataFrame.
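Once you have tried the tasks yourself, here is one possible solution to compare against. It is a sketch, not the only correct answer; for example, task 2 could also be written with `head(3)` followed by a column selection.

```python
import pandas as pd

products = pd.DataFrame({
    'name':     ['Laptop', 'Phone', 'TV', 'Headphones', 'Tablet', 'Camera'],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Audio', 'Electronics', 'Photography'],
    'price':    [55000, 25000, 40000, 3000, 30000, 20000],
    'stock':    [10, 50, 8, 200, 15, 5]
})

# 1. loc: labels 1 to 3 INCLUSIVE, two named columns
task1 = products.loc[1:3, ['name', 'price']]

# 2. iloc: positions 0-2 (end 3 excluded), first two columns
task2 = products.iloc[0:3, 0:2]

# 3. Single condition
task3 = products[products['price'] > 25000]

# 4. Two conditions, each in parentheses, joined with &
task4 = products[(products['category'] == 'Electronics') & (products['stock'] < 15)]

# 5. Double brackets -> DataFrame
task5 = products[['name', 'category']]

print(task3['name'].tolist())   # ['Laptop', 'TV', 'Tablet']
print(task4['name'].tolist())   # ['Laptop', 'TV']
```

Note that Phone (price exactly 25000) and Tablet (stock exactly 15) are excluded because > and < are strict comparisons.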

📚 Best Practices & Common Mistakes

✅ Best Practices

Always use .loc[] or .iloc[] explicitly for rows: df[0] does not return the first row; it looks up a column labelled 0 and raises KeyError if there is none. Be explicit: df.iloc[0] for the first row
Use .copy() after filtering — subset = df[df['marks'] > 80].copy() prevents SettingWithCopyWarning when you modify subset later
Chain with .reset_index(drop=True): after filtering, the rows keep their original labels, so the index can have gaps (e.g., 0, 1, 3). Use .reset_index(drop=True) to get a clean 0-based index
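The three practices above can be sketched together on the tutorial's student data. The +5 bonus marks step is a hypothetical modification, added only to show that changing the filtered copy leaves the original DataFrame untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks': [95, 82, 70, 88],
})

# df[0] looks for a COLUMN labelled 0, not the first row
try:
    df[0]
except KeyError:
    print("df[0] -> KeyError (no column named 0)")

first_row = df.iloc[0]                   # explicit: first row by position

# Filter, take an independent copy, then renumber the index 0, 1, 2...
toppers = df[df['marks'] > 80].copy()
print(toppers.index.tolist())            # [0, 1, 3]
toppers = toppers.reset_index(drop=True)
print(toppers.index.tolist())            # [0, 1, 2]

# Hypothetical bonus: modifying the copy does not change df
toppers['marks'] = toppers['marks'] + 5
print(df['marks'].tolist())              # [95, 82, 70, 88]
```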

❌ Common Mistakes

df[['col']] vs df['col']: single brackets return a Series; double brackets return a DataFrame. A common slip is using single brackets and then wondering why .shape has only one dimension, e.g. (n,) instead of (n, 1)
Using and/or instead of &/|: df[df['marks'] > 80 and df['city'] == 'Surat'] raises ValueError. Always use & and |, and wrap each condition in parentheses
loc endpoint inclusive, iloc exclusive — df.loc[0:2] gives 3 rows (0, 1, 2). df.iloc[0:2] gives 2 rows (0, 1). This inconsistency trips up everyone once
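Here is a short sketch reproducing two of these mistakes on a small version of the tutorial's data: the `and` version fails with ValueError, the `&` version works, and the same 0:2 slice returns different row counts with loc and iloc.

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks': [95, 82, 70, 88],
    'city':  ['Surat', 'Mumbai', 'Surat', 'Delhi'],
})

# Wrong: `and` needs ONE True/False for the whole Series
try:
    df[df['marks'] > 80 and df['city'] == 'Surat']
except ValueError as err:
    print("ValueError:", err)

# Right: & combines row by row; parentheses make > and == run first
result = df[(df['marks'] > 80) & (df['city'] == 'Surat')]
print(result['name'].tolist())               # ['Vishnu']

# Same 0:2 slice, different row counts
print(len(df.loc[0:2]), len(df.iloc[0:2]))   # 3 2
```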

❓ Frequently Asked Questions

Q: What is the key difference between loc and iloc?

loc is label-based: it uses the actual index labels (0, 1, 2 by default, or strings such as 'Vishnu' if you set the name column as the index) and column names. iloc is integer position-based: it always counts rows and columns from 0, regardless of the index labels. Use iloc when you want "the 5th row" (df.iloc[4]); use loc when you want the row whose index label is 5 (df.loc[5]).
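The difference matters most after filtering, when labels and positions stop lining up. In this sketch the filtered rows keep the labels 0, 1 and 3, so position 2 and label 2 are no longer the same thing.

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks': [95, 82, 70, 88],
})

# After filtering, the surviving row labels are 0, 1 and 3 (label 2 is gone)
toppers = df[df['marks'] > 80]

third_row = toppers.iloc[2]     # 3rd row by position
label_3 = toppers.loc[3]        # row whose label is 3
print(third_row['name'], label_3['name'])   # Sara Sara

# Label 2 no longer exists, so loc fails while iloc[2] still works
try:
    toppers.loc[2]
except KeyError:
    print("toppers.loc[2] -> KeyError")
```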

Q: Is the end of a slice inclusive in loc?

Yes — df.loc[0:2] includes the row at label 2 (returns 3 rows). This is different from Python list slicing and iloc, where df.iloc[0:2] is exclusive (returns 2 rows). This inconsistency is a frequent CBSE exam question.

Q: Why do you need double brackets df[['col']] to get a DataFrame?

Single brackets df['col'] returns a Series (one column). Double brackets df[['col']] pass a list of column names, so pandas returns a DataFrame even with one column. The outer [] is the indexing operator; the inner [] creates a Python list.

Q: Why use & instead of and in boolean filtering?

and works on single truth values: Python needs one True/False from the left-hand operand, and a Series with more than one row has no single truth value, so pandas raises ValueError. & is the bitwise AND operator, which pandas applies element-wise to boolean Series. For example: df[(df['marks'] > 80) & (df['city'] == 'Surat')]. Always wrap each condition in parentheses, because & binds more tightly than comparison operators such as > and ==.

✅ Summary

In this tutorial, you've learned:

✅ loc[row_label, col_label] selects by label — slices are inclusive at both ends
✅ iloc[row_pos, col_pos] selects by integer position — slices are exclusive at the end
✅ Boolean filtering df[condition] returns only rows where the condition is True
✅ Combine conditions with & (and) and | (or) — never use Python's and/or
✅ df['col'] returns a Series; df[['col']] returns a DataFrame

💡 Interview & Exam Tips

Q: What is the key difference between loc and iloc?

loc is label-based (uses index labels and column names). iloc is integer position-based (uses 0-based row and column numbers). CBSE exams often test this with a DataFrame that has a custom string index.

Q: Is the end of a slice inclusive in loc?

Yes. df.loc[0:2] includes the row at label 2. df.iloc[0:2] is exclusive: it returns only rows 0 and 1.

Q: Why do you need double brackets df[['col']] to get a DataFrame?

Single brackets df['col'] returns a Series. Double brackets df[['col']] pass a list of column names, returning a DataFrame.

Q: Why use & instead of and in boolean filtering?

and operates on Python booleans, not element-wise on Series. & is the bitwise AND operator which works element-wise on pandas boolean Series. Always wrap conditions in parentheses.

📚 Further Reading

Continue your learning path:

← DataFrame Basics — you need DataFrames before you can index them
Next: CSV with pandas → — apply loc, iloc, and filtering on real CSV data

Go deeper:

Official pandas docs — Indexing — complete reference for all selection methods
DataFrame Basics — revisit df.info() and df.shape before filtering
pandas Series — boolean filtering on a Series works the same way