Indexing & Selection in pandas 🔍

Q: What is the key difference between `loc` and `iloc`?

`loc` is **label-based** — it uses the actual index labels (like `'Vishnu'` or `0, 1, 2`) and column names. `iloc` is **integer position-based** — it always uses `0, 1, 2...` positions regardless of the actual index labels. Use `iloc` when you want the "5th row", use `loc` when you want the row with index label `5`.

Q: Is the end of a slice inclusive in `loc`?

Yes — `df.loc[0:2]` includes the row at label `2` (returns 3 rows). This is **different** from Python list slicing and `iloc`, where `df.iloc[0:2]` is exclusive (returns 2 rows). This inconsistency is a frequent CBSE exam question.

Q: Why do you need double brackets `df[['col']]` to get a DataFrame?

Single brackets `df['col']` returns a **Series** (one column). Double brackets `df[['col']]` pass a *list* of column names, so pandas returns a **DataFrame** even with one column. The outer `[]` is the indexing operator; the inner `[]` creates a Python list.

Q: Why use `&` instead of `and` in boolean filtering?

`and` operates on Python scalars — it can't compare element-wise on a whole column. `&` is the bitwise AND operator which works **element-wise** on pandas boolean Series. For example: `df[(df['marks'] > 80) & (df['city'] == 'Surat')]`. Always wrap each condition in parentheses to avoid operator precedence bugs.

CSV Files with pandas is a core Python concept covering master row and column selection in pandas DataFrames using loc (label-based) and iloc (position-based). Learn to filter rows with boolean conditions. This topic is essential for academic learning, board exam preparation, and developing optimized real-world code.

Mentor's Note: Selecting the right data is 80% of data analysis. Master loc and iloc and you can answer almost any business question from a dataset. This is also one of the most frequently tested topics in CBSE Class 12 practicals. 💡

What You'll Learn

By the end of this tutorial, you'll know:

The difference between loc (label-based) and iloc (position-based) — and when to use each
How to filter rows using boolean conditions and combine them with & and |
How to select one column (Series) vs multiple columns (DataFrame) — the double-bracket trap
Why loc slices are inclusive at both ends while iloc slices are exclusive at the end

🌟 The Scenario: The Hotel Room Booking System

A hotel has 100 rooms listed in a register. A staff member can look up a room in two ways:

By room name (e.g., "Deluxe Suite A") — this is loc (label-based)
By room number (e.g., room #12) — this is iloc (integer position-based)

Both give you the same room. The difference is how you describe which room you want. pandas works the same way with DataFrame rows and columns.

📖 Concept Explanation

1. `loc` — Label-Based Selection

df.loc[row_label, column_label]

Uses actual index labels and column names
The slice endpoint is inclusive (unlike Python slices)
Works with custom string indexes

2. `iloc` — Integer Position-Based Selection

df.iloc[row_position, column_position]

Uses integer positions (0, 1, 2...)
The slice endpoint is exclusive (like Python slices)
Always works regardless of the index labels

3. Boolean Filtering

df[condition]

condition is a boolean Series (True/False per row)
Returns only rows where condition is True
Multiple conditions: use & (and), | (or) — not and/or

4. Selecting Columns

Syntax	Returns	When to use
`df['name']`	Series	Single column
`df[['name', 'marks']]`	DataFrame	Multiple columns (double brackets!)

🎨 Visual Logic

💻 Implementation

1. loc — Label-Based
2. iloc — Position-Based
3. Boolean Filtering
4. Column Selection
5. Interactive REPL

import pandas as pd

df = pd.DataFrame({
    'name':   ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks':  [95, 82, 70, 88],
    'grade':  ['A', 'B', 'C', 'A']
})

# Single cell — row label 0, column 'name'
print(df.loc[0, 'name'])        # Vishnu

# Single row — all columns
print(df.loc[1])
# name     Ankit
# marks       82
# grade        B

# Slice rows 0 to 2 (INCLUSIVE), specific columns
print(df.loc[0:2, ['name', 'marks']])
# Output:
#      name  marks
# 0  Vishnu     95
# 1   Ankit     82
# 2   Priya     70

import pandas as pd

df = pd.DataFrame({
    'name':   ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks':  [95, 82, 70, 88],
    'grade':  ['A', 'B', 'C', 'A']
})

# Single cell — row 0, column 1 (marks)
print(df.iloc[0, 1])            # 95

# First 3 rows, first 2 columns (EXCLUSIVE end)
print(df.iloc[0:3, 0:2])
# Output:
#      name  marks
# 0  Vishnu     95
# 1   Ankit     82
# 2   Priya     70

# Last row
print(df.iloc[-1])
# name     Sara
# marks      88
# grade       A

import pandas as pd

df = pd.DataFrame({
    'name':     ['Vishnu', 'Ankit', 'Priya', 'Sara', 'Raj'],
    'marks':    [95, 82, 70, 88, 60],
    'city':     ['Surat', 'Mumbai', 'Surat', 'Delhi', 'Mumbai'],
    'passed':   [True, True, True, True, False]
})

# Single condition
toppers = df[df['marks'] > 80]
print(toppers)
# Output:
#      name  marks     city  passed
# 0  Vishnu     95    Surat    True
# 1   Ankit     82   Mumbai    True
# 3    Sara     88    Delhi    True

# Multiple conditions — use & and |, with parentheses!
surat_toppers = df[(df['marks'] > 80) & (df['city'] == 'Surat')]
print(surat_toppers)
# Output:
#      name  marks   city  passed
# 0  Vishnu     95  Surat    True

# Filter by boolean column
passed_students = df[df['passed'] == True]
print(passed_students['name'].tolist())
# Output: ['Vishnu', 'Ankit', 'Priya', 'Sara']

import pandas as pd

df = pd.DataFrame({
    'name':   ['Vishnu', 'Ankit', 'Priya'],
    'marks':  [95, 82, 70],
    'grade':  ['A', 'B', 'C'],
    'city':   ['Surat', 'Mumbai', 'Delhi']
})

# Single column → returns a Series
names = df['name']
print(type(names))   # <class 'pandas.core.series.Series'>

# Multiple columns → returns a DataFrame (double brackets!)
subset = df[['name', 'marks']]
print(type(subset))  # <class 'pandas.core.frame.DataFrame'>
print(subset)
# Output:
#      name  marks
# 0  Vishnu     95
# 1   Ankit     82
# 2   Priya     70

Open a terminal, type python3, and explore indexing line by line.

>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Vishnu', 'Ankit', 'Priya', 'Sara'], 'marks': [95, 82, 70, 88]})
>>> df.loc[0, 'name']
'Vishnu'
>>> df.iloc[0, 1]
95
>>> df[df['marks'] > 80]
     name  marks
0  Vishnu     95
1   Ankit     82
3    Sara     88
>>> df[['name', 'marks']]
     name  marks
0  Vishnu     95
1   Ankit     82
2   Priya     70
3    Sara     88
>>> df.loc[0:2, ['name', 'marks']]
     name  marks
0  Vishnu     95
1   Ankit     82
2   Priya     70
>>> df.iloc[0:2, 0:2]
     name  marks
0  Vishnu     95
1   Ankit     82

New to the REPL?

Type python3 in your terminal. Each >>> is what you type; the line below is Python's response. Notice loc[0:2] gives 3 rows (inclusive) but iloc[0:2] gives 2 rows (exclusive) — that's the key difference to remember!

📊 Sample Dry Run

DataFrame: df with columns name, marks — 4 rows (index 0–3)

Expression: df.loc[1:2, ['name', 'marks']]

Step	Action	Result
1	`loc` uses labels	Row labels 1 and 2 (inclusive)
2	Column filter	Keep only `'name'` and `'marks'`
3	Return subset	Row 1: Ankit, 82 / Row 2: Priya, 70

     name  marks
1   Ankit     82
2   Priya     70

🎯 Practice Lab 🧪

Task: Product Selection

Create a DataFrame of 6 products:

products = pd.DataFrame({
    'name':     ['Laptop', 'Phone', 'TV', 'Headphones', 'Tablet', 'Camera'],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Audio', 'Electronics', 'Photography'],
    'price':    [55000, 25000, 40000, 3000, 30000, 20000],
    'stock':    [10, 50, 8, 200, 15, 5]
})

Then:

Use loc to select rows 1 to 3, columns name and price.
Use iloc to select the first 3 rows and first 2 columns.
Filter products with price > 25000.
Filter products in category 'Electronics' with stock < 15.
Select only the name and category columns as a DataFrame.

📚 Best Practices & Common Mistakes

✅ Best Practices

Always use .loc[] or .iloc[] explicitly — df[0] is ambiguous (it tries column label 0). Be explicit: df.iloc[0] for the first row
Use .copy() after filtering — subset = df[df['marks'] > 80].copy() prevents SettingWithCopyWarning when you modify subset later
Chain with .reset_index(drop=True) — after filtering, index labels become non-contiguous. Use .reset_index(drop=True) to get a clean 0-based index

❌ Common Mistakes

df[['col']] vs df['col'] — single brackets returns a Series; double brackets returns a DataFrame. Most beginners use single brackets then wonder why .shape shows one dimension
Using and/or instead of &/| — df[df['marks'] > 80 and df['city'] == 'Surat'] raises ValueError. Always use &, |, and wrap each condition in parentheses
loc endpoint inclusive, iloc exclusive — df.loc[0:2] gives 3 rows (0, 1, 2). df.iloc[0:2] gives 2 rows (0, 1). This inconsistency trips up everyone once

❓ Frequently Asked Questions

Q: What is the key difference between loc and iloc?

loc is label-based — it uses the actual index labels (like 'Vishnu' or 0, 1, 2) and column names. iloc is integer position-based — it always uses 0, 1, 2... positions regardless of the actual index labels. Use iloc when you want the "5th row", use loc when you want the row with index label 5.

Q: Is the end of a slice inclusive in loc?

Yes — df.loc[0:2] includes the row at label 2 (returns 3 rows). This is different from Python list slicing and iloc, where df.iloc[0:2] is exclusive (returns 2 rows). This inconsistency is a frequent CBSE exam question.

Q: Why do you need double brackets df[['col']] to get a DataFrame?

Single brackets df['col'] returns a Series (one column). Double brackets df[['col']] pass a list of column names, so pandas returns a DataFrame even with one column. The outer [] is the indexing operator; the inner [] creates a Python list.

Q: Why use & instead of and in boolean filtering?

and operates on Python scalars — it can't compare element-wise on a whole column. & is the bitwise AND operator which works element-wise on pandas boolean Series. For example: df[(df['marks'] > 80) & (df['city'] == 'Surat')]. Always wrap each condition in parentheses to avoid operator precedence bugs.

✅ Summary

In this tutorial, you've learned:

✅ loc[row_label, col_label] selects by label — slices are inclusive at both ends
✅ iloc[row_pos, col_pos] selects by integer position — slices are exclusive at the end
✅ Boolean filtering df[condition] returns only rows where the condition is True
✅ Combine conditions with & (and) and | (or) — never use Python's and/or
✅ df['col'] returns a Series; df[['col']] returns a DataFrame

💡 Interview & Exam Tips

Q: What is the key difference between loc and iloc?

loc is label-based (uses index labels and column names). iloc is integer position-based (uses 0-based row and column numbers). CBSE exams often test this with a DataFrame that has a custom string index.

Q: Is the end of a slice inclusive in loc?

Yes — df.loc[0:2] includes row at label 2. df.iloc[0:2] is exclusive — it returns only rows 0 and 1.

Q: Why do you need double brackets df[['col']] to get a DataFrame?

Single brackets df['col'] returns a Series. Double brackets df[['col']] pass a list of column names, returning a DataFrame.

Q: Why use & instead of and in boolean filtering?

and operates on Python booleans, not element-wise on Series. & is the bitwise AND operator which works element-wise on pandas boolean Series. Always wrap conditions in parentheses.

📚 Further Reading

Continue your learning path:

← DataFrame Basics — you need DataFrames before you can index them
Next: CSV with pandas → — apply loc, iloc, and filtering on real CSV data

Go deeper:

Official pandas docs — Indexing — complete reference for all selection methods
DataFrame Basics — revisit df.info() and df.shape before filtering
pandas Series — boolean filtering on a Series works the same way

Indexing & Selection in pandas 🔍

🌟 The Scenario: The Hotel Room Booking System

📖 Concept Explanation

1. `loc` — Label-Based Selection

2. `iloc` — Integer Position-Based Selection

3. Boolean Filtering

4. Selecting Columns

🎨 Visual Logic

💻 Implementation

📊 Sample Dry Run

🎯 Practice Lab 🧪

📚 Best Practices & Common Mistakes

✅ Best Practices

❌ Common Mistakes

❓ Frequently Asked Questions

✅ Summary

💡 Interview & Exam Tips

📚 Further Reading

📍 Visit Us

🏫 VD Computer Tuition Surat

Computer Classes & Tuition — Areas We Serve in Surat

🌟 The Scenario: The Hotel Room Booking System​

📖 Concept Explanation​

1. loc — Label-Based Selection​

2. iloc — Integer Position-Based Selection​

3. Boolean Filtering​

4. Selecting Columns​

🎨 Visual Logic​

💻 Implementation​

📊 Sample Dry Run​

🎯 Practice Lab 🧪​

📚 Best Practices & Common Mistakes​

✅ Best Practices​

❌ Common Mistakes​

❓ Frequently Asked Questions​

✅ Summary​

💡 Interview & Exam Tips​

📚 Further Reading​

🔗 Related Topics​

📍 Visit Us

🏫 VD Computer Tuition Surat

Computer Classes & Tuition — Areas We Serve in Surat

🌟 The Scenario: The Hotel Room Booking System

📖 Concept Explanation

1. `loc` — Label-Based Selection

2. `iloc` — Integer Position-Based Selection

3. Boolean Filtering

4. Selecting Columns

🎨 Visual Logic

💻 Implementation

📊 Sample Dry Run

🎯 Practice Lab 🧪

📚 Best Practices & Common Mistakes

✅ Best Practices

❌ Common Mistakes

❓ Frequently Asked Questions

✅ Summary

💡 Interview & Exam Tips

📚 Further Reading

🔗 Related Topics