Skip to content

Indexing & Selection in pandas πŸ”ΒΆ

Python Professional PathData Science with pandas

Prerequisites: DataFrame Basics, Python Boolean Logic

Mentor's Note: Selecting the right data is 80% of data analysis. Master loc and iloc and you can answer almost any business question from a dataset. This is also one of the most frequently tested topics in CBSE Class 12 practicals. πŸ’‘

What You'll Learn

By the end of this tutorial, you'll know:

  • The difference between loc (label-based) and iloc (position-based) β€” and when to use each
  • How to filter rows using boolean conditions and combine them with & and |
  • How to select one column (Series) vs multiple columns (DataFrame) β€” the double-bracket trap
  • Why loc slices are inclusive at both ends while iloc slices are exclusive at the end

🌟 The Scenario: The Hotel Room Booking System¢

A hotel has 100 rooms listed in a register. A staff member can look up a room in two ways:

  • By room name (e.g., "Deluxe Suite A") β€” this is loc (label-based)
  • By room number (e.g., room #12) β€” this is iloc (integer position-based)

Both give you the same room. The difference is how you describe which room you want. pandas works the same way with DataFrame rows and columns.


πŸ“– Concept ExplanationΒΆ

1. loc β€” Label-Based SelectionΒΆ

df.loc[row_label, column_label]

  • Uses actual index labels and column names
  • The slice endpoint is inclusive (unlike Python slices)
  • Works with custom string indexes

2. iloc β€” Integer Position-Based SelectionΒΆ

df.iloc[row_position, column_position]

  • Uses integer positions (0, 1, 2...)
  • The slice endpoint is exclusive (like Python slices)
  • Always works regardless of the index labels

3. Boolean FilteringΒΆ

df[condition]

  • condition is a boolean Series (True/False per row)
  • Returns only rows where condition is True
  • Multiple conditions: use & (and), | (or) β€” not and/or

4. Selecting ColumnsΒΆ

Syntax Returns When to use
df['name'] Series Single column
df[['name', 'marks']] DataFrame Multiple columns (double brackets!)

🎨 Visual Logic¢

graph LR
    A[DataFrame] --> B["df.loc[label, label]"]
    A --> C["df.iloc[int, int]"]
    A --> D["df[condition]"]
    B --> E[Label-based subset]
    C --> F[Position-based subset]
    D --> G[Filtered rows]

πŸ’» ImplementationΒΆ

import pandas as pd

df = pd.DataFrame({
    'name':   ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks':  [95, 82, 70, 88],
    'grade':  ['A', 'B', 'C', 'A']
})

# Single cell β€” row label 0, column 'name'
print(df.loc[0, 'name'])        # Vishnu

# Single row β€” all columns
print(df.loc[1])
# name     Ankit
# marks       82
# grade        B

# Slice rows 0 to 2 (INCLUSIVE), specific columns
print(df.loc[0:2, ['name', 'marks']])
# Output:
#      name  marks
# 0  Vishnu     95
# 1   Ankit     82
# 2   Priya     70
import pandas as pd

df = pd.DataFrame({
    'name':   ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks':  [95, 82, 70, 88],
    'grade':  ['A', 'B', 'C', 'A']
})

# Single cell β€” row 0, column 1 (marks)
print(df.iloc[0, 1])            # 95

# First 3 rows, first 2 columns (EXCLUSIVE end)
print(df.iloc[0:3, 0:2])
# Output:
#      name  marks
# 0  Vishnu     95
# 1   Ankit     82
# 2   Priya     70

# Last row
print(df.iloc[-1])
# name     Sara
# marks      88
# grade       A
import pandas as pd

df = pd.DataFrame({
    'name':     ['Vishnu', 'Ankit', 'Priya', 'Sara', 'Raj'],
    'marks':    [95, 82, 70, 88, 60],
    'city':     ['Surat', 'Mumbai', 'Surat', 'Delhi', 'Mumbai'],
    'passed':   [True, True, True, True, False]
})

# Single condition
toppers = df[df['marks'] > 80]
print(toppers)
# Output:
#      name  marks     city  passed
# 0  Vishnu     95    Surat    True
# 1   Ankit     82   Mumbai    True
# 3    Sara     88    Delhi    True

# Multiple conditions β€” use & and |, with parentheses!
surat_toppers = df[(df['marks'] > 80) & (df['city'] == 'Surat')]
print(surat_toppers)
# Output:
#      name  marks   city  passed
# 0  Vishnu     95  Surat    True

# Filter by boolean column
passed_students = df[df['passed'] == True]
print(passed_students['name'].tolist())
# Output: ['Vishnu', 'Ankit', 'Priya', 'Sara']
import pandas as pd

df = pd.DataFrame({
    'name':   ['Vishnu', 'Ankit', 'Priya'],
    'marks':  [95, 82, 70],
    'grade':  ['A', 'B', 'C'],
    'city':   ['Surat', 'Mumbai', 'Delhi']
})

# Single column β†’ returns a Series
names = df['name']
print(type(names))   # <class 'pandas.core.series.Series'>

# Multiple columns β†’ returns a DataFrame (double brackets!)
subset = df[['name', 'marks']]
print(type(subset))  # <class 'pandas.core.frame.DataFrame'>
print(subset)
# Output:
#      name  marks
# 0  Vishnu     95
# 1   Ankit     82
# 2   Priya     70

Open a terminal, type python3, and explore indexing line by line.

>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Vishnu', 'Ankit', 'Priya', 'Sara'], 'marks': [95, 82, 70, 88]})
>>> df.loc[0, 'name']
'Vishnu'
>>> df.iloc[0, 1]
95
>>> df[df['marks'] > 80]
     name  marks
0  Vishnu     95
1   Ankit     82
3    Sara     88
>>> df[['name', 'marks']]
     name  marks
0  Vishnu     95
1   Ankit     82
2   Priya     70
3    Sara     88
>>> df.loc[0:2, ['name', 'marks']]
     name  marks
0  Vishnu     95
1   Ankit     82
2   Priya     70
>>> df.iloc[0:2, 0:2]
     name  marks
0  Vishnu     95
1   Ankit     82

New to the REPL?

Type python3 in your terminal. Each >>> is what you type; the line below is Python's response. Notice loc[0:2] gives 3 rows (inclusive) but iloc[0:2] gives 2 rows (exclusive) β€” that's the key difference to remember!


πŸ“Š Sample Dry RunΒΆ

DataFrame: df with columns name, marks β€” 4 rows (index 0–3)

Expression: df.loc[1:2, ['name', 'marks']]

Step Action Result
1 loc uses labels Row labels 1 and 2 (inclusive)
2 Column filter Keep only 'name' and 'marks'
3 Return subset Row 1: Ankit, 82 / Row 2: Priya, 70
     name  marks
1   Ankit     82
2   Priya     70

🎯 Practice Lab πŸ§ͺΒΆ

Task: Product Selection

Create a DataFrame of 6 products:

products = pd.DataFrame({
    'name':     ['Laptop', 'Phone', 'TV', 'Headphones', 'Tablet', 'Camera'],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Audio', 'Electronics', 'Photography'],
    'price':    [55000, 25000, 40000, 3000, 30000, 20000],
    'stock':    [10, 50, 8, 200, 15, 5]
})

Then:

  1. Use loc to select rows 1 to 3, columns name and price.
  2. Use iloc to select the first 3 rows and first 2 columns.
  3. Filter products with price > 25000.
  4. Filter products in category 'Electronics' with stock < 15.
  5. Select only the name and category columns as a DataFrame.

πŸ“š Best Practices & Common MistakesΒΆ

βœ… Best PracticesΒΆ

  • Always use .loc[] or .iloc[] explicitly β€” df[0] is ambiguous (it tries column label 0). Be explicit: df.iloc[0] for the first row
  • Use .copy() after filtering β€” subset = df[df['marks'] > 80].copy() prevents SettingWithCopyWarning when you modify subset later
  • Chain with .reset_index(drop=True) β€” after filtering, index labels become non-contiguous. Use .reset_index(drop=True) to get a clean 0-based index

❌ Common Mistakes¢

  • df[['col']] vs df['col'] β€” single brackets returns a Series; double brackets returns a DataFrame. Most beginners use single brackets then wonder why .shape shows one dimension
  • Using and/or instead of &/| β€” df[df['marks'] > 80 and df['city'] == 'Surat'] raises ValueError. Always use &, |, and wrap each condition in parentheses
  • loc endpoint inclusive, iloc exclusive β€” df.loc[0:2] gives 3 rows (0, 1, 2). df.iloc[0:2] gives 2 rows (0, 1). This inconsistency trips up everyone once

❓ Frequently Asked QuestionsΒΆ

Q: What is the key difference between loc and iloc?

loc is label-based β€” it uses the actual index labels (like 'Vishnu' or 0, 1, 2) and column names. iloc is integer position-based β€” it always uses 0, 1, 2... positions regardless of the actual index labels. Use iloc when you want the "5th row", use loc when you want the row with index label 5.

Q: Is the end of a slice inclusive in loc?

Yes β€” df.loc[0:2] includes the row at label 2 (returns 3 rows). This is different from Python list slicing and iloc, where df.iloc[0:2] is exclusive (returns 2 rows). This inconsistency is a frequent CBSE exam question.

Q: Why do you need double brackets df[['col']] to get a DataFrame?

Single brackets df['col'] returns a Series (one column). Double brackets df[['col']] pass a list of column names, so pandas returns a DataFrame even with one column. The outer [] is the indexing operator; the inner [] creates a Python list.

Q: Why use & instead of and in boolean filtering?

and operates on Python scalars β€” it can't compare element-wise on a whole column. & is the bitwise AND operator which works element-wise on pandas boolean Series. For example: df[(df['marks'] > 80) & (df['city'] == 'Surat')]. Always wrap each condition in parentheses to avoid operator precedence bugs.


βœ… SummaryΒΆ

In this tutorial, you've learned:

  • βœ… loc[row_label, col_label] selects by label β€” slices are inclusive at both ends
  • βœ… iloc[row_pos, col_pos] selects by integer position β€” slices are exclusive at the end
  • βœ… Boolean filtering df[condition] returns only rows where the condition is True
  • βœ… Combine conditions with & (and) and | (or) β€” never use Python's and/or
  • βœ… df['col'] returns a Series; df[['col']] returns a DataFrame

πŸ’‘ Interview & Exam TipsΒΆ

Q: What is the key difference between loc and iloc?

loc is label-based (uses index labels and column names). iloc is integer position-based (uses 0-based row and column numbers). CBSE exams often test this with a DataFrame that has a custom string index.

Q: Is the end of a slice inclusive in loc?

Yes β€” df.loc[0:2] includes row at label 2. df.iloc[0:2] is exclusive β€” it returns only rows 0 and 1.

Q: Why do you need double brackets df[['col']] to get a DataFrame?

Single brackets df['col'] returns a Series. Double brackets df[['col']] pass a list of column names, returning a DataFrame.

Q: Why use & instead of and in boolean filtering?

and operates on Python booleans, not element-wise on Series. & is the bitwise AND operator which works element-wise on pandas boolean Series. Always wrap conditions in parentheses.


πŸ“š Further ReadingΒΆ

Continue your learning path:

Go deeper: