
Indexing & Selection in pandas

Mentor's Note: Selecting the right data is 80% of data analysis. Master loc and iloc and you can answer almost any business question from a dataset. This is also one of the most frequently tested topics in CBSE Class 12 practicals.

What You'll Learn

By the end of this tutorial, you'll know:

  • The difference between loc (label-based) and iloc (position-based), and when to use each
  • How to filter rows using boolean conditions and combine them with & and |
  • How to select one column (Series) vs multiple columns (DataFrame): the double-bracket trap
  • Why loc slices are inclusive at both ends while iloc slices are exclusive at the end

The Scenario: The Hotel Room Booking System

A hotel has 100 rooms listed in a register. A staff member can look up a room in two ways:

  • By room name (e.g., "Deluxe Suite A"): this is loc (label-based)
  • By room number (e.g., room #12): this is iloc (integer position-based)

Both give you the same room. The difference is how you describe which room you want. pandas works the same way with DataFrame rows and columns.


Concept Explanation

1. loc: Label-Based Selection

df.loc[row_label, column_label]

  • Uses actual index labels and column names
  • The slice endpoint is inclusive (unlike Python slices)
  • Works with custom string indexes
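To make the label-based idea concrete, here is a minimal sketch that borrows the hotel scenario. The rooms DataFrame, its room names, floors, and prices are all made up for illustration; only the loc syntax comes from the tutorial.

```python
import pandas as pd

# Hypothetical room register, indexed by room name (string labels).
rooms = pd.DataFrame(
    {"floor": [1, 1, 2, 3], "price": [4000, 4500, 7000, 12000]},
    index=["Standard A", "Standard B", "Deluxe Suite A", "Penthouse"],
)

# Single cell: row label "Deluxe Suite A", column label "price"
suite_price = rooms.loc["Deluxe Suite A", "price"]

# Label slice: BOTH endpoints are included in the result
standard_to_deluxe = rooms.loc["Standard A":"Deluxe Suite A", "price"]

print(suite_price)               # 7000
print(len(standard_to_deluxe))   # 3
```

Note that the slice runs from one label to another in the order the rows appear, so "Deluxe Suite A" itself is part of the result.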

2. iloc: Integer Position-Based Selection

df.iloc[row_position, column_position]

  • Uses integer positions (0, 1, 2...)
  • The slice endpoint is exclusive (like Python slices)
  • Always works regardless of the index labels
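The sketch below shows the same idea by position. It assumes a small student table with a string index ("r1" to "r4", invented here) to show that iloc ignores the labels entirely.

```python
import pandas as pd

# Index labels "r1".."r4" are hypothetical; iloc never looks at them.
df = pd.DataFrame(
    {"name": ["Vishnu", "Ankit", "Priya", "Sara"], "marks": [95, 82, 70, 88]},
    index=["r1", "r2", "r3", "r4"],
)

first_cell = df.iloc[0, 0]   # row position 0, column position 0
first_two = df.iloc[0:2]     # positions 0 and 1; position 2 is excluded
last_row = df.iloc[-1]       # negative positions count from the end

print(first_cell)            # Vishnu
print(len(first_two))        # 2
print(last_row["name"])      # Sara
```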

3. Boolean Filtering

df[condition]

  • condition is a boolean Series (True/False per row)
  • Returns only rows where condition is True
  • Multiple conditions: use & (and) and | (or) instead of Python's and/or, and wrap each condition in parentheses
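As a quick illustration of these rules, the sketch below builds a boolean Series and then combines two conditions. It reuses the student data from this tutorial's Implementation section; the variable names (high, both, either) are only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Vishnu", "Ankit", "Priya", "Sara"],
    "marks": [95, 82, 70, 88],
    "grade": ["A", "B", "C", "A"],
})

# A condition is just a boolean Series: one True/False per row
high = df["marks"] > 80
print(high.tolist())                  # [True, True, False, True]
print(df[high]["name"].tolist())      # ['Vishnu', 'Ankit', 'Sara']

# Combine conditions with & and |, each wrapped in parentheses
both = df[(df["marks"] > 80) & (df["grade"] == "A")]
either = df[(df["marks"] < 75) | (df["grade"] == "A")]
print(both["name"].tolist())          # ['Vishnu', 'Sara']
print(either["name"].tolist())        # ['Vishnu', 'Priya', 'Sara']
```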

4. Selecting Columns

Syntax | Returns | When to use
df['name'] | Series | Single column
df[['name', 'marks']] | DataFrame | Multiple columns (double brackets!)
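One way to check which type you got back is to print the type and shape, as in this small sketch (the two-row DataFrame is a shortened version of the tutorial's data):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Vishnu", "Ankit"], "marks": [95, 82]})

one_col = df["name"]               # single brackets: Series
one_col_df = df[["name"]]          # double brackets: DataFrame with one column
two_cols = df[["name", "marks"]]   # double brackets: DataFrame with two columns

print(type(one_col).__name__, one_col.shape)         # Series (2,)
print(type(one_col_df).__name__, one_col_df.shape)   # DataFrame (2, 1)
print(type(two_cols).__name__, two_cols.shape)       # DataFrame (2, 2)
```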


Implementation

import pandas as pd

df = pd.DataFrame({
    'name': ['Vishnu', 'Ankit', 'Priya', 'Sara'],
    'marks': [95, 82, 70, 88],
    'grade': ['A', 'B', 'C', 'A']
})

# Single cell: row label 0, column 'name'
print(df.loc[0, 'name'])  # Vishnu

# Single row: all columns
print(df.loc[1])
# name     Ankit
# marks       82
# grade        B

# Slice rows 0 to 2 (INCLUSIVE), specific columns
print(df.loc[0:2, ['name', 'marks']])
# Output:
#      name  marks
# 0  Vishnu     95
# 1   Ankit     82
# 2   Priya     70

Sample Dry Run

DataFrame: df with columns name, marks, grade (4 rows, index 0 to 3)

Expression: df.loc[1:2, ['name', 'marks']]

Step | Action | Result
1 | loc uses labels | Row labels 1 and 2 (inclusive)
2 | Column filter | Keep only 'name' and 'marks'
3 | Return subset | Row 1: Ankit, 82 / Row 2: Priya, 70

Output:

    name  marks
1  Ankit     82
2  Priya     70

Practice Lab

Task: Product Selection

Create a DataFrame of 6 products:

products = pd.DataFrame({
    'name': ['Laptop', 'Phone', 'TV', 'Headphones', 'Tablet', 'Camera'],
    'category': ['Electronics', 'Electronics', 'Electronics', 'Audio', 'Electronics', 'Photography'],
    'price': [55000, 25000, 40000, 3000, 30000, 20000],
    'stock': [10, 50, 8, 200, 15, 5]
})

Then:

  1. Use loc to select rows 1 to 3, columns name and price.
  2. Use iloc to select the first 3 rows and first 2 columns.
  3. Filter products with price > 25000.
  4. Filter products in category 'Electronics' with stock < 15.
  5. Select only the name and category columns as a DataFrame.
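Try the five tasks yourself first. When you want to compare, here is one possible solution sketch; the variable names t1 to t5 are just labels for this example, and other correct answers exist.

```python
import pandas as pd

products = pd.DataFrame({
    "name": ["Laptop", "Phone", "TV", "Headphones", "Tablet", "Camera"],
    "category": ["Electronics", "Electronics", "Electronics", "Audio",
                 "Electronics", "Photography"],
    "price": [55000, 25000, 40000, 3000, 30000, 20000],
    "stock": [10, 50, 8, 200, 15, 5],
})

# 1. loc: row labels 1 to 3 (inclusive), columns name and price
t1 = products.loc[1:3, ["name", "price"]]

# 2. iloc: first 3 rows and first 2 columns (end positions excluded)
t2 = products.iloc[:3, :2]

# 3. Boolean filter: price strictly greater than 25000
t3 = products[products["price"] > 25000]

# 4. Two conditions combined with &, each in parentheses
t4 = products[(products["category"] == "Electronics") & (products["stock"] < 15)]

# 5. Double brackets keep the result a DataFrame
t5 = products[["name", "category"]]

print(t1["name"].tolist())   # ['Phone', 'TV', 'Headphones']
print(t3["name"].tolist())   # ['Laptop', 'TV', 'Tablet']
print(t4["name"].tolist())   # ['Laptop', 'TV']
```

Phone (price exactly 25000) and Tablet (stock exactly 15) are left out of tasks 3 and 4 because the comparisons are strict.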

Best Practices & Common Mistakes

Best Practices

  • Always use .loc[] or .iloc[] explicitly: df[0] does not select the first row. It looks up a column labeled 0 and raises KeyError if there is none. Use df.iloc[0] for the first row
  • Use .copy() after filtering: subset = df[df['marks'] > 80].copy() prevents SettingWithCopyWarning when you modify subset later
  • Chain with .reset_index(drop=True): after filtering, rows keep their original index labels, so the index may have gaps. Use .reset_index(drop=True) to get a clean 0-based index
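The three practices above can be sketched together as follows. The bonus column is a made-up example of a later modification, and tried_column_zero is just a flag for this demonstration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Vishnu", "Ankit", "Priya", "Sara"],
    "marks": [95, 82, 70, 88],
})

# df[0] is a column lookup, not a row lookup
tried_column_zero = False
try:
    df[0]
except KeyError:
    tried_column_zero = True
first_row = df.iloc[0]               # explicit: first row by position

# Filter, then take an independent copy before modifying it
subset = df[df["marks"] > 80].copy()
print(subset.index.tolist())         # [0, 1, 3]  (label 2 was filtered out)

subset["bonus"] = subset["marks"] + 5    # hypothetical extra column
subset = subset.reset_index(drop=True)
print(subset.index.tolist())         # [0, 1, 2]
print("bonus" in df.columns)         # False: the original is untouched
```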

โŒ Common Mistakesโ€‹

  • df[['col']] vs df['col']: single brackets return a Series; double brackets return a DataFrame. Beginners often use single brackets, then wonder why .shape has only one entry, such as (4,)
  • Using and/or instead of &/|: df[df['marks'] > 80 and df['city'] == 'Surat'] raises ValueError. Always use & and |, and wrap each condition in parentheses
  • loc endpoint inclusive, iloc exclusive: with the default 0-based index, df.loc[0:2] gives 3 rows (0, 1, 2) and df.iloc[0:2] gives 2 rows (0, 1). This inconsistency trips up everyone once
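You can reproduce the last two mistakes safely in a few lines. This sketch swaps the city column from the bullet above for the grade column, since the tutorial's sample data has no city; the behaviour of and is the same either way.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Vishnu", "Ankit", "Priya", "Sara"],
    "marks": [95, 82, 70, 88],
    "grade": ["A", "B", "C", "A"],
})

# Mistake: Python's `and` asks for ONE True/False from a whole Series
got_value_error = False
try:
    df[df["marks"] > 80 and df["grade"] == "A"]
except ValueError:
    got_value_error = True
print(got_value_error)               # True

# Fix: element-wise & with parentheses around each condition
fixed = df[(df["marks"] > 80) & (df["grade"] == "A")]
print(fixed["name"].tolist())        # ['Vishnu', 'Sara']

# Same slice text, different row counts
print(len(df.loc[0:2]), len(df.iloc[0:2]))   # 3 2
```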

โ“ Frequently Asked Questionsโ€‹

Q: What is the key difference between loc and iloc?

loc is label-based: it uses the actual index labels (like 'Vishnu' or 0, 1, 2) and column names. iloc is integer position-based: it always uses positions 0, 1, 2... regardless of the actual index labels. Use iloc when you want the 5th row (df.iloc[4]); use loc when you want the row with index label 5 (df.loc[5]).
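The difference only becomes visible when labels and positions disagree. The sketch below assumes a hypothetical DataFrame whose integer labels run backwards (5 down to 1); the fifth name, Ravi, is invented for the example.

```python
import pandas as pd

# Hypothetical data: integer labels that do NOT match row positions
df = pd.DataFrame(
    {"name": ["Vishnu", "Ankit", "Priya", "Sara", "Ravi"]},
    index=[5, 4, 3, 2, 1],
)

by_label = df.loc[5, "name"]     # the row whose index label is 5
by_position = df.iloc[4, 0]      # the 5th row (position 4), first column

print(by_label)       # Vishnu
print(by_position)    # Ravi
```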

Q: Is the end of a slice inclusive in loc?

Yes. df.loc[0:2] includes the row at label 2 (returns 3 rows with the default index). This differs from Python list slicing and from iloc, where df.iloc[0:2] excludes the end position (returns 2 rows). This inconsistency is a frequent CBSE exam question.

Q: Why do you need double brackets df[['col']] to get a DataFrame?

Single brackets, df['col'], return a Series (one column). Double brackets, df[['col']], pass a list of column names, so pandas returns a DataFrame even with one column. The outer [] is the indexing operator; the inner [] creates a Python list.

Q: Why use & instead of and in boolean filtering?

and operates on Python scalars: it needs a single True/False value, so it can't work element-wise across a whole column. & is the bitwise AND operator, which pandas applies element-wise to boolean Series. For example: df[(df['marks'] > 80) & (df['city'] == 'Surat')]. Always wrap each condition in parentheses, because & binds more tightly than comparison operators such as > and ==.


Summary

In this tutorial, you've learned:

  • loc[row_label, col_label] selects by label; slices are inclusive at both ends
  • iloc[row_pos, col_pos] selects by integer position; slices are exclusive at the end
  • Boolean filtering df[condition] returns only rows where the condition is True
  • Combine conditions with & (and) and | (or); never use Python's and/or
  • df['col'] returns a Series; df[['col']] returns a DataFrame

Interview & Exam Tips

Q: What is the key difference between loc and iloc?

loc is label-based (uses index labels and column names). iloc is integer position-based (uses 0-based row and column numbers). CBSE exams often test this with a DataFrame that has a custom string index.

Q: Is the end of a slice inclusive in loc?

Yes. df.loc[0:2] includes the row at label 2. df.iloc[0:2] is exclusive: it returns only rows 0 and 1.

Q: Why do you need double brackets df[['col']] to get a DataFrame?

Single brackets, df['col'], return a Series. Double brackets, df[['col']], pass a list of column names and return a DataFrame.

Q: Why use & instead of and in boolean filtering?

and operates on Python booleans, not element-wise on Series. & is the bitwise AND operator which works element-wise on pandas boolean Series. Always wrap conditions in parentheses.

