Indexing & Selection in pandas πΒΆ
Prerequisites: DataFrame Basics, Python Boolean Logic
Mentor's Note: Selecting the right data is 80% of data analysis. Master
locandilocand you can answer almost any business question from a dataset. This is also one of the most frequently tested topics in CBSE Class 12 practicals. π‘
What You'll Learn
By the end of this tutorial, you'll know:
- The difference between
loc(label-based) andiloc(position-based) β and when to use each - How to filter rows using boolean conditions and combine them with
&and| - How to select one column (Series) vs multiple columns (DataFrame) β the double-bracket trap
- Why
locslices are inclusive at both ends whileilocslices are exclusive at the end
π The Scenario: The Hotel Room Booking SystemΒΆ
A hotel has 100 rooms listed in a register. A staff member can look up a room in two ways:
- By room name (e.g., "Deluxe Suite A") β this is
loc(label-based) - By room number (e.g., room #12) β this is
iloc(integer position-based)
Both give you the same room. The difference is how you describe which room you want. pandas works the same way with DataFrame rows and columns.
π Concept ExplanationΒΆ
1. loc β Label-Based SelectionΒΆ
df.loc[row_label, column_label]
- Uses actual index labels and column names
- The slice endpoint is inclusive (unlike Python slices)
- Works with custom string indexes
2. iloc β Integer Position-Based SelectionΒΆ
df.iloc[row_position, column_position]
- Uses integer positions (0, 1, 2...)
- The slice endpoint is exclusive (like Python slices)
- Always works regardless of the index labels
3. Boolean FilteringΒΆ
df[condition]
conditionis a boolean Series (True/False per row)- Returns only rows where condition is
True - Multiple conditions: use
&(and),|(or) β notand/or
4. Selecting ColumnsΒΆ
| Syntax | Returns | When to use |
|---|---|---|
df['name'] |
Series | Single column |
df[['name', 'marks']] |
DataFrame | Multiple columns (double brackets!) |
π¨ Visual LogicΒΆ
graph LR
A[DataFrame] --> B["df.loc[label, label]"]
A --> C["df.iloc[int, int]"]
A --> D["df[condition]"]
B --> E[Label-based subset]
C --> F[Position-based subset]
D --> G[Filtered rows]
π» ImplementationΒΆ
import pandas as pd
df = pd.DataFrame({
'name': ['Vishnu', 'Ankit', 'Priya', 'Sara'],
'marks': [95, 82, 70, 88],
'grade': ['A', 'B', 'C', 'A']
})
# Single cell β row label 0, column 'name'
print(df.loc[0, 'name']) # Vishnu
# Single row β all columns
print(df.loc[1])
# name Ankit
# marks 82
# grade B
# Slice rows 0 to 2 (INCLUSIVE), specific columns
print(df.loc[0:2, ['name', 'marks']])
# Output:
# name marks
# 0 Vishnu 95
# 1 Ankit 82
# 2 Priya 70
import pandas as pd
df = pd.DataFrame({
'name': ['Vishnu', 'Ankit', 'Priya', 'Sara'],
'marks': [95, 82, 70, 88],
'grade': ['A', 'B', 'C', 'A']
})
# Single cell β row 0, column 1 (marks)
print(df.iloc[0, 1]) # 95
# First 3 rows, first 2 columns (EXCLUSIVE end)
print(df.iloc[0:3, 0:2])
# Output:
# name marks
# 0 Vishnu 95
# 1 Ankit 82
# 2 Priya 70
# Last row
print(df.iloc[-1])
# name Sara
# marks 88
# grade A
import pandas as pd
df = pd.DataFrame({
'name': ['Vishnu', 'Ankit', 'Priya', 'Sara', 'Raj'],
'marks': [95, 82, 70, 88, 60],
'city': ['Surat', 'Mumbai', 'Surat', 'Delhi', 'Mumbai'],
'passed': [True, True, True, True, False]
})
# Single condition
toppers = df[df['marks'] > 80]
print(toppers)
# Output:
# name marks city passed
# 0 Vishnu 95 Surat True
# 1 Ankit 82 Mumbai True
# 3 Sara 88 Delhi True
# Multiple conditions β use & and |, with parentheses!
surat_toppers = df[(df['marks'] > 80) & (df['city'] == 'Surat')]
print(surat_toppers)
# Output:
# name marks city passed
# 0 Vishnu 95 Surat True
# Filter by boolean column
passed_students = df[df['passed'] == True]
print(passed_students['name'].tolist())
# Output: ['Vishnu', 'Ankit', 'Priya', 'Sara']
import pandas as pd
df = pd.DataFrame({
'name': ['Vishnu', 'Ankit', 'Priya'],
'marks': [95, 82, 70],
'grade': ['A', 'B', 'C'],
'city': ['Surat', 'Mumbai', 'Delhi']
})
# Single column β returns a Series
names = df['name']
print(type(names)) # <class 'pandas.core.series.Series'>
# Multiple columns β returns a DataFrame (double brackets!)
subset = df[['name', 'marks']]
print(type(subset)) # <class 'pandas.core.frame.DataFrame'>
print(subset)
# Output:
# name marks
# 0 Vishnu 95
# 1 Ankit 82
# 2 Priya 70
Open a terminal, type python3, and explore indexing line by line.
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Vishnu', 'Ankit', 'Priya', 'Sara'], 'marks': [95, 82, 70, 88]})
>>> df.loc[0, 'name']
'Vishnu'
>>> df.iloc[0, 1]
95
>>> df[df['marks'] > 80]
name marks
0 Vishnu 95
1 Ankit 82
3 Sara 88
>>> df[['name', 'marks']]
name marks
0 Vishnu 95
1 Ankit 82
2 Priya 70
3 Sara 88
>>> df.loc[0:2, ['name', 'marks']]
name marks
0 Vishnu 95
1 Ankit 82
2 Priya 70
>>> df.iloc[0:2, 0:2]
name marks
0 Vishnu 95
1 Ankit 82
New to the REPL?
Type python3 in your terminal. Each >>> is what you type; the line below is Python's response. Notice loc[0:2] gives 3 rows (inclusive) but iloc[0:2] gives 2 rows (exclusive) β that's the key difference to remember!
π Sample Dry RunΒΆ
DataFrame: df with columns name, marks β 4 rows (index 0β3)
Expression: df.loc[1:2, ['name', 'marks']]
| Step | Action | Result |
|---|---|---|
| 1 | loc uses labels |
Row labels 1 and 2 (inclusive) |
| 2 | Column filter | Keep only 'name' and 'marks' |
| 3 | Return subset | Row 1: Ankit, 82 / Row 2: Priya, 70 |
π― Practice Lab π§ͺΒΆ
Task: Product Selection
Create a DataFrame of 6 products:
products = pd.DataFrame({
'name': ['Laptop', 'Phone', 'TV', 'Headphones', 'Tablet', 'Camera'],
'category': ['Electronics', 'Electronics', 'Electronics', 'Audio', 'Electronics', 'Photography'],
'price': [55000, 25000, 40000, 3000, 30000, 20000],
'stock': [10, 50, 8, 200, 15, 5]
})
Then:
- Use
locto select rows 1 to 3, columnsnameandprice. - Use
ilocto select the first 3 rows and first 2 columns. - Filter products with
price > 25000. - Filter products in category
'Electronics'withstock < 15. - Select only the
nameandcategorycolumns as a DataFrame.
π Best Practices & Common MistakesΒΆ
β Best PracticesΒΆ
- Always use
.loc[]or.iloc[]explicitly βdf[0]is ambiguous (it tries column label0). Be explicit:df.iloc[0]for the first row - Use
.copy()after filtering βsubset = df[df['marks'] > 80].copy()preventsSettingWithCopyWarningwhen you modifysubsetlater - Chain with
.reset_index(drop=True)β after filtering, index labels become non-contiguous. Use.reset_index(drop=True)to get a clean 0-based index
β Common MistakesΒΆ
df[['col']]vsdf['col']β single brackets returns a Series; double brackets returns a DataFrame. Most beginners use single brackets then wonder why.shapeshows one dimension- Using
and/orinstead of&/|βdf[df['marks'] > 80 and df['city'] == 'Surat']raisesValueError. Always use&,|, and wrap each condition in parentheses locendpoint inclusive,ilocexclusive βdf.loc[0:2]gives 3 rows (0, 1, 2).df.iloc[0:2]gives 2 rows (0, 1). This inconsistency trips up everyone once
β Frequently Asked QuestionsΒΆ
Q: What is the key difference between loc and iloc?
loc is label-based β it uses the actual index labels (like 'Vishnu' or 0, 1, 2) and column names. iloc is integer position-based β it always uses 0, 1, 2... positions regardless of the actual index labels. Use iloc when you want the "5th row", use loc when you want the row with index label 5.
Q: Is the end of a slice inclusive in loc?
Yes β df.loc[0:2] includes the row at label 2 (returns 3 rows). This is different from Python list slicing and iloc, where df.iloc[0:2] is exclusive (returns 2 rows). This inconsistency is a frequent CBSE exam question.
Q: Why do you need double brackets df[['col']] to get a DataFrame?
Single brackets df['col'] returns a Series (one column). Double brackets df[['col']] pass a list of column names, so pandas returns a DataFrame even with one column. The outer [] is the indexing operator; the inner [] creates a Python list.
Q: Why use & instead of and in boolean filtering?
and operates on Python scalars β it can't compare element-wise on a whole column. & is the bitwise AND operator which works element-wise on pandas boolean Series. For example: df[(df['marks'] > 80) & (df['city'] == 'Surat')]. Always wrap each condition in parentheses to avoid operator precedence bugs.
β SummaryΒΆ
In this tutorial, you've learned:
- β
loc[row_label, col_label]selects by label β slices are inclusive at both ends - β
iloc[row_pos, col_pos]selects by integer position β slices are exclusive at the end - β
Boolean filtering
df[condition]returns only rows where the condition is True - β
Combine conditions with
&(and) and|(or) β never use Python'sand/or - β
df['col']returns a Series;df[['col']]returns a DataFrame
π‘ Interview & Exam TipsΒΆ
Q: What is the key difference between loc and iloc?
loc is label-based (uses index labels and column names). iloc is integer position-based (uses 0-based row and column numbers). CBSE exams often test this with a DataFrame that has a custom string index.
Q: Is the end of a slice inclusive in loc?
Yes β df.loc[0:2] includes row at label 2. df.iloc[0:2] is exclusive β it returns only rows 0 and 1.
Q: Why do you need double brackets df[['col']] to get a DataFrame?
Single brackets df['col'] returns a Series. Double brackets df[['col']] pass a list of column names, returning a DataFrame.
Q: Why use & instead of and in boolean filtering?
and operates on Python booleans, not element-wise on Series. & is the bitwise AND operator which works element-wise on pandas boolean Series. Always wrap conditions in parentheses.
π Further ReadingΒΆ
Continue your learning path:
- β DataFrame Basics β you need DataFrames before you can index them
- Next: CSV with pandas β β apply
loc,iloc, and filtering on real CSV data
Go deeper:
- Official pandas docs β Indexing β complete reference for all selection methods
- DataFrame Basics β revisit
df.info()anddf.shapebefore filtering - pandas Series β boolean filtering on a Series works the same way