Indexing & Selection in pandas ๐
Mentor's Note: Selecting the right data is 80% of data analysis. Master
locandilocand you can answer almost any business question from a dataset. This is also one of the most frequently tested topics in CBSE Class 12 practicals. ๐ก
By the end of this tutorial, you'll know:
- The difference between
loc(label-based) andiloc(position-based) โ and when to use each - How to filter rows using boolean conditions and combine them with
&and| - How to select one column (Series) vs multiple columns (DataFrame) โ the double-bracket trap
- Why
locslices are inclusive at both ends whileilocslices are exclusive at the end
๐ The Scenario: The Hotel Room Booking Systemโ
A hotel has 100 rooms listed in a register. A staff member can look up a room in two ways:
- By room name (e.g., "Deluxe Suite A") โ this is
loc(label-based) - By room number (e.g., room #12) โ this is
iloc(integer position-based)
Both give you the same room. The difference is how you describe which room you want. pandas works the same way with DataFrame rows and columns.
๐ Concept Explanationโ
1. loc โ Label-Based Selectionโ
df.loc[row_label, column_label]
- Uses actual index labels and column names
- The slice endpoint is inclusive (unlike Python slices)
- Works with custom string indexes
2. iloc โ Integer Position-Based Selectionโ
df.iloc[row_position, column_position]
- Uses integer positions (0, 1, 2...)
- The slice endpoint is exclusive (like Python slices)
- Always works regardless of the index labels
3. Boolean Filteringโ
df[condition]
conditionis a boolean Series (True/False per row)- Returns only rows where condition is
True - Multiple conditions: use
&(and),|(or) โ notand/or
4. Selecting Columnsโ
| Syntax | Returns | When to use |
|---|---|---|
df['name'] | Series | Single column |
df[['name', 'marks']] | DataFrame | Multiple columns (double brackets!) |
๐จ Visual Logicโ
๐ป Implementationโ
- 1. loc โ Label-Based
- 2. iloc โ Position-Based
- 3. Boolean Filtering
- 4. Column Selection
- 5. Interactive REPL
import pandas as pd
df = pd.DataFrame({
'name': ['Vishnu', 'Ankit', 'Priya', 'Sara'],
'marks': [95, 82, 70, 88],
'grade': ['A', 'B', 'C', 'A']
})
# Single cell โ row label 0, column 'name'
print(df.loc[0, 'name']) # Vishnu
# Single row โ all columns
print(df.loc[1])
# name Ankit
# marks 82
# grade B
# Slice rows 0 to 2 (INCLUSIVE), specific columns
print(df.loc[0:2, ['name', 'marks']])
# Output:
# name marks
# 0 Vishnu 95
# 1 Ankit 82
# 2 Priya 70
import pandas as pd
df = pd.DataFrame({
'name': ['Vishnu', 'Ankit', 'Priya', 'Sara'],
'marks': [95, 82, 70, 88],
'grade': ['A', 'B', 'C', 'A']
})
# Single cell โ row 0, column 1 (marks)
print(df.iloc[0, 1]) # 95
# First 3 rows, first 2 columns (EXCLUSIVE end)
print(df.iloc[0:3, 0:2])
# Output:
# name marks
# 0 Vishnu 95
# 1 Ankit 82
# 2 Priya 70
# Last row
print(df.iloc[-1])
# name Sara
# marks 88
# grade A
import pandas as pd
df = pd.DataFrame({
'name': ['Vishnu', 'Ankit', 'Priya', 'Sara', 'Raj'],
'marks': [95, 82, 70, 88, 60],
'city': ['Surat', 'Mumbai', 'Surat', 'Delhi', 'Mumbai'],
'passed': [True, True, True, True, False]
})
# Single condition
toppers = df[df['marks'] > 80]
print(toppers)
# Output:
# name marks city passed
# 0 Vishnu 95 Surat True
# 1 Ankit 82 Mumbai True
# 3 Sara 88 Delhi True
# Multiple conditions โ use & and |, with parentheses!
surat_toppers = df[(df['marks'] > 80) & (df['city'] == 'Surat')]
print(surat_toppers)
# Output:
# name marks city passed
# 0 Vishnu 95 Surat True
# Filter by boolean column
passed_students = df[df['passed'] == True]
print(passed_students['name'].tolist())
# Output: ['Vishnu', 'Ankit', 'Priya', 'Sara']
import pandas as pd
df = pd.DataFrame({
'name': ['Vishnu', 'Ankit', 'Priya'],
'marks': [95, 82, 70],
'grade': ['A', 'B', 'C'],
'city': ['Surat', 'Mumbai', 'Delhi']
})
# Single column โ returns a Series
names = df['name']
print(type(names)) # <class 'pandas.core.series.Series'>
# Multiple columns โ returns a DataFrame (double brackets!)
subset = df[['name', 'marks']]
print(type(subset)) # <class 'pandas.core.frame.DataFrame'>
print(subset)
# Output:
# name marks
# 0 Vishnu 95
# 1 Ankit 82
# 2 Priya 70
Open a terminal, type python3, and explore indexing line by line.
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Vishnu', 'Ankit', 'Priya', 'Sara'], 'marks': [95, 82, 70, 88]})
>>> df.loc[0, 'name']
'Vishnu'
>>> df.iloc[0, 1]
95
>>> df[df['marks'] > 80]
name marks
0 Vishnu 95
1 Ankit 82
3 Sara 88
>>> df[['name', 'marks']]
name marks
0 Vishnu 95
1 Ankit 82
2 Priya 70
3 Sara 88
>>> df.loc[0:2, ['name', 'marks']]
name marks
0 Vishnu 95
1 Ankit 82
2 Priya 70
>>> df.iloc[0:2, 0:2]
name marks
0 Vishnu 95
1 Ankit 82
Type python3 in your terminal. Each >>> is what you type; the line below is Python's response. Notice loc[0:2] gives 3 rows (inclusive) but iloc[0:2] gives 2 rows (exclusive) โ that's the key difference to remember!
๐ Sample Dry Runโ
DataFrame: df with columns name, marks โ 4 rows (index 0โ3)
Expression: df.loc[1:2, ['name', 'marks']]
| Step | Action | Result |
|---|---|---|
| 1 | loc uses labels | Row labels 1 and 2 (inclusive) |
| 2 | Column filter | Keep only 'name' and 'marks' |
| 3 | Return subset | Row 1: Ankit, 82 / Row 2: Priya, 70 |
name marks
1 Ankit 82
2 Priya 70
๐ฏ Practice Lab ๐งชโ
Create a DataFrame of 6 products:
products = pd.DataFrame({
'name': ['Laptop', 'Phone', 'TV', 'Headphones', 'Tablet', 'Camera'],
'category': ['Electronics', 'Electronics', 'Electronics', 'Audio', 'Electronics', 'Photography'],
'price': [55000, 25000, 40000, 3000, 30000, 20000],
'stock': [10, 50, 8, 200, 15, 5]
})
Then:
- Use
locto select rows 1 to 3, columnsnameandprice. - Use
ilocto select the first 3 rows and first 2 columns. - Filter products with
price > 25000. - Filter products in category
'Electronics'withstock < 15. - Select only the
nameandcategorycolumns as a DataFrame.
๐ Best Practices & Common Mistakesโ
โ Best Practicesโ
- Always use
.loc[]or.iloc[]explicitly โdf[0]is ambiguous (it tries column label0). Be explicit:df.iloc[0]for the first row - Use
.copy()after filtering โsubset = df[df['marks'] > 80].copy()preventsSettingWithCopyWarningwhen you modifysubsetlater - Chain with
.reset_index(drop=True)โ after filtering, index labels become non-contiguous. Use.reset_index(drop=True)to get a clean 0-based index
โ Common Mistakesโ
df[['col']]vsdf['col']โ single brackets returns a Series; double brackets returns a DataFrame. Most beginners use single brackets then wonder why.shapeshows one dimension- Using
and/orinstead of&/|โdf[df['marks'] > 80 and df['city'] == 'Surat']raisesValueError. Always use&,|, and wrap each condition in parentheses locendpoint inclusive,ilocexclusive โdf.loc[0:2]gives 3 rows (0, 1, 2).df.iloc[0:2]gives 2 rows (0, 1). This inconsistency trips up everyone once
โ Frequently Asked Questionsโ
Q: What is the key difference between loc and iloc?
loc is label-based โ it uses the actual index labels (like 'Vishnu' or 0, 1, 2) and column names. iloc is integer position-based โ it always uses 0, 1, 2... positions regardless of the actual index labels. Use iloc when you want the "5th row", use loc when you want the row with index label 5.
Q: Is the end of a slice inclusive in loc?
Yes โ df.loc[0:2] includes the row at label 2 (returns 3 rows). This is different from Python list slicing and iloc, where df.iloc[0:2] is exclusive (returns 2 rows). This inconsistency is a frequent CBSE exam question.
Q: Why do you need double brackets df[['col']] to get a DataFrame?
Single brackets df['col'] returns a Series (one column). Double brackets df[['col']] pass a list of column names, so pandas returns a DataFrame even with one column. The outer [] is the indexing operator; the inner [] creates a Python list.
Q: Why use & instead of and in boolean filtering?
and operates on Python scalars โ it can't compare element-wise on a whole column. & is the bitwise AND operator which works element-wise on pandas boolean Series. For example: df[(df['marks'] > 80) & (df['city'] == 'Surat')]. Always wrap each condition in parentheses to avoid operator precedence bugs.
โ Summaryโ
In this tutorial, you've learned:
- โ
loc[row_label, col_label]selects by label โ slices are inclusive at both ends - โ
iloc[row_pos, col_pos]selects by integer position โ slices are exclusive at the end - โ
Boolean filtering
df[condition]returns only rows where the condition is True - โ
Combine conditions with
&(and) and|(or) โ never use Python'sand/or - โ
df['col']returns a Series;df[['col']]returns a DataFrame
๐ก Interview & Exam Tipsโ
Q: What is the key difference between loc and iloc?
loc is label-based (uses index labels and column names). iloc is integer position-based (uses 0-based row and column numbers). CBSE exams often test this with a DataFrame that has a custom string index.
Q: Is the end of a slice inclusive in loc?
Yes โ df.loc[0:2] includes row at label 2. df.iloc[0:2] is exclusive โ it returns only rows 0 and 1.
Q: Why do you need double brackets df[['col']] to get a DataFrame?
Single brackets df['col'] returns a Series. Double brackets df[['col']] pass a list of column names, returning a DataFrame.
Q: Why use & instead of and in boolean filtering?
and operates on Python booleans, not element-wise on Series. & is the bitwise AND operator which works element-wise on pandas boolean Series. Always wrap conditions in parentheses.
๐ Further Readingโ
Continue your learning path:
- โ DataFrame Basics โ you need DataFrames before you can index them
- Next: CSV with pandas โ โ apply
loc,iloc, and filtering on real CSV data
Go deeper:
- Official pandas docs โ Indexing โ complete reference for all selection methods
- DataFrame Basics โ revisit
df.info()anddf.shapebefore filtering - pandas Series โ boolean filtering on a Series works the same way