VNSGU BCA Sem 2: Data Analysis Using Python (205_04) Practical Solutions - Set F
Paper Details
- Subject: Data Analysis Using Python (DAUP)
- Subject Code: 205_04
- Set: F
- Semester: 2
- Month/Year: April 2025
- Max Marks: 25
- Time Recommendation: 45 Minutes
Questions & Solutions
All questions are compulsory
Q1: CSV Data Processing Pipeline
Max Marks: 20
Write a Python script that performs the following:
1. Create a students.csv file that contains rno, name, city, address, mob, per.
2. Convert the above CSV file into a DataFrame.
3. Display the column names of students.csv.
4. Display only name and city.
5. Fill empty values with 'Nan'.
1. CSV File Creation
Generate the source data file with the required fields.
Hint
You can use the csv module or simply write a string to a file. Ensure you leave some fields empty to test the 'Nan' filling logic later.
View Solution & Output
import pandas as pd
import csv

# [1] Create students.csv file
data = [
    ['rno', 'name', 'city', 'address', 'mob', 'per'],
    [1, 'Aarav', 'Surat', 'Adajan', '9876543210', 85.5],
    [2, 'Diya', 'Ahmedabad', 'Satellite', '9876543211', 78.0],
    [3, 'Krish', 'Surat', '', '9876543212', 92.0],     # Missing address
    [4, 'Mira', 'Baroda', 'Alkapuri', '', 65.4],       # Missing mobile
    [5, 'Aryan', 'Surat', 'Vesu', '9876543214', None]  # Missing percentage (None is written as an empty field)
]

# newline='' prevents blank lines between rows on Windows
with open('students.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)

print("students.csv created successfully.")
Step-by-Step Explanation:
1. Initialization: Define raw data as a nested list for standard CSV structure.
2. Logic Flow: Use Python's csv.writer to generate the file and populate it with sample rows including deliberate empty values.
3. Completion: The with block closes the file automatically once all rows are written, and the script prints a confirmation message.
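The hint also mentions writing a plain string instead of using csv.writer. A minimal sketch of that alternative follows. The file name students_demo.csv and the temporary directory are assumptions made for illustration so that the real students.csv is not overwritten.

```python
import csv
import os
import tempfile

# Raw CSV text; consecutive commas or a trailing comma leave a field empty.
csv_text = (
    "rno,name,city,address,mob,per\n"
    "1,Aarav,Surat,Adajan,9876543210,85.5\n"
    "3,Krish,Surat,,9876543212,92.0\n"   # missing address
    "5,Aryan,Surat,Vesu,9876543214,\n"   # missing percentage
)

# Hypothetical demo path (assumption: not part of the original question).
demo_path = os.path.join(tempfile.mkdtemp(), "students_demo.csv")
with open(demo_path, "w", newline="") as f:
    f.write(csv_text)

# Read the file back with the csv module to check the structure.
with open(demo_path, newline="") as f:
    rows = list(csv.reader(f))

print(rows[0])  # header row
print(rows[2])  # row with an empty address field
```

Writing the string directly is fine for small, fixed data. csv.writer is the safer choice when a field might itself contain a comma or a quote, because it handles the quoting for you.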
2. Loading & Column Inspection
Transform the CSV into a Pandas DataFrame and explore its structure.
Hint
Use pd.read_csv() for loading and the .columns attribute to view header names.
flowchart TD
csv[Read students.csv]
df[Create DataFrame]
cols[Display df.columns]
csv --> df
df --> cols
View Solution & Output
# [2] Converting CSV into Data Frame
df = pd.read_csv('students.csv')
# [3] Display columns name
print("\nColumn Names:")
print(df.columns.tolist())
Step-by-Step Explanation:
1. Initialization: pandas was already imported as pd in the first step; it provides the CSV-to-DataFrame conversion.
2. Logic Flow: Read the students.csv file into memory using pd.read_csv().
3. Completion: Extract and display all column labels as a list for structural verification.
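To see how empty CSV fields look after loading, here is a small sketch that reads from an in-memory string instead of the file on disk. The two sample rows and the variable names (sample_csv, demo_df) are assumptions for illustration only.

```python
import io

import pandas as pd

# In-memory stand-in for students.csv (assumption: two sample rows only).
sample_csv = io.StringIO(
    "rno,name,city,address,mob,per\n"
    "1,Aarav,Surat,Adajan,9876543210,85.5\n"
    "3,Krish,Surat,,9876543212,\n"   # empty address and empty per
)

demo_df = pd.read_csv(sample_csv)

column_names = demo_df.columns.tolist()     # plain Python list of labels
missing_per_column = demo_df.isna().sum()   # number of missing values per column

print(column_names)
print(demo_df.shape)        # (rows, columns)
print(missing_per_column)
```

pd.read_csv() treats empty fields as missing values by default, which is why the later fillna('Nan') step has something to replace.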
3. Data Selection & Cleanup
Extract specific information and handle missing data.
Hint
- Select specific columns: df[['name', 'city']]
- Fill missing values: df.fillna('Nan')
flowchart TD
sel[Select name & city]
null[Find Missing Values]
fill["fillna('Nan')"]
done[Display Final DF]
sel --> null
null --> fill
fill --> done
View Solution & Output
# [4] Display only name and city
print("\nStudent Name and City List:")
print(df[['name', 'city']])
# [5] Fill empty value with 'Nan'
df_filled = df.fillna('Nan')
print("\nData Frame after filling empty values:")
print(df_filled)
Step-by-Step Explanation:
1. Initialization: Start from the DataFrame df loaded in the previous step.
2. Logic Flow: Select specific columns by name and use fillna() to replace missing data with the string 'Nan'.
3. Completion: Print the resulting cleaned DataFrame to verify the successful replacement of all empty fields.
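The two operations can be tried on a small hand-built DataFrame. This is a sketch under the assumption of three made-up rows; demo_df, name_city, and filled are illustrative names, not part of the original solution.

```python
import numpy as np
import pandas as pd

# Small stand-in for the loaded students data (assumption: sample values).
demo_df = pd.DataFrame({
    "name": ["Aarav", "Krish", "Mira"],
    "city": ["Surat", "Surat", "Baroda"],
    "address": ["Adajan", np.nan, "Alkapuri"],
    "per": [85.5, 92.0, np.nan],
})

# A list of labels inside the brackets selects several columns at once.
name_city = demo_df[["name", "city"]]

# fillna returns a new DataFrame; demo_df itself keeps its missing values.
filled = demo_df.fillna("Nan")

print(name_city)
print(filled)
print(demo_df["per"].isna().sum())  # the original still has its gap
```

Keeping the result in a separate variable, as the solution does with df_filled, leaves the original data available for later numeric work.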
Concept Deep Dive: Missing Data (NaN)
In pandas, NaN ("Not a Number") is the default marker for missing data. Pandas provides isnull(), dropna(), and fillna() to detect, remove, and replace these gaps. The question asks for the string 'Nan', but filling a numeric column such as per with a string means the column no longer holds only numbers. In real analysis, missing numeric values are therefore more often filled with the column's mean or median so that the column stays numeric.
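As an illustration of the mean-based alternative, here is a minimal sketch. The marks DataFrame and its values are assumptions chosen so the arithmetic is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Sample percentages with one missing value (assumption: made-up data).
marks = pd.DataFrame({
    "name": ["Aarav", "Diya", "Krish", "Aryan"],
    "per": [80.0, 70.0, 90.0, np.nan],
})

# mean() and median() skip NaN by default: (80 + 70 + 90) / 3 = 80.0
mean_per = marks["per"].mean()
median_per = marks["per"].median()

# Fill only the per column; the result is still a float column.
marks_mean = marks.assign(per=marks["per"].fillna(mean_per))

print(mean_per, median_per)
print(marks_mean)
```

The median is usually preferred when the column contains extreme values, since a few outliers can pull the mean noticeably.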
Q2: Viva Preparation
Max Marks: 5
Potential Viva Questions
- Q: What is the difference between NaN and None in Pandas?
- A: NaN is a floating-point "Not a Number" value used for missing numerical data, while None is Python's built-in null object. Pandas converts None to NaN in numeric columns and treats both as missing.
- Q: How do you select multiple columns in Pandas?
- A: By passing a list of column names inside the indexing brackets, which gives double square brackets: df[['col1', 'col2']].
- Q: What does df.columns return?
- A: It returns an Index object containing all the column labels of the DataFrame.
- Q: How can you find the data type of each column?
- A: Use the df.dtypes attribute.
- Q: What is the difference between dropna() and fillna()?
- A: dropna() removes rows or columns with missing values, while fillna() replaces them with a specified value.
- Q: How do you check if any value is missing in the whole DataFrame?
- A: Use df.isnull().values.any().
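Several of these answers can be checked in a few lines. The sketch below assumes a tiny made-up DataFrame named viva_df; the other variable names are likewise illustrative.

```python
import numpy as np
import pandas as pd

viva_df = pd.DataFrame({
    "name": ["Aarav", "Mira", "Aryan"],
    "per": [85.5, None, 65.4],   # None in a numeric column is stored as NaN
})

print(viva_df.columns)   # Index object holding the column labels
print(viva_df.dtypes)    # data type of each column

# True if at least one value anywhere in the DataFrame is missing.
has_missing = viva_df.isnull().values.any()

# dropna() removes the row with the gap; fillna() keeps it and fills the gap.
dropped = viva_df.dropna()
filled = viva_df.fillna({"per": 0.0})   # dict form fills only the per column

print(has_missing)
print(len(dropped), len(filled))
```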
Common Pitfalls
- Double Brackets: Writing df['name', 'city'] without the inner list raises a KeyError. Always use df[['name', 'city']] for multiple columns.
- Inplace Parameter: df.fillna() returns a new DataFrame. To change the original, use df.fillna('Nan', inplace=True) or reassign the result.
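Both pitfalls can be reproduced on a small example. This is a sketch with an assumed two-row DataFrame called pit_df; it uses reassignment rather than inplace=True.

```python
import pandas as pd

pit_df = pd.DataFrame({
    "name": ["Aarav", None],
    "city": ["Surat", "Baroda"],
})

# Pitfall 1: without the inner list, pandas looks for one column
# whose label is the tuple ('name', 'city') and fails.
try:
    pit_df["name", "city"]
    raised = False
except KeyError:
    raised = True

subset = pit_df[["name", "city"]]   # correct form

# Pitfall 2: the result of fillna() is discarded here, so pit_df is unchanged.
pit_df.fillna("Nan")
still_missing = pit_df["name"].isna().sum()

# Reassigning stores the filled copy under the same name.
pit_df = pit_df.fillna("Nan")

print(raised, still_missing)
print(pit_df)
```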
Last Updated: April 2025