VNSGU BCA Sem 2: Data Analysis Using Python (205_04) Practical Solutions - April 2025 Set F
- Subject: Data Analysis Using Python (DAUP)
- Subject Code: 205_04
- Set: F
- Semester: 2
- Month/Year: April 2025
- Max Marks: 25
- Time Recommendation: 45 Minutes
Questions & Solutions
All questions are compulsory
Q1: CSV Data Processing Pipeline
Max Marks: 20
Write a Python script that performs the following:
- Create a students.csv file that contains rno, name, city, address, mob, per.
- Convert the above CSV file into a DataFrame.
- Display the column names of students.csv.
- Display only name and city.
- Fill empty values with 'Nan'.
1. CSV File Creation
Generate the source data file with the required fields.
You can use the csv module or simply write a string to a file. Ensure you leave some fields empty to test the 'Nan' filling logic later.
import pandas as pd
import csv

# [1] Create students.csv file
data = [
    ['rno', 'name', 'city', 'address', 'mob', 'per'],
    [1, 'Aarav', 'Surat', 'Adajan', '9876543210', 85.5],
    [2, 'Diya', 'Ahmedabad', 'Satellite', '9876543211', 78.0],
    [3, 'Krish', 'Surat', '', '9876543212', 92.0],      # Missing address
    [4, 'Mira', 'Baroda', 'Alkapuri', '', 65.4],        # Missing mobile
    [5, 'Aryan', 'Surat', 'Vesu', '9876543214', None]   # Missing percentage
]

with open('students.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)

print("students.csv created successfully.")
Step-by-Step Explanation:
- Initialization: Define the raw data as a nested list, with the header row first.
- Logic Flow: Use Python's csv.writer to generate the file and populate it with sample rows, including deliberately empty values.
- Completion: The with block closes the file, and a confirmation message is printed.
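To check how the deliberately empty values land in the file, the raw text can be read back. The sketch below is illustrative only: it assumes a shortened two-row data set and writes to a temporary directory (neither is part of the question) so that an existing students.csv is not overwritten. It shows that csv.writer stores both an empty string and None as an empty field.

```python
import csv
import os
import tempfile

# Shortened sample data (illustrative): one empty string and one None.
rows = [
    ['rno', 'name', 'city', 'address', 'mob', 'per'],
    [3, 'Krish', 'Surat', '', '9876543212', 92.0],       # empty address
    [5, 'Aryan', 'Surat', 'Vesu', '9876543214', None],   # missing percentage
]

# Write to a temporary directory so an existing students.csv is untouched.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'students.csv')

with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# Read the raw lines back to see how the empty values were stored.
with open(path, newline='') as f:
    lines = f.read().splitlines()

for line in lines:
    print(line)
```

Both gaps appear as nothing between (or after) the commas, which is why pd.read_csv() later treats them as missing values.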
2. Loading & Column Inspection
Transform the CSV into a Pandas DataFrame and explore its structure.
Use pd.read_csv() for loading and the .columns attribute to view header names.
# [2] Converting CSV into Data Frame
df = pd.read_csv('students.csv')
# [3] Display columns name
print("\nColumn Names:")
print(df.columns.tolist())
Step-by-Step Explanation:
- Initialization: Import the pandas library for CSV-to-DataFrame conversion.
- Logic Flow: Read the students.csv file into memory using pd.read_csv().
- Completion: Extract and display all column labels as a list for structural verification.
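Beyond the column names, a few more attributes help verify the structure after loading. The following is a minimal sketch, assuming the same sample rows held in an in-memory string (io.StringIO stands in for the file here so the example runs on its own). The dtype={'mob': str} variant is an optional extra, not something the question requires.

```python
import io
import pandas as pd

# Same sample data as the solution, held in memory instead of on disk.
csv_text = """rno,name,city,address,mob,per
1,Aarav,Surat,Adajan,9876543210,85.5
2,Diya,Ahmedabad,Satellite,9876543211,78.0
3,Krish,Surat,,9876543212,92.0
4,Mira,Baroda,Alkapuri,,65.4
5,Aryan,Surat,Vesu,9876543214,
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # header names as a plain list
print(df.dtypes)            # mob is read as float because one value is missing

# Optional: keep mobile numbers as text so they are not shown as 9876543210.0
df_text_mob = pd.read_csv(io.StringIO(csv_text), dtype={'mob': str})
print(df_text_mob['mob'].tolist())
```

Reading mob as text is a design choice: a phone number is an identifier rather than a quantity, so no arithmetic is lost by storing it as a string.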
3. Data Selection & Cleanup
Extract specific information and handle missing data.
- Select specific columns: df[['name', 'city']]
- Fill missing values: df.fillna('Nan')
# [4] Display only name and city
print("\nStudent Name and City List:")
print(df[['name', 'city']])
# [5] Fill empty value with 'Nan'
df_filled = df.fillna('Nan')
print("\nData Frame after filling empty values:")
print(df_filled)
Step-by-Step Explanation:
- Initialization: Prepare to filter the DataFrame columns.
- Logic Flow: Select specific columns by name and use fillna() to replace missing data with the string 'Nan'.
- Completion: Print the resulting cleaned DataFrame to verify the successful replacement of all empty fields.
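The selection and fill steps above can be checked by counting missing values before and after. This is a small sketch under stated assumptions: it uses a three-row in-memory sample (illustrative, not the full file) and variable names such as missing_before that do not appear in the original solution.

```python
import io
import pandas as pd

# Three-row sample (illustrative) with one missing address and one missing per.
csv_text = """rno,name,city,address,mob,per
1,Aarav,Surat,Adajan,9876543210,85.5
3,Krish,Surat,,9876543212,92.0
5,Aryan,Surat,Vesu,9876543214,
"""

df = pd.read_csv(io.StringIO(csv_text))

# A list of labels inside the brackets returns a DataFrame with those columns.
subset = df[['name', 'city']]
print(subset)

missing_before = int(df.isnull().sum().sum())

# fillna() returns a new DataFrame; df itself is left unchanged.
df_filled = df.fillna('Nan')
missing_after = int(df_filled.isnull().sum().sum())
missing_in_original = int(df.isnull().sum().sum())

print(missing_before, missing_after, missing_in_original)
```

Because the original df still contains its gaps, the filled result has to be stored in a new name (or assigned back to df) before it is printed or saved.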
Concept Deep Dive: Missing Data (NaN)
In pandas, NaN (Not a Number) is the default marker for missing data. Pandas provides tools such as isnull(), dropna(), and fillna() to manage these gaps. The question asks for the string 'Nan', but putting a string into a numeric column such as per means the column no longer holds only numbers. In real analysis, gaps in numeric columns are often filled with the column's mean or median so that later calculations still work.
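As a contrast to the string fill, here is a minimal sketch of a mean fill on the per column. The small DataFrame is built inline for illustration, and the names per_mean and df_numeric are hypothetical. Whether the mean or the median is the better choice depends on the data, so treat this as one option rather than a rule.

```python
import pandas as pd

# Inline sample (illustrative): the last percentage is missing.
df = pd.DataFrame({
    'name': ['Aarav', 'Diya', 'Krish', 'Mira', 'Aryan'],
    'per': [85.5, 78.0, 92.0, 65.4, None],
})

# mean() skips NaN by default, so it averages the four known values.
per_mean = df['per'].mean()

# Work on a copy so the original DataFrame keeps its missing value.
df_numeric = df.copy()
df_numeric['per'] = df_numeric['per'].fillna(per_mean)

print(per_mean)
print(df_numeric)
```

The per column stays numeric after this fill, so further statistics can be computed on it directly.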
Q2: Viva Preparation
Max Marks: 5
Potential Viva Questions
- Q: What is the difference between NaN and None in Pandas?
  - A: NaN is a floating-point "Not a Number" value used for missing numerical data, while None is Python's built-in null object. In numeric columns, Pandas converts None to NaN.
- Q: How do you select multiple columns in Pandas?
  - A: By passing a list of column names inside double square brackets: df[['col1', 'col2']].
- Q: What does df.columns return?
  - A: It returns an Index object containing all the column labels of the DataFrame.
- Q: How can you find the data type of each column?
  - A: Use the df.dtypes attribute.
- Q: What is the difference between dropna() and fillna()?
  - A: dropna() removes rows or columns with missing values, while fillna() replaces them with a specified value.
- Q: How do you check if any value is missing in the whole DataFrame?
  - A: Use df.isnull().values.any().
- Double Brackets: Forgetting the second set of brackets, as in df['name', 'city'], will cause a KeyError. Always use df[['name', 'city']] for multiple columns.
- Inplace Parameter: df.fillna() returns a new DataFrame. To change the original, reassign the result (df = df.fillna('Nan')) or pass inplace=True.
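Several of the answers and pitfalls above can be tried out in a few lines. The sketch below assumes a small inline DataFrame and a fill value of 0.0 chosen purely for illustration. Names such as has_missing and raised_key_error are hypothetical helpers for the demonstration.

```python
import pandas as pd

# Inline sample (illustrative). None becomes NaN in the numeric column.
df = pd.DataFrame({
    'name': ['Aarav', 'Krish', 'Aryan'],
    'city': ['Surat', 'Surat', 'Surat'],
    'per': [85.5, 92.0, None],
})

print(df.dtypes)                              # data type of each column

has_missing = bool(df.isnull().values.any())  # True if any cell is missing
dropped = df.dropna()                         # removes the row with the gap
filled = df.fillna({'per': 0.0})              # keeps the row, replaces the gap

# Without the inner list, the two names form one tuple key,
# which is looked up as a single column label and is not found.
try:
    df['name', 'city']
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(has_missing, len(dropped), len(filled), raised_key_error)
```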
Last Updated: April 2026