VNSGU BCA Sem 2: Data Analysis Using Python (205_04) Practical Solutions - Set F

Paper Details

Subject: Data Analysis Using Python (DAUP)
Subject Code: 205_04
Set: F
Semester: 2
Month/Year: April 2025
Max Marks: 25
Time Recommendation: 45 Minutes

Questions & Solutions

All questions are compulsory

Q1: CSV Data Processing Pipeline

Max Marks: 20

Write a Python script that performs the following:
1. Create a students.csv file that contains rno, name, city, address, mob, and per.
2. Convert the above CSV file into a DataFrame.
3. Display the column names of students.csv.
4. Display only name and city.
5. Fill empty values with 'Nan'.

1. CSV File Creation

Generate the source data file with the required fields.

Hint

You can use the csv module or simply write a string to a file. Ensure you leave some fields empty to test the 'Nan' filling logic later.

View Solution & Output

import pandas as pd
import csv

# [1] Create students.csv file
data = [
    ['rno', 'name', 'city', 'address', 'mob', 'per'],
    [1, 'Aarav', 'Surat', 'Adajan', '9876543210', 85.5],
    [2, 'Diya', 'Ahmedabad', 'Satellite', '9876543211', 78.0],
    [3, 'Krish', 'Surat', '', '9876543212', 92.0], # Missing address
    [4, 'Mira', 'Baroda', 'Alkapuri', '', 65.4],    # Missing mobile
    [5, 'Aryan', 'Surat', 'Vesu', '9876543214', None] # Missing percentage
]

with open('students.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)

print("students.csv created successfully.")

Step-by-Step Explanation:
1. Initialization: Define the raw data as a nested list whose first row is the header.
2. Logic Flow: Use Python's csv.writer to write the rows to students.csv. The empty strings and the None value are written as empty fields, which pandas later reads as missing values.
3. Completion: Print a confirmation message once the file has been written.
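To see how the deliberately empty values end up in the file, the sketch below writes two of the sample rows to an in-memory buffer rather than to disk. Using io.StringIO is an assumption made here so the example leaves no file behind; the solution itself writes to students.csv.

```python
import csv
import io

# Two sample rows: one with an empty string, one with None.
rows = [
    ['rno', 'name', 'city', 'address', 'mob', 'per'],
    [3, 'Krish', 'Surat', '', '9876543212', 92.0],
    [5, 'Aryan', 'Surat', 'Vesu', '9876543214', None],
]

# Write to an in-memory buffer instead of students.csv (illustration only).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)

# splitlines() also strips the '\r\n' line endings csv.writer emits by default.
lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```

Both the empty string and None appear as nothing between two commas (or after the last comma), so pandas treats them the same way when the file is read back.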

2. Loading & Column Inspection

Transform the CSV into a Pandas DataFrame and explore its structure.

Hint

Use pd.read_csv() for loading and the .columns attribute to view header names.

flowchart TD
csv[Read students.csv]
df[Create DataFrame]
cols[Display df.columns]

csv --> df
df --> cols

View Solution & Output

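# [2] Converting CSV into Data Frame
import pandas as pd

# Read mob as text so mobile numbers are not displayed as floats (e.g. 9876543210.0)
df = pd.read_csv('students.csv', dtype={'mob': str})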

# [3] Display columns name
print("\nColumn Names:")
print(df.columns.tolist())

Step-by-Step Explanation:
1. Initialization: Import the pandas library for CSV-to-DataFrame conversion.
2. Logic Flow: Read the students.csv file into memory using pd.read_csv().
3. Completion: Extract and display all column labels as a list for structural verification.
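One detail worth checking after loading is how pandas infers column types. The sketch below is a minimal illustration on a cut-down, in-memory copy of the data (the three-column csv_text string is made up for this example). It suggests that a numeric-looking column with a missing value is read as float64 unless a text dtype is requested.

```python
import io

import pandas as pd

# Cut-down sample with one missing mobile number (illustrative data only).
csv_text = (
    "rno,name,mob\n"
    "1,Aarav,9876543210\n"
    "4,Mira,\n"
)

# Default inference: the missing value forces the column to float64.
df_default = pd.read_csv(io.StringIO(csv_text))

# Requesting text keeps the digits exactly as written in the file.
df_text = pd.read_csv(io.StringIO(csv_text), dtype={'mob': str})

print(df_default.columns.tolist())
print(df_default['mob'].dtype)
print(df_text.loc[0, 'mob'])
```

Identifiers such as mobile or roll numbers are usually better kept as text, since arithmetic on them is meaningless and float formatting makes them harder to read.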

3. Data Selection & Cleanup

Extract specific information and handle missing data.

Hint

Select specific columns: df[['name', 'city']]
Fill missing values: df.fillna('Nan')

flowchart TD
sel[Select name & city]
null[Find Missing Values]
fill["fillna('Nan')"]
done[Display Final DF]

sel --> null
null --> fill
fill --> done

View Solution & Output

# [4] Display only name and city
print("\nStudent Name and City List:")
print(df[['name', 'city']])

# [5] Fill empty value with 'Nan'
df_filled = df.fillna('Nan')

print("\nData Frame after filling empty values:")
print(df_filled)

Step-by-Step Explanation:
1. Initialization: Start from the DataFrame df loaded in the previous step.
2. Logic Flow: Select the name and city columns by passing a list of labels, then call fillna() to replace missing data with the string 'Nan'. The result is a new DataFrame, stored in df_filled.
3. Completion: Print df_filled to verify that every empty field was replaced.
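The "Find Missing Values" step in the flowchart is not part of the printed solution. A minimal sketch of one way to do it is shown here, using an in-memory copy of four of the sample rows (the csv_text string and the variable names are assumptions for illustration).

```python
import io

import pandas as pd

# In-memory copy of four sample rows, including the three incomplete ones.
csv_text = (
    "rno,name,city,address,mob,per\n"
    "1,Aarav,Surat,Adajan,9876543210,85.5\n"
    "3,Krish,Surat,,9876543212,92.0\n"
    "4,Mira,Baroda,Alkapuri,,65.4\n"
    "5,Aryan,Surat,Vesu,9876543214,\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column before filling them.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Keep only the rows that have at least one missing value.
incomplete_rows = df[df.isnull().any(axis=1)]
print(incomplete_rows[['rno', 'name']])
```

Checking the counts first makes it easy to confirm afterwards that fillna() replaced exactly the values that were missing.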

Concept Deep Dive: Missing Data (NaN)

In pandas, NaN (Not a Number) is the default marker for missing data. Pandas provides isnull(), dropna(), and fillna() to detect, remove, and fill these gaps. The question asks to fill with the string 'Nan', but doing so leaves numeric columns such as per holding a mix of numbers and text. In real analysis, missing numeric values are more often filled with the mean or median of the column so that the column stays numeric.
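As a small illustration of the mean-fill alternative, the sketch below uses the per values from the sample data as a standalone Series. It is not part of the required answer.

```python
import pandas as pd

# Percentages from the sample data; the last one is missing.
per = pd.Series([85.5, 78.0, 92.0, 65.4, None], name='per')

# mean() skips NaN by default, so it averages the four known values.
mean_value = per.mean()

# Fill the gap with the mean; the column stays float64.
per_filled = per.fillna(mean_value)

# Alternative: drop the missing entry instead of filling it.
per_dropped = per.dropna()

print(per_filled)
print(len(per_dropped))
```

Whether to fill or drop depends on the analysis; filling keeps every student in the table, while dropping avoids introducing an estimated value.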

Q2: Viva Preparation

Max Marks: 5

Potential Viva Questions

Q: What is the difference between NaN and None in Pandas?
A: NaN is a floating-point "Not a Number" value that pandas uses to mark missing data, while None is Python's built-in null object. In numeric columns, pandas converts None to NaN for consistency.
Q: How do you select multiple columns in Pandas?
A: By passing a list of column names inside double square brackets: df[['col1', 'col2']].
Q: What does df.columns return?
A: It returns an Index object containing all the column labels of the DataFrame.
Q: How can you find the data type of each column?
A: Use the df.dtypes attribute.
Q: What is the difference between dropna() and fillna()?
A: dropna() removes rows or columns with missing values, while fillna() replaces them with a specified value.
Q: How do you check if any value is missing in the whole DataFrame?
A: Use df.isnull().values.any().
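Several of these answers can be checked quickly in an interpreter. The sketch below uses a small made-up DataFrame (not the full students.csv) to show None becoming NaN in a numeric column, along with df.columns, df.dtypes, dropna(), and the missing-value check.

```python
import pandas as pd

# Small illustrative frame; None in the numeric column is stored as NaN.
df = pd.DataFrame({
    'name': ['Aarav', 'Mira', 'Aryan'],
    'per': [85.5, 65.4, None],
})

print(df.columns)   # an Index object holding the column labels
print(df.dtypes)    # data type of each column

# True if any value in the whole DataFrame is missing.
has_missing = bool(df.isnull().values.any())

# dropna() removes the row with the missing percentage.
rows_after_drop = len(df.dropna())

print(has_missing, rows_after_drop)
```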

Common Pitfalls

Double Brackets: Forgetting the second set of brackets, as in df['name', 'city'], raises a KeyError because pandas looks for a single column labelled ('name', 'city'). Always use df[['name', 'city']] for multiple columns.
Inplace Parameter: df.fillna() returns a new DataFrame. To change the original, use df.fillna('Nan', inplace=True) or reassign it.
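Both pitfalls can be reproduced on a tiny example. The sketch below uses a two-row DataFrame invented for illustration and reassigns the result of fillna() rather than using inplace=True.

```python
import pandas as pd

# Two illustrative rows; Krish has no address.
df = pd.DataFrame({
    'name': ['Krish', 'Mira'],
    'city': ['Surat', 'Baroda'],
    'address': [None, 'Alkapuri'],
})

# Pitfall 1: single brackets with two names are treated as one tuple key.
try:
    df['name', 'city']
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Correct form: a list of labels inside the indexing brackets.
subset = df[['name', 'city']]

# Pitfall 2: the result is discarded, so df itself is unchanged.
df.fillna('Nan')
still_missing = int(df['address'].isnull().sum())

# Reassigning keeps the filled version.
df = df.fillna('Nan')
missing_after = int(df['address'].isnull().sum())

print(raised_key_error, still_missing, missing_after)
```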

Last Updated: April 2025
