VNSGU BCA Sem 2: Data Analysis Using Python (205_04) Practical Solutions - Set F

Paper Details

  • Subject: Data Analysis Using Python (DAUP)
  • Subject Code: 205_04
  • Set: F
  • Semester: 2
  • Month/Year: April 2025
  • Max Marks: 25
  • Time Recommendation: 45 Minutes

Questions & Solutions

All questions are compulsory

Q1: CSV Data Processing Pipeline

Max Marks: 20

Write a Python script that performs the following:

  1. Create a students.csv file that contains rno, name, city, address, mob, per.
  2. Convert the above CSV file into a DataFrame.
  3. Display the column names of students.csv.
  4. Display only name and city.
  5. Fill empty values with 'Nan'.

1. CSV File Creation

Generate the source data file with the required fields.

Hint

You can use the csv module or simply write a string to a file. Ensure you leave some fields empty to test the 'Nan' filling logic later.

Solution
import pandas as pd
import csv

# [1] Create students.csv file
data = [
    ['rno', 'name', 'city', 'address', 'mob', 'per'],
    [1, 'Aarav', 'Surat', 'Adajan', '9876543210', 85.5],
    [2, 'Diya', 'Ahmedabad', 'Satellite', '9876543211', 78.0],
    [3, 'Krish', 'Surat', '', '9876543212', 92.0], # Missing address
    [4, 'Mira', 'Baroda', 'Alkapuri', '', 65.4],    # Missing mobile
    [5, 'Aryan', 'Surat', 'Vesu', '9876543214', None] # Missing percentage
]

with open('students.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(data)

print("students.csv created successfully.")

Step-by-Step Explanation:

  1. Initialization: Define the raw data as a nested list whose first row is the header.
  2. Logic Flow: Use Python's csv.writer to write all rows to students.csv. The sample rows include deliberately empty values; csv.writer writes None as an empty field, just like ''.
  3. Completion: Print a confirmation message once the file has been written.
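As a quick check of how csv.writer treats None, the sketch below writes one of the sample rows to an in-memory buffer. Using io.StringIO instead of a real file is an assumption made only to keep the example self-contained.

```python
import csv
import io

# io.StringIO stands in for the real students.csv file (illustrative only).
buffer = io.StringIO()
writer = csv.writer(buffer)

# The last field is None, as in the row for roll number 5.
writer.writerow([5, 'Aryan', 'Surat', 'Vesu', '9876543214', None])

line = buffer.getvalue()
fields = line.strip().split(',')

# The None value appears as an empty final field.
print(fields)
```

The row ends with a trailing comma, which is what pd.read_csv() later interprets as a missing value.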

2. Loading & Column Inspection

Transform the CSV into a Pandas DataFrame and explore its structure.

Hint

Use pd.read_csv() for loading and the .columns attribute to view header names.

flowchart TD
csv[Read students.csv]
df[Create DataFrame]
cols[Display df.columns]

csv --> df
df --> cols
Solution
# [2] Converting CSV into Data Frame
df = pd.read_csv('students.csv')

# [3] Display columns name
print("\nColumn Names:")
print(df.columns.tolist())

Step-by-Step Explanation:

  1. Initialization: pandas, imported as pd at the top of the script, handles the CSV-to-DataFrame conversion.
  2. Logic Flow: Read students.csv into a DataFrame using pd.read_csv(). Empty fields in the file are loaded as NaN.
  3. Completion: Convert df.columns to a list with tolist() and print it for structural verification.
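One side effect worth knowing: because the mob column contains a blank, pd.read_csv() infers it as a floating-point column rather than text. The sketch below illustrates this with a two-row sample held in a string (the io.StringIO wrapper and the shortened data are assumptions for illustration), and shows the dtype parameter as one possible way to keep mobile numbers as text.

```python
import io
import pandas as pd

# A shortened copy of the sample data, held in memory for illustration.
csv_text = """rno,name,city,address,mob,per
1,Aarav,Surat,Adajan,9876543210,85.5
4,Mira,Baroda,Alkapuri,,65.4
"""

# Default inference: the blank mobile number becomes NaN,
# which forces the whole column to float64.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default['mob'].dtype)

# Reading 'mob' as text keeps the digits exactly as written.
df_text = pd.read_csv(io.StringIO(csv_text), dtype={'mob': str})
print(df_text.loc[0, 'mob'])
```

The exam question does not require this step, so treat it as an optional refinement.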

3. Data Selection & Cleanup

Extract specific information and handle missing data.

Hint

  • Select specific columns: df[['name', 'city']]
  • Fill missing values: df.fillna('Nan')
flowchart TD
sel[Select name & city]
null[Find Missing Values]
fill[fillna('Nan')]
done[Display Final DF]

sel --> null
null --> fill
fill --> done
Solution
# [4] Display only name and city
print("\nStudent Name and City List:")
print(df[['name', 'city']])

# [5] Fill empty value with 'Nan'
df_filled = df.fillna('Nan')

print("\nData Frame after filling empty values:")
print(df_filled)

Step-by-Step Explanation:

  1. Initialization: Start from the DataFrame df loaded in the previous step.
  2. Logic Flow: Select the name and city columns by passing a list of labels, then call fillna('Nan') to replace every missing value with the string 'Nan'. The result is stored in df_filled, so df itself is unchanged.
  3. Completion: Print df_filled to verify that no empty fields remain.
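The flowchart for this part includes a "Find Missing Values" step that the solution code does not show. A minimal sketch of that step, using a small hand-built DataFrame (the three rows below are illustrative, not the full file), might look like this:

```python
import pandas as pd

# A small illustrative frame with one gap in each of two columns.
df = pd.DataFrame({
    'name': ['Krish', 'Mira', 'Aryan'],
    'address': [None, 'Alkapuri', 'Vesu'],
    'per': [92.0, 65.4, None],
})

# Count the missing values in each column before filling.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Fill the gaps and confirm that none remain.
df_filled = df.fillna('Nan')
remaining = int(df_filled.isnull().sum().sum())
print("Missing values after fill:", remaining)
```

Counting before and after the fill is a simple way to confirm that fillna() did what was expected.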

Concept Deep Dive: Missing Data (NaN)

In Pandas, NaN (Not a Number) is the default marker for missing data. Pandas provides isnull(), dropna(), and fillna() to detect, remove, and replace these gaps. The question asks for the string 'Nan', but filling a numeric column such as per with a string generally turns it into a text (object) column, so it can no longer be averaged directly. In real analysis, numeric gaps are more often filled with the mean or median of the column so that the column stays numeric.
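As a sketch of the mean-based alternative mentioned above, the example below fills the missing percentage with the column mean. The values are the per figures from the sample data; the variable names are illustrative.

```python
import pandas as pd

# Percentages from the sample data; the last one is missing.
per = pd.Series([85.5, 78.0, 92.0, 65.4, None], name='per')

# mean() skips NaN by default, so only the four known values are averaged.
mean_per = per.mean()

# Replace the gap with the mean; the column stays numeric.
per_filled = per.fillna(mean_per)
print(per_filled)
```

Using per.median() in place of per.mean() follows the same pattern and is less sensitive to extreme values.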

Q2: Viva Preparation

Max Marks: 5

Potential Viva Questions
  1. Q: What is the difference between NaN and None in Pandas?
     A: NaN is a floating-point "Not a Number" value used to mark missing data, while None is Python's built-in null object. In numeric columns, Pandas converts None to NaN; isnull() treats both as missing.
  2. Q: How do you select multiple columns in Pandas?
     A: By passing a list of column names inside double square brackets: df[['col1', 'col2']].
  3. Q: What does df.columns return?
     A: It returns an Index object containing all the column labels of the DataFrame.
  4. Q: How can you find the data type of each column?
     A: Use the df.dtypes attribute.
  5. Q: What is the difference between dropna() and fillna()?
     A: dropna() removes rows or columns with missing values, while fillna() replaces them with a specified value.
  6. Q: How do you check if any value is missing in the whole DataFrame?
     A: Use df.isnull().values.any().
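Several of these answers can be checked in a few lines. The sketch below uses small made-up data to illustrate the None-to-NaN conversion, the missing-value check, and the difference between dropna() and fillna().

```python
import numpy as np
import pandas as pd

# None in a numeric Series is stored as NaN, so the dtype becomes float64.
s = pd.Series([1, None, 3])
print(s.dtype)

# NaN is never equal to itself, which is why isnull() is used instead of ==.
nan_equals_itself = (np.nan == np.nan)
print(nan_equals_itself)

df = pd.DataFrame({'a': [1.0, None], 'b': ['x', 'y']})

# True if at least one value anywhere in the DataFrame is missing.
any_missing = bool(df.isnull().values.any())
print(any_missing)

# dropna() removes the incomplete row; fillna() keeps it and fills the gap.
rows_after_drop = len(df.dropna())
rows_after_fill = len(df.fillna(0))
print(rows_after_drop, rows_after_fill)
```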

Common Pitfalls

  • Double Brackets: Forgetting the second set of brackets, as in df['name', 'city'], raises a KeyError because Pandas looks for a single column labelled ('name', 'city'). Always use df[['name', 'city']] for multiple columns.
  • Inplace Parameter: df.fillna() returns a new DataFrame and leaves the original unchanged. To keep the result, reassign it (df = df.fillna('Nan')) or pass inplace=True.
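Both pitfalls can be reproduced with a one-row DataFrame. The sketch below is illustrative only; the data and variable names are made up for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Krish'], 'city': ['Surat'], 'address': [None]})

# Pitfall 1: single brackets with two labels look up one column
# named ('name', 'city'), which does not exist.
try:
    df['name', 'city']
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Correct form: a list of labels inside the indexing brackets.
subset = df[['name', 'city']]
print(list(subset.columns))

# Pitfall 2: fillna() returns a new DataFrame; the original keeps its gap.
filled = df.fillna('Nan')
original_still_missing = bool(df['address'].isnull().any())
filled_still_missing = bool(filled['address'].isnull().any())
print(original_still_missing, filled_still_missing)
```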

Last Updated: April 2025