VNSGU BCA Sem 2: Data Analysis Using Python (205_04) Practical Solutions - Set B

Paper Details

  • Subject: Data Analysis Using Python (DAUP)
  • Subject Code: 205_04
  • Set: B
  • Semester: 2
  • Month/Year: April 2025
  • Max Marks: 25
  • Time Recommendation: 45 Minutes

Questions & Solutions

All questions are compulsory

Q1: Employee Data Analysis Pipeline

Max Marks: 20

Create a table EMPLOYEE in SQLite with the following columns: EMP_ID, EMP_NAME, DEPARTMENT, SALARY, JOIN_DATE using Python.

  1. Insert 10 records into the table.
  2. Display records where the salary is greater than 50,000.
  3. Export employee data and convert it into a Data Frame.
  4. Convert the Data Frame into an employee.csv file and save it.

1. SQLite Table Creation & Data Entry

Initialize the database and populate the employee records.

Hint

Use sqlite3.connect() to create the database file. Ensure EMP_ID is defined as PRIMARY KEY.

flowchart TD
start[Start]
conn[Connect to office.db]
tbl[Create EMPLOYEE Table]
ins[Bulk Insert 10 Records]
done[Commit & Close]

start --> conn
conn --> tbl
tbl --> ins
ins --> done
View Solution & Output
import sqlite3

# [1] Create database and table
conn = sqlite3.connect('office.db')
cursor = conn.cursor()

cursor.execute('''
    CREATE TABLE IF NOT EXISTS EMPLOYEE (
        EMP_ID INTEGER PRIMARY KEY,
        EMP_NAME TEXT,
        DEPARTMENT TEXT,
        SALARY REAL,
        JOIN_DATE TEXT
    )
''')

# Insert 10 records
employee_data = [
    (101, 'Amit', 'IT', 55000, '2022-01-10'),
    (102, 'Priya', 'HR', 45000, '2021-03-15'),
    (103, 'Rahul', 'Sales', 60000, '2023-05-20'),
    (104, 'Sia', 'IT', 72000, '2020-11-05'),
    (105, 'Aryan', 'Finance', 48000, '2022-08-12'),
    (106, 'Neha', 'HR', 52000, '2021-07-18'),
    (107, 'Vikram', 'IT', 65000, '2019-09-30'),
    (108, 'Sneha', 'Marketing', 42000, '2023-02-14'),
    (109, 'Karan', 'Sales', 58000, '2021-03-22'),
    (110, 'Mira', 'IT', 49000, '2022-12-01')
]

cursor.executemany('INSERT INTO EMPLOYEE VALUES (?,?,?,?,?)', employee_data)
conn.commit()
print("Table created and 10 records inserted.")

Step-by-Step Explanation:

  1. Initialization: Import sqlite3 and connect to 'office.db'.
  2. Logic Flow: Create the EMPLOYEE table and use executemany() to insert 10 sample records.
  3. Completion: Commit the transaction to save the data to disk.

2. High Salary Query

Fetch and filter employees based on salary criteria.

Hint

Use the SQL WHERE clause: SELECT * FROM EMPLOYEE WHERE SALARY > 50000.

flowchart TD
q[Query: SALARY > 50000]
f[Fetch Records]
p[Print Results]

q --> f
f --> p
View Solution & Output
# [2] Display records where salary > 50,000
cursor.execute('SELECT * FROM EMPLOYEE WHERE SALARY > 50000')
high_salary_emps = cursor.fetchall()

print("\nEmployees with Salary > 50,000:")
for row in high_salary_emps:
    print(row)

Step-by-Step Explanation:

  1. Initialization: Reuse the existing database cursor.
  2. Logic Flow: Execute a SELECT query to filter employees with SALARY > 50000 and retrieve them using fetchall().
  3. Completion: Iterate through the result set and print each record.
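The same query can also be written with a bound parameter instead of a hard-coded 50000. The sketch below is an illustrative variation, not part of the paper's required answer; it uses an in-memory database (':memory:') and two sample rows as stand-ins for office.db.

```python
import sqlite3

# Stand-in for office.db: an in-memory database with two sample rows.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT, SALARY REAL)')
cursor.executemany('INSERT INTO EMPLOYEE VALUES (?,?,?)',
                   [(101, 'Amit', 55000), (102, 'Priya', 45000)])

# Passing the threshold as a ? parameter lets SQLite handle quoting
# and guards against SQL injection when the value comes from user input.
threshold = 50000
cursor.execute('SELECT * FROM EMPLOYEE WHERE SALARY > ?', (threshold,))
rows = cursor.fetchall()
print(rows)  # [(101, 'Amit', 55000.0)]
conn.close()
```

In a viva, mentioning the `?` placeholder style is a common follow-up point, since string formatting the value into the SQL text is considered unsafe.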

3. Data Export to Pandas & CSV

Convert database records into a tabular format using Pandas and save to disk.

Hint

pd.read_sql_query() maps the table structure directly to a DataFrame, preserving column names.

flowchart TD
sql[Read from SQL]
df[Convert to Pandas DF]
csv[Save to employee.csv]

sql --> df
df --> csv
View Solution & Output
import pandas as pd

# [3] Export to Data Frame
df = pd.read_sql_query('SELECT * FROM EMPLOYEE', conn)
print("\nData Frame View:")
print(df.head())

# [4] Convert to CSV
df.to_csv('employee.csv', index=False)
print("\nFile 'employee.csv' saved successfully.")

conn.close()

Step-by-Step Explanation:

  1. Initialization: Import pandas and use pd.read_sql_query() to fetch all employee data.
  2. Logic Flow: Convert the SQL result set directly into a DataFrame, then use to_csv() to save it.
  3. Completion: Print confirmation messages and close the database connection.

Concept Deep Dive: Databases vs DataFrames

SQLite is excellent for storing data safely and permanently on disk. Pandas DataFrames are best for analyzing and manipulating data in memory. This exercise demonstrates the common data pipeline: Store (SQL) -> Load (Pandas) -> Export (CSV).
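To close the loop on the Store -> Load -> Export pipeline, a saved CSV can be read back into Pandas and compared with the original. This is a minimal sketch; the file name 'roundtrip.csv' and the two-row DataFrame are illustrative assumptions.

```python
import pandas as pd

# Round trip: DataFrame -> CSV -> DataFrame.
df = pd.DataFrame({'EMP_ID': [101, 102], 'SALARY': [55000.0, 45000.0]})
df.to_csv('roundtrip.csv', index=False)   # same call as the main solution

# read_csv rebuilds the DataFrame, inferring column types from the file.
df_back = pd.read_csv('roundtrip.csv')
print(df_back.equals(df))  # True: the data survives the round trip
```

Note that to_csv(index=False) matters here: without it, the row index is written as an extra column and the round trip no longer matches.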

Q2: Viva Preparation

Max Marks: 5

Potential Viva Questions
  1. Q: How do you handle decimal numbers in SQLite?
     A: Use the REAL or NUMERIC data type.
  2. Q: What is the purpose of import sqlite3?
     A: It provides the API needed to interact with SQLite databases directly from Python scripts.
  3. Q: How do you read a CSV file back into Pandas?
     A: Use pd.read_csv('filename.csv').
  4. Q: What is the difference between executemany() and calling execute() in a loop?
     A: executemany() is more efficient because it optimizes the database transaction for multiple records.
  5. Q: How do you find the highest salary in the SQLite table via Python?
     A: cursor.execute('SELECT MAX(SALARY) FROM EMPLOYEE') followed by cursor.fetchone().
  6. Q: What does conn.rollback() do?
     A: It undoes all changes made in the current transaction if an error occurs before committing.
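Two of the viva answers above (MAX() with fetchone(), and rollback()) can be demonstrated in a few lines. This sketch uses an in-memory database with two assumed sample rows rather than the office.db file from Q1.

```python
import sqlite3

# In-memory stand-in for office.db with two sample salaries.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, SALARY REAL)')
cursor.executemany('INSERT INTO EMPLOYEE VALUES (?,?)',
                   [(101, 55000), (102, 72000)])
conn.commit()

# Highest salary: aggregate with MAX(), then fetchone() returns one row.
cursor.execute('SELECT MAX(SALARY) FROM EMPLOYEE')
max_salary = cursor.fetchone()[0]
print(max_salary)  # 72000.0

# rollback(): the uncommitted DELETE is undone, so both rows survive.
cursor.execute('DELETE FROM EMPLOYEE')
conn.rollback()
cursor.execute('SELECT COUNT(*) FROM EMPLOYEE')
count = cursor.fetchone()[0]
print(count)  # 2
conn.close()
```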

Common Pitfalls

  • Data Persistence: Changes are not saved to the .db file until you call conn.commit().
  • Header Mismatch: When exporting, ensure your SQL query selects all required columns to match your DataFrame expectations.
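One way to sidestep the commit pitfall is to use the connection as a context manager: sqlite3 commits the transaction when the with-block exits cleanly and rolls it back on an exception (it does not close the connection). A minimal sketch, again using ':memory:' in place of office.db:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # stand-in for office.db
conn.execute('CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, EMP_NAME TEXT)')

# 'with conn:' commits on success and rolls back on an exception,
# so a forgotten conn.commit() cannot silently lose the insert.
with conn:
    conn.execute('INSERT INTO EMPLOYEE VALUES (?,?)', (101, 'Amit'))

rows = conn.execute('SELECT * FROM EMPLOYEE').fetchall()
print(rows)  # [(101, 'Amit')]
conn.close()
```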

Last Updated: April 2025