★ Capstone Project
Part 1 of 3
Loading Multiple Data Types
Work as a data analyst at a healthcare organization. Your first task: load three different data sources — an Excel file, a plain-text file, and a PDF — each requiring a different import approach.
Scenario
You're a newly hired data analyst at a healthcare organization. Three datasets have landed on your desk — each in a different format. Power Query can handle all of them. This part focuses on the import stage: connecting to each source, removing noise, and getting the data into a clean, typed, loadable state.
Lab Files — Part 1
Capstone Instructions
| Steps | Directions |
|---|---|
| Section 1 — Daily Census: Excel Import | |
| Open a blank Excel workbook and launch the Power Query editor. | |
| Load dailycensus_7665.xlsx (sheet: dailycensus) into Power Query. | |
| Remove the first row. | |
| Promote the first row as headers. | |
| Ensure effective_date is set to Date type. | |
Checkpoint 1What was the last effective date in the dataset?
| Steps | Directions |
|---|---|
| Section 2 — Drug Administrations: Text File Import | |
| Load drug_administrations.txt into a new query. | |
| Keep only prescription_date and prescribing_doctor_id — remove all other columns. | |
| Rename this query: prescription_dates_by_doctor. | Right-click the query name in the Queries pane |
| Close and load the data into the Excel workbook and view the resulting table. | |
| In an empty Excel cell, use COUNTIF() to count the total rows attributed to doctor "D0987". | Use the Excel sheet — not Power Query |
Checkpoint 2Doctor D0987 is expected to account for 5% of total prescriptions. What is their actual proportion?
| Steps | Directions |
|---|---|
| Section 3 — Patient Survey: PDF Import | |
| Load Patient_Experience_Survey_Template.pdf into a new query. | |
| In the Navigator window, preview each option and select the table that contains all 11 survey questions. | |
| Keep only the first and last columns — remove everything in between. | |
| Rename the two columns: question and questionid. | Double-click column headers to rename |
| Re-order columns so that questionid appears first. | Drag column header, or Transform › Move |
| Rename the query: PatientExperienceSurveyQuestions. | |
Checkpoint 3What is the M formula shown in the last applied step of the PatientExperienceSurveyQuestions query?
PDF import tip: Power Query detects tables from the document structure, and the Navigator will show multiple preview options. Always check each one — the correct table is usually the one that shows the cleanest row-per-question layout. Column headers from PDFs are often auto-generated, so renaming is always needed.