Executive Summary

This is a data analytics project focused on evaluating e-commerce purchasing data to identify and verify Flexible Spending Account (FSA) eligible expenses. Conducted by JKLM Data Analytics in March 2026, the project involves processing and auditing raw Amazon transaction data against official tax compliance guidelines.

Objective

The primary goal of this project was to systematically audit raw purchasing data to determine the eligibility of various medical and dental expenses. This required mapping transaction records against the stringent guidelines outlined in IRS Publication 502: Medical and Dental Expenses, which dictates allowable itemized deductions.

Key Questions

How can we effectively identify and map the key variables within the raw order history to distinguish FSA-eligible purchases from non-eligible ones?
How can we scrub and generalize the data to protect privacy while still permitting the end user to accurately query for FSA eligibility?

Functional Requirements

FR-01: Search Criteria – The schema must include a flag or category that maps the item description against the standard list of FSA-eligible categories.
FR-02: Data Sanitization (Privacy) – The system must automatically apply hashing logic to item names and generalize dates/pricing to protect user privacy before the data is made available for querying.
FR-03: Output Format – The final curated dataset must be exported in .csv format for seamless end-user retrieval.
FR-04: Accuracy Disclaimer – The final data output shall display a prominent disclaimer stating that the product’s identification of HSA/FSA eligible items is for informational purposes only. The notice must explicitly state that the project does not guarantee that items meet the requirements defined in IRS Publication 502 and that final verification and user discretion are required before submitting for reimbursement.
FR-05: Documentation & Archival – The project must generate a final technical report and a web-published summary (hosted on jklmdata.net) that is archived in both GitHub and Google Workspace.

Project Scope & Deliverables

To ensure a structured and well-documented analytical process, the project was broken down into core administrative and technical phases. The key phases and supporting deliverables include:

Project Definition

Deliverables: Project Proposal, Project Plan
Objective: Define the scope, key questions, goals, functional requirements, actionable steps, and deliverables needed to ensure project success.

Prepare the Environment

Objective: Create the Google Workspace environment, the GitHub repository, and configure access permissions. Construct the R Project.
Deliverables: The project workspace is linked HERE (access limited to project participants only). The project GitHub repository is linked HERE.

Exploratory Data Analysis (EDA)

Objective: Preliminary R script development to ingest data; select the key variables from the original dataset; evaluate the key variables for privacy content; define sanitization criteria for privacy data; and conduct iterative test & development for sanitization and visualization techniques.
Deliverables: Engineering notebook for EDA developed in Quarto.

Development & Test

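Objective: Finalize the script coding for the following development phases: sanitization (scrubber logic), FSA eligibility testing, customer report generation, and verification testing.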
Deliverables: R Scripts (GitHub). Functional Requirements Verification Matrix.

Documentation

Objective: Finalize customer reports and version lock the GitHub Repository in preparation for project review. Publish the project summary.
Deliverables: The project workspace is linked HERE (access limited to project participants only). The project GitHub repository is linked HERE. The published summary is documented on this page.

Conclusion

Through the systematic execution of these project phases, JKLM Data Analytics has developed a robust framework for auditing e-commerce transactions against complex IRS guidelines. The resulting output not only safeguards consumer privacy through automated sanitization but also provides a clear, categorized dataset for FSA eligibility review. The following sections of this report will detail the methodologies, exploratory data analysis, and technical logic used to achieve these functional requirements.

Installation & Setup

Prerequisites

To run this project you need R (v4.0 or higher) and RStudio Desktop installed.

Obtain the Project Files

The source code, raw dataset, and documentation are hosted on GitHub.

Navigate to the Project fsa-tool GitHub repository.
Click the green Code button and select Download ZIP.
Extract the ZIP file to a dedicated directory on your local machine (e.g., C:\Documents\Project-FSA).

Initialize the Project

This project uses an .Rproj file to ensure reproducibility. By opening the project through this file, RStudio automatically sets the working directory to the project root, ensuring that all file paths (for data ingestion and exports) function correctly without the need for manual adjustment.

Navigate to your local project folder and double-click fsa-tool.Rproj.

Configure the Environment

To ensure the project runs with the exact package versions and settings used during development, follow these three configuration steps:

A. Handle Environment Variables ( `.Renviron` )

This project can pull raw data directly from Google Drive. To do this securely, it uses environment variables. If you plan to use only the local trainer dataset, this step is not necessary.

Locate the template: Find the file named .Renviron_example in the root directory.
Create your local file: Copy this file and rename it to exactly .Renviron.
Update Credentials: Open your new .Renviron and add your specific Google Service Account paths and emails.
Security: DO NOT commit your .Renviron to version control. It is already included in the .gitignore to prevent sensitive API keys from being published to GitHub.

RStudio will automatically load these variables when you open the .Rproj file.
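As a rough illustration of how a script might confirm that these variables were loaded, the sketch below reads two values with Sys.getenv() and reports any that are missing. The names GDRIVE_SERVICE_ACCOUNT_JSON and GDRIVE_USER_EMAIL are hypothetical placeholders; the real names are whatever .Renviron_example defines.

```r
# Hypothetical variable names; the real names are defined in .Renviron_example.
required_vars <- c("GDRIVE_SERVICE_ACCOUNT_JSON", "GDRIVE_USER_EMAIL")

# Return the names of any required variables that are unset or empty.
missing_env_vars <- function(vars) {
    values <- Sys.getenv(vars, unset = "")
    vars[values == ""]
}

missing <- missing_env_vars(required_vars)
if (length(missing) > 0) {
    message("Missing environment variables: ", paste(missing, collapse = ", "))
} else {
    message("All Google Drive environment variables are set.")
}
```

A check like this fails fast with a readable message instead of a less obvious authentication error later in the pipeline.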

B. Restore the Package Library

As noted in the 00_setup.R script, this project uses renv to create a private, isolated library. This prevents version conflicts with your other R projects. To synchronize your local library with the project requirements, run:

# Run this in the RStudio Console
renv::restore()

This command will read the renv.lock file and automatically install the correct versions of tidyverse, digest, googledrive, and other dependencies.

C. Initialize the Session ( `00_setup.R` )

Once the packages are installed, you must initialize your R session. This script loads all required libraries, configures "blank slate" settings, and sets global options (like preventing scientific notation for Order IDs).

Run the following command in your console: source("./scripts/00_setup.R")

You should see the message: Setup complete: All libraries loaded and options configured.
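For orientation, a setup script of this kind might look roughly like the sketch below. This is an assumption about its general shape, not a copy of 00_setup.R; the actual script loads more packages and may set other options.

```r
# Minimal sketch of a setup script; the actual 00_setup.R may differ.
suppressPackageStartupMessages({
    library(tidyverse)
    library(janitor)
})

# Avoid scientific notation when large numeric identifiers are printed.
options(scipen = 999)

message("Setup complete: All libraries loaded and options configured.")
```

With scipen raised, format(1e11) prints the full digits rather than exponent notation.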

Implementation

This project demonstrates the use of data wrangling skills to convert personal order history data into a format that can be searched for purchases that meet IRS FSA eligibility criteria. The project leverages the R tidyverse packages to convert the raw order history data into tidy format. The data is then queried to yield the order history relevant to FSA-eligible purchases. Because the order history is privacy sensitive, the data must be sanitized to generalize dates, mask personal information, and obfuscate price data, while still permitting the end user to use the output for follow-on tasks. A successful implementation approach will:

Ingest Amazon Order History data from an access-controlled Google Drive folder.
Conduct Exploratory Data Analysis (EDA).
Scrub for Personally Identifying Information (PII) and verify that no unmasked sensitive data remains in the working dataset.
Develop the logic to identify FSA-eligible items based on descriptions. Append the IRS Publication 502 usage disclaimer.
Create the final query scripts that allow an end user to yield FSA-eligible data. Assess the results to ensure the mapping logic is accurate and the privacy masks are persistent.

R Packages

The following libraries and options are called from 00_setup.R and are required for implementation:

tidyverse: Core data manipulation (dplyr, stringr, lubridate, etc.)
rmarkdown: Document rendering
knitr: Powers R Markdown by handling the execution of embedded R code
kableExtra: Extends the basic functionality of tables produced by the knitr package
scales: Provides functions for human readable labels for axes and legends
usethis: Automates repetitive tasks that arise during project setup and development
googledrive: Google Workspace API integration
janitor: Contains additional tidyverse-oriented tools for cleaning "dirty" data
webshot2: Captures screenshots of web pages for document preparation
pagedown: Paginates HTML output for printing, allowing full control of a document's borders

Note: Because we are using renv, we trust these packages are already installed in the project library.

Ingest the Data

In order for the audit to run, the raw Amazon Order History must be made available to the 01_data_ingestion.R script. Because of privacy concerns, the project expects the data to reside in an access-controlled Google Drive folder; however, a trainer dataset (./trainer_data/Amazon_FSA_Audit_Trainer.csv) has been added to the project to permit usage by a wider audience. The script reads the trainer dataset when Option B (Read trainer dataset from local project folder) is uncommented and Option A is commented out. The excerpt below shows the script with Option A active and Option B commented out.

#============================================================================================================================
# START Option A. Read customer dataset from Google Drive
# comment everything between START and END if reading trainer data from local project folder
#============================================================================================================================

# 2. Identify your Drive Folder (Use the folder name or ID)
drive_folder <- drive_ls(path = "~/Projects/R Project: Transform Amazon Order History for FSA Reimbursements/Private/Secure_client_data_PII", pattern = "\\.csv$")


# 3. Function to Download -> Read -> Cleanup
read_drive_csv <- function(drive_file) {
    # Create a temporary path on your local machine
    temp_path <- tempfile(fileext = ".csv")

    # Download from Drive to that temp path
    drive_download(drive_file, path = temp_path, overwrite = TRUE)

    # Read the CSV into R
    message("Ingesting data into R...")
    data <- read_csv(temp_path, col_types = cols(.default = "c")) |>
        janitor::clean_names()

    # DELETE THE FILE from your hard drive
    unlink(temp_path)
    message("Temporary file destroyed.")

    return(data)
}

# 4. Use map_df to iterate through the drive_folder list
amazon_combined <- drive_folder %>%
    split(.$id) %>% # Split the tibble so map sees individual files
    map_df(~ read_drive_csv(.x), .id = "drive_id")

# (Optional) Remove the Drive metadata from the environment
rm(drive_folder)

#============================================================================================================================
# END Option A. 
#============================================================================================================================

#============================================================================================================================
# START Option B. Read trainer dataset from local project folder
# comment everything between START and END if reading trainer data from Google Drive
#============================================================================================================================

# # 2. Load your raw data
# amazon_combined <- read_csv("./trainer_data/Amazon_FSA_Audit_Trainer.csv", col_types = cols(.default = "c") )|>
#     janitor::clean_names()

#============================================================================================================================
# END Option B. 
#============================================================================================================================

Exploratory Data Analysis

This phase serves as a "Raw Data Audit" to ensure the integrity and privacy of the dataset before the final Tidyverse transformation. For a detailed breakdown of the dataset variables, sample sanitization code, and the mapping categories for FSA-eligible product names, please refer to the complete Exploratory Data Analysis Notebook.

1. Data Profiling & Cleaning

A command line run of str() on the Amazon Order History dataset identifies 9,407 rows across 29 columns. The audit identified "grimy" data – such as HTML artifacts, encoding issues, and information irrelevant to Key Questions – that required pre-processing before parsing.

2. Key Variable Selection

Out of 29 original variables, the EDA identified 7 core variables essential for the FSA eligibility summary:

drive_id: Identifies unique order history datasets, used to differentiate between multiple household members.
order_id: Retained for traceability to original invoices without compromising privacy.
order_date: Identified as sensitive; findings recommend generalizing this to Month/Year only to minimize data profiling.
product_name: The primary field for identifying FSA eligibility; requires categorization to mask specific items.
total_amount: Identified for masking with "random noise" to prevent precise spending behavior mining.
order_status: Used to filter only for "Closed" (finalized) purchases.
currency: Retained to ensure proper handling of transactions across currency types.
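The selection above can be sketched with dplyr as follows. The toy amazon_combined tibble and its values are invented for illustration, and the extra website column stands in for the 22 variables that are dropped.

```r
library(dplyr)

# Toy stand-in for amazon_combined; values are invented for illustration.
amazon_combined <- tibble(
    drive_id = c("file_a", "file_a", "file_b"),
    order_id = c("111-0000001", "111-0000002", "222-0000001"),
    order_date = c("2024-06-08T03:54:26Z", "2024-07-01T10:00:00Z", "2024-07-15T18:30:00Z"),
    product_name = c("Digital thermometer", "USB cable", "Adhesive bandages"),
    total_amount = c("12.99", "8.49", "5.25"),
    order_status = c("Closed", "Closed", "Cancelled"),
    currency = c("USD", "USD", "USD"),
    website = c("Amazon.com", "Amazon.com", "Amazon.com")
)

key_vars <- c("drive_id", "order_id", "order_date", "product_name",
              "total_amount", "order_status", "currency")

# Keep only the seven key variables and only finalized ("Closed") purchases.
amazon_key <- amazon_combined |>
    select(all_of(key_vars)) |>
    filter(order_status == "Closed")
```

Using all_of() makes the selection fail loudly if one of the expected columns is missing from the export.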

Depicted below is an excerpt from the Amazon Order History dataset, amazon_combined.csv, which is modified only to remove personally identifying information.

drive_id	asin	billing_address	carrier_name_&_tracking_number	currency	gift_message	gift_recipient_contact	gift_sender_name	item_serial_number	order_date	order_id	order_status	original_quantity	payment_method_type	product_condition	product_name	purchase_order_number	ship_date	shipment_item_subtotal	shipment_item_subtotal_tax	shipment_status	shipping_address	shipping_charge	shipping_option	total_amount	total_discounts	unit_price	unit_price_tax	website
10AxRBXfFP4Yel1gJ_vs85rvWLYbWIEKF	B09BVXT8TJ	(Customer Name Removed)	AMZN_US(TBA313683553136)	USD	Not Available	Not Available	Not Available	Authenticity_2D=AZ:NTTQE4SP6JGOPKPROAJ849WI4A	2024-06-08T03:54:26Z	114-0028800-8904200	Closed	1	Visa – XXXX	New	SHOKZ OpenRun Pro – Open-Ear Bluetooth Bone Conduction Sport Headphones – Sweat Resistant Wireless Earphones for Workouts and Running with Premium Deep Base – Built-in Mic, with Hair Band	Not Applicable	2024-06-08T05:06:17Z	309.85	21.7	Shipped	(Shipping Address Removed)	0	rush	192.55	0	179.95	12.6	Amazon.com
10AxRBXfFP4Yel1gJ_vs85rvWLYbWIEKF	B0BQPW9R9H	(Customer Name Removed)	AMZN_US(TBA313683553136)	USD	Not Available	Not Available	Not Available	Not Available	2024-06-08T03:54:26Z	114-0028800-8904200	Closed	1	Visa – XXXX	New	JBL Endurance Peak 3 – True Wireless Headphones (Black), Small	Not Applicable	2024-06-08T05:06:17Z	309.85	21.7	Shipped	(Shipping Address Removed)	0	rush	106.95	0	99.95	7	Amazon.com

3. Sanitization Strategy

The EDA identified the need for and demonstrated several "proof-of-concept" sanitization strategies.

Customer Anonymization: The drive_id variable contains unique values corresponding to the files of individual household members before the data frames were merged as one. These values will be used later to construct distinct user names associated with the observations, namely, cust_1 and cust_2.
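One possible way to derive such labels is to number each distinct drive_id in order of first appearance, as in the sketch below. The drive_id values are invented, and this is an assumed approach rather than the project's exact code.

```r
library(dplyr)

# Toy data; real drive_id values are Google Drive file IDs.
df <- tibble(drive_id = c("file_b", "file_a", "file_b", "file_a"))

# Number each distinct drive_id in order of first appearance, then drop the raw ID.
df <- df |>
    mutate(cust_id = paste0("cust_", match(drive_id, unique(drive_id)))) |>
    select(-drive_id)
```

Dropping drive_id after the mapping keeps the Drive file identifier out of the sanitized dataset.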

Date Generalization: The order_date variable should be generalized to display only the month and year in the sanitized dataset. Its values are strings in the ISO 8601 format “YYYY-MM-DDTHH:MM:SSZ”, which the lubridate function ymd_hms() parses directly; the month and year are then extracted and pasted together as a month-year string (e.g., 6-2024). This is accomplished as follows:

df <- df |> mutate(order_date = ymd_hms(order_date)) |>
    mutate(order_date = paste0(month(order_date), "-", year(order_date)))
head(df)
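If a zero-padded month is preferred, so that the strings have a fixed MM-YYYY width, one alternative is to format the parsed timestamp directly. This is a sketch of that variation, not the project's code, and the sample timestamps are illustrative.

```r
library(lubridate)

order_date <- c("2024-06-08T03:54:26Z", "2023-11-30T22:15:00Z")

# Parse the ISO 8601 timestamp, then keep only a zero-padded month and the year.
month_year <- format(ymd_hms(order_date), "%m-%Y")
month_year
```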

Filter for FSA Eligibility: The objective here is to construct a regular expression (regex) pattern and use it with stringr tools to detect matches within the product_name column. Reducing the dataset to FSA-eligible items is then simply a matter of filtering on a string search.

# Add product_name and order_id back to the data frame.
df <- df |> add_column(product_name = amazon_combined$product_name) |>
    add_column(order_id = amazon_combined$order_id) |>
    select(cust_id, order_id, order_date, product_name, currency, total_amount)

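# Build the pattern from two string pieces; a line break inside a single string literal would become part of the regex.
fsa_keywords <- regex(paste0("face\\smask|FSA\\sHSA|N95\\smask|thermometer|first\\said|bandage|sunscreen|light\\stherapy|medical|",
              "brace|sanitizer|tylenol|\\sadvil\\s|covid|therapy"), ignore_case = TRUE)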
fsa_candidates <- df |> filter(str_detect(product_name, fsa_keywords))  # filter dataset for FSA Eligible matches
message("Number of FSA Eligible items found: ", nrow(fsa_candidates))

Generalize FSA Products by Category: In the Filter for FSA Eligibility section above, we included about 15 keywords for pattern detection, but we actually want to group them into 4 or 5 clean, broad buckets (e.g., putting all masks and sanitizers into a “Personal Protective Equipment” category). We use case_when() logic for this. The fsa_category column generated by this step permits us to represent the products in a public-facing dataset. The first 10 observations of this FSA-eligible filter result are tabulated.

fsa_candidates <- fsa_candidates |>
  mutate(
    fsa_category = case_when(
      # Bucket 1: PPE & Prevention
      str_detect(product_name, regex("face\\smask|N95\\smask|sanitizer|covid", ignore_case = TRUE)) ~ "PPE & Prevention",
      
      # Bucket 2: Medical Devices
      str_detect(product_name, regex("thermometer|brace|light\\stherapy|therapy|medical", ignore_case = TRUE)) ~ "Medical Devices",
      
      # Bucket 3: First Aid & OTC Meds
      str_detect(product_name, regex("first\\said|bandage|tylenol|advil", ignore_case = TRUE)) ~ "First Aid & OTC",
      
      # Bucket 4: General FSA/HSA
      str_detect(product_name, regex("FSA\\sHSA|sunscreen", ignore_case = TRUE)) ~ "General FSA/HSA",
      
      # Default: If it doesn't match above, label it NA
      TRUE ~ NA_character_
    )
  ) |> select(cust_id, order_id, order_date, product_name, fsa_category, currency, total_amount)

# Prepare for tabulation
df <- fsa_candidates |> select(-order_id, -product_name)

kable(df[1:10, ]) |> kable_styling(bootstrap_options = c("striped", "bordered"), full_width = FALSE) #display fsa-eligible observations

Price Obfuscation: For the final variable in our key variable selection, we apply uniform random noise (+/- 25%) to the total_amount variable. This should mitigate the risk of a bad-faith actor indirectly identifying the item purchased from its price.

# Modify `total_amount` from its original value by a randomly selected +/- 25% margin
fsa_candidates <- fsa_candidates |> mutate(total_amount = as.numeric(total_amount) * runif(n(), 0.75, 1.25)) |>
    mutate(total_amount = dollar(total_amount))    # add currency symbol (scales::dollar)

Development & Test

During this phase, the project moved from the "Sandbox" (EDA) into a structured R pipeline. This involved developing three distinct script modules and a validation test known as the Accuracy Audit.

1. Implementing the Scrubber Logic (Privacy & Sanitization)

The sanitization script module satisfies the privacy requirements (FR-02).

Identity Mapping: The drive_id field is converted into a unique customer ID. This allows for data grouping (e.g., "Customer A" vs "Customer B") without revealing the user’s actual identity.
Temporal Generalization: High-fidelity order_date timestamps were "bucketed" into Month/Year formats to prevent the mining of specific shopping habits while retaining the ability to calculate annual FSA totals.
Price Obfuscation: The total_amount field underwent "jittering" (adding uniform random noise to the price) to protect the exact transaction totals, while still permitting the filtered data to be sorted by relative magnitude.
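FR-02 also calls for hashing item names. A minimal sketch of how that could be done with the digest package is shown below; the hash_names() helper, the salt value, and the choice of SHA-256 are assumptions made for illustration and are not taken from the project scripts.

```r
library(digest)

# Hash each product name individually; digest() is not vectorized over strings.
# hash_names() and its salt argument are hypothetical helpers for this sketch.
hash_names <- function(x, salt = "") {
    vapply(
        x,
        function(name) digest(paste0(salt, name), algo = "sha256", serialize = FALSE),
        character(1),
        USE.NAMES = FALSE
    )
}

product_name <- c("Adhesive bandages", "Digital thermometer", "Adhesive bandages")
hashed <- hash_names(product_name, salt = "example-salt")
```

Identical names map to identical hashes, so repeated purchases can still be grouped without exposing the product text.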

2. FSA Eligibility Testing (Tidyverse Transformation)

The heart of the project is the transformation logic that identifies eligible items. This script module utilizes stringr and dplyr to perform "fuzzy matching" on product descriptions.

The Mapping Engine: A dictionary of keywords was built based on IRS Publication 502. The script scans the product_name column for matches (e.g., "Bandage," "Sunscreen," "Monitor," "Saline").
Filtering Logic: The script ignores non-medical categories identified in the EDA (e.g., Electronics, Grocery) and extracts only items with a "True" match status.
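The dictionary-and-filter idea can be sketched as a small lookup table joined to the product names by keyword detection. The keywords, categories, product names, and the lookup_category() helper below are illustrative only and do not reproduce the project's dictionary.

```r
library(dplyr)
library(stringr)

# Illustrative dictionary; the keywords and categories are examples only.
fsa_dictionary <- tibble(
    keyword = c("bandage", "sunscreen", "thermometer", "sanitizer"),
    fsa_category = c("First Aid & OTC", "General FSA/HSA", "Medical Devices", "PPE & Prevention")
)

products <- tibble(
    product_name = c("Flexible Fabric Bandages, 100 ct", "USB-C Charging Cable", "SPF 50 Sunscreen Lotion")
)

# Return the category of the first dictionary keyword found in a name, or NA.
lookup_category <- function(name) {
    hit <- str_detect(name, regex(fsa_dictionary$keyword, ignore_case = TRUE))
    if (any(hit)) fsa_dictionary$fsa_category[which(hit)[1]] else NA_character_
}

# Tag every product, then keep only those with a keyword match.
matched <- products |>
    mutate(fsa_category = vapply(product_name, lookup_category, character(1), USE.NAMES = FALSE)) |>
    filter(!is.na(fsa_category))
```

Keeping the keywords in a table rather than in one long pattern makes it easier to add, remove, or re-categorize terms after a validation pass.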

3. Customer Report Generation

The final stage of the pipeline transforms the filtered and sanitized data into a professional, audit-ready document. To create the final report, our system uses an automated version of a web browser to ‘print’ the data into a professional, high-quality PDF.

Format & Layout: The report is generated as a branded PDF in landscape orientation. This layout was specifically chosen to provide ample room for long product descriptions while maintaining a clean, readable structure for financial review.
Engine & Libraries: The report is built using a combination of kableExtra for complex table styling and the pagedown package to handle the HTML-to-PDF conversion. This allows for fine-grained control over typography (Cambria/Lora) and CSS-based "Zebra Striping" for better row readability.
Professional Branding: Each report is dynamically generated with the JKLM Data Analytics logo, a synchronized "Report Generated" timestamp, and professional headers.
Disclaimer Injection: Per requirement FR-04, every report includes a hard-coded footer disclaimer, reminding users that the output is for informational purposes and requires manual verification against IRS rules.
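A stripped-down sketch of the table-plus-disclaimer step is shown below, using knitr and kableExtra to build an HTML table with a footnote. The sample rows and the disclaimer wording are placeholders, and branding, landscape layout, and fonts are omitted. The commented-out pagedown::chrome_print() call indicates where HTML-to-PDF conversion could happen; it needs a local Chrome or Chromium installation.

```r
library(knitr)
library(kableExtra)

# Toy rows standing in for the sanitized report data; values are invented.
report_data <- data.frame(
    cust_id = c("cust_1", "cust_2"),
    order_date = c("06-2024", "07-2024"),
    fsa_category = c("First Aid & OTC", "Medical Devices"),
    total_amount = c("$5.10", "$13.42")
)

# Placeholder wording; the report's actual disclaimer text is governed by FR-04.
disclaimer <- "This output is for informational purposes only and must be verified against IRS Publication 502."

# Build a striped HTML table and attach the disclaimer as a footnote.
report_table <- kable(report_data, format = "html") |>
    kable_styling(bootstrap_options = c("striped", "bordered"), full_width = FALSE) |>
    footnote(general = disclaimer, general_title = "Disclaimer: ")

# Write the HTML to a temporary file.
html_file <- file.path(tempdir(), "fsa_report_sketch.html")
writeLines(as.character(report_table), html_file)

# PDF conversion (requires Chrome or Chromium), left commented out in this sketch:
# pagedown::chrome_print(input = html_file, output = file.path(tempdir(), "fsa_report_sketch.pdf"))
```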

4. Verification Testing

To ensure the integrity of the project, every feature was tested against the requirements defined in the initial project plan. This systematic approach ensures that the final pipeline is not just functional, but compliant and reliable.

Verification Framework: The core of this effort is the Functional Requirements Verification Matrix (see below). Each requirement (labeled FR-01 through FR-05) was assigned a specific Evaluation Method, ranging from manual inspection to automated code audits.
Summary of Results: The Verification Testing phase concluded with a 100% Pass Rate. By documenting the "Objective Quality Evidence" (OQE) for each step, we have created a transparent audit trail that proves the project is ready for real-world application.

Functional Requirements Verification Matrix: Transform Amazon Order History for FSA Reimbursements
Requirement Number	Name	Criteria	Evaluation Method	Objective Quality Evidence (OQE)	Test Result
FR-01	Search Criteria	The schema must include a flag or category that maps the item description against the standard list of FSA-eligible categories. The mapping logic shall yield product names / order id that identify FSA-Eligible Items.	Inspection / Accuracy Audit	Successful query execution yielding only mapped FSA categories; visual inspection of the schema. Comparison report between Query Output and IRS Pub 502 / Amazon Item Pages.	Pass
FR-02	Data Sanitization (Privacy)	The system must automatically apply hashing logic to item names and generalize dates/pricing to protect user privacy before the data is made available for querying.	Test / Code Review	Code review of the digest library implementation; inspection of the pre-query dataset to confirm PII is masked and hashed.	Pass
FR-03	Output Format	The final curated dataset must be exported in a tabulated .pdf format for seamless end-user retrieval.	Inspection	Presence of a valid, formatted .pdf file upon pipeline completion.	Pass
FR-04	Accuracy Disclaimer	The final data output shall display a prominent disclaimer stating that the product’s identification of HSA/FSA eligible items is for informational purposes only. The notice must explicitly state that the project does not guarantee that items meet the requirements defined in IRS Publication 502 and that final verification and user discretion are required before submitting for reimbursement.	Inspection	Visual confirmation of the disclaimer text embedded within the final output/report.	Pass
FR-05	Documentation & Archival	The project must generate a final technical report and a web-published summary (hosted on jklmdata.net) that is archived in both GitHub and Google Workspace.	Inspection	Live URLs to the jklmdata.net summary, the GitHub repository, and the Google Workspace shared drive containing the final technical report.	Pass

FSA Eligibility Customer Report

The FSA Eligibility Customer Report is the final deliverable of the #Project-FSA pipeline. This document transforms a "messy" raw Amazon order history into a structured, audit-ready summary of potential reimbursement candidates.

What’s Inside the Report?

This report provides a curated view of transactions identified by the mapping engine as FSA-eligible. Key features include:

Categorical Grouping: Purchases are automatically organized into IRS-aligned categories (such as First Aid & OTC and PPE & Prevention), allowing users to quickly identify related expenses.
Privacy-Preserving Identifiers: To maintain data security, the report uses privacy-protected Customer IDs (e.g., cust_1) and generalized Order Dates (Month-Year), so that individual shopping habits remain protected while the data stays useful for annual tax reporting.
Transactional Detail: Each entry retains its original Order ID and Product Name for traceability, alongside the Total Amount and Currency for precise reimbursement tracking. Note: The product name in this example was masked for privacy protection.
Professional Branding & Compliance: The document features full JKLM Data Analytics branding and includes a required Accuracy Disclaimer to ensure users understand the informational nature of the automated audit.


Validation Analysis

This section details the validation of the preliminary FSA-eligible candidate summary to ensure alignment with current IRS guidelines. Our review process evaluated the accuracy of the proposed candidates, carefully distinguishing between auto-eligible items and those requiring a Letter of Medical Necessity (LMN). In addition to validating the initial list, we conducted a targeted spot-check of the broader Amazon combined dataset. This secondary analysis was designed to identify additional high-value FSA keywords, surface missed product categories, and ensure maximum capture of eligible items across the dataset.

Customer Report Validation

For this activity, we compared each product name identified in the FSA-Eligible Candidate summary (85 items in total) against reputable FSA eligibility lists and AI-assisted search tools. The validation results (tabulated below) show that the tool achieved a 67% success rate in identifying FSA-eligible items: 46 items qualify by default and 11 qualify only with a Letter of Medical Necessity (LMN).

A root-cause analysis of the error rate revealed that the keywords producing the largest number of false positives included brace (11), therapy (6), medicine (3), and massage (3). Context analysis of the brace term showed that the word bracelet (an unintended match) appeared in the search results 9 times. The term therapy was assessed to be too broad, appearing 17 times, often flagging general wellness items that require a Letter of Medical Necessity (LMN) rather than being auto-eligible. Furthermore, the term medicine was deemed superfluous, as valid FSA-eligible products containing this word are usually captured by other, more specific keywords.

In conclusion, it is assessed that the tool’s success rate could improve to approximately 80% if we: (1) update the matching logic to eliminate medicine as a keyword; (2) restrict brace and therapy to match only as standalone words; and (3) require secondary, qualifying keywords (e.g., physical therapy or cold therapy) when the term therapy is encountered.
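The second and third recommendations can be sketched with word boundaries and a required qualifier. The pattern below is an illustration, not the project's updated logic; the qualifier list (physical, cold, light) and the sample product names are assumptions chosen for the example.

```r
library(stringr)

# "brace" must be a whole word (optionally plural), and "therapy" must follow
# a qualifying word. The qualifier list is illustrative.
fsa_keywords_strict <- regex(
    paste0(
        "\\bbraces?\\b|",
        "\\b(physical|cold|light)\\stherapy\\b"
    ),
    ignore_case = TRUE
)

product_name <- c(
    "Adjustable Knee Brace for Running",
    "Sterling Silver Charm Bracelet",
    "Cold Therapy Gel Pack",
    "Aromatherapy Candle Gift Set",
    "Retail Therapy Novelty Mug"
)

is_match <- str_detect(product_name, fsa_keywords_strict)
is_match
```

Only the knee brace and the cold therapy pack match; "Bracelet", "Aromatherapy", and the unqualified "Therapy" are rejected.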

Total Candidates	Not Eligible Misclassified	Not Eligible Personal Care	Error Rate	FSA-Eligible Default	FSA-Eligible w/ LMN	Success Rate
85	24	4	32.9%	46	11	67.1%
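The rates in the table follow directly from the counts, as this short check shows.

```r
total_candidates <- 85
not_eligible <- 24 + 4        # misclassified + personal care
eligible <- 46 + 11           # eligible by default + eligible with an LMN

error_rate <- not_eligible / total_candidates
success_rate <- eligible / total_candidates

# Share of candidates that are eligible without an LMN.
default_only_rate <- 46 / total_candidates

sprintf("Error rate: %.1f%%, success rate: %.1f%%", 100 * error_rate, 100 * success_rate)
```

Counting only items that are eligible by default (46 of 85) gives about 54.1%, so the 67.1% figure depends on treating LMN items as successes.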

Spot-Check Product Names from Amazon Order History

For this activity, we spot-checked a random selection of the tidy Amazon Order History to identify missed products and refine our keyword search criteria. We randomly sampled 1% of the order history, repeating this process three separate times (the R script is depicted below). As a result of this effort, we identified three additional terms to add to our search parameters: surgical grade, ergonomic, and fitness. Because these terms frequently flag dual-purpose items or products requiring a Letter of Medical Necessity (LMN), they will be tagged for secondary manual review rather than automatic eligibility.

# create a subset of randomly selected  rows from the original dataset (without replacement)

spot_check <- amazon_tidy |> sample_frac(size = 0.01)

# output the data to a spreadsheet

write_csv(spot_check, "C:/Users/laree/Downloads/FSA_spot-check.csv")
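The three repeated draws can also be produced in one pass. The sketch below uses a toy amazon_tidy of 1,000 rows and arbitrary seeds so that the draws are repeatable; slice_sample() is used here as the current dplyr equivalent of sample_frac().

```r
library(dplyr)
library(purrr)

# Toy stand-in for amazon_tidy (1,000 rows) so the sketch runs on its own.
amazon_tidy <- tibble(order_id = sprintf("ORD-%04d", 1:1000))

# Draw three independent 1% samples; the seeds are arbitrary.
spot_checks <- map(c(101, 102, 103), function(seed) {
    set.seed(seed)
    slice_sample(amazon_tidy, prop = 0.01)
})

# Stack the draws, recording which draw each row came from.
spot_check_all <- bind_rows(spot_checks, .id = "draw")
```

Recording the seed for each draw lets a reviewer regenerate exactly the rows that were inspected.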

Conclusion: Data-Driven Financial Clarity

This project successfully demonstrates how modern data science workflows can transform "grimy," unstructured consumer data into a high-utility, privacy-compliant financial tool. By moving from raw Amazon exports to a categorized PDF report, this project bridges the gap between digital footprints and tax-ready documentation.

Disclaimer

I am a Data Analyst, not a CPA. This software is provided for informational and record-keeping purposes only. It does not constitute professional tax advice. Always verify your final numbers with a qualified tax professional before filing with the IRS.

Acknowledgments

IRS Publication 502: Medical and Dental Expenses. Explains the itemized deduction for medical and dental expenses that you claim on your tax form.

The author acknowledges the use of Google Gemini during the preparation of this report. The tool was utilized strictly as an assistive technology to review and optimize the accompanying codebase, and to help refine the language, structure, and readability of the manuscript. The underlying engineering research, experimental design, and technical conclusions remain the entirely original work of the author, who takes full responsibility for the accuracy and integrity of the final document.

Maintained by JKLM Data Analytics
