Thursday, November 10, 2022

Combine data from multiple Excel files or multiple Excel sheets

From time to time we (or maybe just me 😕) end up compiling data / information on multiple Excel sheets or files with the same format.  For example - a sales person may use an Excel sheet to input orders for each store, then consolidate all the order sheets together to submit a master order.  In the past I would take the time to combine the sheets together in one file, then link them together and create a total sheet.  This way I would have individual data for record.  I thought it would be interesting create a case to combine multiple Excel sheets or files together and sum all the information.

There are two separate tasks for this project - 

1. load data from multiple sheets within the same Excel file, and combine the data together and summarize the result.


2. load data from multiple Excel files within the folder / file directory, and combine the data together and summarize the result. 

Lessons learned from doing this exercise -

  • Excel stores all numbers as floats internally.
  • When there is Null value (NaN) in the dataset, read_excel cannot convert the data type to integer while reading in the data.  Need to remove the Null values before converting the data type.
  • Sometimes there are easier ways to accomplish a task.  Apparently Microsoft Excel implemented this feature since Excel for Mac 2011, and available for Excel for Microsoft 365.
    • Microsoft's implementation isn't perfect because it doesn't include everything I want after combing the data, but it is much easier than writing Python codes after couple of trial & error.

Data files - 

  • Example files location: here.
  • To combine multiple sheets in an Excel file, use Multi_sheet.xlsx.
  • To combine multiple Excel files, create a folder and copy single-1.xlsx, single-2.xlsx, single-3.xlsx files into the folder.  Update the variable dirname to reflect the file directory in the Python code / Jupyter notebook.
Python Jupyter notebook file -
Possible future improvement -
  • Convert to Google Colab format for easy online usage.

No comments:

Post a Comment

Google Data Analytics capstone project

Google Data Analytics capstone project

I decided to use blog as the venue to publish my Google Data Analytics learning because I wasn't sure how I can upload the findings, the...