From time to time we (or maybe just me 😕) end up compiling data / information on multiple Excel sheets or files with the same format. For example - a sales person may use an Excel sheet to input orders for each store, then consolidate all the order sheets together to submit a master order. In the past I would take the time to combine the sheets together in one file, then link them together and create a total sheet. This way I would have individual data for record. I thought it would be interesting create a case to combine multiple Excel sheets or files together and sum all the information.
There are two separate tasks for this project -
1. load data from multiple sheets within the same Excel file, and combine the data together and summarize the result.
Lessons learned from doing this exercise -
- Excel stores all numbers as floats internally.
- When there is Null value (NaN) in the dataset, read_excel cannot convert the data type to integer while reading in the data. Need to remove the Null values before converting the data type.
- Sometimes there are easier ways to accomplish a task. Apparently Microsoft Excel implemented this feature since Excel for Mac 2011, and available for Excel for Microsoft 365.
- Microsoft's implementation isn't perfect because it doesn't include everything I want after combing the data, but it is much easier than writing Python codes after couple of trial & error.
- Example files location: here.
- To combine multiple sheets in an Excel file, use Multi_sheet.xlsx.
- To combine multiple Excel files, create a folder and copy single-1.xlsx, single-2.xlsx, single-3.xlsx files into the folder. Update the variable dirname to reflect the file directory in the Python code / Jupyter notebook.
- Convert to Google Colab format for easy online usage.
No comments:
Post a Comment