Tuesday, July 5, 2022

Comparing two home's electricity usage

We are looking into solar panel cost because electricity cost has been increasing, and our wonderful provider SCE has had a few blackouts in the past 12 months more than what I would care to remember.  SCE provides our home's electricity usage data, up to 13 months, for download from their site.  I figured it's a good project to review our electricity usage and see if I can make any sense out of the data.  After all, what's the point of doing data analytics if we can't apply it to our daily life?

Data download - I got ours from SCE site under data sharing & download menu on the left of the screen.  They provided options to download CSV or XML format.  I used the CSV format since I am familiar with it.  SCE website

After reviewing the data, I noticed a few things about the data -

1. The usage data has 13 rows of header information that include address, start date and duration.  Each date worth of data divided into 24 hourly sections.  Between date sections is separated with an empty row for spacing, date information and header column.


2. Any household with Net Energy Metering would have a section with usage and a section with electricity production.  (credit: SCE.com)


3. Each date-hour usage has the date, from / to hour and usage on the left most column.

Steps to clean up the data -

Before working on the file, I made a copy of the worksheet to keep as the backup original data.  The header information, as well as filler information between each date section, are not necessary to the analysis.  I deleted those rows and added a filter to remove unnecessary filler information.  With the filter, I deleted all rows not showing dates.  For customers with NEM, a quick solution is to add a column and remark solar production with "S" for easy filtering in the future.  I will need to figure out how to import the data properly with R script in the future.

I wasn't sure how Power BI or Tableau would treat date / time information for sorting purpose so I created additional columns for date, hour, month, weekday from column A.  I later learned that Power BI can use the month from data / time data to organize / display information.  Tableau Public is able to use the data / time data directly so I didn't need to create any additional columns for date / time / weekday.

Since the time period is the same for both households, I consolidated the usage data into one file and noted accordingly.  This way it is easier to plot the graphs using the same axis for both households.

Power BI Desktop visualization -

This visual compares the total usage between two households.

 

This visual compares the average usage between two households.


Tableau Public visualization -

This visual compares the electricity usage between two households, and the average usages for each household.


This visual shows the total usage hourly comparison between two households.


Conclusion based on available information -

  • Home 1 uses more electricity in total and in the summer (June - September).  Home 2 uses more electricity during other months in the year (October - May).  Home 2 uses more power in the morning (7AM - noon).
  • Home 1 full year usage is 10,547KW.  Home 2 full year usage is 9,014KW.

Assumption made - 

  • Data downloaded from SCE is accurate and we don't have anyone stealing electricity from our homes.

Additional information I would like to have for further analysis - 

  • I wish SCE would provide two separate files for the "delivered" portion of the electricity if the household has NEM.  I think it will be easier to load data for analysis but I will continue looking into other methods of cleaning / transforming the data.

Lessons learned from doing this exercise - 

  • "Average" measurements in Power BI can be deceiving.  The first sets  of average daily usage visuals were wrong because the average function used total number of data as the numerator in the calculation, instead of days in a month or number of "Sundays."  I created a table to do the calculation separately to ensure I understood / found the issue.

Possible additional work in the future - 

  • Create same type of visuals from Tableau Public and Power BI Desktop.
  • Analyze the data and get solar quotes to understand breakeven.
  • Investigate R or Python scripts to import SCE data and process portion automatically instead of manual adjustment.
    • Updated 7/6/22 - I practiced using R to clean the downloaded CSV file.  
    • R script to load SCE's NEM CSV
      • This method will add a column "remark" and denote the usage to be either "DLVR" or "RCV."  "DLVR" is "delivered" or as SCE noted as delivered to the home.  "RCV" is "Received" or as SCE noted as received from the home. 
      • Originally I assumed each day has 24 hours except I forgot about daylight saving, which added/subtracted an hour on those two days in the year.  Therefore I had to add a new column and use a different method to clean the data.

Tools I used

Data wrangling - Microsoft Excel

Visualization - Microsoft Power BI Desktop and Tableau Public

Date range - 2021 June through 2022 May (12 months)


Google Data Analytics capstone project

Google Data Analytics capstone project

I decided to use blog as the venue to publish my Google Data Analytics learning because I wasn't sure how I can upload the findings, the...