My data explorer journey: May 2023

Thursday, May 18, 2023

Getting inventory info for open orders

My wife is really good at giving me Excel challenges to say the least. 😅

She's been spending a lot of time reviewing her open orders to locate available inventory from different warehouse locations. Her company's system apparently does not add up available inventory from different warehouses for order coordinator to process the open orders, resulting late shipment and missed sales. She has been watching Kevin Stratvert and Leila Gharani on Excel over the past couple of years and wants to link her files with Power Pivot to show the info. Somehow her file keeps telling her that there is missing relationship, and the quantity shows the total inventory, regardless of product # or warehouse location.

I receive the three reports - product report, inventory report, and the order report and jump right in. I figure this can't be that complicated because any companies dealing with physical goods and multiple warehouses will have the same challenge.

Reviewing the three reports to identify their unique identifier, if any, and their associated fields is the first thing before any analysis.

Product report -

Primary key / unique identifier - product #.

Order report -

Primary key / unique identifier - there isn't one. It seems we are missing some information to use this report directly as a database table. There should be another table in the system showing what products are contained in each order.

Inventory report -

Primary key / unique identifier - there isn't one. We can certain create an unique identifier using the combination of product # and storage location ID, but we will also need to create a storage location report to include all the unique storage locations.

Traditional database tables must follow certain requirements - each column in a row can only contain one value; no duplicate rows; each row must have a primary key. Two of these reports (order and inventory) do not follow the table requirement, so they were likely created with combination of tables to display the necessary information. That said, to achieve our goal to display the inventory information next to the order quantity, we can use Excel's Pivot Table and vlookup function to accomplish this task.

I generated the sample reports using ChatGPT and provided the fields I want to include, thus the same number of products in the product report are in the inventory report. We can skip loading the product report considering the inventory report is a more complete version. However, we need to reorganize the inventory report so each product only has one row, and list out each storage location and quantity in a different column for easy matching. This can be done by using using Excel Pivot Table.

(If you're not familiar with Pivot Table, Microsoft support has a basic tutorial here.)

I want to say that I'm done considering the bulk of the issue is trying to display the inventory information next to the open order, except it's not. 😬 The challenge is trying to automate the process as much as possible, so order coordinator can review the open orders and figure out how to process.

*side note* Considering how much progress Microsoft has made in the last few years to incorporate other providers into its ecosystem, I was hoping that I could easily connect to my files through Google Drive for demonstration purpose... Well, it's not. One provider charges $700+/year for 3 license for Power Pivot to connect to Google Drive. Microsoft has more work to do to expand their free connectivity with other providers!

With Power Pivot, we can either start with a new Excel file and link the two report files, or we can start with one of the report files and link in the other one. Our goal is to ensure easy update / report refresh so we must start a new Excel file and link the two report files. This way we can re-run the reports and save over old files every time to refresh the data for updated data. This should work in Microsoft share point or within a Windows network drive setup.

To connect the files, go to Power Pivot ribbon on top and click Manage on the leftmost Power Pivot ribbon. It will open up the Power Pivot for Excel in a separate window.

Power Pivot ribbon

Power Pivot for Excel

If you do not have Power Pivot on the top ribbon, please see this Microsoft link for instruction.

Click the icon From Other Sources, and scroll down to import data from an Excel file and click Next >.

Click Browse and select the order_report_sample file and click open. Click Next >. Make sure the file is closed before you import or Power Pivot for Excel will ask you to close the file.

Select "Use first row as column headers." because our first row is the column headers, then click Next >.

Select the shipment$ table and click "Finish" to load the table.

The inventory report should be loaded. Click "Close" to review the data.

The data should be loaded.

To get the pivot table data for the inventory, I use Power Query to transform the data and make it a table for the pivot table. Go to Data ribbon and click "Get Data" and select "Launch Power Query Editor."

Click "New Source," then "File," then "Microsoft Excel" to link to the inventory report file.

Select the "inventory" sheet and click OK.

I would change the query name so I know it's a query. In this case I change it to query_inventory and press Enter.

I want to create a pivot table with multiple storage locations as new columns, instead of the current format showing different storage location under the same columns. The value column will be the quantity. To create this pivot table, select the Storage Location ID column, then click Pivot Column in the Transform ribbon.

Select quantity as the value column, check the advanced tab to ensure the quantity is added together.

The resulting pivot table is what we need - each product # has multiple columns showing inventory at each storage location (A001, A002, A003, A004).

Click "Close & Load" under the Home ribbon to return back to Excel.

A new sheet with name "query_inventory" is created with the same information. To add a total_inventory column for each product #, type "Total_Inventory" next to A004 and press Enter

In the cell below, add a formula "=sum(c2..f2)" and press Enter to add up the inventory quantity from all locations. Query_inventory sheet was created as a table so formula is automatically copied to the cells below. Noticed that even the formula is different from what I typed because it's been converted to table reference.

To create the relationship between query_inventory and shipment table, we need the query_inventory to exist in Power Pivot data model. To do it, we can simply click "Add to Data Model" in the Power Pivot ribbon, and you will see a new tab "Table_query_inventory" created in Power Pivot for Excel.

Power Pivot for Excel new tab created!

The next step is to review and establish the relationship between Table_query_inventory and shipment table. The easiest way is to click the Diagram View in Power Pivot for Excel, and drag the Product # from Table_query_inventory to the Product # of the shipment table.

Drag Product # from Table_query_inventory to Product # in shipment

Relationship established!

The last step is to create a pivot table from Power Pivot. Click the PivotTable and select PivotTable from the selection. A new pop up "Create PivotTable would show up. Select New Worksheet and press OK.

I want the pivot table to show Order #, Product #, Product Description, Total Inventory, inventory from different locations, then order quantity for the product #. To organize the pivot table that way, we follow the order as such -

drag Order # down to Rows
drag Product # down to Rows
drag Product Description down to Rows
drag Total Inventory from Table_query_inventory down to Rows
drag A001, A002, A003, A004 to Rows, in that order, below Total Inventory
drag Order Quantity to Values

The resulting pivot table doesn't look quite right, but it's easy to fix!

Right click inside the pivot table, say around Red -T-Shirt, and click the Pivot Table Option.

Click the Display tab of the PivotTable Options window, the select "Classic PivotTable layout (enables dragging of fields in the grid), then click OK.

You should have a pivot table similar to what I have here now.

Data refresh -

This is something interesting that I discover while trying to see how the data refresh is done. Originally I thought if I do a refresh all within Pivot Table or Power Pivot window, all the data would get updated. Unfortunately this is NOT the case. The query is updated when we do a refresh all at pivot table or within Power Pivot window, but the table inside the Power Pivot window is not updated until we do another refresh all. I'm not sure why this is the case as I would expect the updated query to be fed into Power Pivot table and the pivot table accordingly.

That said, to be safe, please go to Query ribbon and manually refresh the query data before refreshing pivot table data. This will ensure the pivot data is updated. Furthermore, we probably want to change the pivot table options so that the data is updated every time the worksheet is open.

Conclusion

This project took me longer than I expected. I spent a lot of time trying to figure out the correct database layout, which isn't what was the original ask. I also spent a lot of time trial-and-error with Power Pivot and Power Query, until I found this short article on Microsoft site "How Power Query and Power Pivot work together"...

Lesson learned - evaluate what is being asked and provide the quickest, easiest solution. It doesn't have to the best solution because the user doesn't have all the time in the world to wait for that great solution. 😞

Misc info:

Possible next steps - create a report using Power Query or PowerBI, or Python.

Power Query Query Settings pane - I believe by default the Query Settings is hidden. To show this pane, go to View ribbon and click Query Settings. Then the Query Settings pane will show up. It is like a history pane showing all the steps that have been applied to this query.

Sample data created by ChatGPT.

Tuesday, May 2, 2023

So I asked ChatGPT and Bard to analyze the data for me...

In case you've lived under a rock for the last 5+ months, you probably heard about OpenAI's chatGPT and Google's Bard Large Language Model (LLM) generative AI models. ChatGPT's data is limited up to September of 2021, and as of this writing it is not connected to the Internet to pull information. Google's Bard was recently (April 21, 2023) updated to have the ability to code (generation / debug / explanation) (article).

I played with chatGPT when my wife asked me about it when it first came out. I have read about Natural Language Programming (NLP) a few years back but I haven't learned about the recent advancement. I didn't think it would be that unique considering what I knew about NLP and our ability to google for an answer, and the limited data up to September of 2021. The surprise came only after I used it for coding help - Excel, Python, PowerBI, etc. It was able to code, or generate lines of code for specific request! It saved me time from reading all the google search results / stack overfow codes to determine if a search result is the correct solution to my issue. The key to get what you are looking for by return, you may need to provide certain structure, or information, or example, or prompt engineering, to get chatGPT to generate the intended reply.

So, what is prompt to use?

please conduct a data analysis of the data from the attached link and provide a 4 paragraph summary - https://grouplens.org/datasets/movielens/1m/

From ChatGPT -

As an AI language model, I do not have access to the attached link. However, I can provide a general summary of the MovieLens 1M dataset provided by GroupLens and discuss potential insights that can be gained from conducting data analysis on the dataset.

The MovieLens 1M dataset contains movie ratings and demographic information for approximately 1 million users, along with information on 3,900 movies. The dataset includes information on user demographics such as age, gender, and occupation, as well as movie information such as genre, release year, and title. Ratings are provided on a scale of 1 to 5, with higher ratings indicating greater enjoyment of the movie.

By analyzing the MovieLens 1M dataset, one can gain valuable insights into user preferences for movies and identify trends and patterns in movie ratings. For example, one could examine the distribution of ratings across different genres of movies to identify which genres are more popular among users. Additionally, one could use clustering techniques to group users with similar tastes and identify common characteristics of those groups. These insights can be valuable for movie producers and distributors in understanding which types of movies are likely to be successful among different segments of the population.

Moreover, the MovieLens 1M dataset can be used to develop recommendation systems for movies. By analyzing the ratings of individual users and identifying similar users with similar ratings, one can develop personalized recommendations for users based on their individual tastes. This can improve the user experience on movie streaming platforms and increase engagement with the platform.

In conclusion, the MovieLens 1M dataset is a valuable resource for conducting data analysis on user preferences for movies. The dataset provides rich information on movie ratings and user demographics, and can be used to develop insights into user preferences, identify trends and patterns in movie ratings, and develop recommendation systems for movies. These insights can be valuable for movie producers and distributors in developing successful movies and improving the user experience on movie streaming platforms.

To say I'm disappointed is an understatement. I guess ChatGPT still does not connect to the internet, and did not have history information from the internet of this old dataset.

Since Bard's recent update showed it is now capable of connecting to the internet, I figured why not give it a try as well.

Using the same prompt, this is from Bard -

Sure, I can do that. Here is a 4 paragraph summary of the data from the attached link:

The MovieLens 1M dataset is a collection of 1 million movie ratings from 6,000 users. The dataset includes information about the users, the movies, and the ratings. The users are identified by a unique ID number, and the movies are identified by a unique ID number and a title. The ratings are on a scale of 1 to 5 stars.

The data analysis of the MovieLens 1M dataset revealed some interesting insights about movie ratings. For example, the average rating for all movies in the dataset is 3.5 stars. The most popular movies in the dataset are those that have been rated highly by many users. These movies include The Shawshank Redemption, The Godfather, and The Dark Knight. The movies that are most highly rated in the dataset are those that have been rated highly by a small number of users. These movies include The Silence of the Lambs, Schindler's List, and Pulp Fiction.

The data analysis also revealed some trends in movie ratings over time. For example, the average rating for movies has increased over time. This is likely due to the fact that movies are becoming more and more popular. The popularity of different genres of movies has also changed over time. For example, comedies were more popular in the 1980s, while dramas were more popular in the 1990s.

The MovieLens 1M dataset is a valuable resource for anyone who wants to learn more about movie ratings. It is a large and diverse dataset that can be used to answer a wide variety of questions about movies.

Here are some additional insights that can be gained from the MovieLens 1M dataset:

The most popular movies are typically action, comedy, or drama films.

The most highly rated movies are typically drama films.

The average rating for movies has increased over time.

The popularity of different genres of movies has changed over time.

The most popular movies among men are typically action, comedy, or thriller films.

The most popular movies among women are typically drama, comedy, or romance films.

The most highly rated movies among men are typically drama films.

The most highly rated movies among women are typically drama films.

Wow. That's good for basic data analysis. I need to step up my game otherwise companies can use AI to conduct analysis with obvious feedback. With prompt engineering, you can ask ChatGPT / Bard to provide explicit assumptions used, or include your specific assumptions in your prompt so AI takes that into account when preparing your specific answer.

I feel that Bard has a slight advantage over ChatGPT when you want to feed external data to AI to conduct the data analysis, assuming you are OK that OpenAI or Google get to review your data. If you are not OK with someone else seeing your data - then you're stuck with using AI for coding support for the time being. 😓

You can sign up ChatGPT here, and Bard here.

My data explorer journey

Thursday, May 18, 2023

Getting inventory info for open orders

Tuesday, May 2, 2023

So I asked ChatGPT and Bard to analyze the data for me...

Google Data Analytics capstone project

Google Data Analytics capstone project

Report Abuse

Labels