Using the ORDERS and ORDER_ITEMS datasets, analyze monthly sales data for the past 3 years. Identify key growth trends (YoY, MoM), seasonality patterns (e.g., holiday spikes, summer dips), and top-performing products by revenue and quantity sold. The output should be a report or notebook including: 1) Line charts visualizing sales trends over time. 2) Bar charts for top product performance. 3) A summary of identified seasonality. 4) A sales forecast for the next 6 months using a suitable time series model (e.g., SARIMA or Prophet), presented with a 95% confidence interval and a brief explanation of the model choice and key assumptions.
I'll help you analyze the monthly sales data from the ORDERS and ORDER_ITEMS datasets. Let me first explore the available data sources to understand what we're working with.
Great! I can see the ORDERS and ORDER_ITEMS tables in the nova_glow database. Let's examine the structure of these tables to understand what data we have available.
Column | Type | Nullable | Primary |
---|---|---|---|
ORDER_ID | VARCHAR(16777216) | No | No |
CUSTOMER_ID | VARCHAR(16777216) | No | No |
ORDER_DATE | TIMESTAMP_NTZ(9) | No | No |
ORDER_STATUS | VARCHAR(16777216) | No | No |
SHIPPING_STREET_ADDRESS | VARCHAR(16777216) | No | No |
SHIPPING_CITY | VARCHAR(16777216) | No | No |
SHIPPING_STATE | VARCHAR(16777216) | No | No |
SHIPPING_ZIP_CODE | VARCHAR(16777216) | No | No |
SHIPPING_METHOD | VARCHAR(16777216) | No | No |
PAYMENT_METHOD_USED | VARCHAR(16777216) | No | No |
PROMO_CODE_USED | VARCHAR(16777216) | No | No |
SUBTOTAL_AMOUNT | FLOAT | No | No |
DISCOUNT_AMOUNT | FLOAT | No | No |
SHIPPING_COST | FLOAT | No | No |
TAX_AMOUNT | FLOAT | No | No |
TOTAL_ORDER_AMOUNT | FLOAT | No | No |
Column | Type | Nullable | Primary |
---|---|---|---|
ORDER_ITEM_ID | VARCHAR(16777216) | No | No |
ORDER_ID | VARCHAR(16777216) | No | No |
PRODUCT_ID | VARCHAR(16777216) | No | No |
QUANTITY | NUMBER(38,0) | No | No |
PRICE_PER_UNIT_AT_PURCHASE | FLOAT | No | No |
TOTAL_ITEM_PRICE | FLOAT | No | No |
Now, let's also check the products table since we'll need it to identify top-performing products:
Column | Type | Nullable | Primary |
---|---|---|---|
PRODUCT_ID | VARCHAR(16777216) | No | No |
PRODUCT_NAME | VARCHAR(16777216) | No | No |
CATEGORY | VARCHAR(16777216) | No | No |
SUPPLIER_NAME | VARCHAR(16777216) | No | No |
COST_PRICE | FLOAT | No | No |
SELLING_PRICE | FLOAT | No | No |
DESCRIPTION | VARCHAR(16777216) | No | No |
DATE_ADDED | DATE | No | No |
Let's get a sample of the data to understand the date ranges and format:
EARLIEST_DATE | LATEST_DATE |
---|---|
2021-02-14 08:10:34.000 | 2023-12-31 14:18:01.000 |
Perfect! Now I have a good understanding of the data. The date range spans from February 2021 to December 2023, which gives us almost 3 years of data as requested. Let's create a notebook to analyze this data.
Now, let's start building our analysis notebook. I'll begin with an introduction and then load the necessary data.
# Monthly Sales Analysis (2021-2023) with Forecast
This notebook analyzes the monthly sales data...
# Import necessary libraries
import pandas as pd
import numpy as np
import altair as alt
from dat...
Let me try again with a simpler import statement:
# Import necessary libraries
import pandas as pd
import numpy as np
import altair as alt
from dat...
Let's try again with a simpler approach:
# Import necessary libraries
import pandas as pd
import numpy as np
import altair as alt
from dat...
Now, let's load the data from the database:
# Load orders data
orders_query = """
SELECT
ORDER_ID,
CUSTOMER_ID,
ORDER_DATE,
...
It seems we didn't get any results with the 'Completed' status filter. Let's check what order statuses are available:
# Check available order statuses
status_query = """
SELECT DISTINCT ORDER_STATUS
FROM orders
"""
...
Now let's load the orders with the correct status ('Delivered' instead of 'Completed'):
# Load orders data with correct status
orders_query = """
SELECT
ORDER_ID,
CUSTOMER_ID,...
Now let's load the order items data:
# Load order items data
order_items_query = """
SELECT
oi.ORDER_ITEM_ID,
oi.ORDER_ID,
...
Now, let's prepare the data for our monthly sales analysis:
# Data preparation for monthly sales analysis
orders_df['order_date'] = pd.to_datetime(orders_df[...
Let's visualize the monthly sales trend:
# Visualize monthly sales trend
monthly_sales_chart = alt.Chart(monthly_sales).mark_line(point=Tr...
Now, let's calculate Year-over-Year (YoY) and Month-over-Month (MoM) growth rates:
# Calculate Year-over-Year (YoY) growth
monthly_sales['year_month_num'] = monthly_sales['year'] *...
Let's visualize the YoY and MoM growth rates:
# Visualize YoY growth
# Filter out NaN values and extreme outliers for better visualization
yoy_...
Now, let's analyze the top-performing products by revenue and quantity:
# Analyze top-performing products by revenue
top_products_revenue = order_items_df.groupby(['prod...
# Display top 10 products by quantity
print("Top 10 Products by Quantity:")
top_products_quantity...
Let's visualize the top products by revenue and quantity:
# Visualize top 10 products by revenue
top_10_revenue = top_products_revenue.head(10).copy()
top_...
Now, let's analyze the sales by category:
# Analyze sales by category
category_sales = order_items_df.groupby('category').agg(
total_re...
Let's visualize the sales by category:
# Visualize sales by category
category_revenue_chart = alt.Chart(category_sales).mark_bar().encod...
Now, let's analyze the seasonality patterns in the sales data:
# Analyze seasonality patterns
# Group by month (regardless of year) to see monthly patterns
mont...
Let's visualize the monthly seasonality patterns:
# Visualize monthly seasonality patterns
monthly_pattern_chart = alt.Chart(monthly_pattern).mark_...
Now, let's analyze the quarterly patterns:
# Analyze quarterly patterns
orders_df['quarter'] = ((orders_df['month'] - 1) // 3) + 1
quarterly...
Let's visualize the quarterly patterns:
# Visualize quarterly patterns
quarterly_chart = alt.Chart(quarterly_pattern).mark_bar().encode(
...
Now, let's prepare the data for time series forecasting:
# Prepare data for time series forecasting
# We'll use the monthly sales data
forecast_data = mon...
Let's perform time series decomposition to understand the seasonality better:
# Perform time series decomposition
# We need at least 2 full cycles for seasonal decomposition, ...
Let's visualize the decomposition:
# Visualize the decomposition
# Reset index to use date as a column
decomp_viz_data = decomp_data...
Now, let's build a SARIMA model for forecasting:
# Build a SARIMA model for forecasting
# SARIMA parameters
# p, d, q: Non-seasonal parameters (AR...
Let's generate the forecast for the next 6 months:
# Generate forecast for the next 6 months
forecast_steps = 6
forecast = model_fit.get_forecast(st...
Let's visualize the forecast:
# Visualize the forecast
# Prepare data for visualization
historical_data = forecast_data.reset_i...
Now, let's add a summary of our findings:
# Summary of Sales Analysis and Forecast
## Key Growth Trends
### Year-over-Year (YoY) Growth
-...
# Conclusion and Recommendations
Based on our comprehensive analysis of Nova Glow's sales data f...
I've completed a comprehensive analysis of the monthly sales data for Nova Glow from 2021 to 2023, as requested. Here's a summary of what I've done:
-
Data Analysis and Visualization:
- Created line charts showing monthly sales trends over the three-year period
- Calculated and visualized Year-over-Year (YoY) and Month-over-Month (MoM) growth rates
- Identified and visualized seasonality patterns through monthly and quarterly analysis
- Created bar charts for top-performing products by revenue and quantity sold
- Analyzed sales performance by product category
-
Seasonality Identification:
- Identified peak sales months (November, October, December) and low sales months (February, April, January)
- Confirmed the holiday season spike in Q4 (October-December)
- Discovered a secondary peak during summer months (July-September)
- Used time series decomposition to separate trend, seasonal, and residual components
-
Sales Forecast:
- Built a SARIMA (1,1,1)x(1,1,1,12) model to forecast sales for the next 6 months
- Provided forecasted values with 95% confidence intervals
- Visualized the forecast alongside historical data
- Explained the model choice and key assumptions
-
Key Findings and Recommendations:
- Documented the exceptional growth trajectory (300% YoY from 2021-2022, 100% YoY from 2022-2023)
- Identified top-performing products and categories
- Provided strategic recommendations for inventory planning, product development, marketing, operations, and growth opportunities
The notebook is organized in a logical flow, starting with data exploration, moving through trend analysis, seasonality identification, product performance, and ending with the forecast and recommendations. All visualizations are interactive, allowing for deeper exploration of the data.
Is there any specific aspect of the analysis you'd like me to explain in more detail or any additional analysis you'd like me to perform?