Could I be a Data Analyst with just Python?
This is a tough question to answer, but the truth is: Yes, you can become a great Data Analyst by learning just Python, combined with the right mindset.
A programming language on its own isn’t enough to solve problems. What really matters is your ability to identify a problem, and then use the tools you know (like Python) to find a solution. That’s where the mindset of a Data Analyst kicks in.

For example, a common challenge for Data Analysts is working with large amounts of data, often with many variables or in a messy, unorganised state, and needing to analyse it quickly. Most of us start out with Excel or a similar tool.
This is where Python becomes a game changer. While tools like Excel are helpful, they can struggle or even crash when you're dealing with big datasets. Python, on the other hand, handles large volumes of data efficiently: with just a few lines of code you can filter, clean, transform, and visualise data at scale.
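As a tiny illustration (the file name and column names here are invented for the example), a few lines of pandas can filter and summarise a dataset that would make a spreadsheet freeze:
import pandas as pd

# Load a CSV that would make Excel struggle (file name is illustrative)
df = pd.read_csv("sales.csv")

# Parse dates, keep only large 2024 orders, then summarise revenue by region
df["Date"] = pd.to_datetime(df["Date"])
big_orders = df[(df["Date"].dt.year == 2024) & (df["Total Sales"] > 1000)]
print(big_orders.groupby("Region")["Total Sales"].sum().sort_values(ascending=False))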
Other tools and languages like SQL are also great for analysing data. But when you’re starting out, you may not have access to a SQL server, a clean and organised database, or enterprise tools like SAS or Qlik, which often require expensive licenses. And if you’re just experimenting or learning, these costs might not be worth it.
Python is free, runs on all major operating systems (Windows, Mac, Linux), and thanks to tools like ChatGPT you can even test your Python code directly in your browser.
Another reason learning Python is a smart investment is its strong presence in modern data platforms. Many cloud technologies use Python to transform, automate, and store data. For example, PySpark is a Python-based interface for Apache Spark (a powerful engine for processing big data). This means Python is not just a stepping stone; it's a long-term tool you'll continue to use throughout your career.
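To give you a feel for it, here is a minimal PySpark sketch (it assumes pyspark is installed and that a file called sample_sales_data.csv exists; both are placeholders for the example):
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session
spark = SparkSession.builder.appName("SalesDemo").getOrCreate()

# Read a CSV into a distributed DataFrame
sales = spark.read.csv("sample_sales_data.csv", header=True, inferSchema=True)

# Aggregate total sales per region across the cluster
sales.groupBy("Region").agg(F.sum("Total Sales").alias("Revenue")).show()

spark.stop()
Notice how closely the groupBy-and-aggregate pattern mirrors pandas: the skills you build early on transfer almost unchanged.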
So, where do you start?
Start by learning the basics of data manipulation in Python. Focus on libraries like pandas and NumPy for working with tables, and matplotlib or seaborn for visualisation.
If you want to go further, feel free to join one of our free webinars, where we regularly talk about topics like this and more.
We also offer courses that help you implement this knowledge step by step, including how to build ETL processes in Python, work with Data Warehouses, and even use Machine Learning to create forecasts and predictions.
No technical background is required: all you need is a curious mind and real-world problems that need data-powered solutions.
To get you started, here is some code that creates a sample dataset for you to play with:
# =============================================================================
# Code Generated by: Ramon Zamorano
# Exclusively for Excel in BI Workshop & Training Attendees
# Purpose: Generate 500,000 Rows of Sample Sales Data for Analysis
# =============================================================================
# Copyright © 2025 Excel in BI Limited. All Rights Reserved.
# =============================================================================
# For training and workshop use only.
# Contact: contact@excelinbi.com | www.excelinbi.com
# =============================================================================
import pandas as pd
import numpy as np
# Define number of sample rows
num_rows = 500_000  # matches the 500,000 rows stated in the header
# Generate sample data
np.random.seed(42)  # fix the seed so the sample data is reproducible
order_ids = np.arange(1000000, 1000000 + num_rows)
dates = pd.date_range(start="2020-01-01", periods=num_rows, freq="min").strftime("%Y-%m-%d %H:%M:%S")
products = np.random.choice(["Laptop", "Phone", "Tablet", "Monitor", "Printer"], num_rows)
categories = np.random.choice(["Electronics", "Accessories", "Office Supplies"], num_rows)
subcategories = np.random.choice(["Computers", "Mobile", "Furniture", "Stationery"], num_rows)
customer_ids = np.random.randint(1000, 9999, num_rows)
regions = np.random.choice(["North America", "Europe", "Asia", "South America"], num_rows)
countries = np.random.choice(["USA", "Germany", "India", "Brazil", "Australia"], num_rows)
cities = np.random.choice(["New York", "Berlin", "Mumbai", "Sao Paulo", "Sydney"], num_rows)
quantities = np.random.randint(1, 10, num_rows)
unit_prices = np.round(np.random.uniform(50, 1000, num_rows), 2)
discounts = np.round(np.random.uniform(0, 0.5, num_rows), 2)
total_sales = np.round(quantities * unit_prices * (1 - discounts), 2)
profits = np.round(total_sales * np.random.uniform(0.1, 0.3, num_rows), 2)  # per-row profit margin of 10-30%
# Create DataFrame
df = pd.DataFrame({
    "Order ID": order_ids,
    "Date": dates,
    "Product": products,
    "Category": categories,
    "Subcategory": subcategories,
    "Customer ID": customer_ids,
    "Region": regions,
    "Country": countries,
    "City": cities,
    "Quantity": quantities,
    "Unit Price": unit_prices,
    "Discount": discounts,
    "Total Sales": total_sales,
    "Profit": profits
})
# Define output paths (update these to a folder on your machine)
csv_path = r'C:\...Your Folder..\sample_sales_data.csv'
xlsx_path = r'C:\...Your Folder..\sample_sales_data.xlsx'
# Save as CSV (no compression)
df.to_csv(csv_path, index=False)
print(f"CSV file saved successfully: {csv_path}")
# Save as Excel (split across multiple sheets if the data exceeds Excel's row limit)
chunk_size = 1_048_575  # Excel allows 1,048,576 rows per sheet; keep one row free for the header
num_sheets = (num_rows + chunk_size - 1) // chunk_size  # ceiling division, avoids an empty trailing sheet
with pd.ExcelWriter(xlsx_path, engine='xlsxwriter') as writer:
    for i in range(num_sheets):
        start_row = i * chunk_size
        end_row = min(start_row + chunk_size, num_rows)
        df.iloc[start_row:end_row].to_excel(writer, sheet_name=f'Sheet{i+1}', index=False)
print(f"Excel file saved successfully: {xlsx_path}")