How to Use the OpenAI API with Python: A Step-by-Step Guide for Beginners
The OpenAI API is the entry point for integrating models like GPT-4o into your own applications. Unlike using ChatGPT from a browser, the API gives you full control: you choose the model, the context, the output format, and you can automate any workflow you can imagine.
This guide covers everything you need to go from zero to making real API calls with Python, understanding what each parameter does, and avoiding the most common mistakes.
Prerequisites
You need:
- Python 3.8 or higher installed
- An account at platform.openai.com with API credits
- Basic Python knowledge (knowing what a function and a dictionary are is enough)
No prior experience with APIs or language models is required.
Step 1: Create Your API Key
Go to platform.openai.com/api-keys and generate a new API key. Store it somewhere safe — OpenAI won't show it to you again after creation.
Never put your API key directly in your code. Use environment variables instead:
export OPENAI_API_KEY="sk-..."
On Windows (PowerShell):
$env:OPENAI_API_KEY="sk-..."
Or use a .env file with the python-dotenv library, which we'll cover later.
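Before making any calls, it helps to confirm that the variable is actually visible to Python. The sketch below assumes a hypothetical helper named get_api_key (it is not part of the openai library) that fails early with a readable message instead of letting the first request fail:

```python
import os


def get_api_key(env=None) -> str:
    """Return the API key from the environment, or raise with a clear message.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    exists only so the lookup can be exercised with a plain dictionary.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it in your shell or add it to a .env file."
        )
    return key


if __name__ == "__main__":
    try:
        get_api_key()
        # Never print the key itself; confirming that it exists is enough.
        print("API key found.")
    except RuntimeError as err:
        print(err)
```

If the script prints the error message, open a new terminal and set the variable again; environment variables set with export or $env: only last for the current session.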
Step 2: Install the Official Library
OpenAI maintains an official Python library. Install it with pip:
pip install openai
If you're working in a project with dependencies, add it to your requirements.txt:
openai>=1.0.0
Version 1.0 of the library (released in 2023) significantly changed the interface from earlier versions. If you find tutorials using openai.ChatCompletion.create(...), that's outdated code — the current syntax is different.
Step 3: Your First API Call
This is the minimum code to make a real API call:
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain what a transformer is in 3 sentences."}
    ]
)
print(response.choices[0].message.content)
Run it and you should see a model response in your terminal. If you see an authentication error, check that the environment variable is correctly set.
Understanding the Messages Structure
The messages parameter is a list of dictionaries, each with two fields: role and content. The main roles are:
- system: global instructions for the model (its "personality" or context)
- user: the user's message
- assistant: previous model responses (to maintain context in conversations)
Example with a system prompt:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a technical assistant specialized in Python. Always respond with code examples."
        },
        {
            "role": "user",
            "content": "How do I sort a list of dictionaries by a specific key?"
        }
    ]
)
The system prompt is one of the most powerful tools in the API: it defines model behavior for the entire conversation without the user seeing it.
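If you build message lists in several places, a small helper keeps the structure consistent. This is a minimal sketch; build_messages and VALID_ROLES are hypothetical names, and the role check covers only the three roles described above:

```python
VALID_ROLES = {"system", "user", "assistant"}  # only the roles covered in this guide


def build_messages(user_message, system_prompt=None, history=None):
    """Assemble a messages list: optional system prompt, prior turns, new user message."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for turn in history or []:
        if turn.get("role") not in VALID_ROLES:
            raise ValueError(f"Unknown role: {turn.get('role')!r}")
        messages.append({"role": turn["role"], "content": turn["content"]})
    messages.append({"role": "user", "content": user_message})
    return messages


if __name__ == "__main__":
    for message in build_messages(
        "How do I sort a list of dictionaries by a specific key?",
        system_prompt="You are a technical assistant specialized in Python.",
    ):
        print(message)
```

The returned list can be passed directly as the messages argument of client.chat.completions.create.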
Key Parameters You Need to Know
model
Specifies which model to use. The most relevant ones currently, according to the official OpenAI documentation:
- gpt-4o: the most capable model for general use
- gpt-4o-mini: faster and more economical, good for simpler tasks
- gpt-3.5-turbo: the cheapest option, useful for prototyping
temperature
Controls response randomness. It accepts a value between 0 and 2:
- 0: the most conservative, near-deterministic responses (ideal for code or structured data)
- 1: default behavior, balanced
- 2: more creative and variable responses
response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    messages=[{"role": "user", "content": "Extract the name and email from this text: ..."}]
)
For data extraction or code generation tasks, use temperature=0 for consistent results.
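To build intuition for why low values give more consistent output, here is a toy illustration of temperature scaling as it is commonly described for language models: candidate-token scores are divided by the temperature before being turned into probabilities. The scores below are made up, and this guide makes no claim about how OpenAI implements sampling internally:

```python
import math


def token_probabilities(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature == 0:
        # Treat 0 as greedy decoding: all probability on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
    for temperature in (0, 0.5, 1, 2):
        probabilities = token_probabilities(logits, temperature)
        print(temperature, [round(p, 3) for p in probabilities])
```

As the temperature rises, the probabilities move closer together, so less likely tokens get picked more often.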
max_tokens
Limits the maximum length of the response, measured in tokens. One token corresponds to roughly 0.75 English words. If you don't specify it, the response can run as long as the model's own output limit and the remaining context allow.
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=500,
    messages=[{"role": "user", "content": "Summarize this article in 3 bullet points."}]
)
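Two small checks are handy when working with max_tokens. The sketch below estimates token counts from the 0.75-words-per-token rule of thumb (an approximation, not a real tokenizer) and detects a cut-off answer by looking at finish_reason, which the API sets to "length" when the token limit was reached. The helper names are hypothetical, and the response here is a stand-in object so the example runs without an API call:

```python
import math
from types import SimpleNamespace

WORDS_PER_TOKEN = 0.75  # rule of thumb for English text


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on word count; real tokenization will differ."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def was_truncated(response) -> bool:
    """True if the model stopped because it hit the token limit."""
    return response.choices[0].finish_reason == "length"


if __name__ == "__main__":
    # Stand-in for a real response object, used only to show the check.
    fake_response = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
    print(estimate_tokens("Summarize this article in 3 bullet points."))
    print(was_truncated(fake_response))
```

If was_truncated returns True for a real response, raise max_tokens or ask for a shorter answer.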
response_format
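You can ask the model to respond directly in JSON. In JSON mode, the API expects the word "JSON" to appear somewhere in your messages, which the system prompt here takes care of: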
import json

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "Always respond with valid JSON."
        },
        {
            "role": "user",
            "content": "Give me data for a fictional person with fields: name, age, city."
        }
    ]
)

data = json.loads(response.choices[0].message.content)
print(data["name"])
This is especially useful when you need to process the model's output with code.
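JSON mode asks for syntactically valid JSON, but it does not promise that the fields you requested are present. A defensive parsing step, sketched here with a hypothetical parse_person helper and the three fields from the example above, catches both problems before the rest of your code touches the data:

```python
import json

REQUIRED_FIELDS = ("name", "age", "city")  # fields requested in the prompt above


def parse_person(raw: str) -> dict:
    """Parse model output and verify that the expected fields are present."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"Model output is not valid JSON: {err}") from err
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level.")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"Missing fields: {', '.join(missing)}")
    return data


if __name__ == "__main__":
    # A hand-written string standing in for response.choices[0].message.content.
    sample = '{"name": "Ada", "age": 36, "city": "London"}'
    print(parse_person(sample)["name"])
```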
Managing Context in Conversations
The API has no memory between calls. To maintain a coherent conversation you need to send the full history with each request:
history = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def chat(message):
    history.append({"role": "user", "content": message})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=history
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Hi, my name is Alex."))
print(chat("What's my name?"))  # The model will remember
Keep in mind that each model has a context limit (the maximum number of tokens it can process at once). For GPT-4o that limit is 128,000 tokens — enough for very long conversations.
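Because the whole history is resent on every call, long conversations grow in both cost and size. One common approach is to drop the oldest turns once a budget is exceeded while keeping the system prompt. The sketch below is one possible strategy, not the only one; trim_history and rough_token_count are hypothetical helpers, and the token count is the rough words-based estimate rather than a real tokenizer:

```python
import math


def rough_token_count(message: dict) -> int:
    """Estimate tokens from word count (about 0.75 words per token)."""
    return math.ceil(len(message["content"].split()) / 0.75)


def trim_history(history: list, max_tokens: int) -> list:
    """Drop the oldest non-system messages until the estimate fits the budget."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    total = sum(rough_token_count(m) for m in system + rest)
    while rest and total > max_tokens:
        total -= rough_token_count(rest.pop(0))
    return system + rest


if __name__ == "__main__":
    history = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, my name is Alex."},
        {"role": "assistant", "content": "Nice to meet you, Alex."},
        {"role": "user", "content": "What's my name?"},
    ]
    for message in trim_history(history, max_tokens=20):
        print(message["role"], "-", message["content"])
```

Note the trade-off: once the message containing the name is trimmed away, the model can no longer answer the question from the history alone.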
Using Environment Variables with python-dotenv
For real projects, manage credentials with a .env file:
pip install python-dotenv
Create a .env file at the root of your project:
OPENAI_API_KEY=sk-...
Add .env to your .gitignore to avoid pushing credentials to GitHub:
.env
And load it in your script:
from dotenv import load_dotenv
import os
from openai import OpenAI
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
Handling Common Errors
API calls can fail for several reasons. Handle the most frequent errors:
from openai import OpenAI, RateLimitError, AuthenticationError, APIError
client = OpenAI()
try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)
except AuthenticationError:
    print("Invalid or missing API key.")
except RateLimitError:
    print("Rate limit exceeded. Wait a few seconds and retry.")
except APIError as e:
    print(f"API error: {e}")
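The most common error for new users is AuthenticationError, almost always caused by a missing or misconfigured environment variable. An account with no remaining credits usually surfaces as a RateLimitError (a quota error) rather than an authentication failure.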
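"Wait a few seconds and retry" can be automated with exponential backoff. The sketch below is a generic wrapper under stated assumptions: call_with_retries is a hypothetical helper, the exception types to retry on are passed in by the caller, and the sleep function is injectable so the logic can be tried without actually waiting:

```python
import time


def call_with_retries(fn, retry_on=(Exception,), max_attempts=3,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

With the real client you would pass retry_on=(RateLimitError,) and wrap the request in a lambda, for example fn=lambda: client.chat.completions.create(...). The official library also accepts a max_retries option on the client, which may be enough for simple cases.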
Tracking Token Usage and Cost
Each API call has a cost based on the model and number of tokens processed. You can check current pricing at openai.com/api/pricing.
The response object includes token usage for each call:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is machine learning?"}]
)
print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total: {response.usage.total_tokens}")
To avoid unexpected charges, set a monthly spending limit at platform.openai.com/settings/organization/billing.
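The usage numbers translate into cost with simple arithmetic: tokens divided by one million, multiplied by the price per million tokens, computed separately for input and output. In this sketch estimate_cost is a hypothetical helper and the prices are placeholders, not real rates; take the current values from the pricing page:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate the cost of one call in dollars from its token usage."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


if __name__ == "__main__":
    # Placeholder prices for illustration only; real prices vary by model.
    cost = estimate_cost(
        prompt_tokens=1200,
        completion_tokens=300,
        input_price_per_million=1.00,
        output_price_per_million=2.00,
    )
    print(f"Estimated cost: ${cost:.6f}")
```

In a real script you would pass response.usage.prompt_tokens and response.usage.completion_tokens as the first two arguments.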
Complete Example: Text Summarization Script
Here's a functional script you can use as a starting point for your projects:
import os
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
def summarize_text(text: str, points: int = 3) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        max_tokens=300,
        messages=[
            {
                "role": "system",
                "content": f"Summarize the text in exactly {points} key points. Be concise."
            },
            {
                "role": "user",
                "content": text
            }
        ]
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    sample_text = """
    Transformers are a neural network architecture introduced in 2017
    in the paper 'Attention is All You Need'. They use attention mechanisms
    to process data sequences in parallel, making them far more efficient
    than previous recurrent networks. They are the foundation of models
    like GPT, BERT, and virtually all modern LLMs.
    """
    result = summarize_text(sample_text, points=3)
    print(result)
Next Steps
With this foundation you can build anything from simple automation scripts to full applications. The next concepts worth exploring are:
- Streaming: receiving the response token by token (like ChatGPT does) using stream=True
- Function calling: letting the model call functions in your code to connect it with external data
- Embeddings: converting text into vectors for semantic search — the foundation of RAG systems
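As a first taste of the streaming item in the list above: with stream=True the call returns an iterator of chunks, and each chunk carries a small piece of text in choices[0].delta.content, which can be None. The sketch below shows how those pieces are typically stitched together. collect_stream and fake_chunk are hypothetical helpers, and the chunks are stand-in objects so the example runs without an API call:

```python
from types import SimpleNamespace


def collect_stream(chunks, on_text=None):
    """Join streamed text pieces, optionally handing each one to a callback."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if on_text is not None:
                on_text(text)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    demo = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]
    # Print each piece as it arrives, then keep the full text.
    full_text = collect_stream(demo, on_text=lambda t: print(t, end="", flush=True))
    print()
```

With the real client, you would pass the result of client.chat.completions.create(..., stream=True) as the chunks argument.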
The official API documentation covers all of these topics with up-to-date examples.