How to Use the OpenAI API with Python: A Step-by-Step Guide for Beginners

The OpenAI API is the entry point for integrating models like GPT-4o into your own applications. Unlike using ChatGPT from a browser, the API gives you full control: you choose the model, the context, the output format, and you can automate any workflow you can imagine.

This guide covers everything you need to go from zero to making real API calls with Python, understanding what each parameter does, and avoiding the most common mistakes.

Prerequisites

You need:

  • Python 3.8 or higher installed
  • An account at platform.openai.com with API credits
  • Basic Python knowledge (knowing what a function and a dictionary are is enough)

No prior experience with APIs or language models is required.

Step 1: Create Your API Key

Go to platform.openai.com/api-keys and generate a new API key. Store it somewhere safe — OpenAI won't show it to you again after creation.

Never put your API key directly in your code. Use environment variables instead:

export OPENAI_API_KEY="sk-..."

On Windows (PowerShell):

$env:OPENAI_API_KEY="sk-..."

Or use a .env file with the python-dotenv library, which we'll cover later.
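
Whichever method you choose, it helps to fail early with a readable message when the key is missing, rather than hitting an authentication error later. Here is a minimal sketch; get_api_key is a hypothetical helper written for this guide, not part of the openai library:

```python
import os


def get_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for this guide; the openai library does not provide it.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or add it to a .env file."
        )
    return key
```

You could then pass the result to the client, for example OpenAI(api_key=get_api_key()).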

Step 2: Install the Official Library

OpenAI maintains an official Python library. Install it with pip:

pip install openai

If you're working in a project with dependencies, add it to your requirements.txt:

openai>=1.0.0

Version 1.0 of the library (released in 2023) significantly changed the interface from earlier versions. If you find tutorials using openai.ChatCompletion.create(...), that's outdated code — the current syntax is different.

Step 3: Your First API Call

This is the minimum code to make a real API call:

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain what a transformer is in 3 sentences."}
    ]
)

print(response.choices[0].message.content)

Run it and you should see a model response in your terminal. If you see an authentication error, check that the environment variable is correctly set.

Understanding the Messages Structure

The messages parameter is a list of dictionaries, each with two fields: role and content. The possible roles are:

  • system: global instructions for the model (its "personality" or context)
  • user: the user's message
  • assistant: previous model responses (to maintain context in conversations)

Example with a system prompt:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a technical assistant specialized in Python. Always respond with code examples."
        },
        {
            "role": "user",
            "content": "How do I sort a list of dictionaries by a specific key?"
        }
    ]
)

The system prompt is one of the most powerful tools in the API: it defines model behavior for the entire conversation without the user seeing it.
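
Because every request needs the same list-of-dictionaries shape, it can be convenient to assemble it in one place. The sketch below assumes a hypothetical helper named build_messages (it is not part of the openai library) and makes no API call; it only builds the structure described above:

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(
    user_text: str,
    system_text: Optional[str] = None,
    previous: Optional[List[Message]] = None,
) -> List[Message]:
    """Assemble a messages list: optional system prompt, earlier turns, new user turn."""
    messages: List[Message] = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    if previous:
        messages.extend(previous)
    messages.append({"role": "user", "content": user_text})
    return messages


messages = build_messages(
    "How do I sort a list of dictionaries by a specific key?",
    system_text="You are a technical assistant specialized in Python.",
)
print([m["role"] for m in messages])  # ['system', 'user']
```

The resulting list can be passed directly as the messages argument of client.chat.completions.create.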

Key Parameters You Need to Know

model

Specifies which model to use. The most relevant ones currently, according to the official OpenAI documentation:

  • gpt-4o: the most capable model for general use
  • gpt-4o-mini: faster and more economical, good for simpler tasks
  • gpt-3.5-turbo: an older model that still appears in many tutorials; gpt-4o-mini is generally cheaper and more capable, so prefer it for prototyping

temperature

Controls response randomness. Value between 0 and 2:

  • 0: near-deterministic, conservative responses (ideal for code or structured data)
  • 1: default behavior, balanced
  • 2: more creative and variable responses

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,
    messages=[{"role": "user", "content": "Extract the name and email from this text: ..."}]
)

For data extraction or code generation tasks, use temperature=0 for consistent results.

max_tokens

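Limits the maximum length of the response, measured in tokens. One token is roughly 0.75 English words. If you don't set it, the response can run as long as the model's output limit and remaining context allow. When a response is cut off by this limit, response.choices[0].finish_reason is "length".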

response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=500,
    messages=[{"role": "user", "content": "Summarize this article in 3 bullet points."}]
)
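
To get a feel for how a max_tokens value relates to text length, the 0.75 words-per-token rule of thumb can be turned into a rough estimator. This is only an approximation under that assumption; real tokenizers count differently, especially for code and non-English text. The name estimate_tokens is hypothetical:

```python
import math


def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count; real tokenizers will differ."""
    word_count = len(text.split())
    return math.ceil(word_count / words_per_token)


# 7 words / 0.75 words per token, rounded up
print(estimate_tokens("Summarize this article in three bullet points."))  # 10
```

By the same rule, max_tokens=500 leaves room for roughly 375 English words of output.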

response_format

You can ask the model to respond directly in JSON:

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "Always respond with valid JSON."
        },
        {
            "role": "user",
            "content": "Give me data for a fictional person with fields: name, age, city."
        }
    ]
)

import json
data = json.loads(response.choices[0].message.content)
print(data["name"])

This is especially useful when you need to process the model's output with code.
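
Even with JSON output requested, it is prudent to check that the reply parses and contains the fields you asked for before using it. The sketch below uses a hard-coded string as a stand-in for response.choices[0].message.content; parse_json_reply is a hypothetical helper, not a library function:

```python
import json
from typing import Any, Dict, Iterable


def parse_json_reply(raw: str, required: Iterable[str]) -> Dict[str, Any]:
    """Parse a model reply as a JSON object and check that expected keys exist."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level.")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"Reply is missing fields: {missing}")
    return data


# Stand-in for response.choices[0].message.content
sample_reply = '{"name": "Ada", "age": 36, "city": "London"}'
person = parse_json_reply(sample_reply, required=["name", "age", "city"])
print(person["name"])  # Ada
```

Raising a single exception type (ValueError here) keeps the calling code simple: one except clause covers both malformed JSON and missing fields.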

Managing Context in Conversations

The API has no memory between calls. To maintain a coherent conversation you need to send the full history with each request:

history = [
    {"role": "system", "content": "You are a helpful assistant."}
]

def chat(message):
    history.append({"role": "user", "content": message})

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=history
    )

    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})

    return reply

print(chat("Hi, my name is Alex."))
print(chat("What's my name?"))  # Works because the full history is sent again

Keep in mind that each model has a context limit (the maximum number of tokens it can process at once). For GPT-4o that limit is 128,000 tokens — enough for very long conversations.
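
Since the whole history is resent on every call, long conversations also cost more per request. One simple way to bound this, sketched below, is to keep the system message plus only the most recent messages. This counts messages rather than tokens, which is a simplifying assumption; trim_history is a hypothetical helper:

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(history: List[Message], max_messages: int = 20) -> List[Message]:
    """Keep system messages plus the most recent max_messages other messages."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    if max_messages <= 0:
        return system  # avoid rest[-0:], which would return everything
    return system + rest[-max_messages:]


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, my name is Alex."},
    {"role": "assistant", "content": "Nice to meet you, Alex."},
    {"role": "user", "content": "What's my name?"},
]
trimmed = trim_history(history, max_messages=2)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

Note the trade-off: in this example the trimmed history no longer includes the message where the user gave their name, so anything dropped is also forgotten by the model.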

Using Environment Variables with python-dotenv

For real projects, manage credentials with a .env file:

pip install python-dotenv

Create a .env file at the root of your project:

OPENAI_API_KEY=sk-...

Add .env to your .gitignore to avoid pushing credentials to GitHub:

.env

And load it in your script:

from dotenv import load_dotenv
import os
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

Handling Common Errors

API calls can fail for several reasons. Handle the most frequent errors:

from openai import OpenAI, RateLimitError, AuthenticationError, APIError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)

except AuthenticationError:
    print("Invalid or missing API key.")
except RateLimitError:
    print("Rate limit exceeded. Wait a few seconds and retry.")
except APIError as e:
    print(f"API error: {e}")

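The most common error for new users is AuthenticationError, almost always caused by a missing, mistyped, or revoked API key, often because the environment variable is not set in the shell that runs the script. An account with no credits typically shows up as a RateLimitError (HTTP 429, insufficient_quota) instead.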
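
For transient failures such as rate limits, printing a message is often not enough; you usually want to wait and try again. The following is a generic sketch of retrying with exponential backoff. The name with_retries and its parameters are hypothetical, and the function takes any zero-argument callable so it can be tried without an API key:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

With the client from the example above, a call might look like with_retries(lambda: client.chat.completions.create(model="gpt-4o", messages=[...]), retry_on=(RateLimitError,)). The official client also accepts a max_retries option for its own built-in retries, so a wrapper like this is mainly useful when you want explicit control over the delays.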

Tracking Token Usage and Cost

Each API call has a cost based on the model and number of tokens processed. You can check current pricing at openai.com/api/pricing.

The response object includes token usage for each call:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is machine learning?"}]
)

print(f"Input tokens: {response.usage.prompt_tokens}")
print(f"Output tokens: {response.usage.completion_tokens}")
print(f"Total: {response.usage.total_tokens}")

To avoid unexpected charges, set a monthly spending limit at platform.openai.com/settings/organization/billing.
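
The token counts from response.usage can be turned into a rough per-call cost estimate. In the sketch below, estimate_cost is a hypothetical helper and the prices are placeholders chosen for illustration only; substitute the current per-million-token rates for your model from the pricing page:

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated cost in dollars for one call."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices, NOT real rates: look up the actual values before relying on this.
cost = estimate_cost(
    prompt_tokens=1_200,
    completion_tokens=300,
    input_price_per_million=2.0,
    output_price_per_million=8.0,
)
print(f"Estimated cost: ${cost:.6f}")
```

In a real script you would pass response.usage.prompt_tokens and response.usage.completion_tokens as the first two arguments. Input and output tokens are priced separately, which is why the function takes two rates.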

Complete Example: Text Summarization Script

Here's a functional script you can use as a starting point for your projects:

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

def summarize_text(text: str, points: int = 3) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        max_tokens=300,
        messages=[
            {
                "role": "system",
                "content": f"Summarize the text in exactly {points} key points. Be concise."
            },
            {
                "role": "user",
                "content": text
            }
        ]
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    sample_text = """
    Transformers are a neural network architecture introduced in 2017
    in the paper 'Attention is All You Need'. They use attention mechanisms
    to process data sequences in parallel, making them far more efficient
    than previous recurrent networks. They are the foundation of models
    like GPT, BERT, and virtually all modern LLMs.
    """

    result = summarize_text(sample_text, points=3)
    print(result)

Next Steps

With this foundation you can build anything from simple automation scripts to full applications. The next concepts worth exploring are:

  • Streaming: receiving the response token by token (like ChatGPT does) using stream=True
  • Function calling: letting the model call functions in your code to connect it with external data
  • Embeddings: converting text into vectors for semantic search — the foundation of RAG systems

The official API documentation covers all of these topics with up-to-date examples.