AI models: choosing and estimating your consumption

Each call to an AI model consumes credits. Without clear reference points, it is difficult to anticipate how many an automation will use, especially when it is triggered on hundreds or thousands of records.

This article walks you through estimating your consumption before deployment, choosing the right model, and validating your results in the logs.

Contents

1. Understand what you are consuming
2. Calculate your estimated consumption
3. Choose the right model
4. Set the maximum number of tokens
5. Test on a sample before deploying
6. Validate and monitor in the logs

Understand what you are consuming

Before calculating anything, ask yourself three simple questions about your automation:

How large is what I send to the AI?

Everything you send, including the prompt, the instructions, and the content to analyze, is counted in tokens. The longer it is, the more it consumes.

What you send | Approximate tokens
A short prompt (simple instruction) | ~50 to 100 tokens
A short email (150-200 words) | ~300 tokens
One page of text (~400 words) | ~500 tokens
A 10-page contract | ~5,000 tokens

In French, a text consumes 10 to 15% more tokens than its equivalent in English.

How long is the expected response?

The response generated by the AI is also counted in tokens. The longer and more detailed the response you ask for, the more it consumes.

What the AI returns | Approximate tokens
One word or one extracted value (for example, a category) | ~5 to 20 tokens
A short reply suggestion (3-4 lines) | ~100 tokens
A structured summary (10-15 lines) | ~300 tokens
A full report or detailed summary | ~1,000 tokens and more

In general, the response represents 20 to 40% of the total token volume of an execution. Prefer short, structured responses to save credits.

How many times will my scenario run?

Each trigger of your automation consumes credits. Total monthly consumption depends directly on the number of executions.

Credits / execution | Executions / month | Total consumed
105 cr. (email, GPT-4o Mini) | 10 emails / day = 300 / month | 31,500 credits
105 cr. (email, GPT-4o Mini) | 100 emails / day = 3,000 / month | 315,000 credits
50 cr. (extraction, Mistral 7B) | 500 records / month | 25,000 credits

Tip: start by testing on a small volume before deploying your automation across all of your data.
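The totals in this table are a simple multiplication, sketched below in Python. The 30-day month is the assumption the table itself makes (10 emails / day = 300 / month); the function and constant names are illustrative only.

```python
DAYS_PER_MONTH = 30  # assumption taken from the table: 10 emails / day = 300 / month


def monthly_credits(credits_per_execution, executions_per_month):
    """Return the total credits consumed over one month."""
    return credits_per_execution * executions_per_month


# The three scenarios from the table above.
print(monthly_credits(105, 10 * DAYS_PER_MONTH))   # 10 emails / day, GPT-4o Mini
print(monthly_credits(105, 100 * DAYS_PER_MONTH))  # 100 emails / day, GPT-4o Mini
print(monthly_credits(50, 500))                    # 500 records / month, Mistral 7B
```

Replacing the per-execution cost with a value measured in your own logs gives a projection based on real data rather than on an estimate.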

Once these three questions have been answered, you have all the information you need to estimate your monthly consumption. Move to the next step to apply the calculation formula.

Calculate your estimated consumption

Each exchange with the AI involves two flows that are billed in tokens: what you send (input) and what the model generates in return (output). TimeTonic automatically converts these tokens into credits and displays the result in the log of each execution.

What is a token?

A token corresponds to about 4 characters of text, or about 3/4 of a word. Everything you send to the AI is counted in tokens: the prompt, the instructions, and the content to analyze. The generated response is also counted in tokens. The longer it is, the more it consumes.

Orders of magnitude: 1 short email ≈ 300 tokens · 1 page of text ≈ 500 tokens · 1 ten-page contract ≈ 5,000 tokens
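These orders of magnitude can be turned into a quick estimator. The sketch below relies only on the approximations given in this article (about 4 characters or 3/4 of a word per token, and 10 to 15% more tokens for French). Real tokenizers differ from one model to another, so treat the result as a rough figure; the function names are illustrative.

```python
import math

CHARS_PER_TOKEN = 4     # article's approximation: 1 token is about 4 characters
WORDS_PER_TOKEN = 0.75  # article's approximation: 1 token is about 3/4 of a word


def tokens_from_words(word_count, french_overhead=0.0):
    """Rough token estimate from a word count.

    french_overhead is the extra share for French text (0.10 to 0.15 per the article).
    """
    return math.ceil(word_count / WORDS_PER_TOKEN * (1 + french_overhead))


def tokens_from_text(text):
    """Rough token estimate from the character count of a string."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


print(tokens_from_words(400))                        # one page of text
print(tokens_from_words(200, french_overhead=0.15))  # a short email written in French
print(tokens_from_text("Suggest a professional reply in 3-4 lines."))
```

The word-based and character-based estimates will not match exactly; either is close enough for budgeting, and the logs give the actual count after a test run.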

In the logs of each execution, you can directly read:

Input: credits consumed for the prompt and the content sent to the AI (instructions, data, attached files)

Output: credits consumed for the response generated by the model (produced text, extracted data, summary)

Total: Input + Output = actual execution cost. This is the value deducted from your monthly quota.

Concrete example: Suggested reply to an email

An employee receives a customer email and wants the AI to automatically suggest a reply. Here is what happens behind the scenes.

Step 1: What is sent to the AI (input tokens)

Input 1: the AI role (system prompt), ~50 tokens
Ex. "You are a sales assistant. You write professional and concise replies."

Input 2: the question / instruction, ~50 tokens
Ex. "Here is a received email: $email_content. Suggest a professional reply in 3-4 lines."

Input 3: the content of the received email, ~200 tokens
Ex. a 200-word customer email asking for a quote

Step 2: What the AI returns (output tokens)

Output: the generated reply suggestion, ~100 tokens
Ex. a reply of 3-4 lines (~80 words)

Result with GPT-4o Mini

Input: 300 tokens = 45 credits

Output: 100 tokens = 60 credits

Total per execution = 105 credits

Perspective:
With 1,000,000 credits / month (PRO plan), that means about 9,500 suggested email replies per month.
With 10,000 credits / month (FREE plan), that means about 95 suggestions per month.

These figures are indicative. Actual consumption depends on the exact length of your prompts, the processed content, and the generated response. Always test on a representative sample before deploying at scale.
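The arithmetic behind this example can be sketched as follows. It assumes that the two rates listed for GPT-4o Mini in the model table (150K for input, 600K for output) are credits per million tokens, which is consistent with the 45 and 60 credit figures of the example. The function and variable names are illustrative and are not part of TimeTonic.

```python
def execution_credits(input_tokens, output_tokens, input_rate, output_rate):
    """Return (input, output, total) credits for one execution.

    Rates are assumed to be credits per million tokens.
    """
    input_credits = input_tokens * input_rate / 1_000_000
    output_credits = output_tokens * output_rate / 1_000_000
    return input_credits, output_credits, input_credits + output_credits


def executions_per_quota(monthly_quota, credits_per_execution):
    """Return how many full executions fit in a monthly credit quota."""
    return int(monthly_quota // credits_per_execution)


# Email example: 300 input tokens, 100 output tokens, GPT-4o Mini rates.
inp, out, total = execution_credits(300, 100, 150_000, 600_000)
print(inp, out, total)
print(executions_per_quota(1_000_000, total))  # PRO plan quota
print(executions_per_quota(10_000, total))     # FREE plan quota
```

Rounding the quota results down gives the article's figures of about 9,500 and 95 suggested replies per month.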

Choose the right model

Model choice is the factor with the biggest impact on your consumption. TimeTonic offers models that are directly accessible through your AI credits: no provider subscription, no API key management. This is often the simplest and most cost-effective way to get started and scale.

Why choose TimeTonic credits first?

→ No provider subscription to sign up for and no API key to manage

→ Two independent monthly quotas: one for private AI, one for public cloud AI, on a single billing line

→ The models available through your credits cover the vast majority of common use cases

→ Credits automatically reset every month based on your plan

→ Additional credit packs are available from the Business plan if you need more volume during the month

→ If your quota is exhausted before the end of the month, you can use your own API key (BYOK) to keep your automations running without waiting for the renewal

Available models by source and indicative cost

📏 The estimates below are indicative. Always test on a representative sample before deploying at scale. 1 credit = 1 000 tokens · 1 page ≈ 500 tokens

Total context: maximum volume the model can process in a single execution, input and output combined. This is the value shown in parentheses in the model selector in the TimeTonic interface.
Max read capacity / Max reply capacity: billing rates, in credits per million tokens, for input and output respectively. Expressed in K (thousands) or M (millions).

☁️ Public cloud AI: TimeTonic credits
From the START plan · OpenAI · Mistral · Anthropic · Google

Model | Tier and capabilities | Total context | Max read capacity | Max reply capacity | Short email | 10-page document | Expense report (image) | Best for
Gemini 2.0 Flash Lite | Cost-effective | 1M | 75K | 300K | ~45 cr. | ~1,350 cr. | n/a | High volumes, simple repetitive tasks, processing many records
GPT-OSS 20B | Cost-effective | 131K | 40K | 150K | ~30 cr. | ~900 cr. | n/a | Simple repetitive tasks, classification, short rewriting
DeepSeek V3.1 | Cost-effective | 32.7K | 150K | 750K | ~75 cr. | ~2,250 cr. | n/a | Data extraction, reasoning, good French support
Mistral Small | Standard | 128K | 350K | 560K | ~175 cr. | ~5,250 cr. | n/a | Native French, public sector, long documents in French
GPT-4o Mini | Standard, 📎 Vision, 📄 OCR | 128K | 150K | 600K | ~105 cr. | ~3,150 cr. * | ~105 cr. | Versatile, reliable, image analysis and text extraction from document photos
GPT-4.1 | Premium, 📎 Vision, 📄 OCR | 1.048M | 2M | 8M | ~1,000 cr. | ~30,000 cr. * | ~1,000 cr. | Complex documents, long deeds, legal or financial analysis from photos
Claude 3.5 Sonnet | Premium, 📎 Vision, 📄 OCR | 200K | 3M | 15M | ~1,500 cr. | ~45,000 cr. * | ~1,500 cr. | Nuanced analysis, long and detailed writing, complex instructions from photos
Mistral Large | Premium, 📎 Vision, 📄 OCR | 128K | 2M | 6M | ~1,000 cr. | ~30,000 cr. * | ~1,000 cr. | Advanced French, contracts, reports, image analysis in French

⭐ Private & secure AI: TimeTonic credits
From the PRO plan · Sovereign data · Hosting in France

Model | Tier and capabilities | Total context | Max read capacity | Max reply capacity | Short email | 10-page document | Expense report (image) | Best for
Mistral 7B Instruct | Cost-effective | 32.7K | 100K | 100K | ~50 cr. | ~1,500 cr. | n/a | Repetitive text, simple generation, high volume, sensitive data
Llama 3.1 8B Instruct | Cost-effective | 131K | 100K | 100K | ~50 cr. | ~1,500 cr. | n/a | Simple extraction, fast classification, sovereign data triage
GPT-OSS 20B | Cost-effective | 131K | 40K | 150K | ~30 cr. | ~900 cr. | n/a | Light reasoning, qualification, short analysis on internal data
Mistral Small 3.2 24B | Cost-effective, 📎 Vision, 📄 OCR | 128K | 90K | 280K | ~100 cr. | ~2,900 cr. * | ~100 cr. | Native French, simple image analysis, text extraction, sensitive data
Llama 3.3 70B Instruct | Premium | 131K | 670K | 670K | ~335 cr. | ~10,050 cr. | n/a | Complex text documents, deeds, contracts, reports on sensitive data
Qwen 2.5 VL 72B | Premium, 📎 Vision, 📄 OCR | 32.7K | 910K | 910K | ~455 cr. | ~13,650 cr. * | ~455 cr. | Specialized OCR: precise extraction from scanned documents, ID cards, invoices, bank details

📎 Vision: the model can analyze an attached image and understand its content
📄 OCR: the model can extract structured text from document photos (invoices, ID cards, receipts)
* 10-page document: estimate valid only for images (photos of pages). For PDF files, use the "Process a document with MistralAI OCR" action and work in two steps: text extraction via OCR, then analysis by the model of your choice.
Total context = combined input + output, shown in parentheses in the model selector · K = thousands of tokens · M = millions of tokens · 1 page ≈ 500 tokens · 1 short email ≈ 300 tokens

🔑 Public cloud AI: Personal API key (BYOK)
All plans · 0 TimeTonic credits · Provider billing

With your own API key (ChatGPT or Mistral AI), you get access to all models offered by these providers, including GPT-5, GPT-4.1, o3, o4, Mistral Large, Pixtral 12B, etc., without consuming TimeTonic credits. Billing is handled directly on your provider account.

This option is relevant if you already have an active subscription with a provider or if you want access to very recent models not yet available through TimeTonic credits.

To compare provider prices: OpenAI · Mistral AI

How to choose quickly?

⭐ Private & secure AI: hosting in France

Simple task, high volume: Mistral 7B Instruct, Llama 3.1 8B, GPT-OSS 20B
Extraction, analysis, images: Mistral Small 3.2 24B 📎 Vision · 📄 OCR
Complex documents, deeds: Llama 3.3 70B Instruct
Powerful OCR (cards, bank details, invoices): Qwen 2.5 VL 72B 📎 Vision · 📄 OCR

☁️ Public cloud AI: TimeTonic credits

Simple task, high volume: Gemini 2.0 Flash Lite, GPT-OSS 20B, DeepSeek V3.1
Native French, public sector: Mistral Small
Extraction, analysis, images: GPT-4o Mini 📎 Vision · 📄 OCR
Complex task, critical quality: GPT-4.1, Claude 3.5 Sonnet, Mistral Large 📎 Vision · 📄 OCR
Sensitive data / sovereignty: switch to private & secure AI (hosting in France)
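The quick guide above can also be written as a small lookup, for example to document which model each of your scenarios should start with. This is only a sketch: the keys (`private`, `public`, and the need labels) are hypothetical names chosen for illustration, and only the model names come from the guide.

```python
# Model suggestions copied from the quick guide; the keys are illustrative labels.
QUICK_GUIDE = {
    "private": {
        "simple_high_volume": ["Mistral 7B Instruct", "Llama 3.1 8B", "GPT-OSS 20B"],
        "extraction_images": ["Mistral Small 3.2 24B"],
        "complex_documents": ["Llama 3.3 70B Instruct"],
        "powerful_ocr": ["Qwen 2.5 VL 72B"],
    },
    "public": {
        "simple_high_volume": ["Gemini 2.0 Flash Lite", "GPT-OSS 20B", "DeepSeek V3.1"],
        "native_french": ["Mistral Small"],
        "extraction_images": ["GPT-4o Mini"],
        "complex_critical": ["GPT-4.1", "Claude 3.5 Sonnet", "Mistral Large"],
    },
}


def suggest_models(need, sensitive_data=False):
    """Return the models the guide suggests for a need.

    Sensitive data is routed to the private & secure list, as the guide recommends.
    An empty list means the guide has no entry for that combination.
    """
    hosting = "private" if sensitive_data else "public"
    return QUICK_GUIDE[hosting].get(need, [])


print(suggest_models("native_french"))
print(suggest_models("powerful_ocr", sensitive_data=True))
```

The suggestion is a starting point; the sample test and the logs remain the way to confirm that a model fits the task.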

In most cases, models available through TimeTonic credits offer an excellent quality / cost ratio, with no provider subscription.

Compare models available from providers: OpenAI Models (ChatGPT) · Mistral AI Models

Set the maximum number of tokens

Once you have chosen your model, adjust the Maximum number of tokens field in the configuration of your Ask an AI action. This parameter limits the length of the generated response, and therefore directly limits your output credit consumption.

[Screenshot: Maximum number of tokens field in the Ask an AI action configuration]

Limit to what is strictly necessary

For classification or short extraction, set a low limit. A 100-token response costs one tenth as much in output credits as a 1,000-token response. If you are extracting a category or an amount, 50 tokens are more than enough.

Adapt the limit to the task type

Calibrate by scenario type, not globally.

JSON extraction: 50 to 200 tokens are enough
Reply suggestion: 100 to 300 tokens
Summary or report: 500 to 1,000 tokens

Refine your prompt to reduce input tokens

A short and precise prompt consumes fewer input tokens. Avoid repetition and unnecessary explanations in your question.

Good practice: ask the AI to reply in JSON with short keys. This both shortens the response and makes mapping to your TimeTonic fields easier.
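As an illustration of the JSON practice, the sketch below shows a prompt that requests short keys and a way to map the reply back to readable field names. The prompt wording, the keys `cat` and `amt`, the field names, and the sample reply are all hypothetical; the reply is written by hand and is not real model output.

```python
import json

# Hypothetical instruction asking for a compact JSON reply with short keys.
PROMPT = (
    "Classify this email. Reply only with JSON using the keys "
    '"cat" (category) and "amt" (amount in euros, or null).'
)

# Hand-written example of what a compact reply could look like.
sample_response = '{"cat": "quote_request", "amt": 1250}'

# Hypothetical mapping from short keys to the target field names.
FIELD_MAP = {"cat": "Category", "amt": "Amount"}


def map_response(raw):
    """Parse a JSON reply and rename its short keys to field names."""
    data = json.loads(raw)
    return {FIELD_MAP[key]: value for key, value in data.items() if key in FIELD_MAP}


print(map_response(sample_response))
```

Keys that are not listed in the mapping are ignored, so an unexpected extra key in the reply does not break the mapping step.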

💡 Simple rule: set the maximum number of tokens to about double what you expect as a response. If you want a 3-line reply (~60 words, or about 80 tokens), set the limit to 150 tokens. You keep a margin without wasting credits.
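The doubling rule can be sketched as a small helper. It assumes the article's approximation of 3/4 of a word per token; rounding to the nearest multiple of 50 is an added convenience, not a TimeTonic rule, and the function name is illustrative.

```python
import math

WORDS_PER_TOKEN = 0.75  # article's approximation: 1 token is about 3/4 of a word


def suggested_max_tokens(expected_words, margin=2.0, step=50):
    """Suggest a token limit: expected response size times a safety margin.

    The result is rounded to the nearest multiple of `step` and never below `step`.
    """
    expected_tokens = expected_words / WORDS_PER_TOKEN
    rounded = math.floor(expected_tokens * margin / step + 0.5) * step
    return max(step, rounded)


print(suggested_max_tokens(60))  # a 3-line reply of about 60 words
print(suggested_max_tokens(5))   # a single extracted value
```

With these defaults, a 60-word reply gives the 150-token limit of the rule above, and a single extracted value gives the 50-token floor.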

Test on a sample before deploying

Before deploying your automation across all of your data, run it on a small and representative sample. The goal is to validate the quality of the result and measure actual consumption.

Choose 5 to 10 representative cases

Vary the profiles: short documents, long ones, with or without attachments, edge cases. The sample must cover the diversity of your actual data.

Check the quality of the result

Does the model produce the expected answer? If not, adjust the role, the question, or switch models before continuing.

Record actual consumption in the logs

Compare the input and output tokens you observe with your estimates. Adjust your token setting and your model choice if necessary.
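One way to make this comparison concrete is to compute how far the sample average is from the estimate. The sketch below uses hypothetical token counts, as if read from the logs of a 5-case sample, and compares them with the 300 input and 100 output tokens of the email example; the function name is illustrative.

```python
from statistics import mean


def deviation_pct(estimate, observed):
    """Return how far the observed average is from the estimate, in percent."""
    return (mean(observed) - estimate) / estimate * 100


# Hypothetical token counts noted from the logs of five test executions.
observed_input = [280, 310, 350, 295, 330]
observed_output = [90, 120, 105, 140, 95]

input_gap = deviation_pct(300, observed_input)
output_gap = deviation_pct(100, observed_output)
print(f"input: {input_gap:+.1f}%  output: {output_gap:+.1f}%")
```

A large positive gap on the output side usually points to the maximum number of tokens or to the prompt; a large gap on the input side points to content that is longer than expected.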

Once your sample has been validated, you can deploy at scale with confidence and with a reliable estimate of your monthly consumption.

Validate and monitor in the logs

Estimates are a good starting point, but nothing replaces observing your actual consumption. After each execution, TimeTonic records the exact number of consumed credits: input and output, model by model, row by row.

This is the most reliable way to validate your estimates, compare two models on the same task, and refine your choice before scaling up.

What you see in the logs

For each execution: the model used, the number of input tokens, the number of output tokens, and the total consumed credits.

Compare two models

Test the same task with, for example, Mistral Small and GPT-4o Mini, then compare the actual cost in the logs. You can then make an informed choice.

Refine before scaling

Once the execution cost has been validated on a real sample, multiply it by your monthly volume for a reliable projection of your consumption.
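This projection can be sketched as follows. The credit values are hypothetical, as if read from the logs of five sample executions; the worst-case line is an added safety margin based on the most expensive case in the sample, not something the logs provide directly.

```python
from statistics import mean


def project_monthly(sample_credits, monthly_volume):
    """Project monthly consumption from per-execution credits observed in a sample."""
    return {
        "average": mean(sample_credits) * monthly_volume,
        "worst_case": max(sample_credits) * monthly_volume,
    }


# Hypothetical total credits of five sample executions, as noted from the logs.
sample_credits = [98, 105, 112, 101, 109]

projection = project_monthly(sample_credits, monthly_volume=3000)
print(projection)
```

Comparing the worst-case figure with your monthly quota shows whether the automation still fits if most records turn out to be as long as the longest one in the sample.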

View consumption in the logs →
