AI models: choosing and estimating your consumption
Each call to an AI model consumes credits. Without clear reference points, it is difficult to anticipate how many an automation will use, especially when it is triggered on hundreds or thousands of records.
This article guides you step by step to estimate your consumption before deploying, choose the right model, and validate your results in the logs.
50 cr. (Mistral 7B extraction) × 500 records / month = 25 000 credits
Tip: start by testing on a small volume before deploying your automation across all of your data.
Once these three questions have been answered, you have all the information
you need to estimate your monthly consumption. Move to the next step
to apply the calculation formula.
2
Calculate your estimated consumption
Each exchange with the AI includes two billed token flows: what you send (input) and what the model generates in return (output).
TimeTonic automatically converts these tokens into credits and displays the result in the log of each execution.
What is a token?
A token corresponds to about 4 characters of text, or about 3/4 of a word. Everything you send to the AI is counted in tokens: the prompt, the instructions, and the content to analyze.
The generated response is also counted in tokens: the longer it is, the more it consumes.
Orders of magnitude:
1 short email ≈ 300 tokens · 1 page of text ≈ 500 tokens · 1 contract of 10 pages ≈ 5 000 tokens
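As a rough sketch, the two rules of thumb above (1 token ≈ 4 characters, 1 token ≈ 3/4 of a word) can be turned into a quick estimator. The function names are hypothetical, and actual tokenization varies by model, so the execution logs remain the reference.

```python
import math

# Rules of thumb from the article (approximations, not a real tokenizer).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Approximate token count from a word count: 1 token ≈ 3/4 of a word."""
    return math.ceil(word_count * 4 / 3)


# A page of about 2 000 characters lands on the "1 page ≈ 500 tokens" figure.
print(estimate_tokens("x" * 2000))
print(estimate_tokens_from_words(375))
```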
In the logs of each execution, you can directly read:
Input: credits consumed for the prompt and the content sent to the AI (instructions, data, attached files)
Output: credits consumed for the response generated by the model (produced text, extracted data, summary)
Total: Input + Output = actual execution cost · this is the value deducted from your monthly quota
Concrete example: Suggested reply to an email
An employee receives a customer email and wants the AI to automatically suggest
a reply. Here is what happens behind the scenes.
Step 1: What is sent to the AI (input tokens)
Input · The AI role (system prompt), e.g. "You are a sales assistant. You write professional and concise replies." · ~50 tokens
Input · The question / instruction, e.g. "Here is a received email: $email_content. Suggest a professional reply in 3-4 lines." · ~50 tokens
Input · The content of the received email, e.g. a 200-word customer email asking for a quote · ~200 tokens
Step 2: What the AI returns (output tokens)
Output · The generated reply suggestion, e.g. a reply text of 3-4 lines (~80 words) · ~100 tokens
Result with GPT-4o Mini
Input: 300 tokens = 45 credits
Output: 100 tokens = 60 credits
Total per execution = 105 credits
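The arithmetic of this example can be sketched in a few lines, assuming credits are billed per million tokens as the model table's legend describes. The GPT-4o Mini rates used here (150 000 and 600 000 credits per million tokens) are back-computed from the example itself (45 credits / 300 tokens, 60 credits / 100 tokens); they are an assumption, not an official price list, and the function names are hypothetical.

```python
from dataclasses import dataclass

TOKENS_PER_RATE_UNIT = 1_000_000  # rates are quoted per million tokens


@dataclass
class ExecutionCost:
    input_credits: float
    output_credits: float

    @property
    def total(self) -> float:
        # Total = Input + Output, the value deducted from the monthly quota
        return self.input_credits + self.output_credits


def execution_cost(input_tokens: int, output_tokens: int,
                   input_rate: float, output_rate: float) -> ExecutionCost:
    """Credits for one execution, given rates in credits per million tokens."""
    return ExecutionCost(
        input_credits=input_tokens * input_rate / TOKENS_PER_RATE_UNIT,
        output_credits=output_tokens * output_rate / TOKENS_PER_RATE_UNIT,
    )


# Assumed rates, derived from the worked example (input, output).
GPT_4O_MINI_RATES = (150_000, 600_000)

# System prompt (~50) + instruction (~50) + email (~200) in, ~100 out.
cost = execution_cost(50 + 50 + 200, 100, *GPT_4O_MINI_RATES)
print(cost.input_credits, cost.output_credits, cost.total)  # 45.0 60.0 105.0
```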
Perspective:
With 1 000 000 credits / month (PRO plan), that means about 9 500 suggested email replies per month.
With 10 000 credits / month (FREE plan), that means about 95 suggestions per month.
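The perspective figures come from dividing the monthly quota by the cost of one execution. A minimal sketch, using the 105-credit cost from the example and the plan quotas stated above (the function name is hypothetical):

```python
def executions_per_month(monthly_credits: int, credits_per_execution: float) -> int:
    """Number of executions a monthly quota covers, rounded down."""
    return int(monthly_credits // credits_per_execution)


print(executions_per_month(1_000_000, 105))  # PRO plan: 9523
print(executions_per_month(10_000, 105))     # FREE plan: 95
```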
These figures are indicative. Actual consumption depends on the
exact length of your prompts, the processed content, and the generated
response. Always test on a representative sample before
deploying at scale.
3
Choose the right model
Choosing the model is the factor that has the biggest impact on your consumption.
TimeTonic offers models that are directly accessible through your AI credits: no provider subscription, no API key management. This is often the simplest and most cost-effective solution to get started and scale.
Why choose TimeTonic credits first?
→ No provider subscription to sign up for and no API key to manage
→ Two independent monthly quotas, one for private AI and one for public cloud AI, on a single billing line
→ The models available through your credits cover the vast majority of common use cases
→ Credits automatically reset every month based on your plan
→ Additional credit packs are available starting with the Business plan if you need more volume during the month
→ If your quota is exhausted before the end of the month, you can use your own API key (BYOK) to keep your automations running without waiting for the renewal
Available models by source and indicative cost
📏 The estimates below are indicative. Always test on
a representative sample before deploying at scale.
1 credit = 1 000 tokens · 1 page ≈ 500 tokens
Total context: maximum volume the model can process in a single execution, input and output combined. This is the value shown in parentheses in the model selector in the TimeTonic interface.
Max read / reply capacity: billing rates per million input and output tokens, expressed in K (thousands) or M (millions).
☁️ Public cloud AI: TimeTonic credits · From the START plan · OpenAI · Mistral · Anthropic · Google
Model · Tier · Total context · Read / reply rate · 1 page · 10-page document · Vision / OCR · Recommended use
… · … · 32.7K · 100K / 100K · ~50 cr. · ~1 500 cr. · - · Repetitive text, simple generation, high volume, sensitive data
Llama 3.1 8B Instruct · Cost-effective · 131K · 100K / 100K · ~50 cr. · ~1 500 cr. · - · Simple extraction, fast classification, sovereign data triage
GPT-OSS 20B · Cost-effective · 131K · 40K / 150K · ~30 cr. · ~900 cr. · - · Light reasoning, qualification, short analysis on internal data
Mistral Small 3.2 24B · Cost-effective, 📎 Vision, 📄 OCR · 128K · 90K / 280K · ~100 cr. · ~2 900 cr. * · ~100 cr. · Native French, simple image analysis, text extraction, sensitive data
Llama 3.3 70B Instruct · Premium · 131K · 670K / 670K · ~335 cr. · ~10 050 cr. · - · Complex text documents, deeds, contracts, reports on sensitive data
Qwen 2.5 VL 72B · Premium, 📎 Vision, 📄 OCR · 32.7K · 910K / 910K · ~455 cr. · ~13 650 cr. * · ~455 cr. · Specialized OCR: precise extraction from scanned documents, ID cards, invoices, bank details
📎 Vision: the model can analyze an attached image and understand its content
📄 OCR: the model can extract structured text from document photos (invoices, ID cards, receipts)
* 10-page document: estimate valid only for images (photos of pages). For PDF files, use the Process a document with MistralAI OCR action in two steps: text extraction via OCR, then analysis by the model of your choice.
Total context = combined input + output, shown in parentheses in the model selector · K = thousands of tokens · M = millions of tokens · 1 page ≈ 500 tokens · 1 short email ≈ 300 tokens
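Assuming the read rates in the table are credits per million input tokens, the per-page figures for several rows can be reproduced by multiplying 500 tokens by the read rate. This is a sketch: it covers input only, and some rows (e.g. GPT-OSS 20B, Mistral Small 3.2 24B) show a higher per-page figure than this input-only estimate, presumably because other factors are included. The dictionary and function names are hypothetical.

```python
PAGE_TOKENS = 500  # 1 page ≈ 500 tokens

# Read (input) rates in credits per million tokens, copied from the table.
READ_RATES = {
    "Llama 3.1 8B Instruct": 100_000,
    "Llama 3.3 70B Instruct": 670_000,
    "Qwen 2.5 VL 72B": 910_000,
}


def input_cost_per_page(read_rate: int, page_tokens: int = PAGE_TOKENS) -> float:
    """Input-only credit estimate for one page of text."""
    return page_tokens * read_rate / 1_000_000


for model, rate in READ_RATES.items():
    print(f"{model}: ~{input_cost_per_page(rate):.0f} cr. per page")
# Llama 3.1 8B Instruct: ~50 cr. per page
# Llama 3.3 70B Instruct: ~335 cr. per page
# Qwen 2.5 VL 72B: ~455 cr. per page
```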
🔑 Public cloud AI: Personal API key (BYOK) · All plans · 0 TimeTonic credits · Provider billing
With your own API key (ChatGPT or Mistral AI), you get access to all models offered by these providers, including GPT-5, GPT-4.1, o3, o4, Mistral Large, Pixtral 12B, etc., without consuming TimeTonic credits.
Billing is handled directly on your provider account.
This option is relevant if you already have an active subscription with a provider or if you want access to very recent models not yet available through TimeTonic credits.
Once you have chosen your model, adjust the Maximum number of tokens field in the configuration of your Ask an AI action. This parameter limits the length of the generated response, and therefore directly limits your output credit consumption.
Limit to what is strictly necessary
For classification or short extraction, set a low limit.
A 100-token response costs 10 times less than a 1 000-token response.
If you are extracting a category or an amount, 50 tokens are more than enough.
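Because the maximum number of tokens caps the response, it also caps the output part of the cost. A minimal sketch, assuming output is billed per million tokens; the 600 000 output rate is only an illustrative value (the one back-computed from the GPT-4o Mini example), and the function name is hypothetical. The input cost is not affected by this setting, and a response that stops before the cap costs less than the bound.

```python
def max_output_credits(max_tokens: int, output_rate: float) -> float:
    """Upper bound on output credits when the response is capped at max_tokens."""
    return max_tokens * output_rate / 1_000_000


OUTPUT_RATE = 600_000  # illustrative output rate, credits per million tokens

cap_100 = max_output_credits(100, OUTPUT_RATE)
cap_1000 = max_output_credits(1_000, OUTPUT_RATE)
print(cap_100, cap_1000, cap_1000 / cap_100)  # 60.0 600.0 10.0
```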
Adapt the limit to the task type
Calibrate by scenario type, not globally.
JSON extraction: 50 to 200 tokens are enough
Reply suggestion: 100 to 300 tokens
Summary or report: 500 to 1 000 tokens
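If you manage several automations, the ranges above can be kept as a small configuration and used to sanity-check each Ask an AI setting. A sketch only: the task-type keys, messages, and function name are hypothetical labels, not TimeTonic settings.

```python
# Recommended "Maximum number of tokens" ranges per task type, from the article.
RECOMMENDED_RANGES = {
    "json_extraction": (50, 200),
    "reply_suggestion": (100, 300),
    "summary_or_report": (500, 1_000),
}


def check_limit(task_type: str, max_tokens: int) -> str:
    """Compare a configured limit with the recommended range for its task type."""
    low, high = RECOMMENDED_RANGES[task_type]
    if max_tokens < low:
        return "below range: check that responses are not truncated"
    if max_tokens > high:
        return "above range: consider lowering it"
    return "ok"


print(check_limit("json_extraction", 1_000))
print(check_limit("reply_suggestion", 150))
```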
Refine your prompt to reduce input tokens
A short and precise prompt consumes fewer input tokens. Avoid repetition and unnecessary explanations in your question.
Good practice: ask the AI to reply in JSON with short keys. This both reduces the response length and makes mapping to your TimeTonic fields easier.
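To see why short keys help, the sketch below serializes the same hypothetical extraction result with verbose and short keys, then applies the 4-characters-per-token rule of thumb. The field names and values are invented for illustration, and the token count is an approximation, not the model's real tokenizer.

```python
import json
import math

# Hypothetical extraction result, with verbose keys and with short keys.
verbose_keys = {
    "customer_full_name": "Jane Doe",
    "invoice_total_amount": 1250.5,
    "document_category": "invoice",
}
short_keys = {"name": "Jane Doe", "total": 1250.5, "cat": "invoice"}


def approx_tokens(obj) -> int:
    """Approximate tokens of the compact JSON text (1 token ≈ 4 characters)."""
    text = json.dumps(obj, separators=(",", ":"))  # no extra spaces
    return math.ceil(len(text) / 4)


print(approx_tokens(verbose_keys), approx_tokens(short_keys))
```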
💡 Simple rule: set the maximum number of tokens to double what you expect as a response. If you want a 3-line reply (~60 words), set the limit to 150 tokens. You keep a margin without wasting credits.
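The doubling rule can be sketched as follows: convert the expected words into tokens (1 token ≈ 3/4 of a word), double the result, then round. Rounding to the nearest multiple of 50 is an illustrative choice that happens to turn the 160 tokens computed for a 60-word reply into the 150 quoted above; the function name and parameters are hypothetical.

```python
WORDS_PER_TOKEN = 0.75  # 1 token ≈ 3/4 of a word


def suggested_max_tokens(expected_words: int, margin: float = 2.0, step: int = 50) -> int:
    """Expected response length in tokens, times a safety margin, rounded to `step`."""
    expected_tokens = expected_words / WORDS_PER_TOKEN
    limit = expected_tokens * margin
    # Round to a convenient multiple of `step`, never below one step.
    return max(step, round(limit / step) * step)


print(suggested_max_tokens(60))  # 3-line reply: 150
```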
5
Test on a sample before deploying
Before deploying your automation across all of your data,
run it on a small and representative sample. The goal
is to validate the quality of the result and measure actual consumption.
1
Choose 5 to 10 representative cases
Vary the profiles: short documents, long ones, with or without attachments, edge cases. The sample must cover the diversity of your actual data.
2
Check the quality of the result
Does the model produce the expected answer? If not, adjust
the role, the question, or switch models before continuing.
3
Record actual consumption in the logs
Compare the input and output tokens you observe with
your estimates. Adjust your token setting and your
model choice if necessary.
Once your sample has been validated, you can deploy at scale with confidence, and with a reliable estimate of your monthly consumption.
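Comparing your estimates with what the logs show can be done case by case. The sketch below flags sample cases where the observed tokens differ from the estimate by more than a tolerance; the 20% tolerance, the sample values, and all names are hypothetical, chosen only to illustrate the check.

```python
from dataclasses import dataclass


@dataclass
class SampleRun:
    label: str
    estimated_tokens: int  # your estimate before the test
    observed_tokens: int   # input + output tokens read in the logs


def flag_deviations(runs, tolerance=0.2):
    """Labels of runs whose observed tokens deviate from the estimate by more than `tolerance`."""
    flagged = []
    for run in runs:
        deviation = abs(run.observed_tokens - run.estimated_tokens) / run.estimated_tokens
        if deviation > tolerance:
            flagged.append(run.label)
    return flagged


# Hypothetical sample: the long email with an attachment exceeds the estimate.
sample = [
    SampleRun("short email", 400, 390),
    SampleRun("long email with attachment", 900, 1_400),
]
print(flag_deviations(sample))  # ['long email with attachment']
```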
6
Validate and monitor in the logs
Estimates are a good starting point, but nothing replaces
observing your actual consumption. After each execution,
TimeTonic records the exact number of consumed credits:
input and output, model by model, row by row.
This is the most reliable way to validate your estimates, compare two models on the same task, and refine your choice before scaling up.
What you see in the logs
For each execution: the model used, the number of input tokens,
the number of output tokens, and the total consumed
credits.
Compare two models
Test the same task with Mistral Small and GPT-4o Mini, for example, then compare the actual cost in the logs. You can then make an informed choice.
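Once both models have run on the same sample, averaging the total credits per model makes the comparison explicit. A sketch with hypothetical log values (not real measurements); the cheaper model only wins if its output quality is also acceptable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log rows: (model, total credits) for the same task and sample.
log_rows = [
    ("Mistral Small", 120), ("Mistral Small", 95), ("Mistral Small", 110),
    ("GPT-4o Mini", 105), ("GPT-4o Mini", 98), ("GPT-4o Mini", 130),
]


def average_cost_by_model(rows):
    """Average total credits per execution, grouped by model."""
    costs = defaultdict(list)
    for model, credits in rows:
        costs[model].append(credits)
    return {model: mean(values) for model, values in costs.items()}


averages = average_cost_by_model(log_rows)
cheapest = min(averages, key=averages.get)
print(averages, cheapest)
```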
Refine before scaling
Once the execution cost has been validated on a real
sample, multiply it by your monthly volume for a reliable
projection of your consumption.
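The projection itself is a multiplication: average validated cost per execution times monthly volume, which you can then compare with your plan's quota. A minimal sketch; the sample costs are hypothetical values averaging 50 credits, which with 500 records per month gives the same 25 000 credits as the Mistral 7B example at the top of this article.

```python
from statistics import mean


def monthly_projection(sample_costs, monthly_volume):
    """Average validated cost per execution multiplied by the monthly volume."""
    return mean(sample_costs) * monthly_volume


def fits_quota(projection, monthly_quota):
    """True if the projected consumption stays within the monthly quota."""
    return projection <= monthly_quota


projection = monthly_projection([48, 52, 50], 500)  # hypothetical sample costs
print(projection, fits_quota(projection, 1_000_000))  # PRO plan quota
```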