- A portal is slow or rate‑limited to log into, and you run many tasks against it.
- You get bursts of work for the same login (e.g. 30 requests three times a day).
- You want to control the order/concurrency in which a portal’s tasks run.
You do not need a dedicated container reserved 24/7. Idle containers are automatically
reclaimed after a few minutes of inactivity and a fresh one is started on the next burst —
so you only consume capacity while you’re actually using it.
Quick start
For the common case, the defaults are exactly what you want: just add "is_dedicated": true to your normal /inference request. That gives you one warm,
logged‑in container for the portal, with your tasks served one after another on it.
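As a minimal sketch of the quick-start request, the snippet below builds only the dedicated-mode part of a request body as a Python dict. The field names is_dedicated, max_parallelism and per_login_parallelism come from this page; the helper with_defaults is hypothetical, and the remaining fields of your normal /inference request are omitted.

```python
# Defaults documented on this page for the optional dedicated-mode fields.
DEDICATED_DEFAULTS = {"max_parallelism": 1, "per_login_parallelism": 1}


def with_defaults(body: dict) -> dict:
    """Return a copy of `body` with the documented defaults filled in.

    Hypothetical helper for illustration only; the service applies its own
    defaults on the server side.
    """
    return {**DEDICATED_DEFAULTS, **body}


# Minimal opt-in: merge this into the rest of your normal /inference request.
minimal = {"is_dedicated": True}

# The same opt-in with the defaults spelled out explicitly.
explicit = {"is_dedicated": True, "max_parallelism": 1, "per_login_parallelism": 1}

# Both bodies describe the same dedicated configuration.
same = with_defaults(minimal) == with_defaults(explicit)
print(same)
```

Sending the minimal form keeps requests short; spelling the limits out is only useful when you want values other than 1.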
max_parallelism and per_login_parallelism are optional and both default to 1, so
a request that sets only "is_dedicated": true behaves identically to one that also sets both fields to 1.
Set "is_dedicated": false (or omit it) for a normal, non‑dedicated run.
Request fields
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| is_dedicated | bool | false | no | Opt into dedicated mode. Set to true to use a warm, logged‑in container. |
| max_parallelism | int | 1 | no | Maximum number of warm containers for this portal (service‑wide cap across all logins). |
| per_login_parallelism | int | 1 | no | Maximum number of containers a single login may use before its extra tasks round‑robin onto those containers. |
| unique_parameter_names | list[str] | [] | no | Which input parameters identify a “login” / queue. Each name must be one of the keys in input_parameters. See Controlling the queue. |
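The table can be mirrored client-side as a small data class, which makes the defaults explicit and catches an invalid unique_parameter_names before a request is sent. This is a sketch under assumptions: the class name DedicatedOptions and its check method are hypothetical, and the server performs its own validation regardless.

```python
from dataclasses import dataclass, field


@dataclass
class DedicatedOptions:
    """Dedicated-mode request fields with the defaults listed in the table."""

    is_dedicated: bool = False
    max_parallelism: int = 1
    per_login_parallelism: int = 1
    unique_parameter_names: list[str] = field(default_factory=list)

    def check(self, input_parameters: dict[str, list[str]]) -> None:
        """Raise ValueError if a unique name is not a key of input_parameters."""
        missing = [
            name for name in self.unique_parameter_names
            if name not in input_parameters
        ]
        if missing:
            raise ValueError(f"not present in input_parameters: {missing}")


# Example input parameters; the values are illustrative.
params = {"username": ["beta@example.com"]}

# Passes: "username" is a key of input_parameters.
DedicatedOptions(is_dedicated=True, unique_parameter_names=["username"]).check(params)
```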
How it works
- A “login” is a value of your unique parameters. Whatever you list in unique_parameter_names becomes the identity of a login (and its queue). With unique_parameter_names: ["username"], each distinct username is its own login.
- per_login_parallelism controls how many warm containers one login may spread its tasks across. With the default 1, a login uses a single warm container and its tasks run one at a time on it.
- max_parallelism is the portal‑wide ceiling on warm containers across all logins.
- Idle reaping. A warm container that has had no work for a few minutes is shut down and its capacity returns to the shared pool. The next task for that login starts a fresh container (which logs in again). Smaller idle windows reclaim capacity faster; larger ones re‑login less often.
- Queue‑and‑return. If there’s no free capacity when your request arrives, the request is accepted and queued (not rejected, not blocked). It runs as soon as capacity frees up. Requests for a given portal are served in first‑in, first‑out order.
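The way a login identity follows from the unique parameters can be sketched as below. How the platform encodes this identity internally is not described here, so the tuple returned by the hypothetical login_key function is an assumption, as is the invoice_id parameter used to make the requests differ.

```python
def login_key(input_parameters, unique_parameter_names):
    """Build a hashable identity from the values of the unique parameters.

    Illustrative only: parameters that are not listed as unique do not
    influence the key.
    """
    return tuple(
        (name, tuple(input_parameters[name])) for name in unique_parameter_names
    )


# Three example requests; invoice_id is a hypothetical non-unique parameter.
a = {"username": ["beta@example.com"], "invoice_id": ["1001"]}
b = {"username": ["beta@example.com"], "invoice_id": ["1002"]}
c = {"username": ["gamma@example.com"], "invoice_id": ["1003"]}

# Same username: same login, so a and b share a queue.
same_login = login_key(a, ["username"]) == login_key(b, ["username"])

# Different username: a separate login with its own queue.
other_login = login_key(a, ["username"]) != login_key(c, ["username"])

# No unique parameters: every request maps to the same (empty) key.
shared = login_key(a, []) == login_key(c, [])
```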
Examples
One login, a burst of tasks
Run many tasks for a single login on one warm container, served in order. If you send 30 requests with the same username, the first request logs in, and the next 29 reuse
that warm session, running one after another.
To let that single login run a few tasks in parallel, raise both limits, max_parallelism and per_login_parallelism, above 1.
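A request body along these lines might look like the dict below, followed by a toy model of how one login's tasks would spread round-robin over its containers. The value 3 is illustrative, the spread function is hypothetical, and taking the smaller of the two limits as the effective container count is an assumption based on max_parallelism being the portal-wide ceiling; actual placement is decided by the platform.

```python
from collections import defaultdict

# Dedicated-mode fields only; the rest of the /inference request is omitted.
body = {
    "is_dedicated": True,
    "max_parallelism": 3,
    "per_login_parallelism": 3,
    "unique_parameter_names": ["username"],
    "input_parameters": {"username": ["beta@example.com"]},
}


def spread(task_ids, max_parallelism, per_login_parallelism):
    """Toy round-robin: task i goes to container i % containers."""
    # Assumption: one login cannot use more containers than the portal cap.
    containers = min(max_parallelism, per_login_parallelism)
    placement = defaultdict(list)
    for i, task in enumerate(task_ids):
        placement[i % containers].append(task)
    return dict(placement)


tasks = ["t0", "t1", "t2", "t3", "t4", "t5"]

# With both limits at 3, six tasks land two per container.
placement = spread(tasks, body["max_parallelism"], body["per_login_parallelism"])

# With the defaults (1 and 1), everything runs serially on one container.
default_placement = spread(tasks, 1, 1)
```

Raising only per_login_parallelism would have no effect in this model, because the portal-wide cap of 1 would still apply.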
Multiple unique logins, isolated
Mark the login parameter as unique, and each login gets its own warm container. They don’t interfere and can run in parallel up to max_parallelism.
For example, with max_parallelism: 5 and requests for beta@example.com, gamma@example.com, etc., each distinct
username is a separate login with its own warm, logged‑in container, up to 5 containers
total for the portal.
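A sketch of such requests: one body per login, identical except for the username value. The helper build_body is hypothetical, and only the dedicated-mode fields and input_parameters are shown.

```python
def build_body(username: str) -> dict:
    """Dedicated-mode fields for one login; other request fields are omitted."""
    return {
        "is_dedicated": True,
        "max_parallelism": 5,
        "unique_parameter_names": ["username"],
        "input_parameters": {"username": [username]},
    }


usernames = ["beta@example.com", "gamma@example.com"]
bodies = [build_body(u) for u in usernames]

# Each distinct username value is a separate login.
distinct_logins = {tuple(b["input_parameters"]["username"]) for b in bodies}
print(len(distinct_logins))
```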
More logins than capacity
Suppose the portal can hold at most 4 warm containers and you send 5 different logins with max_parallelism: 5:
- The first 4 logins each get a warm container and start immediately.
- The 5th request is accepted and queued (status queued). It is not an error.
- It runs as soon as a container frees up, for example when one of the first 4 logins sits idle long enough to be reclaimed, returning a slot to the pool.
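The scenario above can be walked through with a toy model. This is not the platform's scheduler: the functions admit and release are hypothetical, and the model assumes each login needs exactly one container and that the usable slot count is the smaller of the portal's capacity and max_parallelism.

```python
from collections import deque


def admit(logins, portal_capacity, max_parallelism):
    """Return (running, queued) for distinct logins arriving in order."""
    slots = min(portal_capacity, max_parallelism)
    running = list(logins[:slots])
    queued = deque(logins[slots:])  # accepted and waiting, not rejected
    return running, queued


def release(running, queued, idle_login):
    """Reclaim an idle login's container and start the oldest queued login."""
    running.remove(idle_login)
    if queued:
        running.append(queued.popleft())


logins = [f"login-{i}" for i in range(1, 6)]
running, queued = admit(logins, portal_capacity=4, max_parallelism=5)

first_wave = list(running)  # the 4 logins that start immediately
waiting = list(queued)      # the 5th login, queued

# login-1 goes idle and is reclaimed; the queued login takes its slot.
release(running, queued, "login-1")
```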
Controlling the queue
The value of your unique parameters is the queue key. By choosing what you mark as unique_parameter_names, you decide how requests are grouped onto warm containers. This lets
you schedule a portal’s traffic however your logic needs.
unique_parameter_names must reference parameters that exist in input_parameters. The queue
key is read from input_parameters, so the parameter you use as a queue key has to be declared
in the automation and sent in the request.
- One queue per login (default isolation). Mark the login parameter as unique. Each login gets its own warm, logged‑in container.
- One shared queue for the whole portal. Send no unique parameters ("unique_parameter_names": []). Every request funnels into a single warm container group and runs serialized, which is useful when a portal must never run two sessions at once.
- Custom grouping (different logins on one queue). Designate a dedicated routing parameter (say queue_key) in the automation, mark it as unique, and give the requests you want serialized together the same queue_key value, regardless of which login they use. For example, two requests with different username values, one of them username: ["beta@example.com"], but the same queue_key: ["batch-1"] share one queue / container and run one at a time. Use a different queue_key value to put work on a separate queue.
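Custom grouping can be illustrated by grouping a few request bodies by their queue key. The parameter queue_key is the routing parameter named above and has to be declared in your automation; the function group_by_queue, the batch-2 value and the choice of usernames are assumptions made for the example.

```python
from collections import defaultdict


def make_body(username: str, queue_key: str) -> dict:
    """Request fields relevant to queueing; other fields are omitted."""
    return {
        "is_dedicated": True,
        "unique_parameter_names": ["queue_key"],
        "input_parameters": {"username": [username], "queue_key": [queue_key]},
    }


def group_by_queue(bodies):
    """Map each queue key to the usernames whose requests would share it."""
    queues = defaultdict(list)
    for body in bodies:
        params = body["input_parameters"]
        # The key is built only from the parameters marked as unique.
        key = tuple(tuple(params[n]) for n in body["unique_parameter_names"])
        queues[key].append(params["username"][0])
    return dict(queues)


bodies = [
    make_body("gamma@example.com", "batch-1"),
    make_body("beta@example.com", "batch-1"),  # different login, same queue
    make_body("gamma@example.com", "batch-2"),  # same login, separate queue
]
queues = group_by_queue(bodies)
```

In this sketch the username no longer affects grouping at all, because only queue_key is listed in unique_parameter_names.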
Things to take special care of
- unique_parameter_names must be a subset of input_parameters. Names not present in your input_parameters are rejected.
- max_parallelism above the portal’s real capacity is unreachable: the surplus requests queue rather than running concurrently.
- Request limits are bounded server‑side. A request‑supplied max_parallelism is clamped to a platform maximum. An admin‑configured portal (see below) is not clamped.
- Limits are fixed per active portal. The first request that opens a portal’s reservation sets its max_parallelism / per_login_parallelism; later requests reuse those values until the portal goes fully idle and is reclaimed, after which a new burst can specify fresh limits.
- use_proxy cannot be combined with is_dedicated. A request with both set is rejected.
- Admin configuration takes precedence. If a portal is configured as dedicated on the Optexity side for your account, it runs dedicated with those limits even if your request sends "is_dedicated": false. You can opt in per request, but you cannot opt out of a portal that is configured dedicated.
FAQ
Does a queued request ever fail just because the portal is busy?
No. When there’s no free capacity, the request is accepted with status queued and runs when a
slot frees up. It is never rejected for capacity reasons.
What happens to my warm session when there’s a lull?
After a few minutes with no work, the container is reclaimed and its capacity returns to the
pool. Your next request starts a fresh container and logs in again — which is why your
automation must handle the logged‑out state.
How do I run several logins at the same time?
Mark the login as a unique parameter and set max_parallelism to at least the number of
concurrent logins you need (and ensure the portal has that much capacity).