Summary
After the recent agroportal sync, GET /users?include=all returns within ~10s
when security is disabled but takes ~15 minutes when security is enabled.
Root cause is in User#admin? combined with the new serialize_filter
introduced for lastLoginAt: admin? is invoked once per serialized user,
and each call re-loads the requesting user's role on the same instance —
an N+1 association reload pattern.
A second, related issue: the auth middleware loads the requesting user with
User.attributes(:all), which includes inverse attributes (:provisionalClasses)
that fan out across the entire ProvisionalClass graph on every authenticated
request.
Reproduction
With security enabled and ~21k users in the triplestore:
time curl -s -H 'Authorization: apikey token=YOUR_APIKEY' \
'https://data.bioontology.org/users?include=all' >/dev/null
→ ~15 minutes to complete.
With security disabled, the same call returns in ~10s.
Root cause analysis
1. User#admin? is called once per serialized user
The new serialize_filter (models/users/user.rb:49) added during the agroportal
sync calls Thread.current[:remote_user]&.admin? per User instance:
def self.show_lastLoginAt?(attrs)
unless Thread.current[:remote_user]&.admin?
return attrs - [:lastLoginAt]
end
return attrs
end
Without security: Thread.current[:remote_user] is nil and &.admin?
short-circuits. Cost: nanoseconds.
With security: admin? runs once per user being serialized — 21,155 invocations
on the same Thread.current[:remote_user] instance.
2. admin? is not idempotent
User#admin? calls bring(role: [:role]) unconditionally, and Goo's bring
nullifies @role and re-loads the association every call (see
goo/lib/goo/base/resource.rb:179-198). The result is a classic N+1
association reload pattern: the same role on the same
Thread.current[:remote_user] instance is reloaded once for each user
being serialized, all serial, before the response can finish.
3. Auth middleware loads inverse attributes
Security::Authorization#authorized? (lib/ontologies_linked_data/security/authorization.rb:105-107)
loads the requesting user with:
.include(LinkedData::Models::User.attributes(:all))
User.attributes(:all) includes :provisionalClasses, which is an inverse
attribute (inverse: { on: :provisional_class, attribute: :creator }).
Loading it for one user requires scanning the full ProvisionalClass graph
to find every class with that creator. For active users this adds significant
cost to every authenticated request.
(There's also a latent bug here: APIKEYS_FOR_AUTHORIZATION is declared
but never written to, so the auth load runs on every request even for the
same apikey. Out of scope for this fix.)
Diagnostic timeline
Confirmed via instrumented timing on a real triplestore (21,155 users):
load = 9703.4ms ← User.where.include(...).all
check_access = 17.6ms
filter = 0.1ms
serialize = ~15 min ← completed eventually; exact timing not captured
Commenting out show_lastLoginAt? in filter_attributes made the request
return in ~10s, isolating the regression to the per-user admin? call.
Fix
Two narrow changes:
models/users/user.rb — make admin? idempotent
Guard bring with bring? at both the User and Role levels. The auth
middleware loads :role shallowly (User#role only, not nested
{role: [:role]}), so we must also check whether the Role objects' own
:role attribute has been loaded:
def admin?
return false unless persistent?
if bring?(:role) || (instance_variable_defined?(:@role) && Array(@role).any? { |r| r.bring?(:role) })
bring(role: [:role])
end
return false if role.empty?
role.map { |r| r.role }.include?(LinkedData::Models::Users::Role::ADMIN)
end
Effect: per-user admin? calls become no-ops after the first; one role
load per request instead of N+1.
security/authorization.rb — exclude inverse attrs from auth load
attrs_to_load = LinkedData::Models::User.attributes(:all) - LinkedData::Models::User.attributes(:inverse)
user = LinkedData::Models::User.where(apikey: apikey).include(attrs_to_load).first
Effect: removes inverse fanout from every authenticated request. Comment
in code includes a TODO noting this could be tightened further to an
explicit allowlist (:username, :customOntology, role: [:role] are the
only attrs middleware/access-control reads on every request).
Related
- Underlying
bring non-idempotency in goo is a footgun affecting other
code paths beyond User#admin?. Worth filing separately against
ncbo/goo so other bring(...) callers benefit.
APIKEYS_FOR_AUTHORIZATION cache is declared-but-never-populated.
Separate, independent fix.
- The original UI behavior of calling
/users?include=all during signup
(in bioportal_web_ui) is a separate problem: even with this fix the
endpoint is ~10s for 21k users. Pagination on /users is planned;
the UI should switch to scoped queries.
Tests
Adds test_admin_with_shallow_role_load to test/models/user/test_user.rb
covering the auth-middleware load shape and the idempotency contract.
Summary
After the recent agroportal sync,
GET /users?include=allreturns within ~10swhen security is disabled but takes ~15 minutes when security is enabled.
Root cause is in
User#admin?combined with the newserialize_filterintroduced for
lastLoginAt:admin?is invoked once per serialized user,and each call re-loads the requesting user's role on the same instance —
an N+1 association reload pattern.
A second, related issue: the auth middleware loads the requesting user with
User.attributes(:all), which includes inverse attributes (:provisionalClasses)that fan out across the entire ProvisionalClass graph on every authenticated
request.
Reproduction
With security enabled and ~21k users in the triplestore:
→ ~15 minutes to complete.
With security disabled, the same call returns in ~10s.
Root cause analysis
1.
User#admin?is called once per serialized userThe new
serialize_filter(models/users/user.rb:49) added during the agroportalsync calls
Thread.current[:remote_user]&.admin?per User instance:Without security:
Thread.current[:remote_user]isniland&.admin?short-circuits. Cost: nanoseconds.
With security:
admin?runs once per user being serialized — 21,155 invocationson the same
Thread.current[:remote_user]instance.2.
admin?is not idempotentUser#admin?callsbring(role: [:role])unconditionally, and Goo'sbringnullifies
@roleand re-loads the association every call (seegoo/lib/goo/base/resource.rb:179-198). The result is a classic N+1association reload pattern: the same role on the same
Thread.current[:remote_user]instance is reloaded once for each userbeing serialized, all serial, before the response can finish.
3. Auth middleware loads inverse attributes
Security::Authorization#authorized?(lib/ontologies_linked_data/security/authorization.rb:105-107)loads the requesting user with:
User.attributes(:all)includes:provisionalClasses, which is an inverseattribute (
inverse: { on: :provisional_class, attribute: :creator }).Loading it for one user requires scanning the full ProvisionalClass graph
to find every class with that creator. For active users this adds significant
cost to every authenticated request.
(There's also a latent bug here:
APIKEYS_FOR_AUTHORIZATIONis declaredbut never written to, so the auth load runs on every request even for the
same apikey. Out of scope for this fix.)
Diagnostic timeline
Confirmed via instrumented timing on a real triplestore (21,155 users):
Commenting out
show_lastLoginAt?infilter_attributesmade the requestreturn in ~10s, isolating the regression to the per-user
admin?call.Fix
Two narrow changes:
models/users/user.rb— makeadmin?idempotentGuard
bringwithbring?at both the User and Role levels. The authmiddleware loads
:roleshallowly (User#role only, not nested{role: [:role]}), so we must also check whether the Role objects' own:roleattribute has been loaded:Effect: per-user
admin?calls become no-ops after the first; one roleload per request instead of N+1.
security/authorization.rb— exclude inverse attrs from auth loadEffect: removes inverse fanout from every authenticated request. Comment
in code includes a TODO noting this could be tightened further to an
explicit allowlist (
:username, :customOntology, role: [:role]are theonly attrs middleware/access-control reads on every request).
Related
bringnon-idempotency in goo is a footgun affecting othercode paths beyond
User#admin?. Worth filing separately againstncbo/gooso otherbring(...)callers benefit.APIKEYS_FOR_AUTHORIZATIONcache is declared-but-never-populated.Separate, independent fix.
/users?include=allduring signup(in
bioportal_web_ui) is a separate problem: even with this fix theendpoint is ~10s for 21k users. Pagination on
/usersis planned;the UI should switch to scoped queries.
Tests
Adds
test_admin_with_shallow_role_loadtotest/models/user/test_user.rbcovering the auth-middleware load shape and the idempotency contract.