Skip to content

/users endpoint slow with security enabled due to per-instance User#admin? call re-loads fanout #285

@alexskr

Description

@alexskr

Summary

After the recent agroportal sync, GET /users?include=all returns within ~10s
when security is disabled but takes ~15 minutes when security is enabled.
Root cause is in User#admin? combined with the new serialize_filter
introduced for lastLoginAt: admin? is invoked once per serialized user,
and each call re-loads the requesting user's role on the same instance —
an N+1 association reload pattern.

A second, related issue: the auth middleware loads the requesting user with
User.attributes(:all), which includes inverse attributes (:provisionalClasses)
that fan out across the entire ProvisionalClass graph on every authenticated
request.

Reproduction

With security enabled and ~21k users in the triplestore:

time curl -s -H 'Authorization: apikey token=YOUR_APIKEY' \
  'https://data.bioontology.org/users?include=all' >/dev/null

→ ~15 minutes to complete.

With security disabled, the same call returns in ~10s.

Root cause analysis

1. User#admin? is called once per serialized user

The new serialize_filter (models/users/user.rb:49) added during the agroportal
sync calls Thread.current[:remote_user]&.admin? per User instance:

def self.show_lastLoginAt?(attrs)
  unless Thread.current[:remote_user]&.admin?
    return attrs - [:lastLoginAt]
  end
  return attrs
end

Without security: Thread.current[:remote_user] is nil and &.admin?
short-circuits. Cost: nanoseconds.

With security: admin? runs once per user being serialized — 21,155 invocations
on the same Thread.current[:remote_user] instance.

2. admin? is not idempotent

User#admin? calls bring(role: [:role]) unconditionally, and Goo's bring
nullifies @role and re-loads the association every call (see
goo/lib/goo/base/resource.rb:179-198). The result is a classic N+1
association reload pattern: the same role on the same
Thread.current[:remote_user] instance is reloaded once for each user
being serialized, all serial, before the response can finish.

3. Auth middleware loads inverse attributes

Security::Authorization#authorized? (lib/ontologies_linked_data/security/authorization.rb:105-107)
loads the requesting user with:

.include(LinkedData::Models::User.attributes(:all))

User.attributes(:all) includes :provisionalClasses, which is an inverse
attribute (inverse: { on: :provisional_class, attribute: :creator }).
Loading it for one user requires scanning the full ProvisionalClass graph
to find every class with that creator. For active users this adds significant
cost to every authenticated request.

(There's also a latent bug here: APIKEYS_FOR_AUTHORIZATION is declared
but never written to, so the auth load runs on every request even for the
same apikey. Out of scope for this fix.)

Diagnostic timeline

Confirmed via instrumented timing on a real triplestore (21,155 users):

load           =  9703.4ms  ← User.where.include(...).all
check_access   =    17.6ms
filter         =     0.1ms
serialize      =  ~15 min   ← completed eventually; exact timing not captured

Commenting out show_lastLoginAt? in filter_attributes made the request
return in ~10s, isolating the regression to the per-user admin? call.

Fix

Two narrow changes:

models/users/user.rb — make admin? idempotent

Guard bring with bring? at both the User and Role levels. The auth
middleware loads :role shallowly (User#role only, not nested
{role: [:role]}), so we must also check whether the Role objects' own
:role attribute has been loaded:

def admin?
  return false unless persistent?
  if bring?(:role) || (instance_variable_defined?(:@role) && Array(@role).any? { |r| r.bring?(:role) })
    bring(role: [:role])
  end
  return false if role.empty?
  role.map { |r| r.role }.include?(LinkedData::Models::Users::Role::ADMIN)
end

Effect: per-user admin? calls become no-ops after the first; one role
load per request instead of N+1.

security/authorization.rb — exclude inverse attrs from auth load

attrs_to_load = LinkedData::Models::User.attributes(:all) - LinkedData::Models::User.attributes(:inverse)
user = LinkedData::Models::User.where(apikey: apikey).include(attrs_to_load).first

Effect: removes inverse fanout from every authenticated request. Comment
in code includes a TODO noting this could be tightened further to an
explicit allowlist (:username, :customOntology, role: [:role] are the
only attrs middleware/access-control reads on every request).

Related

  • Underlying bring non-idempotency in goo is a footgun affecting other
    code paths beyond User#admin?. Worth filing separately against
    ncbo/goo so other bring(...) callers benefit.
  • APIKEYS_FOR_AUTHORIZATION cache is declared-but-never-populated.
    Separate, independent fix.
  • The original UI behavior of calling /users?include=all during signup
    (in bioportal_web_ui) is a separate problem: even with this fix the
    endpoint is ~10s for 21k users. Pagination on /users is planned;
    the UI should switch to scoped queries.

Tests

Adds test_admin_with_shallow_role_load to test/models/user/test_user.rb
covering the auth-middleware load shape and the idempotency contract.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions