Security: Patch for rate-limiting in edx-platform

This is a security announcement for rate-limiting functionality within edxapp (LMS and CMS).

There are two parts:

  1. New code that properly reports the IP address of the caller: openedx/core/djangoapps/util/ip.py on the master branch of openedx/edx-platform (GitHub)
  2. A configuration change you will need to apply in order for the patch to be effective.

IMPORTANT: It may not be safe to just patch/upgrade without carefully choosing a configuration. While the defaults may work for some deployments, they may not be appropriate for yours, and could cause large amounts of normal traffic to be rejected.

The quick version

If all of the proxies you have in front of the LMS/CMS (such as nginx, caddy, an ELB, etc.) are on the same network as each other (i.e. traffic between them is on private-range IPs), then you can probably get away with configuring CLOSEST_CLIENT_IP_FROM_HEADERS as an empty list, and you may be able to set REST_FRAMEWORK.NUM_PROXIES to the number of proxies. However, incorrect configuration of NUM_PROXIES may result in incomplete protection or rejecting a large number of requests with 400 or 429 errors.
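As a rough sketch (setting names are from the patch; the proxy count of 2 is purely a placeholder for your own deployment), the quick-version configuration might look like this:

    # Quick-version sketch: two proxies (e.g. an ELB in front of nginx)
    # talking to each other over private-range IPs.

    # Empty list: don't derive the client IP from any trusted header.
    CLOSEST_CLIENT_IP_FROM_HEADERS = []

    # One entry per proxy, via django-rest-framework's NUM_PROXIES setting.
    # (In edx-platform this is merged into the existing REST_FRAMEWORK dict.)
    REST_FRAMEWORK["NUM_PROXIES"] = 2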

If instead you use a CDN such as Cloudflare, Akamai, Fastly, etc. (or may in the future), or you have low risk tolerance… there is no quick version. Read on.

The vulnerability

A number of views within edxapp, such as the password reset view, have rate-limiting to prevent abuse. Some of the rate-limiting is based on IP address. However, determination of the “true” client IP address is not possible to do correctly with defaults, and until now edxapp has not provided a way to configure client IP determination.

Fundamentally, the problem is that edxapp does not know how many proxies you have put it behind, and did not have a way for you to tell it. Instead, it assumed that the leftmost IP address in the X-Forwarded-For header is the client’s IP. However, the caller can always fake this by sending an X-Forwarded-For header of their own. This allows all rate-limiting to be bypassed. Consequences could include brute-forcing of logins, spamming users with password reset attempts, and other moderate-severity issues.
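As a concrete illustration of the bypass (the URL and path below are hypothetical), an attacker who controls the X-Forwarded-For header can make every request appear to come from a different client:

    import requests

    # Hypothetical demonstration: rotate the spoofed client IP on each attempt
    # so that naive leftmost-XFF rate limiting never sees the same IP twice.
    for i in range(100):
        requests.post(
            "https://lms.example.com/password_reset/",      # illustrative URL
            data={"email": "victim@example.com"},
            headers={"X-Forwarded-For": f"203.0.113.{i}"},  # spoofed client IP
        )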

I’ll refer to the X-Forwarded-For header as “XFF” in the remainder of this announcement. For more background information on XFF and rate-limiting, see these two recent posts:

The fix

The patch adds new code that analyzes the XFF and determines the “best” choice of client IP. It also changes how REMOTE_ADDR is overwritten, although that overwrite is likely to be removed in the future.

You will need to configure the new CLOSEST_CLIENT_IP_FROM_HEADERS setting in order to support correct usage of the django-ratelimit package. The change to REMOTE_ADDR incidentally patches the ratelimitbackend package that was used in Maple (but has not been used since) and any other code that relies on that override.

You will separately have to configure the django-rest-framework package’s throttling feature via its NUM_PROXIES setting; the patch contains a small bit of code to permit this.

Unfortunately, the state of IP address determination across the industry is really quite poor, and almost all tools that deal with XFF (and IP determination in general) do so incorrectly. Furthermore, about half of the commonly used proxies used in web services also handle XFF incorrectly, provide bad information in headers, or allow spoofing by default. Some of the complexity of upgrading to this patch is a result of this situation. In order to configure edxapp (or any web application) correctly, you will need to understand some of the minutiae of your proxies and be on the lookout for pitfalls.

Patching

If you are not using the latest master or open-release/maple.master branch, it is recommended that you first update to the following foundational commits:

  • master: c3bc68a & 813b403
  • Maple: e2b863db (a combined backport of the above)

These add metrics code to XForwardedForMiddleware that will assist you in configuration by reporting the length of the IP chain observed in actual requests. If you do not have access to these metrics, you can instead temporarily patch edx-platform to log the counts.
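For example, a temporary middleware along these lines (entirely illustrative, and not part of the patch) could log the observed chain length:

    import logging

    log = logging.getLogger(__name__)

    class LogIpChainLengthMiddleware:
        """Temporary, illustrative middleware: log the length of the IP chain
        (XFF entries plus the connecting IP) to help choose a proxy count."""

        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            xff = request.META.get("HTTP_X_FORWARDED_FOR", "")
            # Each proxy that appends to XFF adds one entry; the socket-level
            # REMOTE_ADDR is one more hop on top of that.
            chain_length = (len(xff.split(",")) if xff else 0) + 1
            log.info("IP chain length: %d (XFF=%r)", chain_length, xff)
            return self.get_response(request)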

Before patching, you will need to configure CLOSEST_CLIENT_IP_FROM_HEADERS. Documentation on how to do this is contained within the patch itself, in the module docstring of openedx.core.djangoapps.util.ip and in the setting annotations for CLOSEST_CLIENT_IP_FROM_HEADERS.
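For illustration only (the authoritative schema and caveats are in that docstring; the header name below assumes a Cloudflare-style CDN), entries pair a header name with the index of the IP to use:

    # Assumed example: trust a client-IP header set by your CDN.
    CLOSEST_CLIENT_IP_FROM_HEADERS = [
        {"name": "CF-Connecting-IP", "index": 0},
    ]

    # Assumed alternative: no trusted single-IP header; take the
    # second-from-the-right XFF entry, i.e. the IP that the outer of
    # two trusted proxies saw connecting to it.
    # CLOSEST_CLIENT_IP_FROM_HEADERS = [
    #     {"name": "X-Forwarded-For", "index": -2},
    # ]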

You will also want to configure django-rest-framework’s NUM_PROXIES setting (under REST_FRAMEWORK). The new module’s docstring describes how to use a proxy count to look up the client IP in the XFF (for deployments that can’t rely on a trusted header). Even if you don’t use that approach for CLOSEST_CLIENT_IP_FROM_HEADERS, you’ll still need to know your proxy count for NUM_PROXIES, since DRF has very limited configurability here.

A special note: The nginx settings in the openedx/configuration repo currently have nginx either overwrite the XFF header or pass it unchanged. (This is expected to change in a future release.) If nginx is overwriting it, you can say that you have one proxy (as far as XFF is concerned); if it is passing it unchanged, you will instead have to subtract 1 from the expected proxy count.
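To make that arithmetic concrete, here are two hypothetical deployments (topologies and counts are examples only):

    # Deployment A: client -> nginx -> LMS, where nginx OVERWRITES the
    # X-Forwarded-For header. One proxy as far as XFF is concerned:
    REST_FRAMEWORK["NUM_PROXIES"] = 1

    # Deployment B: client -> CDN -> ELB -> nginx -> LMS, where the CDN and
    # ELB each append to XFF but nginx passes the header unchanged. Three
    # proxies, minus 1 because nginx does not append an entry:
    # REST_FRAMEWORK["NUM_PROXIES"] = 2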

Finally, you can upgrade to the latest commits. For reference, the PRs adding them are:

Once you set your configuration and upgrade, it is recommended that you monitor for increased rates of HTTP 4XX errors, which may indicate a misconfiguration.

(Suggestions on how to simplify the configuration and simplify or streamline the documentation are also welcome.)

Additional precautions

These settings are sensitive to changes in your network configuration over time. Adding, substituting, or removing a proxy may expose you to the risk of either over-aggressive rate-limiting or loss of rate-limiting protection. It may be wise to mention these settings in your network configuration documentation.

Monitoring for HTTP 429 errors may alert you to either a thwarted attack or an uncoordinated network configuration change. If most of your users are not using (external) proxies to reach your site, then the best option is to monitor the average value of the ip_chain.external.count metric for some high-traffic, low-sensitivity URL such as /dashboard. If it deviates substantially from 1, then you are likely misconfigured. (This is not guaranteed to catch all misconfigurations, however.)

If users can bypass any of your proxies, then they may be able to evade any rate-limiting or other security-sensitive uses of IP addresses. One possibility is to have your CDN send a secret-valued header so that your application (or nginx, etc.) can refuse all requests that do not bear the header. Depending on your risk tolerance, it may be insufficient to just check that an inbound request is coming from your CDN’s IP ranges, as an attacker could set up their own instance of the same CDN, and may be able to send their traffic through that node instead with altered headers. (See discussion of re-fronting attack in the adam-p.ca post.)
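Here is one sketch of that secret-header check as a small Django middleware (the X-CDN-Auth header name and CDN_SHARED_SECRET setting are made up for illustration; the same check could equally live in nginx):

    from django.conf import settings
    from django.http import HttpResponseForbidden
    from django.utils.crypto import constant_time_compare

    class RequireCdnSecretMiddleware:
        """Illustrative: reject any request that doesn't carry the secret
        header value that the CDN was configured to inject."""

        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            provided = request.META.get("HTTP_X_CDN_AUTH", "")
            if not constant_time_compare(provided, settings.CDN_SHARED_SECRET):
                return HttpResponseForbidden("Request did not arrive via the CDN.")
            return self.get_response(request)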

When looking at logs or metrics, you are also likely to see some requests with a shorter-than-usual XFF chain. While this may be a sign that traffic is bypassing your proxies, you should also expect some normal traffic from load balancer health checks, inter-service communication, and other internal sources.


@Tim_McCormack I know it’s been a while since you posted this, but recently, when I was running Open edX master with Tutor Nightly, I came across a rate-limit error when I shut down Redis, and I had to turn Redis back on to make the error go away.
Do you maybe know if this is relevant (since Redis is kind of a proxy)?

Hmm… I think the state information for at least one of the rate-limiting packages is held in memcached; maybe another one uses Redis and gets confused if Redis is not available. Or some page loads take longer when Redis is off… but I’m not aware of the libraries limiting concurrent in-flight requests by the same user (possible, though).

Other than that speculation, sorry, doesn’t ring a bell. (Can you say more about redis and proxies? I didn’t understand that part.)

Based on your note, and after doing a bit of research, it seems that yes, django-ratelimit will use whichever cache backend is configured (see ref below), and by default I guess Django uses Memcached. In our context, where Tutor uses Redis by default, I don’t think Tutor would fall back to Memcached; there would just be incorrect cache backend settings.

To test my hypothesis, I would just need to set the cache settings when I turn off Redis; if the rate-limit error then disappears, that should confirm it.
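For the quick experiment, something like this might do (illustrative settings; LocMemCache is per-process, so it is only suitable for a test, not production):

    # Point Django's default cache at a backend that doesn't depend on Redis.
    CACHES = {
        "default": {
            "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        },
    }
    RATELIMIT_USE_CACHE = "default"  # django-ratelimit's cache alias setting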

Don’t worry about that; maybe I was mixing concepts up. Your speculation above looks correct.

Ref:

After activating django-ratelimit, you should ensure that your cache backend is setup to be both persistent and work across multiple deployment worker instances (for instance UWSGI workers). Read more in the Django docs on caching.

https://django-ratelimit.readthedocs.io/en/stable/#quickstart