How to debug a SMTPServerDisconnected which doesn't also have any error code?

I need to get off google workspaces as my email sending platform, so I’m trying out free trials for paid email sending services. The first one I’m trying is SendGrid.

My working configuration was:

RUN_SMTP: true
SMTP_HOST: smtp.gmail.com
SMTP_PASSWORD: The Password
SMTP_PORT: 587
SMTP_USERNAME: username@gmail.com
SMTP_USE_SSL: false
SMTP_USE_TLS: true

Sendgrid has instructions on how to manually telnet to their SMTP server and send an email using the username “apikey” and a base64ed API key string as the password. I was able to successfully complete those instructions and see my mail received by the destination address.

I translate this into Open edX it is like this:

RUN_SMTP: true
SMTP_HOST: smtp.sendgrid.net
SMTP_PASSWORD: TheLongBase64PasswordIUsedWhenDoingItManually
SMTP_PORT: 587
SMTP_USERNAME: apikey
SMTP_USE_SSL: false
SMTP_USE_TLS: true

(Yes, the username is supposed to literally be “apikey”.) The problem of course is that these emails don’t send. I see the following error in my tutor palm logs:

lms-worker-1         | 2024-05-07 21:37:23,822 INFO 12 [edx.celery.task] [user None] [ip None] tasks.py:288 - Preparing to send email 83 to 3 recipients as subtask b24cd731-5ac5-4c33-a19a-08ba12576744 for instructor task 1020: context = {'course_title': 'TestCourse', 'course_root': '/courses/course-v1:myOrg+AClass+v1/', 'course_language': 'en', 'course_url': 'https://myserver.com/courses/course-v1:myOrg+AClass+v1/', 'course_image_url': 'https://myserver.com/asset-v1:myOrg+AClass+v1+type@asset+block@images_course_image.jpg', 'course_end_date': '', 'account_settings_url': 'https://myserver.com/account/settings', 'email_settings_url': 'https://myserver.com/dashboard', 'logo_url': 'https://myserver.com/theming/asset/images/logo.png', 'platform_name': 'myOrg', 'year': 2024}, status=SubtaskStatus<{'task_id': 'b24cd731-5ac5-4c33-a19a-08ba12576744', 'attempted': 0, 'succeeded': 0, 'failed': 0, 'skipped': 0, 'retried_nomax': 0, 'retried_withmax': 0, 'state': 'QUEUING'}>, time=2024-05-07 21:37:23.822291
mysql-1              | 2024-05-07T21:37:23.837269Z 68 [Warning] [MY-013360] [Server] Plugin mysql_native_password reported: ''mysql_native_password' is deprecated and will be removed in a future release. Please use caching_sha2_password instead'
lms-worker-1         | 2024-05-07 21:37:23,843 INFO 12 [edx.celery.task] [user None] [ip None] tasks.py:476 - BulkEmail ==> Task: 604bbdf5-4ca0-4322-9ebd-04307339d475, SubTask: b24cd731-5ac5-4c33-a19a-08ba12576744, EmailId: 83, TotalRecipients: 3
lms-worker-1         | 2024-05-07 21:37:23,930 INFO 12 [edx.celery.task] [user None] [ip None] tasks.py:730 - Task b24cd731-5ac5-4c33-a19a-08ba12576744: Successfully sent to 0 users; failed to send to 0 users (and skipped 0 users)
lms-worker-1         | 2024-05-07 21:37:23,931 WARNING 12 [edx.celery.task] [user None] [ip None] tasks.py:755 - Task b24cd731-5ac5-4c33-a19a-08ba12576744: email with id 83 not delivered due to transient error Connection unexpectedly closed, retrying send to 3 recipients in 63.297611954401276 seconds (with max_retry=5)
lms-worker-1         | 2024-05-07 21:37:23,932 INFO 12 [edx.celery.task] [user None] [ip None] subtasks.py:519 - Preparing to update status for subtask b24cd731-5ac5-4c33-a19a-08ba12576744 for instructor task 1020 with status SubtaskStatus<{'task_id': 'b24cd731-5ac5-4c33-a19a-08ba12576744', 'attempted': 0, 'succeeded': 0, 'failed': 0, 'skipped': 0, 'retried_nomax': 0, 'retried_withmax': 1, 'state': 'RETRY'}>
lms-worker-1         | 2024-05-07 21:37:23,937 INFO 12 [edx.celery.task] [user None] [ip None] subtasks.py:575 - Task output updated to {"action_name": "emailed", "attempted": 0, "failed": 0, "skipped": 0, "succeeded": 0, "total": 3, "duration_ms": 153, "start_time": 1715117843.7811036} for subtask b24cd731-5ac5-4c33-a19a-08ba12576744 of instructor task 1020
lms-worker-1         | 2024-05-07 21:37:23,963 INFO 12 [edx.celery.task] [user None] [ip None] tasks.py:784 - Task b24cd731-5ac5-4c33-a19a-08ba12576744: email with id 83 caused send_course_email task to retry again.
lms-worker-1         | 2024-05-07 21:37:23,964 INFO 12 [edx.celery.task] [user None] [ip None] tasks.py:314 - BulkEmail ==> _send_course_email completed in : 0.1217036247253418 for task : b24cd731-5ac5-4c33-a19a-08ba12576744 with recipient count: 3
lms-worker-1         | 2024-05-07 21:37:23,964 WARNING 12 [edx.celery.task] [user None] [ip None] tasks.py:339 - Send-email task b24cd731-5ac5-4c33-a19a-08ba12576744 for email 83: being retried
lms-worker-1         | 2024-05-07 21:37:23,964 INFO 1 [celery.worker.strategy] [user None] [ip None] strategy.py:161 - Task lms.djangoapps.bulk_email.tasks.send_course_email[b24cd731-5ac5-4c33-a19a-08ba12576744] received
lms-worker-1         | 2024-05-07 21:37:23,968 INFO 12 [celery.app.trace] [user None] [ip None] trace.py:131 - Task lms.djangoapps.bulk_email.tasks.send_course_email[b24cd731-5ac5-4c33-a19a-08ba12576744] retry: Retry in 63.297611954401276s: SMTPServerDisconnected('Connection unexpectedly closed')

The key error seems to be SMTPServerDisconnected('Connection unexpectedly closed') There’s only one hit for that error on the forums, but that doesn’t seem to be the same because it has a SMTPSenderRefused error, with an SMTP status code, which would be useful for diagnosing what’s going wrong. But I don’t see that anywhere, nor any SMTP status code.

How can I diagnose such a "Connection unexpectedly closed error when I don’t get any error code?

(Because I can’t contact SendGrid support if I don’t even have an SMTP error code…
Also FWIW I can confirm I can “telnet smtp.sendgrid.net 587” just fine from the server, so it doesn’t seem to be a basic connectivity/routing issue.)

Do you have your own domain name (and an address in that domain) that you’ve verified in SendGrid?
As far as I know SendGrid won’t send as a Gmail address so you should have your own domain name that you control the DNS for.
Since you’ll be in control of your domain’s DNS you should also set SPF/DKIM/DMARC records to ensure proper delivery, otherwise recipient’s spam filters are likely to just blast all your emails.

Have you also updated the tutor config CONTACT_EMAIL value to be your new (not Gmail) email address?
By default your DEFAULT_FROM_EMAIL = CONTACT_EMAIL so if you haven’t updated this then SendGrid is probably trying to send as your old Gmail address which isn’t verified on SendGrid, or even if you “can” verify it, it’s likely to fail a DMARC check anyway (Single Sender Verification | Twilio)

Disclaimer: never used sendgrid myself so I’m just trying to piece something together for you, I might be way off :slight_smile:

Yes, I have my own domain. I probably shouldn’t have used gmail as a shorthand, but I’m actually using google workspaces as the email provider, but it just still uses smtp.gmail.com and is subject to all the gmail rate limits etc. I corrected the original post.

I’ve set up all the SPF/DKIM/DMARC stuff and it shows up as verified in their UI:

And just in general I wouldn’t exact those sorts of problems to prevent me from even connecting to the SMTP server either.

I also tried using a username of the base64-encoded “apikey” (YXBpa2V5) like they make you use during manual sending tests, but that didn’t help either, same error.

I did already have a CONTACT_EMAIL set, but it was actually set to info@mydomain.com rather than a noreply@mydomain.com. So I updated that and that doesn’t seem to have made a difference.

However, I just tried sending an email via replying to a test forum post, and that generated this much more helpful error message:

lms-worker-1         | 2024-05-08 14:44:47,567 ERROR 11 [edx_ace.channel.django_email] [user 5] [ip 127.0.0.1] django_email.py:70 - Connection unexpectedly closed
lms-worker-1         | Traceback (most recent call last):
lms-worker-1         |   File "/openedx/venv/lib/python3.8/site-packages/edx_ace/channel/django_email.py", line 68, in deliver
lms-worker-1         |     mail.send()
lms-worker-1         |   File "/openedx/venv/lib/python3.8/site-packages/django/core/mail/message.py", line 284, in send
lms-worker-1         |     return self.get_connection(fail_silently).send_messages([self])
lms-worker-1         |   File "/openedx/venv/lib/python3.8/site-packages/django/core/mail/backends/smtp.py", line 102, in send_messages
lms-worker-1         |     new_conn_created = self.open()
lms-worker-1         |   File "/openedx/venv/lib/python3.8/site-packages/django/core/mail/backends/smtp.py", line 69, in open
lms-worker-1         |     self.connection.login(self.username, self.password)
lms-worker-1         |   File "/opt/pyenv/versions/3.8.15/lib/python3.8/smtplib.py", line 737, in login
lms-worker-1         |     (code, resp) = self.auth(
lms-worker-1         |   File "/opt/pyenv/versions/3.8.15/lib/python3.8/smtplib.py", line 640, in auth
lms-worker-1         |     (code, resp) = self.docmd("AUTH", mechanism + " " + response)
lms-worker-1         |   File "/opt/pyenv/versions/3.8.15/lib/python3.8/smtplib.py", line 430, in docmd
lms-worker-1         |     return self.getreply()
lms-worker-1         |   File "/opt/pyenv/versions/3.8.15/lib/python3.8/smtplib.py", line 403, in getreply
lms-worker-1         |     raise SMTPServerDisconnected("Connection unexpectedly closed")
lms-worker-1         | smtplib.SMTPServerDisconnected: Connection unexpectedly closed
lms-worker-1         | 2024-05-08 14:44:47,570 INFO 11 [celery.app.trace] [user None] [ip None] trace.py:131 - Task lms.djangoapps.discussion.tasks.send_ace_message[f646a378-fde7-4d99-9ac8-316e23a7284c] succeeded in 0.3072065089363605s: None

The thing that’s specifically suspicious about that message to me is the self.docmd("AUTH", mechanism + " " + response) line. In Sendgrid’s manual sending tests you need to send an “AUTH LOGIN” string.

Do you think that python line would create an “AUTH LOGIN”, or something else (which consequently SendGrid would reject)?