mmap: invalid argument #8114
Comments
Which storage are you using? Was the application running 2.22 before the restart?
This Kubernetes cluster is running in OpenStack, and we are using Cinder storage for the Prometheus volumes. The application was running 2.22 before the restart. I think the problem occurs when something happens to the virtual machine (Kubernetes node) where Prometheus is running.
Probably /prometheus/chunks_head/000486 is an empty file. That should not happen on POSIX storage, as we create these files atomically.
Yeah, it's an empty file. There are currently two ways we have been using to fix this issue: 1) delete all Prometheus data and start from the beginning, or 2) delete the single offending chunks_head file. (In both cases we restart Prometheus afterwards.) Could Prometheus handle these empty files automatically, e.g. just remove or ignore them?
Turns out the "atomic write" that we did was not enough: a faulty disk combined with an abrupt shutdown can still cause this issue. The fix is already merged in master (#8061), which does a read repair. It might make sense to do a patch release v2.22.1 with it, @brancz WDYT? (Should we be patching older versions? This issue exists from v2.19.x onwards.)
I am refreshing this issue many times per day and waiting for a fix :)
Should be fixed with #8061
As 2.22 is small, I think we can fix it here; I don't feel the need for a 2.21 point release.
@codesome yeah, but we use prometheus-operator, so I would not like to compile a release myself.
I'll put cutting a patch release on my list for next week. |
We might want to include #8104, as that can cause scraping of Prometheus itself to fail. |
Fixed by #8061 |
This should prevent the problem (mmap: invalid argument) outlined in this issue: prometheus/prometheus#8114
What did you do? I am running the Prometheus Operator in a Kubernetes cluster. However, we see daily that somewhere in our clusters Prometheus starts to crashloop. The reason is always the same: "mmap: invalid argument".
Example:
What did you expect to see? I expect that prometheus does not crash so often.
What did you see instead? Under which circumstances?
Environment: Kubernetes
Running inside Kubernetes; the host OS is Debian Buster. The Docker image used is quay.io/prometheus/prometheus:v2.22.0
prometheus, version 2.22.0 (branch: HEAD, revision: 0a7fdd3)
build user: root@6321101b2c50
build date: 20201015-12:29:59
go version: go1.15.3
platform: linux/amd64
Prometheus configuration file: https://gist.github.com/zetaab/12bc84b99dc54ccd72ed01d32ab0077d
Logs:
We are running Chaos Monkey in our Kubernetes clusters, which means we test the high availability of our applications by regularly removing single nodes, twice a week. It looks like that can be one of the triggers, so it seems that Prometheus cannot handle such situations. Cloud-native applications should handle situations like this.