This is a stretch, but after days of trying to figure it out I've made zero progress, so I figured I might as well ask here.
We're in the middle of migrating our company's infrastructure for one of our products from Azure to AWS. Part of this has involved migrating our Kafka cluster (housing hundreds of millions of events) to MSK. Now we need to index all of that data into the new Elasticsearch cluster using Logstash.
While MirrorMaker was able to saturate our old cluster's gigabit connection when migrating the Kafka data over, we're only able to index that data into ES at a rate of ~150 events/min. We realistically need to be indexing at ~50,000/min, or at the very least 10,000/min. Resource utilization on ES is sitting at ~3% CPU and RAM on both the master and data nodes. Sharding is configured identically to our old setup, and Logstash is configured exactly the same too.
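For context, the throughput-relevant parts of our setup look roughly like this (values, hosts, and topic names below are illustrative placeholders, not our exact config):

```
# logstash.yml (illustrative values)
pipeline.workers: 8        # parallel filter/output workers
pipeline.batch.size: 1000  # events per batch flushed to ES

# pipeline config (hosts/topics are placeholders)
input {
  kafka {
    bootstrap_servers => "msk-broker:9092"
    topics            => ["events"]
    group_id          => "logstash-indexer"
    consumer_threads  => 4   # should not exceed the topic's partition count
  }
}
output {
  elasticsearch {
    hosts => ["https://es-endpoint:9200"]
    index => "events-%{+YYYY.MM.dd}"
  }
}
```

If there are other knobs worth checking beyond workers, batch size, and consumer threads, I'm all ears.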
The only difference between the two clusters is that the one on AWS is running newer versions of everything. I'd appreciate any input that anyone might have.