Talk Maintenance

I may not share Stan’s views, but it looks like we’ll also need to plan for a major upgrade. 2.1 is out and we’re drastically behind.

Just something else to have on our radar. In the meantime, it looks like Sidekiq can be pulled out of the stack as its own container and deployed outside the main system. If we make Talk into a Docker swarm, then we can dedicate a few temporary VMs just for the message queue and take them out of the swarm when it’s done processing.
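For anyone wondering what "take them out of the swarm" might look like in practice, here is a rough sketch. The node name and the `role=queue` label are made up for illustration, and it assumes the Sidekiq service would be pinned to those nodes with a placement constraint such as `node.labels.role==queue`. With `DRY_RUN=1` (the default here) it only prints the docker commands.

```shell
#!/bin/bash
# Sketch only: node names and the "role=queue" label are hypothetical.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Label a freshly joined VM so the queue service can be pinned to it
# with a placement constraint (node.labels.role==queue).
add_queue_node() {
  run docker node update --label-add role=queue "$1"
}

# Stop scheduling work on the node and move its tasks elsewhere.
drain_queue_node() {
  run docker node update --availability drain "$1"
}

# Run on a manager after the VM itself has run `docker swarm leave`.
remove_queue_node() {
  run docker node rm "$1"
}
```

Usage would be something like `add_queue_node stage.queue-1` when the backlog builds up, then `drain_queue_node stage.queue-1` and `remove_queue_node stage.queue-1` once the queue is empty.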

Deployment script

Borrowing from my devops talk repository:

#!/bin/bash
# build.sh - create a docker swarmmode cluster then deploy a service stack
# Copyright (c)2017 Dwight Spencer <[email protected]>, All Rights Reserved.

ENVIRONMENT=${ENVIRONMENT:-"stage"}
MAXWORKERS=${MAXWORKERS:-"5"}
MAXMASTERS=${MAXMASTERS:-"3"}
SERVICES=${SERVICES:-"services.yml"}

# Aliases are not expanded in non-interactive scripts, so use functions instead.
swarm()   { docker swarm "$@"; }
machine() { docker-machine "$@"; }
compose() { docker stack deploy --compose-file "$@"; }

dmcreate() {
  machine create --engine-storage-driver overlay2 -d generic "$@"
}

connect() {
  local max=$1
  local name=$2
  local command=$3
  local token=$4
  local master=${5:-""}

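  for x in $(seq 1 "${max}"); do
    eval "$(machine env "${ENVIRONMENT}.${name}-${x}")"
    # `docker-machine ip` expects a machine name, so pass it explicitly
    RHOST=$(machine ip "${ENVIRONMENT}.${name}-${x}")
    swarm "${command}" --advertise-addr "$RHOST" --listen-addr "$RHOST" --token "${token}" ${master}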
  done
}

#vpc=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 | jq -r ".[].VpcId")
#aws ec2 create-subnet --vpc-id $vpc --cidr-block 10.0.0.0/24
#aws ec2 create-subnet --vpc-id $vpc --cidr-block 10.0.1.0/24
#igateway=$(aws ec2 create-internet-gateway --vpc-id $vpc | jq -r ".[].InternetGatewayId")
#ngateway=$(aws ec2 create-nat-gateway --vpc-id $vpc | jq -r ".[].NatGatewayId")
#iroute=$(aws ec2 create-route-table --vpc-id $vpc | jq -r ".[].RouteTableId")
#aws ec2 create-route --route-table-id $iroute --destination-cidr-block 0.0.0.0/0 --gateway-id $igateway
#aws ec2 security group ...

for x in `seq 1 ${MAXMASTERS}`; do dmcreate ${ENVIRONMENT}.master-$x; done
for x in `seq 1 ${MAXWORKERS}`; do dmcreate ${ENVIRONMENT}.worker-$x; done

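eval "$(docker-machine env "${ENVIRONMENT}.master-1")"

export master_ip=$(docker-machine ip "${ENVIRONMENT}.master-1")
# The first manager creates the swarm; every other node joins it
swarm init --advertise-addr "${master_ip}" --listen-addr "${master_ip}"

export manager_token=$(swarm join-token manager -q)
export worker_token=$(swarm join-token worker -q)

# master-1 is already in the swarm, so its join attempt fails harmlessly
connect "${MAXMASTERS}" "master" "join" "${manager_token}" "${master_ip}:2377"
connect "${MAXWORKERS}" "worker" "join" "${worker_token}" "${master_ip}:2377"

# Stack deploys must run against a manager, so point the client back at master-1
eval "$(docker-machine env "${ENVIRONMENT}.master-1")"
compose "${SERVICES}" "${ENVIRONMENT}"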

Looked into it with Stan at lunch: there were 450,000 queued jobs waiting because SMTP authentication kept failing.

Turns out the IP address somehow changed, so SmartPost was no longer allowing us to send.

If you get old emails, sorry, but it’s slightly out of reach to clear them out.

Whatever it takes to flush that toilet just do it.

Down to 400,000 enqueued already.

… 300,000

… 200,000 (looks like ~100,000/hour)
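If anyone wants to guess when the backlog clears, the arithmetic is just the remaining jobs divided by the drain rate, rounded up. A tiny sketch, with the function name made up for illustration and the ~100,000/hour figure taken as given:

```shell
#!/bin/bash
# eta_hours REMAINING RATE_PER_HOUR
# Prints the whole hours left until the queue is empty, rounded up.
eta_hours() {
  local remaining=$1 rate=$2
  echo $(( (remaining + rate - 1) / rate ))
}

eta_hours 200000 100000   # prints 2
```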

That’s spammer’s numbers right there.

Got a bunch of Summary emails (not complaining). Looks like it’s working!

[Image: Screenshot_20180606-153248_Samsung Internet]

Something is working again, and I’ve received some email notifications as well so, yay y’all did it? Maybe?

“Turns out the IP address somehow changed”

That’s the very nature of AWS when VMs get turned off. We now have a script that auto-updates Cloudflare when/if there is a change. So if we’re able to have them verify via DKIM and/or host name, then great. If not… well, that means we need to set up some Elastic IPs and load balancers, which get massively costly on their own.
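To give a rough idea of how an updater like that can work (this is a sketch, not the actual script): look up the current public IP, compare it with the last one seen, and patch the DNS record only when it differs. `CF_ZONE_ID`, `CF_RECORD_ID`, `CF_API_TOKEN` and the state file path are placeholder names, and the record is assumed to already exist in Cloudflare.

```shell
#!/bin/bash
# Sketch only. CF_ZONE_ID, CF_RECORD_ID and CF_API_TOKEN are placeholders
# you would have to supply; the DNS record is assumed to already exist.

# Ask an external service which public IP we currently appear as.
current_ip() {
  curl -fsS https://checkip.amazonaws.com
}

# Print "changed" when NEW is non-empty and differs from OLD, else "unchanged".
ip_status() {
  local old=$1 new=$2
  if [ -n "${new}" ] && [ "${old}" != "${new}" ]; then
    echo "changed"
  else
    echo "unchanged"
  fi
}

# Point the existing DNS record at the new address.
update_record() {
  curl -fsS -X PATCH \
    "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records/${CF_RECORD_ID}" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "{\"content\":\"$1\"}"
}

# Possible cron usage (the state file path is hypothetical):
#   new=$(current_ip); old=$(cat /var/tmp/last_ip 2>/dev/null)
#   [ "$(ip_status "$old" "$new")" = "changed" ] \
#     && update_record "$new" && echo "$new" > /var/tmp/last_ip
```

An empty lookup result is treated as "unchanged" so a failed IP check never overwrites the record.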

That is not the nature of EC2. With EC2 you have Elastic IPs and Network Interfaces. Either of those resources can be assigned to any EC2 instance. If you destroy an instance or swap to a new one, you can easily move the Elastic IP or Network Interface.

EC2 does not autogenerate new ip addresses when VMs get turned off.

To quote AWS themselves[¹]:

When you stop/start your instance, the IP address will change. Unfortunately there is no way to avoid this scenario in EC2. If you reboot the instance (within the Operating System, or via the console), it will keep the same IP addresses. Unfortunately it is not possible for us to reassign the address to your instance as that address would have been released back into the pool used by other EC2 instances.

If you want to avoid this issue in future, depending on your needs:

  • If you only need a fixed public IP address, you can assign an Elastic IP address to your instance. Further information on Elastic IP addresses is available from here.
  • If you need both public and private IP addresses to remain the same throughout the lifetime of the instance, you can launch your instance in VPC instead. The private IP address assigned to an instance in VPC remains with the instance through to termination.

Thus the states of an IP can be:

  • Reboot instance: keeps IP Address
  • Stop then Start instance: new IP Address on startup
  • Shutdown instance: loses IP Address
  • Terminate instance: loses IP Address

Therefore any operation that puts the VM into ACPI state S4 or S5 will drop the address, while anything like S0, S1, or S3 would keep it.

That is outdated, inaccurate information from before EC2 used their VPC configuration.

OK, so how did the IP address change when the VM was turned off?

You’re right, that would definitely be why. Great catch. I was very confused :)

We should plan another migration to move them to Elastic IPs.
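As a starting point for that migration, a minimal sketch of allocating an Elastic IP and attaching it with the AWS CLI. The instance and allocation IDs below are placeholders, and `DRY_RUN=1` (the default here) only prints the commands rather than touching the account.

```shell
#!/bin/bash
# Sketch only: instance and allocation IDs are placeholders.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Reserve a new Elastic IP in the VPC scope and print its allocation ID.
allocate_eip() {
  run aws ec2 allocate-address --domain vpc --query AllocationId --output text
}

# Attach the Elastic IP to an instance. --allow-reassociation lets the same
# address be moved later to a replacement instance.
attach_eip() {
  run aws ec2 associate-address --instance-id "$1" \
    --allocation-id "$2" --allow-reassociation
}
```

For example, `attach_eip i-0123456789abcdef0 eipalloc-0abc1234` would keep the public address stable across stop/start, and running it again with a different instance ID moves the address over.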

@Draco why was this moved to Members Only?

Because we are talking about internal addresses and server names, and the public IMO does not need to be privy to it… we could split it into a private convo rather than move the whole thread… it was just getting rather members-only in nature.

We’re out of the trenches. Talk notifications should be 100% back in order; you should receive them immediately after they’re sent.

The missing images I think are just… gone. Sorry.

I’d say more like /c/Infrastructure instead.

Would you say these things are resolved? It seems to be acting normally (at least to me, without looking at any logs or anything) …

I marked it as resolved here: https://talk.dallasmakerspace.org/t/talk-maintenance/37390/32?u=lukestrickland

Closed in helpdesk: closed by denzuko on 2018-06-07