Talk Maintenance

denzuko · June 6, 2018, 2:42pm

I may not share Stan’s views, but it looks like we’ll also need to plan for a major upgrade. 2.1 is out and we’re drastically behind.

Just something else to have on our radar. In the meantime, it looks like Sidekiq can be pulled out of the stack as its own container and deployed outside the main system. If we make Talk into a Docker swarm, then we can dedicate a few temporary VMs just for the message queue and take them out of the swarm when it’s done processing.
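A rough sketch of how the temporary queue nodes could be handled once a swarm exists. The `role=queue` label, the `talk-sidekiq` service name, and the image argument are all hypothetical placeholders, not anything that exists in our stack today; run these against a manager node.

```bash
#!/bin/bash
# Sketch only: label, service name, and image are hypothetical placeholders.

# Label a freshly joined node so only the queue service is scheduled on it.
mark_queue_node() {
  docker node update --label-add role=queue "$1"
}

# Run the job processor as its own service, pinned to the labelled nodes.
# $1 is the image that carries the Sidekiq process, $2 the replica count.
start_queue_service() {
  docker service create --name talk-sidekiq \
    --constraint 'node.labels.role==queue' \
    --replicas "${2:-1}" "$1"
}

# Once the backlog is processed: drain the node, then remove it from the swarm.
retire_queue_node() {
  docker node update --availability drain "$1" &&
    docker node rm --force "$1"
}
```

Usage would be something like `mark_queue_node stage.worker-6`, then `start_queue_service <image> 2`, and `retire_queue_node stage.worker-6` when the queue is empty.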

Deployment script

Borrowing from my devops talk repository:

#!/bin/bash
# build.sh - create a docker swarm-mode cluster, then deploy a service stack
# Copyright (c)2017 Dwight Spencer <[email protected]>, All Rights Reserved.

ENVIRONMENT=${ENVIRONMENT:-"stage"}
MAXWORKERS=${MAXWORKERS:-"5"}
MAXMASTERS=${MAXMASTERS:-"3"}
SERVICES=${SERVICES:-"services.yml"}

# Functions instead of aliases: bash does not expand aliases in
# non-interactive scripts by default.
swarm() { docker swarm "$@"; }
machine() { docker-machine "$@"; }
compose() { docker stack deploy --compose-file "$1" "$2"; }

# Extra driver options are passed through "$@"; note that the generic
# driver also needs --generic-ip-address for each host.
dmcreate() {
  machine create --engine-storage-driver overlay2 -d generic "$@"
}

# connect MAX NAME COMMAND TOKEN MASTER
# Runs "docker swarm COMMAND" on nodes NAME-1..NAME-MAX against MASTER.
connect() {
  local max=$1
  local name=$2
  local command=$3
  local token=$4
  local master=${5:-""}
  local node rhost

  for x in $(seq 1 "${max}"); do
    node="${ENVIRONMENT}.${name}-${x}"
    eval "$(machine env "${node}")"
    # Skip nodes that already belong to a swarm (master-1 after init).
    [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" = "active" ] && continue
    rhost=$(machine ip "${node}")
    swarm "${command}" --advertise-addr "${rhost}" --listen-addr "${rhost}" --token "${token}" "${master}"
  done
}

#vpc=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 | jq -r ".[].VpcId")
#aws ec2 create-subnet --vpc-id $vpc --cidr-block 10.0.0.0/24
#aws ec2 create-subnet --vpc-id $vpc --cidr-block 10.0.1.0/24
#igateway=$(aws ec2 create-internet-gateway --vpc-id $vpc | jq -r ".[].InternetGatewayId")
#ngateway=$(aws ec2 create-nat-gateway --vpc-id $vpc | jq -r ".[].NatGatewayId")
#iroute=$(aws ec2 create-route-table --vpc-id $vpc | jq -r ".[].RouteTableId")
#aws ec2 create-route --route-table-id $iroute --destination-ipv4-cidr-block 0.0.0.0/0 --gateway-id $igateway
#aws ec2 security group ...

for x in $(seq 1 "${MAXMASTERS}"); do dmcreate "${ENVIRONMENT}.master-${x}"; done
for x in $(seq 1 "${MAXWORKERS}"); do dmcreate "${ENVIRONMENT}.worker-${x}"; done

# The first master initialises the swarm; every other node joins it.
eval "$(machine env "${ENVIRONMENT}.master-1")"

master_ip=$(machine ip "${ENVIRONMENT}.master-1")
swarm init --advertise-addr "${master_ip}" --listen-addr "${master_ip}"

manager_token=$(swarm join-token -q manager)
worker_token=$(swarm join-token -q worker)

connect "${MAXMASTERS}" "master" "join" "${manager_token}" "${master_ip}:2377"
connect "${MAXWORKERS}" "worker" "join" "${worker_token}" "${master_ip}:2377"

# connect leaves the client pointed at the last worker, and stacks can only
# be deployed through a manager, so switch back to master-1 first.
eval "$(machine env "${ENVIRONMENT}.master-1")"
compose "${SERVICES}" "${ENVIRONMENT}"

LukeStrickland · June 6, 2018, 5:06pm

Looked with Stan at lunch: there were 450,000 queued jobs waiting because SMTP authentication kept failing.

Turns out the IP address somehow changed, so SmartPost was no longer allowing us to send.

If you get old emails, sorry, but it’s slightly out of reach to clear them out.

Brian · June 6, 2018, 5:27pm

Whatever it takes to flush that toilet just do it.

StanSimmons · June 6, 2018, 5:28pm

Down to 400,000 enqueued already.

… 300,000

… 200,000 (looks like ~100,000/hour)

malcolmputer · June 6, 2018, 7:04pm

That’s spammer’s numbers right there.

JOwen · June 6, 2018, 8:49pm

Got a bunch of Summary emails. (not complaining) Looks like it’s working!

Nate · June 6, 2018, 8:51pm

Something is working again, and I’ve received some email notifications as well so, yay y’all did it? Maybe?

denzuko · June 6, 2018, 9:14pm

“Turns out the IP address somehow changed”

That’s the very nature of AWS when VMs get turned off. We now have a script that auto-updates Cloudflare when/if there is a change. So if we’re able to have them verify via DKIM and/or hostname, then great; if not… well, that means we need to set up some Elastic IPs and load balancers, which get massively costly on their own.
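For anyone curious what that kind of updater might look like, here is a minimal sketch. It is not the script mentioned above. It assumes the instance metadata endpoint answers without a session token (IMDSv1), and `CF_ZONE_ID`, `CF_RECORD_ID`, `CF_API_TOKEN`, and `STATE_FILE` are hypothetical settings.

```bash
#!/bin/bash
# Sketch only: CF_ZONE_ID, CF_RECORD_ID, CF_API_TOKEN, and STATE_FILE are
# hypothetical settings, and IMDSv1 access is assumed.

STATE_FILE=${STATE_FILE:-"/tmp/last_public_ip"}

# Ask the EC2 instance metadata service for the current public IPv4 address.
current_ip() {
  curl -fsS http://169.254.169.254/latest/meta-data/public-ipv4
}

# Succeeds (exit 0) when the address differs from the one recorded last run,
# or when nothing has been recorded yet.
ip_changed() {
  local new=$1
  [ ! -f "${STATE_FILE}" ] || [ "$(cat "${STATE_FILE}")" != "${new}" ]
}

# Point the existing DNS record at the new address via the Cloudflare v4 API.
update_record() {
  local ip=$1
  curl -fsS -X PATCH \
    "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records/${CF_RECORD_ID}" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "{\"content\":\"${ip}\"}"
}

# Update DNS only when the address changed, and remember it on success.
sync_dns() {
  local ip
  ip=$(current_ip) || return 1
  if ip_changed "${ip}"; then
    update_record "${ip}" && echo "${ip}" > "${STATE_FILE}"
  fi
}
```

Calling `sync_dns` at boot or from cron would cover the stop/start case. It does nothing for the SMTP relay allow-list, which still has to learn the new address some other way.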

LukeStrickland · June 6, 2018, 9:16pm

That is not the nature of EC2. With EC2 you have Elastic IPs and Network Interfaces. Either of those resources can be assigned to any EC2 instance. If you destroy an instance or swap to a new one, you can easily move the Elastic IP or Network Interface.

EC2 does not autogenerate new IP addresses when VMs get turned off.

denzuko · June 6, 2018, 9:34pm

To quote AWS themselves[¹]:

When you stop/start your instance, the IP address will change. Unfortunately there is no way to avoid this scenario in EC2. If you reboot the instance (within the Operating System, or via the console), it will keep the same IP addresses. Unfortunately it is not possible for us to reassign the address to your instance as that address would have been released back into the pool used by other EC2 instances.

If you want to avoid this issue in future, depending on your needs:
If you only need a fixed public IP address, you can assign an Elastic IP address to your instance. Further information on Elastic IP addresses is available from here.
If you need both public and private IP addresses to remain the same throughout the lifetime of the instance, you can launch your instance in VPC instead. The private IP address assigned to an instance in VPC remains with the instance through to termination.

Thus the states of an IP can be:

Reboot instance: keeps IP Address
Stop then Start instance: new IP Address on startup
Shutdown instance: loses IP Address
Terminate instance: loses IP Address

Therefore any operation that puts the VM into the equivalent of ACPI state S4 or S5 will drop the address; anything like S0, S1, or S3 would keep it.
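If anyone wants to confirm this on a scratch instance, something along these lines should show it. This is a sketch using the AWS CLI, and the instance ID you pass in is your own. It really does stop and start the instance, so do not point it at Talk.

```bash
#!/bin/bash
# Sketch only: stops and starts the given instance, so use a throwaway VM.

# Print the instance's current public IPv4 address.
public_ip() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
}

# Stop and start the instance, then report whether the public address survived.
check_stop_start() {
  local id=$1 before after
  before=$(public_ip "${id}")
  aws ec2 stop-instances --instance-ids "${id}" >/dev/null
  aws ec2 wait instance-stopped --instance-ids "${id}"
  aws ec2 start-instances --instance-ids "${id}" >/dev/null
  aws ec2 wait instance-running --instance-ids "${id}"
  after=$(public_ip "${id}")
  if [ "${before}" = "${after}" ]; then
    echo "kept ${after}"
  else
    echo "changed ${before} -> ${after}"
  fi
}
```

An instance with an Elastic IP attached should report "kept"; one with only an auto-assigned public address should report "changed".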

LukeStrickland · June 6, 2018, 9:36pm

That is outdated, inaccurate information from before EC2 used their VPC configuration.

denzuko · June 6, 2018, 9:37pm

ok, so how did the IP address change when the vm was turned off?

LukeStrickland · June 6, 2018, 9:43pm

You’re right, that would definitely be why. Great catch. I was very confused.

We should plan another migration to move them to Elastic IPs.
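For the migration itself, the AWS CLI side is small. A minimal sketch, assuming a VPC instance and CLI credentials that are already configured; the function names are made up for illustration.

```bash
#!/bin/bash
# Sketch only: function names are illustrative; instance and allocation IDs
# are supplied by the caller.

# attach_eip INSTANCE_ID: allocate a new Elastic IP, bind it to the instance,
# and print the allocation ID so it can be recorded.
attach_eip() {
  local instance_id=$1
  local alloc_id
  alloc_id=$(aws ec2 allocate-address --domain vpc \
    --query 'AllocationId' --output text) || return 1
  aws ec2 associate-address --instance-id "${instance_id}" \
    --allocation-id "${alloc_id}" >/dev/null || return 1
  echo "${alloc_id}"
}

# move_eip ALLOCATION_ID NEW_INSTANCE_ID: re-point an existing Elastic IP,
# for example after replacing the VM.
move_eip() {
  aws ec2 associate-address --allocation-id "$1" --instance-id "$2" \
    --allow-reassociation >/dev/null
}
```

Once the address is attached it survives stop/start, so the relay allow-list and DNS only need to be updated once.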

LukeStrickland · June 6, 2018, 9:50pm

@Draco why was this moved to Members Only?

Draco · June 6, 2018, 9:51pm

Because we are talking about internal addresses and server names, and the public, IMO, does not need to be privy to that … we could split it into a private convo rather than move the whole thread … it was just getting rather members-only in nature.

LukeStrickland · June 6, 2018, 9:52pm

We’re out of the trenches. Talk notifications should be 100% back in order; you should receive them immediately after they’re sent.

The missing images I think are just… gone. Sorry.

denzuko · June 7, 2018, 8:14pm

I’d say more like /c/Infrastructure instead.

John_Marlow · June 7, 2018, 8:37pm

Would you say these things are resolved? It seems to be acting normally (at least to me, without looking at any logs or anything) …

LukeStrickland · June 7, 2018, 8:38pm

I marked it as resolved here: https://talk.dallasmakerspace.org/t/talk-maintenance/37390/32?u=lukestrickland

denzuko · June 11, 2018, 8:31pm

Closed in helpdesk: closed by denzuko on 2018-06-07

github.com/Dallas-Makerspace/tracker

Issue: [Maintenance] DMS Talk server maintence window

opened by denzuko on 2018-05-30

closed by denzuko on 2018-06-07

Calendar Event
Procedure
Create snapshot
Take down Talk's vm and attach storage to recovery vm
Rekey instance and validate sshd_config
Reattach Storage and Restart VM
Prune docker
Setup...

Labels: CR/Maintence, Committee/3dFab, Committee/Automotive, Committee/Black Smithing, Committee/Classroom, Committee/Creative Arts, Committee/Digital Media, Committee/Electronics, Committee/Fired Arts, Committee/Hatcher's Armory, Committee/Infrastructure, Committee/Jewelry, Committee/Laser, Committee/Logistics, Committee/Machine Shop, Committee/Metal Shop, Committee/Public Relations, Committee/Science, Committee/Software, Committee/VCC, Committee/VECTOR, Committee/Wood Shop, Priority/HIGH, System/Talk