
Backup Your Logs to S3


We’ve got an auto-scaling EC2 setup, so servers can appear and disappear at will, each with a random IP address. We want to store the Apache error/access logs (or system logs) from these servers for later debugging and analysis, but nothing else on them matters (they’re basically read-only, except for the logging). How should we store these logs?

Seems like a simple problem, so there must be a simple solution, right? Here are the existing options:

  • syslog-ng or rsyslogd logging to a central location.
    • Pros: centralized logs, easy to analyse, near to real time
    • Cons: Requires an extra dedicated server (costs). Have to extend its storage or upload to S3 periodically. PITA setup from a security POV unless you know the IP addresses in advance (you can set up your own CA, generate a number of SSL keys, distribute these to the servers on startup and use TLS-encrypted communications, but I really CBA with all that hassle - not to mention that last time I checked Amazon Linux didn’t support these servers OOTB, so I’d have to install them from a CentOS RPM or similar)
  • Message queuing
    • (I don’t think this is a perfect match)
  • Hadoop cluster
    • Pros: awesome data crunching ability
    • Cons: we don’t have 10 million users yet; I think this is a bit heavy-handed (not to mention expensive)
  • Scribe from Facebook
    • Cons: too much setup, requires a specific server hierarchy.
  • logg.ly and Splunk
    • Cons: too expensive/untested/unknown for now
  • Something else

I chose “something else”: a custom solution using logrotate to upload the logs to S3 on a per-server basis. Here’s how I did it (and how you can do it too). You’ll need:

  • /usr/local/bin/s3uploadfile download - a script that uploads a file to S3… (sketched roughly after the .s3env listing below)
  • /etc/init.d/s3logrotate download - an init script that forces logrotate to run (twice) on server shutdown
  • /usr/local/bin/s3logrotated download - a callback script to run when a log has been rotated (it works out which rotated file logrotate is telling it about and calls s3uploadfile to upload that log to S3)
  • /root/.s3env - a simple bash script that exports your S3 credentials so s3uploadfile can do its job
/root/.s3env
#!/bin/bash
export S3BUCKETNAME='MyBucketNameHere'
export AWS_ACCESS_KEY_ID='MyKeyHere'
export AWS_SECRET_ACCESS_KEY='MySecretHere'
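
For reference, here’s roughly the shape an s3uploadfile script could take. This is only a sketch, not the actual download above - it assumes the AWS CLI is installed and picks the bucket name and credentials up from /root/.s3env:

#!/bin/bash
# Hypothetical sketch of /usr/local/bin/s3uploadfile (not the original download).
# Uploads a single file to the bucket named in /root/.s3env, keyed by hostname
# and date so logs from different auto-scaled servers don't clobber each other.
set -e
source /root/.s3env   # exports S3BUCKETNAME, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
FILE="$1"
KEY="$(hostname)/$(date +%Y-%m-%d)/$(basename "$FILE")"
# Assumes the AWS CLI is installed; it reads the credentials from the
# environment variables exported above.
aws s3 cp "$FILE" "s3://${S3BUCKETNAME}/${KEY}"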

Step 1: Create a dedicated S3 bucket for your server logs (if you haven’t already).
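
If you prefer the command line to the AWS console, and assuming you have the AWS CLI installed, something like this does the job (treat it as a sketch; it reuses the bucket name from /root/.s3env):

source /root/.s3env
aws s3 mb "s3://${S3BUCKETNAME}"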

Step 2: Download and install the scripts above to the locations listed (including /root/.s3env), then make them all executable (chmod +x [file]).

Step 3: Modify the relevant logrotate scripts to use the s3logrotated callback, and add s3logrotate to the server’s rc*.d scripts. When configuring logrotate: don’t use dateext; do use compress, delaycompress and sharedscripts. Here’s what I use:

Commands I run to configure logrotate/etc, REQUIRE MODIFICATION
# This is what I used, you will need to customize this for your own servers
sed -i /etc/logrotate.conf -e 's/^dateext$/#dateext/;s/^weekly$/daily/;s/^rotate 4$/rotate 15/;s/^#compress$/compress/'
sed -i /etc/logrotate.d/httpd -e 's/^\([ \t]*endscript\)/\t\/usr\/local\/bin\/s3logrotated ".2.gz" "\$1"\n\1/'
sed -i /etc/logrotate.d/syslog -e 's/^\([ \t]*sharedscripts\)/\1\n    delaycompress/;s/^\([ \t]*endscript\)/\t\/usr\/local\/bin\/s3logrotated ".2.gz" "\$1"\n\1/'
# This is the CentOS command to add the s3logrotate init script (it contains the runlevel configuration within it) - 
# you'll want to update-rc.d or similar on debian/ubuntu hosts.
chkconfig --add s3logrotate
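
For the curious, here’s a rough sketch of what the s3logrotated callback might look like (the real script is the download above). It takes the suffix of the first compressed rotation (".2.gz" here) plus the log pattern that logrotate passes to the postrotate script, and hands each rotated file to s3uploadfile:

#!/bin/bash
# Hypothetical sketch of /usr/local/bin/s3logrotated (not the original download).
# Usage (from a logrotate postrotate block): s3logrotated ".2.gz" "/var/log/httpd/*log"
SUFFIX="$1"
shift
# The remaining arguments are the log pattern(s) from the logrotate stanza;
# leave them unquoted so the shell expands the globs.
for LOG in $@; do
  ROTATED="${LOG}${SUFFIX}"
  if [ -f "$ROTATED" ]; then
    /usr/local/bin/s3uploadfile "$ROTATED"
  fi
done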

Note that the ".2.gz" is because of compress and delaycompress: the ".1" file is left uncompressed, so ".2.gz" is the first compressed log file. This is also why we need /etc/init.d/s3logrotate to run “logrotate -f” twice on server shutdown.
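
The init script itself only really needs to do something on “stop”. A minimal sketch (again, the real one is the download above; this assumes a CentOS/Amazon Linux-style chkconfig setup) might look like:

#!/bin/bash
# Hypothetical sketch of /etc/init.d/s3logrotate (not the original download).
# chkconfig: 2345 99 01
# description: force two rotations on shutdown so the just-rotated logs are
#              compressed (delaycompress) and uploaded to S3 before the server dies.
case "$1" in
  start)
    # chkconfig only runs "stop" at shutdown for services it believes are running
    touch /var/lock/subsys/s3logrotate
    ;;
  stop)
    /usr/sbin/logrotate -f /etc/logrotate.conf
    /usr/sbin/logrotate -f /etc/logrotate.conf
    rm -f /var/lock/subsys/s3logrotate
    ;;
esac
exit 0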

Step 4: Analytics, I suppose, but I leave that as an exercise for the reader. I hope this helps someone - if it does, let me know in the comments. If you want any help setting it up (or understanding how it works), just shoot me an email or leave me a comment.
