R Nifty References

Downloading R packages from CRAN

download.packages(pkgs=c('gganimate','json'), destdir=getwd(), repos=c(PUBLIC_CRAN='https://cran.rstudio.com'))

Validating your current R repos configuration

options('repos')

Installing packages from a custom CRAN repo

install.packages(pkgs=c('gganimate','json'), destdir=getwd(), repos=c(PUBLIC_CRAN='https://cran.rstudio.com'))

AWS Handy Commands

CloudWatch

VPC Flow Log Query Options

Example VPC flow log filter patterns (CloudWatch Logs filter syntax):
[version, accountid, interfaceid, srcaddr, dstaddr, srcport, dstport = 80 || dstport = 8080, protocol, packets, bytes, start, end, action, logstatus]
[version, accountid, interfaceid, srcaddr, dstaddr, srcport, dstport = 80 || dstport = 8080, protocol, packets, bytes >= 400, start, end, action = REJECT, logstatus]
[version, accountid, interfaceid, srcaddr=10.249.104.136, dstaddr, srcport, dstport = 443, protocol, packets, bytes >= 400, start, end, action, logstatus]
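
These patterns can also be applied from the CLI with filter-log-events; a minimal sketch where the flow log group name and time range are placeholders:

aws logs filter-log-events \
    --log-group-name /vpc/flowlogs \
    --start-time 1663315976300 --end-time 1663316694000 \
    --filter-pattern '[version, accountid, interfaceid, srcaddr, dstaddr, srcport, dstport = 443, protocol, packets, bytes >= 400, start, end, action = REJECT, logstatus]'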

CloudWatch Logs Insights Search Queries

By timestamp
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html#CWL_QuerySyntax-commnds

fields @timestamp, @message
| fields toMillis(@timestamp) as millis
| filter millis > 1663315976300  and millis < 1663316694000

fields @timestamp, @message
| sort @timestamp asc
| fields toMillis(@timestamp) as millis
| filter millis > 1663315976300  and millis < 1663316699000



By regular expression
fields @timestamp, @message
| filter @message like /curl -qL -o packer.zip/
| sort @timestamp desc




fields @timestamp, @message
| fields toMillis(@timestamp) as millis
| filter millis > 1669369259333 and millis < 1669369260333
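
The same Insights queries can be run from the CLI with start-query/get-query-results; a minimal sketch where the log group name is a placeholder and the start/end times are epoch seconds:

QUERY_ID=$(aws logs start-query \
    --log-group-name /my/app/loggroup \
    --start-time 1663315976 --end-time 1663316694 \
    --query-string 'fields @timestamp, @message | sort @timestamp desc | limit 20' \
    --query 'queryId' --output text)
aws logs get-query-results --query-id ${QUERY_ID}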

Systems Manager

SSM access alternative for SSH to EC2s (SSH client config, ~/.ssh/config)

Linux-Ubuntu
host i-* mi-*
     ProxyCommand bash -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
	 
Windows
host i-* mi-*
     ProxyCommand PowerShell -Command "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
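
With one of the stanzas above saved in ~/.ssh/config, you can SSH or SCP straight to an instance ID; a sketch where the key path is a placeholder and the Session Manager plugin plus SSM permissions are assumed:

ssh -i ~/.ssh/my-ec2-key.pem ec2-user@i-1234567890abcdef0
scp -i ~/.ssh/my-ec2-key.pem ./local-file.txt ec2-user@i-1234567890abcdef0:/tmp/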

Parameter Store Query (aws cli)

aws ssm get-parameter --name /path/to/rstudio/config --query 'Parameter.Value' --output text
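
If the parameter is stored as a SecureString, add --with-decryption (same hypothetical parameter path as above):

aws ssm get-parameter --name /path/to/rstudio/config --with-decryption --query 'Parameter.Value' --output text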

Setting up a DNS server on Fedora Server 37

1. Install your dnsmasq package

dnf install dnsmasq

2. Configure your /etc/hosts file with the dns hostname entries:

[root@fs37 ~]# cat /etc/hosts
# Loopback entries; do not change.
# For historical reasons, localhost precedes localhost.localdomain:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# See hosts(5) for proper format and other examples:
# 192.168.1.10 foo.mydomain.org foo
# 192.168.1.13 bar.mydomain.org bar
192.168.1.182 dns001.richard.localdomain fs37.richard.localdomain freeipa.richard.localdomain dns001 fs37 freeipa

[root@fs37 ~]#

3. Configure your dnsmasq with the right settings:

[root@fs37 ~]# grep -v -e ^$ -e ^# /etc/dnsmasq.conf
server=8.8.8.8
server=8.8.4.4
local=/richard.localdomain/
user=dnsmasq
group=dnsmasq
interface=lo
listen-address=127.0.0.1,192.168.1.182
bind-interfaces
cache-size=5000
conf-dir=/etc/dnsmasq.d,.rpmnew,.rpmsave,.rpmorig
[root@fs37 ~]#

Note 1: I’m using Google’s DNS servers as my external DNS resolver.

Note 2: My local domain name is specified as local=/richard.localdomain/

Note 3: By default, if you don't specify the listen-address option, it will only listen on localhost.

4. Enable and start the dnsmasq service.

systemctl enable dnsmasq
systemctl start dnsmasq

5. Lastly, make sure you test your DNS resolution, and take note that you have opened the DNS ports in the firewall (see the firewalld sketch after the output below).

[root@fs37 ~]# host fs37
fs37.richard.localdomain has address 192.168.1.182
[root@fs37 ~]# host 192.168.1.182
182.1.168.192.in-addr.arpa domain name pointer dns001.richard.localdomain.
182.1.168.192.in-addr.arpa domain name pointer fs37.richard.localdomain.
182.1.168.192.in-addr.arpa domain name pointer freeipa.richard.localdomain.
182.1.168.192.in-addr.arpa domain name pointer dns001.
182.1.168.192.in-addr.arpa domain name pointer fs37.
182.1.168.192.in-addr.arpa domain name pointer freeipa.
[root@fs37 ~]#
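
If other hosts can't resolve against 192.168.1.182, the usual culprit is the firewall; a minimal firewalld sketch (assuming firewalld with its default zone):

firewall-cmd --permanent --add-service=dns
firewall-cmd --reload
firewall-cmd --list-services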

AWS RDS: Native restore of a database from full and differential backup files created on an on-prem SQL Server 2016

Performing a backup and restore using native SQL Server 2016 backups taken on-prem. This scenario is required when the database has tables without primary keys.

Important

Things to note when creating the RDS instance: the DB engine major version on-prem and in RDS must match.

Also, it is important to make sure that the IAM role for the S3 bucket and the SQLSERVER_BACKUP_RESTORE option in the RDS option group are set up properly to allow restoring backup files from S3.
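
A minimal CLI sketch of wiring that up; the option group name and IAM role ARN below are placeholders, not the real ones from this environment:

aws rds add-option-to-option-group \
    --option-group-name sqlserver-2016-backup-restore \
    --options "OptionName=SQLSERVER_BACKUP_RESTORE,OptionSettings=[{Name=IAM_ROLE_ARN,Value=arn:aws:iam::123456789012:role/rds-s3-backup-restore-role}]" \
    --apply-immediately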

Demonstration

Restoring the database from the full backup file:

EXECUTE msdb.dbo.rds_restore_database 
@restore_db_name='RICHARDTESTDB', 
@s3_arn_to_restore_from='arn:aws:s3:::richardaik-s3bucket-test/backup/2022-04-22-FULL.bak',
@with_norecovery=1;

Check that the full database restore has completed before running the diff restore next:

exec msdb.dbo.rds_task_status @db_name='RICHARDTESTDB'

Restore the diff backup with NORECOVERY still enabled. This keeps the database in the restoring state so it can accept further diff restores.

EXECUTE msdb.dbo.rds_restore_database 
@restore_db_name='RICHARDTESTDB', 
@s3_arn_to_restore_from='arn:aws:s3:::richardaik-s3bucket-test/backup/2022-04-22-DIFF.bak',
@with_norecovery=1,
@type = 'DIFFERENTIAL';

Validating that the diff restore task has completed:

exec msdb.dbo.rds_task_status @db_name='RICHARDTESTDB'

Once the previous diff restore shows a completed status, kick off the final diff restore as the cutover, this time with @with_norecovery=0 so the database is brought online when the restore finishes:

EXECUTE msdb.dbo.rds_restore_database 
@restore_db_name='RICHARDTESTDB', 
@s3_arn_to_restore_from='arn:aws:s3:::richardaik-s3bucket-test/backup/2022-04-22-DIFF.bak',
@with_norecovery=0,
@type = 'DIFFERENTIAL';


iMac BootCamp Windows Virtualization to install WSL2

My iMac (late 2013) showed virtualization as disabled when I booted directly into Windows.

The firmware is the latest and VMX is enabled.

EFI or entering the BIOS is not an option, since this is not PC hardware.

After reading every possible option online I found Build5Nines; I like the clear-cut enterprise approach and explanation.

I rebooted into the iMac (macOS) first and indeed “Virtualisation” showed as enabled.

Virtualization can only be enabled by booting into the iMac first, then starting Windows by going to System Preferences -> Startup Disk -> BOOTCAMP.

Tagging resources with the AWS CLI

I have written a bash shell script called “tag_-resource.sh”:

#!/bin/bash

#
#       The  following create-tags example adds (or overwrites) two tags for an
#       AMI and an instance. One of the tags has a key (webserver) but no value
#       (value  is set to an empty string). The other tag has a key (stack) and
#       a value (Production).
#
#          aws ec2 create-tags \
#              --resources ami-1a2b3c4d i-1234567890abcdef0 \
#              --tags Key=webserver,Value=   Key=stack,Value=Production

RESOURCE_ID=${1}
aws ec2 create-tags --resources "${RESOURCE_ID}" --tags Key=map-migrated,Value=d-server-tag123 Key=aws-migration-project-id,Value=ABC11101

EBS Volume Tagging


aws ec2 describe-volumes  --query "Volumes[*].{ID:VolumeId,Tag:Tags,VolType:VolumeType,VolState:State}"
for i in `aws ec2 describe-volumes --output text --query "Volumes[*].VolumeId"`; do ./tag_-resource.sh $i; done

VPC Endpoint Tagging

aws ec2 describe-vpc-endpoints --query "VpcEndpoints[*].{vpceID:VpcEndpointId,ServiceName:ServiceName,Tags:Tags}" --output text
aws ec2 describe-vpc-endpoints --query "VpcEndpoints[*].{vpceID:VpcEndpointId}" --output text
for i in `aws ec2 describe-vpc-endpoints --query "VpcEndpoints[*].{vpceID:VpcEndpointId}" --output text`; do echo $i; ./tag_-resource.sh $i;done

Lambda Tagging

for i in `aws lambda list-functions --query "Functions[*].{FunctionArn:FunctionArn}" --output text`; do aws lambda list-tags --resource $i; done
aws lambda list-functions --query "Functions[*].{FunctionArn:FunctionArn}" --output text
for i in `aws lambda list-functions --query "Functions[*].{FunctionArn:FunctionArn}" --output text`; do echo $i; ./tag_curtin-resource-lambdaonly.sh $i; done
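
The tag_curtin-resource-lambdaonly.sh helper isn't captured in these notes; a minimal sketch of what it might look like, reusing the same tag keys/values as tag_-resource.sh (Lambda tagging takes the function ARN):

#!/bin/bash
FUNCTION_ARN=${1}
aws lambda tag-resource --resource ${FUNCTION_ARN} --tags map-migrated=d-server-tag123,aws-migration-project-id=ABC11101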

S3 Bucket Tagging

for i in `aws s3 ls | awk '{print $3}'`; do echo $i; ./tag_curtin-resource-s3only.sh $i; done
for i in `aws s3 ls | awk '{print $3}'`; do echo $i; aws s3api get-bucket-tagging --bucket $i; done
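
Likewise, tag_curtin-resource-s3only.sh isn't shown; a sketch of what it might contain. Note that put-bucket-tagging replaces any existing tag set, so merge in the bucket's current tags first if it already has some:

#!/bin/bash
BUCKET_NAME=${1}
aws s3api put-bucket-tagging --bucket ${BUCKET_NAME} --tagging 'TagSet=[{Key=map-migrated,Value=d-server-tag123},{Key=aws-migration-project-id,Value=ABC11101}]'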

How to mount a multiuser SMB share with Kerberos authentication, tested on Windows Server 2016 and Amazon Linux (RHEL7 equivalent)

I had a problem scenario in an environment where RHEL7 needed to integrate with AD as well as an AWS Storage Gateway file server to allow a hybrid cloud setup. This was fairly complex troubleshooting that took me 3 working days of reading, debugging and testing, aka reverse engineering how the existing on-prem-only environment was set up by the original infrastructure engineer…

1. Create the SPNs for the Linux server in the AD server.

setspn -A host/azlinux001.rakdomain.local@RAKDOMAIN.LOCAL azlinux001
setspn -A host/azlinux001@RAKDOMAIN.LOCAL azlinux001
setspn -L azlinux001

2. Generate the Kerberos keytab file in AD and copy it to the azlinux001 host as /etc/krb5.keytab.
!!!IMPORTANT!!! @DOMAIN.NAME is CASE-SENSITIVE
!!!NOTE!!! If the svc-linux-krb service account isn't there, make sure to create it in AD.

ktpass /princ host/azlinux001.rakdomain.local@RAKDOMAIN.LOCAL /out azlinux001-krb5.keytab /crypto All /ptype KRB5_NT_PRINCIPAL -desonly /mapuser RAKDOMAIN\svc-linux-krb +rndPass +setupn +setpass +answer

3. Back up the existing default keytab, sssd.conf and krb5.conf files.

# cp -av /etc/krb5.keytab /etc/krb5.keytab.bak
# cp -av /etc/sssd/sssd.conf /etc/sssd/sssd.conf.bak
# cp -av /etc/krb5.conf /etc/krb5.conf.bak

4. Create another copy of the krb5.keytab in the /root directory and merge it with the new keytab generated in step 2.
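
The exact merge commands aren't captured here; a minimal ktutil sketch, assuming the new keytab from step 2 was copied to /root/azlinux001-krb5.keytab (paths are placeholders):

cp -av /etc/krb5.keytab /root/krb5.keytab
ktutil
ktutil:  rkt /root/krb5.keytab
ktutil:  rkt /root/azlinux001-krb5.keytab
ktutil:  wkt /etc/krb5.keytab
ktutil:  quit
klist -k /etc/krb5.keytab

klist -k should now list both the original and the newly generated principals.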


5. The /etc/krb5.conf configuration that allows Kerberos ticket generation based on the principal present in /etc/krb5.keytab without passing it explicitly to the kinit command:

[root@azlinux001 ~]# cat /etc/krb5.conf
# Configuration snippets may be placed in this directory as well
includedir /etc/krb5.conf.d/

includedir /var/lib/sss/pubconf/krb5.include.d/
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = RAKDOMAIN.LOCAL
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = false
 pkinit_anchors = /etc/pki/tls/certs/ca-bundle.crt
 default_ccache_name = KEYRING:persistent:%{uid}

[realms]
 RAKDOMAIN.LOCAL = {
  kdc = adserver.rakdomain.local
  admin_server = adserver.rakdomain.local
 }

[domain_realm]
 .rakdomain.local = RAKDOMAIN.LOCAL
 rakdomain.local = RAKDOMAIN.LOCAL
 aliashostname = RAKDOMAIN.LOCAL
[root@azlinux001 ~]#

6. Update /etc/sssd/sssd.conf with the following configuration to support multiuser Kerberos authentication:

[domain/default]
krb5_canonicalize = false
ldap_id_use_start_tls = false
ldap_access_order = expire
enumerate = True
ldap_schema = rfc2307bis
ldap_force_upper_case_realm = True
ldap_user_principal = userPrincipalName
krb5_realm = RAKDOMAIN.LOCAL
krb5_server = ADSERVER.RAKDOMAIN.LOCAL
krb5_kpasswd = ADSERVER.RAKDOMAIN.LOCAL
ldap_uri = ldap://ADSERVER.RAKDOMAIN.LOCAL
ldap_user_home_directory = unixHomeDirectory
auth_provider = krb5
ldap_user_object_class = user
ldap_group_object_class = group
ldap_account_expire_policy = ad
access_provider = ldap
cache_credentials = True
chpass_provider = krb5
ldap_search_base = dc=RAKDOMAIN,dc=LOCAL
id_provider = ldap
ldap_sasl_mech = GSSAPI
ldap_sasl_authid = host/azlinux001.rakdomain.local@RAKDOMAIN.LOCAL
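
After updating sssd.conf, tighten its permissions, restart the service and check that an AD user resolves; a quick sketch (the username is a placeholder):

chmod 600 /etc/sssd/sssd.conf
systemctl restart sssd
id richard.aik
getent passwd richard.aik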


7. Request the kerberos ticket

[root@azlinux001 ~]# kinit -V -k AZLINUX001$
Using default cache: persistent:0:0
Using principal: AZLINUX001$@RAKDOMAIN.LOCAL
Authenticated to Kerberos v5
[root@azlinux001 ~]#

8. Set up the request-key utility. The -t option allows the use of CNAME FQDNs to mount the SMB share.

[root@azlinux001 ~]# cat /etc/request-key.d/cifs.spnego.conf
create cifs.spnego * * /usr/sbin/cifs.upcall -K /etc/krb5.keytab -t %k
create dns_resolver * * /usr/sbin/cifs.upcall -K /etc/krb5.keytab -t %k
[root@azlinux001 ~]#

9. Mount the file share

[root@azlinux001 ~]# mount -vv //sgw-4922d720.rakdomain.local/rak-testbucket-002 /mnt/dmp/ -o sec=krb5i,multiuser,vers=3.0
mount.cifs kernel mount options: ip=10.62.48.94,unc=\\sgw-4922d720.rakdomain.local\rak-testbucket-002,sec=krb5i,multiuser,vers=3.0,user=root,pass=
[root@azlinux001 ~]#

!!!IMPORTANT!!! SGW-XXXX is your AWS Storage Gateway ID and you MUST mount either with the FQDN or IP Address. DFS namespace shares aren’t supported.
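
To make the mount persistent across reboots, an /etc/fstab sketch using the same share and options as above (_netdev delays the mount until the network is up):

//sgw-4922d720.rakdomain.local/rak-testbucket-002  /mnt/dmp  cifs  sec=krb5i,multiuser,vers=3.0,_netdev  0  0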

Notes for AD users (Non-root) to access their CIFS shares

  • To destroy Kerberos tickets as the root user
# kdestroy
  • To initialize/request the Kerberos ticket (AZLINUX001$ is my hostname here)
# kinit -V -k AZLINUX001$
  • To list the Kerberos tickets acquired (NOTE: you won't get any tickets if krb auth fails)
# klist
NOTE: For non-root users with Samba/AD authentication, simply run kinit with your AD principal and enter your AD password, for example:
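
A sketch of that non-root flow (the username is a placeholder; with the multiuser mount option, each user's access to /mnt/dmp is authenticated with their own ticket):

$ kinit richard.aik@RAKDOMAIN.LOCAL
Password for richard.aik@RAKDOMAIN.LOCAL:
$ klist
$ ls /mnt/dmp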



AWS S3 Batch Operations

After using S3 Batch Operations for 2 weeks now, I love this batch job's power to copy files and perform other actions via Lambda invocation. Best practices I found while working on this technology that are worth noting:

  • The built-in copy function can only support objects up to 5GB, so for anything larger you need to use Lambda. (See my AWS reference URL)
  • Don't try to be clever with exception handling in Lambda functions; it will make your life painful when debugging Python logic errors vs boto3 exceptions. (Making full use of specific exceptions is probably better; blanket generic Exception handling can make problems hard to figure out.)
  • The manifest files use the urllib.parse.quote_plus() and urllib.parse.unquote_plus() methods for encoding.
  • If you are doing one-off data copies/loads/migrations, the S3 bucket inventory list can be counter-productive. Use boto3 or the AWS CLI to grab the S3 objects.
  • When debugging errors in CloudWatch, wait… and keep an eye on multiple streams of logs.
  • Be careful with filenames with Unicode encodings; keep them to simple alphanumeric characters. Generating the input file using s3api or aws s3 ls --recursive can be unpredictable with your filename text decoding.
  • Use boto3 for S3 to handle Unicode so your encodings are guaranteed intact in the first place.
#
# Author: Richard Aik 
# License: GPLv3
# Filename: richardaik_utils.py
#
import json
import base64
import urllib.parse
import re
import csv
import os 
import boto3
import pprint


# Handy for displaying dicts and other custom Python objects
def pretty_print_objects(object):
    pp = pprint.PrettyPrinter(width=4, indent=4)
    pp.pprint(object)


# My custom b64 encoded string
def encoder(original_string):
    urlSafeEncodedBytes = base64.urlsafe_b64encode(original_string.encode("utf-8"))
    urlSafeEncodedStr = urlSafeEncodedBytes.decode('utf-8')
    return urlSafeEncodedStr
    
# Checks for invalid characters in a Windows filepath (filenames don't like characters such as : * ? | < >)
# Returns True if the syntax is OK
# Returns False if the syntax is INVALID
def check_filepath_syntax(file_path):
    searchObj = re.search('[:*?|<>]', file_path)
    if searchObj:
        return False
    else:
        return True


# My custom b64 decoded string
def decoder(b64_encoded_string):
    urlSafeDecodedBytes = base64.urlsafe_b64decode(b64_encoded_string)
    urlSafeDecodedStr = urlSafeDecodedBytes.decode('utf-8')
    return urlSafeDecodedStr


# Used for generating manifest filestring
def s3batchops_encoder(original_string):
    return urllib.parse.quote_plus(original_string)
     


# Used for debugging manifest filestring
def s3batchops_decoder(url_plus_string):
    return urllib.parse.unquote_plus(url_plus_string)


# Reads the s3api json output as input and generates a Windows filepath format - for sha1sum input
def generate_winos_csv(json_input_filename, manifest_isilon_filename):
    with open(json_input_filename) as json_f, open(manifest_isilon_filename, 'w') as isilon_f:
        s3_objects = json.loads(json_f.read())

        for obj_dict in s3_objects:
            s3Key = obj_dict['Key']
            is_directory = re.search('/$', s3Key)
        
            if is_directory:
                continue
            else:
                searchObj = re.search('etosgw001/(.*)', s3Key)
                isilon_filepath = f"Y:/{searchObj.group(1)}"
                isilon_f.write(f"{isilon_filepath}\n")


def generate_s3_input_json(bucket_name, s3_prefix, json_output_filename):
    s3_client = boto3.client("s3")

    paginator = s3_client.get_paginator('list_objects_v2')
    response_iterator = paginator.paginate(Bucket=bucket_name, Prefix=s3_prefix)

    s3api_alike_json_list = []    
    with open(json_output_filename, 'w') as json_f:

        for page in response_iterator:
            contents = page['Contents']
            for s3_object in contents:
                if check_filepath_syntax(s3_object['Key']):
                    s3api_alike_json_list.append({'Key': s3_object['Key']}  )
    
        json_str = json.dumps(s3api_alike_json_list, indent=4)
        json_f.write(json_str)
    

def generate_s3_manifest_csv(s3_bucketname, json_input_filename, manifest_s3_filename):
    with open(json_input_filename) as json_f, open(manifest_s3_filename, 'w') as s3_f:
        s3_objects = json.loads(json_f.read())

        for obj_dict in s3_objects:
            s3Key = obj_dict['Key']
            encoded_filename = s3_urlplus_filename(s3Key)
            # print(f"{s3_bucketname},{encoded_filename},{s3Key}")
            s3_f.write(f"{s3_bucketname},{encoded_filename}\n")


def parse_s3batch_sha1sum_result(s3batch_results_filename, s3batch_decoded_filename):
    with open(s3batch_results_filename, 'r') as s3batch_f, open(s3batch_decoded_filename, 'w') as s3batch_results_f:
        csv_f = csv.reader(s3batch_f, delimiter=',')
        for row in csv_f:
            s3_filename = decoder(row[1])
            reObj = re.search('^etosgw001/dmp/(.*)', s3_filename)
            s3_root = 'etosgw001/dmp/'
            filename = reObj.group(1)
            sha1hash = row[6]
            print(f"{s3_root}{filename}|{sha1hash}")
            s3batch_results_f.write(f"{s3_root}|{filename}|{sha1hash}\n")


# return directories, filenames 
def list_files_and_directories(directory_name):
    filename_list = []
    directory_list = []

    for dirpath, directories, files in os.walk(directory_name, topdown=False):
        for name in files:
            #print(os.path, os.path.join(dirpath, name))
            filename_list.append(os.path.join(dirpath, name))
        for name in directories:
            #print(os.path, os.path.join(dirpath,name))
            directory_list.append(os.path.join(dirpath, name))
    return directory_list, filename_list


# Manifest style file with /dirname/url+file+name+ style
def s3_urlplus_filename(original_filename):
    chopped_string_list = original_filename.split('/')
    encoded_string_list = []
    for name in chopped_string_list:
        encoded_string_list.append(urllib.parse.quote_plus(name))
    return "/".join(encoded_string_list)

# Pretty print json string in a file
def pretty_print_json(input_file):
    with open(input_file, 'r') as json_file:
        json_obj = json.loads(json_file.read(), )
        print(json.dumps(json_obj, indent=4, sort_keys=True))




This is the code I used below to call the reusable functions above to generate the manifest files for S3 Batch Operations easily. It's also safer with weird Unicode characters in filenames.

#
# Author: Richard Aik
# License: GPLv3
# Filename: richardaik_utils.py
#
import richardaik_utils as rak_utils  # --> this is the module shown above
import urllib.parse

project_name = "FOLDER_XYZ"
s3_src_bucketname = "MY_S3_BUCKETNAME"
s3_src_prefix = f"PREFIX_LEVEL1/PREFIX_LEVEL2/{project_name}"
json_filename = f"c:/Users/richard.aik/python/0_input/{project_name}.json"
manifest_filename = f"c:/Users/richard.aik/python/0_output/MANIFEST-{project_name}.csv"
manifest_unquoted_filename = f"c:/Users/richard.aik/python/0_output/DECODED-MANIFEST-{project_name}.csv"
rak_utils.generate_s3_input_json(s3_src_bucketname, s3_src_prefix, json_filename)
rak_utils.generate_s3_manifest_csv(s3_src_bucketname, json_filename, manifest_filename)


# Useful to debug the fallout files with Unicode characters
with open(manifest_filename, 'r') as manifest_f, open(manifest_unquoted_filename, 'w') as unquoted_manifest_f:
    for entry in manifest_f.readlines():
        entry_list = entry.strip().split(',')
        bucketname = entry_list[0]
        s3Key = entry_list[1]
        decoded_s3Key = urllib.parse.unquote_plus(s3Key, encoding='utf-8')
        print(f'{decoded_s3Key}')
        unquoted_manifest_f.write(f"{entry.strip()},{decoded_s3Key}\n")

AWS File Storage Gateway

At this point in time I'm using File Storage Gateway at my customer's site. Some things are worth noting to optimize the storage gateway and improve its performance. The use case was to migrate an on-prem Isilon and maintain a hybrid cloud setup that allows low-latency access via the Storage Gateway appliance. Below are the best practices to optimize the performance of AWS FSGW:

  • The root volume must be gp3/io2 with ~3000 IOPS (depending on the use case…).
  • Implement a RAID-0-like (striped) set of EBS cache volumes, each io2/gp3 with 3000 IOPS (depending on the use case…).
  • Create a CloudWatch dashboard with disk IOPS metrics (see the CLI sketch below).
  • Cache disks don't like instance store/ephemeral NVMe SSDs, so don't use them.
  • The file storage gateway stores stuff on the root volume, so you need to keep an eye on it for resource contention (I/O, CPU, etc.).
  • Use VPC endpoints (use the aws.S3.<region> interface for file shares and the aws.SGW.<region> interface for Direct Connect (DX) file share traffic).
  • FSGW can only do a single protocol (NFS/SMB) per EC2 instance.
  • DO NOT mount an S3 bucket on multiple FSGWs, to avoid data corruption.
  • FSGW doesn't do cross-replication, so make sure your EC2/VM instance on-prem has a secondary instance ready to take over access on failover, either by DFS or DNS name switchover.
I created a very beefy m6i.8xlarge EC2 instance for the gateway.
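
For the CloudWatch dashboard point above, a CLI sketch for pulling one of the gateway cache metrics; the gateway ID matches the mount example earlier, while the gateway name, time range and period are placeholders (run list-metrics first to confirm the exact dimension set):

aws cloudwatch list-metrics --namespace AWS/StorageGateway --metric-name CachePercentUsed
aws cloudwatch get-metric-statistics \
    --namespace AWS/StorageGateway --metric-name CachePercentUsed \
    --dimensions Name=GatewayId,Value=sgw-4922d720 Name=GatewayName,Value=my-file-gateway \
    --start-time 2022-04-22T00:00:00Z --end-time 2022-04-23T00:00:00Z \
    --period 300 --statistics Average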

Generating Random Files in Python

I use this code to generate files of random data to transfer into the S3 Storage Gateway.

import os

def generate_big_random_bin_file(filename, size):
    """
    Generate a big binary file filled with random data.
    :param filename: the output filename
    :param size: the size in gigabytes (written in 1 GiB chunks)
    :return: void
    """
    max_file_bytes = 1073741824  # 1 GiB written per iteration
    with open(filename, 'wb') as fout:  # binary mode: os.urandom() returns bytes
        for i in range(size):
            fout.write(os.urandom(max_file_bytes))
    print('big random binary file with size %d GB generated ok' % size)

# 10GB file
#generate_big_random_bin_file('Y:\\10GB_FILE.dat', 10)

# 100GB file
generate_big_random_bin_file('Y:\\100GB_FILE.dat', 100)