btagenerator.blogg.se

Bucket loader
This project originated out of a need to quickly import (and back up) a massive amount of files (hundreds of gigabytes) into an AWS S3 bucket, with the ultimate intent that the bucket be managed going forward via an S3 distributed file-system such as yas3fs. Initial attempts at doing this the traditional way (rsyncing or copying from source to destination) quickly became impractical due to the sheer amount of time that single-threaded, and even lightly multi-threaded, copiers would take.

s3-bucket-loader leverages a simple master/worker paradigm to get economies of scale when copying many files from "sourceA" to "targetB". "sourceA" and "targetB" could be two S3 buckets, or a file-system and an S3 bucket (via an S3 file-system abstraction like yas3fs or s3fs). Even though it is coded with S3 as the ultimate destination, it could be used for other targets as well, including other shared file-systems.

The speed at which you can import a given file-set into S3 (through yas3fs in this case) is limited only by how much money you are willing to spend on workers. For example, the tool has been used to import and validate in S3 over 35k files (11 GB total) in roughly 16 minutes using 40 EC2 t2.medium instances as workers. In another scenario it was used to import and validate over 800k files totaling roughly 600 GB in under 8 hours. It has also been used to copy the previously imported buckets to secondary 'backup' buckets in under an hour.

The tool itself is a multi-threaded Java program that can be launched in one of two modes: master or worker. The master is responsible for determining a table of contents (TOC), i.e. the file paths that are candidates for a WRITE to the destination and are subsequently VALIDATED. The master node streams these TOC events over an SQS queue, which is consumed by one or more workers. Each worker must also have access to the source from which the TOC was generated; the source data could be the same physical set of files, an S3 bucket, a copy of them, or whatever.
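To make the master/worker TOC flow concrete, here is a minimal, self-contained Java sketch. It is an illustration only, not the tool's actual code: an in-memory BlockingQueue stands in for the SQS queue, temp directories stand in for "sourceA"/"targetB", and all class, path, and method names are hypothetical. The master role walks the source and streams TOC entries; worker threads consume each entry, WRITE the file to the destination, then VALIDATE the copy by comparing digests.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the master/worker TOC flow; a BlockingQueue
// stands in for SQS, temp directories stand in for the real buckets.
public class TocCopySketch {

    // Sentinel marking the end of the TOC stream.
    private static final Path POISON = Paths.get("");

    public static void main(String[] args) throws Exception {
        // Hypothetical source/destination; the real tool would point
        // these at a yas3fs/s3fs mount or another shared file-system.
        Path source = Files.createTempDirectory("sourceA");
        Path dest = Files.createTempDirectory("targetB");
        Files.write(source.resolve("a.txt"), "hello".getBytes());
        Files.write(source.resolve("b.txt"), "world".getBytes());

        BlockingQueue<Path> tocQueue = new LinkedBlockingQueue<>();
        ExecutorService workers = Executors.newFixedThreadPool(2);

        // Worker role: consume TOC entries, WRITE each file to the
        // destination, then VALIDATE the copy by comparing digests.
        for (int i = 0; i < 2; i++) {
            workers.submit((Callable<Void>) () -> {
                while (true) {
                    Path rel = tocQueue.take();
                    if (rel == POISON) {       // re-offer so sibling workers stop too
                        tocQueue.put(POISON);
                        return null;
                    }
                    Path from = source.resolve(rel);
                    Path to = dest.resolve(rel);
                    Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING); // WRITE
                    if (!Arrays.equals(digest(from), digest(to)))              // VALIDATE
                        throw new IllegalStateException("validation failed: " + rel);
                }
            });
        }

        // Master role: walk the source and stream the TOC to the queue.
        try (DirectoryStream<Path> toc = Files.newDirectoryStream(source)) {
            for (Path p : toc) {
                tocQueue.put(source.relativize(p));
            }
        }
        tocQueue.put(POISON);
        workers.shutdown();
        workers.awaitTermination(30, TimeUnit.SECONDS);

        // Both files were written and validated at the destination.
        System.out.println(new String(Files.readAllBytes(dest.resolve("a.txt"))));
        System.out.println(new String(Files.readAllBytes(dest.resolve("b.txt"))));
    }

    // Digest over the file contents; a real copier could validate by
    // size, checksum, or both.
    private static byte[] digest(Path p) throws IOException {
        try {
            return MessageDigest.getInstance("MD5").digest(Files.readAllBytes(p));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the real setup the queue is SQS and the workers are separate EC2 instances, so the queue additionally decouples the master's lifecycle from the workers' and lets you scale throughput simply by adding worker nodes.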
