Skip to main content

Saving data from S3 on RedShift with NodeJS

To save your data in RedShift you first need to download the pg module because in fact RedShift is a Postgres database.
npm install pg
If you have not already done so, install the amazon sdk with the following command:
npm install aws-sdk
We can now write our first lines of code to import these modules into our file:
var AWS = require('aws-sdk');
const { Pool, Client } = require('pg');
Creating a connection pool with the database:
let pool = new Pool({
   user: user,
   host: 'your-endpoint.redshift.amazonaws.com',
   database: database-name,
   password: 'your-password',
   port: 5439,
});
You need to create a table in the RedShift public schema with a column for each property in your json file. Your file must be in RJSON (relaxed json) format, that is, without commas between your objects.

Let's create our table in RedShift with the following sql command:
CREATE TABLE public.product (
   title varchar(256) NOT NULL,
   brand varchar(256) NOT NULL,
   productid int4 NOT NULL,
   salesamount float8 NOT NULL
);
We can use RedShift to copy data from a file in RJSON format directly from S3 into a RedShift table, like this:
let s3Path = 's3://bucket/file.json';
let targetTable = 'nome-da-tabela-no-redshift';
let region = 'aws-region';
let role = 'arn:aws:iam::ACCOUNTID:role/AWS-ROLE-WITH-ACCESS-TO-REDSHIFT';
let json_mode = 'auto';
The COPY command is used to copy data from a file to a table. In our case, we are copying data from a file that is in a S3 bucket.
let copyStatement = [
    'COPY ' + targetTable,
    'FROM ' + quote(s3Path),
    'region ' + quote(region),
    'iam_role ' + quote(role),
    'json ' + quote(json_mode)
].join('\n');
Running the query/command in the database:
pool.query(copyStatement, (err, res) => {
   if (err == undefined) {
       console.log('Data inserted successfully into database');
   } else {
       console.log(err);
   }
       pool.end()
   });

function quote (x) { return "'" + x + "'" }
You can download the file with the complete source code from GitHub.

Comments

Popular posts from this blog

Log Aggregation with ELK stack and Spring Boot

Introduction In order to be able to search our logs based on a key/value pattern, we need to prepare our application to log and send information in a structured way to our log aggregation tool. In this article I am going to show you how to send structured log to ElasticSearch using Logstash as a data pipeline tool and how to visualize and filter log information using Kibana. According to a definition from the Wikipedia website: Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. According to Elasticsearch platform website , Elasticsearch is the heart of the Elastic stack, which centrally stores your data for lightning fast search. The use of Elasticsearch, Kibana, Beats and Logstash as a search platform is commonly known as the ELK stack. Next we are going to start up Elasticsearch, Kibana and Logstash using docker so we can better underst...

Understanding RabbitMQ

Introduction RabbitMQ is a centralized message broker based on the AMQP (Advanced Message Queuing Protocol) protocol, acting as a Middleware between Producers and Consumers of different systems. In a message system, Publishers sends a message to a message broker where messages are consumed some time later by one or more Subscribers. By introducing a message brokeer between systems we are decoupling the sender application from the receiver. In this case the service that is responsible for sending or publishing the message does not need to know about any other service. All it needs to care about is the message and its format. With a message system you send a message to a message broker first and when the consumers or listeners of it become online they can start consuming from the message queue. This means you can keep sending messages without even care if the other application is online or if they had any failures. RabbitMQ Architecture Exchange, queue and bindings are the ...

Selection Sort Explained

Introduction If you are trying to get a remote job in a top IT consulting company, you will definitely fall into a live code exercise where your algorithms, logical thinking and problem solving skills will be tested and you will have to demonstrate a solid knowledge of these concepts. Today I decided to write about a type of sorting algorithm that I found several times in interviews and decided, after studying the approach used, to create an initial solution in the simplest possible way. Understanding the logic As we know, the sort algorithm basically uses three basic principles to sort the items in a list. A comparator, a swap function, and recursion. For this selection sort algorithm I will focus in the first two. Given that we have the following list of numbers: 64, 25, 12, 22, 11, how would we use selection sort to swap and sort the list in an ascending order? The following code from the init function uses two for loops to create a temporary list (line 2) with the r...