
Saving data from S3 on RedShift with NodeJS

To save your data in RedShift, you first need to install the pg module, because RedShift is based on PostgreSQL:
npm install pg
If you have not already done so, install the AWS SDK with the following command:
npm install aws-sdk
We can now write our first lines of code to import these modules into our file:
const AWS = require('aws-sdk');
const { Pool, Client } = require('pg');
Create a connection pool to the database:
let pool = new Pool({
   user: 'your-user',
   host: 'your-endpoint.redshift.amazonaws.com',
   database: 'your-database-name',
   password: 'your-password',
   port: 5439,
});
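To avoid hard-coding credentials, the connection settings can also be read from environment variables. This is a minimal sketch; the REDSHIFT_* variable names are assumptions for the example, not part of the original post:

```javascript
// Sketch: build the pg Pool configuration from environment variables.
// The REDSHIFT_* variable names are assumptions for this example.
const config = {
    user: process.env.REDSHIFT_USER,
    host: process.env.REDSHIFT_HOST, // e.g. your-endpoint.redshift.amazonaws.com
    database: process.env.REDSHIFT_DB,
    password: process.env.REDSHIFT_PASSWORD,
    port: Number(process.env.REDSHIFT_PORT || 5439), // 5439 is RedShift's default port
};
// Pass this object to new Pool(config) instead of inlining the values.
```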
You need to create a table in the RedShift public schema with a column for each property in your JSON file. The file must contain one JSON object per line, without commas between the objects (sometimes called relaxed JSON, or RJSON).

Let's create our table in RedShift with the following sql command:
CREATE TABLE public.product (
   title varchar(256) NOT NULL,
   brand varchar(256) NOT NULL,
   productid int4 NOT NULL,
   salesamount float8 NOT NULL
);
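As a concrete illustration, a file in this format for the table above could look like the following (the values are made up for the example, one object per line and no commas between objects):

```json
{"title": "Keyboard", "brand": "Acme", "productid": 1, "salesamount": 150.5}
{"title": "Mouse", "brand": "Acme", "productid": 2, "salesamount": 99.9}
```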
RedShift can copy a file in this format directly from S3 into a table. Start by defining the parameters for the command:
let s3Path = 's3://bucket/file.json';
let targetTable = 'table-name-in-redshift';
let region = 'aws-region';
let role = 'arn:aws:iam::ACCOUNTID:role/AWS-ROLE-WITH-ACCESS-TO-REDSHIFT';
let json_mode = 'auto';
The COPY command is used to copy data from a file into a table. In our case, we are copying data from a file that is in an S3 bucket.
let copyStatement = [
    'COPY ' + targetTable,
    'FROM ' + quote(s3Path),
    'region ' + quote(region),
    'iam_role ' + quote(role),
    'json ' + quote(json_mode)
].join('\n');
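To visualize what the joined statement looks like, here is a self-contained sketch with hypothetical values (the bucket, region, and role ARN are placeholders, not real resources):

```javascript
// Assemble the COPY statement with hypothetical placeholder values.
function quote(x) { return "'" + x + "'"; }

let copyStatement = [
    'COPY public.product',
    'FROM ' + quote('s3://my-bucket/products.json'),
    'region ' + quote('us-east-1'),
    'iam_role ' + quote('arn:aws:iam::123456789012:role/MyRedshiftRole'),
    'json ' + quote('auto')
].join('\n');

console.log(copyStatement);
// First printed line: COPY public.product
```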
Running the query/command in the database:
pool.query(copyStatement, (err, res) => {
    if (err == undefined) {
        console.log('Data inserted successfully into database');
    } else {
        console.log(err);
    }
    pool.end();
});
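As a variation, the same step can be written with async/await instead of a callback. This is a sketch that assumes the pool and copyStatement shown above:

```javascript
// Sketch: run the COPY command with async/await instead of a callback.
// `pool` and `copyStatement` are assumed to be defined as shown earlier.
async function runCopy(pool, copyStatement) {
    try {
        await pool.query(copyStatement);
        console.log('Data inserted successfully into database');
        return true;
    } catch (err) {
        console.log(err);
        return false;
    } finally {
        await pool.end(); // close all pooled connections
    }
}
```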

function quote(x) { return "'" + x + "'"; }
You can download the file with the complete source code from GitHub.
