Sunday, July 4, 2021

Docker MongoDB terminates when it runs out of memory

When you have multiple services running in a Docker container, it's quite possible that some of them run into issues when the container runs out of memory. MongoDB is one such service.

When MongoDB runs inside a Docker container and starts storing a huge amount of data, it consumes a lot of memory, and that's where the problem begins: after some time, when there isn't much memory left, MongoDB crashes.

The reason behind this is MongoDB's I/O model: it tries to keep as much data as possible in cache so that read and write operations are much faster. But this creates an issue in Docker, where memory is limited and shared by many services.

Starting from MongoDB 3.2 onwards, WiredTiger is the default storage engine for MongoDB, and it's the recommended one.

There are various advantages to the WiredTiger storage engine. For example:

  • Document Level Concurrency
  • Snapshots and Checkpoints
  • Journal
  • Compression
  • Memory Use

One of the most useful features is memory use.

With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache.

You can control the size of this internal cache with the --wiredTigerCacheSizeGB option.

The --wiredTigerCacheSizeGB limits the size of the WiredTiger internal cache. The operating system will use the available free memory for filesystem cache, which allows the compressed MongoDB data files to stay in memory. In addition, the operating system will use any free RAM to buffer file system blocks and file system cache.

With this setting you can keep memory usage under control: MongoDB will not take an excessive amount of memory, so even under heavy data usage in a Docker container it will not crash because it ran out of memory.
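As a minimal sketch, here is how you could cap the cache when starting MongoDB with the official mongo image (the container name, the 2 GB memory limit and the 1 GB cache size are just illustrative values; flags placed after the image name are forwarded to mongod):

docker run -d --name mongo --memory=2g mongo --wiredTigerCacheSizeGB 1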

Hope this helps you.

ReactJs Peer to Peer Communication

Recently I evaluated a peer-to-peer communication approach for one of my projects, so here I am going to share how you can use it in case you want to implement peer-to-peer communication in your own project.

We used a library called PeerJS, a simple peer-to-peer library built on top of WebRTC. First you have to create a server, which acts only as a connection broker; no peer-to-peer data goes through this server. Let's create a simple server.

Let's first install peer from npm. 

npm install peer

Now let's create a Node.js script.

const { PeerServer } = require('peer');

const peerServer = PeerServer({ port: 9000, path: '/server' });

peerServer.on('connection', (client) => {
    console.log(client);
});

That's it. Now you can run this script from the terminal and it will run your server on port 9000.
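Assuming you saved the script as server.js (any file name works), you can start it like this:

node server.js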

Now let's connect to the server from our ReactJS component.

First let's install the peerjs npm package, which is the peer client.

npm install peerjs

We can connect to the server in the componentDidMount method and add some callback functions.

import Peer from 'peerjs';

componentDidMount = () => {
    this.peer = new Peer("USERNAME", {
        host: 'localhost',
        port: 9000,
        path: '/server'
    });

    this.peer.on("error", err => {
        console.log("error: ", err);
    });

    this.peer.on("open", id => {
        console.log(id);
    });

    this.peer.on("connection", (con) => {
        console.log("connection opened");

        con.on("data", i => {
            console.log(i);
        });
    });
}

In the above code, the first callback handles errors. The second one fires when the peer connection is opened. The third one fires when you receive a connection from another peer and then receive data over it.

Now let's take an example of how you can connect to another peer and send data.

const conn = this.peer.connect('REMOTE_PEER');

conn.on('open', () => {
    conn.send('DATA');
});

In the above code we connect to a remote peer and send some data to it.

Please note that the peer-to-peer data goes through ICE servers, which you can set up and assign when you create the peer, or else the default PeerJS cloud setup will be used. For development purposes that's OK, but for production you should create your own TURN or STUN server.
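As a rough sketch, if you already run your own STUN/TURN server, you can pass it when creating the Peer via its config option, which PeerJS forwards to the underlying RTCPeerConnection. The URLs and credentials below are placeholders, not real servers:

// Placeholder ICE servers; replace with your own STUN/TURN endpoints.
this.peer = new Peer("USERNAME", {
    host: 'localhost',
    port: 9000,
    path: '/server',
    config: {
        iceServers: [
            { urls: 'stun:stun.example.com:3478' },
            { urls: 'turn:turn.example.com:3478', username: 'user', credential: 'pass' }
        ]
    }
});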

Hope this helps you in setting up peer-to-peer communication in your project.

Saturday, June 26, 2021

Create Question And Answer NLP Model With Bert

Hello,

Recently I worked on a POC for a chatbot where I evaluated question answering with BERT. In this blog we will see how you can create question answering with BERT.

What is BERT?

According to the team who developed BERT:

“BERT stands for Bidirectional Encoder Representations from Transformers. It is designed to pre-train deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of NLP tasks.”

BERT is pre-trained on a massive dataset, a large corpus of unlabelled text. That's the hidden power of BERT: it applies the knowledge gained during pre-training to the dataset you give it.

For this POC we used Hugging Face's transformers library, so first you have to install transformers. Using a pre-trained model, you can pass your reference text to it and the model will try to find answers in it.

pip install transformers

or 

pip3 install transformers

Because these models are very big and take time to download and load, let's first save the model locally. Create a model.py file and add the following code.

from transformers import BertForQuestionAnswering
from transformers import BertTokenizer

# Download the SQuAD fine-tuned BERT model and its tokenizer, then save them locally
BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad').save_pretrained('./trainedModel')
BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad').save_pretrained('./trainedModel')
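Assuming you kept the file name model.py suggested above, run it like this:

python model.py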

This will create a trainedModel directory and save the model there with all the required files. Now we can load this saved pre-trained model.

from transformers import BertForQuestionAnswering
from transformers import BertTokenizer
import torch

bertPreTrainedModel = BertForQuestionAnswering.from_pretrained('./trainedModel')
tokenizer = BertTokenizer.from_pretrained('./trainedModel')

# Encode question and reference text as one sequence: [CLS] question [SEP] reference [SEP]
encoded = tokenizer.encode('YOUR_QUESTION', 'YOUR_REFERENCE_TEXT')
tokens = tokenizer.convert_ids_to_tokens(encoded)

# Segment ids: 0 for the question (up to and including the first [SEP]), 1 for the reference text
sepLocation = encoded.index(tokenizer.sep_token_id)
first_seg_len, second_seg_len = sepLocation + 1, len(encoded) - (sepLocation + 1)
seg_embedding = [0] * first_seg_len + [1] * second_seg_len

# For every token, the model scores how likely it is to start or end the answer
with torch.no_grad():
    modelScores = bertPreTrainedModel(torch.tensor([encoded]), token_type_ids=torch.tensor([seg_embedding]))

# Pick the most likely start and end positions and join the tokens in between
ans_start_loc = torch.argmax(modelScores.start_logits)
ans_end_loc = torch.argmax(modelScores.end_logits)
result = ' '.join(tokens[ans_start_loc:ans_end_loc + 1])

# Merge WordPiece sub-tokens (prefixed with ##) back into whole words
result = result.replace(' ##', '')

You will get your answer in result. This way you can build your own question answering model using BERT and transformers.