Saturday, June 26, 2021

Create Question And Answer NLP Model With BERT

Hello,

Recently I worked on a POC for a chatbot where I evaluated Question Answering with BERT. In this blog post we will see how you can create a Question Answering model with BERT.

What is BERT?

According to the team that developed BERT:

“BERT stands for Bidirectional Encoder Representations from Transformers. It is designed to pre-train deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of NLP tasks.”

BERT is pre-trained on a massive dataset, a large corpus of unlabelled text. That is the hidden power of BERT: it takes the knowledge gained during pre-training and applies it to whatever text you give it.

For this POC we used HuggingFace's transformers library, so first you have to install transformers. With a pre-trained model you get the benefit of all that pre-training: you pass it your reference text along with a question, and the model tries to find the answer within that text (see the quick sketch after the install commands below).

pip install transformers

or 

pip3 install transformers
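Before wiring everything up by hand, you can sanity check the install with the library's high-level pipeline helper, which wraps the same question answering steps we will do manually below. A minimal sketch (the question and context strings are placeholder examples of mine, not from the POC):

from transformers import pipeline

# pipeline downloads the model on first use and handles tokenization,
# segment ids and answer-span decoding internally
qa = pipeline('question-answering',
              model='bert-large-uncased-whole-word-masking-finetuned-squad')

output = qa(question='What is BERT?',
            context='BERT stands for Bidirectional Encoder Representations from Transformers.')
print(output['answer'])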

Because these models are very large, they take time to download and load. So let's download the model once and save it locally. Create a model.py file and add the following code.

from transformers import BertForQuestionAnswering
from transformers import BertTokenizer

# Download the BERT model fine-tuned on SQuAD and its tokenizer,
# then save both to a local directory so later loads are fast
BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad').save_pretrained('./trainedModel')
BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad').save_pretrained('./trainedModel')

Now execute it with the python command. It will create a trainedModel directory and save the model there with all required files. Now we can load this pre-trained model from the saved files.

from transformers import BertForQuestionAnswering
from transformers import BertTokenizer
import torch

# Load the fine-tuned model and tokenizer from the local directory
bertPreTrainedModel = BertForQuestionAnswering.from_pretrained('./trainedModel')
tokenizer = BertTokenizer.from_pretrained('./trainedModel')

# Encode question and reference text as one sequence:
# [CLS] question [SEP] reference text [SEP] (BERT accepts at most 512 tokens)
encoded = tokenizer.encode('YOUR_QUESTION', 'YOUR_REFERENCE_TEXT')
tokens = tokenizer.convert_ids_to_tokens(encoded)

# Segment 0 = the question (up to and including the first [SEP]),
# segment 1 = the reference text
sepLocation = encoded.index(tokenizer.sep_token_id)
first_seg_len, second_seg_len = sepLocation + 1, len(encoded) - (sepLocation + 1)
seg_embedding = [0] * first_seg_len + [1] * second_seg_len

# The model scores every token as a possible start and end of the answer span
with torch.no_grad():
    modelScores = bertPreTrainedModel(torch.tensor([encoded]),
                                      token_type_ids=torch.tensor([seg_embedding]))
ans_start_loc = torch.argmax(modelScores.start_logits)
ans_end_loc = torch.argmax(modelScores.end_logits)

# Join the highest-scoring span and merge WordPiece continuations (' ##')
result = ' '.join(tokens[ans_start_loc:ans_end_loc + 1])
result = result.replace(' ##', '')

Here you will get your answer in result. This way you can develop your own question answering model using BERT and transformers.
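To make this easier to reuse in a chatbot, you can wrap the steps above into a single function. A minimal sketch, assuming the model was saved to ./trainedModel by model.py (answer_question and the sample strings are my own illustration, not part of the original POC):

import torch
from transformers import BertForQuestionAnswering, BertTokenizer

model = BertForQuestionAnswering.from_pretrained('./trainedModel')
tokenizer = BertTokenizer.from_pretrained('./trainedModel')

def answer_question(question, reference_text):
    # Same steps as above: encode, build segment ids, score, decode the span
    encoded = tokenizer.encode(question, reference_text)
    tokens = tokenizer.convert_ids_to_tokens(encoded)
    sep_location = encoded.index(tokenizer.sep_token_id)
    seg_embedding = [0] * (sep_location + 1) + [1] * (len(encoded) - sep_location - 1)
    with torch.no_grad():
        scores = model(torch.tensor([encoded]),
                       token_type_ids=torch.tensor([seg_embedding]))
    start = torch.argmax(scores.start_logits)
    end = torch.argmax(scores.end_logits)
    return ' '.join(tokens[start:end + 1]).replace(' ##', '')

print(answer_question(
    'What does BERT stand for?',
    'BERT stands for Bidirectional Encoder Representations from Transformers.'))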
