NLP: How to embed an external plugin like "BWP Gazetteer" in GATE

I would like to use the BWP Gazetteer instead of GATE's default gazetteer. For this, I added it as a resource in creole.xml and included its JAR in the workspace as well. creole.xml: <RESOURCE> <NAME>BWPGazetteer</NAME> <JAR>BWPGazetteer.jar</JAR> <CLASS>bwp.gate.gazetteer.BWPGazetteer</CLASS> <COMMENT>A BWPGazetteer.</COMMENT> <PARAMETER NAME="document" RUNTIME="true" COMMENT="The document to be processed"> gate.Document</PARA…

NLP: What other inputs are there to the Word Sense Disambiguation task?

In Natural Language Processing (NLP), the Word Sense Disambiguation (WSD) task computationally determines the meaning(s), sense(s), or concept(s) of a polysemous word given a sentence that the word appears in. For example: "Someone was stupid enough to rob the central bank." "The river bank is full of stones." Does anyone know of WSD performed at the paragraph or document level? Other than disambiguating senses/meanings from context words in one sentence, what other inputs could be introduced to perfor…

NLP: compare parsed and tagged sentences

Hello language programmers. I'm studying natural language processing online, and so far I have some understanding of how to parse a sentence, including getting its POS tags, SRL, and so on. My question is what to do with this data, or more precisely, how to compare two different parsed sentences to see how similar they are. For example, I have these two parsed sentences and I want to be able to compare them: 1. <sentence id="s0" parse_status="success" fom="11.6633"> <cons id="c0" cat="NP" xcat…

NLP: Why does the Penn Treebank POS tagset have a separate tag for the word 'to'?

The Penn Treebank tagset has a separate tag TO for the word 'to', irrespective of whether it's used in the preposition sense (such as I went to school) or the infinitive sense (such as I want to eat). What purpose does this serve from an overall NLP perspective? Just tagging the infinitival 'to' separately makes intuitive sense, but I don't see the logic behind combining an infinitive and a preposition in a single tag. Thanks, and apologies if this doesn't fit the Stack Overflow guidelines.

NLP: Easiest way to find the relevancy between two queries

I need to find the relevancy between two given queries in terms of distance. For instance: Q1 (Query 1) = Computing, Q2 (Query 2) = RAM. Let's assume the relevancy path is something like this: Computing -> Personal Computer -> Computer Hardware -> Computer Components -> Random Access Memory -> RAM. The result should be given as 5. Now the problem is that most of these graph databases, like Freebase, do not support that feature. The only way is to recursively compare one query with another. Question…
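If the category graph can be pulled into memory (e.g. from a Freebase/Wikidata dump), the hop count the asker describes is just a breadth-first search. A plain-Python sketch, with a hypothetical adjacency list standing in for the real graph database:

```python
from collections import deque

def hop_distance(graph, start, goal):
    """Count edges on the shortest path between two nodes with BFS.
    Returns -1 when no path exists."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return -1

# Hypothetical category graph mirroring the example path in the question
graph = {
    "Computing": ["Personal Computer"],
    "Personal Computer": ["Computer Hardware"],
    "Computer Hardware": ["Computer Components"],
    "Computer Components": ["Random Access Memory"],
    "Random Access Memory": ["RAM"],
}
print(hop_distance(graph, "Computing", "RAM"))  # 5
```

For an undirected notion of distance, each edge would be added in both directions before searching.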

NLP: Kernelized methods in natural language processing

I am new to NLP. I want to implement a matching approach for short sentences (e.g. questions in cQA). I want to use a tree kernel function as a syntactic feature. I am wondering whether any implementation is available in NLP tools, or elsewhere? Specifically, I'd like a method like the paper by Collins and Duffy, "Convolution kernels for natural language processing". Any suggestion is useful and appreciated.

NLP: Algorithms for word similarity using Wikipedia

I am looking to calculate the distance between two words: Word 1 = ManchesterUnited, Word 2 = RyanGiggs. I feel that using Wikipedia would be a really good option. I would try to determine the distance of both words from a common category or topic. What algorithms can I use to determine the common topic? My next question is, how would I get the hierarchy under the common topic so that I can calculate the distance of the words? I would also like to know if there are any other ways of calcula…

CoreNLP API for N-grams with position

Does CoreNLP have an API for getting n-grams with positions? For example, given the string "I have the best car", if I use mingrams=1 and maxgrams=2, I should get the following: (I,0) (I have,0) (have,1) (have the,1) (the,2) (the best,2) etc., based on the string I am passing. I know StringUtils has an ngram function, but how do I get the position? Any help is really appreciated. Thanks.
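CoreNLP aside, n-grams paired with their starting token index are simple to produce directly. This stdlib sketch reproduces the output format in the question (mingrams/maxgrams are the asker's names, rendered here as min_n/max_n):

```python
def ngrams_with_position(text, min_n=1, max_n=2):
    """Return (ngram, start_token_index) pairs for every n in [min_n, max_n],
    ordered by start position as in the question's example."""
    tokens = text.split()  # naive whitespace tokenization for the sketch
    result = []
    for i in range(len(tokens)):
        for n in range(min_n, max_n + 1):
            if i + n <= len(tokens):
                result.append((" ".join(tokens[i:i + n]), i))
    return result

print(ngrams_with_position("I have the best car"))
# [('I', 0), ('I have', 0), ('have', 1), ('have the', 1), ...]
```

Character offsets instead of token indices would need a real tokenizer that reports spans, but the nesting of the two loops stays the same.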

NLP: Lemmatizing words after POS tagging produces unexpected results

I am using Python 3.5 with NLTK's pos_tag function and the WordNetLemmatizer. My goal is to flatten words in our database to classify text. I am testing the lemmatizer and I encounter strange behavior when using the POS tagger on identical tokens. In the example below, I have a list of three strings, and when running them through the POS tagger every other element is returned as a noun (NN) and the rest are returned as verbs (VBG). This affects the lemmatization. The output looks like th…
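A frequent cause of surprises here: WordNetLemmatizer.lemmatize defaults to pos='n', and it expects WordNet's single-letter POS codes rather than Penn Treebank tags, so pos_tag output has to be mapped first. A minimal sketch of the usual mapping; a call like lemmatizer.lemmatize(token, penn_to_wordnet(tag)) would then use it:

```python
def penn_to_wordnet(penn_tag):
    """Map a Penn Treebank tag to the single-letter POS code the
    WordNetLemmatizer expects. Defaults to noun, which is also the
    lemmatizer's own default when no POS is passed."""
    if penn_tag.startswith("J"):
        return "a"  # adjective
    if penn_tag.startswith("V"):
        return "v"  # verb
    if penn_tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun (default)

print(penn_to_wordnet("VBG"))  # v
print(penn_to_wordnet("NN"))   # n
```

The alternation between NN and VBG itself comes from the tagger's context sensitivity: identical tokens can legitimately receive different tags in different positions.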

Entities-only intents in NLP engines (LUIS/Wit/others)?

Say I need to build a simple order-status bot. I wonder what's the best way to form the intents. I could have 2 intents like this: a. "Hi, I'd like to know the status of my order", "where's my order", etc. - intent QuerySTatus; b. "Joe Levi, +16463730044", "6463730044", etc. - intent orderDetails, with entities phone number and name. Or just one intent: a. "Hi, I'd like to know the status of my order", "where's my order", "what is the status of Joe Levi order", "when order for phone 16463730044 ready"…

NLP: How to implement a bot engine like Wit.ai for an on-premise solution?

I want to build a chatbot for a customer service application. I tried SaaS services like Wit.ai, Motion.ai, Api.ai, etc. These cognitive services find the "intent" and "entities" when trained with a typical interaction model. I need to build the chatbot as an on-premise solution, without using any of these SaaS services. E.g. a typical conversation would be as follows: Can you book me a ticket? Is my ticket booked? What is the status of my booking BK02? I want to cancel t…

NLP: Methods of calculating text string similarity?

Let's say I have an array of strings and I need to sort them into clusters. I am currently doing the analysis using n-grams, e.g.: Cluster 1: "Pipe fixing", "Pipe fixing in Las Vegas", "Movies about Pipe fixing". Cluster 2: "Classical music", "Why classical music is great", "What is classical music", etc. Let's say within this array I have these two strings of text (among others): "Japanese students", "Students from Japan". Now, the n-gram method will obviously not put these two strings together, as…
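One cheap way to let "Japanese students" and "Students from Japan" land in the same cluster is to compare sets of crudely stemmed tokens instead of raw n-grams. This is only a sketch: truncating each token to a 4-character prefix is a stand-in for a real stemmer, and word embeddings or a thesaurus would do much better:

```python
def prefix_tokens(text, k=4):
    """Crudely 'stem' by truncating each lowercased token to its first k
    characters, dropping very short function words (<= 3 chars)."""
    return {t[:k] for t in text.lower().split() if len(t) > 3}

def jaccard(a, b):
    """Jaccard similarity between the crude token sets of two strings."""
    sa, sb = prefix_tokens(a), prefix_tokens(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# "japa" and "stud" are shared prefixes, so the pair scores high
print(jaccard("Japanese students", "Students from Japan"))
```

Both "Japanese" and "Japan" collapse to the prefix "japa", which is exactly the connection plain n-gram counting misses.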

NLP: Gensim: Unable to train the LDA model

I have a list of sentences, and I follow the instructions in the tutorial to build a corpus from it: texts = [[word for word in document.lower().split() if word.isalpha()] for document in documents] corpus = corpora.Dictionary(texts) I want to train an LDA model on this corpus and extract the topic keywords. lda = models.LdaModel(corpus, num_topics=10) However, I receive an error while training: TypeError: 'int' object is not iterable. What am I doing wrong? What should the format of a corpus…

How to convert WebAnno named entity annotation for use in OpenNLP?

Based on this issue, I need to export in XMI format and use DKPro Core to convert to Brat format. I tried this code but did not have success: public void convert() throws Exception { SimplePipeline.runPipeline(CollectionReaderFactory .createReaderDescription(XmiReader.class, XmiReader.PARAM_SOURCE_LOCATION, "/tmp", XmiReader.PARAM_PATTERNS, XmiReader.INCLUDE_PREFIX + "*.xmi"), AnalysisEngineFactory…

NLP: Mention Types and Mention Classes in Watson Knowledge Studio

How important are Mention Types and Mention Classes to training a machine learning annotator model? Will they get assigned automatically when entities are highlighted? For example, when you click on the Mention Type tab, "NONE" seems to be preselected. Likewise for "SPC" on the Mention Class tab. None of the videos in IBM's Watson Knowledge Studio playlist covers this aspect of using WKS, and the official documentation's explanations of whether and how to properly annotate mentions with these att…

NLP: How to interpret Python NLTK bigram likelihood ratios?

I'm trying to figure out how to properly interpret NLTK's "likelihood ratio" given the code below (taken from this question). import nltk.collocations import nltk.corpus import collections bgm = nltk.collocations.BigramAssocMeasures() finder = nltk.collocations.BigramCollocationFinder.from_words(nltk.corpus.brown.words()) scored = finder.score_ngrams(bgm.likelihood_ratio) # Group bigrams by first word in bigram. prefix_keys = collections.defaultdict(li…
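To the best of my knowledge, NLTK's likelihood_ratio implements Dunning's log-likelihood ratio: it is a test statistic, not a probability, so the scores are unbounded above, and larger values mean the bigram co-occurs more often than independence would predict. A self-contained sketch of the computation from a 2x2 contingency table of bigram counts:

```python
import math

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio for a 2x2 bigram contingency table.
    k11 = count(w1 w2), k12 = count(w1 followed by anything but w2),
    k21 = count(anything but w1 followed by w2), k22 = the rest.
    Higher scores mean stronger association; independence gives ~0."""
    def entropy_term(*ks):
        return sum(k * math.log(k) for k in ks if k > 0)
    n = k11 + k12 + k21 + k22
    return 2 * (entropy_term(k11, k12, k21, k22)
                + n * math.log(n)
                - entropy_term(k11 + k12, k21 + k22)   # row sums
                - entropy_term(k11 + k21, k12 + k22))  # column sums

print(llr(10, 0, 0, 10))  # strongly associated pair: large positive score
print(llr(1, 1, 1, 1))    # independent pair: ~0
```

So a bigram scoring, say, 1000 is far more strongly associated than one scoring 10, but neither number is a probability of anything.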

NLP: Does the Google engine penalize pages containing (machine or human) translated content?

Google SE has a zero-tolerance policy against duplicate and spun content, but I am not sure how it deals with translated text. Any guesses on how it might detect translated content? The first thing that occurs to my mind is that they use their own Google Translate to back-translate the translated content into the source language, but if that's the case, do they have to try back-translating into all languages? Are there any specific similarity metrics for such a task? Thank you!

NLP: Multiple intents from a given input text?

Can we have a model which distinguishes multiple intents in a given input text, as in the example below? "Turn the bathroom light off and remind me to take the trash out." There are two independent intents here: turn_lights and set_reminder. Similarly, in another example, "Hey what's up? do you guys offer free trial?", there are two intents: greetings and product_pricing. Thanks & Regards, Achyuta nanda Sahoo

NLP: Lookup table not working in training data of Rasa NLU

I have examples for a particular intent that also show the entity, and I want the model to recognize other words which could be entities for that particular intent, but it fails to recognize them. ## intent: frequency * what is the frequency of [region](field)? * what's the frequency of[region](field)? * frequency of [region](field)? * [region](field)s frequency? * [region](field) frequency? * frequency [region](field)? ## lookup: field * price * phone type * region So when I enter the text "Wha…

NLP: How to get the accuracy of an expanded query (user inputs a query which is expanded for better IR)?

Using an algorithm, I am taking an input user query and expanding it. Now I need to test the accuracy of my algorithm, i.e. I want to get the accuracy (precision and recall) for my expanded query. I have used Terrier with a TREC dataset (a collection of documents). I took a random query and retrieved relevant documents using Terrier, then I used my algorithm to get an expanded query for the random query and retrieved relevant documents. But I don't know how to get precision and recall us…

NLP: Out of memory; BERT

I am new to this area and trying to learn through the GitHub link below. However, I have encountered a runtime error. Despite tweaking the batch size and gradient accumulation to smaller values, and clearing the cache, the runtime error persists. Can anyone share any insight on this? Thanks very much. RuntimeError: CUDA out of memory. Tried to allocate 1024.00 KiB (GPU 0; 4.00 GiB…

NLP: Text length exceeds maximum - how to increase it?

I'm trying to tokenize the data from the URL, and while running I'm getting the following error: ValueError: [E088] Text of length 5190319 exceeds maximum of 1000000. The v2.x parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the nlp.max_length limit. The limit is in number of characters, so you can check whether you…
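Besides raising nlp.max_length (safe when the parser and NER are disabled, per the error message itself), a common workaround is to feed the document to the pipeline in pieces. A stdlib sketch that cuts at paragraph breaks where possible; the 1,000,000 default mirrors spaCy's limit, and the separator choice is an assumption:

```python
def chunk_text(text, max_len=1_000_000, sep="\n\n"):
    """Split text into pieces no longer than max_len, cutting at paragraph
    breaks where possible so sentences are less likely to be severed.
    A paragraph longer than max_len is hard-split as a last resort."""
    chunks, current = [], ""
    for para in text.split(sep):
        candidate = current + sep + para if current else para
        if len(candidate) <= max_len:
            current = candidate
        else:
            if current:
                chunks.append(current)
            while len(para) > max_len:  # oversized paragraph: hard split
                chunks.append(para[:max_len])
                para = para[max_len:]
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to the pipeline separately (e.g. via nlp.pipe), keeping peak memory bounded.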

NLP: Split sentences at bullets and numbering?

I am trying to input text into my word processor to be split into sentences first and then into words. An example paragraph: "When the blow was repeated, together with an admonition in childish sentences, he turned over upon his back, and held his paws in a peculiar manner. 1) This a numbered sentence 2) This is the second numbered sentence At the same time with his ears and his eyes he offered a small prayer to the child." Below are the examples: - This an example of bullet point sentence - Th…
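A rule-based splitter can treat numbering like "1)" and hyphen bullets as extra sentence boundaries in addition to the usual punctuation split. A crude regex sketch; the patterns only cover the marker styles shown in the question:

```python
import re

def split_sentences(text):
    """Split text into sentences, treating numbered items like '1)' and
    hyphen bullets as boundaries alongside sentence-final . ! ? marks."""
    # Break before "1) ", "2) ", ... and before " - " bullet markers.
    items = re.split(r"\s+(?=\d+\)\s)|\s+(?=-\s)", text)
    sentences = []
    for item in items:
        # Then split each piece on sentence-final punctuation.
        for sent in re.split(r"(?<=[.!?])\s+", item):
            sent = sent.strip()
            if sent:
                sentences.append(sent)
    return sentences

demo = ("When the blow was repeated, he held his paws in a peculiar manner. "
        "1) This a numbered sentence 2) This is the second numbered sentence")
for s in split_sentences(demo):
    print(s)
```

Note it cannot repair items that run into the following prose with no punctuation at all (as in the question's "...numbered sentence At the same time..."); that genuinely needs a statistical sentence splitter.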

NLP: Loading a saved NER model back into a HuggingFace pipeline?

I am doing some research into HuggingFace's functionality for transfer learning (specifically, for named entity recognition). To preface, I am a bit new to transformer architectures. I briefly walked through the example from their website: from transformers import pipeline nlp = pipeline("ner") sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very" \ "close to the Manhattan Bridge which is visible…

NLP: Problem saving pre-trained fastText vectors in "word2vec" format with _save_word2vec_format()

For a list of words I want to get their fastText vectors and save them to a file in the same "word2vec" .txt format (word + space + vector, in text format). This is what I did: dict = open("word_list.txt","r") #the list of words I have path = "cc.en.300.bin" model = load_facebook_model(path) vectors = [] words =[] for word in dict: vectors.append(model[word]) words.append(word) vectors_array = np.array(vectors) *I want to take the list "…
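For what it's worth, gensim's public KeyedVectors.save_word2vec_format is the supported route; but the plain-text word2vec layout is simple enough to write by hand, which sidesteps the private _save_word2vec_format helper entirely. A stdlib sketch with hypothetical demo vectors standing in for the model[word] lookups:

```python
import os
import tempfile

def save_word2vec_txt(path, words, vectors):
    """Write vectors in the plain word2vec text format:
    a 'count dim' header line, then one 'word v1 v2 ... vn' line per word."""
    dim = len(vectors[0])
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"{len(words)} {dim}\n")
        for word, vec in zip(words, vectors):
            f.write(word + " " + " ".join(f"{x:.6g}" for x in vec) + "\n")

# Hypothetical 2-dimensional demo vectors (real fastText vectors are 300-d)
path = os.path.join(tempfile.mkdtemp(), "vectors.txt")
save_word2vec_txt(path, ["hello", "world"], [[0.1, 0.2], [0.3, 0.4]])
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines[0])  # 2 2
```

A file in this layout loads back with gensim's KeyedVectors.load_word2vec_format(path, binary=False), to the best of my knowledge.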

NLP: How to deal with a target variable containing nominal data?

I'm working on an NLP project whose target variable contains seven unique sentences, such as "inspirational and thought-provoking", "informative", "acknowledgment and appreciations", and 4 others. As per my understanding, the target variable is nominal, as we can't establish a quantitative comparison between the categories. So my question is: what is the best way to encode such variables? And if I encode it using one-hot encoding, will the problem then be multi-class classification?
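For a nominal target, the usual practice is to map each class label to an integer id (the ordering carries no meaning as long as the loss treats classes categorically) or to a one-hot row, and yes: seven mutually exclusive labels make this a standard multi-class classification problem. A stdlib sketch of both encodings:

```python
def encode_labels(labels):
    """Map nominal class labels to integer ids and one-hot rows.
    Classes are sorted only to make the mapping deterministic."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    ids = [index[l] for l in labels]
    one_hot = [[1 if i == index[l] else 0 for i in range(len(classes))]
               for l in labels]
    return classes, ids, one_hot

classes, ids, one_hot = encode_labels(
    ["informative", "informative", "acknowledgment"])
print(ids)      # [1, 1, 0]
print(one_hot)  # [[0, 1], [0, 1], [1, 0]]
```

Libraries wrap the same idea (e.g. scikit-learn's LabelEncoder / OneHotEncoder), but the underlying mapping is no more than this.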

NLP: Are there any Python libraries to convert a sentence in hiragana to kanji?

If I have a phrase or sentence written in Hiragana such as 「おふろはいる」 I would like to translate/guess at the appropriate Kanji for the string. I have found libraries for going from Kanji to Hiragana, Katakana, Romaji, or English, but I haven't found ones that go the other way. Dictionaries work fine for single words but not sentences.

NLP: Clustering metrics: how to get scores for my clustering method

I am working on clustering methods for textual data (sentences) and implemented an unsupervised clustering method. When I go through the output, it makes sense. I went through the literature to see what metrics would tell us "how good the clusters are" and got confused. This will help me compare my method to other methods out there and maybe tweak my method to perform better. I would like to know from fellow researchers if there are methods which worked best for you which: gives a…

NLP: In spaCy pattern matching, how do we get a bounded Kleene operator?

In spaCy pattern matching, I know that we can use the Kleene operator for ranges. For example: pattern = [{"LOWER": "hello"}, {"OP": "*"}]. Here the star, known as the Kleene operator, matches zero or more tokens. How can I modify the rule such that only 4 or 5 tokens are matched after the token "hello"? In other NLP applications, for example in GATE, we can use a pattern like {Token.string == "hello"}({Token…
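Two approaches, sketched below as plain pattern lists without running the Matcher itself: the version-agnostic trick of listing four required wildcard tokens plus one optional token, and the range-quantifier syntax that newer spaCy versions (v3+, to the best of my knowledge) accept directly on an operator:

```python
# Version-agnostic workaround: "hello" followed by exactly four arbitrary
# tokens, plus at most one more -- i.e. 4 or 5 tokens in total.
pattern_fixed = (
    [{"LOWER": "hello"}]
    + [{}] * 4            # four required wildcard tokens
    + [{"OP": "?"}]       # one optional wildcard token
)

# Newer spaCy versions also accept a curly-brace range quantifier directly
# (check your installed version's Matcher docs before relying on this):
pattern_range = [{"LOWER": "hello"}, {"OP": "{4,5}"}]

print(len(pattern_fixed))  # 6 token specs
```

Either list would be registered the usual way, e.g. matcher.add("HELLO_4_5", [pattern_fixed]).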

NLP: Document classification

Kindly suggest a classifier that classifies documents based on the requirements mentioned below. I have a set of documents which are to be classified. For each classification label, I have a set of terms that are specific to that class label.

NLP: Noun countability

Are there any resources on determining the countability of nouns? Either some way to work it out, or a dictionary that records whether a noun is likely to be countable or not? I'm not interested in whether the noun can be countable, but rather whether it is likely to be countable. For instance, "rice" can become "rices", which means it can be countable, but in most cases it won't be.

NLP: How to convert a verb to its (derived) noun form?

I am working on a project related to NLP, in which I would like to identify the main verb in a sentence (I can do that with a dependency parser) and then convert the verb to its noun form (or, we can say, the noun derived from the verb), for example "define" to "definition" or "sensitive" to "sensitivity" whenever possible. Are there any resources similar to WordNet or VerbNet that provide this?
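WordNet does cover this via derivationally related forms (NLTK exposes them as Lemma.derivationally_related_forms()), which is the reliable route. For the regular cases in the question, though, a suffix-rule heuristic already goes some way. This is a deliberately crude sketch; the rules below are illustrative, not exhaustive:

```python
# Crude suffix rules for deriving a noun from a verb/adjective.
# Each pair is (suffix to strip, replacement); purely illustrative.
SUFFIX_RULES = [
    ("ize", "ization"),  # organize  -> organization
    ("ate", "ation"),    # create    -> creation
    ("ine", "inition"),  # define    -> definition (a very narrow rule)
    ("ive", "ivity"),    # sensitive -> sensitivity
]

def nominalize(word):
    """Guess a derived noun form with suffix rules; returns None when no
    rule applies. A heuristic sketch only -- WordNet's derivationally
    related forms are far more reliable than string surgery."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return None

print(nominalize("define"))     # definition
print(nominalize("sensitive"))  # sensitivity
```

The failure modes are obvious (e.g. the "ine" rule would mangle "combine"), which is exactly why a lexical resource beats rules for production use.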

NLP: How to do semantic analysis from POS tags?

Suppose the sentence is: "Vehicle does not start in cold weather and need to change windshield blades." I'm interested in finding out what part of the car is affected, and what the reason for that is. From the above sentence, we cannot infer that the windshield blades do not start. In addition, a single sentence can contain multiple car parts. How can I tackle this problem?

NLP: BerkeleyLM: Get n-gram probability

I have a BerkeleyLM language model and want to use it to get the probability, under that model, of a given n-gram (which comes from a sentence). I have tested both methods listed below, but they do not return probabilities; they return negative numbers, e.g. -1.111 or -5. What is done wrong here? getLogProb scoreSentence

NLP: Document planning and microplanning to build an NLG model using SimpleNLG

I am trying to build an NLG model which would be domain-specific. I came across SimpleNLG, which I think is a good starting point, but it looks like it only supports realisation and not "document planning and microplanning" as specified in the link below. Can anyone point me to some links to get started building an NLG model, mainly on how to do document planning and microplanning? Thanks in advance!

NLP short text marking approach

I am working on a project to evaluate short-answer questions for an educational institution. Here is what I need to do: The teacher has a sample answer (known to us in advance). The sample answer has 3-4 keywords. The student enters an answer. The application should evaluate the student's answer as follows: the contextual meaning of those keywords should be present in the answer with the same/similar relations as in the sample answer. Students are expected to use synonyms of the keywords. Proper relationships…

Google Cloud NLP - No Entities Returned

We are having some issues with the Google NLP service. The service is intermittently refusing to return entities for certain terms. We use the NLP annotate API for free-text answers to survey responses. A recent question was related to an image of a kids' TV character in the UK called Zippy. Some example responses are below. Unfortunately, we had thousands of responses like this and none of them detected "zippy" as an entity. Strangely, "elmo", "zippie" and others were detected without any issue, o…

NLP: Geo terms in LUIS synonym generation

I'm facing an issue with geo places. For example: USA, United States, United States of America, US. LUIS is able to detect them as the built-in geographyV2 entity, but I want the NLP to return all the similar terms for the user input. E.g. the user said "US"; we send back the geographyV2 entity and its similar synonyms. Can LUIS do that by any chance? I'm desperate for this.

NLP: BERT weight calculation

I am trying to understand the BERT weight calculation. Please suggest some articles which can help me understand the internal workings of BERT. I have read articles on Medium. I am doing a small project to understand BERT pretraining and fine-tuning from d…

NLP: "Response 401: The key used is invalid, malformed, empty, or doesn't match the region" while working with Dispatch

I'm getting this error because Dispatch does not like either my QnAKnowledgebaseId or my QnAEndpointKey located in my .env file. I know the Id and Key are correct because I've triple-checked them and made sure the Id and Key came from my portal. Somehow, when dispatchbot.js is loaded, it does not like my Id or Key: const qnaMaker = new QnAMaker({ knowledgeBaseId: process.env.QnAKnowledgebaseId, endpointKey: process.env.QnAEndpointKey, host: process.env…

NLP: Unsupervised sentiment analysis with PyCaret

Is it possible to conduct unsupervised sentiment analysis with the PyCaret library if you have an unlabeled dataset? Any valid alternatives and suggestions will be appreciated too.

NLP: Which are the dependency tags associated with a verb?

I need to identify all dependency tags associated with a verb. So far, I have identified: 'ROOT'; 'xcomp' (spacy.explain('xcomp') returns 'open clausal complement'); 'aux' (spacy.explain('aux') returns 'auxiliary'). Are there others?

NLP: Using dependency parsing to predict custom entity labels

I'm looking for a way to use dependency parsing to improve the accuracy of predicting custom entity labels. Can someone point me to any resources? I have been googling and looking up documents, mainly on spaCy, but I haven't found anything useful.

NLP: Multi-class text classification with one training example per class

I am trying to solve a multi-class, single-label document classification problem, assigning a single class to each document. Documents are domain-specific technical documents, with technical terms. Train: I have 19 classes with a single document in each class. Target: I have 77 documents without labels that I want to classify into the 19 known classes. Documents have between 60-3000 tokens after pre-processing. My entire corpus (19+77 documents) has 65k terms (uni/bi/tri-grams) with 4.5k terms in common (…

NLP: spaCy taking too long to run compared to before

The spaCy module is taking too long to vectorize sentences: for question in Question_Set: sentence = nlp(question) The dataset contains nearly 300k questions. Initially, this code took 15 minutes to run. However, when I run the same code now, it takes about 4 hours.
