327 questions
-1
votes
0
answers
35
views
Tips for handling imbalance in an LSTM Sequential Multi Label Classification Task? [closed]
So, I've been struggling with this problem for a couple of months now. I have a dataset of protein sequences, for which I have encoded three target labels across each sequence, and I am training an ...
0
votes
1
answer
67
views
Brier Skill Score returns NaN in cross_val_score with imbalanced dataset
I’m trying to evaluate classification models on a highly imbalanced fraud dataset using the Brier Skill Score (BSS) as the evaluation metric.
The dataset has ~2133 rows and the target Fraud_Flag is ...
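For reference, a minimal sketch of the Brier Skill Score under its usual definition, BSS = 1 - BS / BS_ref, where BS_ref is the Brier score of always predicting the base rate. The function name and the sample values are hypothetical and not taken from the question. The sketch also shows one way a NaN can arise, which may or may not be the cause here: a fold containing a single class gives BS_ref = 0, so the ratio is undefined.

```python
import math

def brier_skill_score(y_true, y_prob):
    """BSS = 1 - BS / BS_ref, using the base rate as the reference forecast."""
    n = len(y_true)
    # Brier score of the model's predicted probabilities
    bs = sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / n
    # Brier score of a constant forecast equal to the observed base rate
    base_rate = sum(y_true) / n
    bs_ref = sum((base_rate - y) ** 2 for y in y_true) / n
    if bs_ref == 0:
        # Only one class is present, so the skill score is undefined
        return math.nan
    return 1.0 - bs / bs_ref

# Hypothetical data: one positive among five samples
bss_mixed = brier_skill_score([0, 0, 0, 0, 1], [0.1, 0.2, 0.1, 0.0, 0.7])

# Hypothetical fold with no positives at all
bss_single = brier_skill_score([0, 0, 0], [0.1, 0.0, 0.2])
```

With very few positives, stratified folds make a single-class fold less likely, though that is an assumption about the setup rather than a diagnosis.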
0
votes
0
answers
95
views
Loan Default Prediction - Kaggle
I am working on the loan default prediction data set available on Kaggle which has a highly skewed class distribution. The best model I have gotten so far is as follows using ExtraTreesClassifier:
...
0
votes
1
answer
226
views
Why is my BERT model producing NaN loss during training for multi-label classification on imbalanced data?
I’m running into a frustrating issue while training a BERT-based multi-label text classification model on an imbalanced dataset. After a few epochs, the training loss suddenly becomes NaN, and I can’t ...
0
votes
0
answers
38
views
How to Build a Neural Network for Predicting Loan Status Using Multi-Table Data from the Berka Dataset
I am working on a project using the Berka dataset, and I want to build a neural network to predict the loan status for accounts. The dataset contains multiple tables, and I want to avoid flattening ...
-2
votes
1
answer
51
views
Improving Accuracy [closed]
I am testing accuracy and performance using deep learning models on a complex dataset. I have reached a good accuracy, but I need to improve it, so any suggestions other than what I did(...
0
votes
0
answers
76
views
Understanding the `model.fit` function in keras and imbalanced datasets
As an exercise, I'm trying to translate a model written in Keras (https://github.com/CVxTz/ECG_Heartbeat_Classification/blob/master/code/baseline_mitbih.py) into PyTorch code. I realize in Keras much ...
1
vote
2
answers
472
views
Problem with Keras class weights and KeyError
I should say up front that I have seen the question "Keras class_weight error dictionary keys/values", which refers to the same problem, but the solution does not seem to help me.
With this code, where I just added ...
0
votes
0
answers
69
views
Weighted F1-score
I'm training and validating models for a binary classification problem in a dataset that has great class imbalance.
When searching for metrics for evaluating the performance of the models, I found ...
1
vote
1
answer
1k
views
Does XGBoost's scale_pos_weight correctly balance the positive samples if the training dataset has more positive than negative samples?
After researching, I realized that scale_pos_weight is typically calculated as the ratio of the number of negative samples to the number of positive samples in the training data. My dataset has 840 ...
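A small sketch of the negative-to-positive ratio described above, using hypothetical label counts (the question's own counts are truncated and are not reproduced here). The helper name `neg_pos_ratio` is made up for illustration.

```python
from collections import Counter

def neg_pos_ratio(labels, positive=1):
    """Return (number of negatives) / (number of positives)."""
    counts = Counter(labels)
    n_pos = counts[positive]
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples; ratio is undefined")
    return n_neg / n_pos

# Hypothetical training labels where positives outnumber negatives
labels_more_pos = [1] * 300 + [0] * 100
ratio = neg_pos_ratio(labels_more_pos)  # 100 / 300, i.e. below 1
```

When positives are the majority the ratio falls below 1, so passing it as `scale_pos_weight` would down-weight the positive class rather than up-weight it.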
1
vote
0
answers
80
views
Class_weight parameter not impacting results in imbalanced dataset with RandomForestClassifier
I'm fairly new to ML and now I'm in the process of predicting employee attrition in a medium sized dataset. I have been able to run everything smoothly, but, as the dataset is imbalanced, I've been ...
0
votes
0
answers
124
views
How do I add a bias to the last layer in my model if my model outputs logits and not probabilities?
I'm working on a medical image binary segmentation problem using a U-Net in TensorFlow, and my classes are extremely unbalanced (about 1 in 10,000). As a result, my model wastes a ton of time going ...
0
votes
1
answer
43
views
Train and test split in such a way that each name and proportion of target class is present in both train and test
I am trying to solve an ML problem: predicting whether a person will deliver an order or not. The dataset is highly imbalanced. Here is a glimpse of my dataset:
[{'order_id': '1bjhtj', 'Delivery Guy': 'John', 'Target': 0},
{'...
0
votes
0
answers
33
views
Questions of handling imbalance dataset classification
I am trying to predict the number of members who will discontinue their membership. The whole dataset is about 12 million rows of data with about 40 columns. A member status can be “Continue”, “Voluntary ...
-1
votes
1
answer
186
views
Kernel dies on fit_resample of SMOTE-NC from imblearn
I have a dataset for fraud detection (I can't disclose the dataset) which is extremely imbalanced.
When I use SMOTE everything works, but as I have 9 categorical features I wanted to use SMOTE-NC, but when ...
0
votes
0
answers
58
views
AttributeError: 'EasyEnsembleClassifier' object has no attribute 'fit_resample'
I am trying to perform a balancing between two classes, one majority and one minority. The majority class is a number of no landslide points and the minority class is landslide. I am trying to apply ...
0
votes
1
answer
358
views
Tidymodels and Imbalanced datasets - Subsampling when resampling
When dealing with imbalanced datasets, my understanding is that possible solutions are subsampling or oversampling the training set. However, the test set should reflect the imbalance of the original ...
1
vote
0
answers
402
views
Python logistic regression in statsmodels using l1 penalty with class weights
I would like to run logistic regression in statsmodels using an l1 penalty (lasso) and class weights due to a class imbalance. There are several posts that explain how to either implement logistic ...
0
votes
1
answer
391
views
Identify the Synthetic Samples generated by SMOTE
I have a labeled dataset with X shape being 7000 x 2400 and y shape being 7000. The data is heavily imbalanced, so I am trying to generate synthetic samples using SMOTE. However I want to identify the ...
0
votes
1
answer
159
views
I'm trying to use SMOGN to balance my data, but it's giving a TypeError or UFuncTypeError. How do I solve this problem?
I have data as images (arrays) with their labels loaded from folders. The data is imbalanced, and I'm trying to balance it using SMOGN after creating a dataframe.
here's the code:
r_labels=[]
...
0
votes
0
answers
111
views
Generate synthetic data for majority and minority classes
I am working on a classification problem where I try to generate synthetic data for both the majority and minority classes, as I want to train my model on synthetic data and test on actual data. I am ...
1
vote
0
answers
70
views
How to assign weights to monthly data to improve XGBoost performance?
We receive training data month on month, and the end-of-month snapshot is used. The XGBoost model (binary classification) performs well with one month in train and test, and not in ...
-2
votes
1
answer
117
views
Best way to deal with uneven data in text classification
I'm trying to run a text classification model on some text data (Tweets) using sklearn and Python. I have hand-coded nearly 1.5k cases; however, the data is imbalanced.
Cases are coded for themes. One of ...
-1
votes
2
answers
112
views
Highly imbalanced Alzheimer's Disease MRI image dataset
I am currently doing my final year project and I need your humble opinion. My dataset consists of 4 classes which contain :
Mild demented - 896 images
Moderate demented - 64 images
Non demented - 3200 ...
-1
votes
1
answer
185
views
How to choose the best technique for handling imbalanced data for binary classification?
I am working on my thesis on an imbalanced dataset for a binary classification problem. I need to handle the imbalance in the data before doing the classification, but I am not sure what technique is better to ...
-1
votes
1
answer
7k
views
How to Handle Imbalanced Data in a Classification Problem?
I am working on a binary classification problem using machine learning, where my target classes are imbalanced. I have approximately 80% of data points in Class A and only 20% in Class B.
I have tried ...
1
vote
0
answers
60
views
Sample Size Inconsistency Error with imblearn's classification_report_imbalanced
I'm encountering an error when using classification_report_imbalanced from imblearn.metrics on a classification task. The code runs smoothly until I add the classification_report_imbalanced function, ...
1
vote
0
answers
133
views
Implementing dynamic radius to radius neighbor classifier for better class imbalance handling
I am trying to create a dynamic-radius radius neighbour classifier for a multiclass classification problem. The dataset has 7 classes. I am giving a different radius to each class and then ...
0
votes
1
answer
115
views
Are mlr3 class weights applied to validation score calculations?
I have previously used mlr3 for imbalanced classification problems, and used PipeOpClassWeights to apply class weights to learners during training. This pipe op adds a column of observation weights to ...
0
votes
0
answers
61
views
Select the maximum number of rows so that the sum of the columns is balanced
Suppose I have a table with the following columns and many more rows:
Id   n_positive_class1   n_positive_class2   n_positive_class3
1    0                   10                  4000
2    122                 0                   0
3    4                   5234                0
I'd like to select the maximum number of ...
4
votes
0
answers
351
views
How to implement undersampling techniques like NearMiss, TomekLinks, ClusterCentroids, ENN using PySpark?
I'm trying to work on a Fraud Detection dataset from kaggle
Credit Card Transactions Fraud Detection Dataset
I'm working on PySpark and wish to apply Undersampling techniques using PySpark. However, I ...
1
vote
2
answers
836
views
Under-sampling leads to poor results for no apparent reason
I am using Random Forest for a semantic segmentation task, with 3 classes, which are imbalanced. First, I just trained the algorithms on random subsets containing 20% of all the pixels (else my memory ...
0
votes
0
answers
162
views
Re-weight with WeightIt
I weighted my population using the WeightIt package:
library(WeightIt)
library(cobalt)
data("lalonde", package = "cobalt")
W.out <- weightit(treat ~ age + married + race,
...
1
vote
0
answers
34
views
Is it bad to have a high precision, recall, and fbeta on a 1:5 imbalanced dataset?
I have a research project using random forest to differentiate whether data is bot or human generated. The machine learning model achieved extremely high accuracy. Here is the result:
Confusion ...
1
vote
0
answers
123
views
Using Class_weights for imbalance dataset in Mask RCNN
I have added Class_Weights to be used while training Mask RCNN on a custom dataset. It is showing this error:
ValueError: Unknown entries in class_weight dictionary: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...
2
votes
1
answer
816
views
Imblearn pipeline with SMOTE step - AttributeError: This 'Pipeline' has no attribute 'transform'
As part of an assignment, I have been trying to whip up a pipeline to preprocess some data that I have. Said data has a total of five classes, one of which is imbalanced compared to the others, and ...
0
votes
0
answers
1k
views
Running LightGBM algorithm after the implementation of SMOTE function to mitigate the issue of managing imbalanced dataset
I need to run a LightGBM model with an imbalanced dataset. The dataset has a 'Target' variable with a binary result, "0" with 61471 registers and "1" with 4456 registers. To ...
1
vote
0
answers
103
views
Using Sample_weight on test set
I am using XGBoost for an imbalanced dataset (the ratio of positive samples to negatives is 1/14). I used sklearn.utils.class_weight.compute_sample_weight to set the sample_weight parameter.
To report ...
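To make the weighting concrete, here is a pure-Python sketch of the "balanced" heuristic, where each class gets the weight n_samples / (n_classes * class_count). It is a re-implementation for illustration only, not scikit-learn's code; the function name and the toy labels (one positive per 14 negatives, matching the stated ratio) are assumptions.

```python
from collections import Counter

def balanced_sample_weights(y):
    """Per-sample weights: n_samples / (n_classes * count of the sample's class)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    class_weight = {c: n_samples / (n_classes * k) for c, k in counts.items()}
    return [class_weight[label] for label in y]

# Toy labels with a 1:14 positive-to-negative ratio (hypothetical data)
y = [1] + [0] * 14
weights = balanced_sample_weights(y)

# Total weight carried by each class after weighting
total_pos = sum(w for w, label in zip(weights, y) if label == 1)
total_neg = sum(w for w, label in zip(weights, y) if label == 0)
```

Under this heuristic both classes end up with the same total weight. Whether the same weights should also be applied when scoring a test set depends on what the reported metric is meant to reflect.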
0
votes
0
answers
295
views
Can the SMOTE Method be used on Image datasets for image dataset imbalances?
Can the method of handling imbalanced datasets with SMOTE (Synthetic Minority Oversampling Technique) be applied to image datasets? As far as I know, SMOTE is only used for structured data, ...
0
votes
1
answer
416
views
Creating balanced bootstrap resamples in caret
I'm using caret to compare models for a classification problem with nested CV. Vfold in the outer loop and bootstrap (500 replicates) in the inner loop. I get this error after training knn:
Warning: ...
1
vote
1
answer
257
views
What is the correct way to apply a feature selection method to an imbalanced dataset?
I am new to data science & machine learning, so I'll write my question in detail.
I have an imbalanced dataset (binary classification dataset), and I want to apply these methods by using Weka ...
1
vote
1
answer
5k
views
How to combine X_train and y_train into one balanced dataframe?
I have an imbalanced dataset: only 2% of the values in y are 1. I want to balance only the train dataset and afterwards perform feature selection on the balanced train dataset, prior to the model.
After performing ...
0
votes
1
answer
346
views
NearMiss gives this error when an argument is passed: __init__() takes 1 positional argument but 2 were given
This is the code I was using to under-sample my imbalanced dataset.
from collections import Counter
from imblearn.under_sampling import NearMiss
ns=NearMiss(0.8)
X_train_ns, y_train_ns ...
0
votes
1
answer
90
views
Random Forest Classifier predicts lower proportion of positive cases compared to the actual
I am using scikit-learn Random Forest Classifier for a binary classification problem with imbalanced classes (negative class: 80%, positive class: 20%). When I apply the model on the same training ...
0
votes
1
answer
567
views
Understand shap values for binary classification
I have trained a model on my imbalanced dataset (binary classification) using CatBoostClassifier. Now, I am trying to interpret the model using the SHAP library. Below is the code to fit the model and calculate ...
0
votes
1
answer
402
views
In imbalanced datasets: the positive class is the majority class
I use the Weka platform. I am working on an imbalanced dataset, and the majority class is the positive class. I aim to apply different classifiers and evaluate their performance by using several ...
0
votes
1
answer
195
views
How does sklearn calculate accuracy on the validation set when XGBoost is given class weights?
I am using XGBoost's sklearn API with sklearn's RandomizedSearchCV() to train a boosted tree model with cross validation. My problem is imbalanced, so I've supplied the scale_pos_weight parameter to ...
1
vote
2
answers
712
views
StratifiedKFold and Over-Sampling together
I have a machine learning model and a dataset with 15 features about breast cancer. I want to predict the status of a person (alive or dead). I have 85% alive cases and only 15% dead. So, I want to ...
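One common way to combine the two is to oversample only the training part of each fold, leaving the held-out fold at its original class ratio. The sketch below does this in plain Python with random duplication; the helper names, the 0/1 label encoding, and the 85/15 toy labels are assumptions for illustration and do not use scikit-learn or imbalanced-learn.

```python
import random
from collections import defaultdict

def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the folds
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds

def oversample(indices, y, seed=0):
    """Randomly duplicate minority-class indices until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[y[i]].append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Toy labels: 85 of class 0 and 15 of class 1 (hypothetical encoding)
y = [0] * 85 + [1] * 15
folds = stratified_folds(y, k=5)

splits = []
for f, test_idx in enumerate(folds):
    # Training indices are every fold except the held-out one
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    # Oversample the training indices only; the test fold stays untouched
    splits.append((oversample(train_idx, y, seed=f), test_idx))
```

Because duplicates are drawn only from training indices, no copy of a held-out sample leaks into training.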
0
votes
1
answer
308
views
How to do these in weka: cross validation + imbalanced data + feature selection
I have an imbalanced dataset (classification dataset).
Using the Weka platform, I want to apply these techniques: cross validation, balancing the training folds, and feature selection.
So, I did the ...
0
votes
0
answers
52
views
Error in checkMeasures(measures, learner) : object 'fbeta' not found
I am doing an imbalanced classification task, so I want to use f-beta as the performance measure. I used library(mlr) and set measures=fbeta, as follows:
library(mlr)
#create tasks
## Create ...