Newest 'imbalanced-data' Questions

-1 votes

0 answers

35 views

Tips for handling imbalance in an LSTM Sequential Multi Label Classification Task? [closed]

So, I've been struggling with this problem for a couple of months now. I have a dataset of protein sequences, for which I have encoded three target labels across each sequence, and I am training an ...

Alana Monks

1

asked Nov 19 at 13:33

0 votes

1 answer

67 views

Brier Skill Score returns NaN in cross_val_score with imbalanced dataset

I’m trying to evaluate classification models on a highly imbalanced fraud dataset using the Brier Skill Score (BSS) as the evaluation metric. The dataset has ~2133 rows and the target Fraud_Flag is ...

Br0k3nS0u1

139

asked Aug 28 at 12:20

0 votes

0 answers

95 views

Loan Default Prediction - Kaggle

I am working on the loan default prediction data set available on Kaggle which has a highly skewed class distribution. The best model I have gotten so far is as follows using ExtraTreesClassifier: ...

RenamedUser7008

57

asked Apr 16 at 8:59

0 votes

1 answer

226 views

Why is my BERT model producing NaN loss during training for multi-label classification on imbalanced data?

I’m running into a frustrating issue while training a BERT-based multi-label text classification model on an imbalanced dataset. After a few epochs, the training loss suddenly becomes NaN, and I can’t ...

Erhan Arslan

36

asked Jan 28 at 13:03

0 votes

0 answers

38 views

How to Build a Neural Network for Predicting Loan Status Using Multi-Table Data from the Berka Dataset

I am working on a project using the Berka dataset, and I want to build a neural network to predict the loan status for accounts. The dataset contains multiple tables, and I want to avoid flattening ...

Dmitrii Ponomarev

1

asked Dec 29, 2024 at 3:49

-2 votes

1 answer

51 views

Improving Accuracy [closed]

I am testing accuracy and performance using deep learning models on a complex dataset. I have reached a good accuracy, but I need to improve it, so any suggestions other than what I did(...

Menna

5

asked Dec 20, 2024 at 16:32

0 votes

0 answers

76 views

Understanding the `model.fit` function in Keras and imbalanced datasets

As an exercise, I'm trying to translate a model written in Keras (https://github.com/CVxTz/ECG_Heartbeat_Classification/blob/master/code/baseline_mitbih.py) into PyTorch code. I realize in Keras much ...

user26579046

1

asked Sep 7, 2024 at 21:08

1 vote

2 answers

472 views

Problem with Keras class weights and KeyError

I should say up front that I have seen the question "Keras class_weight error dictionary keys/values", which refers to the same problem, but the solution does not seem to help me. With this code, where I just added ...

Pinguiz

123

asked Aug 14, 2024 at 17:15

0 votes

0 answers

69 views

Weighted F1-score

I'm training and validating models for a binary classification problem in a dataset that has great class imbalance. When searching for metrics for evaluating the performance of the models, I found ...

JS_ps

1

asked Jul 3, 2024 at 15:40

1 vote

1 answer

1k views

Does XGBoost's scale_pos_weight correctly balance the positive samples if the training dataset has more positive than negative samples?

After researching, I realized that scale_pos_weight is typically calculated as the ratio of the number of negative samples to the number of positive samples in the training data. My dataset has 840 ...

viji

487

asked Jun 6, 2024 at 14:27
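
The ratio described in the excerpt above (negative count divided by positive count) can be sketched in a few lines. This is only an illustration: the helper name `scale_pos_weight_ratio` and the label list are hypothetical and are not the asker's data, and labels are assumed to be coded 0 (negative) and 1 (positive). By this formula the value falls below 1 whenever positives outnumber negatives.

```python
from collections import Counter


def scale_pos_weight_ratio(labels):
    """Return n_negative / n_positive for binary labels coded 0 and 1.

    Hypothetical helper for illustration; it is not part of XGBoost.
    """
    counts = Counter(labels)
    n_pos = counts.get(1, 0)
    n_neg = counts.get(0, 0)
    if n_pos == 0:
        # The ratio is undefined without any positive samples.
        raise ValueError("no positive samples; ratio is undefined")
    return n_neg / n_pos


# Made-up labels: 3 negatives and 9 positives (positives are the majority).
y_train = [0] * 3 + [1] * 9
weight = scale_pos_weight_ratio(y_train)
print(weight)  # a value below 1, since negatives are the minority here
```

The resulting number is what would typically be passed as the `scale_pos_weight` parameter; whether that is the right choice when the positive class is the majority is the open point of the question.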

1 vote

0 answers

80 views

Class_weight parameter not impacting results in imbalanced dataset with RandomForestClassifier

I'm fairly new to ML and now I'm in the process of predicting employee attrition in a medium sized dataset. I have been able to run everything smoothly, but, as the dataset is imbalanced, I've been ...

Raughar

13

asked Apr 26, 2024 at 8:03

0 votes

0 answers

124 views

How do I add a bias to the last layer in my model if my model outputs logits and not probabilities?

I'm working on a medical image binary segmentation problem using a U-Net in TensorFlow, and my classes are extremely unbalanced (about 1 in 10,000). As a result, my model wastes a ton of time going ...

Thao Nguyen

11

asked Apr 22, 2024 at 3:47

0 votes

1 answer

43 views

Train and test split in such a way that each name and proportion of target class is present in both train and test

I am trying to solve an ML problem: predicting whether a person will deliver an order or not. The dataset is highly imbalanced. Here is a glimpse of my dataset [{'order_id': '1bjhtj', 'Delivery Guy': 'John', 'Target': 0}, {'...

DSR

631

asked Mar 29, 2024 at 7:21

0 votes

0 answers

33 views

Questions about handling imbalanced dataset classification

I am trying to predict the number of members who will discontinue their membership. The whole dataset is about 12 million rows of data with about 40 columns. A member status can be “Continue”, “Voluntary ...

Anson

1

asked Mar 27, 2024 at 15:24

-1 votes

1 answer

186 views

Kernel dies on fit_resample of SMOTE-NC from imblearn

I have a dataset for fraud detection (I can't disclose the dataset) which is extremely imbalanced. When I use SMOTE everything works, but as I have 9 categorical features I wanted to use SMOTE-NC, but when ...

dsk4ch

1

asked Mar 27, 2024 at 12:54

0 votes

0 answers

58 views

AttributeError: 'EasyEnsembleClassifier' object has no attribute 'fit_resample'

I am trying to perform a balancing between two classes, one majority and one minority. The majority class is a number of no landslide points and the minority class is landslide. I am trying to apply ...

MM-

1

asked Mar 21, 2024 at 14:58

0 votes

1 answer

358 views

Tidymodels and Imbalanced datasets - Subsampling when resampling

When dealing with imbalanced datasets, my understanding is possible solutions are subsampling or oversampling the training set. However, the test set should reflect the imbalance of the original ...

GeorgeM

93

asked Feb 15, 2024 at 20:09

1 vote

0 answers

402 views

Python logistic regression in statsmodels using l1 penalty with class weights

I would like to run logistic regression in statsmodels using an l1 penalty (lasso) and class weights due to a class imbalance. There are several posts that explain how to either implement logistic ...

makemyDNA

73

asked Feb 14, 2024 at 19:42

0 votes

1 answer

391 views

Identify the Synthetic Samples generated by SMOTE

I have a labeled dataset with X shape being 7000 x 2400 and y shape being 7000. The data is heavily imbalanced, so I am trying to generate synthetic samples using SMOTE. However I want to identify the ...

Arindam

326

asked Jan 24, 2024 at 6:04

0 votes

1 answer

159 views

I'm trying to use SMOGN to balance my data but it's giving a TypeError or UFuncTypeError. How do I solve this problem?

I have data as images (arrays) with their labels uploaded from folders. The data is imbalanced and I'm trying to balance it using SMOGN after creating a dataframe. Here's the code: r_labels=[] ...

Рим

11

asked Jan 23, 2024 at 12:37

0 votes

0 answers

111 views

Generate synthetic data for majority and minority classes

I am working on a classification problem where I try to generate synthetic data for both the majority and minority classes, as I want to train my model on synthetic data and test on actual data. I am ...

user286076

161

asked Jan 14, 2024 at 13:50

1 vote

0 answers

70 views

How to assign weights to monthly data to XGBoost performance?

We receive data month on month for training; the end-of-month snapshot is used. The XGBoost model (binary classification) performs well with one month in train and test and not in ...

Josh mar

11

asked Dec 23, 2023 at 10:10

-2 votes

1 answer

117 views

Best way to deal with uneven data in text classification

I'm trying to run a text classification model on some text data (Tweets) using sklearn and Python. I have hand-coded nearly 1.5k cases; however, the data is imbalanced. Cases are coded for themes. One of ...

gdhp

27

asked Dec 13, 2023 at 13:22

-1 votes

2 answers

112 views

Highly imbalanced Alzheimer's Disease MRI image dataset

I am currently doing my final year project and I need your humble opinion. My dataset consists of 4 classes which contain: Mild demented - 896 images, Moderate demented - 64 images, Non demented - 3200 ...

Firzana Eiwany Mashi

1

asked Oct 5, 2023 at 13:29

-1 votes

1 answer

185 views

How to choose the best technique for handling imbalanced data for binary classification?

I am working on my thesis on an imbalanced dataset for a binary classification problem. I need to handle the imbalance in the data before doing the classification, but I am not sure what technique is better to ...

Shada Hamed

1

asked Sep 22, 2023 at 11:40

-1 votes

1 answer

7k views

How to Handle Imbalanced Data in a Classification Problem?

I am working on a binary classification problem using machine learning, where my target classes are imbalanced. I have approximately 80% of data points in Class A and only 20% in Class B. I have tried ...

Viper

9

asked Jul 17, 2023 at 10:05

1 vote

0 answers

60 views

Sample Size Inconsistency Error with imblearn's classification_report_imbalanced

I'm encountering an error when using classification_report_imbalanced from imblearn.metrics on a classification task. The code runs smoothly until I add the classification_report_imbalanced function, ...

yuyudss

11

asked Jun 20, 2023 at 13:16

1 vote

0 answers

133 views

Implementing dynamic radius to radius neighbor classifier for better class imbalance handling

I am trying to create a dynamic-radius-based radius neighbour classifier for a multiclass classification problem. This dataset has 7 classes. I am giving a different radius to each class and then ...

shel coop

45

asked Jun 13, 2023 at 10:38

0 votes

1 answer

115 views

Are mlr3 class weights applied to validation score calculations?

I have previously used mlr3 for imbalanced classification problems, and used PipeOpClassWeights to apply class weights to learners during training. This pipe op adds a column of observation weights to ...

AhmetZamanis

13

asked May 17, 2023 at 12:18

0 votes

0 answers

61 views

Select the maximum number of rows so that the sum of the columns is balanced

Suppose I have a table with the following columns and many more rows:

Id  n_positive_class1  n_positive_class2  n_positive_class3
1   0                  10                 4000
2   122                0                  0
3   4                  5234               0

I'd like to select the maximum number of ...

user11696358

478

asked May 5, 2023 at 8:16

4 votes

0 answers

351 views

How to implement undersampling techniques like NearMiss, TomekLinks, ClusterCentroids, ENN using PySpark?

I'm trying to work on a fraud detection dataset from Kaggle, the Credit Card Transactions Fraud Detection Dataset. I'm working in PySpark and wish to apply undersampling techniques using PySpark. However, I ...

Sumit

51

asked Apr 28, 2023 at 13:07

1 vote

2 answers

836 views

Under-sampling leads to poor results for no apparent reason

I am using Random Forest for a semantic segmentation task, with 3 classes, which are imbalanced. First, I just trained the algorithms on random subsets containing 20% of all the pixels (else my memory ...

Droidux

358

asked Apr 24, 2023 at 8:37

0 votes

0 answers

162 views

Re-weight with WeightIt

I weighted my population using the WeightIt package: library(WeightIt) library(cobalt) data("lalonde", package = "cobalt") W.out <- weightit(treat ~ age + married + race, ...

user19745561

175

asked Apr 16, 2023 at 19:39

1 vote

0 answers

34 views

Is it bad to have a high precision, recall, and fbeta on a 1:5 imbalanced dataset?

I have a research project using random forest to differentiate whether data is bot- or human-generated. The machine learning model achieved extremely high accuracy. Here is the result: Confusion ...

das

29

asked Apr 13, 2023 at 17:05

1 vote

0 answers

123 views

Using Class_weights for imbalanced dataset in Mask RCNN

I have added Class_Weights to be used while training Mask RCNN on a custom dataset. It is showing an error: ValueError: Unknown entries in class_weight dictionary: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...

Tima

11

asked Apr 4, 2023 at 5:27

2 votes

1 answer

816 views

Imblearn pipeline with SMOTE step - AttributeError: This 'Pipeline' has no attribute 'transform'

As part of an assignment, I have been trying to whip up a pipeline to preprocess some data that I have. Said data has a total of five classes, one of which is imbalanced compared to the others, and ...

Matheus de Oliveira

21

asked Mar 29, 2023 at 20:01

0 votes

0 answers

1k views

Running LightGBM algorithm after the implementation of SMOTE function to mitigate the issue of managing imbalanced dataset

I need to run a LightGBM model with an imbalanced dataset. The dataset has a 'Target' variable with a binary result, "0" with 61471 records and "1" with 4456 records. To ...

Guillermo Mansilla

1

asked Mar 27, 2023 at 1:44

1 vote

0 answers

103 views

Using Sample_weight on test set

I am using XGBoost for an imbalanced dataset (the ratio of positive samples to negatives is 1/14). I used sklearn.utils.class_weight.compute_sample_weight to set the sample_weight parameter. To report ...

SaD

123

asked Mar 20, 2023 at 21:21

0 votes

0 answers

295 views

Can the SMOTE Method be used on Image datasets for image dataset imbalances?

Can the SMOTE (Synthetic Minority Oversampling Technique) method for handling imbalanced datasets be applied to image datasets? As far as I know, SMOTE is only used for structured data, ...

Anju Ucok Lubis

11

asked Feb 23, 2023 at 13:01

0 votes

1 answer

416 views

Creating balanced bootstrap resamples in caret

I'm using caret to compare models for a classification problem with nested CV. Vfold in the outer loop and bootstrap (500 replicates) in the inner loop. I get this error after training knn: Warning: ...

amr95

33

asked Feb 10, 2023 at 10:41

1 vote

1 answer

257 views

What is the correct way to apply a feature selection method to an imbalanced dataset?

I am new to data science & machine learning, so I'll write my question in detail. I have an imbalanced dataset (binary classification dataset), and I want to apply these methods by using Weka ...

Muneera

11

asked Feb 3, 2023 at 5:45

1 vote

1 answer

5k views

How to combine X_train and y_train into one balanced dataframe?

I have an imbalanced dataset: only 2% of y is 1. I want to balance only the train dataset and afterwards perform feature selection on the balanced train dataset prior to modeling. After performing ...

Ella

13

asked Jan 26, 2023 at 11:48

0 votes

1 answer

346 views

NearMiss gives this error when an argument is passed: __init__() takes 1 positional argument but 2 were given

This is the code I was using to under-sample an imbalanced dataset: from collections import Counter from imblearn.under_sampling import NearMiss ns=NearMiss(0.8) X_train_ns, y_train_ns ...

Rohit Bale

15

asked Jan 21, 2023 at 17:09

0 votes

1 answer

90 views

Random Forest Classifier predicts lower proportion of positive cases compared to the actual

I am using scikit-learn Random Forest Classifier for a binary classification problem with imbalanced classes (negative class: 80%, positive class: 20%). When I apply the model on the same training ...

Jurgita-ds

67

asked Jan 13, 2023 at 20:18

0 votes

1 answer

567 views

Understand shap values for binary classification

I have trained a model on my imbalanced dataset (binary classification) using CatBoostClassifier. Now, I am trying to interpret the model using the SHAP library. Below is the code to fit the model and calculate ...

Dhvani Shah

371

asked Jan 12, 2023 at 0:55

0 votes

1 answer

402 views

In imbalanced datasets: the positive class is the majority class

I use the Weka platform. I am working on an imbalanced dataset, and the majority class is the positive class. I aim to apply different classifiers and evaluate their performance by using several ...

Muneera

11

asked Jan 8, 2023 at 16:22

0 votes

1 answer

195 views

How does sklearn calculate accuracy on the validation set when XGBoost is given class weights?

I am using XGBoost's sklearn API with sklearn's RandomizedSearchCV() to train a boosted tree model with cross validation. My problem is imbalanced, so I've supplied the scale_pos_weight parameter to ...

Eli

290

asked Jan 5, 2023 at 15:42

1 vote

2 answers

712 views

StratifiedKFold and Over-Sampling together

I have a machine learning model and a dataset with 15 features about breast cancer. I want to predict the status of a person (alive or dead). I have 85% alive cases and only 15% dead. So, I want to ...

Andreas

11

asked Dec 31, 2022 at 9:42

0 votes

1 answer

308 views

How to do these in weka: cross validation + imbalanced data + feature selection

I have an imbalanced dataset (classification dataset). Using the Weka platform, I want to apply these techniques: cross validation, balancing the training folds, and feature selection. So, I did the ...

Muneera

11

asked Dec 23, 2022 at 9:22

0 votes

0 answers

52 views

Error in checkMeasures(measures, learner) : object 'fbeta' not found

I am doing an imbalanced classification task, so I want to use f-beta as the performance measure. I used library(mlr) to set measures=fbeta, as follows: library(mlr) #create tasks ## Create ...

ebrahimi

926

asked Dec 17, 2022 at 6:03
