-1 votes
0 answers
35 views

So, I've been struggling with this problem for a couple of months now. I have a dataset of protein sequences, for which I have encoded three target labels across each sequence, and I am training an ...
Alana Monks's user avatar
0 votes
1 answer
67 views

I’m trying to evaluate classification models on a highly imbalanced fraud dataset using the Brier Skill Score (BSS) as the evaluation metric. The dataset has ~2133 rows and the target Fraud_Flag is ...
Br0k3nS0u1's user avatar
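For readers unfamiliar with the metric named in this excerpt, here is a minimal Python sketch of the Brier Skill Score. It assumes the common definition BSS = 1 - BS / BS_ref, where the reference forecast always predicts the observed base rate. The function names and the sample data are hypothetical and are not taken from the question.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def brier_skill_score(y_true, y_prob):
    """BSS = 1 - BS_model / BS_reference.

    Assumption: the reference forecast is the constant base rate of the
    positive class. If y_true contains a single class, BS_reference is 0
    and this raises ZeroDivisionError.
    """
    base_rate = sum(y_true) / len(y_true)
    bs_ref = brier_score(y_true, [base_rate] * len(y_true))
    return 1.0 - brier_score(y_true, y_prob) / bs_ref


# Hypothetical labels and predicted probabilities (not the question's dataset).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.8, 0.6]
bss = brier_skill_score(y_true, y_prob)
```

A score of 0 means the model is no better than always predicting the base rate, and a score of 1 is a perfect forecast. On a highly imbalanced target the base-rate reference already has a low Brier score, which is why BSS can look poor even when the raw Brier score looks good.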
0 votes
0 answers
95 views

I am working on the loan default prediction data set available on Kaggle which has a highly skewed class distribution. The best model I have gotten so far is as follows using ExtraTreesClassifier: ...
RenamedUser7008's user avatar
0 votes
1 answer
226 views

I’m running into a frustrating issue while training a BERT-based multi-label text classification model on an imbalanced dataset. After a few epochs, the training loss suddenly becomes NaN, and I can’t ...
Erhan Arslan's user avatar
0 votes
0 answers
38 views

I am working on a project using the Berka dataset, and I want to build a neural network to predict the loan status for accounts. The dataset contains multiple tables, and I want to avoid flattening ...
Dmitrii Ponomarev's user avatar
-2 votes
1 answer
51 views

I am testing the accuracy and performance of deep learning models on a complex dataset. I have reached good accuracy but need to improve it further, so are there any suggestions other than what I did (...
Menna's user avatar
  • 5
0 votes
0 answers
76 views

As an exercise, I'm trying to translate a model written in Keras (https://github.com/CVxTz/ECG_Heartbeat_Classification/blob/master/code/baseline_mitbih.py) into PyTorch code. I realize in Keras much ...
user26579046's user avatar
1 vote
2 answers
472 views

To begin with, I have seen the question "Keras class_weight error dictionary keys/values", which refers to the same problem, but the solution does not seem to help me. With this code, where I just added ...
Pinguiz's user avatar
  • 123
0 votes
0 answers
69 views

I'm training and validating models for a binary classification problem in a dataset that has great class imbalance. When searching for metrics for evaluating the performance of the models, I found ...
JS_ps's user avatar
  • 1
1 vote
1 answer
1k views

After researching, I realized that scale_pos_weight is typically calculated as the ratio of the number of negative samples to the number of positive samples in the training data. My dataset has 840 ...
viji's user avatar
  • 487
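The ratio described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, assuming binary 0/1 labels; the helper name `neg_pos_ratio` and the label counts are hypothetical and do not come from the question, whose own counts are truncated above.

```python
from collections import Counter


def neg_pos_ratio(labels, positive=1):
    """Return (number of negatives) / (number of positives) in the training labels."""
    counts = Counter(labels)
    n_pos = counts[positive]
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples in the training labels")
    return n_neg / n_pos


# Hypothetical training labels: 90 negatives and 10 positives.
y_train = [0] * 90 + [1] * 10
weight = neg_pos_ratio(y_train)
```

The resulting value would then be passed as `scale_pos_weight=weight` to a classifier that accepts that parameter, such as XGBoost's `XGBClassifier`. The ratio should be computed on the training split only, so that information from the test set does not leak into the model.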
1 vote
0 answers
80 views

I'm fairly new to ML and now I'm in the process of predicting employee attrition in a medium-sized dataset. I have been able to run everything smoothly, but, as the dataset is imbalanced, I've been ...
Raughar's user avatar
  • 13
0 votes
0 answers
124 views

I'm working on a medical image binary segmentation problem using a U-Net in TensorFlow, and my classes are extremely unbalanced (about 1 in 10,000). As a result, my model wastes a ton of time going ...
Thao Nguyen's user avatar
0 votes
1 answer
43 views

I am trying to solve an ML problem: predicting whether a person will deliver an order or not. The dataset is highly imbalanced. Here is a glimpse of my dataset [{'order_id': '1bjhtj', 'Delivery Guy': 'John', 'Target': 0}, {'...
DSR's user avatar
  • 631
0 votes
0 answers
33 views

I am trying to predict the number of members who will discontinue their membership. The whole dataset is about 12 million rows of data with about 40 columns. A member status can be “Continue”, “Voluntary ...
Anson's user avatar
  • 1
-1 votes
1 answer
186 views

I have a dataset for fraud detection (I can't disclose the dataset) which is extremely imbalanced. When I use SMOTE everything works, but as I have 9 categorical features I wanted to use SMOTE-NC, but when ...
dsk4ch's user avatar
  • 1
0 votes
0 answers
58 views

I am trying to balance two classes, one majority and one minority. The majority class consists of no-landslide points and the minority class of landslide points. I am trying to apply ...
MM-'s user avatar
  • 1
0 votes
1 answer
358 views

When dealing with imbalanced datasets, my understanding is that possible solutions are subsampling or oversampling the training set. However, the test set should reflect the imbalance of the original ...
GeorgeM's user avatar
  • 93
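The idea in this excerpt (resample the training set only, leave the test set at its original class ratio) can be sketched as follows. This is a minimal standard-library illustration using random oversampling; the function name, the split fraction, and the sample labels are hypothetical, and the split here is not stratified.

```python
import random


def split_then_oversample(labels, test_fraction=0.25, seed=0):
    """Split first, then oversample minority classes inside the training split only.

    Returns (balanced_train_indices, test_indices). The test indices are
    never resampled, so the test set keeps the original class imbalance.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Group the training indices by class label.
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)

    # Draw extra samples (with replacement) until every class matches the largest one.
    target = max(len(idx) for idx in by_class.values())
    balanced_train = []
    for idx in by_class.values():
        balanced_train.extend(idx)
        balanced_train.extend(rng.choices(idx, k=target - len(idx)))
    return balanced_train, test_idx


# Hypothetical labels: 90 majority (0) and 10 minority (1) samples.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = split_then_oversample(labels)
```

Splitting before resampling matters: if the data were oversampled first, duplicated minority rows could land in both the training and test sets and inflate the measured performance.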
1 vote
0 answers
402 views

I would like to run logistic regression in statsmodels using an l1 penalty (lasso) and class weights due to a class imbalance. There are several posts that explain how to either implement logistic ...
makemyDNA's user avatar
0 votes
1 answer
391 views

I have a labeled dataset with X shape being 7000 x 2400 and y shape being 7000. The data is heavily imbalanced, so I am trying to generate synthetic samples using SMOTE. However, I want to identify the ...
Arindam's user avatar
  • 326
0 votes
1 answer
159 views

I have data as images (arrays) with their labels, loaded from folders. The data is imbalanced and I'm trying to balance it using smgon after creating a dataframe. Here's the code: r_labels=[] ...
Рим's user avatar
  • 11
0 votes
0 answers
111 views

I am working on a classification problem where I try to generate synthetic data for both the majority and minority classes, as I want to train my model on synthetic data and test on actual data. I am ...
user286076's user avatar
1 vote
0 answers
70 views

We receive data month on month for training, using the end-of-month snapshot. An XGBoost model (binary classification) performs well with one month in train and test, and not in ...
Josh mar's user avatar
-2 votes
1 answer
117 views

I'm trying to run a text classification model on some text data (Tweets) using sklearn and Python. I have hand-coded nearly 1.5k cases; however, the data is imbalanced. Cases are coded for themes. One of ...
gdhp's user avatar
  • 27
-1 votes
2 answers
112 views

I am currently doing my final year project and I need your humble opinion. My dataset consists of 4 classes, which contain: Mild demented - 896 images; Moderate demented - 64 images; Non demented - 3200 ...
Firzana Eiwany Mashi's user avatar
-1 votes
1 answer
185 views

I am working on my thesis on an imbalanced dataset for a binary classification problem. I need to handle the imbalance in the data before performing the classification, but I am not sure what technique is better to ...
Shada Hamed's user avatar
-1 votes
1 answer
7k views

I am working on a binary classification problem using machine learning, where my target classes are imbalanced. I have approximately 80% of data points in Class A and only 20% in Class B. I have tried ...
Viper's user avatar
  • 9
1 vote
0 answers
60 views

I'm encountering an error when using classification_report_imbalanced from imblearn.metrics on a classification task. The code runs smoothly until I add the classification_report_imbalanced function, ...
yuyudss's user avatar
  • 11
1 vote
0 answers
133 views

I am trying to create a dynamic-radius radius neighbour classifier for a multiclass classification problem. This dataset has 7 classes. I am giving a different radius to each class and then ...
shel coop's user avatar
0 votes
1 answer
115 views

I have previously used mlr3 for imbalanced classification problems, and used PipeOpClassWeights to apply class weights to learners during training. This pipe op adds a column of observation weights to ...
AhmetZamanis's user avatar
0 votes
0 answers
61 views

Suppose I have a table with the following columns and many more rows:
Id  n_positive_class1  n_positive_class2  n_positive_class3
1   0                  10                 4000
2   122                0                  0
3   4                  5234               0
I'd like to select the maximum number of ...
user11696358's user avatar
4 votes
0 answers
351 views

I'm trying to work on a fraud detection dataset from Kaggle (Credit Card Transactions Fraud Detection Dataset). I'm working in PySpark and wish to apply undersampling techniques using PySpark. However, I ...
Sumit's user avatar
  • 51
1 vote
2 answers
836 views

I am using Random Forest for a semantic segmentation task, with 3 classes, which are imbalanced. First, I just trained the algorithms on random subsets containing 20% of all the pixels (else my memory ...
Droidux's user avatar
  • 358
0 votes
0 answers
162 views

I weighted my population using the WeightIt package:
library(WeightIt)
library(cobalt)
data("lalonde", package = "cobalt")
W.out <- weightit(treat ~ age + married + race, ...
user19745561's user avatar
1 vote
0 answers
34 views

I have a research project using random forest to differentiate whether data is bot- or human-generated. The machine learning model achieved extremely high accuracy. Here is the result: Confusion ...
das's user avatar
  • 29
1 vote
0 answers
123 views

I have added Class_Weights to be used while training Mask RCNN on a custom dataset. It is showing this error: ValueError: Unknown entries in class_weight dictionary: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...
Tima's user avatar
  • 11
2 votes
1 answer
816 views

As part of an assignment, I have been trying to whip up a pipeline to preprocess some data that I have. Said data has a total of five classes, one of which is imbalanced compared to the others, and ...
Matheus de Oliveira's user avatar
0 votes
0 answers
1k views

I need to run a LightGBM model with an imbalanced dataset. The dataset has a 'Target' variable with a binary result, "0" with 61471 records and "1" with 4456 records. To ...
Guillermo Mansilla's user avatar
1 vote
0 answers
103 views

I am using XGBoost for an imbalanced dataset (the ratio of positive samples to negatives is 1/14). I used sklearn.utils.class_weight.compute_sample_weight to set the sample_weight parameter. To report ...
SaD's user avatar
  • 123
0 votes
0 answers
295 views

Can SMOTE (Synthetic Minority Oversampling Technique), a method for handling imbalanced datasets, be applied to image datasets? As far as I know, SMOTE is only used for structured data, ...
Anju Ucok Lubis's user avatar
0 votes
1 answer
416 views

I'm using caret to compare models for a classification problem with nested CV: V-fold in the outer loop and bootstrap (500 replicates) in the inner loop. I get this error after training knn: Warning: ...
amr95's user avatar
  • 33
1 vote
1 answer
257 views

I am new to data science & machine learning, so I'll write my question in detail. I have an imbalanced dataset (binary classification dataset), and I want to apply these methods by using Weka ...
Muneera's user avatar
  • 11
1 vote
1 answer
5k views

I have an imbalanced dataset: only 2% of y is 1. I want to balance only the training set and then perform feature selection on the balanced training set prior to modeling. After performing ...
Ella's user avatar
  • 13
0 votes
1 answer
346 views

This is the code I was using to undersample an imbalanced dataset:
from collections import Counter
from imblearn.under_sampling import NearMiss
ns=NearMiss(0.8)
X_train_ns, y_train_ns ...
Rohit Bale's user avatar
0 votes
1 answer
90 views

I am using scikit-learn Random Forest Classifier for a binary classification problem with imbalanced classes (negative class: 80%, positive class: 20%). When I apply the model on the same training ...
Jurgita-ds's user avatar
0 votes
1 answer
567 views

I have trained a model on my imbalanced dataset (binary classification) using CatBoostClassifier. Now, I am trying to interpret the model using the SHAP library. Below is the code to fit the model and calculate ...
Dhvani Shah's user avatar
0 votes
1 answer
402 views

I use the Weka platform. I am working on an imbalanced dataset, and the majority class is the positive class. I aim to apply different classifiers and evaluate their performance by using several ...
Muneera's user avatar
  • 11
0 votes
1 answer
195 views

I am using XGBoost's sklearn API with sklearn's RandomizedSearchCV() to train a boosted tree model with cross validation. My problem is imbalanced, so I've supplied the scale_pos_weight parameter to ...
Eli's user avatar
  • 290
1 vote
2 answers
712 views

I have a machine learning model and a dataset with 15 features about breast cancer. I want to predict the status of a person (alive or dead). I have 85% alive cases and only 15% dead. So, I want to ...
Andreas's user avatar
  • 11
0 votes
1 answer
308 views

I have an imbalanced dataset (classification dataset). Using the Weka platform, I want to apply these techniques: cross-validation, balancing the training folds, and feature selection. So, I did the ...
Muneera's user avatar
  • 11
0 votes
0 answers
52 views

I am doing an imbalanced classification task, so I want to use F-beta as the performance measure. I used library(mlr) to set measures=fbeta, as follows:
library(mlr)
#create tasks
## Create ...
ebrahimi's user avatar
  • 926
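For reference, the F-beta measure mentioned in this excerpt can be computed directly from confusion-matrix counts. The sketch below is in Python rather than the question's R/mlr code and assumes the standard definition F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP). The function name and the sample predictions are hypothetical.

```python
def f_beta(y_true, y_pred, beta=1.0, positive=1):
    """F-beta score for the positive class; beta > 1 weights recall more than precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Convention assumed here: return 0.0 when there are no positive labels or predictions.
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical labels and predictions: TP = 1, FN = 2, FP = 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]
f1 = f_beta(y_true, y_pred, beta=1.0)
f2 = f_beta(y_true, y_pred, beta=2.0)
```

In this example recall (1/3) is lower than precision (1/2), so F2 comes out lower than F1. That is the behaviour one usually wants when missing minority-class cases is the costlier error.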
