
Dataset Cleaning

 Basic Usage

  • Comb method for reviewing images and removing bad ones

  • Methods to extend datasets

  • Salt and Peppering

  • Image Rotation

Cleaning

The images gathered by the scraper need to be cleaned before they can be used. Many of them may be duplicates, unrelated to the topic, or simply of low quality.

To clean a directory, import Dataset_Creator.py and call the comb method. It displays several images at a time; the 'inc' parameter sets how many are shown in each subset.

Ex.
  Example output of comb method for a directory of apples

[Image: combEx.PNG]

  • To queue an image for deletion, press its index (1, 2, 3, or 4 in this example)

  • To abort the queued deletions, press 'b'

  • To commit the deletions, press 'c'

  • To move on to the next subset of images, press the spacebar
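
The key bindings above can be sketched as a small state machine. This is an illustration only, not the code in Dataset_Creator.py: the function names `batches` and `review` are hypothetical, key presses are replayed from a list instead of being read from a window, and the sketch assumes that 'b' clears the whole queue and that queued images stay queued when moving to the next subset.

```python
from typing import Iterable, List


def batches(paths: List[str], inc: int) -> List[List[str]]:
    """Split the image list into consecutive subsets of at most `inc` items."""
    if inc < 1:
        raise ValueError("inc must be at least 1")
    return [paths[i:i + inc] for i in range(0, len(paths), inc)]


def review(paths: List[str], inc: int, keys: Iterable[str]) -> List[str]:
    """Replay key presses and return the paths whose deletion was committed."""
    subsets = batches(paths, inc)
    current = 0                  # index of the subset currently on screen
    queued: List[str] = []       # marked for deletion, not yet committed
    committed: List[str] = []    # deletions confirmed with 'c'
    for key in keys:
        if current >= len(subsets):
            break                # every subset has already been shown
        shown = subsets[current]
        if key.isdecimal() and 1 <= int(key) <= len(shown):
            path = shown[int(key) - 1]   # on-screen indices start at 1
            if path not in queued:
                queued.append(path)
        elif key == "b":
            queued.clear()               # abort: forget the queued deletions
        elif key == "c":
            committed.extend(queued)     # commit: these files would be deleted
            queued.clear()
        elif key == " ":
            current += 1                 # spacebar: move to the next subset
    return committed
```

Returning the committed paths instead of deleting files inside the loop keeps the sketch free of side effects; a caller would pass the result to something like `os.remove` once the review is finished.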

Because about 60% of the scraped images will be unsuitable, the framework also includes techniques to pad out a dataset with modified copies of the remaining images.

Salt & Peppering

The framework can add salt-and-pepper noise to images.

Import Dataset_Creator.py and call the s&p method. It distorts the batch of images dataset[lb:ub], with the amount of noise controlled by the probability parameter.

Ex.
  Example output of s&p with the default probability
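
As a rough illustration of the technique, the sketch below applies salt-and-pepper noise to a slice of a dataset. It is not the framework's implementation: the names `salt_and_pepper` and `distort_batch` are hypothetical, images are assumed to be grayscale and stored as lists of rows of 0-255 integers, and the default probability of 0.05 is a placeholder, not the framework's actual default.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # assumed layout: rows of 0-255 grayscale pixels


def salt_and_pepper(img: Image, prob: float, rng: random.Random) -> Image:
    """Return a copy of `img` in which each pixel is replaced by 0 or 255
    with probability `prob`."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    noisy = [row[:] for row in img]          # leave the original untouched
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < prob:
                row[x] = rng.choice((0, 255))  # pepper (black) or salt (white)
    return noisy


def distort_batch(dataset: List[Image], lb: int, ub: int,
                  prob: float = 0.05,          # placeholder default
                  seed: Optional[int] = None) -> List[Image]:
    """Apply salt-and-pepper noise to the images in dataset[lb:ub]."""
    rng = random.Random(seed)                # one generator for the whole batch
    return [salt_and_pepper(img, prob, rng) for img in dataset[lb:ub]]
```

Sharing one seeded generator across the batch makes a run reproducible while still giving every image a different noise pattern.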

Image Rotation

The framework can also rotate the images in a dataset.

Import Dataset_Creator.py and call the rotation method to rotate a batch of images.

Ex.
  Example output of rotation

  • Each orientation is selected by its own macro definition, with values from 0 to 3
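
A minimal sketch of rotation by orientation code follows. The constant names and the mapping of codes 0-3 to quarter turns are assumptions made for illustration; the framework's own macros may be named and ordered differently. Images are again assumed to be lists of rows.

```python
from typing import List

Image = List[List[int]]  # assumed layout: a list of pixel rows

# Hypothetical orientation codes: the number of counterclockwise quarter turns.
ROT_0, ROT_90, ROT_180, ROT_270 = range(4)


def rotate(img: Image, orientation: int) -> Image:
    """Return a copy of `img` rotated counterclockwise by orientation * 90 degrees."""
    if orientation not in (ROT_0, ROT_90, ROT_180, ROT_270):
        raise ValueError("orientation must be 0, 1, 2, or 3")
    out = [row[:] for row in img]
    for _ in range(orientation):
        # Transposing, then reversing the row order, is one quarter turn
        # counterclockwise.
        out = [list(col) for col in zip(*out)][::-1]
    return out


def rotate_batch(dataset: List[Image], lb: int, ub: int,
                 orientation: int) -> List[Image]:
    """Rotate every image in dataset[lb:ub] by the same orientation code."""
    return [rotate(img, orientation) for img in dataset[lb:ub]]
```

Generating all four orientations of each kept image would quadruple the dataset, provided the subject is still recognizable when turned.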
