Wednesday, January 10, 2018

Mushroom Classification with Keras and TensorFlow

Context
Although this dataset was originally contributed to the UCI Machine Learning repository nearly 30 years ago, mushroom hunting (otherwise known as "shrooming") is enjoying new peaks in popularity. Learn which features spell certain death and which are most palatable in this dataset of mushroom characteristics. And how certain can your model be?

Content

This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota families, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for poison oak and ivy.
  • Time period: Donated to UCI ML 27 April 1987
I decided to give [this Kaggle dataset] a shot. Here, we have over 8,000 mushroom samples, and the goal is to classify each one as edible or not. I encoded the variables as integers, made a 75/25 training/test split, and then used a dense neural network to model and classify the samples.
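As a rough illustration of the 75/25 split mentioned above, here is a minimal sketch using only the standard library. The function name, the seed, and the stand-in row count of 8,000 are assumptions made for this example; the post itself only states the ratio, and in practice a library helper such as scikit-learn's train_test_split does the same job.

```python
import random


def train_test_split_rows(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of `rows` and return (train, test) lists.

    `seed` is a hypothetical choice so the split is reproducible.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the dataset: 8,000 row indices (the real file has slightly more rows).
rows = list(range(8000))
train, test = train_test_split_rows(rows)
print(len(train), len(test))  # 6000 2000
```

Shuffling before splitting matters here because the rows of a dataset file are not guaranteed to be in random order.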





Here's how our data started. The following key can be used to understand the dataset entries.



  • cap-shape: bell=b,conical=c,convex=x,flat=f,knobbed=k,sunken=s
  • cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s
  • cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,pink=p,purple=u,red=e,white=w,yellow=y
  • bruises: bruises=t,no=f
  • odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,musty=m,none=n,pungent=p,spicy=s
  • gill-attachment: attached=a,descending=d,free=f,notched=n
  • gill-spacing: close=c,crowded=w,distant=d
  • gill-size: broad=b,narrow=n
  • gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g,green=r,orange=o,pink=p,purple=u,red=e,white=w,yellow=y
  • stalk-shape: enlarging=e,tapering=t
  • stalk-root: bulbous=b,club=c,cup=u,equal=e,rhizomorphs=z,rooted=r,missing=?
  • stalk-surface-above-ring: fibrous=f,scaly=y,silky=k,smooth=s
  • stalk-surface-below-ring: fibrous=f,scaly=y,silky=k,smooth=s
  • stalk-color-above-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y
  • stalk-color-below-ring: brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y
  • veil-type: partial=p,universal=u
  • veil-color: brown=n,orange=o,white=w,yellow=y
  • ring-number: none=n,one=o,two=t
  • ring-type: cobwebby=c,evanescent=e,flaring=f,large=l,none=n,pendant=p,sheathing=s,zone=z
  • spore-print-color: black=k,brown=n,buff=b,chocolate=h,green=r,orange=o,purple=u,white=w,yellow=y
  • population: abundant=a,clustered=c,numerous=n,scattered=s,several=v,solitary=y
  • habitat: grasses=g,leaves=l,meadows=m,paths=p,urban=u,waste=w,woods=d
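
If you want to decode entries programmatically, one line of the key above can be turned into a lookup table. This is a small sketch; the helper name parse_key_line is made up for illustration and is not part of the notebook.

```python
def parse_key_line(line):
    """Turn 'feature: name=code,name=code,...' into (feature, {code: name})."""
    feature, _, pairs = line.partition(":")
    codes = {}
    for pair in pairs.split(","):
        name, _, code = pair.strip().partition("=")
        codes[code] = name
    return feature.strip(), codes


feature, codes = parse_key_line(
    "odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,musty=m,none=n,pungent=p,spicy=s"
)
print(feature, codes["f"])  # odor foul
```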

I found two surprising results, and both were really cool. First, I found that with Keras I did not need to one-hot encode my 22 categorical features to get a working model. As seen in the encoding section below, just encoding the data was grueling enough, and I was not looking forward to expanding each feature into dummy columns and then dropping one column per feature to avoid the dummy variable trap. What a relief! For each variable, the characters were encoded as integers, and the neural network took it from there. Keep in mind that integer codes impose an arbitrary order on the categories; that did not hurt here, but it can on other datasets.
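
The integer encoding described above can be sketched in a few lines. This is an illustration only, not the notebook's code: the function names are hypothetical, the three sample rows are made up, and assigning codes in sorted order is an assumption (any consistent mapping would work).

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer, assigned in sorted order."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def encode_columns(rows, columns):
    """Integer-encode the given columns; return (encoded rows, per-column encoders)."""
    encoders = {c: fit_label_encoder(r[c] for r in rows) for c in columns}
    encoded = [[encoders[c][r[c]] for c in columns] for r in rows]
    return encoded, encoders


# Made-up sample rows that use codes from the key above.
rows = [
    {"cap-shape": "x", "odor": "p", "bruises": "t"},
    {"cap-shape": "b", "odor": "a", "bruises": "t"},
    {"cap-shape": "x", "odor": "n", "bruises": "f"},
]
encoded, encoders = encode_columns(rows, ["cap-shape", "odor", "bruises"])
print(encoded)  # [[1, 2, 1], [0, 0, 1], [1, 1, 0]]
```

Each feature stays a single column, which is why this is so much less work than one-hot encoding.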




The second great finding was that the model trains completely with only one hidden layer in only twenty epochs! This suggests that the data is easily classified, but the model should still be tested on a completely independent dataset in case it is overfitting. Because the model is trained and tested on separate subsets of the data, we can be fairly confident in our 100% test accuracy on this dataset.
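
For readers who want a picture of what such a network looks like, here is a minimal Keras sketch with one hidden layer, trained for twenty epochs. Only the single hidden layer, the twenty epochs, and the 22 input features come from the post; the 16 hidden units, the ReLU and sigmoid activations, the Adam optimizer, and the batch size are assumptions for illustration and may differ from the notebook.

```python
from tensorflow import keras


def build_model(n_features, hidden_units=16):
    """One hidden dense layer and a sigmoid output for edible vs. poisonous."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(hidden_units, activation="relu"),  # the single hidden layer
        keras.layers.Dense(1, activation="sigmoid"),          # probability of one class
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def train(model, X_train, y_train):
    """Fit for twenty epochs, as in the post; batch size is an assumed default."""
    return model.fit(X_train, y_train, epochs=20, batch_size=32, verbose=0)


model = build_model(n_features=22)
model.summary()
```

After training, calling model.evaluate on the held-out 25% gives the test accuracy discussed above.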



Here's a link to the GitHub repository:

https://github.com/SLPeoples/Python-Excercises/tree/master/MachineLearningPractice/02-Classification

If you're interested in the neural network and how things came together, check out the Jupyter Notebook below!



