My Thoughts and Story Towards Data

Kowshik Sarker
Jun 14, 2021

The journey started around 7 months ago. The biggest catastrophe of the century hit our civilization and the globe went into lockdown. This was the time when I devoted myself to learning new things, and it was clear to me that I would learn the art of Data Science and Analytics. I went through lots of courses on various online platforms, which are very good for a beginner like me. Trust me, if you immerse yourself in these learning resources you will surely benefit. And of course, the internet is the best resource of our modern era. Kudos to all the learning platforms and online content creators. You have shaped the future of learning.

Coming back to my learning and takeaways: I started with the very basics of regression analysis. My ongoing MBA curriculum also helps me a lot in this area. Gradually I picked up introductory ideas about Data Collection, Data Cleaning and Pre-processing, Exploratory Data Analysis, Feature Engineering, Hyperparameter Tuning and, finally, Modelling and Prediction using various algorithms such as Regression, Logistic Regression, Decision Trees, Random Forest, Bagging and Boosting, KNN, SVM, Naïve Bayes, ANN, DNN, K-Means clustering etc., depending on the type of problem. Coming from a non-programming background, it was really difficult for me to pick up a coding language such as Python or R, but gradually it worked out well (at least now I know how to store a dataset, do some slicing and dicing, and draw some fancy graphs with it 😉 😉).

Now here is my vital takeaway from these months of learning. There are 2 kinds of exploration:

1. I have no idea what I am looking for. Maybe I will find out through trial and error and observation.

2. I think I know what is going on, and I want to test my hypothesis.

For both of these, “DATA” is the main input that gives us the result we want to achieve. The line “Data is the new oil” surely does not belong only to this decade; data was always there in the universe itself. And to understand the data, we have to understand the problem statement or the hypothesis that we are going to investigate. Throughout my learning I was always given ready-made data and a problem statement, which I think makes the task at least 20% easier, because the only things left are to explore the data, find some meaningful information in it and then do the prediction or forecasting. Most of the time, almost everywhere I looked, jargon was appreciated more than fundamental statistical data analysis. What I have felt during my journey of learning is that the most important part of data science is understanding the data, collecting the data relevant to the analysis and exploring it. Trust me, a basic correlation graph or a bar chart will give you much more insight than running a model after merely separating the feature variables from the outcome variable. It is the realization and observation of the problem statement that drives you to decide which parameters need to be collected, and only then can we do the further mathematical analysis on top of that.
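To make the point about a basic correlation check concrete, here is a minimal sketch in plain Python. The dataset and the column names (`ad_spend`, `store_temp`, `sales`) are made up purely for illustration; they are assumptions, not data from my courses. It computes the Pearson correlation of each feature with the outcome, which is the number behind a simple correlation graph.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data, invented for this example only.
data = {
    "ad_spend": [10, 20, 30, 40, 50, 60],
    "store_temp": [21, 19, 22, 20, 21, 20],
    "sales": [25, 41, 62, 79, 103, 118],
}

outcome = data["sales"]

# Correlate every feature column with the outcome column.
correlations = {
    name: pearson(values, outcome)
    for name, values in data.items()
    if name != "sales"
}

for name, r in correlations.items():
    print(f"{name:>10} vs sales: r = {r:+.2f}")
```

In this toy data one feature moves almost in lockstep with the outcome while the other barely relates to it. A look like this, before any modelling, already suggests which variables deserve attention.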

All I want to say is that coding is not the most important part of being a successful Data Scientist. I am not a data scientist yet, but what I am trying to convey is that we have to look around very carefully to solve problems. We can even do a regression analysis of our daily expenditure. We just need to track our daily expenses and budget records in an Excel sheet, and from that data we can surely estimate what our expenditure will look like in the near future. Doing so will put us in a position to analyse the odd events (which are statistically called “outliers”) that keep the data from being normally distributed. And this is the point where you start exploring your data.
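The daily-expenditure idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the expense figures are invented, the model is a simple least-squares straight line over the day number, and the rule "a residual larger than 2 standard deviations is an outlier" is one common rule of thumb that I chose for the example.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical expense log (day number, amount spent); day 6 is an odd event.
days = list(range(1, 11))
expenses = [20, 22, 21, 23, 24, 60, 25, 26, 27, 28]

slope, intercept = fit_line(days, expenses)

# Residual = actual spending minus what the fitted line predicts.
residuals = [y - (intercept + slope * x) for x, y in zip(days, expenses)]
residual_std = sqrt(sum(r ** 2 for r in residuals) / len(residuals))

# Assumed rule of thumb: flag days whose residual exceeds 2 standard deviations.
outlier_days = [d for d, r in zip(days, residuals) if abs(r) > 2 * residual_std]

# Naive forecast for the next day from the fitted line.
forecast_day_11 = intercept + slope * 11

print("outlier days:", outlier_days)
print(f"forecast for day 11: {forecast_day_11:.1f}")
```

With this made-up log, only day 6 is flagged. Note that the odd day also pulls the fitted line upwards, so a natural next step is to ask what happened on that day and refit without it.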

My suggestion to all my fellow friends and learners is to start learning data science from the fundamental, grassroots concepts: Statistics, Matrix Algebra, Partial Differentiation and Linear Regression, in detail. These 4 things are, in my view, the backbone of all the data analysis happening around the world. And last but not least comes the most important part of our life: “OBSERVATION”. From the discoveries of Galileo Galilei to the Higgs boson, everything is the outcome of careful observation. So look around you, try to find the pain points and figure them out from observational data using the beautiful tool of Data Science.

Science and Mathematics are always freely out there in the air. You just need to feel them, realize them and question them…. What……When……How……and Why. The circle will be complete. Very good luck to all on your never-ending journey of learning. 😊

Before ending, one of my favorite quotes from Neil deGrasse Tyson:

“Knowing how to think empowers you far beyond those who know only what to think.”
