Why Hadoop, you may ask? I want to learn how to pipe data in and out of big data systems.
Why Cassandra, you may ask? I have to learn it for my work anyway.
Why PySpark, you may ask? I want to learn Python, which is an all-rounder.
Also, if someone can help me set up the environment on a public cloud platform, it would be a great help.
I’m no expert on the big data stuff, but people have been saying Hadoop is obsolete for several years now…
With your background, you are likely aiming at a data engineer role, so it’s probably a good idea to browse some job postings and look at the buzzwords they’re asking for.
One can use tensorflow.dallasmakerspace.org for ML work, and Hadoop, blockchain (geth), and FaaS are available via the Docker cluster.
MapReduce (the heart of Hadoop) can be done in Python, so one doesn’t need to learn each “technology” individually; they’re really just suites built from the same set of tools (i.e., clustering, MapReduce, data science, NoSQL, and SQL).
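To make the “MapReduce can be done in Python” point concrete, here is a minimal single-process sketch of the classic word count, showing the map → shuffle/sort → reduce shape. The function names are made up for illustration; on a real cluster you would typically run the mapper and reducer as separate scripts (e.g. via Hadoop Streaming, reading stdin and writing stdout) and let the framework do the shuffle between them.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]):
    """Shuffle/sort phase: sort by key so equal words end up next to each other,
    which is what the framework does between map and reduce on a cluster."""
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))


def reducer(grouped) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum the 1s emitted for each word."""
    for word, group in grouped:
        yield word, sum(count for _, count in group)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases together (hypothetical helper for this sketch)."""
    return dict(reducer(shuffle(mapper(lines))))


if __name__ == "__main__":
    docs = ["hadoop is a suite", "spark is a suite too"]
    print(word_count(docs))
    # → {'a': 2, 'hadoop': 1, 'is': 2, 'spark': 1, 'suite': 2, 'too': 1}
```

The same mapper/reducer split carries over to PySpark, where it roughly corresponds to a `flatMap` followed by `reduceByKey`.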
At the end of the day, if one is trying to introduce machine learning into a big data stack, one is just trying to do predictive modeling on existing datasets. Cassandra would only be there as an intermediary datastore, while the actual data lives in a “data lake”.
If I may suggest, take our Data Science course:
Almost forgot: data science is not as sexy as it’s portrayed: