
O'Reilly – Mastering Large Datasets with Python, Video Edition 2024-11
Published on: 2024-12-12 17:52:35
Description
Mastering Large Datasets with Python, Video Edition teaches you how to scale your big data analytics projects using Python. By learning key concepts like map and reduce and using Python’s powerful tools, you’ll be able to run your algorithms in parallel and take advantage of large computing resources such as cloud clusters.
In today’s world, data is growing rapidly, and analyzing this big data requires special tools and methods. This course will help you meet this challenge using Python, one of the most popular programming languages in the field of data mining. By learning parallelization and distributed computing techniques, you can perform complex analyses on very large datasets and obtain more accurate results in less time.
What you will learn
- Map and reduce: gain a deep understanding of these two fundamental concepts in parallel data processing
- Parallelization with multiprocessing and pathos: learn how to run multiple processes simultaneously to speed up computations
- Hadoop and Spark: use these two powerful tools for large-scale distributed data processing
- AWS: run data processing tasks on Amazon cloud services to take advantage of large computing resources
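As a rough illustration of the first two topics, here is a minimal sketch of the map and reduce style using only Python's standard library. It is not taken from the course; the function names (word_count, add, total_words) and the sample text are hypothetical. The map step turns each line into a word count, and the reduce step folds those counts into one total. Because the map function is passed in as a parameter, the same code can run serially with the built-in map or in parallel with multiprocessing.Pool.map.

```python
from functools import reduce
from multiprocessing import Pool


def word_count(line):
    """Map step (hypothetical helper): count the words in one line of text."""
    return len(line.split())


def add(total, count):
    """Reduce step (hypothetical helper): fold one count into the running total."""
    return total + count


def total_words(lines, map_fn=map):
    """Apply word_count to every line with map_fn, then reduce the counts to a sum.

    map_fn defaults to the built-in serial map; any function with the same
    (function, iterable) calling convention, such as Pool.map, also works.
    """
    return reduce(add, map_fn(word_count, lines), 0)


if __name__ == "__main__":
    sample = [
        "map applies a function to every item",
        "reduce folds the results",
        "",
    ]

    # Serial version: built-in map.
    print(total_words(sample))

    # Parallel version: the same pipeline, with the map step spread over 2 worker processes.
    with Pool(processes=2) as pool:
        print(total_words(sample, map_fn=pool.map))
```

For a three-line sample the parallel version will be slower than the serial one, since starting worker processes has a cost; the parallel map only tends to pay off when each item takes real work or the dataset is large.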
This course is suitable for:
- Python programmers who need to work with larger volumes of data
- People looking to learn advanced data analysis techniques
- Those who want to work in the field of data mining and machine learning
Mastering Large Datasets with Python Video Edition Course Specifications
- Publisher: O'Reilly
- Instructor: John Wolohan
- Training level: Beginner to advanced
- Training duration: 7 hours and 43 minutes
Course headings
- Part 1.
- Chapter 1. Introduction
- Chapter 1. Why large datasets?
- Chapter 1. What is parallel computing?
- Chapter 1. The map and reduce style
- Chapter 1. Distributed computing for speed and scale
- Chapter 1. Hadoop: A distributed framework for map and reduce
- Chapter 1. Spark for high-powered map, reduce, and more
- Chapter 1. AWS Elastic MapReduce—Large datasets in the cloud
- Chapter 1. Summary
- Chapter 2. Accelerating large dataset work: Map and parallel computing
- Chapter 2. Parallel processing
- Chapter 2. Putting it all together: Scraping a Wikipedia network
- Chapter 2. Exercises
- Chapter 2. Summary
- Chapter 3. Function pipelines for mapping complex transformations
- Chapter 3. Unmasking hacker communications
- Chapter 3. Twitter demographic projections
- Chapter 3. Exercises
- Chapter 3. Summary
- Chapter 4. Processing large datasets with lazy workflows
- Chapter 4. Some lazy functions to know
- Chapter 4. Understanding iterators: The magic behind lazy Python
- Chapter 4. The poetry puzzle: Lazily processing a large dataset
- Chapter 4. Lazy simulations: Simulating fishing villages
- Chapter 4. Exercises
- Chapter 4. Summary
- Chapter 5. Accumulation operations with reduction
- Chapter 5. The three parts of reduction
- Chapter 5. Reductions you’re familiar with
- Chapter 5. Using map and reduce together
- Chapter 5. Analyzing car trends with reduction
- Chapter 5. Speeding up map and reduce
- Chapter 5. Exercises
- Chapter 5. Summary
- Chapter 6. Speeding up map and reduce with advanced parallelization
- Chapter 6. Solving the parallel map and reduce paradox
- Chapter 6. Summary
- Part 2.
- Chapter 7. Processing truly big datasets with Hadoop and Spark
- Chapter 7. Hadoop for batch processing
- Chapter 7. Using Hadoop to find high-scoring words
- Chapter 7. Spark for interactive workflows
- Chapter 7. Document word scores in Spark
- Chapter 7. Exercises
- Chapter 7. Summary
- Chapter 8. Best practices for large data with Apache Streaming and mrjob
- Chapter 8. Tennis analytics with Hadoop
- Chapter 8. mrjob for Pythonic Hadoop streaming
- Chapter 8. Tennis match analysis with mrjob
- Chapter 8. Exercises
- Chapter 8. Summary
- Chapter 9. PageRank with map and reduce in PySpark
- Chapter 9. Tennis rankings with Elo and PageRank in PySpark
- Chapter 9. Exercises
- Chapter 9. Summary
- Chapter 10. Faster decision-making with machine learning and PySpark
- Chapter 10. Machine learning basics with decision tree classifiers
- Chapter 10. Fast random forest classifications in PySpark
- Chapter 10. Summary
- Part 3.
- Chapter 11. Large datasets in the cloud with Amazon Web Services and S3
- Chapter 11. Storing data in the cloud with S3
- Chapter 11. Exercises
- Chapter 11. Summary
- Chapter 12. MapReduce in the cloud with Amazon’s Elastic MapReduce
- Chapter 12. Machine learning in the cloud with Spark on EMR
- Chapter 12. Exercises
- Chapter 12. Summary
Installation Guide
After extracting the files, view them with your favorite player.
Subtitles: None
Quality: 1080p
Download link
Download file – 0.98 GB
File(s) password: www.downloadly.ir