Hands-On Big Data Analytics with PySpark

Bartłomiej Potaczek / Colibri Digital / Rudy Lai

37,34 €
VAT included
Available
Publisher:
Packt Publishing
Year of publication:
2019
Subject:
Database design and theory
ISBN:
9781838644130

Use PySpark to easily crush messy data at scale and discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs.

Key Features:

  • Work with large amounts of agile data using distributed datasets and in-memory caching
  • Source data from all popular data hosting platforms, such as HDFS, Hive, JSON, and S3
  • Employ the easy-to-use PySpark API to deploy big data analytics for production

Book Description:

Apache Spark is an open source parallel-processing framework that has been around for quite some time now. One of the many uses of Apache Spark is for data analytics applications across clustered computers. In this book, you will not only learn how to use Spark and the Python API to create high-performance analytics with big data, but also discover techniques for making Spark jobs testable, immutable, and parallelizable.

You will learn how to source data from all popular data hosting platforms, including HDFS, Hive, JSON, and S3, and deal with large datasets with PySpark to gain practical big data experience. This book will help you work on prototypes on local machines and subsequently go on to handle messy data in production and at scale. This book covers installing and setting up PySpark, RDD operations, big data cleaning and wrangling, and aggregating and summarizing data into useful reports. You will also learn how to implement some practical and proven techniques to improve certain aspects of programming and administration in Apache Spark.

By the end of the book, you will be able to build big data analytical solutions using the various PySpark offerings and also optimize them effectively.

What You Will Learn:

  • Get practical big data experience while working on messy datasets
  • Analyze patterns with Spark SQL to improve your business intelligence
  • Use PySpark's interactive shell to speed up development time
  • Create highly concurrent Spark programs by leveraging immutability
  • Discover ways to avoid the most expensive operation in the Spark API: the shuffle operation
  • Re-design your jobs to use reduceByKey instead of groupBy
  • Create robust processing pipelines by testing Apache Spark jobs

Who this book is for:

This book is for developers, data scientists, business analysts, or anyone who needs to reliably analyze large amounts of real-world data. Whether you're tasked with creating your company's business intelligence function or creating great data platforms for your machine learning models, or are looking to use code to magnify the impact of your business, this book is for you.

Related items

  • Hands-On Machine Learning on Google Cloud Platform
    Alexis Perrier / Giuseppe Ciaburro / Kishore Ayyadevara
    Enhance your understanding of Computer Vision and image processing by developing real-world projects in OpenCV 3. Key Features: Get to grips with the basics of Computer Vision and image processing. This is a step-by-step guide to developing several real-world Computer Vision projects using OpenCV 3. This book takes a special focus on working with Tesseract OCR, a free, open-source libr...
    Available

    72,22 €

  • MLOps with Red Hat OpenShift
    Faisal Masood / Ross Brigoli
    Build and manage MLOps pipelines with this practical guide to using Red Hat OpenShift Data Science, unleashing the power of machine learning workflows. Key Features: Grasp MLOps and machine learning project lifecycle through concept introductions. Get hands on with provisioning and configuring Red Hat OpenShift Data Science. Explore model training, deployment, and MLOps pipeline buildi...
    Available

    64,10 €

  • Data Labeling in Machine Learning with Python
    Vijaya Kumar Suda
    Take your data preparation, machine learning, and GenAI skills to the next level by learning a range of Python algorithms and tools for data labeling. Key Features: Generate labels for regression in scenarios with limited training data. Apply generative AI and large language models (LLMs) to explore and label text data. Leverage Python libraries for image, video, and audio data analysi...
    Available

    86,16 €

  • Data Engineering with Scala and Spark
    David Radford / Eric Tome / Rupam Bhattacharjee
    Take your data engineering skills to the next level by learning how to utilize Scala and functional programming to create continuous and scheduled pipelines that ingest, transform, and aggregate data. Key Features: Transform data into a clean and trusted source of information for your organization using Scala. Build streaming and batch-processing pipelines with step-by-step expla...
    Available

    54,33 €

  • Mastering Snowflake Platform
    Pooja Kelgaonkar
    Embark on the data journey with the ultimate guide to Snowflake mastery. Description: Handling ever-evolving data for business needs can get complex. Traditional methods create bulky and costly-to-maintain data systems. Here, Snowflake emerges as a cost-effective solution, catering to both traditional and modern data needs with zero or minimal maintenance costs. This book helps you...
    Available

    50,54 €

  • BASI DI DATI - PROGETTAZIONE, REALIZZAZIONE E PROGRAMMAZIONE
    Roberto Bandiera
    The reader is guided through the various phases of designing and building a relational database. The many practical examples use MySQL as the database management software. The SQL language for querying and updating the database is then covered. Finally, the techniques and tools for building an application...
    Available

    34,65 €