[Newest Version] Free BDS-C00 PDF and Exam Questions Download 100% Pass Exam - Actual Certifications Exam Q&A

Attention please! Here is the shortcut to passing your BDS-C00 exam! Preparing for the AWS Certified Big Data – Specialty (BDS-C00) exam is hard work, but don’t worry: we provide the most up-to-date BDS-C00 exam questions. With our latest BDS-C00 dumps, you can pass the AWS Certified Big Data – Specialty (BDS-C00) exam with ease.

Visit our site to get more BDS-C00 Q and As: https://www.pass4itsure.com/aws-certified-big-data-specialty.html (264 QAs Dumps)
Question 1:

A new algorithm has been written in Python to identify SPAM e-mails. The algorithm analyzes the free text contained within a sample set of 1 million e-mails stored on Amazon S3. The algorithm must be scaled across a production dataset of 5 PB, which also resides in Amazon S3 storage.

Which AWS service strategy is best for this use case?

A. Copy the data into Amazon ElastiCache to perform text analysis on the in-memory data and export the results of the model into Amazon Machine Learning.

B. Use Amazon EMR to parallelize the text analysis tasks across the cluster using a streaming program step.

C. Use Amazon Elasticsearch Service to store the text and then use the Python Elasticsearch Client to run analysis against the text index.

D. Initiate a Python job from AWS Data Pipeline to run directly against the Amazon S3 text files.

Correct Answer: C

Reference: https://aws.amazon.com/blogs/database/indexing-metadata-in-amazon-elasticsearch-service-using-aws-lambda-and-python/

Question 2:

A data engineer chooses Amazon DynamoDB as a data store for a regulated application. This application must be submitted to regulators for review. The data engineer needs to provide a control framework that lists the security controls from the process to follow to add new users down to the physical controls of the data center, including items like security guards and cameras.

How should this control mapping be achieved using AWS?

A. Request AWS third-party audit reports and/or the AWS quality addendum and map the AWS responsibilities to the controls that must be provided.

B. Request data center Temporary Auditor access to an AWS data center to verify the control mapping.

C. Request relevant SLAs and security guidelines for Amazon DynamoDB and define these guidelines within the application’s architecture to map to the control framework.

D. Request Amazon DynamoDB system architecture designs to determine how to map the AWS responsibilities to the control that must be provided.

Correct Answer: A

Question 3:

An administrator needs to design a distribution strategy for a star schema in a Redshift cluster. The administrator needs to determine the optimal distribution style for the tables in the Redshift schema. In which three circumstances would choosing Key-based distribution be most appropriate? (Select three.)

A. When the administrator needs to optimize a large, slowly changing dimension table.

B. When the administrator needs to reduce cross-node traffic.

C. When the administrator needs to optimize the fact table for parity with the number of slices.

D. When the administrator needs to balance data distribution and data collocation.

E. When the administrator needs to take advantage of data locality on a local node for joins and aggregates.

Correct Answer: ACD
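
As background for the options above, key-based distribution places rows with the same distribution-key value on the same slice, so that joins on that key can run locally. The following is only a minimal sketch of that idea: the slice count, the table rows, and the use of CRC32 as the hash are all assumptions for illustration, not the hash function Redshift uses internally.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4  # hypothetical slice count, for illustration only


def slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution-key value to a slice using CRC32 as a stand-in hash."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key_field, num_slices=NUM_SLICES):
    """Group rows by the slice their distribution-key value hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_field], num_slices)].append(row)
    return slices


def is_collocated(fact_slices, dim_slices, key_field):
    """True if every fact row shares a slice with its matching dimension row."""
    for slice_id, fact_rows in fact_slices.items():
        dim_keys = {row[key_field] for row in dim_slices.get(slice_id, [])}
        if any(row[key_field] not in dim_keys for row in fact_rows):
            return False
    return True


# Hypothetical fact and dimension rows that share the join column customer_id.
orders = [{"order_id": i, "customer_id": f"c{i % 5}"} for i in range(20)]
customers = [{"customer_id": f"c{i}", "name": f"Customer {i}"} for i in range(5)]

# Both tables are distributed on the same key, so matching rows land together.
order_slices = distribute(orders, "customer_id")
customer_slices = distribute(customers, "customer_id")
print(is_collocated(order_slices, customer_slices, "customer_id"))
```

Because both tables hash the same column with the same function, a join on customer_id in this toy model never has to move rows between slices.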

Question 4:

A large grocery distributor receives daily depletion reports from the field in the form of gzip archives of CSV files uploaded to Amazon S3. The files range from 500MB to 5GB. These files are processed daily by an EMR job. Recently it has been observed that the file sizes vary, and the EMR jobs take too long. The distributor needs to tune and optimize the data processing workflow with this limited information to improve the performance of the EMR job. Which recommendation should an administrator provide?

A. Reduce the HDFS block size to increase the number of task processors.

B. Use bzip2 or Snappy rather than gzip for the archives.

C. Decompress the gzip archives and store the data as CSV files.

D. Use Avro rather than gzip for the archives.

Correct Answer: B

Question 5:

A web-hosting company is building a web analytics tool to capture clickstream data from all of the websites hosted within its platform and to provide near-real-time business intelligence. This entire system is built on AWS services. The web-hosting company is interested in using Amazon Kinesis to collect this data and perform sliding window analytics.

What is the most reliable and fault-tolerant technique to get each website to send data to Amazon Kinesis with every click?

A. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis PutRecord API. Use the sessionID as a partition key and set up a loop to retry until a success response is received.

B. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis Producer Library .addRecords method.

C. Each web server buffers the requests until the count reaches 500 and sends them to Amazon Kinesis using the Amazon Kinesis PutRecord API call.

D. After receiving a request, each web server sends it to Amazon Kinesis using the Amazon Kinesis PutRecord API. Use the exponential back-off algorithm for retries until a successful response is received.

Correct Answer: A
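
For readers unfamiliar with the retry strategy named in option D, the following is a minimal sketch of exponential back-off with jitter. The `send` callable, the delay values, and the attempt limit are hypothetical; in practice `send` might wrap a Kinesis PutRecord call, and a real client would catch only throttling or transient errors rather than every exception.

```python
import random
import time


def send_with_backoff(send, record, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call send(record) until it succeeds, doubling the wait cap after each failure."""
    for attempt in range(max_attempts):
        try:
            return send(record)
        except Exception:
            # Assumption: every exception is treated as retryable in this sketch.
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))  # random wait up to the current cap


# Usage with a hypothetical sender that fails twice before succeeding.
calls = {"count": 0}


def flaky_send(record):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated throttling")
    return "ok"


waits = []  # record the waits instead of actually sleeping
result = send_with_backoff(flaky_send, {"click": 1}, sleep=waits.append)
print(result, len(waits))
```

Compared with a tight retry loop, the growing and randomized waits keep many web servers from retrying in lockstep when the stream is throttled.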

Question 6:

A company has several teams of analysts. Each team of analysts has their own cluster. The teams need to run SQL queries using Hive, Spark-SQL, and Presto with Amazon EMR. The company needs to enable a centralized metadata layer to expose the Amazon S3 objects as tables to the analysts.

Which approach meets the requirement for a centralized metadata layer?

A. EMRFS consistent view with a common Amazon DynamoDB table

B. Bootstrap action to change the Hive Metastore to an Amazon RDS database

C. s3distcp with the outputManifest option to generate RDS DDL

D. Naming scheme support with automatic partition discovery from Amazon S3

Correct Answer: A

Question 7:

A company operates an international business served from a single AWS region. The company wants to expand into a new country. The regulator for that country requires the Data Architect to maintain a log of financial transactions in the country within 24 hours of the product transaction. The production application is latency insensitive. The new country contains another AWS region.

What is the most cost-effective way to meet this requirement?

A. Use CloudFormation to replicate the production application to the new region.

B. Use Amazon CloudFront to serve application content locally in the country; Amazon CloudFront logs will satisfy the requirement.

C. Continue to serve customers from the existing region while using Amazon Kinesis to stream transaction data to the regulator.

D. Use Amazon S3 cross-region replication to copy and persist production transaction logs to a bucket in the new country’s region.

Correct Answer: B

Question 8:

A game company needs to properly scale its game application, which is backed by DynamoDB. Amazon Redshift has the past two years of historical data. Game traffic varies throughout the year based on various factors such as season, movie releases, and holiday season. An administrator needs to calculate how much read and write throughput should be provisioned for the DynamoDB table for each week in advance.

How should the administrator accomplish this task?

A. Feed the data into Amazon Machine Learning and build a regression model.

B. Feed the data into Spark MLlib and build a random forest model.

C. Feed the data into Apache Mahout and build a multi-classification model.

D. Feed the data into Amazon Machine Learning and build a binary classification model.

Correct Answer: B

Question 9:

A data engineer is about to perform a major upgrade to the DDL contained within an Amazon Redshift cluster to support a new data warehouse application. The upgrade scripts will include user permission updates, view and table structure changes as well as additional loading and data manipulation tasks.

The data engineer must be able to restore the database to its existing state in the event of issues.

Which action should be taken prior to performing this upgrade task?

A. Run an UNLOAD command for all data in the warehouse and save it to S3.

B. Create a manual snapshot of the Amazon Redshift cluster.

C. Make a copy of the automated snapshot on the Amazon Redshift cluster.

D. Call the waitForSnapshotAvailable command from either the AWS CLI or an AWS SDK.

Correct Answer: B

Reference: https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-snapshots.html#working-with-snapshot-restore-table-from-snapshot

Question 10:

A large oil and gas company needs to provide near real-time alerts when peak thresholds are exceeded in its pipeline system. The company has developed a system to capture pipeline metrics such as flow rate, pressure, and temperature using millions of sensors. The sensors deliver to AWS IoT.

What is a cost-effective way to provide near real-time alerts on the pipeline metrics?

A. Create an AWS IoT rule to generate an Amazon SNS notification.

B. Store the data points in an Amazon DynamoDB table and poll it for peak metrics data from an Amazon EC2 application.

C. Create an Amazon Machine Learning model and invoke it with AWS Lambda.

D. Use Amazon Kinesis Streams and a KCL-based application deployed on AWS Elastic Beanstalk.

Correct Answer: C

Question 11:

A company is using Amazon Machine Learning as part of a medical software application. The application will predict the most likely blood type for a patient based on a variety of other clinical tests that are available when blood type knowledge is unavailable.

What is the appropriate model choice and target attribute combination for this problem?

A. Multi-class classification model with a categorical target attribute.

B. Regression model with a numeric target attribute.

C. Binary Classification with a categorical target attribute.

D. K-Nearest Neighbors model with a multi-class target attribute.

Correct Answer: A

Question 12:

A data engineer is running a DWH on a 25-node Redshift cluster of a SaaS service. The data engineer needs to build a dashboard that will be used by customers. Five big customers represent 80% of usage, and there is a long tail of dozens of smaller customers. The data engineer has selected the dashboarding tool.

How should the data engineer make sure that the larger customer workloads do NOT interfere with the smaller customer workloads?

A. Apply query filters based on customer-id that can NOT be changed by the user and apply distribution keys on customer-id.

B. Place the largest customers into a single user group with a dedicated query queue and place the rest of the customers into a different query queue.

C. Push aggregations into an RDS for Aurora instance. Connect the dashboard application to Aurora rather than Redshift for faster queries.

D. Route the largest customers to a dedicated Redshift cluster. Raise the concurrency of the multi-tenant Redshift cluster to accommodate the remaining customers.

Correct Answer: D

Question 13:

A customer needs to determine the optimal distribution strategy for the ORDERS fact table in its Redshift schema. The ORDERS table has foreign key relationships with multiple dimension tables in this schema. How should the company determine the most appropriate distribution key for the ORDERS table?

A. Identify the largest and most frequently joined dimension table and ensure that it and the ORDERS table both have EVEN distribution.

B. Identify the largest dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.

C. Identify the smallest dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.

D. Identify the largest and the most frequently joined dimension table and designate the key of this dimension table as the distribution key of the ORDERS table.

Correct Answer: D

Reference: https://aws.amazon.com/blogs/big-data/optimizing-for-star-schemas-and-interleaved-sorting-on-amazon-redshift/

Question 14:

A customer is collecting clickstream data using Amazon Kinesis and is grouping the events by IP address into 5-minute chunks stored in Amazon S3.

Many analysts in the company use Hive on Amazon EMR to analyze this data. Their queries always reference a single IP address. Data must be optimized for querying based on IP address using Hive running on Amazon EMR.

What is the most efficient method to query the data with Hive?

A. Store an index of the files by IP address in the Amazon DynamoDB metadata store for EMRFS.

B. Store the Amazon S3 objects with the following naming scheme: bucket_name/source=ip_address/year=yy/month=mm/day=dd/hour=hh/filename.

C. Store the data in an HBase table with the IP address as the row key.

D. Store the events for an IP address as a single file in Amazon S3 and add metadata with keys: Hive_Partitioned_IPAddress.

Correct Answer: A
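
The naming scheme described in option B uses key=value path segments, which is the directory layout Hive uses for partitioned tables. As a rough sketch of how such object keys could be generated, assuming a hypothetical helper name, file name, and example IP address:

```python
from datetime import datetime, timezone


def partitioned_key(ip_address, event_time, filename):
    """Build an object key of the form source=.../year=yy/month=mm/day=dd/hour=hh/filename."""
    return (
        f"source={ip_address}/"
        f"year={event_time:%y}/month={event_time:%m}/"
        f"day={event_time:%d}/hour={event_time:%H}/{filename}"
    )


def prefix_for_ip(ip_address):
    """Key prefix shared by every object that belongs to one IP address."""
    return f"source={ip_address}/"


# Hypothetical event: the IP address and file name are illustrative only.
event_time = datetime(2017, 3, 9, 14, 5, tzinfo=timezone.utc)
key = partitioned_key("203.0.113.7", event_time, "events.json")
print(key)
```

With keys laid out this way, all data for one IP address sits under a single prefix, so a query that filters on that address only needs to read objects below it.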

Question 15:

A company that manufactures and sells smart air conditioning units also offers add-on services so that customers can see real-time dashboards in a mobile application or a web browser. Each unit sends its sensor information in JSON format every two seconds for processing and analysis. The company also needs to consume this data to predict possible equipment problems before they occur. A few thousand pre-purchased units will be delivered in the next couple of months. The company expects high market growth in the next year and needs to handle a massive amount of data and scale without interruption.

Which ingestion solution should the company use?

A. Write sensor data records to Amazon Kinesis Streams. Process the data using KCL applications for the end-consumer dashboard and anomaly detection workflows.

B. Batch sensor data to Amazon Simple Storage Service (S3) every 15 minutes. Flow the data downstream to the end-consumer dashboard and to the anomaly detection application.

C. Write sensor data records to Amazon Kinesis Firehose with Amazon Simple Storage Service (S3) as the destination. Consume the data with a KCL application for the end-consumer dashboard and anomaly detection.

D. Write sensor data records to Amazon Relational Database Service (RDS). Build both the end-consumer dashboard and anomaly detection application on top of Amazon RDS.

Correct Answer: C
