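Proficient in building big data systems with Java and Scala.
Built real-time pipelines with the stream-processing frameworks Storm and Kafka Streaming.
Built big data processing platforms with AWS EMR and Scalding.
Developed in-house ETL tools and built data warehouses on various column-store databases (Redshift, Aster, and MemSQL).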
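Big data platform: processes more than ten billion records per day using AWS EMR and the Scalding framework. The output is stored on S3 and loaded into AWS Redshift by an in-house ETL tool so that customers can query it. The entire pipeline is automated and requires no manual intervention.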
Drafted an AWS CloudFormation template to create and manage AWS resources (AWS Redshift, EC2, VPC).
Migrated existing reports from Oracle to Redshift: designed the table layout and schema, tested distribution and sort keys, and analyzed storage efficiency and query performance.
Benchmarked different databases (MemSQL, Redshift, Aurora) to find the best solution for our reporting platform.
Developed and configured various big data workflows to run on an AWS EMR cluster, comprising heterogeneous jobs such as Scalding, Spark, and Sqoop.
Launched and tuned our own MemSQL cluster in a sandbox environment, which drastically reduced query response time in massively concurrent read scenarios.
Developed a custom Spark job to speed up exporting data from Oracle to S3.