Kseniia Samarina

is working from home. 🏡

Employee, Big Data Engineer, VK

About me

• Data Engineer with 3+ years of experience, seeking full-time Data Engineer or Software Engineer roles.
• I have a broad outlook on technologies and tools thanks to my work experience and studies.
• Always open to new experiences and knowledge.
• Interested in Big Data and distributed systems.

Skills and knowledge

Java
Tableau
Statistics
ML
Algorithms
GCP
Docker
Git
PostgreSQL
ClickHouse
Apache Airflow
Data Warehouse
Apache Hadoop
SQL
Python
Big Data
Apache Spark
ETL

Career

Professional experience of Kseniia Samarina

  • To date 1 year and 9 months, since Sep. 2022

    Big Data Engineer

    VK

    Worked at the Russian social network Odnoklassniki.
    • Created a Data Detail Store for video event log data and migrated about 10 data pipelines to this data source.
    • Wrote more than 15 complex ETL pipelines with custom transformations using Apache Spark's Java API.
    • Integrated several data sources: MySQL, ClickHouse, HDFS.
    • Researched and verified data with Spark SQL and the DataFrame API in Apache Zeppelin.
    Technologies: Java, Apache Spark, HDFS, Hive, ClickHouse, Apache Zeppelin.

  • 6 months, Jan. 2022 - June 2022

    Data Engineer

    SEMrush

    Worked on Traffic Analytics, a tool for benchmarking a website's traffic against its competitors.
    • Migrated a clickstream data parser written in Scala to Java using Apache Spark's Java API.
    • Validated clickstream data by calculating analytical metrics and running statistical hypothesis tests in JupyterLab.
    • Created about 5 new ETL pipelines in Python with Luigi.
    Technologies: Python, Java, Apache Spark, ClickHouse, numpy, pandas, GCP.

  • 1 year and 8 months, Feb. 2020 - Sep. 2021

    Data Analyst

    Karuna

    Worked on a large fintech project.
    • Created BI reports using Tableau and reduced report rendering time from minutes to seconds.
    • Developed new data marts with Apache Airflow using multiple data sources: Vertica, PostgreSQL, ClickHouse.
    • Wrote and optimized complex analytical SQL queries.
    • Performed data preprocessing and statistical hypothesis testing in JupyterLab with the Python packages numpy, pandas, and statsmodels.
    Technologies: Python, SQL, Apache Airflow, Tableau, Vertica, PostgreSQL, ClickHouse.

Languages

  • German

    Basic knowledge

  • Russian

    Native language

  • English

    Fluent
