Data Science in Your Business (Notes from Week of April 15 – 21, 2019) May 8, 2019

Posted by Anthony in Automation, experience, finance, Founders, Hiring, questions, Stacks, Strategy, training, Uncategorized.

It feels appropriate that the week of Google I/O lined up with the notes where data was the primary focus, especially since Google was pushing ease of technology (centered around giving them access to more data). There were some excellent memes/pictures going around on the difference between Facebook asking (hah) for data and Google, which already has a ton of it but mentions the stuff that is coming.

Kaggle, a data science competition site, held a conference centered on hiring and careers in data, with guest speakers from some of the most interesting companies working with data, including Google. Listening to career- or hiring-focused podcasts in the field gives insight into how corporations or orgs weigh people vs skills. The other side of this is a discussion of how data science teams can deliver value to the business. What can be done with the data? Is it helpful? Which metrics are measurable and important?

A few episodes went into the business and general application of research on data: how personality and music are interrelated, or how satellite imagery from NASA can provide various live solutions, and to what extent these can be designed to be used. A few non-data-science-specific podcasts dealt with FinTech, HealthTech and marketplace tuning. How do startups fight against incumbents in various marketplaces? Are their offerings sustainable or do they break the model of what we have seen?

Hopefully my notes provide some incentive to go back and listen to one or all of the podcasts. Or connect via Twitter to talk more!

  • Validating D/S with QuantHub, Matt Cowell CEO (BDB 4/2/19)
    • Also with Nathan Black, Chief Data Scientist at QuantHub
    • Talking data science – math with business and IT skillsets
    • Companies are manually doing tech assessments with candidates / roles – programmer-based primarily
      • QuantHub looks at a comprehensive, scientific approach to assessment of the stack of what may be necessary
    • NLP of resume, and then Bayes’ updating for input and results from that
      • Assessment platform at the core and using it for hiring (natural use-case) but then benchmarking organizational skills
      • Aggregators of content and matchmaking (say, data analyst up to data engineer – wrangling, SQL improvement)
    • Assessments done and individuals won’t be charged – overall value in helping talent
      • Building the training side in the next quarter
    • How do companies engage QuantHub? 5min to get running – align the incentives for using it / relationship.
      • What are the challenges? What are the skillsets? What do you mean that you want them to do?
      • Requirements changing by different statistical methods (along with computing power, designing algorithms vs latest research)
      • Knowing/vetting data scientists as having to do the role / job – can you mirror the actual job requirements? (try vs buy, potentially)
    • Cloud computing or hardware innovation as ‘cool’ in a world of software – highly critical, depending on certain organizations
      • Some orgs NEED the data improvement there (Kubernetes, Docker, Cloud, Spark vs Power Excel user)
    • Matt as a product strategy guy – book “Monetizing Innovation”
      • How do you determine what the market and customers want
    • Nathan’s book – “Make It Stick” on how you can improve learning methods
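The "NLP of resume, then Bayes’ updating" idea from the QuantHub notes above can be illustrated with a small Beta-Binomial update. This is only a sketch under stated assumptions: the notes do not describe QuantHub's actual model, and `update_skill`, the prior values and the answer sequence are all hypothetical.

```python
def update_skill(alpha, beta, correct):
    """Return the Beta posterior parameters after one graded answer."""
    return (alpha + 1, beta) if correct else (alpha, beta + 1)


def posterior_mean(alpha, beta):
    """Expected probability of answering the next question correctly."""
    return alpha / (alpha + beta)


# Hypothetical prior, e.g. a middling skill estimate derived from a resume.
alpha, beta = 2, 2

# Hypothetical assessment results: right, right, wrong, right.
answers = [True, True, False, True]
for correct in answers:
    alpha, beta = update_skill(alpha, beta, correct)

estimate = posterior_mean(alpha, beta)
print(f"Beta({alpha}, {beta}) -> estimated skill {estimate:.3f}")
```

Each answer shifts the estimate a little, so a strong resume prior can be confirmed or overturned by the assessment results.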
  • James Martin (Staffing Lead, D/S at Google) – Getting Noticed in D/S (Kaggle CareerCon 4/17/2019)
    • Looking at field – ML Engineer to Quant/Statistician to Product Analyst to Data Analyst
    • Research tips: Open source projects (understanding current trends, gain experience, make connections)
      • Job descriptions (take time to research the differences, tailor approach)
      • Market research (professional networking, connect dots between companies you’d consider)
    • Resume tips: Concise (focus on telling a story on the experiences to highlight outcomes)
      • Factual (if listing skills or strengths, use examples to support them)
      • Related experience (highlight specific projects related to area you’re applying)
    • Networking tips: Professional profile (be detailed but concise about the skills you use and experiences)
      • Targeted outreach (connect after a conference, target approach for conversation)
      • Conferences (meet/greet if possible, follow up via email, LinkedIn, Twitter)
  • Gidi (Gideon) Nave (@gidin), Assistant Professor of Marketing (Marketing Matters)
    • Cambridge Analytica before Cambridge – music research and how it relates to certain traits
      • Extroversion and openness were 2 big ones that they could pull from 5 traits (MUSIC)
      • MUSIC: Unpretentious and Sophisticated were 2 of the dimensions
    • Could pull personality cues from 20 second, unreleased clips based on scores of 1 to 7, also
      • More agreeable people had higher scores in general
    • Personality on 5 (OCEAN – Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism)
      • Questioned whether they could use music to test for the personality (as opposed to the other direction)
      • Personality is established at a young age, so can music likes on Facebook give you a personality side – as mentioned, it did ~2 better than others
  • Fintech for Startups and Incumbents (a16z 4/7/2019)
    • With GP Alex Rampell (@arampell), CEO/cofounder of TrialPay, and partner Frank Chen
    • Assembling a risk pool (good and okay drivers subsidizing the bad drivers, or healthcare – same)
      • No economic model for skipping a segment – psychology for half price insurance (say, going to gym)
      • Half the number of customers – taking the ‘good’ ones, profitable ones
      • Insurance has mandatory loss ratios for different industries
    • HealthIQ – mechanism for exploitation on ‘health’ – in FinTech, it was SoFi on HENRYs
      • Positive vs adverse selection – debt settlement company ads, for instance – negotiate on your behalf to settle
      • Healthier people as living longer than non-healthy people – left them more profitable for proving being better
        • Gives them good customers (adverse selection for ‘quick, no blood test, 1 min’)
    • SoFi as stealing customers from the normal distribution – better marketing message “you’re getting ripped off, come to us”
    • Branch as investment – collect as much data as possible and look for correlations – small, mini-loans
      • Induction as pattern is a willingness to pay (credit is remembered) – went and got data from your phone
      • How many apps did you have? Did you look like you went to work? Are you gambling?
        • Potentially counterintuitive: a battery that goes dead leaned toward default, a gambling app meant more likely to pay, etc…
    • Earnin – phone in pocket for 8 hours, last paycheck and RTS data confirming – will give money that you have earned but don’t have yet
      • Can tip interest or not – can you encourage people for positive community and people driving safely
      • Nurture better behavior – helping to turn customers into correctly priced customer (vs bank that doesn’t want them)
    • Vouch as company that failed but your social network had to vouch for you – Person X is okay, so you can even put up $
    • Tiffany & Co for a long time was owned by Avon (the “Avon lady” company) – but its brand was massive and it was one of the most renowned jewelers
      • Could make sense for acquiring more customers, though
    • Killing Geico – take 20% of customers but only take the good ones
      • Selling negative gross widgets, for instance – probabilistic ones (and the bad ones aren’t needed)
    • Turndown traffic strategy – Chase turns down a lot of people for problems (can’t profitably do $400 loan, for instance)
      • Here’s a friend after they rejected them (but see traffic) – Chase will tell you to go to a startup for better underwriting
      • Amazon got right – HP book, for instance – had ad for B&N right next to Amazon (bought) – would make $1 on the ad click at 100% profit
        • Used this to reduce the price on their site and wasn’t sharing it
    • Rapid fire: “Always invest super early” – 9 weeks to decision vs 1 day – can’t get good deals at length
      • Best things aren’t cheap – they’re often expensive – better strategy can be plowing in late (“Can’t believe we’re putting this much $”)
      • Gating item for entrance into a space or into different models – cost of capital and distribution as often the unique thing
        • Geico could easily add additional traffic to start-ups
      • M&A strategy early? Encouraged and used Facebook – buy existential threat (surrender 1% of market cap to buy Instagram)
        • Facebook spent 7% of market cap for WhatsApp, Oculus, etc…
        • Buy the guys that failed trying – courage to build something new -> take them and put him in charge for person that was successful (at big co)
          • Trying to build this thing for ~10 years vs start-up that built something in 1 year (put this one in charge)
          • Ex: Amtrak buys Tesla – worst thing is “You work for us”, but you want products to push distribution and talent for understanding
        • Only difference was distribution and the possibility to do that
  • Jennifer Dulski, author of Purposeful, Head of Groups & Community at Facebook (Wharton XM)
    • Talked about their group initiatives at Facebook – communities policing themselves as well as methods to flag content
    • Mentioned an example of an employee who came up to her and asked whether she had done a good job; the employee just wanted a bonus or something $
      • Taught her about incentives and why people do what they do / good to know the motivators
      • What drives people?
  • Anth Georgiades, CEO of Zumper (Bay Area Ventures, Wharton XM)
    • Purchase / hire of Padmapper in 2016 that added quite a bit of Canada business (size of California, real estate-wise)
    • How to match both sides for a marketplace – suppliers vs customers
      • Chicken and egg – focus on one, improve other and repeat
  • Word2vec (Data Skeptic 2/1/2019)
    • Produces word embeddings – compared to autoencoders: a NN that does something like compression and still has to retrieve the output successfully
      • m dimensions down to n via mathematical representation (n < m)
      • Language compression for vector rep
    • Training the algorithm on Google’s full internet, Facebook’s news articles, Wikipedia, etc… to achieve similar words/spaces
      • Not super adaptive – nonsense place for words it hasn’t seen
    • Real world application – take the word2vec vector for king, subtract male, then add female, and you get queen
      • 300 dimensional space, semantics of that example
      • Bad example: training on the entirety of the internet results in something like doctor – male + female = nurse (the data isn’t gender neutral)
    • Feature engineering for bag of words, good example for transfer learning, also (train model on text and then use parts of it on smaller area)
      • Very large corpora for NLP but can use pre-trained models of word2vec and use it in other models
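The king/queen arithmetic from the Word2vec notes above can be sketched with hand-made toy vectors. The assumptions here are loud ones: these are not real word2vec embeddings (which live in ~300 dimensions and are learned from a corpus), the four dimensions and their values are invented for illustration, and "man"/"woman" stand in for the "male"/"female" of the notes.

```python
import math

# Toy 4-d "embeddings": [royalty, maleness, femaleness, fruitiness].
# Hypothetical values chosen by hand, not learned from any corpus.
VECTORS = {
    "king":     [1.0, 1.0, 0.0, 0.0],
    "queen":    [1.0, 0.0, 1.0, 0.0],
    "man":      [0.0, 1.0, 0.0, 0.0],
    "woman":    [0.0, 0.0, 1.0, 0.0],
    "princess": [0.7, 0.0, 1.0, 0.0],
    "apple":    [0.0, 0.0, 0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman"))
```

With real pre-trained embeddings the same lookup is where biased results like the doctor/nurse example show up, since the vectors inherit whatever associations the training text contains.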
  • Sean Law (@seanmylaw), D/S Research and Dev at TD Ameritrade (DataFramed #59 4/1/2019)
    • Colleagues thinking he tends to ask lots of interesting, hard ?s – hopefully with answers
    • If he’s a hard worker, then he’ll do great – being in industry for 3 months time – has to juggle effective time spend
    • Molecular dynamics is short time scale and lots of computing power – parallel computing before and now the growth / usage of GPUs within days
    • Hypothetical example for alternative data solutions – driving to work and listening to NPR where NASA had a new dataset that was sat imagery
      • Pollution ORA dataset for air quality – area of high commodity necessity with pollution joining
    • If building ML as a binary classifier – but don’t know where the data is (do we have to collect? 3rd party API? Internal?)
      • How much effort to get it usable in the pipeline? Then, what’s the reasonable accuracy level – better than 50-50?
      • Some signal in the noise
    • Exploring chat/voice – query account balance, stock price, news articles via Alexa/Echo
      • Headless / device-agnostic option – audio to parsing of text, understanding what customer wants (NLP) and then what it means
      • Following PoC and into production
      • PoCs can miss: scalability (unless claim is to get scalability), model accuracy (not best model immediately), real-world applications (use case in mind)
    • Interpretability standpoint – regularization, L1 and linear – constraining coefficient can be very useful (background noise from video, for instance)
      • Time-series pattern-matching as non-traditional
    • Calls to action – data failures of things that didn’t work or negative results
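The "better than 50-50?" question in the binary classifier notes above can be made concrete with a baseline check. This is a minimal sketch with made-up labels and predictions, and the helper names are hypothetical. It adds one point the notes don't state: when classes are imbalanced, the bar to clear is the majority-class rate, not 50%.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Hypothetical data: 8 positives and 2 negatives.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 1, 0]

model_acc = accuracy(y_true, y_pred)
baseline_acc = majority_baseline(y_true)
print(f"model {model_acc:.2f} vs baseline {baseline_acc:.2f}")
```

In this made-up case the model beats a coin flip but loses to always guessing the majority class, so there is no usable signal yet.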
